gates and circuits intermediate · 23 min read · By LIPAI WANG · April 22, 2026

Multi-Qubit Gates: CNOT, CZ, SWAP, Toffoli, and Controlled Everything

CNOT is the workhorse of entanglement, but the two-qubit gate zoo is richer than that. This tutorial walks through CZ, SWAP, iSWAP, Toffoli, and arbitrary controlled unitaries — plus the decomposition theorems that turn them all into CNOT + single-qubit primitives for real hardware.

Prerequisites: Tutorial 5: Pauli, Phase, and Rotation Gates

Single-qubit gates are rotations on isolated Bloch spheres. The moment you want two qubits to interact (which is the moment you want to do anything computationally interesting) you need multi-qubit gates. This tutorial builds out the two-qubit gate zoo, introduces three-qubit primitives, and walks through how hardware transpilers decompose any controlled unitary into your native gate set.

By the end, when you see qc.cswap(0, 1, 2) or cc.controlled(QFT(4)) in someone’s code, you’ll know exactly what it means and roughly how many two-qubit gates it costs.

CNOT, revisited as entangler

\text{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.

Three facts worth burning in:

CNOT is its own inverse. $\text{CNOT}^2 = I$: apply it twice and you are back where you started.
CNOT is the simplest entangler. On computational-basis inputs ($|00\rangle, |01\rangle, |10\rangle, |11\rangle$) CNOT just permutes basis states, so it creates no entanglement. Put the control in superposition, though, and it produces a Bell state: $\text{CNOT}\,(|+\rangle \otimes |0\rangle) = (|00\rangle + |11\rangle)/\sqrt{2}$. That asymmetry is at the heart of CNOT’s power.
CNOT’s direction is a choice of basis. Conjugating CNOT with Hadamards on both qubits swaps the control and target roles. Formally: $(H \otimes H)\,\text{CNOT}\,(H \otimes H) = \text{CNOT}_{1 \to 0}$, where $\text{CNOT}_{1 \to 0}$ has control = qubit 1 and target = qubit 0.
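A quick NumPy check of the first two facts, written against the textbook matrix above (qubit 0 is the left tensor factor; Qiskit’s own matrices use the opposite, little-endian order). The variable names are just for illustration.

```python
import numpy as np

# Textbook CNOT (matrix above): qubit 0, the left tensor factor, is the control.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Fact 1: CNOT is its own inverse.
print(np.allclose(CNOT @ CNOT, np.eye(4)))   # True

# Fact 2a: on basis states, CNOT only permutes indices (|10> and |11> trade places).
images = [int(np.argmax(np.abs(CNOT[:, k]))) for k in range(4)]
print(images)                                 # [0, 1, 3, 2]

# Fact 2b: with the control in |+>, the output is the Bell state (|00> + |11>)/sqrt(2).
ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
bell = CNOT @ np.kron(plus, ket0)
print(np.allclose(bell, np.array([1, 0, 0, 1]) / np.sqrt(2)))   # True

# Entanglement test: as a 2x2 amplitude matrix, a product state has rank 1.
print(np.linalg.matrix_rank(bell.reshape(2, 2)))                # 2, so entangled
```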

Verify the swap-the-roles identity in Qiskit:

import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Operator

qc1 = QuantumCircuit(2)
qc1.h([0, 1])
qc1.cx(0, 1)       # CNOT control=0, target=1
qc1.h([0, 1])

qc2 = QuantumCircuit(2)
qc2.cx(1, 0)       # CNOT control=1, target=0

print(np.allclose(Operator(qc1).data, Operator(qc2).data))
# True

That identity means you can reverse the direction of a CNOT at the cost of four Hadamards. On hardware where the native two-qubit gate can only run in one direction between a given pair of qubits, this trick is crucial, and Qiskit’s transpiler applies it automatically.

CZ: the controlled-phase

The CZ (controlled-Z) gate applies a Z to the target iff the control is $|1\rangle$ :

\text{CZ} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.

Only the $|11\rangle$ amplitude picks up a minus sign; everything else is untouched. That makes CZ symmetric in control and target — you can’t tell which is which by looking at the matrix, and physically it doesn’t matter.
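One way to see the symmetry numerically: conjugating by SWAP relabels qubit 0 as qubit 1 and vice versa. In this NumPy sketch (textbook ordering, illustrative names), CZ survives the relabeling unchanged, while CNOT does not, because CNOT has a preferred direction.

```python
import numpy as np

CZ = np.diag([1, 1, 1, -1]).astype(complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

# Conjugating by SWAP exchanges the roles of qubit 0 and qubit 1.
print(np.allclose(SWAP @ CZ @ SWAP, CZ))      # True: CZ has no preferred "control"
print(np.allclose(SWAP @ CNOT @ SWAP, CNOT))  # False: CNOT becomes CNOT with roles reversed
```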

Key identity: $\text{CZ} = (I \otimes H)\,\text{CNOT}\,(I \otimes H)$ . Sandwich a CNOT between Hadamards on the target, and you get CZ. Some hardware (especially superconducting) natively exposes CZ rather than CNOT; the identity lets you convert freely.

qc_cz = QuantumCircuit(2)
qc_cz.cz(0, 1)

qc_decomp = QuantumCircuit(2)
qc_decomp.h(1); qc_decomp.cx(0, 1); qc_decomp.h(1)

print(np.allclose(Operator(qc_cz).data, Operator(qc_decomp).data))
# True

SWAP: exchange two qubits

SWAP exchanges the states of two qubits:

\text{SWAP}\,|a, b\rangle \;=\; |b, a\rangle,

or, in matrix form,

\text{SWAP} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
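A small NumPy sketch checks the defining property on arbitrary product states, not just basis states: $\text{SWAP}\,(|\psi\rangle \otimes |\phi\rangle) = |\phi\rangle \otimes |\psi\rangle$. The `random_qubit` helper and the random states are purely illustrative test inputs.

```python
import numpy as np

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def random_qubit(rng):
    """Return a normalized random single-qubit state vector."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(seed=0)
psi, phi = random_qubit(rng), random_qubit(rng)

# SWAP(|psi> ⊗ |phi>) should equal |phi> ⊗ |psi>.
print(np.allclose(SWAP @ np.kron(psi, phi), np.kron(phi, psi)))   # True
```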

SWAP seems innocent but costs three CNOTs to implement:

\text{SWAP}(q_0, q_1) \;=\; \text{CNOT}(0 \to 1)\,\text{CNOT}(1 \to 0)\,\text{CNOT}(0 \to 1).

qc_swap = QuantumCircuit(2); qc_swap.swap(0, 1)
qc_cx = QuantumCircuit(2); qc_cx.cx(0,1); qc_cx.cx(1,0); qc_cx.cx(0,1)
print(np.allclose(Operator(qc_swap).data, Operator(qc_cx).data))
# True

Three CNOTs is expensive on noisy hardware. That’s why qubit routing (inserting the SWAPs a circuit needs to respect the hardware’s connectivity, while keeping their number small) is a major transpilation problem. IBM’s heavy-hex topology, Google’s Sycamore grid, and IonQ’s all-to-all connectivity impose very different SWAP costs on otherwise identical algorithms; with all-to-all connectivity, routing SWAPs are essentially unnecessary.
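To see where routing cost comes from, consider a hypothetical three-qubit line 0-1-2 where only neighbors can interact. A CNOT from qubit 0 to qubit 2 can be emulated by swapping qubit 0’s state onto qubit 1, applying CNOT(1 → 2), and swapping back. This NumPy sketch (illustrative helper names, textbook bit order) checks that the routed circuit equals the direct CNOT. Counted in CNOTs it costs 3 + 1 + 3 = 7, or 4 if the compiler skips the return SWAP and simply tracks where each logical qubit now lives.

```python
import numpy as np
from itertools import product

def basis_map(n, f):
    """Unitary of a classical reversible map f on n-qubit basis states.

    Textbook bit order: qubit 0 is the most significant bit.
    """
    dim = 2 ** n
    U = np.zeros((dim, dim))
    for col, bits in enumerate(product([0, 1], repeat=n)):
        out = f(list(bits))
        row = int("".join(map(str, out)), 2)
        U[row, col] = 1
    return U

def cnot(c, t, n):
    def f(bits):
        bits[t] ^= bits[c]
        return bits
    return basis_map(n, f)

def swap(q1, q2, n):
    def f(bits):
        bits[q1], bits[q2] = bits[q2], bits[q1]
        return bits
    return basis_map(n, f)

n = 3
direct = cnot(0, 2, n)                                  # not available on a 0-1-2 line
routed = swap(0, 1, n) @ cnot(1, 2, n) @ swap(0, 1, n)  # uses neighbors only
print(np.allclose(routed, direct))                      # True

# Each SWAP is itself 3 CNOTs, so the routed version costs 3 + 1 + 3 CNOTs.
swap_as_cnots = cnot(0, 1, n) @ cnot(1, 0, n) @ cnot(0, 1, n)
print(np.allclose(swap_as_cnots, swap(0, 1, n)))        # True
```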

iSWAP and related beasts

iSWAP is like SWAP, but it also multiplies the two exchanged amplitudes (those of $|01\rangle$ and $|10\rangle$) by $i$:

i\text{SWAP} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & i & 0 \\ 0 & i & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

Superconducting transmon processors sometimes expose iSWAP (or the related $\sqrt{i\text{SWAP}}$) as their native two-qubit gate because it is what the physical coupling Hamiltonian produces naturally. Converting to the CNOT basis costs two CNOTs per iSWAP, plus single-qubit gates.
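A NumPy sketch of how iSWAP relates to gates from earlier sections, in the same textbook ordering: it is a SWAP combined with purely diagonal phases, $i\text{SWAP} = \text{SWAP}\cdot\text{CZ}\cdot(S \otimes S)$, and the standard $\sqrt{i\text{SWAP}}$ matrix (written out here as an assumption, since it isn’t given above) squares to iSWAP. The identity is for intuition only; compiling iSWAP this way spends 4 CNOTs, more than necessary.

```python
import numpy as np

ISWAP = np.array([[1, 0, 0, 0],
                  [0, 0, 1j, 0],
                  [0, 1j, 0, 0],
                  [0, 0, 0, 1]])
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)
CZ = np.diag([1, 1, 1, -1]).astype(complex)
S = np.diag([1, 1j])

# Diagonal part: CZ (S ⊗ S) = diag(1, i, i, 1); SWAP then exchanges |01> and |10>.
print(np.allclose(SWAP @ CZ @ np.kron(S, S), ISWAP))    # True

# Square root of iSWAP: a half-strength exchange inside the {|01>, |10>} block.
r = 1 / np.sqrt(2)
SQRT_ISWAP = np.array([[1, 0, 0, 0],
                       [0, r, 1j * r, 0],
                       [0, 1j * r, r, 0],
                       [0, 0, 0, 1]])
print(np.allclose(SQRT_ISWAP @ SQRT_ISWAP, ISWAP))      # True
```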

Qiskit’s iSwapGate is available:

from qiskit.circuit.library import iSwapGate
qc = QuantumCircuit(2)
qc.append(iSwapGate(), [0, 1])
print(qc.draw(output="text"))

Toffoli: the three-qubit reversible AND

The Toffoli (or CCNOT) gate flips qubit 2 iff qubits 0 and 1 are both $|1\rangle$ :

|a, b, c\rangle \;\mapsto\; |a, b, c \oplus (a \wedge b)\rangle.

It’s the reversible version of the classical AND gate and is universal for classical reversible computation. You can’t do arithmetic on a quantum computer without something like it.
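Because Toffoli maps basis states to basis states, its whole behavior fits in an 8-row truth table. This plain-Python sketch (the `toffoli` function is an illustrative classical model, not a quantum simulation) prints the table and checks both claims: with $c = 0$ the third output bit is $a \wedge b$, and applying the gate twice undoes it.

```python
from itertools import product

def toffoli(a, b, c):
    """Classical action of Toffoli on basis state |a, b, c>."""
    return a, b, c ^ (a & b)

for a, b, c in product([0, 1], repeat=3):
    out = toffoli(a, b, c)
    print(f"|{a}{b}{c}> -> |{''.join(map(str, out))}>")

# With c = 0 the third output bit is exactly a AND b.
print(all(toffoli(a, b, 0)[2] == (a & b) for a, b in product([0, 1], repeat=2)))  # True

# Applying Toffoli twice restores every input: the gate is its own inverse.
print(all(toffoli(*toffoli(a, b, c)) == (a, b, c)
          for a, b, c in product([0, 1], repeat=3)))                               # True
```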

Cost: the standard decomposition (the one Qiskit’s .decompose() produces below) uses 6 CNOTs plus 9 single-qubit gates: 2 Hadamards and 7 T or T† gates, so its T-count is 7. On fault-tolerant hardware, Toffoli cost is often reported as T-count rather than CNOT-count, because T gates are the expensive ones there.

qc_toffoli = QuantumCircuit(3); qc_toffoli.ccx(0, 1, 2)
decomposed = qc_toffoli.decompose()
print(decomposed.draw(output="text"))
# Uses H, T, T†, CX — a canonical 6-CX, 7-T decomposition
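You can check the 6-CX, 7-T count without Qiskit by multiplying the gates out. The sequence below is the standard H/T/CX construction, assumed to match what `.decompose()` prints (compare against your own output). The NumPy sketch uses textbook bit order, and its helper names are illustrative.

```python
import numpy as np
from itertools import product

N = 3  # qubits a=0, b=1, c=2; textbook order (qubit 0 is the most significant bit)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
TDG = T.conj().T

def on(gate, q):
    """Embed a single-qubit gate on qubit q of the N-qubit register."""
    out = np.array([[1.0]])
    for k in range(N):
        out = np.kron(out, gate if k == q else np.eye(2))
    return out

def controlled_x(controls, t):
    """Permutation matrix that flips qubit t iff all `controls` are 1."""
    U = np.zeros((2 ** N, 2 ** N))
    for col, bits in enumerate(product([0, 1], repeat=N)):
        bits = list(bits)
        if all(bits[c] for c in controls):
            bits[t] ^= 1
        U[int("".join(map(str, bits)), 2), col] = 1
    return U

# Gate sequence in time order.
sequence = [
    ("h", 2), ("cx", 1, 2), ("tdg", 2), ("cx", 0, 2), ("t", 2), ("cx", 1, 2),
    ("tdg", 2), ("cx", 0, 2), ("t", 1), ("t", 2), ("h", 2), ("cx", 0, 1),
    ("t", 0), ("tdg", 1), ("cx", 0, 1),
]
single = {"h": H, "t": T, "tdg": TDG}

U = np.eye(2 ** N, dtype=complex)
for op in sequence:
    gate = controlled_x([op[1]], op[2]) if op[0] == "cx" else on(single[op[0]], op[1])
    U = gate @ U  # later gates multiply from the left

toffoli = controlled_x([0, 1], 2)
print(np.allclose(U, toffoli))    # True

cx_count = sum(op[0] == "cx" for op in sequence)
t_count = sum(op[0] in ("t", "tdg") for op in sequence)
print(cx_count, t_count)          # 6 7
```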

Controlled arbitrary: the general construction

Given any single-qubit unitary $U$ , you can build controlled- $U$ — apply $U$ to the target iff the control is $|1\rangle$ :

CU = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & u_{00} & u_{01} \\ 0 & 0 & u_{10} & u_{11} \end{pmatrix}

where the 2×2 $U = \bigl(\begin{smallmatrix}u_{00}&u_{01}\\u_{10}&u_{11}\end{smallmatrix}\bigr)$ appears as the lower-right block.

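How many CNOTs does controlled- $U$ cost? Two, for any single-qubit $U$, plus three single-qubit gates on the target (and a phase gate on the control when $U$ carries a global phase). The classic construction:

CU \;=\; (I \otimes A)\,\text{CNOT}\,(I \otimes B)\,\text{CNOT}\,(I \otimes C),

where $A, B, C$ are single-qubit gates built from $U$’s Euler decomposition so that $ABC = I$ and $AXBXC = e^{-i\alpha}U$. If $U$ has a nontrivial global phase $e^{i\alpha}$, add the phase gate $\mathrm{diag}(1, e^{i\alpha})$ on the control qubit; for $U$ with determinant 1 (any $R_Y$ or $R_Z$ rotation, for example) the formula above is already exact.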
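Here is a NumPy sketch of the $ABC$ construction for the special case $U = R_Y(\theta)$. The choice $A = R_Y(\theta/2)$, $B = R_Y(-\theta/2)$, $C = I$ is one valid solution, worked out by hand for this case rather than produced by a general Euler-angle routine: $ABC = I$, and since $X R_Y(\phi) X = R_Y(-\phi)$, we get $AXBXC = R_Y(\theta)$. Because $R_Y$ has determinant 1, no extra gate on the control is needed.

```python
import numpy as np

def ry(theta):
    """Single-qubit R_Y rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

theta = np.pi / 3
A, B, C = ry(theta / 2), ry(-theta / 2), I2

# Target CU: identity block, then U in the lower-right block.
CU = np.block([[I2, np.zeros((2, 2))],
               [np.zeros((2, 2)), ry(theta)]])

# (I ⊗ A) CNOT (I ⊗ B) CNOT (I ⊗ C): C acts first, A last.
built = np.kron(I2, A) @ CNOT @ np.kron(I2, B) @ CNOT @ np.kron(I2, C)
print(np.allclose(built, CU))   # True
```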

Qiskit exposes this with .control():

from qiskit.circuit.library import RYGate
from qiskit import QuantumCircuit

cry = RYGate(np.pi/3).control(1)      # controlled-RY(π/3)
qc = QuantumCircuit(2)
qc.append(cry, [0, 1])
print(qc.decompose().draw(output="text"))
# Decomposes into 2 CNOTs + 2 RY rotations (for RY, the C gate is the identity)

For multi-controlled gates (controlled-controlled-controlled- $U$ and beyond, with $n$ control qubits) the cost grows quickly: the classic ancilla-free construction uses $O(n^2)$ CNOTs, while with ancilla qubits available the count drops to linear in $n$. Circuits like Grover’s “oracle” and Shor’s modular exponentiation rely on these gates, which is why large-instance quantum algorithms need thousands of CNOTs.
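To see why ancillas buy linear scaling, here is a sketch of one standard ancilla-based construction, the Toffoli “V-chain”: with $n$ controls and $n-2$ ancillas prepared in $|0\rangle$, compute the AND of the controls into the ancillas, flip the target, then uncompute, for $2n-3$ Toffolis in total (about $6(2n-3)$ CNOTs at 6 CNOTs per Toffoli). The qubit layout and function names are hypothetical. Every gate is a classical reversible permutation, so checking every basis input (with ancillas at 0) is enough.

```python
from itertools import product

def mcx_vchain(n):
    """Toffoli list (ctrl_a, ctrl_b, target) for an n-controlled X, n >= 2.

    Hypothetical layout: controls 0..n-1, ancillas n..2n-3, target 2n-2.
    """
    target = 2 * n - 2
    if n == 2:
        return [(0, 1, target)]
    anc = list(range(n, 2 * n - 2))
    compute = [(0, 1, anc[0])]                        # anc[0] = c0 AND c1
    for k in range(2, n - 1):
        compute.append((k, anc[k - 2], anc[k - 1]))   # anc[k-1] = c0 AND ... AND ck
    flip = [(n - 1, anc[-1], target)]                 # target ^= AND of all controls
    return compute + flip + compute[::-1]             # uncompute restores ancillas

def run(gates, bits):
    """Apply a list of Toffolis to classical bits."""
    bits = list(bits)
    for a, b, t in gates:
        bits[t] ^= bits[a] & bits[b]
    return bits

def is_correct(n):
    gates = mcx_vchain(n)
    n_anc = n - 2
    for ctrl in product([0, 1], repeat=n):
        for tgt in (0, 1):
            out = run(gates, list(ctrl) + [0] * n_anc + [tgt])
            ok = (out[:n] == list(ctrl)
                  and not any(out[n:n + n_anc])
                  and out[-1] == tgt ^ int(all(ctrl)))
            if not ok:
                return False
    return True

for n in range(2, 8):
    toffolis = len(mcx_vchain(n))
    print(f"n={n}: correct={is_correct(n)}, Toffolis={toffolis}, CNOTs~{6 * toffolis}")
```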

Two-qubit circuit identities cheat sheet

A few identities that appear constantly in paper reading:

$(H \otimes H)\,\text{CNOT}_{01}\,(H \otimes H) = \text{CNOT}_{10}$ (CNOT direction flip)
$(I \otimes H)\,\text{CNOT}_{01}\,(I \otimes H) = \text{CZ}_{01}$ (CNOT ↔ CZ)
$\text{SWAP} = \text{CNOT}_{01}\,\text{CNOT}_{10}\,\text{CNOT}_{01}$
$\text{CZ}$ is symmetric: $\text{CZ}_{01} = \text{CZ}_{10}$
$(X \otimes I)\,\text{CNOT}_{01}\,(X \otimes I) = \text{CNOT}_{01}\cdot(I \otimes X)$ (control-qubit X propagation)

Pull out a napkin and verify one of these by matrix multiplication. It’s oddly soothing.
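If you’d rather let NumPy hold the napkin, this sketch checks all five identities in textbook ordering (qubit 0 is the left tensor factor, and $\text{CNOT}_{01}$ means control 0, target 1). The `perm` helper is illustrative.

```python
import numpy as np

I = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])

def perm(mapping):
    """4x4 permutation matrix sending basis index k to mapping[k]."""
    U = np.zeros((4, 4))
    for k, j in enumerate(mapping):
        U[j, k] = 1
    return U

# Basis order |q0 q1>: 00=0, 01=1, 10=2, 11=3.
CNOT01 = perm([0, 1, 3, 2])   # control q0, target q1
CNOT10 = perm([0, 3, 2, 1])   # control q1, target q0
SWAP = perm([0, 2, 1, 3])
CZ = np.diag([1, 1, 1, -1])

checks = {
    "direction flip": np.allclose(np.kron(H, H) @ CNOT01 @ np.kron(H, H), CNOT10),
    "CNOT <-> CZ": np.allclose(np.kron(I, H) @ CNOT01 @ np.kron(I, H), CZ),
    "SWAP = 3 CNOTs": np.allclose(CNOT01 @ CNOT10 @ CNOT01, SWAP),
    "CZ symmetric": np.allclose(SWAP @ CZ @ SWAP, CZ),
    "X propagation": np.allclose(np.kron(X, I) @ CNOT01 @ np.kron(X, I),
                                 CNOT01 @ np.kron(I, X)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")   # every line should print True
```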

The CNOT-count discipline

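On NISQ hardware, two-qubit gate errors dominate. A conservative rule of thumb: assume each CNOT has ~1% error on today’s best machines (IBM Heron, IonQ Forte, Quantinuum H2). If those errors are independent, the chance that a circuit with $N$ CNOTs runs without a single two-qubit error is about $0.99^N$ (ignoring single-qubit and readout errors):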

10 CNOTs → ~90% success
50 CNOTs → ~60% success
200 CNOTs → ~13% success
1000 CNOTs → effectively pure noise
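The arithmetic behind these numbers, as a tiny Python helper. It bakes in the same simplifications as the rule of thumb: independent errors, only CNOT errors counted, and one fixed per-gate error rate. The name `clean_run_probability` is hypothetical.

```python
def clean_run_probability(cnot_error, n_cnots):
    """Chance that none of n_cnots CNOTs fails, assuming independent errors
    and ignoring single-qubit gate and readout errors (a simplification)."""
    return (1 - cnot_error) ** n_cnots

for n in [10, 50, 200, 1000]:
    print(f"{n:>5} CNOTs -> {clean_run_probability(0.01, n):.1%}")
# 90.4%, 60.5%, 13.4%, 0.0%
```

The same function covers Exercise 4 below: `clean_run_probability(0.007, 150)` comes out to about 0.35.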

Any NISQ-era algorithm that needs thousands of CNOTs isn’t really running; it’s giving you samples from a noisy distribution that may or may not be close to the ideal. This is why circuit optimization is so important, and why VQE and QAOA (which we’ll meet in the Variational track) are popular — they’re designed for shallow circuits.

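from qiskit.circuit.library import QFT
from qiskit import transpile

for n in [4, 6, 8, 10]:
    qft = QFT(n, do_swaps=True).decompose()
    basis = transpile(qft, basis_gates=["h", "t", "tdg", "s", "sdg", "rz", "sx", "cx"],
                      optimization_level=3)
    # Before optimization: n(n-1)/2 controlled-phase gates at 2 CX each,
    # plus n//2 final SWAPs at 3 CX each.
    unoptimized_cx = n * (n - 1) + 3 * (n // 2)
    print(f"n={n}: CX count = {basis.count_ops().get('cx', 0)} "
          f"(unoptimized: {unoptimized_cx}), depth = {basis.depth()}")
# Unoptimized CX counts: 18, 39, 68, 105 for n = 4, 6, 8, 10.
# Optimized counts and depths vary with the Qiskit version.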

Scaling is quadratic, not linear: an $n$-qubit QFT contains $n(n-1)/2$ controlled-phase gates, each costing two CNOTs. Even “clean” textbook algorithms grow quickly in CNOT count.

Exercises

1. CNOT puzzle

Predict the state after applying CNOT (control=0, target=1) to $(\alpha|0\rangle + \beta|1\rangle) \otimes |0\rangle$ . Verify in Qiskit with $\alpha = \cos(\pi/6), \beta = \sin(\pi/6)$ .

Answer

CNOT maps $|a\rangle|0\rangle \to |a\rangle|a\rangle$ for $a \in \{0, 1\}$ . By linearity,

\text{CNOT}\big((\alpha|0\rangle + \beta|1\rangle) \otimes |0\rangle\big) = \alpha|00\rangle + \beta|11\rangle.

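For $\alpha = \cos(\pi/6), \beta = \sin(\pi/6)$, this is a weighted Bell-like state. It is entangled whenever both $\alpha$ and $\beta$ are nonzero, but maximally entangled only when $|\alpha| = |\beta| = 1/\sqrt{2}$. Here $|\alpha|^2 = 3/4$ and $|\beta|^2 = 1/4$, so it is entangled but not maximally.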

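import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

qc = QuantumCircuit(2)
qc.ry(np.pi/3, 0)          # RY(π/3)|0⟩ = cos(π/6)|0⟩ + sin(π/6)|1⟩
qc.cx(0, 1)
print(Statevector.from_instruction(qc).data)
# [0.866..., 0, 0, 0.5]  ← amplitudes on |00⟩ and |11⟩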

2. Build CNOT from CZ

Given only CZ and single-qubit gates, implement CNOT (control=0, target=1). Verify.

Answer

Use the identity $\text{CNOT} = (I \otimes H)\,\text{CZ}\,(I \otimes H)$ .

qc_cnot = QuantumCircuit(2); qc_cnot.cx(0, 1)

qc_from_cz = QuantumCircuit(2)
qc_from_cz.h(1); qc_from_cz.cz(0, 1); qc_from_cz.h(1)

print(np.allclose(Operator(qc_cnot).data, Operator(qc_from_cz).data))
# True

3. Fredkin from Toffoli

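The Fredkin gate (controlled-SWAP) swaps qubits 1 and 2 iff qubit 0 is $|1\rangle$. Build it from one Toffoli and two CNOTs. (Hint: $\text{CSWAP} = \text{CNOT}_{21}\,\text{CCNOT}_{0,1,2}\,\text{CNOT}_{21}$, where each subscript lists the control qubit(s) first and the target last.)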

Answer

qc_fredkin = QuantumCircuit(3); qc_fredkin.cswap(0, 1, 2)

qc_built = QuantumCircuit(3)
qc_built.cx(2, 1)
qc_built.ccx(0, 1, 2)
qc_built.cx(2, 1)

print(np.allclose(Operator(qc_fredkin).data, Operator(qc_built).data))
# True

4. CNOT budget

You have a 20-qubit circuit that uses 150 CNOTs and you’re running on a machine with 99.3% two-qubit gate fidelity. What’s the expected probability of zero errors across the whole circuit?

Answer

$0.993^{150} \approx 0.349$, or about a 35% chance of a clean run (assuming independent gate errors and ignoring single-qubit and readout errors). At 1000 shots you’d expect ~350 “good” samples: enough for some statistics, but not much margin for error.

What you should take away

CNOT = the canonical entangler. Every other two-qubit gate can be decomposed into CNOTs + single-qubit gates.
CZ = CNOT sandwiched between Hadamards on the target. Symmetric, easy to reason about, native on some hardware.
SWAP = 3 CNOTs. Qubit routing costs matter on restricted-connectivity hardware.
Toffoli = 6 CNOTs + 7 Ts. Three-qubit reversible AND; foundation for all quantum arithmetic.
Controlled- $U$ = 2 CNOTs + 3 single-qubit gates for any single-qubit $U$, plus a phase gate on the control if $U$ carries a global phase.
CNOT-count is the primary hardware cost. At ~1% error per CNOT, a 200-CNOT circuit runs error-free only about 13% of the time.

Next, final tutorial of this track: OpenQASM 3 and running your first circuit on a real IBM Quantum machine. Free tier, your first calibration headache, and how to interpret noisy results without fooling yourself.

