foundations intermediate · 17 min read · By LIPAI WANG · April 29, 2026

Density Matrices and Mixed States: The Formalism for Real Quantum Systems

Pure-state quantum mechanics ($|\psi\rangle$ vectors) is enough for textbook quantum computing but not for real hardware. Real qubits are noisy, partially known, or part of larger entangled systems whose other parts you've ignored. The density matrix is the formalism that handles all three cases. This tutorial defines density matrices, derives their properties, covers the partial trace and the purification theorem, and shows why density matrices are the natural language of quantum-information theory.

Prerequisites: Tutorial 1: What Is a Qubit, Tutorial 18: Noise and Decoherence

Tutorials 1-3 introduced quantum states as vectors $|\psi\rangle$ in a Hilbert space. This is the pure-state description: a state of complete knowledge, with definite amplitudes for every basis vector. It is enough for textbook quantum computing and clean theoretical analysis.

Real quantum systems are rarely in pure states. A qubit in a noisy quantum processor has small interactions with its environment that randomize parts of its state. A qubit that has been measured, but whose outcome you have not yet looked at, is a probabilistic mixture of the possible post-measurement states. A qubit that is entangled with another system you’ve discarded (say, an environmental degree of freedom you can’t track) has its quantum information partly hidden in correlations that are no longer accessible.

All three cases (noise, classical uncertainty, and ignored entanglement) are described by the same mathematical object: the density matrix. It is the natural language of quantum-information theory, of error correction, of channel descriptions, and of essentially everything that distinguishes “real quantum hardware” from “ideal closed quantum systems.”

This tutorial defines the density matrix, derives its properties, covers the partial trace (the operation that handles “ignored subsystems”), introduces the purification theorem (every mixed state is the marginal of some pure state in a larger Hilbert space), and motivates why this formalism is essential for working quantum-information theory.

The three sources of mixedness

Mixed states arise from three distinct physical situations:

1. Classical uncertainty over pure states

Suppose a qubit was prepared in $|0\rangle$ with probability $0.7$ or $|1\rangle$ with probability $0.3$ , but you don’t know which. You can describe this classical mixture as a probability distribution over pure states. The corresponding density matrix is

\rho \;=\; 0.7 \, |0\rangle\langle 0| + 0.3 \, |1\rangle\langle 1| \;=\; \begin{pmatrix} 0.7 & 0 \\ 0 & 0.3 \end{pmatrix}.

This is classical uncertainty translated into the quantum formalism. Importantly, it is not the same as the pure superposition $|\psi\rangle = \sqrt{0.7}|0\rangle + \sqrt{0.3}|1\rangle$ , which has off-diagonal coherence and gives different measurement statistics in bases other than the computational one, such as the Hadamard basis.
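The difference between the mixture and the superposition can be checked numerically. The following is a minimal NumPy sketch, not part of the original tutorial code; the helper name `prob` is chosen here for illustration. It compares outcome probabilities in the computational basis and in the Hadamard basis.

```python
import numpy as np

# Classical mixture: |0> with probability 0.7, |1> with probability 0.3.
rho_mix = np.diag([0.7, 0.3]).astype(complex)

# Pure superposition with the same computational-basis probabilities.
psi = np.array([np.sqrt(0.7), np.sqrt(0.3)], dtype=complex)
rho_sup = np.outer(psi, psi.conj())

# Basis states used as measurement outcomes.
zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)


def prob(rho, state):
    """Probability of the outcome `state`: <state| rho |state>."""
    return float(np.real(state.conj() @ rho @ state))


print("P(0): mixture", prob(rho_mix, zero), "superposition", prob(rho_sup, zero))
print("P(+): mixture", prob(rho_mix, plus), "superposition", prob(rho_sup, plus))
```

Both states give probability 0.7 for outcome 0, but for outcome $+$ the mixture gives 0.5 while the superposition gives $0.5 + \sqrt{0.21} \approx 0.96$ .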

2. Noise from environmental coupling

A qubit interacting weakly with an uncontrolled environment (other qubits, photons, phonons, two-level-system defects) ends up entangled with the environment. If you can’t measure the environment, you describe the qubit alone by a density matrix obtained by tracing out the environment, using the partial trace operation covered below. This is how decoherence (Tutorial 18) shows up in the density-matrix language.

3. Subsystems of larger pure states

Even with no noise at all, a qubit that is part of a larger entangled system has a density-matrix description if you only look at it alone. The classic example: half of a Bell state. The full state $|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$ is pure, but the first qubit alone is described by

\rho_1 \;=\; \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1| \;=\; \tfrac{1}{2} I.

This is the maximally mixed state — completely random, no information. The information is not lost; it lives in the entanglement with the second qubit. Looking only at the first qubit, you see a maximally mixed density matrix.

The unifying principle: whenever you have less than complete knowledge of a quantum system, the density matrix is what describes what you do know. Pure states are the special case where you have complete knowledge.

Definition and properties

A density matrix $\rho$ on a Hilbert space $\mathcal{H}$ is a Hermitian operator satisfying:

Positive semidefinite: $\langle \psi | \rho | \psi \rangle \geq 0$ for all $|\psi\rangle$ .
Unit trace: $\text{Tr}(\rho) = 1$ .
(Hermiticity is listed for emphasis; on a complex Hilbert space it already follows from positive semidefiniteness.)

Equivalently, $\rho$ is a convex combination of pure-state projectors:

\rho \;=\; \sum_i p_i |\psi_i\rangle\langle \psi_i|, \quad p_i \geq 0, \quad \sum_i p_i = 1.

This decomposition is not unique — different sets of pure states with different probabilities can give the same density matrix. The density matrix encodes only the operational consequences of the mixture, not the specific decomposition.
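A quick way to see the non-uniqueness is to build the same density matrix from two different ensembles. The sketch below assumes plain NumPy and a hypothetical helper `projector`. It mixes $|0\rangle, |1\rangle$ with equal weights, then $|+\rangle, |-\rangle$ with equal weights, and compares the results.

```python
import numpy as np


def projector(v):
    """Return |v><v| for a (not necessarily normalized) vector v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())


# Ensemble 1: |0> and |1>, each with probability 1/2.
rho_z = 0.5 * projector([1, 0]) + 0.5 * projector([0, 1])

# Ensemble 2: |+> and |->, each with probability 1/2.
rho_x = 0.5 * projector([1, 1]) + 0.5 * projector([1, -1])

# Both ensembles give the maximally mixed state I/2.
print(np.allclose(rho_z, rho_x))
```

No measurement on the qubit can tell which of the two preparation procedures was used, because every prediction depends only on $\rho$ .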

Pure vs mixed states

A density matrix is pure if it can be written as $\rho = |\psi\rangle\langle \psi|$ for some pure state $|\psi\rangle$ . Equivalently:

$\rho^2 = \rho$ (idempotent).
$\text{Tr}(\rho^2) = 1$ (purity equals 1).

A density matrix is mixed otherwise. The purity, $\text{Tr}(\rho^2)$ , ranges from $1$ (pure) down to $1/d$ for the maximally mixed state on a $d$ -dimensional system. Purity is a useful single-number summary of how mixed a state is.

For a single qubit: $\text{Tr}(\rho^2) = 1$ means a Bloch-sphere-surface state; $\text{Tr}(\rho^2) = 1/2$ means the center of the Bloch sphere (maximally mixed). Tutorial 48 covers the Bloch sphere geometry.
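As a rough illustration of the purity criteria, the sketch below evaluates $\text{Tr}(\rho^2)$ and the idempotence test for three single-qubit states. The function names `purity` and `is_pure` are assumptions made for this example.

```python
import numpy as np


def purity(rho):
    """Tr(rho^2): 1 for pure states, 1/d for the maximally mixed state."""
    return float(np.real(np.trace(rho @ rho)))


def is_pure(rho, atol=1e-10):
    """Check idempotence rho^2 == rho, which holds only for pure states."""
    return bool(np.allclose(rho @ rho, rho, atol=atol))


plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_pure = np.outer(plus, plus.conj())         # |+><+|
rho_partial = np.diag([0.7, 0.3]).astype(complex)  # partially mixed
rho_max_mixed = np.eye(2, dtype=complex) / 2       # I/2

for name, rho in [("pure", rho_pure),
                  ("partially mixed", rho_partial),
                  ("maximally mixed", rho_max_mixed)]:
    print(name, "purity:", round(purity(rho), 4), "idempotent:", is_pure(rho))
```

The three purities come out as 1, 0.58, and 0.5, and only the first state passes the idempotence check.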

Computing expectation values

For an observable $A$ measured on a state $\rho$ :

\langle A \rangle \;=\; \text{Tr}(A \rho).

This unifies pure-state and mixed-state expectations. For a pure state $|\psi\rangle$ , $\rho = |\psi\rangle\langle\psi|$ , and $\text{Tr}(A \rho) = \langle \psi | A | \psi \rangle$ — the familiar pure-state formula. For mixed states, the trace formula handles the convex mixture automatically.

For an observable that is diagonal in the basis in which $\rho$ is written (e.g., measuring $Z = \text{diag}(+1, -1)$ with $\rho$ in the computational basis), the diagonal elements of $\rho$ are exactly the measurement-outcome probabilities. This is why the diagonal entries of a density matrix are sometimes called the “populations” of each basis state.
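The trace formula can be checked against the ensemble average it is supposed to reproduce. The sketch below is illustrative only. It uses an assumed example ensemble ( $|0\rangle$ with probability 0.7, $|+\rangle$ with probability 0.3) and compares $\text{Tr}(A\rho)$ with $\sum_i p_i \langle\psi_i|A|\psi_i\rangle$ for $A = Z$ and $A = X$ .

```python
import numpy as np

Z = np.diag([1.0, -1.0]).astype(complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

# Example ensemble (an assumption for this sketch).
probs = [0.7, 0.3]
states = [
    np.array([1, 0], dtype=complex),               # |0>
    np.array([1, 1], dtype=complex) / np.sqrt(2),  # |+>
]

# Density matrix as a convex combination of pure-state projectors.
rho = sum(p * np.outer(s, s.conj()) for p, s in zip(probs, states))


def expectation(A, rho):
    """Expectation value Tr(A rho)."""
    return float(np.real(np.trace(A @ rho)))


def ensemble_average(A):
    """Weighted average of the pure-state expectations <psi|A|psi>."""
    return float(sum(p * np.real(np.vdot(s, A @ s)) for p, s in zip(probs, states)))


print("<Z>:", expectation(Z, rho), ensemble_average(Z))
print("<X>:", expectation(X, rho), ensemble_average(X))
print("populations:", np.real(np.diag(rho)))
```

The two columns agree ( $\langle Z\rangle = 0.7$ , $\langle X\rangle = 0.3$ up to floating-point rounding), and the populations 0.85 and 0.15 are the probabilities of the two $Z$ outcomes.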

Time evolution: unitary and non-unitary

Pure-state time evolution is unitary: $|\psi\rangle \to U |\psi\rangle$ . Translated into density matrices:

\rho \;\to\; U \rho U^\dagger.

This handles closed-system evolution. For open systems (with environmental noise), evolution is described by completely positive trace-preserving (CPTP) maps, also called quantum channels:

\rho \;\to\; \mathcal{E}(\rho) \;=\; \sum_k K_k \rho K_k^\dagger,

where $\{K_k\}$ are Kraus operators satisfying $\sum_k K_k^\dagger K_k = I$ . This operator-sum representation captures every physically realizable quantum operation, including noise.

Tutorial 18 covered specific noise channels (depolarizing, dephasing, amplitude damping). All of them are CPTP maps, and the density-matrix formalism is what makes them representable.
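To make the operator-sum form concrete, here is a minimal sketch of applying a channel from its Kraus operators and checking the completeness condition. The phase-flip parameterization used here ( $K_0 = \sqrt{1-p}\,I$ , $K_1 = \sqrt{p}\,Z$ ) is one common convention and is an assumption of this example; Tutorial 18 may parameterize dephasing differently.

```python
import numpy as np


def apply_channel(rho, kraus_ops):
    """Operator-sum representation: sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)


def is_trace_preserving(kraus_ops, atol=1e-12):
    """Check the completeness relation sum_k K_k^dagger K_k = I."""
    dim = kraus_ops[0].shape[1]
    total = sum(K.conj().T @ K for K in kraus_ops)
    return bool(np.allclose(total, np.eye(dim), atol=atol))


def phase_flip_kraus(p):
    """Assumed convention: apply Z with probability p, identity otherwise."""
    identity = np.eye(2, dtype=complex)
    Z = np.diag([1, -1]).astype(complex)
    return [np.sqrt(1 - p) * identity, np.sqrt(p) * Z]


plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_in = np.outer(plus, plus.conj())

kraus = phase_flip_kraus(0.2)
rho_out = apply_channel(rho_in, kraus)

print("trace preserving:", is_trace_preserving(kraus))
print(np.round(rho_out.real, 4))
```

With $p = 0.2$ the populations stay at 0.5 while the off-diagonal entries shrink from 0.5 to 0.3: the channel removes coherence without changing computational-basis probabilities.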

The partial trace

Given a bipartite state $\rho_{AB}$ on a composite system $A \otimes B$ , the partial trace over $B$ produces a density matrix on $A$ alone:

\rho_A \;=\; \text{Tr}_B(\rho_{AB}) \;=\; \sum_k (I_A \otimes \langle k_B|) \rho_{AB} (I_A \otimes |k_B\rangle).

The partial trace describes “what an observer of subsystem $A$ alone would see, ignoring $B$ entirely.” It is the operation that produces mixed states from pure entangled states.

Concrete example: the Bell state $|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$ has density matrix

\rho_{\Phi^+} \;=\; \tfrac{1}{2}\bigl(|00\rangle\langle 00| + |00\rangle\langle 11| + |11\rangle\langle 00| + |11\rangle\langle 11|\bigr).

Tracing out the second qubit:

\rho_A \;=\; \tfrac{1}{2}\bigl(|0\rangle\langle 0| + |1\rangle\langle 1|\bigr) \;=\; \tfrac{1}{2} I,

the maximally mixed state. The pure entangled state, viewed locally, looks completely random. This is the density-matrix expression of the fact that entanglement information is not localizable — you cannot read it out from one half of the entangled pair alone.
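The partial-trace formula can be transcribed almost literally into NumPy. The following sketch (the function name `partial_trace_b` is hypothetical) builds the operators $I_A \otimes |k_B\rangle$ with `np.kron` and sums over $k$ . It is written for clarity rather than speed.

```python
import numpy as np


def partial_trace_b(rho_ab, dim_a, dim_b):
    """Tr_B(rho_AB) = sum_k (I_A x <k_B|) rho_AB (I_A x |k_B>)."""
    rho_a = np.zeros((dim_a, dim_a), dtype=complex)
    eye_a = np.eye(dim_a)
    for k in range(dim_b):
        ket_k = np.zeros((dim_b, 1))
        ket_k[k, 0] = 1.0
        # op has shape (dim_a * dim_b, dim_a) and represents I_A x |k_B>.
        op = np.kron(eye_a, ket_k)
        rho_a += op.conj().T @ rho_ab @ op
    return rho_a


# Bell state (|00> + |11>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_bell = np.outer(bell, bell.conj())
print(np.round(partial_trace_b(rho_bell, 2, 2).real, 4))

# For contrast, a product state |0>|+> has a pure marginal |0><0|.
zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
product = np.kron(zero, plus)
rho_product = np.outer(product, product.conj())
print(np.round(partial_trace_b(rho_product, 2, 2).real, 4))
```

The Bell state reduces to $I/2$ , while the product state reduces to the pure projector $|0\rangle\langle 0|$ : only entanglement makes the marginal mixed.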

The purification theorem

Going the other direction: every mixed state on a system $A$ can be obtained as the partial trace of some pure state on a larger system $A \otimes B$ . The pure state $|\Psi_{AB}\rangle$ is called a purification of $\rho_A$ .

The construction: diagonalize $\rho_A = \sum_i p_i |i\rangle\langle i|$ . Then

|\Psi_{AB}\rangle \;=\; \sum_i \sqrt{p_i} \, |i\rangle_A \otimes |i\rangle_B

satisfies $\text{Tr}_B(|\Psi_{AB}\rangle\langle \Psi_{AB}|) = \rho_A$ . The auxiliary system $B$ has dimension at least $\text{rank}(\rho_A)$ .

Purifications are not unique (any unitary on $B$ produces another purification), but they exist for every mixed state. This is one of the key technical results of quantum-information theory: any noise process or classical mixture can be modeled as part of a larger pure quantum system. There is no fundamental difference between “noise from the environment” and “entanglement with degrees of freedom you can’t see” — the density matrix sees them as the same.
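The construction above can be sketched in a few lines. This example assumes NumPy and an auxiliary system $B$ of the same dimension as $A$ , with the computational basis used for $|i\rangle_B$ . The names `purify` and `trace_out_b` are illustrative. It also checks that a unitary applied to $B$ alone yields another valid purification.

```python
import numpy as np


def purify(rho_a):
    """Build sum_i sqrt(p_i) |i>_A |i>_B from the eigendecomposition of rho_a."""
    p, vecs = np.linalg.eigh(rho_a)
    p = np.clip(p, 0.0, None)  # guard against tiny negative round-off
    dim = rho_a.shape[0]
    psi = np.zeros(dim * dim, dtype=complex)
    for i in range(dim):
        basis_b = np.zeros(dim)
        basis_b[i] = 1.0
        psi += np.sqrt(p[i]) * np.kron(vecs[:, i], basis_b)
    return psi


def trace_out_b(psi, dim_a, dim_b):
    """Reduced state on A of a pure state psi, via its coefficient matrix."""
    m = psi.reshape(dim_a, dim_b)
    return m @ m.conj().T


rho_a = np.array([[0.85, 0.15], [0.15, 0.15]], dtype=complex)
psi = purify(rho_a)
print(np.allclose(trace_out_b(psi, 2, 2), rho_a))

# A unitary acting only on B (here a Hadamard) gives another purification.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
psi_rotated = np.kron(np.eye(2), H) @ psi
print(np.allclose(trace_out_b(psi_rotated, 2, 2), rho_a))
```

Both checks print `True`: the two global pure states differ, but they have the same marginal on $A$ .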

Schmidt decomposition

For a pure bipartite state $|\psi_{AB}\rangle$ , there exist orthonormal bases $\{|i_A\rangle\}, \{|i_B\rangle\}$ such that

|\psi_{AB}\rangle \;=\; \sum_i \lambda_i |i_A\rangle \otimes |i_B\rangle, \quad \lambda_i \geq 0, \quad \sum_i \lambda_i^2 = 1.

The coefficients $\lambda_i$ are the Schmidt coefficients, and the number of non-zero $\lambda_i$ is the Schmidt rank.

The reduced density matrices are diagonal in these bases:

\rho_A \;=\; \sum_i \lambda_i^2 |i_A\rangle\langle i_A|, \qquad \rho_B \;=\; \sum_i \lambda_i^2 |i_B\rangle\langle i_B|.

The two marginals have the same nonzero eigenvalues. This is the structural reason why entanglement entropy is the same on both halves of any pure bipartite state, a Bell state included: the two halves share the same Schmidt spectrum.

The Schmidt decomposition is the operational tool for analyzing entanglement of pure bipartite states. Schmidt rank 1 means a product (unentangled) state; Schmidt rank > 1 means entangled. The entropy of the squared Schmidt coefficients $\lambda_i^2$ quantifies how entangled the state is.
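Numerically, the Schmidt coefficients of a pure bipartite state are the singular values of its coefficient matrix. Using the SVD is a standard route but is not spelled out in this tutorial, so treat the sketch below as one possible implementation. The function names are illustrative, and the entropy is computed in bits from the squared coefficients.

```python
import numpy as np


def schmidt_coefficients(psi, dim_a, dim_b):
    """Singular values of the (dim_a x dim_b) coefficient matrix of psi."""
    m = np.asarray(psi, dtype=complex).reshape(dim_a, dim_b)
    return np.linalg.svd(m, compute_uv=False)


def schmidt_rank(psi, dim_a, dim_b, tol=1e-12):
    """Number of Schmidt coefficients above the tolerance."""
    return int(np.sum(schmidt_coefficients(psi, dim_a, dim_b) > tol))


def entanglement_entropy(psi, dim_a, dim_b, tol=1e-12):
    """Shannon entropy (in bits) of the squared Schmidt coefficients."""
    lam2 = schmidt_coefficients(psi, dim_a, dim_b) ** 2
    lam2 = lam2[lam2 > tol]  # drop zeros so the logarithm is defined
    return float(-np.sum(lam2 * np.log2(lam2))) + 0.0


bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
product = np.kron(zero, plus)

for name, psi in [("Bell", bell), ("product", product)]:
    print(name,
          "rank:", schmidt_rank(psi, 2, 2),
          "entropy:", round(entanglement_entropy(psi, 2, 2), 4))
```

The Bell state has Schmidt rank 2 and one bit of entanglement entropy; the product state has rank 1 and zero entropy.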

A small density-matrix example

Concrete code computing a noisy state’s density matrix:

import numpy as np
import pennylane as qml

dev = qml.device("default.mixed", wires=2)


@qml.qnode(dev)
def noisy_bell_state(p_depol):
    """Prepare a Bell state, then apply depolarizing noise to each qubit."""
    qml.Hadamard(wires=0)
    qml.CNOT(wires=[0, 1])
    qml.DepolarizingChannel(p_depol, wires=0)
    qml.DepolarizingChannel(p_depol, wires=1)
    return qml.density_matrix(wires=[0, 1])


# No noise: pure Bell state.
rho_pure = noisy_bell_state(0.0)
print("Pure Bell state purity:", np.trace(rho_pure @ rho_pure).real)
# Expected: 1.0

# Moderate noise.
rho_noisy = noisy_bell_state(0.1)
print("Noisy state purity:", np.trace(rho_noisy @ rho_noisy).real)
# Expected: < 1.0

# Reduce to first-qubit marginal via partial trace.
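def partial_trace(rho, dims, traced_axis):
    """Partial trace of a bipartite rho over subsystem `traced_axis` (0 or 1)."""
    # After the reshape the index order is (a, b, a', b').
    rho_reshaped = rho.reshape([dims[0], dims[1], dims[0], dims[1]])
    if traced_axis == 0:
        # Sum over a = a' and keep the second subsystem.
        return np.einsum("ijik->jk", rho_reshaped)
    else:
        # Sum over b = b' and keep the first subsystem.
        return np.einsum("ijkj->ik", rho_reshaped)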


rho_a = partial_trace(rho_noisy, dims=[2, 2], traced_axis=1)
print("First qubit marginal:")
print(rho_a)
print("Marginal purity:", np.trace(rho_a @ rho_a).real)
# Expected: 0.5 (maximally mixed)

Sample output:

Pure Bell state purity: 1.0
Noisy state purity: 0.6731
First qubit marginal:
[[0.5+0.j 0.0+0.j]
 [0.0+0.j 0.5+0.j]]
Marginal purity: 0.5

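The pure Bell state has full purity. Adding noise reduces the joint purity. Tracing out one qubit of a Bell state always gives a maximally mixed marginal, and that remains true for the noisy version: the depolarizing channel acts on each qubit separately and maps $I/2$ to itself, so local depolarizing noise leaves each marginal at $I/2$ .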

Common misconceptions

“A density matrix is just a fancy way to write probabilities.” Density matrices encode classical probabilities (diagonal) and quantum coherence (off-diagonal). Off-diagonal entries have no classical analog. Density matrices are strictly more expressive than classical probability distributions.

“Mixed states are less quantum than pure states.” Wrong. Mixed states can be entangled with auxiliary systems via purification, and an entangled mixed state is at least as quantum as any pure state. Quantum information theorems (no-cloning, no-broadcasting, etc.) apply to mixed states as much as to pure states.

“You can decompose any mixed state into a unique pure-state mixture.” No. The same density matrix can be written as different pure-state mixtures, all giving the same density matrix and the same operational predictions. This non-uniqueness is structural — only the density matrix itself is operationally meaningful.

“The partial trace destroys information.” It does not destroy information; it makes information about the traced-out subsystem inaccessible to the remaining one. The information still exists in the joint state — purifications make this explicit.

“Density matrices are only needed for noisy hardware.” They are needed any time you don’t have complete knowledge of a quantum state, including when you’re studying part of a perfectly pure entangled system. Quantum information theory of pure states and noise-free hardware still uses density matrices for the marginals.

Decision rule

Use density matrices when:

The system is noisy. Real hardware always has some noise; modeling as a density matrix captures this.
You’re studying a subsystem of a larger entangled system. Even pure global states have mixed-state marginals.
Classical probability distributions over quantum states are involved. E.g., a state preparation that randomly produces $|\psi_1\rangle$ or $|\psi_2\rangle$ .
You’re doing quantum-information-theoretic analysis. Channels, capacities, fidelities, entropies — all natively in density-matrix language.

Stick with pure-state notation when:

You’re explaining or learning textbook algorithms. Shor, Grover, QFT, etc. are cleaner in pure-state notation.
You’re analyzing closed-system unitary dynamics. No noise, no partial subsystems.
You want minimal mathematical overhead. Density matrices add notation; if the system is genuinely pure, vectors are enough.

For practical quantum-computing work, density matrices are the language of real hardware analysis: error correction, channel characterization, randomized benchmarking, fault-tolerance proofs.

Exercises

1. Distinguish a superposition from a mixture

The pure state $|\psi\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ and the mixture $\rho = \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1|$ have the same probability of measurement outcomes in the computational basis. What measurement distinguishes them?

Answer:

Measure in the Hadamard basis ( $|+\rangle, |-\rangle$ ). The pure state $|\psi\rangle = |+\rangle$ gives outcome $+$ with probability 1. The mixture $\rho = I/2$ gives outcomes $+$ or $-$ with equal probability. Off-diagonal coherence is the distinguishing signature: in the computational basis the pure state has nonzero off-diagonal density-matrix entries, while the mixture has only diagonal entries. A superposition and a mixture can give identical statistics in one basis (here the computational basis) yet differ in a basis that is sensitive to the coherence. This is also how decoherence is detected experimentally: the decay of off-diagonal coherence, seen as a loss of contrast in such a measurement, signals that a superposition is turning into a mixture.

2. Purify a maximally mixed qubit

Construct a 2-qubit pure state that is a purification of the maximally mixed single-qubit state $\rho = I/2$ . How does this relate to the Bell state?

Answer:

Purification: $|\Psi\rangle = \sum_i \sqrt{p_i} |i\rangle |i\rangle = (|00\rangle + |11\rangle)/\sqrt{2} = |\Phi^+\rangle$ , using $p_0 = p_1 = 1/2$ . The standard construction therefore yields the Bell state $|\Phi^+\rangle$ ; because purifications are unique only up to a unitary on the auxiliary qubit, the other Bell states are equally valid purifications. This is why “maximally mixed locally” and “maximally entangled globally” are two views of the same state: the Bell state’s first-qubit marginal is the maximally mixed state, and the Bell state is a purification of that marginal. Maximum entanglement is the structural complement of maximum local randomness.

3. The partial-trace recipe

Show that for any bipartite pure state $|\psi_{AB}\rangle$ with Schmidt decomposition, the marginal $\rho_A$ has eigenvalues equal to the Schmidt coefficients squared, $\lambda_i^2$ .

Answer:

$\rho_{AB} = |\psi_{AB}\rangle\langle\psi_{AB}| = \sum_{i,j} \lambda_i \lambda_j |i_A\rangle|i_B\rangle\langle j_A|\langle j_B|$ . Tracing over $B$ : $\rho_A = \sum_{i,j} \lambda_i \lambda_j |i_A\rangle\langle j_A| \langle j_B | i_B\rangle = \sum_{i,j} \lambda_i \lambda_j |i_A\rangle\langle j_A| \delta_{ij} = \sum_i \lambda_i^2 |i_A\rangle\langle i_A|$ . The marginal is diagonal in the Schmidt basis with eigenvalues $\lambda_i^2$ . This is why the Schmidt coefficients are sometimes called “the spectrum of the entanglement” — they are literally the eigenvalues of the marginal density matrices.

4. Why density matrices need to be positive

Suppose someone claims to have measured a “density matrix” with one negative eigenvalue. Why is this impossible?

Answer:

The eigenvalues of a density matrix are the probabilities of finding the state in the corresponding eigenbasis (in the spectral decomposition $\rho = \sum_i p_i |i\rangle\langle i|$ , $p_i$ are eigenvalues). Probabilities cannot be negative — that would imply the existence of negative measurement-outcome counts in repeated experiments. Negative eigenvalues correspond to operational impossibility. A “density matrix” with negative eigenvalues is not a valid quantum state. In quantum-state tomography (the procedure of estimating $\rho$ from measurement statistics), apparent negative eigenvalues are a signature of measurement noise or experimental error; they must be regularized to a nearby positive matrix before the result counts as a valid state estimate.

Where this goes next

Tutorial 48 covers the Bloch sphere, the natural geometric representation of single-qubit density matrices and one of the cleanest visual tools in quantum computing. Future foundations tutorials may cover specific channel families (depolarizing, dephasing, amplitude damping) in their density-matrix forms, the von Neumann entropy as the standard entropy on density matrices, and the Choi-Jamiolkowski isomorphism that lets channels be analyzed as states.
