Chapter 2
The Quantum Circuit Model

The goal of this chapter is not to provide a comprehensive overview of quantum computation, as there are already plenty of good general-purpose books on that topic (see Section 1.1.1 for some pointers). In this chapter, we will simply lay out some of the basic ingredients from quantum theory needed to get us started working with circuits, then dive straight into their super interesting underlying structures, which will be the topic of the following chapters.

2.1 Preliminaries

In this section, we will lay out the mathematical foundations and notation which will be used throughout the book. We will introduce Hilbert spaces and various linear algebraic operations on them, and two different notations for working with them: Dirac bra-ket notation and string diagram notation.

Readers familiar with the mathematics behind quantum theory, but unaccustomed to string diagram notation may wish to skip ahead to section 2.1.6.

2.1.1 Some things you should know about complex numbers

Before we go into defining Hilbert spaces, we’ll need to make a couple of remarks about complex numbers. We will write ℂ for the set of complex numbers, and we will often write λ for an arbitrary complex number. There are two main ways to write down a complex number. The first is the familiar cartesian form: λ = a + ib in terms of real numbers a, b and i := √−1. The second, possibly less familiar, is the polar form: λ = re^{iα} for a real number r ≥ 0 called the magnitude and an angle α ∈ [0,2π) called the phase. Phases play an important role in quantum theory, and they are often where the ‘quantum magic’ happens, so we’ll run into the polar form a lot. These two representations are related by the trigonometric identity:

e^{iα} = cos α + i sin α  ⟹  a = r cos α,  b = r sin α
(2.1)

which can be visualised in the complex plane as follows:

(2.2)

A useful operation on the complex numbers is complex conjugation λ ↦ λ̄, which flips the sign of b in cartesian form, or the sign of α in polar form:

a + ib ↦ a − ib        re^{iα} ↦ re^{−iα}

It’s pretty straightforward to derive from (2.1) that these two operations are the same for a complex number λ. This can be visualised by realising that flipping the sign of b or α amounts to a vertical reflection of the complex plane:

We will slightly overload the term ‘phase’, and also refer to complex numbers of the form e^{iα} as phases. If there is some ambiguity, we will call e^{iα} the phase and α the phase angle. Because of the behaviour of exponents, when we multiply phases together, the phase angles add:

e^{iα} e^{iβ} = e^{i(α+β)}

Consequently, multiplying a phase with its conjugate always gives 1:

e^{iα} \overline{e^{iα}} = e^{iα} e^{−iα} = e^{i(α−α)} = e^0 = 1.

More generally, we can always get the magnitude of a complex number by multiplying with its conjugate and taking the square root:

√(λλ̄) = √((re^{iα})(re^{−iα})) = √(r² e^0) = r

As a consequence, whenever we have λ = re^{iα} and |λ| = 1, we can conclude r = 1, so λ = e^{iα}. In other words, |λ| = 1 if and only if λ is a phase. Getting the angle out is a bit trickier. For this we use a trigonometric function called arg, which for λ ≠ 0 is defined (somewhat circularly) as the unique angle α ∈ [0,2π) such that λ = |λ|e^{iα}. If λ = a + ib for a > 0, arg(λ) = arctan(b/a). This obviously won’t work when a is zero, and needs a bit of tweaking when a is negative. Hence, the full definition of arg needs a case distinction, which we’ll leave as an exercise.

Exercise 2.1. Just to make sure our trig is not too rusty, give a full definition for arg for all non-zero complex numbers, using (2.2) as a guide.

Exercise 2.2. Write cos α and sin α in terms of complex phases. Hint: remember that cos(−α) = cos α while sin(−α) = −sin α.
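These identities are easy to sanity-check with Python’s standard cmath module. Below is a minimal sketch (ours, not part of the text’s formal development); note that cmath.phase returns angles in (−π, π] rather than [0, 2π), which is one way of settling the case distinction of Exercise 2.1.

import cmath
import math

lam = -1 + 1j                              # cartesian form a + ib
r, alpha = abs(lam), cmath.phase(lam)      # polar form: lam = r * e^{i*alpha}
assert cmath.isclose(lam, r * cmath.exp(1j * alpha))

# conjugation flips the sign of the phase angle
assert cmath.isclose(lam.conjugate(), r * cmath.exp(-1j * alpha))

# multiplying a phase by its conjugate gives 1
p = cmath.exp(1j * 0.7)
assert cmath.isclose(p * p.conjugate(), 1)

# the magnitude via the conjugate: |lam| = sqrt(lam * conj(lam))
assert math.isclose(r, math.sqrt((lam * lam.conjugate()).real))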

2.1.2 Hilbert spaces

A Hilbert space is a vector space over the complex numbers, with an inner product defined between vectors, which we’ll write this way:

⟨ψ,ϕ⟩

where ψ and ϕ are vectors in a Hilbert space H. Hilbert spaces can be finite- or infinite-dimensional. For finite-dimensional Hilbert spaces, we are already done with the definition. For infinite dimensions, we need to say some other stuff about limits and convergence, but since quantum computing is all about finite-dimensional Hilbert spaces, we won’t need to go into that here. So, for our purposes, the main thing that separates a Hilbert space from a plain ole vector space is the inner product. So let’s say a few things about inner products and introduce some handy notation. First, a definition.

Definition 2.1.1. A mapping ⟨−,−⟩ from a pair of vectors to the complex numbers is called an inner product if it satisfies the following three conditions:

1. Linearity in the second argument. For ψ, ϕ₁, ϕ₂ ∈ H and λ₁, λ₂ ∈ ℂ:

⟨ψ|λ₁ϕ₁ + λ₂ϕ₂⟩ = λ₁⟨ψ|ϕ₁⟩ + λ₂⟨ψ|ϕ₂⟩.

2. Conjugate-symmetry. For ψ, ϕ ∈ H: \overline{⟨ϕ|ψ⟩} = ⟨ψ|ϕ⟩.

3. Positive-definiteness. For ψ ∈ H, ⟨ψ|ψ⟩ is a real number and ⟨ψ|ψ⟩ > 0 when ψ ≠ 0.

It might seem weird that we only ask inner products to be linear in one of the arguments. However, combining conditions 1 and 2, we can show that the inner product is actually conjugate-linear in the first argument. Conjugate-linearity is pretty much the same as linearity, but scalar multiples pop out as their conjugates:

⟨λ₁ψ₁ + λ₂ψ₂|ϕ⟩ = λ̄₁⟨ψ₁|ϕ⟩ + λ̄₂⟨ψ₂|ϕ⟩

Example 2.1.2. Our main example of a Hilbert space is ℂⁿ, the Hilbert space whose vectors are column vectors with n complex entries:

\psi = \begin{pmatrix} \psi^0 \\ \psi^1 \\ \vdots \\ \psi^{n-1} \end{pmatrix}, \qquad \phi = \begin{pmatrix} \phi^0 \\ \phi^1 \\ \vdots \\ \phi^{n-1} \end{pmatrix}

Note we adopt the physicist’s convention of allowing subscripts and superscripts to index entries in a vector or matrix. On the other hand, we adopt the computer scientists’ convention of counting from 0. So for example, ψ^2 means the third entry of the column vector ψ, not ‘ψ squared’. This looks a bit weird at first, but it will come in really handy when we get to section 2.1.4. To take the inner product, we first take the conjugate-transpose of ψ, i.e. we transpose the column vector into a row vector then conjugate every entry, and then matrix multiply this with ϕ. This amounts to multiplying a 1 × n matrix with an n × 1 matrix, which gives us a 1 × 1 matrix, i.e. just a number:

\langle\psi|\phi\rangle := \begin{pmatrix} \overline{\psi^0} & \overline{\psi^1} & \cdots & \overline{\psi^{n-1}} \end{pmatrix} \begin{pmatrix} \phi^0 \\ \phi^1 \\ \vdots \\ \phi^{n-1} \end{pmatrix} = \sum_{i=0}^{n-1} \overline{\psi^i}\,\phi^i
(2.3)

Since a Hilbert space has an inner product, we know a bunch of things about its vectors that a plain old vector space doesn’t tell you. For one thing, the inner product tells us how long vectors are, i.e. the norm ‖ψ‖ := √⟨ψ|ψ⟩, and in particular when a vector is normalised: ‖ψ‖ = 1. It also tells us when two vectors are orthogonal: ⟨ψ|ϕ⟩ = 0. Putting these pieces together we get the following.

Definition 2.1.3. A basis {h_i | 0 ≤ i < n} ⊆ H for an n-dimensional Hilbert space H is called an orthonormal basis (ONB) if:

⟨h_i|h_j⟩ = δ_{ij}   where   δ_{ij} := 1 if i = j, and 0 if i ≠ j,   is the Kronecker delta.

So, an inner product defines ONBs. In particular, if we were to equip the vector space with a different inner product, then which bases you consider orthonormal will change. This also works the other way: an ONB defines an inner product, in the sense of the following exercise.

Exercise 2.3. Show that any basis B = {h_i | 0 ≤ i < n} ⊆ H on a vector space H defines a unique inner product that makes H into a Hilbert space and B into an ONB.

Since the inner product is linear in its second argument, any vector ψ ∈ H defines a linear map ψ† : H → ℂ by ϕ ↦ ⟨ψ|ϕ⟩, which we call the adjoint of ψ. As you can see from (2.3), the adjoint of a column vector ψ ∈ ℂⁿ is the row vector given by its conjugate-transpose:

\psi = \begin{pmatrix} \psi^0 \\ \psi^1 \\ \vdots \\ \psi^{n-1} \end{pmatrix} \quad\mapsto\quad \psi^\dagger = \begin{pmatrix} \overline{\psi^0} & \overline{\psi^1} & \cdots & \overline{\psi^{n-1}} \end{pmatrix}

If we have a linear map A : H → K between two (possibly) different Hilbert spaces H and K, the adjoint A† : K → H of A is the unique linear map satisfying the equation ⟨A†ψ|ϕ⟩ = ⟨ψ|Aϕ⟩ for all ϕ ∈ H, ψ ∈ K. This definition takes a bit of head-scratching at first, but when H = ℂᵐ and K = ℂⁿ, this again just amounts to the conjugate-transpose of A:

A = \begin{pmatrix} a_0^0 & a_1^0 & \cdots & a_{m-1}^0 \\ a_0^1 & a_1^1 & \cdots & a_{m-1}^1 \\ \vdots & \vdots & & \vdots \\ a_0^{n-1} & a_1^{n-1} & \cdots & a_{m-1}^{n-1} \end{pmatrix} \quad\mapsto\quad A^\dagger = \begin{pmatrix} \overline{a_0^0} & \overline{a_0^1} & \cdots & \overline{a_0^{n-1}} \\ \overline{a_1^0} & \overline{a_1^1} & \cdots & \overline{a_1^{n-1}} \\ \vdots & \vdots & & \vdots \\ \overline{a_{m-1}^0} & \overline{a_{m-1}^1} & \cdots & \overline{a_{m-1}^{n-1}} \end{pmatrix}

Like with matrix transposes, one can show that adjoints satisfy (A†)† = A and (AB)† = B†A†. Inner products and adjoints are so ubiquitous in quantum theory that we typically build them right into the way we write vectors down. For this, we introduce Dirac bra-ket notation. Rather than writing a vector as its usual naked self, we’ll wrap it up into a symbol called a ket:

ψ ↦ |ψ⟩

If we want to write the adjoint of |ψ⟩, we flip the ket into a bra:

ψ† ↦ ⟨ψ|

Now, the ket |ψ⟩ is still secretly just ψ, a vector in the Hilbert space H. However, the bra ⟨ψ| is an element of the dual space H*. That is, it is a linear map from H to the complex numbers. In particular, we can try plugging a ket into it, and some magic happens: we get a bra-ket, which is just the inner product again.

⟨ψ| |ϕ⟩ := ψ†ϕ = ⟨ψ|ϕ⟩

Breaking the two parts of an inner product apart gives us some extra flexibility. For example, if we have some linear map M : H → H that we want to sandwich in the middle of the inner product, we can do that: ⟨ψ|M|ϕ⟩. This also leads us to more naturally think of bras and kets themselves as processes, which are dual to one another. We can think of |ϕ⟩ as a process which starts with nothing and gives us something, namely the vector ϕ ∈ H:

Conversely, we can think of ⟨ψ| as a process which ‘swallows’ a vector and leaves nothing (well, actually nothing but a scalar):

The pictures to the right of ‘↦’ are in string diagram notation. This gives us a way to visualise compositions of linear maps as boxes (or other kinds of shapes), connected by wires. We’ve already seen how to draw bras and kets above. The only thing missing at this point (at least until we meet tensor products in section 2.1.4) is linear maps M : H → K between two arbitrary Hilbert spaces. These are pictured as a box with an input wire labelled H and an output wire labelled K:

We visualise composition by plugging wires together. For example:

There are a couple of things to note about these two different notations. First, the string diagrams give slightly more information, since the wires can be labelled to tell us which Hilbert spaces the maps are defined on. We can drop these labels when it is clear from context, but sometimes it’s handy to get an extra visual cue for which maps can be plugged together, and which can’t. Second, the Dirac notation is being read from right-to-left, whereas the pictures are being read from left-to-right. This is an artefact of the usual ‘backwards’ way composition is defined symbolically. For example, function composition g ∘ f(x) := g(f(x)) means first apply f to an element x, then apply g to the result. In other words: g ∘ f means ‘g AFTER f’. Similarly, with composition of linear maps (and hence matrix multiplication), BA|ϕ⟩ means first apply A to |ϕ⟩ then apply B to the result, i.e. BA means ‘B AFTER A’. This is not such a big deal, as long as we remember to flip the order of maps around when we write them in Dirac notation. What is more of a big deal, as we’ll see in section 2.1.4, is that most quantum computations really want to be done in 2D. Dirac notation is only really handy for working with 1D chains of matrix multiplications like the ones above, and it gets a bit clunky when we start mixing (sequential) composition of linear maps with (parallel) tensor products. On the other hand, these more general kinds of composition are what string diagrams are meant for, so this is where they really start to shine.

Remark 2.1.4. In some circles, it is popular to symbolically write compositions in ‘diagram order’, using a notation like f;g to mean g ∘ f. We won’t use this notation in the book, preferring to jump to full-fledged diagrams instead.

Another advantage of writing vectors as kets is it gives us a compact way of writing the standard basis. Namely, we write the standard basis for ℂⁿ as {|i⟩ | 0 ≤ i < n}, i.e. simply as numbers in a ket, ranging from 0 to n − 1:

|0\rangle := \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \qquad |1\rangle := \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} \qquad\cdots\qquad |n-1\rangle := \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}

In particular, the two-dimensional Hilbert space ℂ² has standard basis elements written as follows:

|0\rangle := \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad |1\rangle := \begin{pmatrix} 0 \\ 1 \end{pmatrix}

As we’ll see in Section 2.2, ℂ² represents quantum bits, or qubits. In ℂ², the basis vectors |0⟩ and |1⟩ play the role of the classical bits 0 and 1. Part of the power of quantum computing is that we can access not just these classical bit states, but also many more, coming from taking arbitrary linear combinations of them.
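All of this notation boils down to ordinary matrix arithmetic. Here is a minimal numpy sketch of kets, bras, and inner products (our own illustration, not part of the text’s formal development):

import numpy as np

ket0 = np.array([[1], [0]], dtype=complex)   # |0>
ket1 = np.array([[0], [1]], dtype=complex)   # |1>

psi = (ket0 + 1j * ket1) / np.sqrt(2)        # |psi> = (|0> + i|1>)/sqrt(2)
bra_psi = psi.conj().T                       # the bra <psi| is the conjugate-transpose

print(bra_psi @ psi)    # <psi|psi> = 1: psi is normalised
print(bra_psi @ ket1)   # <psi|1> = -i/sqrt(2): scalars pop out conjugated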

2.1.3 Types of linear maps

The adjoint operation (−)† lets us define a handful of special kinds of linear maps that crop up in quantum theory. The first is isometries, where the adjoint acts as a one-sided inverse.

Definition 2.1.5. A linear map U : H → K is called an isometry if U†U = I.

Next, we have a whole slew of maps from a Hilbert space to itself.

Definition 2.1.6. A linear map M : H → H is called:

- unitary if M†M = I = MM†,
- self-adjoint if M† = M,
- positive if M = N†N for some linear map N : H → H,
- a projector if M† = M and M² = M,
- normal if M†M = MM†.

There are clearly some containments here: projectors are always positive (since M = MM = M†M), and positive maps are always self-adjoint (since M† = (N†N)† = N†N = M). Finally, everything in Definition 2.1.6 is normal. Unitaries are normal because M†M = I = MM†. Self-adjoint maps (and hence also positive maps and projectors) are normal because M†M = M² = MM†. While this is a bit of a definition dump, we’ll treat the most important kinds of maps from Definitions 2.1.5 and 2.1.6 – as well as their significance to quantum theory – in their own sections later in this chapter. A nice feature of normal maps (and hence all the maps in Definition 2.1.6) is that they can always be diagonalised. That is, we can find an ONB B = {|ϕ_j⟩}_j such that for all j, M|ϕ_j⟩ = λ_j|ϕ_j⟩. The scalars λ_j and vectors |ϕ_j⟩ are called eigenvalues and eigenvectors, respectively, whereas B is called an eigenbasis. Bra-ket notation (and hence string diagram notation) gives us a convenient way to write maps in diagonal form.

A special case is the identity map, which diagonalises with respect to any ONB, with eigenvalues all equal to 1:

We call such a sum over an ONB a resolution of the identity by the ONB {|ϕ_i⟩}_i. As it turns out, all of the particular kinds of normal maps can be characterised by the types of eigenvalues they have.

Exercise 2.4. Let M : H → H be a linear map such that M†M = MM†. Then M = Σ_j λ_j |ϕ_j⟩⟨ϕ_j| for some sets {λ_j}_j, {|ϕ_j⟩}_j. Show that:

a) M is self-adjoint iff all λ_j ∈ ℝ

b) M is positive iff all λ_j ∈ ℝ₊

c) M is a projector iff all λ_j ∈ {0,1}

d) M is a unitary iff all λ_j ∈ U(1)

where ℝ₊ is the set of all real numbers ≥ 0 and U(1) := {e^{iα} | α ∈ [0,2π)} is the set of all phases.

One thing you might notice is what happens when a map is unitary and self-adjoint: its eigenvalues are all in U(1) ∩ ℝ = {1, −1}. This is indeed what happens with the Pauli maps, which, as we’ll see later, have many nice properties.
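Exercise 2.4 is also easy to probe numerically: diagonalise a normal map and inspect its eigenvalues. A small sketch, using the Pauli Z matrix (defined properly in Section 2.3.2) as the running example:

import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)   # unitary AND self-adjoint

# Z is normal: it commutes with its adjoint
assert np.allclose(Z @ Z.conj().T, Z.conj().T @ Z)

# eigh diagonalises a self-adjoint matrix: Z = sum_j lam_j |phi_j><phi_j|
lam, phi = np.linalg.eigh(Z)
print(lam)   # [-1.  1.] -- real (self-adjoint) and in U(1) (unitary)

# the eigenbasis gives a resolution of the identity
P = sum(np.outer(phi[:, j], phi[:, j].conj()) for j in range(2))
assert np.allclose(P, np.eye(2))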

2.1.4 Tensors and the tensor product

Quantum computing relies crucially on multiple systems interacting with each other. To build up to how we should represent multiple systems in quantum theory, we can first see what happens if we bring two classical systems together. The simplest classical system that has more than one state is a bit, which we can represent as the set 𝔹 := {0,1}. A single bit has two possible states: 0 and 1. A pair of bits has four possible states: 00, 01, 10, and 11. In other words, the system describing two bits is the cartesian product of two one-bit systems:

𝔹 × 𝔹 = {(0,0),(0,1),(1,0),(1,1)}

Since the role of classical bit values 0 and 1 in the qubit system ℂ² is played by the basis vectors |0⟩ and |1⟩, it stands to reason that the system of two qubits should have four basis vectors |00⟩, |01⟩, |10⟩, |11⟩, labelled by the four possible bit strings. This is exactly what taking the tensor product does for us. There are lots of ways to define the tensor product of two vector spaces. For the sake of simplicity, we’ll adopt a fairly brutal definition here, which makes explicit use of some fixed bases for a pair of Hilbert spaces.

Definition 2.1.7. For an m-dimensional Hilbert space H and an n-dimensional Hilbert space K with fixed ONBs {|i⟩ | 0 ≤ i < m} ⊆ H and {|j⟩ | 0 ≤ j < n} ⊆ K, the tensor product H ⊗ K is the mn-dimensional Hilbert space consisting of all linear combinations of states in the following product basis:

{|i⟩ ⊗ |j⟩ | 0 ≤ i < m, 0 ≤ j < n}

The inner product on this tensor product is fully determined by requiring {|i⟩ ⊗ |j⟩}_{i,j} to form an ONB. That is, we have: ⟨|i⟩ ⊗ |j⟩, |k⟩ ⊗ |l⟩⟩ = δ_{ik}δ_{jl}.

We call this a ‘brutal’ definition, because the tensor product doesn’t really depend on a choice of basis, but choosing a basis will make it easier for us to see concretely what’s going on. Note that the symbol ⊗ used in defining the product basis doesn’t really mean anything in its own right. The important thing is just that we have one basis vector for each pair of basis vectors from H and K. We could just as well have written basis elements as |ij⟩. We will indeed do this sometimes, e.g. we will write the basis vectors of the four-dimensional space ℂ² ⊗ ℂ² as:

|00⟩ := |0⟩ ⊗ |0⟩    |01⟩ := |0⟩ ⊗ |1⟩    |10⟩ := |1⟩ ⊗ |0⟩    |11⟩ := |1⟩ ⊗ |1⟩

However, this notation is suggestive of how we want to send a pair of states |ψ⟩ ∈ H and |ϕ⟩ ∈ K into the bigger tensor product space H ⊗ K. Suppose these states decompose in our chosen bases as follows:

|ψ⟩ := Σ_{i=0}^{m−1} ψ^i |i⟩        |ϕ⟩ := Σ_{j=0}^{n−1} ϕ^j |j⟩

Then, we’ll define the product state as follows, just by pulling the sums and scalars out:

|ψ⟩ ⊗ |ϕ⟩ = (Σ_i ψ^i |i⟩) ⊗ (Σ_j ϕ^j |j⟩) := Σ_{ij} ψ^i ϕ^j |i⟩ ⊗ |j⟩

This looks a bit abstract, but if we instantiate this to the case of qubit states, it becomes pretty obvious what is going on. We just ‘multiply things out’:

(ψ^0|0⟩ + ψ^1|1⟩) ⊗ (ϕ^0|0⟩ + ϕ^1|1⟩) = ψ^0ϕ^0|00⟩ + ψ^0ϕ^1|01⟩ + ψ^1ϕ^0|10⟩ + ψ^1ϕ^1|11⟩

If we write things out as column vectors, it should be even more clear what’s going on:

\begin{pmatrix} \psi^0 \\ \psi^1 \end{pmatrix} \otimes \begin{pmatrix} \phi^0 \\ \phi^1 \end{pmatrix} = \begin{pmatrix} \psi^0\phi^0 \\ \psi^0\phi^1 \\ \psi^1\phi^0 \\ \psi^1\phi^1 \end{pmatrix}
(2.4)

Note that, by writing |ψ⟩ ⊗ |ϕ⟩ as a four-dimensional column vector, we are implicitly treating the four basis states as the standard basis for ℂ⁴, just by counting them base-2:

{|0⟩ := |00⟩, |1⟩ := |01⟩, |2⟩ := |10⟩, |3⟩ := |11⟩}

This is a general phenomenon. When we take the tensor product of Hilbert spaces, the dimensions multiply. Hence, we can treat a basis vector |i,j⟩ ∈ ℂᵐ ⊗ ℂⁿ as the basis vector |in + j⟩ ∈ ℂ^{mn}. So, taking the tensor product of ℂᵐ with ℂⁿ just multiplies the dimensions. Since every finite-dimensional Hilbert space is isomorphic to ℂⁿ for some n, it follows that the tensor product is associative and has a unit given by the one-dimensional Hilbert space ℂ = ℂ¹:

(H ⊗ K) ⊗ L ≅ H ⊗ (K ⊗ L)        H ⊗ ℂ ≅ H ≅ ℂ ⊗ H

The symbol ‘≅’ here means ‘is isomorphic to’. Because of this associativity we are justified in dropping the brackets when writing tensor products of many different spaces, e.g. H₁ ⊗ ⋯ ⊗ H_k. The equation (2.4) is actually a special case of a more general kind of operation we can apply to matrices of any size.

Definition 2.1.8. The Kronecker product of an n × m matrix A with an n′ × m′ matrix B is a new nn′ × mm′ matrix A ⊗ B whose entries are all possible products of the entries of A and B, i.e.

(A ⊗ B)_{im′+j}^{kn′+l} := a_i^k b_j^l

While this definition might be a bit of a head-scratcher the first time you see it, it’s pretty easy to think about in terms of block matrices. To form the matrix A ⊗ B, we form a big matrix which has a block consisting of the matrix a_i^j B for every entry a_i^j of A. That is, for matrices:

A := \begin{pmatrix} a_0^0 & \cdots & a_{m-1}^0 \\ \vdots & & \vdots \\ a_0^{n-1} & \cdots & a_{m-1}^{n-1} \end{pmatrix} \quad\text{and}\quad B := \begin{pmatrix} b_0^0 & \cdots & b_{m'-1}^0 \\ \vdots & & \vdots \\ b_0^{n'-1} & \cdots & b_{m'-1}^{n'-1} \end{pmatrix}

the Kronecker product is the nn′ × mm′ matrix:

A ⊗ B := \begin{pmatrix} a_0^0 B & \cdots & a_{m-1}^0 B \\ \vdots & & \vdots \\ a_0^{n-1} B & \cdots & a_{m-1}^{n-1} B \end{pmatrix}

This applies to any dimensions of matrices, not just matrices with the same numbers of rows or columns. For example, the Kronecker product of the 2 × 1 matrix of a state |ψ⟩ and the 2 × 2 matrix of a map A is computed as follows:

|\psi\rangle \otimes A = \begin{pmatrix} \psi^0 \\ \psi^1 \end{pmatrix} \otimes \begin{pmatrix} a_0^0 & a_1^0 \\ a_0^1 & a_1^1 \end{pmatrix} = \begin{pmatrix} \psi^0 A \\ \psi^1 A \end{pmatrix} = \begin{pmatrix} \psi^0 a_0^0 & \psi^0 a_1^0 \\ \psi^0 a_0^1 & \psi^0 a_1^1 \\ \psi^1 a_0^0 & \psi^1 a_1^0 \\ \psi^1 a_0^1 & \psi^1 a_1^1 \end{pmatrix}

The Kronecker product is furthermore non-commutative: B ⊗ A ≠ A ⊗ B in general. The matrix of B ⊗ A will be the same size and contain the same entries as A ⊗ B, but those entries will be in different places. For example:

A \otimes |\psi\rangle = \begin{pmatrix} a_0^0 & a_1^0 \\ a_0^1 & a_1^1 \end{pmatrix} \otimes \begin{pmatrix} \psi^0 \\ \psi^1 \end{pmatrix} = \begin{pmatrix} a_0^0 |\psi\rangle & a_1^0 |\psi\rangle \\ a_0^1 |\psi\rangle & a_1^1 |\psi\rangle \end{pmatrix} = \begin{pmatrix} a_0^0 \psi^0 & a_1^0 \psi^0 \\ a_0^0 \psi^1 & a_1^0 \psi^1 \\ a_0^1 \psi^0 & a_1^1 \psi^0 \\ a_0^1 \psi^1 & a_1^1 \psi^1 \end{pmatrix}
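numpy’s kron implements exactly the Kronecker product of Definition 2.1.8, which makes the non-commutativity easy to see for yourself (a quick sketch):

import numpy as np

psi = np.array([[1], [2]])        # a (not normalised) 2 x 1 column vector
A = np.array([[0, 1], [1, 0]])    # a 2 x 2 matrix

print(np.kron(psi, A))   # 4 x 2 matrix: the block psi0*A stacked over psi1*A
print(np.kron(A, psi))   # same size, same entries, different places
assert not np.allclose(np.kron(psi, A), np.kron(A, psi))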

Often, rather than smooshing the upper and lower indices together into a single index, as we did in the definition of Kronecker product, we can keep them separate and simply allow ‘generalised matrices’ that have zero or more upper and lower indices:

Note, rather than having a fixed number of ‘rows’ and ‘columns’, we have fixed input dimensions D_0, …, D_{m−1} and output dimensions D^0, …, D^{n−1}. These generalised matrices are called tensors. In fact, we’ve already met some tensors: kets are represented by tensors with zero inputs and one output:

bras by tensors with one input and zero outputs:

and plain old numbers are tensors with no inputs and no outputs:

We can relate tensors to bra-ket notation by taking a big sum over all the basis elements, with the tensor elements as coefficients:

T = Σ_{j_0…j_{n−1}} Σ_{i_0…i_{m−1}} t_{i_0…i_{m−1}}^{j_0…j_{n−1}} |j_0…j_{n−1}⟩⟨i_0…i_{m−1}|

A tensor with two indices of dimensions n and n′ has the same data as one with a single index of dimension nn′, just packaged up in a different way. With that in mind, we can simplify the definition of the Kronecker product to give us a very similar operation, which is often just called the tensor product:

(A ⊗ B)_{i,j}^{k,l} := a_i^k b_j^l

This readily generalises to tensors with lots of indices:

(S ⊗ T)_{i_0…,j_0…}^{k_0…,l_0…} := s_{i_0…}^{k_0…} t_{j_0…}^{l_0…}

In either case, we represent the tensor product in string diagrams by ‘stacking boxes’:

We can sequentially compose tensors the same way we would matrices. For matrix composition, we get the components of BA by multiplying components of A and B together, and contracting (i.e. summing over) the output index of A with the input index of B:

For tensors, the story is much the same, except we contract all the upper indices of one with the lower indices of the other:

As the pictures indicate, we should think of BA as a sequential composition and A ⊗ B as a parallel composition. When we start mixing these two together, the magic happens.

Exercise 2.5. We have seen that we can construct some vectors in the tensor product by combining states of the component Hilbert spaces together to form a product state. This raises the question:

Are all the states in H ⊗ K product states?

The answer is no. We know that any product state, e.g. in ℂ² ⊗ ℂ², has the form given in equation (2.4). Find a vector |ψ⟩ in ℂ² ⊗ ℂ² that is not a product state and prove that is the case by showing the following equation is not solvable:

\begin{pmatrix} \psi^0\phi^0 \\ \psi^0\phi^1 \\ \psi^1\phi^0 \\ \psi^1\phi^1 \end{pmatrix} = |\psi\rangle = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}

2.1.5 Sums and diagrams

Since these diagrams describe linear maps, it makes perfect sense to take linear combinations of them: for linear maps f,g and scalars λ,μ, the linear map λf + μg sends a vector v to λf(v) + μg(v). At the level of matrices, this amounts to scalar multiplication and adding matrices together element-wise. In fact, we were already doing this back in Section 2.1.3 when we wrote diagonalisation as:

Conveniently, we can summarise all of the various linearity and bilinearity conditions involving composition and tensor products into one principle:

Sums distribute over string diagrams.

In other words, when a summation occurs somewhere inside a diagram, it can always be pulled to the outside. We can see how this works using the following example. Consider the following quantum state, which is often called the (unnormalised) Bell state: |Φ⟩ := Σ_i |ii⟩.

This state famously satisfies the following yanking equation with its adjoint ⟨Φ|. Shown both in bra-ket notation and as a string diagram, this equation is the following:

(2.5)

Intuitively, we can think of the diagram on the left-hand side as a zig-zag shape that we are “yanking” into a straight piece of wire. We can prove this by substituting the definitions of |Φ⟩ and ⟨Φ| into the diagram and pulling the sums out to the outside:

2.1.6 Tensor networks and string diagrams

What happens when we compose two matrices A,B in sequence and two more matrices C,D in sequence, then compose the results in parallel?

(BA ⊗ DC)_{i,j}^{k,l} = (BA)_i^k (DC)_j^l = (Σ_x a_i^x b_x^k)(Σ_y c_j^y d_y^l) = Σ_{xy} a_i^x b_x^k c_j^y d_y^l

How about when we do it the other way around?

[(B ⊗ D)(A ⊗ C)]_{i,j}^{k,l} = Σ_{xy} (A ⊗ C)_{i,j}^{x,y} (B ⊗ D)_{x,y}^{k,l} = Σ_{xy} a_i^x c_j^y b_x^k d_y^l = Σ_{xy} a_i^x b_x^k c_j^y d_y^l

We get the same thing! The resulting equation is called the interchange law:

BA ⊗ DC = (B ⊗ D)(A ⊗ C)
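The interchange law is a one-liner to verify numerically with random matrices (a sketch):

import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((2, 2)) for _ in range(4))

# BA (x) DC = (B (x) D)(A (x) C)
lhs = np.kron(B @ A, D @ C)
rhs = np.kron(B, D) @ np.kron(A, C)
assert np.allclose(lhs, rhs)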

Let’s see what happens if we do the same thing in the string diagram notation. Doing the composition one way around, we plug A and B together in sequence, plug C and D together in sequence, then stack the results in parallel:

The other way around, we stack A and C in parallel, then stack B and D in parallel, then plug the results together in sequence:

We see that the interchange law becomes completely trivial in the language of string diagrams:

This is a good thing! The interchange law isn’t capturing something fundamental about the processes we are studying, but is instead just a bit of unavoidable bureaucracy we have to deal with to fit sequential and parallel composition all in one line. The more our notation can swallow this bureaucracy and let us focus on what’s really important, the better!

You might have noticed that, while we have already been using string diagrams in this section, we haven’t yet defined them in generality. For our purposes, string diagrams are a graphical notation for a general kind of composition of tensors called a tensor network. A tensor network is a particular way of composing tensors, where the elements of each of the component tensors are multiplied together and some pairs of indices are matched and summed over. We can represent this as a string diagram by depicting each of the component tensors as boxes and indices as wires. Wires that are open at one end represent free indices (i.e. the indices corresponding to inputs/outputs of the tensor), whereas wires that are connected at both ends correspond to bound indices. The latter are matched pairs of indices that get summed over. This might sound a little abstract, but if we see an example, it should be pretty clear what is going on. Suppose we have the following string diagram describing a big linear map:

(2.6)

We can (arbitrarily) assign index names to each of the wires:

Then, this diagram describes the following tensor network:

Φ_{abc}^{de} = Σ_{xyz} f_{ab}^{xd} g_{xy}^{ez} h_c^{zy}
(2.7)

We will typically work purely with string diagrams and omit the indices, which are only relevant inasmuch as they uniquely identify each wire in the diagram. One thing to notice is that wires can freely cross over each other and even form feedback loops. Also, if we deform the string diagram and draw it differently on the page, it will still describe the same calculation (2.7). For example:

In other words, the only relevant data is which inputs and outputs are connected to which others. This fact can be summarised in the string diagram mantra:

Only connectivity matters!
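numpy’s einsum evaluates tensor networks directly from this kind of index notation: the bound indices are exactly the ones repeated across factors. Here is a sketch of the contraction (2.7), with every wire given dimension 2, random tensors standing in for the boxes f, g, h, and an axis ordering that is our own choice:

import numpy as np

rng = np.random.default_rng(1)
dim = 2                                         # every wire has dimension 2
f = rng.standard_normal((dim, dim, dim, dim))   # axes: a, b, x, d
g = rng.standard_normal((dim, dim, dim, dim))   # axes: x, y, e, z
h = rng.standard_normal((dim, dim, dim))        # axes: c, z, y

# bound indices x, y, z are summed; free indices a, b, c, d, e survive
Phi = np.einsum('abxd,xyez,czy->abcde', f, g, h)
print(Phi.shape)   # (2, 2, 2, 2, 2)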

The final thing to note is we can also build up arbitrary tensor networks from ∘ and ⊗, provided we add two additional ingredients: swap maps and the partial trace operation. The former is a linear map which interchanges the two tensor factors of a vector in A ⊗ B:

By linearity, it is totally defined by how it acts on product vectors:

SWAP :: |ψ⟩ ⊗ |ϕ⟩ ↦ |ϕ⟩ ⊗ |ψ⟩

Note that the SWAP for two qubits has the following matrix:

\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
(2.8)

The latter operation, the partial trace, sums the last input index together with the last output index of a tensor. That is, for a linear map f : A ⊗ X → B ⊗ X, the partial trace Tr_X(f) : A → B is a linear map whose matrix elements are given by:

Tr_X(f)_i^j = Σ_k f_{i,k}^{j,k}

In the special case where a map has only one input and one output, this is just the normal trace, i.e. summing over the diagonal elements of a matrix: for g : X → X we have Tr(g) = Σ_k g_k^k. Graphically, we depict the partial trace as introducing a feedback loop:

Exercise 2.6. Write an expression for (2.6) using ∘, ⊗, SWAP, and Tr(−). Show that it evaluates to the same thing as (2.7).
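If you want to check your answer to Exercise 2.6 numerically, the partial trace is again a one-line einsum. A sketch, assuming a map f on A ⊗ X is given as a matrix that we reshape into a 4-index tensor:

import numpy as np

dA, dX = 2, 3
rng = np.random.default_rng(2)

f = rng.standard_normal((dA * dX, dA * dX))   # f : A (x) X -> A (x) X
f4 = f.reshape(dA, dX, dA, dX)                # axes: (out A, out X, in A, in X)

# Tr_X(f): contract the X output against the X input
trX = np.einsum('ikjk->ij', f4)               # shape (dA, dA)

# sanity check: the full trace equals Tr(Tr_X(f))
assert np.isclose(np.trace(trX), np.trace(f))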

2.1.7 Cups and caps

If this is your first time seeing this notation for the partial trace, it might make you... a little uncomfortable. Don’t these feedback loops introduce all kinds of nasty complications, like grandfather paradoxes and other problems to do with time travel? One way to see that it is in fact all fine is to remember that the string diagrams are just notation for particular linear maps, and a wire denotes an index you contract over, where a feedback loop simply means we are contracting an input and an output ‘in the wrong order’. So you can just ‘shut up and calculate’ and treat the loops as a handy piece of notation. But we can also go one step further and decompose the loop into a couple of smaller pieces, which correspond to some special types of tensors. We have in fact already seen these pieces: they are the Bell state |Φ⟩ and effect ⟨Φ|. We introduce some special notation for these tensors:

(2.9)

The reason this notation works is because it makes the yanking equation of Eq. (2.5) particularly nice:

(2.10)

Here the second equation states the state is symmetric in its two outputs. In the context of string diagrams we call this special state the cup and the effect the cap. Note that when we consider the cup and cap of qubits, their matrices are:

\begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} \qquad\qquad \begin{pmatrix} 1 & 0 & 0 & 1 \end{pmatrix}
(2.11)

We can generalise this to a matrix representation for a cup and cap over any dimension. We can also write the cup and cap using indices: ∪^{ij} = δ^{ij} and ∩_{ij} = δ_{ij}, where δ is the standard Kronecker delta. In this notation it becomes especially clear that the cup and cap are like ‘bent identities’, since id_i^j = δ_i^j. Using the cup and cap we can view a trace not as a feedback loop, but as a composition of a cup, a cap, and an identity wire connecting the two of them:

We are then left with a string diagram where every input is connected to an output, and not vice versa.
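The yanking equation can also be checked directly with matrices: composing a cup and a cap in the zig-zag pattern gives back a plain identity wire. A sketch for qubits, taking the cup to be the column vector of |00⟩ + |11⟩ as in (2.11):

import numpy as np

I2 = np.eye(2)
cup = np.array([[1], [0], [0], [1]], dtype=complex)   # |Phi> = |00> + |11>
cap = cup.conj().T                                    # <Phi|

# (cap (x) I)(I (x) cup) = I: the zig-zag yanks to a straight wire
zigzag = np.kron(cap, I2) @ np.kron(I2, cup)
assert np.allclose(zigzag, I2)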

2.2 A bluffer’s intro to quantum theory

Okay, so now we have all the mathematical apparatus to describe quantum theory. But what is quantum theory actually? You will have almost certainly heard of quantum mechanics, and you may have heard of a few things starting with the word ‘quantum’, like quantum optics, quantum electrodynamics, quantum chromodynamics, and the standard model (okay, that last one doesn’t start with ‘quantum’, but it’s still relevant here). These are all theories of physics, which cover the behaviours of certain kinds of things that exist in the world (electrons, photons, quarks, the Higgs boson, ...). They are all underpinned by quantum theory, which can be seen more as a framework to build theories of physics on top of, rather than a full-fledged theory of physics in its own right. All the theories of physics we might build based on quantum theory have some characteristics in common, which can pretty much be summed up in the following 4 statements, which we’ll call the SCUM postulates:

1. States: the state of a quantum system is given by a normalised vector |ψ⟩ in a Hilbert space, up to a global phase.

2. Composition: the Hilbert space of a composite system is the tensor product of the Hilbert spaces of its parts.

3. Unitaries: the evolution of an (isolated) quantum system is described by a unitary map.

4. Measurements: a measurement is described by a set of projectors summing to the identity, with the probability of each outcome given by the Born rule.

These 4 postulates are a (loose) paraphrase of the Dirac-von Neumann axioms of quantum theory, where we’ve put a bit more emphasis on the bits that are most relevant for quantum computing. Everything in bold above will get defined soon enough. The point here is to show that there really isn’t very much at the core of quantum theory. Quantum theory is in fact very simple, and yet it somehow happens to be very good at making predictions about a whole lot of different kinds of physical systems. However, if you have encountered other theories of physics which can be derived from simple postulates (and one in particular about space and time proposed by a dude with crazy hair), you’ll notice something a little strange about the von Neumann postulates: they are all about maths and not at all about stuff. This is pretty odd for a theory of physics, because of course physics is fundamentally about stuff, and mathematics is just the tool. This has led to almost a century of dissatisfaction about the ‘fundamental’ nature of quantum theory, and led to a lot of interesting arguments, many books, and a whole field called quantum foundations, which asks not just what quantum theory is, but why it is the way it is. That is to say, if after reading the four postulates and learning what all the words in bold mean, you are still scratching your head about what quantum theory really is, you are actually in good company.

2.2.1 Quantum states

There are a handful of equivalent ways we can represent a quantum state. The first is as a normalised vector |ψ⟩ in a Hilbert space H. This representation is the simplest, but also contains a bit of redundancy, because as we will see in Section 2.2.4, quantum theory will make the exact same predictions about a state |ψ⟩ as it will about e^{iα}|ψ⟩ for any angle α. Hence, there is no physical difference between |ψ⟩ and e^{iα}|ψ⟩. We call the scalar factor e^{iα} a global phase, and we say the states |ψ⟩ and e^{iα}|ψ⟩ are equivalent up to global phase. So, technically, a quantum state is not just one vector, but a set of vectors {e^{iα}|ψ⟩ | α ∈ [0,2π)} which are equivalent up to a global phase. But then, this is just all the normalised states in a one-dimensional subspace of H.

Proposition 2.2.1. The set of all normalised states in a one-dimensional subspace S of a Hilbert space H is always of the form {e^{iα}|ψ⟩ | α ∈ [0,2π)} for some normalised state |ψ⟩.

Proof. Every vector in a one-dimensional space is a scalar multiple of some fixed vector, i.e. such spaces are always of the form S := {λ|ψ⟩ | λ ∈ ℂ} for some non-zero vector |ψ⟩. Since S contains every scalar multiple of |ψ⟩, we can also assume without loss of generality that |ψ⟩ is normalised. From this it follows that any state of the form e^{iα}|ψ⟩ is a normalised state in S:

(e^{iα}|ψ⟩)† (e^{iα}|ψ⟩) = (e^{−iα}⟨ψ|)(e^{iα}|ψ⟩) = 1 · ⟨ψ|ψ⟩ = 1

Conversely, if λ|ψ⟩ ∈ S is normalised, we have:

1 = (λ|ψ⟩)†(λ|ψ⟩) = λ̄λ⟨ψ|ψ⟩ = λ̄λ = |λ|²

But, as we saw in Section 2.1.1, |λ|² = 1 if and only if λ is of the form e^{iα} for some α. Hence, all the normalised states in S are of the form e^{iα}|ψ⟩. □

So, the second equivalent way to represent a quantum state is to write down its one-dimensional subspace S. Picking any normalised vector in that space will recover |ψ⟩, up to a global phase, whereas looking at the space spanned by |ψ⟩ will recover S. A third equivalent way to represent a quantum state is by doubling |ψ⟩. Somewhat counter-intuitively, we can make the representation of the state less redundant by putting two copies of |ψ⟩ together. Or, more precisely, by multiplying the ket |ψ⟩ with its adjoint bra ⟨ψ|:

|ψ⟩ ↦ |ψ⟩⟨ψ|

The result is a linear map P_{|ψ⟩} := |ψ⟩⟨ψ| which goes from H to H. So, why is this representation actually less redundant than just taking the ket |ψ⟩? Well, if we think about doubling an equivalent state |ϕ⟩ := e^{iα}|ψ⟩, something magic happens:

|ϕ⟩⟨ϕ| = (e^{iα}|ψ⟩)(e^{iα}|ψ⟩)† = e^{iα}e^{−iα}|ψ⟩⟨ψ| = |ψ⟩⟨ψ|

The global phase disappears! So, what’s going on here? There’s a couple of ways to think about this. The most natural way to see this is that it has something to do with measurement probabilities, and the actual reason we don’t care about phases in the first place. We’ll return to this point in Section 2.2.4, once we know how measurement probabilities are computed. For now, perhaps the best way to think about this is that |ψ⟩⟨ψ| is just another way of representing a one-dimensional subspace S, which we already know is an equivalent way of representing a quantum state. To see that, first recall from Definition 2.1.6 that a projector P is a linear map that is self-adjoint (P† = P) and idempotent (P² = P). Projectors are linear maps that ‘squash’ vectors down into a subspace, called the range of the projector: S := {|ϕ⟩ | ∃|ψ⟩ ∈ H : |ϕ⟩ = P|ψ⟩} ⊆ H. If S is an n-dimensional subspace, we say P is a rank-n projector onto S. Now, it happens to be that P_{|ψ⟩} := |ψ⟩⟨ψ| is a rank-1 projector, whose range is exactly S := {λ|ψ⟩ | λ ∈ ℂ}. If we draw this as:

we can imagine vectors coming in the wire on the left, then getting ‘squashed’ down to nothing (i.e. just a scalar factor), then getting embedded back into H as a scalar times |ψ⟩:

Exercise 2.7. Show |ψ⟩⟨ψ| is a projector with range S := {λ|ψ⟩ | λ ∈ ℂ}. Conversely, show that if P is a projector with a one-dimensional range S, then P = |ψ⟩⟨ψ| for some normalised |ψ⟩ ∈ S.
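Both facts are easy to sanity-check numerically before proving them (a sketch):

import numpy as np

psi = np.array([[1], [1j]]) / np.sqrt(2)    # a normalised state
P = psi @ psi.conj().T                      # the doubled state |psi><psi|

assert np.allclose(P @ P, P)                # idempotent
assert np.allclose(P.conj().T, P)           # self-adjoint

phi = np.exp(1j * 0.4) * psi                # same state, new global phase
assert np.allclose(phi @ phi.conj().T, P)   # the global phase disappears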

So, that completes our three equivalent pictures of the quantum state. A quantum state is:

1. a normalised vector |ψ⟩ ∈ H, up to a redundant global phase e^{iα},

2. a one-dimensional subspace S ⊆ H, or

3. a rank-1 projector |ψ⟩⟨ψ|.

Of course, this really just tells us how a quantum state is represented mathematically, which is enough for our purposes. What the quantum state is actually representing is a whole other question, which has been debated for more than a century. We’ll make some remarks about this and point to some further reading in Section 2.8.

Remark 2.2.2. Note that the quantum states we introduced in this section are sometimes called pure quantum states, to distinguish them from more general mixed quantum states. The latter allow us to represent having only partial information or access to a quantum system. For the most part, we won’t need this extra generality for the topics covered in this book, so we’ll focus purely on pure states for now.

2.2.2 Qubits and the Bloch sphere

We now turn our attention to the most interesting system from the point of view of quantum computation: the quantum bit, or qubit. We’ll see in the next section that the one-dimensional Hilbert space ℂ¹ = ℂ is the trivial, ‘no system’ system. So, to get something non-trivial, we should go one dimension up, to ℂ². Hence, a qubit is a system whose state lives in the Hilbert space ℂ². Physically, qubits can be implemented with any physical system that has at least 2 states that can be perfectly distinguished by a single quantum measurement (which we’ll cover in Section 2.2.4). As the name suggests, a qubit is the quantum analogue of a bit. As such, we’ll give the two standard basis elements some suggestive names:

|0\rangle := \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad |1\rangle := \begin{pmatrix} 0 \\ 1 \end{pmatrix}

Graphically, we will depict these states as follows:

These are the quantum analogues of the two possible states 0 and 1 that a classical bit could take. In Section 2.3, we’ll see how this embedding of classical bits into two-dimensional vector spaces can be used to lift classical logic gates on bits to quantum logic on qubits. Because of this connection to classical computation, the basis {|0⟩, |1⟩} is often called the computational basis. We will occasionally use this term, but will instead mostly call it the Z-basis, for reasons that will shortly become clear. We know from the last section that quantum states in ℂ² correspond to normalised vectors |ψ⟩ ∈ ℂ², up to a global phase. We’ll now use this fact to find a convenient way to parametrise a generic state |ψ⟩ := a|0⟩ + b|1⟩ in ℂ². First note that normalisation puts a restriction on the values a and b can take:

1 = ⟨ψ|ψ⟩ = āa + b̄b = |a|² + |b|².

Since |a|² + |b|² = 1, we can draw the following triangle with 1 on the hypotenuse:

From the sine and cosine laws, we can always express |a| and |b| in terms of a single angle θ: |a| = cos θ and |b| = sin θ. As a matter of convention, it’s slightly more convenient to use θ/2 instead of θ, so let |a| = cos(θ/2) and |b| = sin(θ/2). As we saw in Section 2.1.1, the absolute value of a complex number λ = re^{iα} fixes r. Hence, we can conclude that a = cos(θ/2)e^{iβ} and b = sin(θ/2)e^{iγ}, for some phase angles β, γ. So, using just normalisation, we can replace 2 complex numbers with 3 angles:

|ψ⟩ = cos(θ/2) e^{iβ}|0⟩ + sin(θ/2) e^{iγ}|1⟩

Now we can use the freedom to choose a global phase to get rid of another one of these angles. Since we can freely multiply |ψ⟩ by a global phase without changing the physical state, the angle β is actually redundant. We can cancel it out by multiplying the whole thing by e^{−iβ}:

e^{−iβ}(cos(θ/2) e^{iβ}|0⟩ + sin(θ/2) e^{iγ}|1⟩) = cos(θ/2)|0⟩ + sin(θ/2) e^{i(γ−β)}|1⟩

Letting α := γ − β, we can therefore write a generic qubit state |ϕ⟩ conveniently as:

|ϕ⟩ := cos(θ/2)|0⟩ + sin(θ/2) e^{iα}|1⟩

Since the quantum state is now totally described by two angles, we can plot it on a sphere, called the Bloch sphere:

This picture is useful for our intuition. For example, the more ‘similar’ two states are, that is, the higher the absolute value of their inner product, the closer they are on the Bloch sphere. In particular, antipodes are always orthogonal states.
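For concreteness, here is a sketch of how one might recover the Bloch angles (θ, α) from a state vector; the function name and conventions are ours, not a standard API:

import numpy as np

def bloch_angles(state):
    """Return (theta, alpha) with state ~ cos(theta/2)|0> + e^{i alpha} sin(theta/2)|1>."""
    a, b = state.flatten()
    theta = 2 * np.arctan2(abs(b), abs(a))   # |a| = cos(theta/2), |b| = sin(theta/2)
    alpha = np.angle(b) - np.angle(a)        # relative phase; the global phase drops out
    return theta, alpha

theta, alpha = bloch_angles(np.array([1, 1j]) / np.sqrt(2))
print(theta, alpha)   # pi/2, pi/2 -- the |+i> state, sitting on the Y axis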

Exercise 2.8. Show that, for the following antipodal states on the Bloch sphere:

|ϕ₀⟩ = cos(θ/2)|0⟩ + sin(θ/2) e^{iα}|1⟩
|ϕ₁⟩ = cos((θ+π)/2)|0⟩ + sin((θ+π)/2) e^{iα}|1⟩

we have ⟨ϕ₀|ϕ₁⟩ = 0.

For a qubit, picking an orthonormal basis just comes down to picking a pair of orthogonal normalised states, hence, we can always think of orthonormal bases as different axes cutting through the Bloch sphere.

Exercise 2.9. The standard basis |0⟩, |1⟩ corresponds to the Z axis of the Bloch sphere. Show that the following states

|+⟩ := (1/√2)(|0⟩ + |1⟩)    |−⟩ := (1/√2)(|0⟩ − |1⟩)    |+i⟩ := (1/√2)(|0⟩ + i|1⟩)    |−i⟩ := (1/√2)(|0⟩ − i|1⟩)

correspond to the X and Y axes, respectively. That is, show they are located on the Bloch sphere as follows:

The three pairs of antipodal points each correspond to an orthonormal basis for ℂ²:

X-basis := {|+⟩, |−⟩}    Y-basis := {|+i⟩, |−i⟩}    Z-basis := {|0⟩, |1⟩}

These bases mark out the X, Y, and Z axes of the Bloch sphere. We’ll see in Section 2.3.2 that they are also eigenvectors of the Pauli X, Y, and Z operations, respectively. Let’s now take a look at what unitaries do to qubit quantum states visually. A paradigmatic example is the Z phase gate:

Z(α) = |0⟩⟨0| + e^{iα}|1⟩⟨1|

On the eigenstates |0⟩ and |1⟩, Z(α) doesn’t do anything: or rather, it only changes the state up to a global phase:

Z(α)|0⟩ = |0⟩        Z(α)|1⟩ = e^{iα}|1⟩ ∝ |1⟩

However, on a generic state, we end up multiplying the coefficient of |1⟩ by e^{iα}, which amounts to adding α to its phase:

|ψ⟩ = cos(θ/2)|0⟩ + e^{iβ} sin(θ/2)|1⟩    ↦    Z(α)|ψ⟩ = cos(θ/2)|0⟩ + e^{i(α+β)} sin(θ/2)|1⟩

Since the phase of the coefficient of |1⟩ determines how far around the Z-axis to plot the state, we can conclude that Z(α) amounts to a Z-rotation of the Bloch sphere by an angle of α.

While we said that Z(α) is a paradigmatic example, it is in fact (essentially) the only example of a unitary over ℂ², up to a global phase and a choice of basis. We can see that by taking some generic unitary and diagonalising it:

U = λ₀|ϕ₀⟩⟨ϕ₀| + λ₁|ϕ₁⟩⟨ϕ₁|

Then, as we saw in Exercise 2.4, the eigenvalues of a unitary are always phases, so λ_j = e^{iα_j}, hence up to a global phase, we have:

U ∝ |ϕ₀⟩⟨ϕ₀| + e^{iα}|ϕ₁⟩⟨ϕ₁|
(2.12)

So, U is basically Z(α), written with respect to another ONB {|ϕ₀⟩, |ϕ₁⟩}.

Exercise* 2.10. We have already seen that ONBs correspond to axes on the Bloch sphere. Show that U as defined in (2.12) corresponds to a rotation about the axis defined by {|ϕ₀⟩, |ϕ₁⟩} by an angle of α.

Hence, single-qubit unitaries correspond exactly to rotations of the Bloch sphere. A standard fact about rotations of a sphere is that any rotation can be generated by a sequence of three rotations about two orthogonal axes. The standard choice of orthogonal axes you see crop up in the quantum computing literature is a Z-axis rotation, which we’ve already met, and an X-axis rotation:

X(α) = |+⟩⟨+| + e^{iα}|−⟩⟨−|

Exercise* 2.11. Show that, up to a global phase, we can always write a qubit unitary as a composition of Z and X rotations as follows:

This is called the Euler decomposition of U.

2.2.3 Unitary evolution

We won’t talk much about the Schrödinger equation in this book, but no bluffer’s intro to quantum theory would be complete without showing it at least once. So here it is, the time-dependent Schrödinger equation:

i (d/dt)|ψ(t)⟩ = H(t)|ψ(t)⟩
(2.13)

where H(t) is a self-adjoint operator, called the Hamiltonian, which may depend on the time t (and yes physicists, we are ignoring the constants and units). The actual form this operator takes depends on what sort of physics is actually going on. If we for instance are describing the internals of a molecule, we would have a particular Hamiltonian describing the momentum, the spin, and the attraction or repulsion between all the electrons, protons and neutrons. If instead we are describing a lab situation where we are shining a laser on a trapped ion, we might only care about the terms having to do with the time-dependent laser interaction. This differential equation describes how a quantum state evolves in time. When H doesn’t depend on t, solutions to (2.13) take a nice form in terms of an initial state |ψ(0)⟩ and the matrix exponential e^{−itH}. We’ll talk about matrix exponentials a lot more in Chapter 7, but for now, suffice to say that matrix exponentials behave a lot like plain ole number exponentials when you take their derivatives. In particular, for matrices we have (d/dt)e^{tM} = Me^{tM}, so taking the derivative of e^{−itH}|ψ(0)⟩ with respect to t just pops out a factor of −iH. Hence:

i (d/dt) e^{−itH}|ψ(0)⟩ = (i)(−iH) e^{−itH}|ψ(0)⟩ = H e^{−itH}|ψ(0)⟩

So, e^{−itH}|ψ(0)⟩ gives a solution to the Schrödinger equation, and we can let |ψ(t)⟩ := e^{−itH}|ψ(0)⟩. Another handy thing is, whenever H is self-adjoint, e^{−itH} is unitary (again, we’ll see why in Chapter 7). Hence, U(t) := e^{−itH} gives us the unitary time evolution operator of a quantum system. The reason we don’t talk much about the Schrödinger equation in this book is, for the purposes of quantum computation, it usually suffices to talk about time evolution abstractly, fully in terms of unitaries. That is, if a system is initially in state |ψ⟩ := |ψ(0)⟩, its time evolution for a fixed chunk of time (say t = 1) is described by some unitary map U := U(1). Conversely, there is a powerful theorem that says any unitary map arises as the time evolution of some Hamiltonian in this kind of way. For this reason, it is often convenient to just talk directly about the unitaries and leave it to the physicists to worry about Hamiltonians.
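scipy makes this concrete: for a constant self-adjoint H, scipy.linalg.expm builds the time evolution operator, and we can check it is unitary (a sketch):

import numpy as np
from scipy.linalg import expm

H = np.array([[0, 1], [1, 0]], dtype=complex)   # a self-adjoint Hamiltonian
t = 0.7
U = expm(-1j * t * H)                           # U(t) = e^{-itH}

assert np.allclose(U.conj().T @ U, np.eye(2))   # unitary, since H is self-adjoint

psi0 = np.array([[1], [0]], dtype=complex)      # |psi(0)> = |0>
psi_t = U @ psi0                                # solution |psi(t)> of (2.13)
print(np.abs(psi_t) ** 2)                       # outcome probabilities at time t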

2.2.4 Measurements and the Born rule

Most quantum computations consist of preparing a quantum state, doing some unitary evolution of the state, then measuring the result. The only ingredient we are missing so far is measurement. In order to do a quantum measurement, we need to choose what we want to measure. A measurement is a (non-deterministic) process which extracts some information from a quantum system and usually changes the state of the system when we do it. Mathematically, this is represented as a set of projectors that sum up to the identity:

M = {M₁, …, M_k}        Σ_i M_i = I

The probability of getting the i-th outcome when we measure a state |ψ⟩ with M is computed with the following M-sandwich:

Prob(i | ψ) = ⟨ψ|M_i|ψ⟩
(2.14)

This quantum sandwich is called the Born rule. Note that Σ_i M_i = I implies that the projectors M_i are mutually orthogonal, i.e. M_iM_j = 0 whenever i ≠ j. So, one can think of measuring roughly as checking whether a state |ψ⟩ is “in” one of a collection of k orthogonal subspaces, given by the images of the projectors M₁, …, M_k. The reason for the scare-quotes around “in” is that a given state doesn’t need to be totally inside the image of any of the projectors M_i. But if it is completely inside the i-th subspace, then M_i|ψ⟩ = |ψ⟩ and:

Prob(i | ψ) = ⟨ψ|M_i|ψ⟩ = ⟨ψ|ψ⟩ = 1.

So in that case we get outcome i with certainty. However, in general |ψ⟩ could be a linear combination of vectors in more than one of the orthogonal subspaces given by M. But here’s the funny thing: after we measure a system and get outcome i, the state of the system will always be entirely in the image of M_i. In other words, if it was only partially in the image of M_i before, it has to move! Quantum theory tells us that, if we measure |ψ⟩ with M and get outcome i, the resulting state is given by projecting |ψ⟩ onto the image of M_i and renormalising, i.e.

This rule for updating a state after a measurement is called Lüders’ rule. This idea that measuring something changes it is not so strange if you think about it. Even in the classical world, if we want to know something about a system, we need to interact with it somehow. For instance, if we want to know what colour a ball is, we can shine a light on it, and see which light reflects back. In the process, we barrage the ball with a bunch of photons which interact with the particles in the ball, energizing them and even sloughing off a tiny, tiny bit of paint. The only reason we tend to ignore such things in the classical world is that there are many ways to measure something which essentially don’t change it. For example, you would be hard pressed to figure out if someone shined a light on that ball or not. The story is different in quantum systems. A fundamental property of quantum theory says that the amount of information we can get out of a system is directly proportional to how much we disturb it. So, if we want lots of information, we need lots of disturbance. We can see this with the two extremes of measurement. At one extreme we have ONB measurements. These are perhaps the most common kinds of measurements considered, whose components are the rank-1 projectors given by the ONB states. That is, for an ONB B = {|ϕ_i⟩}_i, we have a measurement M_B = {|ϕ_i⟩⟨ϕ_i|}_i. As we saw in Section 2.1.3, summing over these projectors gives a resolution of the identity:

Σ_i |ϕ_i⟩⟨ϕ_i| = I

So this does indeed define a measurement. This kind of measurement gives us as much information as possible, since each of the subspaces corresponding to the measurement outcomes is one-dimensional. However, it also disturbs the system a great deal, since any state will get projected into one of these 1D subspaces after the measurement. That is, |ψ⟩ will collapse to a basis state |ϕ_i⟩ after the measurement. For the qubit Z ONB measurement {|0⟩⟨0|, |1⟩⟨1|}, this can be visualised as sending a state to the north or south pole of the Bloch sphere, depending on the associated Born rule probabilities:

At the other extreme, we have the trivial measurement I = {I}. This is indeed a collection of projectors which sum up to I, but we only have one outcome: I. Afterwards, the state is “updated” by |ψ⟩ ↦ I|ψ⟩. Since there’s only one outcome, measuring I doesn’t tell us anything, but at least it doesn’t disturb anything either! Between these two extremes, there are lots of examples which can all be seen as some sort of “coarse-graining” of an ONB measurement. Perhaps the most relevant example for us will be measuring just part of a state. For example, the measurement M = {|ϕ_i⟩⟨ϕ_i| ⊗ I}_i on a system A ⊗ B corresponds to performing the basis measurement {|ϕ_i⟩⟨ϕ_i|}_i on just the subsystem A.
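Putting the Born rule and Lüders’ rule together, simulating a projective measurement takes only a few lines. A sketch (the helper function and its names are ours):

import numpy as np

def measure(psi, projectors, rng):
    """Sample an outcome via the Born rule; return (outcome, post-measurement state)."""
    probs = [np.real(psi.conj().T @ M @ psi).item() for M in projectors]
    i = rng.choice(len(projectors), p=probs)
    post = projectors[i] @ psi
    return i, post / np.linalg.norm(post)   # Lueders' rule: project, renormalise

rng = np.random.default_rng(3)
plus = np.array([[1], [1]]) / np.sqrt(2)
Ms = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]   # the Z ONB measurement
print(measure(plus, Ms, rng))   # outcome 0 or 1, each with probability 1/2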

Exercise 2.12. Define the following 3 measurements on a pair of qubits:

Show that the probability of getting outcome i for A then getting outcome j when measuring B on the resulting state is the same as the probability of getting outcome (i,j) for C.

We will often be interested in computing the post-measurement state after measuring just some of the qubits of a quantum state. For example, this plays an essential role in state injection and quantum teleportation in Section 3.3 and is the key ingredient in measurement-based quantum computing, introduced in Chapter 8. For example, consider measuring the first qubit of a 2-qubit quantum state |Ψ⟩ in the computational basis, i.e. performing measurement A from Exercise 2.12. If we get outcome 0, the post-measurement state will be the following, up to renormalisation:

whereas if we get outcome 1, the state will be:

In either case, the state of the second qubit could change, depending on which measurement was performed and the measurement outcome. This is sometimes referred to as the back-action of a quantum measurement. It is worth noting that after a basis measurement is performed, the measured qubits are no longer entangled with the rest of the state, so we will sometimes simply ignore them:

(2.15)

Depending on how we implement a measurement, we may no longer have access to the system after we measure it anyway. For example, if a photon hits a detector and makes a click, then for all intents and purposes, that photon is gone. This kind of measurement is sometimes called a demolition measurement. We can compute the state of the remaining systems after a demolition ONB measurement simply by applying the basis effect to the measured qubit (and renormalising), e.g. sending |Ψ⟩ to (⟨i| ⊗ I)|Ψ⟩ as in (2.15). There are also ways to measure a system without physically destroying it. For example, we may apply a unitary interaction between the system of interest and an ancillary system, and measure the latter. Measurements which leave the measured system in place (albeit changed via a projector) are called non-demolition measurements. This kind of measurement will be especially important for quantum error correction, which we study in Chapter 11. We conclude this section with an alternative way of specifying a quantum measurement. Often in the quantum theory and quantum computing literature, measurements are defined in terms of self-adjoint maps called observables. However, an observable just amounts to packing a bunch of projectors up together into one map. That is, if we fix distinct real numbers r₁, …, r_k, we can define a self-adjoint map M from a measurement M = {M_i}_i as follows:

M = Σ_{i=1}^{k} r_i M_i
(2.16)

Conversely, by diagonalising a self-adjoint map M and gathering up terms in the sum with the same eigenvalue, we can see that any self-adjoint map can be written in the form of (2.16) for some unique set of projectors M_i. That is,

M = Σ_{j=1}^{d} λ_j |ϕ_j⟩⟨ϕ_j| = Σ_{i=1}^{k} r_i M_i

where M_i is the sum over all the projectors |ϕ_j⟩⟨ϕ_j| associated with a (possibly repeated) eigenvalue λ_j = r_i.
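Numerically, passing between the two presentations of a measurement is just an eigendecomposition. A sketch that rebuilds the projectors of the observable Z = (+1)|0⟩⟨0| + (−1)|1⟩⟨1|:

import numpy as np

M_obs = np.diag([1.0, -1.0])   # the observable Z
lam, V = np.linalg.eigh(M_obs)

# gather eigenvectors by (possibly repeated) eigenvalue into projectors M_i
projectors = {}
for j, r in enumerate(lam):
    P = np.outer(V[:, j], V[:, j].conj())
    projectors[round(r, 9)] = projectors.get(round(r, 9), 0) + P

# reassemble the observable as in (2.16): M = sum_i r_i M_i
M = sum(r * P for r, P in projectors.items())
assert np.allclose(M, M_obs)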

2.3 Gates and circuits

Now that we know about states, unitaries, and measurements, we have all the basic ingredients we need to start talking about computation in the quantum circuit model. In this model, computation proceeds in the following steps:

1. Prepare N qubits in a fixed state.

2. Apply a quantum circuit to the input state.

3. Measure one or more of the qubits.

4. (Sometimes) perform some classical post-processing on the measurement outcome.

A quantum circuit describes a large unitary map in terms of many smaller unitaries (typically drawn from a fixed set) called gates, combined via composition and tensor product. This is essentially the “code” of a quantum algorithm. After performing all the gates, we measure some of the qubits. It is these measurement outcomes that are the actual output of the quantum circuit. Generally, we will have to run a quantum circuit many times in order to gather more measurement outcomes and estimate the probability of getting a certain outcome. The quantum circuit model abstracts away all the physical details of how the computation is actually performed. In practice a qubit might be represented by two chosen energy states of an ion trapped in a magnetic field, or as the different polarisations of a photon, or in terms of any other physical system that has two different quantum states that we have arbitrarily chosen to label as the states |0⟩ or |1⟩. Each individual gate will then physically correspond to some type of interaction with this chosen model of a qubit. It might mean we have to shine a laser on some ions for 10 nanoseconds, or let some photons meet and entangle using a beam splitter. It doesn’t matter, as long as the physical operation corresponds to applying (or approximating) that unitary operation on the quantum state. Quantum circuits enable us to construct large and complex unitaries from known, relatively simple ones. The most interesting sets of quantum gates are universal ones, i.e. sets of gates that allow us to construct any unitary, or at least approximate it to high precision. We will spend a large part of this book thinking about circuits and how we can construct interesting behaviour from simple components.

2.3.1 Classical computation with quantum gates

Quantum computation is a strict generalisation of classical computation. This means any computation involving bits on a classical computer can be lifted to a quantum computation over qubits. We can see this by first noting that any reversible function (a.k.a. bijection) on N bits, ϕ : 𝔹^N → 𝔹^N, lifts to a unitary permutation of basis vectors:

U_ϕ|b⟩ := |ϕ(b)⟩    for all b ∈ 𝔹^N

The simplest non-trivial such gate is the NOT gate:

NOT|0⟩ = |1⟩        NOT|1⟩ = |0⟩

More interesting is the controlled-NOT, or CNOT gate:

CNOT :: { |00⟩ ↦ |00⟩,  |01⟩ ↦ |01⟩,  |10⟩ ↦ |11⟩,  |11⟩ ↦ |10⟩ }

One way to think of this gate is that the first bit is controlling whether the second bit gets a NOT applied to it. Constructing the ‘controlled’ variant of a quantum gate is a common theme in quantum circuits. Another way to think about the CNOT gate is as an exclusive-OR (XOR) gate, where we’ve also kept a copy of one of the inputs to keep things reversible. We write XOR (a.k.a. addition modulo 2) as ⊕. Using this notation, we have:

CNOT|x,y⟩ = |x, x ⊕ y⟩

Another useful classical gate is the controlled-controlled-NOT gate, which is also called the Toffoli gate. It is defined as follows:

TOF|x,y,z⟩ = |x, y, (xy) ⊕ z⟩

where xy is the product (a.k.a. the AND) of the two bits x and y. This gate flips the third bit if and only if the first two bits are 1. The NOT, CNOT, and Toffoli gates are all in a family of N-controlled NOT gates, i.e. NOT is a 0-controlled NOT gate, CNOT is 1-controlled, and Toffoli is 2-controlled. We draw them in a similar way, with dots on each of the N control qubits, connecting to an ⊕ symbol on the target qubit, i.e. the one receiving the NOT:

Note that plugging a |1⟩ into any of the control qubits gives a smaller N-controlled NOT gate on the remaining qubits:

whereas plugging a |0⟩ into any of the control qubits leaves an identity map on the remaining qubits. We can think of the Toffoli gate as a reversible version of the AND gate, where again we keep copies of the inputs and ‘store’ the output of the AND gate by XOR’ing it with the third input qubit. In particular, if we provide it with a ‘fresh’ qubit in the |0⟩ state for the third input, we have:

TOF|x,y,0⟩ = |x, y, x·y⟩

This is in fact a special case of a standard technique, sometimes called the Bennett trick, for turning any classical function f : 𝔹^N → 𝔹^M into a unitary:

U_f :: |x,y⟩ ↦ |x, f(x)⊕y⟩

where x ∈ 𝔹^N, y ∈ 𝔹^M and ⊕ is taking the XOR of the two M-bit strings element-wise. Even if f is not itself a bijection, the mapping (x,y) ↦ (x, f(x)⊕y) is bijective, hence U_f is a permutation of basis states. We call this a quantum oracle for the function f. This construction plays a central role in many quantum algorithms, since it allows us to query a classical function f using a quantum state. If the input is a basis state of the form |x,0⟩, this essentially amounts to evaluating the function and storing its output in the last M qubits:

U_f|x,0⟩ = |x, f(x)⟩

However, choosing other input states allows us to query f in non-classical ways, which, if we are clever, can give us some extra quantum oomph.
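To make the Bennett trick concrete, here is a small numpy sketch (ours, purely illustrative; the helper name oracle_matrix is our own) of how U_f arises as a permutation matrix for an arbitrary f : 𝔹^N → 𝔹^M. CNOT and Toffoli fall out as the oracles for the identity and AND functions respectively.

    import numpy as np
    from itertools import product

    def oracle_matrix(f, n, m):
        # Permutation matrix U_f |x,y> = |x, f(x) XOR y>, where f maps
        # n-bit tuples to m-bit tuples and basis states are ordered |x,y>
        U = np.zeros((2 ** (n + m), 2 ** (n + m)))
        for x in product([0, 1], repeat=n):
            fx = f(x)
            for y in product([0, 1], repeat=m):
                out = x + tuple(yi ^ fi for yi, fi in zip(y, fx))
                row = int(''.join(map(str, out)), 2)
                col = int(''.join(map(str, x + y)), 2)
                U[row, col] = 1
        return U

    CNOT = oracle_matrix(lambda x: x, 1, 1)              # f(x) = x
    TOF = oracle_matrix(lambda x: (x[0] & x[1],), 2, 1)  # f(x1,x2) = x1 AND x2

    # oracles are permutations of basis states, hence unitary
    assert np.allclose(CNOT @ CNOT.T, np.eye(4))
    assert np.allclose(TOF @ TOF.T, np.eye(8))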

2.3.2 Pauli and phase gates

The Pauli gates are single-qubit unitaries which will play a major role throughout this book, especially in Chapters 6, 7 and 11. We’ve seen one of the Paulis in the previous section already: the NOT gate, a.k.a. the Pauli X gate. It is called such because it represents a 180° rotation around the X-axis on the Bloch sphere:

NOT = X = X(π)

There are two other canonical choices for ‘quantum’ NOT gates, corresponding to 180° rotations around the Z and Y axes, respectively. Together, these three maps form the single-qubit Pauli matrices:

X := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad Y := \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \qquad Z := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
(2.17)

All three of these matrices are unitary and self-adjoint (hence equal to their own inverse), and multiplying any two distinct ones gives the third, up to a factor of ±i. As a result, they generate a finite group called the (single-qubit) Pauli group P_1. We’ll have a lot more to say about Pauli groups in Chapter 6. As we already saw in Section 2.2.2, we can produce unitary rotations about an axis of the Bloch sphere given by an ONB {|ϕ_0⟩, |ϕ_1⟩} as:

R_B(α) = |ϕ_0⟩⟨ϕ_0| + e^{iα} |ϕ_1⟩⟨ϕ_1|

Applying this to the X, Y, and Z bases, we obtain the X-, Y-, and Z-phase gates as follows.

X(α) := |+⟩⟨+| + e^{iα} |−⟩⟨−|
Y(α) := |+i⟩⟨+i| + e^{iα} |−i⟩⟨−i|
Z(α) := |0⟩⟨0| + e^{iα} |1⟩⟨1|
(2.18)

We also noted in Section 2.2.3 that the X(α) and Z(α) gates can generate any single-qubit unitary, up to a global phase. Indeed we have:

(2.19)

Remark 2.3.1. We write ∝ to mean equal up to a non-zero scalar factor. It is important that we say non-zero, because 0·M = 0·N for any M, N, so if we allowed 0 the statement would become vacuous. In the case of equation (2.19) above, this scalar is a global phase, which is always non-zero since |e^{iα}| = 1 for all α.

From (2.18), there is an evident matrix presentation for Z-phase gates:

Z(α) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\alpha} \end{pmatrix}

We could also compute matrix presentations for X- and Y-phase gates from (2.18), yielding matrices containing sums of 1, i, and e^{iα}. However, it is often more convenient when working concretely with the matrices to pass to an equivalent (up to global phase) form using sine and cosine functions:

X(α) ∝ \begin{pmatrix} \cos\frac{\alpha}{2} & -i\sin\frac{\alpha}{2} \\ -i\sin\frac{\alpha}{2} & \cos\frac{\alpha}{2} \end{pmatrix} \qquad Y(α) ∝ \begin{pmatrix} \cos\frac{\alpha}{2} & -\sin\frac{\alpha}{2} \\ \sin\frac{\alpha}{2} & \cos\frac{\alpha}{2} \end{pmatrix}

There are two Z-phase gates that crop up so often in quantum computing that they have gained special names. These are the S gate S := Z(π/2) and the T gate T := Z(π/4). Note that we have T² = S and S² = Z. The names ‘S’ and ‘T’ don’t really mean anything (apart from ‘T’ following ‘S’ in the alphabet); they are just names that people happened to decide on. In some (especially older) quantum computing papers you might see people call the S gate the P gate, where in this case ‘P’ stands for ‘Phase’. Some people also refer to the T gate as the π/8-gate. This is because:

T ∝ \begin{pmatrix} e^{-i\pi/8} & 0 \\ 0 & e^{i\pi/8} \end{pmatrix}
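All of the gates in this section are small enough to check numerically. Here is a quick numpy sketch of our own verifying a few of the identities above: the Pauli products, T² = S and S² = Z, and the fact that X(α) = HZ(α)H agrees with the cosine/sine matrix up to a global phase.

    import numpy as np

    X = np.array([[0, 1], [1, 0]])
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.array([[1, 0], [0, -1]])
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

    Zphase = lambda a: np.diag([1, np.exp(1j * a)])   # Z(alpha)
    Xphase = lambda a: H @ Zphase(a) @ H              # X(alpha) = H Z(alpha) H
    S, T = Zphase(np.pi / 2), Zphase(np.pi / 4)

    assert np.allclose(X @ Y, 1j * Z)                 # two distinct Paulis give the third, up to +-i
    assert np.allclose(X @ X, np.eye(2))              # self-inverse
    assert np.allclose(T @ T, S) and np.allclose(S @ S, Z)

    a = 0.7                                           # arbitrary angle
    M = np.array([[np.cos(a / 2), -1j * np.sin(a / 2)],
                  [-1j * np.sin(a / 2), np.cos(a / 2)]])
    assert np.allclose(Xphase(a), np.exp(1j * a / 2) * M)  # equal up to global phase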

2.3.3 Hadamard gates

The Hadamard, or H gate is a gate that sends the Z basis to the X basis, and vice-versa:

H = |+⟩⟨0| + |−⟩⟨1| = |0⟩⟨+| + |1⟩⟨−| = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

It can be visualised, like all self-inverse unitaries on qubits, as a 180° rotation of the Bloch sphere. This time, it is through the diagonal axis that lies halfway between the X-axis and the Z-axis:

(2.20)

One way to see this is by computing the Euler decomposition of H:

As rotations of the Bloch sphere, this gives:

While it might not be immediately obvious from staring at the pictures above that this indeed gives the rotation depicted in (2.20), you can probably convince yourself this is true if you try it on a real world object. If you are using a coffee cup or a beer can, we suggest draining it first. We can write the effect of a Hadamard gate on computational basis elements compactly as follows:

H :: |x⟩ ↦ (1/√2) Σ_{y∈{0,1}} (−1)^{xy} |y⟩
(2.21)

This behaviour of the Hadamard gate, where it introduces a phase of −1 = e^{iπ} whenever the product of two boolean variables (in this case x and y) is 1, will play a special role in Chapter 7, when we study the path sum representation of quantum circuits. We can generalise equation (2.21) to the action of n H-gates applied in parallel to an n-qubit computational basis element as follows:

H^{⊗n} :: |x⟩ ↦ (1/√2^n) Σ_{y∈{0,1}^n} (−1)^{x_1y_1 + ⋯ + x_ny_n} |y⟩ = (1/√2^n) Σ_{y∈{0,1}^n} (−1)^{x·y} |y⟩
(2.22)

where x·y is the dot product of the two bit strings x and y, taken modulo 2. This works because, for an integer k, the value of (−1)^k only depends on whether k is even or odd. This map is sometimes called the Hadamard transform of a vector, which is a type of discrete Fourier transform. We will discuss different types of Fourier transforms, particularly in the context of boolean logic, in Chapter 9.
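Equation (2.22) is easy to check directly in numpy. A minimal sketch, with variable names of our own:

    import numpy as np
    from functools import reduce

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    n = 3
    Hn = reduce(np.kron, [H] * n)            # H applied to each of n qubits

    x = 0b101                                # the basis state |101>
    state = np.zeros(2 ** n)
    state[x] = 1
    out = Hn @ state

    # per (2.22), the amplitude at |y> should be (-1)^{x.y} / sqrt(2^n)
    for y in range(2 ** n):
        dot = bin(x & y).count('1') % 2      # dot product of bit strings, mod 2
        assert np.isclose(out[y], (-1) ** dot / np.sqrt(2 ** n))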

2.3.4 Controlled unitaries

Controlled unitaries are unitaries where the first qubit acts as a control, dictating whether the unitary U gets applied to the remaining qubits:

CTRL-U :: |0⟩|ψ⟩ ↦ |0⟩|ψ⟩
          |1⟩|ψ⟩ ↦ |1⟩ U|ψ⟩

This can be seen as a sort of “quantum IF statement”:

 IF q[0] THEN APPLY U TO (q[2] .. q[n-1])
 ELSE DO NOTHING
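In matrix form, CTRL-U is simply block diagonal: the identity sits on the |0⟩-branch of the control and U on the |1⟩-branch. A minimal numpy sketch (the function name is ours):

    import numpy as np

    def controlled(U):
        # CTRL-U with the control as the first qubit: identity on the
        # |0> branch, U on the |1> branch
        d = U.shape[0]
        top = np.hstack([np.eye(d), np.zeros((d, d))])
        bottom = np.hstack([np.zeros((d, d)), U])
        return np.vstack([top, bottom])

    X = np.array([[0, 1], [1, 0]])
    CNOT = controlled(X)            # 1-controlled NOT
    TOF = controlled(CNOT)          # 2-controlled NOT, i.e. the Toffoli gate
    assert np.allclose(TOF @ TOF, np.eye(8))

Iterating controlled in this way recovers the whole family of N-controlled NOT gates from Section 2.3.1.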

Exercise 2.13. The CNOT gate is also often called the CX gate, because it applies an X gate to the target if the control is |1⟩. We similarly also have a CZ gate that applies a Z instead. We can construct this using a CNOT and Hadamard gates as follows:

Find the matrix of the CZ by directly calculating the matrices involved in the circuit above.

2.3.5 (Approximate) universality

A set G of gates is exactly universal if the gates in G can be combined to construct any N-qubit unitary U. A typical example of an exactly universal set of unitaries is G = {CNOT, H} ∪ {Z(α) | α ∈ [0,2π)}. Using just H and Z(α), we can produce X(α) = HZ(α)H. Hence, any single-qubit unitary can be constructed via the Euler decomposition. What is trickier to show is that, by combining single-qubit unitaries and CNOT, we can get any N-qubit unitary. We will leave this as an exercise to the reader. :)

Since there are uncountably many N-qubit unitaries, exactly universal sets of gates are necessarily infinite. As a result, it is often more convenient to consider approximately universal (or simply universal) sets of gates, which are able to approximate any N-qubit unitary up to arbitrarily high precision. Put a bit more precisely:

Definition 2.3.2. For some ε > 0, a unitary U′ ε-approximates U if for all normalised states |ψ⟩, we have ‖U′|ψ⟩ − U|ψ⟩‖ ≤ ε.

The more mathematically-inclined literature may say in this case that U′ and U are ε-close in the operator norm. Note that it is important we take only the normalised states here (or at least states with some bounded norm). Otherwise, two unitaries would never be ε-close unless they are equal, because we could always pick some huge number λ where ‖U′(λ|ψ⟩) − U(λ|ψ⟩)‖ = |λ|·‖U′|ψ⟩ − U|ψ⟩‖ > ε.
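Since the maximum of ‖U′|ψ⟩ − U|ψ⟩‖ over normalised states is exactly the largest singular value of U′ − U, the best ε in Definition 2.3.2 is easy to compute numerically. A small illustrative sketch (names ours):

    import numpy as np

    def eps_distance(U1, U2):
        # largest singular value of U1 - U2, i.e. the operator norm:
        # the worst case of ||U1|psi> - U2|psi>|| over normalised |psi>
        return np.linalg.norm(U1 - U2, ord=2)

    Zphase = lambda a: np.diag([1, np.exp(1j * a)])
    T = Zphase(np.pi / 4)
    T_off = Zphase(np.pi / 4 + 0.01)     # a slightly miscalibrated T gate
    print(eps_distance(T, T_off))        # ~0.01, so T_off 0.01-approximates T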

Definition 2.3.3. A set of gates G is approximately universal if for any unitary U and ε > 0, we can construct a unitary U′ using just gates from G that ε-approximates U.

Unlike exactly universal sets of gates, approximately universal sets can be finite. Perhaps the most commonly considered approximately universal set of gates is the Clifford+T set:

Clifford+T := {CNOT, H, T := Z(π/4)}

It is called the Clifford+T set because it is obtained by adding the T gate to the following set of Clifford gates:

Clifford := {CNOT, H, S := Z(π/2)}

Note that the Clifford circuits are a subset of the Clifford+T circuits, since S = T². Clifford+T circuits are approximately universal. While we won’t go into the details here (they are covered in many standard quantum computing textbooks), we can sketch a fairly simple argument. First, note that if we can rotate around an axis of the Bloch sphere by an irrational multiple of π, then we can approximate any rotation around that axis: we just need to keep going around and around until we land somewhere close. Then, defining R = HTH, we can observe that TR is an irrational rotation around some axis of the Bloch sphere and RT is an irrational rotation around some different axis. Putting these together, we can approximate generic rotations around two distinct axes, so we can approximate any rotation. Then, just like in the exactly universal case, as soon as CNOT gets involved, we can boost generic single-qubit unitaries up to generic many-qubit unitaries.

The way we defined approximate universality, we are only asking that the gate set can approximate any unitary; we are not asking that it can do so with small overhead, or that we can find the approximation efficiently. Luckily, we get those features for free. The Solovay-Kitaev theorem states that if we have any finite set of single-qubit unitaries that can approximate any single-qubit unitary arbitrarily well, then it can also do so efficiently (to be precise: with a poly-logarithmic overhead in the precision 1/ε). The algorithm is constructive, so it also tells you how to find this approximation. In Chapter 10 we will describe a concrete approximation algorithm using the Clifford+T gate set, which works even better than just using the Solovay-Kitaev algorithm.

A consequence of the Solovay-Kitaev algorithm is that any finite approximately universal gate set is essentially equivalent to any other. If we require N gates for a particular circuit in one gate set, then we require O(N log^c(N/ε)) gates in the other gate set if we want to approximate it up to an error ε. Here c is a particular (small) constant that depends on the precise gate set being used. What all of this means is that we are essentially free to choose whatever universal gate set we want, and it won’t change which unitaries we can efficiently approximate.

Exercise 2.14. In this exercise we will see that the eigenvectors of the Hadamard can be represented by a Clifford+T circuit. Let |H⟩ := |0⟩ + (√2 − 1)|1⟩.

a)

Show that |H⟩ is an eigenvector of the Hadamard gate. What is its eigenvalue?

b)

Give an eigenvector of the Hadamard with a different eigenvalue.

c)

We will now work towards showing that |H⟩ can be constructed (up to non-zero scalar) using a |+⟩ state and Clifford+T gates. First, note that tan(π/8) = √2 − 1, and then write |H′⟩ := 2cos(π/8)|H⟩ as a|0⟩ + b|1⟩ with a and b written in terms of ±e^{±iπ/8} factors (you will need the result from Exercise 2.2).

d)

Now write e^{iπ/8} HS|H′⟩ as a superposition of |0⟩ and |1⟩.

e)

Give a sequence of Clifford+T gates G_1, …, G_k such that G_k ⋯ G_1 |+⟩ ∝ |H⟩.

2.3.6 Quantum circuit notation

A certain set of conventions for representing quantum circuits has developed throughout the years that is affectionately called quantum circuit notation. We have already seen many aspects of this in this section: qubits are represented by lines going from left to right, with multiple qubit lines being drawn from top to bottom. Unitary gates are represented by boxes on these qubits, usually with certain standardised names or symbols:

The NOT gate (the Pauli X) has one of these special symbols: ⊕. Adding a control-wire to a unitary is represented by a black dot connected to that unitary (a unitary controlled on the control being |0⟩ instead of |1⟩ is often depicted with a white dot instead of a black dot, though we will not need that in this book). A multi-qubit unitary, like U here, is a box connected to multiple input and output wires. A SWAP gate is depicted as literally just a swapping of the qubit wires. Quantum circuit notation also allows for representing non-unitary components like state preparations, measurements and classically controlled operations. Consider for instance the following implementation of magic state injection (see Section 3.3.1):

Here the second qubit has a |T⟩ in front to denote that this qubit is prepared in the specific |T⟩ := T|+⟩ state. The box with the arrow that looks a bit like a meter on a dial represents a demolition measurement in the computational basis that returns a classical value of 0 or 1 depending on whether the measurement outcome was |0⟩ or |1⟩. The doubled-up wire coming out of it on the right represents a classical state, the outcome of the measurement. This classical wire flows into the S gate, meaning this S gate is controlled on this classical outcome: we only apply the S gate if the measurement outcome was 1.

Quantum circuit notation has been very useful in representing operations one would want to run on a quantum computer. It is however not ideal when trying to prove properties about computations; we will see a much more versatile way to do that in the next chapter.
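To see the classical control in action, here is a small numpy simulation of the injection circuit, under our reading of it (all names and the qubit ordering are our own assumptions): data qubit first, CNOT controlled on the data qubit with the |T⟩ state as target, measurement on the second qubit, and an S correction when the outcome is 1. Both measurement branches end up in T|ψ⟩ up to a global phase.

    import numpy as np

    T = np.diag([1, np.exp(1j * np.pi / 4)])
    S = np.diag([1, 1j])
    CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                     [0, 0, 0, 1], [0, 0, 1, 0]])       # control = first qubit

    psi = np.array([0.6, 0.8])                          # arbitrary normalised input
    magic = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)  # |T> = T|+>

    state = CNOT @ np.kron(psi, magic)
    for outcome in (0, 1):
        # project the second qubit onto |outcome> and renormalise
        branch = state.reshape(2, 2)[:, outcome]
        branch = branch / np.linalg.norm(branch)
        if outcome == 1:
            branch = S @ branch                         # classically controlled S
        # in both branches, the data qubit carries T|psi> up to global phase
        assert np.isclose(abs(np.vdot(T @ psi, branch)), 1)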

2.4 A dash of quantum algorithms

It might be a bit strange to spend this long talking about quantum circuits and computations without saying much about what those computations are actually being used for. Indeed we imagine all of the circuits we are working with are part of a quantum algorithm. We have deliberately chosen not to talk too much about algorithms in this book because we wanted to focus on many other important aspects of quantum software like compiling, verification/classical simulation, and error correction. However, in the interest of being somewhat self-contained, we will give a brief taster of what a typical “oracle-style” quantum algorithm looks like. While it cannot be said for every quantum algorithm out there, many algorithms (including Shor’s famous factoring/period finding algorithm) fit a very simple template. We start with a classical function f that is easy to describe (e.g. as a small boolean formula or program) and we want to know some kind of global property P about f. Then, the quantum algorithm proceeds as follows:

1. Prepare a superposition of computational basis states.

2. Apply the quantum oracle U_f to that state.

3. Measure some or all of the qubits (usually in a non-standard basis).

4. Perform some classical post-processing on the measurement outcome(s) to compute P.

The quantum oracle Uf is the unitary map derived from the classical function f using the Bennett trick we explained in Section 2.3.1:

U_f|x,y⟩ = |x, y⊕f(x)⟩

By global property, we mean something that would classically require evaluating f on (much) more than one input. The prototypical example is the boolean satisfiability problem SAT, where we want to know if there exists any bit string x such that f(x) = 1. Naïvely, we might need to try lots and lots of bit strings before we find one where f(x) = 1. For various reasons, the prospects of solving SAT on a quantum computer are not looking too good (see the discussion on NP vs BQP in Section 2.5.3), but we can still look to uncover other global properties of some function f. The most exciting such algorithms extract a global property from f exponentially faster than any known classical algorithm. However, these are relatively elaborate to explain in detail, so for the sake of an example, we will settle for something that is simply faster.

The Bernstein-Vazirani problem takes as input a function f from n bits to a single bit. This function has a secret bit string s “hiding” inside, in the sense that f(x) = s·x, where s·x is the dot product (modulo two) of the bit strings s and x. Our task is simply to find s. This is a global property of f because there is no way to figure out s from querying f on any single input. In fact, it is possible to show that, in order to recover s classically, we will need to evaluate f on precisely n different inputs. The easiest choice is just to take all of the bit strings e_i that have a 1 in the i-th place and 0’s everywhere else. Then f(e_i) = e_i·s = s_i, the i-th component of s. However, quantumly, we can do this with just a single evaluation of the quantum oracle U_f.

The quantum solution to this problem is called the Bernstein-Vazirani algorithm. To solve this problem, we simply perform the following circuit on n + 1 qubits:

(2.23)

Then, with probability 1 (assuming no noise), the measurement outcome will be precisely the n-bit string s. We can see that this works by doing a concrete calculation involving bras and kets. The details of this calculation are not too important, since we’ll see a much nicer way to do this same derivation in the next chapter. However, it is typical of the kinds of calculations one meets when showing the correctness of a quantum algorithm. First, we can simplify the presentation of the algorithm a little bit. Rather than using a full Bennett-style oracle U_f on n + 1 qubits, we can use the reduced oracle V_f.

Exercise 2.15. Show that U_f(I ⊗ |−⟩) = V_f ⊗ |−⟩, where V_f|x⟩ := (−1)^{f(x)}|x⟩.

This lets the |−⟩ “fall through” U_f in (2.23):

Since the last qubit is not entangled with the first n qubits, we can simply compute the state of the first n qubits to predict the measurement outcome. Recall from Section 2.3.3 that H^{⊗n} acts as follows:

H^{⊗n} :: |x⟩ ↦ (1/√2^n) Σ_{y∈{0,1}^n} (−1)^{x·y} |y⟩

Since H is its own inverse, we also have

H^{⊗n} :: (1/√2^n) Σ_{y∈{0,1}^n} (−1)^{x·y} |y⟩ ↦ |x⟩

This is enough to compute the state of the first n qubits. They start in the initial state |0⋯0⟩ = |0⟩^{⊗n} and evolve through the unitaries as follows:

|0⋯0⟩
  ↦ (applying H^{⊗n})  (1/√2^n) Σ_{y∈{0,1}^n} (−1)^{0·y} |y⟩ = (1/√2^n) Σ_{y∈{0,1}^n} |y⟩
  ↦ (applying V_f)     (1/√2^n) Σ_{y∈{0,1}^n} (−1)^{f(y)} |y⟩ = (1/√2^n) Σ_{y∈{0,1}^n} (−1)^{s·y} |y⟩
  ↦ (applying H^{⊗n})  |s⟩

Since the first n qubits end up in the computational basis state |s⟩, if we measure in the computational basis, we’ll get outcome s with certainty. Hence, in a single quantum query, we obtain the hidden bit string s. As long as we are forced to treat f as a black box (i.e. the only thing we can do classically is query values f(x)), this is polynomially better than any classical algorithm.

However, there is a variation of this problem, called Simon’s problem, which is in fact exponentially better in terms of query complexity. In Simon’s problem, we are given a function f from n-bit strings to n-bit strings. It also hides a secret bit string s, but in a more subtle way. We promise that f(x) = f(y) if and only if y = x⊕s for some fixed bit string s. In order to find s classically, we need to find a collision, i.e. a pair of distinct x and y such that f(x) = f(y), or prove that no such collision exists. Classically, the best strategy is just querying f at random, and we expect to find a collision only after about 2^{n/2} queries. Quantumly, we can solve this using a circuit that looks very similar to the one before, but with U_f now acting on 2n qubits, instead of n + 1, since f outputs n bits:

With a bit of work, we can show that the measurement outcome t that pops out isn’t s itself, but it satisfies the property that s·t = 0. Furthermore, we are equally likely to get any bit string t satisfying this property, so if we run this quantum part over and over again, pretty soon we gather enough independent linear equations involving s to solve for s. (Note, we’ll talk a lot more about doing linear algebra with bits in Chapter 4.)
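Coming back to Bernstein-Vazirani: the derivation above takes only a few lines to sanity-check numerically using the reduced oracle V_f. A sketch of our own:

    import numpy as np
    from functools import reduce

    n, s = 4, 0b1011                          # four qubits, hidden string s
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn = reduce(np.kron, [H] * n)

    # reduced oracle V_f |x> = (-1)^{s.x} |x> is diagonal
    Vf = np.diag([(-1) ** (bin(s & x).count('1') % 2) for x in range(2 ** n)])

    state = np.zeros(2 ** n)
    state[0] = 1                              # |0...0>
    state = Hn @ (Vf @ (Hn @ state))          # the circuit, applied left to right

    assert np.isclose(abs(state[s]), 1)       # measuring now yields s with certainty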

Remark* 2.4.1. While we won’t give the derivation for Simon’s problem here, it is part of a whole family of problems called hidden subgroup problems which have an efficient quantum solution. You can find a diagrammatic derivation in Picturing Quantum Processes, Chapter 12.

As mentioned before, many algorithms follow the broad outline we gave at the beginning of this section. A variation on this theme is the “Grover-like” family of algorithms, which include Grover’s quantum search algorithm as a prototypical example, where now Uf is called many times, with a thin layer of quantum gates between each iteration.

2.5 A dash of complexity theory

In Section 2.3.5 above we wrote O(N log^c(N/ε)) to denote how fast a function would grow. Expressions like these and some other concepts from complexity theory will crop up here and there throughout the book, so we will give a quick primer in this section. We will keep this mostly informal, as we won’t be needing any really formal complexity-theoretic definitions in this book.

2.5.1 Asymptotic growth

Let’s give a formal definition of what it means for something to be bounded by some polynomial.

Definition 2.5.1. We say a function f : ℕ → ℕ is of order O(n^k) for some k when there is some constant c such that f(n) ≤ c·n^k for all n ∈ ℕ.

The notation O(n^k) is known as big O notation. When a function is O(n^k) it means that it grows at most as quickly as the function n^k. Note that an equivalent definition is that the quotient f(n)/n^k stays bounded as n increases. The order of a function only depends on the terms that grow the quickest. For instance, if we had f(n) = 3n^5 + 400n^2, then f is O(n^5), as for large values of n, the n^5 term contributes more and more to the value of f. Note that if a function is O(n^k) then it is also O(n^{k′}) for any k′ > k.

Exercise 2.16. Prove that the function f(n) = 3n^5 + 400n^2 is order O(n^5) but not O(n^4).

The reason we care about the order of functions is that it allows us to say which functions end up growing the quickest in ‘the asymptotic limit’, where the input to the function is (very) large. Suppose for instance that f and g are two functions that measure the runtime of two algorithms for solving the same problem, and that g is O(n^2) while f is O(n^3) (but not O(n^2)). Then we know that there will be some (possibly very large) N such that for any n > N we have f(n) > g(n), and furthermore, that as we increase n, the value of f will speed away from that of g. So because the order of g is lower than that of f, we know that eventually it will be better to run the algorithm corresponding to g. But note that we could have for instance f(n) = n^3 while g(n) = 1000n^2, so that the switchover would only come for n > 1000.

In the definition above we only talked about O(n^k), but we could really replace n^k with any monotonically increasing function. For instance, we could say a function f is order O(2^n log(n)), meaning that f is bounded by some constant multiple of the function 2^n log(n). Additionally, we could consider a function of multiple variables, like f(n,k), and say for instance that it is O(k·2^n), meaning that f grows slowly when you increase k, but really fast when you increase n.

In this book we will often say something is ‘efficient’ to calculate. What we mean by this is that if we let f(n) be an upper bound on the cost of calculating an instance of the problem of size n, then this f is order O(n^k) for some k. In words: there is some polynomial that upper-bounds f. What do we mean here by ‘size’? This depends on the thing we want to calculate! Generally, there will be some set of size parameters that determine how much data we need to specify the problem instance that we want to calculate something about. For instance, if we wished to calculate something about a quantum circuit, then the relevant size parameters might be the number of qubits and the number of gates in the circuit. When we say something is efficient to calculate, it should be taken to mean ‘efficient in all relevant parameters’ unless we say otherwise. For an example of something that is efficient in one parameter, but not in another, we could look at calculating the matrix of a quantum circuit. This scales linearly with the number of gates, but exponentially with the number of qubits, and so we might say this problem is ‘efficient in the number of gates’.

An important note must be made here: ‘efficient’ should be taken with a grain of salt. We say something is efficiently calculable when the calculation time is bounded by some polynomial. But what if this polynomial is n^100? In that case a computer would quickly need to run for thousands of years, even for small values of n. The reason computer scientists have decided to make the benchmark of ‘efficient’ be ‘polynomial’ is that it is the smallest useful class of orders that is closed under composition: if we have an algorithm taking polynomial time that is used as a subroutine in another program which calls it a polynomial number of times, then the resulting algorithm still takes polynomial time. With this definition we can hence safely combine efficient programs and be assured that the result is still ‘efficient’. Another caveat is that while it is technically possible for an algorithm to have a runtime that scales with n^100, in practice, most polynomial-time algorithms used in the real world have a complexity where the exponent is small, such as O(n^3).
Ultimately, the only real way to know whether an algorithm is efficient is to implement it on a computer and benchmark it on the problem instances you care about.
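For instance, the switchover point in the f(n) = n^3 versus g(n) = 1000n^2 example above can be found with a two-line check (illustrative only):

    # f is O(n^3), g is O(n^2) with a large constant; the asymptotically
    # better algorithm g only becomes the faster one once n exceeds 1000
    f = lambda n: n ** 3
    g = lambda n: 1000 * n ** 2
    crossover = next(n for n in range(1, 10 ** 6) if f(n) > g(n))
    print(crossover)   # 1001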

2.5.2 Classical complexity classes

We can classify problems in terms of how hard they are to solve in the asymptotic case. We call these classes complexity classes. We have for instance the complexity class P that contains all the problems that can be solved deterministically in polynomial time. We have to be a bit careful here by what we mean by ‘problem’. P is a class of decision problems. These are problems where the answer is either yes or no, true or false, 0 or 1. An example problem that is in P is determining whether a given input bit string returns 1 when we apply a given Boolean function to it.

We often care about problems where the answer is not just a yes or no answer, but where we have a more complicated type of output, like an integer, a matrix, or a quantum circuit. We call such problems function problems, and the complexity class of function problems that can be solved deterministically in polynomial time is called FP. For instance, adding or multiplying two numbers is in FP, and so is Gaussian elimination of a matrix. We consider the complexity classes P and FP as containing the problems that are efficiently solvable.

Probably the most (in)famous complexity class is NP, standing for Non-deterministic Polynomial time. NP contains the problems you could efficiently solve if you had a computer where every time you had to make a decision, you could actually run both branches in parallel, and as long as one of the branches returns ‘yes’, the whole computation returns ‘yes’. This definition is a bit mind-bending to think about, but luckily there is an easier way to think about NP: it contains the problems where we can verify solutions in polynomial time. Suppose for instance I am given a half-finished sudoku, and I’m asked whether this sudoku can actually be finished. The only way I might be able to do that is just trying to finish the sudoku and seeing if I get stuck. This can be quite inefficient (especially if you are bad at sudoku). If someone gives you a completion of the sudoku however, it is easy to check that it was filled in correctly, and hence that the sudoku is indeed solvable.

A canonical problem that is in NP is determining whether a given Boolean formula f : 𝔽_2^n → 𝔽_2 can be satisfied, that is, whether there exists an x ∈ 𝔽_2^n such that f(x) = 1. If we are given an x such that f(x) = 1, then we can easily check this is the case by just calculating what f does on x. This problem of Boolean satisfiability, or SAT for short, is in fact an NP-complete problem. That means it is among the ‘hardest’ problems in NP. More specifically, if we have some algorithm for solving SAT, then we can use this to solve any other problem in NP with at most polynomial overhead.

Arguably the most famous problem in computer science is whether P ≠ NP. That is, whether there are any problems for which we can efficiently check the solution, but we cannot efficiently find the solution. Most researchers believe this is the case, but after decades of effort we still do not know the answer. In the complexity theory literature you will find many results that are conditional on P ≠ NP (or on similar claims about complexity classes not being equal to each other): “If the problem X is easy to solve, then P = NP, hence we think that X is not easy to solve.” In this book we will also sometimes make claims like these, when we want to argue that we don’t expect to be able to solve some problem efficiently.

2.5.3 BQP

The classes P and NP are classical, in the sense that they describe problems that a classical computer can (not) efficiently solve. In this book we will obviously also care about which problems a quantum computer can (not) efficiently solve. To define the relevant complexity class, we will first define another classical complexity class. The class BPP contains the decision problems that can be solved with Bounded error, Probabilistically, in Polynomial time. That is, we are allowed to solve the problem using an algorithm that makes random probabilistic choices and that is allowed to make some errors, as long as it gives the right answer in the (vast) majority of cases. We have P ⊆ BPP, but we don’t know whether they are equal.

The class of problems efficiently solvable by a quantum computer is defined similarly to BPP. The class BQP contains the decision problems that can be solved with Bounded error, using a Quantum computer, in Polynomial time. Formally, we would construct a particular quantum circuit based on the particular instance of the problem (this translation into a quantum circuit would be given by some classical algorithm running in polynomial time), and then measure a single qubit. Our answer is then whether this qubit gives the outcome 0 or 1. In practice we probably want to post-process the output of a quantum computer, or run the computation many times to gather more statistics, but formally we can make all these steps also ‘quantum’, so that there is just one big quantum computation taking care of everything; hence why BQP is defined this way. Note that we haven’t specified what gate set the quantum computer is using. Because all approximately universal gate sets are essentially equivalent (Section 2.3.5), this is fine, and the complexity class is the same regardless of which gate set is being used.

We have of course P ⊆ BPP ⊆ BQP. It is not known whether NP ⊆ BQP or BQP ⊆ NP or if neither is the case. It is widely believed that NP ⊄ BQP, and hence that NP-complete problems like SAT should not have an exponential speed-up on a quantum computer, but at most a polynomial speed-up.

The final complexity class we will consider is a quantum analogue to NP. Quantum Merlin-Arthur (we will let the reader speculate where this name comes from), or QMA for short, is the class of decision problems that a quantum computer can verify with high probability. An example of a QMA-complete problem is determining whether a given quantum circuit is equal to the identity, or is far away from being the identity, promised that one of these is the case: given a circuit not equal to the identity, there will be a state that is mapped to something quite different, and hence when given this state, the verifying quantum computer can run the circuit and see that it indeed maps this state to something else by doing some appropriate measurement.

2.6 Summary: What to remember

1. A tensor is a generalisation of a matrix that can have any number of inputs and outputs.

2. A tensor network is a linear map built out of tensors by tensor product and contraction.

3. A string diagram is a graphical representation of a tensor network where the tensors are boxes, and wires connecting boxes denote contractions.

4. Deforming a string diagram by moving tensors around or bending wires preserves the meaning of the map.

5. We can summarise the basics of quantum theory with the SCUM model:

6. The quantum circuit model represents a quantum computation by starting with a simple state, and then applying a sequence of gates, unitaries usually acting on a small number of qubits, before finally measuring the qubits in a fixed basis.

7. The state-space of a qubit can be modelled by the Bloch sphere. Single-qubit unitaries then correspond to rotations of this sphere. Using Euler angles, any unitary can be decomposed into rotations along the Z- and X-axes.

8. Other useful quantum gates include the CNOT, CZ, Hadamard, and Toffoli.

2.7 Advanced Material*

2.7.1 Quantum mixed states and channels*

This first “Advanced Material” section isn’t really that advanced, but it is optional, since we will focus on pure quantum states and operations for almost the entirety of this book. However, for the sake of completeness, we will briefly explain how to model more general states and processes, as this can often be more convenient for certain calculations. Pure quantum states and processes are the correct mathematical objects for representing a quantum system in isolation evolving according to a process that is completely known. Of course, this never happens in the real world. The way we model all physical processes is necessarily a simplification, where we choose to disregard parts of the system which (we hope!) won’t have too big of an effect on the outcome. When we do this, we limit the knowledge and degree of control we have over a quantum state. We can model these limitations in a rigorous mathematical way using quantum mixed states and channels. A quantum mixed state is a generalisation of the pure states we saw before, and can be used to model one of two seemingly different scenarios which actually turn out to be mathematically identical in quantum theory. A mixed state can capture:

1. an unknown member of a fixed set of pure states, which are drawn from a classical probability distribution, or

2. part of a quantum pure state on two systems where we don’t know (or care) what happens to the state on one of the systems.

Situation 1 above is sometimes called an ensemble, whereas situation 2 is called the reduced state. From a practical point of view, the only utility of a quantum state is to provide just enough data to compute Born rule probabilities of measurements. To do that in either of these situations, we just need the following data.

Definition 2.7.1. A quantum mixed state on a Hilbert space H is a positive operator ρ : H → H such that Tr(ρ) = 1.

We say that mixed states generalise pure states because, for every pure state |ψ⟩, the associated ket-bra |ψ⟩⟨ψ| is a positive semi-definite operator where Tr(|ψ⟩⟨ψ|) = ⟨ψ|ψ⟩ = 1. A nice feature of this representation is that the need to consider states “up to” a global phase is now handled automatically:

e^{iα}|ψ⟩ ↦ (e^{iα}|ψ⟩)(e^{−iα}⟨ψ|) = e^{iα}e^{−iα}|ψ⟩⟨ψ| = |ψ⟩⟨ψ| ↤ |ψ⟩
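A quick numpy check of this (purely illustrative):

    import numpy as np

    psi = np.array([0.6, 0.8j])                    # a normalised state
    rho = np.outer(psi, psi.conj())                # the ket-bra |psi><psi|

    phase = np.exp(1j * 0.42)                      # an arbitrary global phase
    rho2 = np.outer(phase * psi, (phase * psi).conj())

    assert np.allclose(rho, rho2)                  # the ket-bra forgets global phase
    assert np.isclose(np.trace(rho), 1)            # and is trace-1, as required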

A recurring theme with mixed states is that things we said with inner products for pure states can now be stated in terms of the trace of a linear map. Normally, if we specialise to pure states, we can use the cyclicity of the trace (i.e. that Tr(AB) = Tr(BA)) to recover our original pure state concept. For example, if we fix a measurement {M_1, …, M_m}, the Born rule for mixed states says that Prob(i|ρ) = Tr(M_i ρ). Specialising to pure states gives the Born rule from the previous section:

Prob(i | |ψ⟩⟨ψ|) = Tr(M_i |ψ⟩⟨ψ|) = ⟨ψ|M_i|ψ⟩

If we have situation 1 above of an ensemble of pure states, i.e. a collection of pure states {|ψ_1⟩, …, |ψ_k⟩} drawn from a probability distribution {p_1, …, p_k}, we can represent this as:

ρ = Σ_j p_j |ψ_j⟩⟨ψ_j|
(2.24)

Applying the Born rule gives:

Prob(i|ρ) = Tr(M_i (Σ_j p_j |ψ_j⟩⟨ψ_j|)) = Σ_j p_j ⟨ψ_j|M_i|ψ_j⟩ = Σ_j p_j Prob(i | |ψ_j⟩)

In other words, it is the sum over all of the different ways we could have gotten outcome i, weighted by the appropriate probabilities. This is exactly how classical probability theory would tell us we should compute this quantity. For situation 2 of a reduced state, we can represent ρ as

ρ = Σ_j (I ⊗ ⟨j|) |Ψ⟩⟨Ψ| (I ⊗ |j⟩)
(2.25)

We saw this operation, the partial trace, at the end of Section 2.1.6. Notably, it doesn’t depend on the choice of ONB {|j⟩} and can be graphically depicted as:

Now, it is easy to check that, for any measurement on the first system, the Born rule for ρ gives:

Prob(i|ρ) = Σ_j Prob(ij | |Ψ⟩⟨Ψ|)

That is, we get the same thing as if we measured both systems of the pure state |Ψ⟩⟨Ψ|, and then marginalised over the second measurement outcome. Since Eq. (2.25) doesn’t depend on the choice of ONB {|j⟩}, it actually doesn’t matter which measurement we did on the second system, so this is like not knowing or caring what happened to it. Notably, both (2.24) and (2.25) can represent a generic trace-1 positive operator ρ, so they are equivalent.

Quantum mixed states are the most general representation of a quantum state in quantum theory. The most general representation of a process is a quantum channel. This is a linear map which completely preserves the set of quantum states. That is, a quantum channel Φ should have the property that if ρ is a quantum state in H, then Φ(ρ) should be a quantum state in H. However, this is not quite strong enough, because we need to take compound systems into account. Let L(H) be the vector space of linear operators from H to itself. Then Φ should be a linear map from L(H) to L(H). This is sometimes called a linear super-operator. We can define the tensor product of linear super-operators as follows:

(Φ ⊗ Ψ)(e_{ij} ⊗ e_{kl}) := Φ(e_{ij}) ⊗ Ψ(e_{kl})

where e_{ij} represents the matrix containing a single 1 in position (i,j) and zeros everywhere else. Since such matrices span the whole space of operators, this fully defines Φ ⊗ Ψ by linearity. For a map Φ to be a quantum channel, it should not only preserve the set of quantum mixed states when it is applied to the whole state, but it should also preserve the set of states when it is applied to just one subsystem. We can say this in terms of the tensor product of Φ with the identity super-operator 1_K(ρ) := ρ on an arbitrary Hilbert space K.

Definition 2.7.2. A linear super-operator Φ : L(H) → L(H) is called a quantum channel if for any Hilbert space K and any trace-1 positive operator ρ ∈ L(H ⊗ K), (Φ ⊗ 1_K)(ρ) is also positive and trace-1.

Just like in the case of states, pure maps can be interpreted as a special case of quantum channels. For any unitary U, we can form the unitary channel Û(ρ) := UρU†. If we apply such a channel to a pure state, it will send the ket-bra of |ψ⟩ to the ket-bra of U|ψ⟩, as expected:

Û(|ψ⟩⟨ψ|) = U|ψ⟩⟨ψ|U† = (U|ψ⟩)(U|ψ⟩)†

There is a lot that can, and has, been said about quantum channels. For this, we refer the interested reader to one of the textbooks in the References for this chapter. For the purposes of this book, we will conclude with one of the most useful tools for doing concrete calculations with channels.

Theorem 2.7.3 (Kraus Representation Theorem). A linear super-operator Φ : L(H) → L(K) is a quantum channel if and only if there exists a set of linear operators {B_j : H → K}_j, called the Kraus operators, such that Φ(ρ) = Σ_j B_j ρ B_j† and Σ_j B_j† B_j = I.

Clearly unitary channels are a special case, but this allows more general maps such as probabilistic mixtures of unitaries like:

Φ(ρ) = Σ_j p_j U_j ρ U_j†

and even more general things, such as discarding our state altogether and preparing a fixed pure state.
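For a concrete example, here is a small numpy sketch (ours, purely illustrative) of a bit-flip channel, which applies X with probability p and does nothing otherwise, given by its Kraus operators:

    import numpy as np

    X = np.array([[0, 1], [1, 0]])
    p = 0.25
    kraus = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * X]

    # completeness condition: sum_j B_j^dag B_j = I
    assert np.allclose(sum(B.conj().T @ B for B in kraus), np.eye(2))

    def apply_channel(kraus, rho):
        return sum(B @ rho @ B.conj().T for B in kraus)

    rho = np.array([[1, 0], [0, 0]])         # the pure state |0><0|
    out = apply_channel(kraus, rho)
    print(out)                               # diag(0.75, 0.25): a genuine mixed state
    assert np.isclose(np.trace(out), 1)      # channels preserve the trace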

Exercise 2.17. Show that the super-operator Φ(ρ) := Tr(ρ)|0⟩⟨0| is indeed a quantum channel by giving its Kraus operators. Show furthermore that it cannot be written as a probabilistic mixture of unitaries.

2.8 References and further reading

On the origin of quantum computing There is a rich history of connections between physics and computation in the early 20th century, typified by the works of [162] connecting principles of information, computation, and thermodynamics. Inspired by these lines of thought, Paul Benioff [26] showed that a Turing Machine could be accurately modelled by quantum mechanics, but stopped short of using quantum processes to go beyond classical computation. At the same time, on the other side of the Iron Curtain, Yuri Manin [168] proposed the first thing to look something like a genuine quantum computer in his book Computable and Uncomputable, in the form of a “quantum automaton” that can make use of superposition and entanglement for its computations. One year later, and seemingly independently, Richard Feynman [96] quipped that “Nature isn’t classical, dammit” and went on to lay out a proposal for how something like a quantum computer could be used to simulate quantum phenomena, a problem that seems to be intractable on a classical computer.

Quantum Turing machines The first universal quantum computer, in the sense that we understand today, was described by David Deutsch [84], using the formalism of quantum Turing machines. Quantum Turing machines are analogous to classical Turing machines, but with the “tape” replaced by Hilbert spaces and the typical read/write operations replaced by quantum operations. It is interesting to note that Deutsch’s original paper was motivated largely by answering foundational questions about the nature of quantum physics and computation. Deutsch, a lifelong proponent of Everett’s “Many worlds” interpretation of quantum theory, emphasises that the parallelism present in the quantum Turing machine is something that cannot be meaningfully captured by any classical description of computation, and claims in the abstract of his seminal paper: “The intuitive explanation of these properties places an intolerable strain on all interpretations of quantum theory other than Everett’s.”

Quantum circuits While quantum Turing machines give a universal model of quantum computation, their definition is somewhat unwieldy. Hence, a few years later they were largely superseded by another universal model of quantum computation, also due to Deutsch [82], which is now called the quantum circuit model. Deutsch himself called quantum circuits “quantum networks”, and this may have been a more appropriate name. As we noted at the beginning of this chapter, the analogy between classical and quantum circuits is already somewhat strained. However, the name “quantum circuits” stuck, and it was the name used by Yao [52] when he showed the polynomial-time equivalence of the quantum circuit model with quantum Turing machines.

Universal gate sets The notion of universal sets of gates for quantum circuits goes all the way back to the first papers in the field. Deutsch [84] showed that arbitrary classical reversible computation plus a generating set of single-qubit unitaries is approximately universal for quantum computation. Since any classical reversible computation can be generated from the Toffoli gate, this implies that 3-qubit gates are universal for quantum circuits. This was made slightly stronger in [82], where Deutsch showed that a single 3-qubit gate and ancillae could be used to approximate arbitrary n-qubit unitaries. However, the question of universality with 2-qubit gates remained open until DiVincenzo [86] showed that 2-qubit gates generate the entire family of n-qubit unitaries. That same year, the result was strengthened to show that a controlled-NOT gate and single-qubit unitaries suffice for universal computation [25]. What is now known as the Solovay-Kitaev theorem, that any approximately universal gate set can approximate any other arbitrarily well with just a poly-logarithmic overhead, was first announced by Solovay in 1995 and independently proved by Kitaev [152, Lemma 4.7]. A good reference for this result is [73].

Quantum algorithms Following the formalisation of the quantum circuit model, many influential quantum algorithms started to appear in the 1990s, such as the Deutsch-Jozsa algorithm [83], Shor’s factoring algorithm [209, 210], and Grover’s search algorithm [120]. The hidden subgroup algorithm, along with its encoding of Shor’s factoring algorithm and Simon’s problem, was given by [139]. The study of quantum algorithms is now a huge area, which we hardly touch on in this book, but there are many excellent resources out there. A standard starting point is Nielsen and Chuang’s canonical textbook Quantum computation and quantum information [188]. Some overviews and surveys of quantum algorithms include [185], [9], [182], and [68]. Finally, the Quantum Algorithms Zoo [(year?)] is an excellent online resource which has collected and categorised many quantum algorithms.

Quantum complexity theory The complexity class BQP was defined by Bernstein and Vazirani [(year?)] and used to give evidence, by means of an oracle separation proof, that bounded-error quantum computation is strictly more powerful (in a complexity-theoretic sense) than bounded-error probabilistic classical computation. The complexity class QMA was defined by Kitaev, apparently in a 1999 lecture at Hebrew University [6], in order to give quantum analogues of the complexity class NP and the famous Cook-Levin theorem for the NP-completeness of SAT.

Quantum programming There are a number of higher-level quantum programming languages. The first attempt at a practically usable Quantum Programming Language (QPL) that went beyond a theoretical framework like quantum Turing machines or the quantum lambda calculus was the imperative language QCL, originally presented in Ömer’s Master’s thesis in 2000 [189]. A more modern and more developed functional language is Quipper [117], which is an embedded language in Haskell. Microsoft released their Q# in 2017, which is based on C# [179]. The main challenge of a quantum programming language is to deal with the interactions between classical and quantum information, with quantum variables, for instance, not being able to be cloned or freely read out. Every high-level QPL needs to compile down into low-level instructions that can be more easily understood by physical quantum devices. A useful intermediary language for this is Open Quantum Assembly, known as OpenQASM [66], which at the time of writing sits at version 3.0 [65]. There are also a number of integrated programming environments and compilers that offer the user a combination of features for programming high-level algorithms, transpiling these down to lower-level QASM, and compiling further down into machine-specific instructions. Some examples include IBM’s Qiskit [133], Quantinuum’s TKET [195], Xanadu’s PennyLane [237], and the library written by the authors of this book, PyZX [146].

Learning quantum computing In addition to the books we mentioned in Section 1.1.1, there are many sets of lecture notes that provide a good all-around introduction to quantum computing. For a strong emphasis on algebraic and quantum information theoretic aspects, a good resource is the lecture notes of [233], which later became a textbook [234]. Another excellent set of lecture notes is that of John Preskill [193], which contains, among many other things, a very nice introduction to stabiliser theory and quantum error correction, which we’ll cover in Chapters 6 and 11, respectively.