Chapter 7
Universal circuits

We will now turn from studying Clifford circuits to universal ones. Whereas the Clifford circuits were generated from CNOT, H, and S gates, we can obtain an exactly universal set of gates by replacing the S := Z[π/2] gate with the family of Z[α] gates for arbitrary angles α. Perhaps more miraculously, we can obtain an approximately universal set of gates by replacing S with some fixed phase gate Z[θ] for any θ that isn't a multiple of π/2, the most popular choice being T := √S = Z[π/4]. This tiny tweak takes us from the safe, efficient world of Clifford circuits to the wild and wonderful world of full-powered quantum computation. Given that we can no longer efficiently compute the outputs of a universal quantum circuit with a classical computer, one might wonder how much we can still reason about these circuits. In this chapter we'll see that the answer is, perhaps surprisingly, quite a lot!

We will see two different ways to think of universal circuits: path sums and Pauli exponentials. Path sums are an extension of the idea of phase polynomials, which we already briefly encountered in Chapter 5 when discussing the AP normal form. In the same way that we could fully characterise a CNOT circuit as a parity matrix in Chapter 4, it turns out we can fully characterise a circuit consisting of CNOT gates and phase gates as a phase polynomial together with a parity matrix. However, this representation only deals with Hadamard-free circuits. To also work with Hadamards we need path sums. We can write the action of a Hadamard as |x⟩ ↦ (1/√2) Σ_y (−1)^{xy} |y⟩. In a path sum we interpret this as introducing a new variable y on the qubit and adding a term (−1)^{xy} to the phase polynomial. The final expression for the circuit then contains a sum over y, running over all the different 'paths' a computational basis state can take through the circuit. Pauli exponentials, on the other hand, offer quite a different perspective on universal circuits.
We have seen in the previous chapters that there is a whole lot to be said about the Pauli gates. They allow us to define stabiliser theory and Pauli projections, which lead to the idea of stabiliser states and Cliffords, which have a rich rewrite theory and will allow us to define quantum error correction in Chapter 11. But on the other hand, the Clifford computation we have seen is efficiently classically simulable, so if we want to get our money's worth with a real quantum computer, we had better do something more than just Cliffords. Luckily we don't have to look far: we can still work with Paulis, just in a slightly modified form. It turns out that for any Pauli P we can construct a family of unitaries by considering the matrix exponential e^{iθP} for phases θ. The resulting Pauli exponentials form a universal gate set for quantum computing. Certain Pauli exponentials also correspond to the native gate set of several types of quantum computers, like ion traps and superconducting quantum computers. In fact, Pauli exponentials are arguably the native gate set in which to understand quantum computation. They have a natural relation to Clifford computation; the number of them in a circuit directly corresponds to the cost of classically simulating a quantum computation; Hamiltonian simulation can be directly understood in terms of Pauli exponentials; Pauli exponentials can be readily understood in certain quantum error correcting codes like the surface code; and finally, which is important for us, Pauli exponentials have a very natural representation in the ZX-calculus.

7.1 Path sums

7.1.1 Phase polynomials

First, let us recap what we know about phase polynomials. Lets begin with a neat trick that works for any circuit built only out of CNOT and Z-phase gates. We already learned back in Chapter 4 that CNOT circuits correspond to parity maps, i.e. linear maps which send basis states to other basis states that can be described as the parities of input variables. We also saw that a convenient way to compute the parity map is to label each input wire with a distinct variable and simply push those variables through the circuit:

(7.1)

That is, we start with only the input wires labelled and work from left to right. When a CNOT is encountered, we label the output of the control qubit with the same label as the input of the control qubit, and label the output of the target qubit with the XOR of the labels on the two input qubits. In fact, this is the same as computing the overall parity matrix by doing one primitive row operation (i.e. CNOT gate) at a time. Indeed, if we look at the action of the unitary U described in (7.1), it maps each basis state to the basis state given by the XOR expressions on the output wires:

U :: |x₁, x₂, x₃⟩ ↦ |x₁, x₂⊕x₃, x₃⟩
(7.2)

We also saw in Chapter 4 that any two CNOT circuits computing the same parity function are equal. By re-synthesising a circuit according to its parity function, we can often find a much more efficient implementation. In the example above, we can implement the unitary U with just one CNOT gate:

(7.3)
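For readers who like to check such claims numerically, here is a small Python/numpy sketch (the helper names are ours) verifying that the parity map |x₁,x₂,x₃⟩ ↦ |x₁, x₂⊕x₃, x₃⟩ is implemented by a single CNOT with its control on the third qubit and its target on the second:

```python
import numpy as np

# Parity matrix L over F2 for the map |x1,x2,x3> -> |x1, x2 XOR x3, x3>
L = np.array([[1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=np.uint8)

def apply_parity_map(L, x):
    """Apply an F2-linear parity map to a bit tuple."""
    return tuple(int(v) for v in (L @ np.array(x, dtype=np.uint8)) % 2)

def cnot(n, control, target):
    """Permutation matrix of a CNOT on n qubits (qubit 0 = leftmost ket entry)."""
    U = np.zeros((2 ** n, 2 ** n))
    for i in range(2 ** n):
        bits = [(i >> (n - 1 - q)) & 1 for q in range(n)]
        bits[target] ^= bits[control]
        j = sum(b << (n - 1 - q) for q, b in enumerate(bits))
        U[j, i] = 1
    return U

U = cnot(3, control=2, target=1)   # control on the third qubit, target the second

# U sends each basis state |x> to the basis state |Lx>
for i in range(8):
    x = tuple((i >> (2 - q)) & 1 for q in range(3))
    j = int(''.join(map(str, apply_parity_map(L, x))), 2)
    assert U[j, i] == 1
print("one CNOT implements the parity map")
```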

Now, let's sprinkle some Z-phase gates into the circuit (7.1) above:

(7.4)

Note that we left the parity labels on the wires. Why are we justified in doing that? Z-phase gates are diagonal in the Z basis, so they preserve basis elements, up to a phase which depends on the basis element |x⟩:

Z[α] :: |x⟩ ↦ e^{iαx} |x⟩

We can now plug a single computational basis element |x₁, x₂, x₃⟩ into (7.4) and track its progress through the circuit. Applying the first two gates, we see the CNOT first updates the variables in the ket, then the Z-phase gate introduces a new phase, which depends on whether x₂⊕x₃ is 0 or 1:

|x₁, x₂, x₃⟩ ↦ |x₁, x₂⊕x₃, x₃⟩ ↦ e^{iα(x₂⊕x₃)} |x₁, x₂⊕x₃, x₃⟩

Continuing this through the rest of the circuit, the overall expression we get for the unitary V described by circuit (7.4) is:

V :: |x₁, x₂, x₃⟩ ↦ e^{i[(α+γ)(x₂⊕x₃) + β(x₁⊕x₂) + θx₂]} |x₁, x₂⊕x₃, x₃⟩
(7.5)

More generally, any CNOT+phase circuit yields a unitary of the form:

U :: |x⟩ ↦ e^{iϕ(x)} |Lx⟩

for some 𝔽₂ matrix L and function ϕ : 𝔽₂ⁿ → ℝ. We already met the parity matrix L, which describes U's action on basis vectors as an 𝔽₂-linear map. To this we add ϕ, which is called the phase polynomial. This describes the phase associated with each input as a polynomial over XORs of the input bits. Just as we could read the parity map off the labelled CNOT circuit (7.1), we can read both the parity map and the phase polynomial off of the labelled circuit (7.4). As before, we can read the parity map off the output wires. For the phase polynomial, we introduce a term α·(x_{j1} ⊕ ⋯ ⊕ x_{jk}) for each Z-phase gate Z[α] occurring on a wire labelled x_{j1} ⊕ ⋯ ⊕ x_{jk}. We can now come up with a new circuit that implements the same parity map and phase polynomial. We will give a reasonably efficient, general-purpose circuit synthesis algorithm for this in Section 7.2, but for now let's just focus on our example. We already saw that, by ignoring the phase gates, we could produce a very efficient circuit (7.3) to implement the parity map |x₁, x₂, x₃⟩ ↦ |x₁, x₂⊕x₃, x₃⟩ of the unitary V. In order to get the full unitary, we should inspect the phase polynomial of V :

ϕ(x₁, x₂, x₃) = (α + γ)·(x₂⊕x₃) + β·(x₁⊕x₂) + θ·x₂

Going term-by-term, it looks like we need to place Z-phase gates on wires labelled with the expressions x₂⊕x₃, x₁⊕x₂, and x₂. Inspecting the circuit in (7.3), we see that we already have wires labelled by x₂⊕x₃ and x₂. Unfortunately, x₁⊕x₂ is missing. However, we can temporarily make this parity available by introducing a redundant pair of CNOT gates:

We can then recover the full circuit for V by placing a Z-phase gate α + γ on any wire labelled x₂⊕x₃, β on x₁⊕x₂, and θ on x₂:

(7.6)

By computing the phase polynomial and re-synthesising, we reduced the circuit (7.4), which had 6 CNOT gates and 4 Z-phase gates, to (7.6), which has 3 CNOT gates and 3 Z-phase gates: a significant saving. Notably, if two phases occur at the same parity, even in totally different parts of the circuit, their angles add in the phase polynomial. Hence they can always be re-synthesised into the same phase gate. This phenomenon is called phase-folding, and is one of the major advantages of the phase polynomial approach. For example, two phase gates in different parts of circuit (7.4) contributed a single term (α+γ)·(x₂⊕x₃) to the phase polynomial in (7.5). When we re-synthesised the circuit as (7.6), they became a single phase gate.
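The general shape claimed above — every CNOT+phase circuit acts as |x⟩ ↦ e^{iϕ(x)}|Lx⟩ — can be sanity-checked numerically. The following sketch (the gate constructors are ours) builds a random CNOT+phase circuit and confirms that each column of its matrix is a single basis state times a phase:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
dim = 2 ** n

def cnot(c, t):
    """CNOT unitary on n qubits as a permutation of basis states."""
    U = np.zeros((dim, dim), dtype=complex)
    for i in range(dim):
        bits = [(i >> (n - 1 - q)) & 1 for q in range(n)]
        bits[t] ^= bits[c]
        j = sum(b << (n - 1 - q) for q, b in enumerate(bits))
        U[j, i] = 1
    return U

def z_phase(q, alpha):
    """Z[alpha] on qubit q: |x> -> e^{i alpha x_q} |x>."""
    return np.diag([np.exp(1j * alpha * ((i >> (n - 1 - q)) & 1))
                    for i in range(dim)])

# Build a random CNOT+phase circuit
U = np.eye(dim, dtype=complex)
for _ in range(10):
    if rng.random() < 0.5:
        c, t = rng.choice(n, size=2, replace=False)
        U = cnot(c, t) @ U
    else:
        U = z_phase(rng.integers(n), rng.uniform(0, 2 * np.pi)) @ U

# Every column is a single basis state times a phase: U|x> = e^{i phi(x)} |Lx>
for i in range(dim):
    col = U[:, i]
    nonzero = np.flatnonzero(~np.isclose(col, 0))
    assert len(nonzero) == 1 and np.isclose(abs(col[nonzero[0]]), 1)
print("random CNOT+phase circuit has the form |x> -> e^{i phi(x)} |Lx>")
```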

Exercise 7.1. Show that the phase polynomial representation also works for circuits made of CNOT, Z-phase, and X gates, using the fact that an X gate updates wire labels as follows:

More specifically, give an example circuit, compute its parity map and phase polynomial by wire-labelling, and re-synthesise a smaller circuit. Finally, show that further simplifications are possible: namely, expressions of the form α·y + β·(y⊕1), where y is some XOR of input variables, can be simplified in the phase polynomial, up to global phases.

7.1.2 Phase gadgets

Phase polynomials give a very powerful, efficient way to manipulate CNOT+phase circuits. Some might even (blasphemously!) suggest that they are more effective than graphical techniques for many applications. However, it is nevertheless useful to see how these structures are reflected in the ZX-calculus. Along the way, we will meet certain little ZX diagrams called phase gadgets, which correspond exactly to the terms in a phase polynomial. We’ll meet phase gadgets a lot in the coming chapters, as they are really useful tools for lots of different applications, so it is worth spending some time now getting a handle on how and why they work. In this section, we’ll see how phase polynomials and in particular phase-folding show up graphically. Note that in some cases phase folding follows trivially from an application of spider fusion. For example, this circuit:

gives the following unitary:

U :: |x₁, x₂⟩ ↦ e^{i(α+β)(x₁⊕x₂)} |x₂, x₁⊕x₂⟩

We see that the phases α and β combine in the phase polynomial. This can also be seen by commuting the α phase gate through the control wire of the last CNOT gate via spider fusion:

However, as we saw in the last section, phases can merge together even in completely different parts of the circuit. In this case, there is no obvious way to apply spider fusion to merge them. Hence, we’ll need to develop a generalisation of spider fusion, which allows distant phases in a CNOT+phase circuit to find each other. Let’s start with a slight variation on the example from the previous section:

(7.7)

All we’ve done is add one more CNOT gate at the end to make the overall parity map L trivial. This unitary is now diagonal in the computational basis. That is, if we wrote down its matrix, it would only have a bunch of phases down the diagonal, as described by the phase polynomial. Explicitly, the circuit above implements the following unitary:

U :: |x₁, x₂, x₃⟩ ↦ e^{i[(α+γ)(x₂⊕x₃) + β(x₁⊕x₂) + θx₂]} |x₁, x₂, x₃⟩
(7.8)
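As a quick check of (7.8), the following sketch builds the diagonal unitary directly from the phase polynomial (the angles are chosen arbitrarily) and confirms both that it is unitary and that only the sum α + γ matters — the phase-folding phenomenon:

```python
import numpy as np

alpha, beta, gamma, theta = 0.3, 0.7, 1.1, 0.5   # arbitrary angles

def phase_poly(x1, x2, x3, a, b, g, t):
    """phi(x) = (a+g)*(x2 XOR x3) + b*(x1 XOR x2) + t*x2, as in Eq. (7.8)."""
    return (a + g) * (x2 ^ x3) + b * (x1 ^ x2) + t * x2

def diag_unitary(a, b, g, t):
    """The diagonal unitary |x> -> e^{i phi(x)} |x>."""
    phases = []
    for i in range(8):
        x1, x2, x3 = (i >> 2) & 1, (i >> 1) & 1, i & 1
        phases.append(np.exp(1j * phase_poly(x1, x2, x3, a, b, g, t)))
    return np.diag(phases)

U = diag_unitary(alpha, beta, gamma, theta)
assert np.allclose(U.conj().T @ U, np.eye(8))      # diagonal phases are unitary

# Phase folding: only the sum alpha + gamma matters
V = diag_unitary(alpha + gamma, beta, 0.0, theta)
assert np.allclose(U, V)
print("phase folding verified")
```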

This circuit has 4 unknown, possibly non-Clifford phases in it, so none of the strategies we’ve studied so far let us reduce the whole circuit to any kind of normal form. However, what we can do is unfuse the unknown phases:

Now, we can treat the 4 wires connecting to unknown phases as outputs of the diagram and apply the simplification strategy we used back in Chapter 4 for CNOT circuits to the rest of the (phase-free) ZX-diagram. This diagram where we treat the phases as additional outputs is an isometry, and so after simplifying, we’ll get a parity normal form which only has Z spiders adjacent to inputs and only has X spiders adjacent to outputs and our 4 phases. Here’s what this gives:

(7.9)

Here, the actual output wires have all ended up with a 2-legged X spider, i.e. an identity map. This corresponds to the fact that the parity map L is trivial. We will refer to this reduced form as the phase gadget form of a ZX diagram. These are so named after phase gadgets, which are little compound creatures consisting of a Z phase spider connected to a phase-free X spider with k additional legs:

(7.10)

We will often treat these two spiders together as a single node in our diagram. Seen as a map with k inputs and 0 outputs, a phase gadget takes a computational basis state to the XOR of its bits, then, depending on whether the XOR is 0 or 1, produces a scalar of either 1 or e^{iα}. To see this, let's first think about a 1-legged Z spider as a map from 1 qubit to 0 qubits, i.e. it sends a single-qubit state to a number. On the computational basis, it acts as follows:

We can summarise this using a very simple phase polynomial:

Recall from Section 3.1.1 that X spiders act like XORs on computational basis states, up to a scalar factor:

Putting these two together, we see that the phase gadget itself does this:

In order to get a unitary map from a phase gadget, we can use a bunch of Z spiders to make a copy of the basis state. One copy is sent to the phase gadget and one copy is output. This results in the following diagonal unitary:

(7.11)

Note that we have suppressed a √2^{(k−1)} scalar factor. Since we'll be working with unitaries in the rest of this section, we'll just assume the overall scalar is chosen to make everything unitary. Returning to (7.9), we can see that this diagram can be seen as a composition of unitaries of the form of (7.11):

Each of these phase gadget unitaries contributes one term to the phase polynomial:

|x₁, x₂, x₃⟩ ↦ (U₁) e^{i[α(x₂⊕x₃)]} |x₁, x₂, x₃⟩
            ↦ (U₂) e^{i[α(x₂⊕x₃) + β(x₁⊕x₂)]} |x₁, x₂, x₃⟩
            ↦ (U₃) e^{i[α(x₂⊕x₃) + β(x₁⊕x₂) + γ(x₂⊕x₃)]} |x₁, x₂, x₃⟩
            ↦ (U₄) e^{i[α(x₂⊕x₃) + β(x₁⊕x₂) + γ(x₂⊕x₃) + θx₂]} |x₁, x₂, x₃⟩

But there's some redundancy here. As we noted in the previous section, α and γ are applied to the same parity of input variables, x₂⊕x₃. By inspecting the phase polynomial, we can see that the unitary can be more compactly represented by just 3 phase gadgets:

(7.12)

To see this graphically, we would need a ZX rule that lets us “fuse” the phases on phase gadgets connected to an identical set of wires. We indeed have such a rule, which we call the gadget fusion rule:

(7.13)

It is in fact a derived rule, which follows from the usual spider fusion rule and strong complementarity:

Since the α and γ phase gadgets connect to the exact same spiders in (7.9), we can apply (gf) directly to this diagram to simplify it:

Now, by unfusing spiders, we can see exactly the decomposition into phase gadget unitaries from (7.12). We can also directly read the phase polynomial from the diagram:

We computed the phase gadget form (7.9) by appealing to the simplification strategy for CNOT circuits from Chapter 4. However, there is a more direct way to translate CNOT+phase circuits into circuits of phase gadgets, by studying the way that phase gadgets commute with CNOT gates. There are a few cases to think about here. First is the trivial case, where a phase gadget and a CNOT gate share no qubits: obviously these commute. Only slightly harder to see is the case where a phase gadget appears on the control qubit (i.e. the Z spider) of a CNOT gate. Then commutation follows from spider fusion:

(7.14)

If the phase gadget overlaps with the target qubit (i.e. the X spider) of a CNOT gate, we can still push a phase gadget through, but it picks up an extra leg:

(7.15)

Just by reading the equation above in reverse, we can also see what happens when a phase gadget overlaps with both qubits of the CNOT gate. It loses a leg:

(7.16)
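These walking rules can be checked numerically on the underlying diagonal unitaries. In the sketch below (conventions ours: two qubits, the first qubit is the control, and a gadget's action is the diagonal unitary of (7.11)), 'pushing through' corresponds to commuting the gadget's matrix past the CNOT while its set of legs gains or loses the control qubit:

```python
import numpy as np

dim = 4
a = 0.7

def gadget(p, a):
    """Diagonal phase-gadget unitary |x> -> e^{i a (parity of masked bits)} |x>
    on 2 qubits, where the bitmask p records which qubits the gadget touches."""
    return np.diag([np.exp(1j * a * (bin(i & p).count('1') % 2))
                    for i in range(dim)])

# CNOT with control on the first qubit (MSB) and target on the second
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# A gadget on the target qubit only (mask 0b01) pushed through the CNOT
# picks up an extra leg on the control (mask 0b11), as in (7.15):
assert np.allclose(gadget(0b01, a) @ CNOT, CNOT @ gadget(0b11, a))

# Read in reverse, a gadget on both qubits loses a leg, as in (7.16):
assert np.allclose(gadget(0b11, a) @ CNOT, CNOT @ gadget(0b01, a))
print("phase gadgets walk through CNOTs, gaining/losing the control leg")
```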

Exercise 7.2. Prove equations (7.15) and (7.16). Use the three “phase-gadget walking” equations (7.14), (7.15), and (7.16), as well as the fact that Z-phase gates are equivalent to 1-legged phase gadgets, to show that an n-legged phase gadget has the following decomposition:

(7.17)

7.1.3 Universal circuits with path sums

In the previous section, we saw that CNOT+phase circuits can be expressed compactly as phase polynomials, or equivalently using ZX diagrams with phase gadgets. Both of these gates send Z basis states to Z basis states (up to a phase), so they can’t possibly be universal. In order to get a universal family of circuits, we need to include a gate which can produce superpositions of Z basis states, like the Hadamard gate. Unfortunately, Hadamard gates break our nice phase polynomial representation. However, we can salvage things somewhat if we are willing to introduce some extra variables that don’t appear in the input state. Let’s start by re-visiting the Euler decomposition of a Hadamard gate.

Back in Section 3.2.5, we saw that there are in fact several equivalent ways to write the Hadamard. One way is to unfuse the middle π/2 phase from the X spider and change its colour:

This gives us a phase gadget connecting the input and the output spiders. However, unlike the phase gadgets we’ve met so far, this one is not appearing as part of a diagonal unitary gate like the one depicted in (7.11). That is, it doesn’t seem to be connected to different qubit wires, which can then be labelled by the variables in our phase polynomials. We can fix this by unfusing some spiders:

The result is a representation of the Hadamard gate where we first prepare an ancilla in the |+⟩ state, then do a diagonal 2-qubit unitary, and then post-select the input qubit onto ⟨+|. We already know how to write the middle part in terms of a phase polynomial:

Note that we didn't suppress the 1/√2 factor coming from the phase gadget (as we did before), so technically this is only unitary up to a scalar. This will help us get the Hadamard gate on-the-nose. The Z-spider plugged into the input means we should sum over y:

whereas the Z-spider plugging into the output means that we should send the computational basis state |x⟩ to 1 (i.e. delete it), regardless of whether x = 0 or x = 1:

Putting these pieces together, we see that we can write the action of a Hadamard gate as something much like the phase polynomial form from before, but now with an extra variable y getting summed over:

(7.18)

The extra variable y is called a path variable and the overall expression is called a sum-over-paths or path sum representation of the unitary. The expression we got in (7.18) may not look much like the definition of a Hadamard gate as we know it, so let’s check that this indeed gives us the right thing. First, recall that a Hadamard acts like this on computational basis states:

We can summarise this as something that looks like a path sum as follows:

(7.19)

That is, we pick up a −1 = e^{iπ} coefficient when the input variable x = 1 AND the summed variable y = 1. However, we don't (yet) know how to introduce the AND of two variables into a phase polynomial. We only know how to introduce XORs using phase gadgets. Thankfully, for pseudo-boolean functions, i.e. functions from bitstrings to the real numbers, these two logical operations are related by the following equation:

x·y = ½ x + ½ y − ½ (x ⊕ y)
(7.20)

This equation is easy to check just by trying all values of x, y ∈ {0,1}. There is a deeper reason why this (and many related) equations hold, related to the Fourier transform of a pseudo-boolean function. We'll see more about that in Chapter 9. For now, we can see that this is exactly what we need to show that the two expressions for the Hadamard gate, equations (7.18) and (7.19), agree. This trick of replacing a Hadamard gate with phase gadgets will work for any number of Hadamard gates in a circuit. Hence, for any circuit written in the Clifford+phase gate set, we can transform it into a phase polynomial circuit, at the cost of introducing some ancillae and post-selections:

(7.21)

The procedure for computing the path sum expression from a circuit with CNOT, phase, and Hadamard gates can therefore be derived from the simpler one used for CNOT+phase circuits. We start labelling wires with parities as before, but as soon as a Hadamard gate is encountered, we replace the label on that wire with a new, fresh path variable. We then add 3 new terms to the phase polynomial and sum over the path variable.

Example 7.1.1. Applying this procedure to the circuit of Eq. (7.21), we get the path sum

|x, y, z⟩ ↦ ½ Σ_{h₁,h₂} e^{i(αx + βh₁ + γ(h₁⊕z) + ϕ_H(x⊕y, h₁) + ϕ_H(h₁, h₂))} |x, h₂, h₁⊕z⟩

where ϕ_H(a,b) = (π/2)(a + b − a⊕b) is the phase polynomial that is added by a Hadamard.
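We can confirm numerically both the identity (7.20) and that the path sum built from ϕ_H really does reproduce the Hadamard:

```python
import numpy as np

# Eq. (7.20): x*y = x/2 + y/2 - (x XOR y)/2 for all bits x, y
for x in (0, 1):
    for y in (0, 1):
        assert x * y == x / 2 + y / 2 - (x ^ y) / 2

# Path sum for the Hadamard with phi_H(x, y) = (pi/2)(x + y - (x XOR y)):
# H|x> = (1/sqrt 2) sum_y e^{i phi_H(x, y)} |y>
H_path = np.zeros((2, 2), dtype=complex)
for x in (0, 1):
    for y in (0, 1):
        H_path[y, x] = np.exp(1j * (np.pi / 2) * (x + y - (x ^ y))) / np.sqrt(2)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
assert np.allclose(H_path, H)
print("path sum reproduces the Hadamard")
```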

Remark 7.1.2. The physicists among you might be thinking: “hmm, a sum over paths you say, that sounds familiar.” And indeed they would be right. The path sums we are using here are exactly a discretised version of the Feynman path integrals that feature so prominently in quantum mechanics. Just like how we can view a quantum particle as evolving over a superposition of paths that interfere together, so we can view a quantum computation as evolving a computational basis state |x through a superposition of paths towards the final outcome.

Exercise 7.3. We know of course that two Hadamards in a row form the identity.

a)

Write down the path sum of a single-qubit circuit containing two Hadamards.

b)

Figure out how you could prove from this expression that this is equal to the identity.

7.2 Circuit synthesis and path sums

In the previous section we've seen how to convert a circuit into a phase polynomial or path sum expression. Now we will see how to get a circuit back out again. First of all, why would we want to do this? The obvious first application is circuit optimisation. Namely, we can start with a circuit, compute its path sum expression, then re-synthesise a (hopefully smaller!) circuit that implements the same unitary. We have already met this simplify-and-extract methodology with CNOT circuits in Section 4.2, and we'll meet it again a couple more times in this book. We saw an instance of it already in passing from circuit (7.4) to the smaller circuit (7.6). That was a simple example, so we were able to find a smaller circuit implementing the same phase polynomial in an ad hoc way. In Section 7.2.1 we will explore some algorithms for this which work for any phase polynomial. For general path sum expressions there is no known efficient algorithm for turning them back into a circuit. In Section 7.2.2, however, we will consider a specific example where we can make it work. This brings us to the second application for constructing circuits from these expressions: it allows us to build new circuits from a higher-level description of the circuit behaviour. In Chapter 9, we'll look at this in more detail and see how, thanks to a bit of boolean Fourier theory, we can build circuits in the Clifford+phase gate set implementing arbitrary classical functions on quantum data, using phase polynomials and the circuit synthesis techniques from this section.

7.2.1 Synthesis from phase polynomials

Just like how we were able to commandeer the phase-free simplification algorithm from Chapter 4 to simplify CNOT+phase circuits, we can (essentially) repurpose the phase-free extraction algorithm for the parity form of a ZX-diagram to extract CNOT+phase circuits from the phase gadget form. Recall that a ZX-diagram in parity form looks like a row of Z-spiders connected to a row of X-spiders, e.g.

From this form, we can associate a biadjacency matrix:

( 1 1 0 )
( 1 0 1 )
( 0 0 1 )

and performing Gauss-Jordan reduction of this matrix gives us a CNOT decomposition, where each CNOT operation corresponds to a primitive column operation:

Now, suppose we generalise from the parity form to the phase gadget form, i.e. we add some phase gadgets connected to the input Z-spiders:

(7.22)

This describes a unitary with phase polynomial expression:

U :: |x₁, x₂, x₃⟩ ↦ e^{i[α(x₁⊕x₂) + β(x₂⊕x₃)]} |x₁⊕x₂, x₁⊕x₃, x₃⟩
(7.23)

The first thing we realise from inspecting (7.22) is that most of the diagram is already in parity form, except for the two phases themselves:

Writing down the biadjacency matrix, we see that we now get a tall, skinny matrix, with some extra rows at the beginning corresponding to phase gadgets:

(7.24)

Just like before, precomposing with CNOTs performs primitive column operations. For example, this:

corresponds to this equation of ZX-diagrams:

But note that the α phase is now attached to a 1-legged phase gadget, i.e. an identity X-spider. So, we can use spider fusion to pull it out:

The reason we could do that is that our primitive column operation turned one of the rows above the line in matrix (7.24) into a unit vector. Since this means that phase gadget is now 1-legged, we can always pull it out to become a Z phase gate. The remaining bit of the diagram in phase gadget form then has a biadjacency matrix obtained by deleting the first row:

We can now create a unit vector in the first row by adding the second column to the third column:

Again, the phase gadget turns into a simple Z phase gate, which we can pull out to the left:

Now that we have no phase gadgets left, there is nothing above the line in our biadjacency matrix, so we finish off by doing normal Gaussian elimination:

This then yields the final circuit:

We can check our work by labelling wires with parities:

Comparing this labelled circuit to the phase polynomial expression in equation (7.23), we see that we have indeed synthesised a correct circuit for this unitary. Hence, the general algorithm for synthesising a circuit from a phase polynomial expression for a unitary U is the following:

1. Form a block matrix over 𝔽₂ whose columns are labelled by the input variables x₁, …, xₙ: the top block has one row for each of the k terms of the phase polynomial, recording the parity of input variables that term depends on, and the bottom block is the parity matrix L.

2. Perform primitive column operations (i.e. CNOT gates) until one of the first k rows is reduced to a unit vector with a 1 in column j.

3. Apply a phase gate to the j-th qubit and delete the unit vector.

4. Repeat from step 2 until the top block of the matrix is empty.

5. Synthesise a CNOT circuit corresponding to the remaining n rows.

Note that step 2 will always succeed: we can reduce any non-zero row to a unit vector using primitive column operations, and since steps 2-4 always remove a row, the algorithm will terminate. However, just as we saw in the CNOT case, the size of the resulting circuit is very sensitive to which column operations are performed and in which order. In fact, here we have two kinds of decisions to make: which rows to eliminate first, and which column operations should be used to eliminate a given row? The answer to these questions will depend on what our goals are. Namely, do we want to choose operations to minimise the depth of the circuit or just the CNOT count? Or maybe it could be beneficial on our architecture to do as many phase gates in parallel as possible. For the remainder of this section, we will discuss some heuristics for achieving these goals. It may also be the case that we cannot apply CNOT gates between arbitrary pairs of qubits, but only nearest neighbours in our architecture. We discuss how to deal with this in the References of this chapter.
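The steps above can be sketched in code. The following is a minimal Python implementation (function names and the naive pivot strategy are ours — it makes no attempt at the CNOT-count or depth optimisations just discussed), verified on the example of Eq. (7.23):

```python
import numpy as np

def f2_solve(A, b):
    """Solve A y = b over F2 for invertible A, by Gauss-Jordan elimination."""
    A = A.copy(); b = b.copy()
    n = A.shape[0]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r, col])
        A[[col, piv]] = A[[piv, col]]; b[[col, piv]] = b[[piv, col]]
        for r in range(n):
            if r != col and A[r, col]:
                A[r] ^= A[col]; b[r] ^= b[col]
    return b

def reduce_to_identity(E):
    """Row-reduce E to I over F2 using only row additions; returns ops (c, t)
    meaning 'row t ^= row c', i.e. a CNOT with control c and target t."""
    E = E.copy(); n = E.shape[0]; ops = []
    for col in range(n):
        if not E[col, col]:
            r = next(r for r in range(col + 1, n) if E[r, col])
            E[col] ^= E[r]; ops.append((r, col))
        for r in range(n):
            if r != col and E[r, col]:
                E[r] ^= E[col]; ops.append((col, r))
    return ops

def synthesise(terms, L):
    """Emit CNOTs until some wire carries the parity of each phase term,
    place the phase gate there, then fix up the overall parity map to L."""
    n = L.shape[1]
    E = np.eye(n, dtype=np.uint8)          # current wire labels (parity rows)
    gates = []
    def cnot(c, t):
        E[t] ^= E[c]
        gates.append(('CNOT', int(c), int(t)))
    for angle, p in terms:
        p = np.array(p, dtype=np.uint8)
        y = f2_solve(E.T.copy(), p)        # which wire labels XOR to p
        wires = np.flatnonzero(y)
        t = wires[-1]
        for c in wires[:-1]:
            cnot(c, t)                     # collect the parity p onto wire t
        gates.append(('Z', int(t), angle))
    for c, t in reduce_to_identity(E):     # bring the labels back to identity...
        cnot(c, t)
    for c, t in reversed(reduce_to_identity(L.copy())):
        cnot(c, t)                         # ...and then up to the target map L
    return gates

# The example of Eq. (7.23): phi(x) = alpha(x1+x2) + beta(x2+x3) (XORs)
alpha, beta = 0.4, 1.3
terms = [(alpha, [1, 1, 0]), (beta, [0, 1, 1])]
L = np.array([[1, 1, 0], [1, 0, 1], [0, 0, 1]], dtype=np.uint8)
gates = synthesise(terms, L)

# Verify: running the gate list on |x> yields e^{i phi(x)} |Lx>
for i in range(8):
    x = [(i >> (2 - q)) & 1 for q in range(3)]
    bits, phase = list(x), 0.0
    for g in gates:
        if g[0] == 'CNOT':
            bits[g[2]] ^= bits[g[1]]
        else:
            phase += g[2] * bits[g[1]]
    assert bits == [int(v) for v in (L @ x) % 2]
    assert np.isclose(phase, sum(a * (np.dot(p, x) % 2) for a, p in terms))
print("synthesised gate list:", gates)
```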

7.2.2 Quantum Fourier transform

While it is possible to efficiently extract a circuit from a phase polynomial description, doing the same from a path-sum description is still an open problem. For certain special cases we can try to make it work just by using raw brainpower. In this section we will find a circuit for the quantum Fourier transform based on its description as a path sum. Recall that this unitary is, for instance, the magic trick that makes Shor's algorithm work. In general, for a d-dimensional space ℂᵈ we define the Fourier transform as follows. Let |0⟩, …, |d−1⟩ be the standard basis states. Then

QFT :: |x⟩ ↦ (1/√d) Σ_{y=0}^{d−1} e^{(2πi/d) xy} |y⟩.

For instance, when d = 2 we get |x⟩ ↦ (1/√2) Σ_y e^{iπxy} |y⟩, which is exactly the Hadamard. We will be interested in the case where d = 2ⁿ, so that the space corresponds to n qubits. We should then read the expression xy as multiplying the numbers x, y ∈ {0, …, 2ⁿ−1}, not as taking the inner product of two bit strings. But since we do like thinking about bit strings, let's define a translation. For a bit string x ∈ 𝔽₂ⁿ, define b(x) := 2^{n−1}x₁ + 2^{n−2}x₂ + ⋯ + 2⁰xₙ. Then we can write the n-qubit QFT as follows:

|x⟩ ↦ (1/√2ⁿ) Σ_{y∈𝔽₂ⁿ} e^{(2πi/2ⁿ) b(x)b(y)} |y⟩.
(7.25)
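As a quick sanity check, the following sketch builds the QFT matrix directly from this formula and confirms that d = 2 gives the Hadamard and that the result is unitary:

```python
import numpy as np

def qft(d):
    """QFT :: |x> -> (1/sqrt d) sum_y e^{2 pi i x y / d} |y>, as a matrix."""
    x = np.arange(d)
    return np.exp(2j * np.pi * np.outer(x, x) / d) / np.sqrt(d)

# d = 2 recovers the Hadamard
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
assert np.allclose(qft(2), H)

# The QFT is unitary for any d
F = qft(8)
assert np.allclose(F.conj().T @ F, np.eye(8))
print("QFT formula checked")
```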

Exercise 7.4. In this exercise we will use the path-sum formalism to find a circuit implementing the n-qubit QFT.

1.

To compile the QFT unitary to a quantum circuit, we first need to know how to compile controlled phase gates. These are unitaries that act like |x₁, x₂⟩ ↦ e^{iα x₁x₂} |x₁, x₂⟩. Using phase polynomials and the identity x·y = ½(x + y − x⊕y), show that the following circuit implements a controlled phase gate:

(7.26)
2.

For n = 2, expand the definition of b in Eq. (7.25) to write the exponential in the path sum as a product of terms of the form e^{iθ xₖyₗ}. Argue that the x₁y₁ term can be removed without changing the resulting map.

3.

Using the path-sum approach, show that the following circuit implements the 2-qubit QFT, by verifying that it is equal to the expression you derived above:

(7.27)

Note that you might need to ‘rename’ the variables y you are getting out of an application of a Hadamard gate to make sure it matches the expression you had above.

4.

Now for n = 3, again expand the definition of b in Eq. (7.25) to write the exponential as a product of terms of the form e^{iθ xₖyₗ}, and remove the terms that you can. Construct a circuit to implement the 3-qubit QFT.

5.

Give a recipe for constructing the n-qubit QFT for any n.

7.3 Pauli exponentials

Now we will switch to a very different way of looking at universal quantum circuits: not as a series of logic gates that create branches of computational basis states and evolve them with phase polynomials, but as a series of rotations about Pauli strings.

7.3.1 Unitaries from Pauli boxes

In Chapter 6, we introduced Pauli boxes, which gave us a way to wrap up the pair of projectors coming from a Pauli measurement into a single object. We also saw in Exercise 6.8 that these are a special case of measure boxes, which can represent an arbitrary (2-outcome) measurement using a single map satisfying the following properties:

These three conditions are equivalent to the condition that the following forms a measurement:

So, we get a measurement when plugging X spiders into the control wire. Here's a (seemingly) random question: what happens if we plug in Z spiders? To find out, let's see what happens when we form a linear map L as follows:

Thanks to the third property of measure boxes, we have:

Using the other two properties:

and similarly, LL† = I. So L is a unitary!

Proposition 7.3.1. For any self-adjoint Pauli string P, the following map:

(7.28)

is a unitary.

Proof. Follows immediately from the fact that Pauli boxes are measurement boxes, as shown in Exercise 6.8. □

Let’s expand out what is in the Pauli box to see what these unitaries from Proposition 7.3.1 actually look like. Recall from Definition 6.2.3 that

where

Hence, for the special case of P = Z ⊗ Z we get:

(7.29)

We get phase gadgets! For a general unitary built from Pauli boxes, we can adapt Eq. (6.8): for any Pauli string P = P₁ ⊗ ⋯ ⊗ Pₙ with Pᵢ ∈ {X, Y, Z}, we have:

(7.30)

The general case can be obtained by additionally representing any Pi = I by simply not connecting to the i-th qubit. The unitaries built from Pauli boxes in this way hence generalise phase gadgets: they are just phase gadgets surrounded by some particular single-qubit Cliffords. We will call such unitaries Pauli exponentials. As the name suggests, we can view these unitaries as special cases of matrix exponentials, which we’ll introduce in the next section.
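To preview this connection concretely, here is a numerical check (for the case P = Z ⊗ Z, with an arbitrarily chosen angle) that the matrix exponential e^{iθP} is, up to a global phase, exactly a diagonal phase-gadget unitary:

```python
import numpy as np

theta = 0.6
Z = np.diag([1.0, -1.0])
ZZ = np.kron(Z, Z)

# ZZ is diagonal, so e^{i theta Z(x)Z} just exponentiates the diagonal entries
expZZ = np.diag(np.exp(1j * theta * np.diag(ZZ)))

# Phase gadget picture: up to the global phase e^{i theta}, this is the
# diagonal unitary |x1,x2> -> e^{-2i theta (x1 XOR x2)} |x1,x2>
gadget = np.diag([np.exp(-2j * theta * (x1 ^ x2))
                  for x1 in (0, 1) for x2 in (0, 1)])
assert np.allclose(expZZ, np.exp(1j * theta) * gadget)
print("e^{i theta Z(x)Z} is a phase gadget up to a global phase")
```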

7.3.2 Matrix exponentials

We first mentioned matrix exponentials back in Section 2.2.3 as a means of constructing solutions to the Schrödinger equation, but avoided giving a formal definition. We'll do that now. At this point, we should be pretty used to raising e to the power of a complex number z. For instance, when z = iα is imaginary, this is how we obtain phases. If we crack open a textbook on complex analysis, we'll see that e^z for any complex number is defined by the following power series:

e^z := Σ_{k=0}^∞ z^k / k!

Swapping out the complex number z for a square matrix A gives us a definition for the matrix exponential:

e^A := Σ_{k=0}^∞ A^k / k!.
(7.31)

Here the powers A^k are just multiplying A with itself k times, and in particular A⁰ = I is the identity matrix. There are a few things we should know about matrix exponentials. The first is that if we are working with a diagonal matrix, then its matrix exponential is just the exponential of each of its components:

e^{diag(a₁,…,aₙ)} = diag(e^{a₁},…,e^{aₙ}).
(7.32)

This works because, when we raise a diagonal matrix to a power k, it just goes inside:

e^{diag(a₁,…,aₙ)} = ∑_{k=0}^∞ diag(a₁,…,aₙ)^k / k! = ∑_{k=0}^∞ diag(a₁^k,…,aₙ^k) / k! = diag(∑_k a₁^k/k!, …, ∑_k aₙ^k/k!) = diag(e^{a₁},…,e^{aₙ})

The second thing we need to know is that conjugation by a unitary commutes with exponentiation:

e^{UAU†} = ∑_{k=0}^∞ (UAU†)^k / k! = ∑_{k=0}^∞ U A^k U† / k! = U e^A U†.
(7.33)

With these facts we can easily compute the matrix exponential of any diagonalisable matrix. Let A be a matrix with diagonalisation A = ∑_j λ_j |ϕ_j⟩⟨ϕ_j| for some ONB {|ϕ_j⟩}. Write U = ∑_j |ϕ_j⟩⟨j| for the unitary that maps the computational basis {|j⟩} into the eigenbasis {|ϕ_j⟩} of A, and write D = ∑_j λ_j |j⟩⟨j| = diag(λ₁,…,λₙ) for the diagonal matrix of eigenvalues of A. Then we have A = U D U†. Hence, we calculate e^A = e^{UDU†} = U e^D U†. As e^D is the exponential of a diagonal matrix, it is easily calculated, and so we get an easy expression for e^A, simply by multiplying out the matrices of its diagonalisation. There is one final property of the matrix exponential we will be interested in. Remember that for regular complex numbers z₁ and z₂ we have e^{z₁+z₂} = e^{z₁}e^{z₂}. The same holds for the matrix exponential as long as the matrices commute. That is, if we have matrices A and B that commute, then e^{A+B} = e^A e^B. When A and B allow diagonalisations, this is easily proven, as we can then find a joint diagonalisation, and reduce the problem to diagonal matrices.
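To make the diagonalisation recipe concrete, here is a small numerical sketch (our own, using numpy; the self-adjoint matrix A is an arbitrary example, not one from the text). It computes e^A both via A = U D U† and via a truncated version of the defining power series (7.31), and checks the two agree:

```python
import numpy as np

# A hypothetical self-adjoint matrix (our example, not from the text).
A = np.array([[1.0, 2.0], [2.0, -1.0]])

# e^A via diagonalisation: A = U D U^dagger  =>  e^A = U e^D U^dagger.
evals, U = np.linalg.eigh(A)          # columns of U are the eigenvectors
expA = U @ np.diag(np.exp(evals)) @ U.conj().T

# e^A via the defining power series (7.31), truncated after 40 terms.
series = np.eye(2)
term = np.eye(2)
for k in range(1, 40):
    term = term @ A / k               # term is now A^k / k!
    series = series + term

assert np.allclose(expA, series)
```

Truncating the series works here because the terms shrink factorially; the eigendecomposition route avoids any truncation at all.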

Proposition 7.3.2. Let A and B be commuting square matrices. Then e^{A+B} = e^A e^B = e^B e^A. In particular I = e^0 = e^{A−A} = e^A e^{−A}, so that (e^A)^{−1} = e^{−A} for any matrix A.

When A and B do not commute, we generally have e^{A+B} ≠ e^A e^B. In fact, studying ways to approximate the value of e^{A+B} using just e^A and e^B is incredibly important for simulating quantum mechanical systems using a quantum computer. We'll study this problem in more detail in Section 7.5.

7.3.3 Building unitaries as exponentials

With this mathematical machinery of matrix exponentials in hand, let's look at how it can help us understand unitary maps. Suppose we have some n-dimensional unitary U. As discussed in Section 2.1.3, a unitary is a normal linear map and hence can be diagonalised: U = ∑_j λ_j |ϕ_j⟩⟨ϕ_j|. Using the description above we can then also write this as U = V D V† where V is some unitary and D is some diagonal matrix consisting of the eigenvalues of U. Now remember that all the eigenvalues of a unitary are phases, so that λ_j = e^{iα_j} for some phases α_j. This means that D is a matrix of exponentials. In the previous section we saw that the exponential of a diagonal matrix is a matrix of exponentials. The converse is however also true. We can hence write D = e^{iD′} where D′ = diag(α₁,…,αₙ). We have here taken the factor of i out of the matrix for reasons that will become clear soon enough. So now we have U = V e^{iD′} V†. We can take this conjugation by V inside of the matrix exponential to get U = e^{iH} where H = V D′ V†. Now the eigenvalues of H are precisely the values in D′, which are real numbers. As we saw in Exercise 2.4, a matrix whose eigenvalues are real numbers is self-adjoint. We hence have proven the following result.

Proposition 7.3.3. Any unitary U is the complex exponential of some self-adjoint matrix H via U = e^{iH}.

Note that H is not unique: we can add 2πI to it and still get the same result: e^{i(H+2πI)} = e^{iH} e^{2πiI} = e^{iH}. Here in the last step we used e^{2πi} = 1. In fact, adding any multiple of the identity to H just changes the global phase of U, and so we don't care about it. The converse of Proposition 7.3.3 is also true: if we take some self-adjoint matrix H, then e^{iH} is unitary (this follows from Exercise 2.4, as the eigenvalues of e^{iH} will all be complex phases). This gives us a way to construct unitaries. In fact, given a self-adjoint H, it will be useful to consider the entire family of unitaries constructed by rescaling H: for any t ∈ ℝ we have that e^{itH} is unitary. We use the letter t here, as this gives the elapsed time in the Schrödinger equation.

Example 7.3.4. Let Z = diag(1,−1) be the standard Pauli Z. Then for any t we calculate e^{−itZ} = diag(e^{−it}, e^{it}) = e^{−it} diag(1, e^{2it}). So we see that

e^{−itZ} ∝ Z(2t).
(7.34)

In particular, the S and T gates can be written (up to global phase) as S ∝ e^{−i(π/4)Z} and T ∝ e^{−i(π/8)Z}. It is for this reason that in some works the T gate is also called the π/8 phase gate.
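These identities are easy to sanity-check numerically. The sketch below (our own, in numpy; the helpers exp_i and equal_up_to_phase are ours, not the book's) verifies that e^{−i(π/4)Z} and e^{−i(π/8)Z} agree with S and T up to a global phase:

```python
import numpy as np

Z = np.diag([1.0 + 0j, -1.0])
S = np.diag([1.0, 1j])                      # the phase gate Z(pi/2)
T = np.diag([1.0, np.exp(1j * np.pi / 4)])  # the phase gate Z(pi/4)

def exp_i(H, t):
    """e^{itH} for self-adjoint H, via diagonalisation (Section 7.3.2)."""
    evals, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(1j * t * evals)) @ V.conj().T

def equal_up_to_phase(A, B):
    """True if A = e^{i phi} B for some global phase phi."""
    idx = np.unravel_index(np.argmax(np.abs(B)), B.shape)
    phase = A[idx] / B[idx]
    return np.isclose(abs(phase), 1.0) and np.allclose(A, phase * B)

assert equal_up_to_phase(exp_i(Z, -np.pi / 4), S)
assert equal_up_to_phase(exp_i(Z, -np.pi / 8), T)
```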

Example 7.3.5. For the Pauli X gate we can easily calculate its matrix exponential by realising that X = HZH (of course H† = H, but it will be useful to write it like this) and using Eq. (7.33): e^{−itX} = e^{−itHZH} = H e^{−itZ} H ∝ H Z(2t) H = X(2t), where we used Eq. (7.34) to convert the Z exponential into a Z phase gate. Hence, the exponentiated X matrices correspond to X phase gates. It is interesting however to see how you would calculate this directly using the definition of the matrix exponential (7.31). The trick is to observe that X² = I, which allows us to split the infinite sum into its even and odd parts:

e^{itX} = ∑_{k=0}^∞ (it)^k X^k / k! = (∑_{k even} (it)^k / k!) I + (∑_{k odd} (it)^k / k!) X = cos(t) I + i sin(t) X.

Here in the last step we used the standard Taylor series for sin and cos.

In this direct calculation of the matrix exponential of the Pauli X we actually only used that X² = I, and nothing else, so this calculation works for any matrix with this property. As this is actually quite useful, let's record this for later.

Proposition 7.3.6. Let M be any matrix for which M² = I. Then e^{itM} = cos(t) I + i sin(t) M.
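A quick numerical illustration of Proposition 7.3.6 (a sketch of ours in numpy; the Pauli string X ⊗ Y is just an arbitrary choice of matrix satisfying M² = I):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
M = np.kron(X, Y)        # a Pauli string, so M @ M is the identity
t = 0.7                  # an arbitrary test angle

assert np.allclose(M @ M, np.eye(4))

# Left-hand side: e^{itM} computed via diagonalisation (Section 7.3.2).
evals, V = np.linalg.eigh(M)
lhs = V @ np.diag(np.exp(1j * t * evals)) @ V.conj().T

# Right-hand side: the closed form cos(t) I + i sin(t) M.
rhs = np.cos(t) * np.eye(4) + 1j * np.sin(t) * M

assert np.allclose(lhs, rhs)
```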

7.3.4 Pauli exponentials

Now that we can understand any unitary as a matrix exponential, let's zoom in a bit and take a look at Pauli exponentials. Examples 7.3.4 and 7.3.5 show that we can view the Z and X phase gates as exponentials of Pauli matrices. It turns out that this holds for essentially all unitaries if we do the same with multi-qubit Pauli strings. A Pauli string P is just a tensor product of single-qubit Paulis P = e^{iα} P₁ ⊗ ⋯ ⊗ Pₙ where Pᵢ ∈ {I,X,Y,Z} and e^{iα} is some global phase. Since we want to take exponentials of these to construct unitaries, we want P to be self-adjoint, which requires e^{iα} ∈ {1,−1}.

Definition 7.3.7. A Pauli exponential is any unitary of the form:

e^{±iθ P₁⊗⋯⊗Pₙ}

where the Pⱼ ∈ {I,X,Y,Z} are Paulis and θ is a phase.

Here we write 𝜃 instead of t, because we will be thinking of this as an angle in a rotation, instead of an elapsed time. Using Proposition 7.3.6 we calculate:

e^{±iθ P₁⊗⋯⊗Pₙ} = cos(θ) I ⊗ ⋯ ⊗ I ± i sin(θ) P₁ ⊗ ⋯ ⊗ Pₙ.
(7.35)

Using this, we see that an identity in a Pauli string makes the Pauli exponential act as an identity on that qubit:

e^{±iθ(I⊗P)} = cos(θ) I ⊗ I ± i sin(θ) I ⊗ P = I ⊗ (cos(θ) I ± i sin(θ) P) = I ⊗ e^{±iθP}.
(7.36)

We can use this to calculate a circuit that implements any Pauli exponential using gates we have already encountered. Before we do that in full generality it will be useful to consider a specific example.

Example 7.3.8. Let's calculate what unitary e^{−iθZ⊗Z} is. First, let's do it directly. Note that Z ⊗ Z = diag(1,−1,−1,1). Then using Proposition 7.3.6 we get

e^{−iθZ⊗Z} = cos(θ) I ⊗ I − i sin(θ) Z ⊗ Z = cos(θ) diag(1,1,1,1) − i sin(θ) diag(1,−1,−1,1) = diag(e^{−iθ}, e^{iθ}, e^{iθ}, e^{−iθ}).

Here in the last step we used e^{±iθ} = cos(θ) ± i sin(θ). Now, let's calculate this in a more systematic way. Note that Z ⊗ Z = CNOT(I ⊗ Z)CNOT (see Section 6.1.1). Hence:

e^{−iθZ⊗Z} = e^{−iθ·CNOT(I⊗Z)CNOT} = CNOT e^{−iθ(I⊗Z)} CNOT (by Eq. (7.33)) = CNOT(I ⊗ e^{−iθZ})CNOT (by Eq. (7.36)) ∝ CNOT(I ⊗ Z(2θ))CNOT (by Eq. (7.34)) = diag(1, e^{2iθ}, e^{2iθ}, 1).

This last matrix is indeed equal to what we got before, up to global phase.
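The two routes of Example 7.3.8 can be checked against each other numerically. The following sketch (our own, in numpy) confirms that the direct formula and the CNOT-conjugation route agree up to the global phase e^{−iθ}:

```python
import numpy as np

theta = 0.3                               # an arbitrary test angle
Z = np.diag([1.0 + 0j, -1.0])
ZZ = np.kron(Z, Z)                        # diag(1, -1, -1, 1)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

# Direct route: Proposition 7.3.6 applied to the exponential of Z (x) Z.
direct = np.cos(theta) * np.eye(4) - 1j * np.sin(theta) * ZZ

# Systematic route: conjugate I (x) Z(2 theta) by CNOTs.
Zphase = np.diag([1.0, np.exp(2j * theta)])    # the phase gate Z(2 theta)
via_cnot = CNOT @ np.kron(np.eye(2), Zphase) @ CNOT

# The two agree up to the global phase e^{-i theta}.
assert np.allclose(direct, np.exp(-1j * theta) * via_cnot)
```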

The expression we got in this example, CNOT(I ⊗ Z(2θ))CNOT, consists only of gates we have already encountered, and that we can in particular write as a ZX-diagram:

It is also a phase gadget! This in fact remains true if we add more qubits, by realising that a CNOT ladder conjugates a Z on the last qubit into Z ⊗ ⋯ ⊗ Z:

(7.37)

By Exercise 7.2 such CNOT ladders with a phase in the middle are phase gadgets. As the unitaries we constructed from Pauli boxes also were phase gadgets, we see that we can now relate them:

(7.38)

For a more general Pauli string P = P₁ ⊗ ⋯ ⊗ Pₙ with Pᵢ ∈ {X,Y,Z} we can use the fact that the Uᵢ from Eq. (7.30) satisfy Pᵢ = Uᵢ Z Uᵢ† to write any Pauli exponential as a unitary from a Pauli box: setting U = U₁ ⊗ ⋯ ⊗ Uₙ we have P = U Z^{⊗n} U†, and hence

Hence:

(7.39)

It will be helpful to introduce some shorthand for e^{iθP}. For Z and X phase gates we have been writing Z(α) for a phase rotation over α. We have also seen that e^{−iθZ} ∝ Z(2θ), or in other words that Z(α) ∝ e^{−i(α/2)Z}. This suggests the following notation for any Pauli string P:

P(α) := e^{iα/2} e^{−i(α/2)P}.
(7.40)

Here we have included a global phase for good measure, so that Z(α) and X(α) match exactly the definition of those phase gates that we were using before. For a multi-qubit Pauli string we will again adopt the shorthand of not writing the tensor product symbol ⊗. So we will for instance write ZZ(α) for the unitary e^{iα/2} e^{−i(α/2) Z⊗Z}. Note that replacing P by −P in a Pauli exponential is equivalent to flipping the sign of the phase of the exponential: e^{iθ(−P)} = e^{−iθP}. Hence, in the definition Eq. (7.40) we get [(−1)ᵏP](α) ∝ P((−1)ᵏα). For the future let's record some useful facts now that we have this correspondence between Pauli exponentials and Pauli boxes. First, we can represent any Pauli exponential using just a small number of standard gates.

Proposition 7.3.9. Let P be any n-qubit Pauli string. Then we can represent e^{iθP} as a circuit of at most 4n − 1 gates of type CNOT, H, X(π/2), X(−π/2), and Z(2θ).

Second, we see that a Pauli exponential is Clifford iff its phase is Clifford.

Proposition 7.3.10. A Pauli exponential P(α) for a non-trivial P is Clifford iff α = kπ/2 for some k ∈ ℤ.

The condition on ‘non-trivial’ here is necessary since the trivial Pauli exponential e^{iθI} is of course just the identity gate up to global phase for any θ.

7.4 Pauli exponential compilation

Now that we know a bit about Pauli exponentials, let’s see how they help us think about quantum circuits and compilation.

7.4.1 Pauli exponentials are a universal gate set

We have already seen that Z and X phase gates can be represented as Pauli exponentials. This means in particular that any single-qubit unitary can be represented as a composition of Pauli exponentials. We know that single-qubit unitaries together with the CNOT gate form a universal gate set, so if we can show that the CNOT gate can be built out of Pauli exponentials, then we know that all unitaries are just compositions of Pauli exponentials. It will in fact be easier to look at the CZ gate. Remember that the CNOT is just a CZ conjugated by Hadamards on the target qubit:

As a Hadamard is just a series of Z and X phase gates, which are Pauli exponentials, it then indeed remains to look at the CZ.

Lemma 7.4.1. The CZ gate is a composition of Pauli exponentials:

Proof. For the ZX-diagram equality use Eq. (3.84) of Exercise 3.15, spider fusion (sp) and then the Euler decomposition (eu) of the Hadamard. That this diagram is indeed the sequence of Pauli exponentials follows because a phase gadget is just a ZZ Pauli exponential. □

Theorem 7.4.2. Pauli exponentials form a universal gate set. Or in other words, any n-qubit unitary can be written as a composition of Pauli exponentials.

Proof. Z and X phase gates are Pauli exponentials (see Examples 7.3.4 and 7.3.5). These gates generate all single-qubit unitaries. Combining this with the CZ gate gives a universal gate set, and this CZ can also be written as a Pauli exponential by the previous lemma. □

Remark* 7.4.3. There is a more ‘abstract nonsense’ argument for why you should be able to construct any unitary by composing Pauli exponentials. The n-qubit unitaries form a Lie group, a group that is also a differentiable manifold in an appropriate way. When we have a continuous family of unitaries we can take its derivative to get an associated matrix, which in this case will be a self-adjoint one. This is in fact the matrix exponentiation we have been looking at: given a self-adjoint matrix H, we get a family of unitaries via e^{itH}, and taking the derivative of the function t ↦ e^{itH} gives you back H (or well, actually iH, but this is just the physicist convention; mathematicians please don't get angry at us). All of this is to say that the self-adjoint matrices form the Lie algebra of the group of unitaries. Now, there is a result (see the References of the chapter) that says that if we have a basis H₁,…,H_k of the Lie algebra, then the associated families of Lie group elements e^{itH₁},…,e^{itH_k} generate the entire (connected part of the) Lie group, meaning that we can write any Lie group element as a finite combination of elements from these families. It just so happens that the Paulis form a basis of the n-qubit self-adjoint matrices, and hence the Pauli exponentials generate the n-qubit unitaries!

7.4.2 Compiling to Pauli exponentials

Theorem 7.4.2 gives one way to write a quantum circuit as a series of Pauli exponentials, but if we are interested in doing this with a small number of these, it doesn’t do a very good job. There is in fact a more interesting way to transform a circuit into Pauli exponentials, which also has practical relevance in quantum compilation (see the References for this chapter). To do this we need to use what we learned in the previous chapter. There we saw that pushing a Clifford unitary U through a Pauli P gives you another Pauli: PU = UQ for some Q. Now, in Eq. (7.33) we saw that conjugating a matrix exponential is the same thing as exponentiating the conjugated matrix. This can be recast as a way to commute a unitary through a matrix exponential:

(7.41)

So if our matrix A is actually a Pauli P, and our unitary U is a Clifford, then we can push the Clifford through the Pauli exponential to get another Pauli exponential:

e^{iθP} U = U e^{iθ U†PU} = U e^{iθQ}
(7.42)
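Eq. (7.42) is easy to test numerically. In the sketch below (our own, in numpy) we take U = CNOT and P = I ⊗ Z, mirroring the upcoming Example 7.4.4, and check that pushing the Clifford through the exponential just conjugates the Pauli string:

```python
import numpy as np

theta = 0.4
Z = np.diag([1.0 + 0j, -1.0])
P = np.kron(np.eye(2), Z)                 # Z on the second qubit
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def exp_i(H, t):
    """e^{itH} for self-adjoint H, via diagonalisation (Section 7.3.2)."""
    evals, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(1j * t * evals)) @ V.conj().T

# Conjugating the Pauli: here U^dagger P U turns I (x) Z into Z (x) Z.
Q = CNOT.conj().T @ P @ CNOT
assert np.allclose(Q, np.kron(Z, Z))

# Eq. (7.42): e^{i theta P} U = U e^{i theta Q}.
assert np.allclose(exp_i(P, theta) @ CNOT, CNOT @ exp_i(Q, theta))
```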

Example 7.4.4. Suppose our Pauli exponential is just e^{iθZ₂}, i.e. a Z on the second qubit, and that we want to push a CNOT through it. In terms of ZX-diagrams this becomes:

How does Eq. (7.42) help us? Well, it helps us if we put on a different pair of glasses and view our circuit not as being given by X and Z phase gates and CZ/CNOT gates, but instead view it as consisting of Clifford gates and non-Clifford gates. With this perspective we consider CZ, CNOT, Hadamard, S and X(π/2) gates to all be of the same sort, namely Clifford, while the other class of gates contains any Z(α) and X(α) where α is not a multiple of π/2. These non-Clifford phase gates we can then view as Pauli exponentials (which are only acting on a single qubit). So wearing these glasses we see the circuit as a series of Clifford gates and Pauli exponentials, where each Pauli exponential corresponds to a non-Clifford phase gate:

U = U₁ e^{iθ₁P₁} U₂ e^{iθ₂P₂} ⋯ U_k e^{iθ_kP_k} U_{k+1}.
(7.43)

Here each of the Clifford unitaries U_j can consist of multiple basic Clifford gates like CNOT, S and H. Now let's pick the first non-Clifford Pauli exponential in this circuit, e^{iθ₁P₁}, and push it through U₁ using Eq. (7.42) to get:

e^{iθ₁Q₁} U₁U₂ e^{iθ₂P₂} ⋯ U_k e^{iθ_kP_k} U_{k+1}.

Here Q₁ = U₁ P₁ U₁†. We then push e^{iθ₂P₂} through both U₂ and U₁ and so on, until finally we get the circuit:

e^{iθ₁Q₁} e^{iθ₂Q₂} ⋯ e^{iθ_kQ_k} U₁U₂⋯U_{k+1}.
(7.44)

The circuit now has a very different structure! We see that each non-Clifford phase gate in the original circuit, which we saw as single-qubit Pauli exponentials, is transformed into some other Pauli exponential, which can now act on any number of qubits. All the Clifford gates of the original circuit are still present here, but are now pushed all the way to the end. In Chapter 5 we saw how to optimise Clifford circuits, and in particular that those circuits can be reduced to a normal form whose maximal size only depends on the number of qubits (Proposition 5.3.12). This structure is useful for a variety of reasons that we’ll explore in this chapter, so let’s give a name to it.
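The pushing procedure of Eqs. (7.43) and (7.44) can be illustrated on a toy two-qubit circuit. The sketch below (our own, in numpy; the particular Clifford blocks and Pauli strings are arbitrary choices) builds U₁e^{it₁P₁}U₂e^{it₂P₂}U₃ and checks it equals the form e^{it₁Q₁}e^{it₂Q₂}U₁U₂U₃, with each Qⱼ obtained by conjugating Pⱼ through the Cliffords it crosses:

```python
import numpy as np

Z = np.diag([1.0 + 0j, -1.0])
X = np.array([[0, 1], [1, 0]], dtype=complex)
HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
I2 = np.eye(2)

def exp_i(P, t):
    """e^{itP} for self-adjoint P, via diagonalisation."""
    evals, V = np.linalg.eigh(P)
    return V @ np.diag(np.exp(1j * t * evals)) @ V.conj().T

# A toy circuit U1 e^{i t1 P1} U2 e^{i t2 P2} U3 with Clifford blocks Ui.
U1, U2, U3 = CNOT, np.kron(HAD, I2), CNOT
P1, P2 = np.kron(I2, Z), np.kron(X, I2)
t1, t2 = 0.3, 0.8
circuit = U1 @ exp_i(P1, t1) @ U2 @ exp_i(P2, t2) @ U3

# Push both exponentials to the front (Eq. 7.42): each Qj is Pj
# conjugated by all the Clifford blocks it has to cross.
Q1 = U1 @ P1 @ U1.conj().T
Q2 = (U1 @ U2) @ P2 @ (U1 @ U2).conj().T
pe_form = exp_i(Q1, t1) @ exp_i(Q2, t2) @ U1 @ U2 @ U3

assert np.allclose(circuit, pe_form)
```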

Definition 7.4.5. We say a circuit is in Pauli exponential form when it consists of a series of Pauli exponentials followed by a Clifford circuit.

We’ll often shorten this to PE form or say it is a PE circuit.

Proposition 7.4.6. A circuit consisting of Clifford gates and k non-Clifford phase gates can be efficiently transformed into a Pauli exponential form containing k Pauli exponentials.

As a first result, this form gives us a bound on the number of gates we need to write down an arbitrary circuit based on the number of non-Clifford gates it contains.

Proposition 7.4.7. Let U be an n-qubit circuit consisting of Clifford gates and k non-Clifford phase gates. Then it can be written in the {CNOT, H, S, S†, Z(α)} gate set using at most k(4n − 1) + 4n(n + 1) − 1 = O(kn + n²) gates.

Proof. By Proposition 7.4.6 our circuit has k Pauli exponentials, each of which requires at most 4n − 1 gates to write down in our gate set by Proposition 7.3.9, for a total of k(4n − 1). The final Clifford circuit requires an additional 4n(n + 1) − 1 gates (Proposition 5.3.12). □

Note that in most useful quantum computations (those that do calculations that we can't efficiently do classically) we will have k ≫ n, and hence the number of gates in Pauli exponential form scales as O(kn). Let's zoom out a bit. Why did it make sense to view our circuit as consisting of Clifford and non-Clifford gates, instead of using a more fine-grained difference between gates? In Chapter 5 we saw that computation involving just Clifford gates is efficiently classically simulable. The power of computation must then necessarily come from the inclusion of non-Clifford gates. We can then interpret a circuit in Pauli exponential form as first doing the ‘actual’ quantum computation using the Pauli exponentials, which are all non-Clifford, and then doing some final Clifford operations to ‘post-process’ the data in the right way. This analogy of Clifford operations being a type of post-processing becomes explicit when we are working with quantum error correcting codes. As we saw in Chapter 6, Clifford operations can often be done natively in a quantum error correcting code and so are ‘free’ operations. This is in contrast to non-Clifford operations which often require more complicated techniques to do fault-tolerantly.

7.4.3 Phase folding

Okay, so we can limit the number of Pauli exponentials we need to write an arbitrary circuit. But we can do even better! Remember from Proposition 7.3.2 that if we have two commuting matrices A and B, then e^{A+B} = e^A e^B. It will be helpful to think of this going in the other direction as well: if we have commuting matrices, then we can combine adjacent matrix exponentials e^A and e^B into a single matrix exponential. In particular, if we have a circuit in PE form then there will be many Pauli exponentials e^{iθⱼPⱼ} in a row. When we then have two Pauli exponentials e^{iθ_kP_k} and e^{iθ_lP_l} right next to each other such that P_k and P_l commute, we can combine them into a single exponential. This is especially helpful if P_k = P_l =: P. Then we get:

e^{iθ_kP} e^{iθ_lP} = e^{iθ_kP + iθ_lP} = e^{i(θ_k+θ_l)P}

With the notation of Eq. (7.40) we can write this as P(α)P(β) = P(α + β). Using Pauli boxes we can write this as:

(7.45)

However, the Pauli exponentials we want to combine into one might not be next to each other, as there might be other exponentials in the way. But remember also from Proposition 7.3.2 that if A and B commute, then e^A and e^B will commute as well. So if two Pauli strings commute, then the associated exponentiated Paulis will commute as well:

PQ = QP  ⟹  P(α)Q(β) = Q(β)P(α)

So using this we can move exponentials out of the way in order to combine the ones we want.
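Both facts, merging exponentials of the same string and commuting exponentials of commuting strings, are easy to verify numerically. A small sketch of ours in numpy (we work with the raw exponentials e^{iθP} rather than the P(α) notation, so global phases play no role):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0 + 0j, -1.0])
ZZ, XX = np.kron(Z, Z), np.kron(X, X)

def exp_i(P, t):
    """e^{itP} for self-adjoint P, via diagonalisation."""
    evals, V = np.linalg.eigh(P)
    return V @ np.diag(np.exp(1j * t * evals)) @ V.conj().T

a, b, c = 0.2, 0.5, 0.9

# ZZ and XX commute as matrices, so their exponentials commute too...
assert np.allclose(ZZ @ XX, XX @ ZZ)
assert np.allclose(exp_i(ZZ, a) @ exp_i(XX, b), exp_i(XX, b) @ exp_i(ZZ, a))

# ...which lets us slide the XX exponential aside and fold the two ZZ
# phases into one: e^{iaZZ} e^{ibXX} e^{icZZ} = e^{ibXX} e^{i(a+c)ZZ}.
lhs = exp_i(ZZ, a) @ exp_i(XX, b) @ exp_i(ZZ, c)
rhs = exp_i(XX, b) @ exp_i(ZZ, a + c)
assert np.allclose(lhs, rhs)
```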

Example 7.4.8. Consider the following circuit:

XX commutes with ZZ, and hence we can combine the ZZ(α) and ZZ(γ), by moving the XX(β) out of the way. However, we can’t combine these with ZZ(ϕ) as there is an X1(δ) in the way that does not commute with the ZZ phases (in general of course: if δ = 0 then we can ignore that gate, and if δ = π, then it is a Pauli X that we can push through the Pauli exponentials).

All of this suggests the following algorithm for reducing the number of Pauli exponentials needed to write a circuit, which we will call phase folding.

1. Start with a circuit in PE form (Definition 7.4.5).

2. Consider the first (leftmost) Pauli exponential P(α).

3. If there is another P(β) in the circuit, look at all the Pauli exponentials in between P(α) and P(β). If P commutes with all of these, then combine P(α) and P(β) into one P(α + β). Do this check and combination for all Pauli exponentials of P in the circuit.

4. Repeat the procedure for the next Pauli exponential in the circuit.

5. Check if any of the resulting Pauli exponentials is Clifford. If so, push these Clifford ones past the Pauli exponentials as described in Section 7.4.2, resulting in a new circuit in PE form. Repeat the algorithm. If no Clifford Pauli exponentials were created in the previous steps, we are done.
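The folding part of this algorithm can be sketched in code. The following is a minimal illustration of steps 2 to 4 only, under our own toy representation: each exponential is a (Pauli string, angle) pair, and the trailing Clifford circuit and the Clifford re-pushing of step 5 are not modelled. Two Pauli strings commute exactly when they differ (with both letters non-identity) in an even number of positions:

```python
# A minimal sketch of phase folding on symbolic Pauli exponentials.

def commutes(p, q):
    """Pauli strings commute iff they anticommute on an even number of
    positions (positions where both letters are non-I and different)."""
    anti = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return anti % 2 == 0

def phase_fold(exponentials):
    result = []
    for p, angle in exponentials:
        # Scan backwards: fold this phase into the nearest earlier
        # exponential of the same string, provided p commutes with
        # everything in between.
        for i in range(len(result) - 1, -1, -1):
            q, _ = result[i]
            if q == p:
                result[i] = (q, result[i][1] + angle)
                break
            if not commutes(p, q):
                result.append((p, angle))
                break
        else:
            result.append((p, angle))
    return [(p, a) for p, a in result if a != 0]

# The ZZ(0.25) and ZZ(0.5) fold together through the commuting XX(0.5),
# but XI(0.25) blocks the final ZZ(1.0) from joining them.
circuit = [("ZZ", 0.25), ("XX", 0.5), ("ZZ", 0.5), ("XI", 0.25), ("ZZ", 1.0)]
folded = phase_fold(circuit)
assert folded == [("ZZ", 0.75), ("XX", 0.5), ("XI", 0.25), ("ZZ", 1.0)]
```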

7.5 Hamiltonian simulation

Back in Section 2.2.3, we mentioned briefly that the time evolution of a quantum system comes from taking the matrix exponential of a certain operator, called the Hamiltonian, which encodes all of the physical interactions going on. Now, suppose a physicist or an engineer hands us a Hamiltonian, and asks us to simulate the behaviour of the quantum system it represents. Is there a way we can do this on a quantum computer? That is, if we start in a known quantum state |ψ⟩, can we use a quantum computer to efficiently transform this state into its time-evolved state e^{itH}|ψ⟩, then perform some measurements to learn something about it? It turns out, if we just want to approximate e^{itH}|ψ⟩, the answer is yes! One way to do this is to employ the Suzuki-Trotter method, a.k.a. Trotterization. In our case, this will let us synthesise a circuit that approximates e^{itH} using Pauli gadgets. H is a self-adjoint operator on n qubits, so a generic H could take an exponential amount of data to describe. Of course, in that case, we don't have any hope of simulating it efficiently, because even reading H will take exponential time. Thankfully, lots of useful, physically meaningful H's can be written as a linear combination of Pauli strings:

H = ½ ∑_{j=1}^m α_j P_j

where m is polynomial in n. Then, if all the P_j commute, we're laughing, because thanks to Proposition 7.3.2, U(t) := e^{itH} splits up as a bunch of Pauli gadgets:

e^{itH} = e^{i(α_m t/2)P_m + ⋯ + i(α₁ t/2)P₁} = e^{i(tα_m/2)P_m} ⋯ e^{i(tα₁/2)P₁}

i.e.

(7.46)

We already know how to synthesise Pauli gadgets, so we can build a nice Clifford+phase circuit, parametrised by t, that will perform time evolution for a generic input state. Life is good. Unfortunately, if the Pauli strings in H don't commute, we can't use Proposition 7.3.2 to turn a sum in the exponent into a product of matrix exponentials. Nevertheless, we can power on and try to decompose H as a product of Pauli exponentials, but we might introduce some errors along the way. We'll start with some Hamiltonian that is the sum of just two other Hamiltonians: H = H₁ + H₂. In order to decompose its exponential into a composition of exponentials, we would like it to be the case that:

(7.47)

for some ε. Note that here we introduced some new notation for approximate equality by writing an ε on top of the ≈ sign. By definition, this means:

‖e^{it(H₁+H₂)} − e^{itH₁} e^{itH₂}‖ ≤ ε

It turns out that the left-hand side is bounded by t²κ for some κ > 0, and hence to make this smaller than ε we can choose t small enough such that t²κ ≤ ε. To see this is true, and to find this number κ, we will need to introduce some new tools, and to understand these tools, we will first do a warm-up exercise. We will prove that

‖e^{iA} − e^{iB}‖ ≤ ‖A − B‖

for A and B self-adjoint. Then in particular ‖e^{itH₁} − e^{itH₂}‖ ≤ ‖itH₁ − itH₂‖ = |t|·‖H₁ − H₂‖, so that in this case the ‘κ’ is ‖H₁ − H₂‖. Define the matrix-valued function f(t) = e^{itA} e^{i(1−t)B}. Then note that f(0) = e^{iB} and f(1) = e^{iA}. We hence want to calculate the expression f(1) − f(0). Now, for some of you this might bring back very old memories, but we will need to use the Fundamental Theorem of Calculus to help us simplify this. Remember that this says that f(x) − f(y) = ∫_y^x f′(t) dt where f′ is the derivative of the function. This also works for matrix-valued functions, so that we get

‖e^{iA} − e^{iB}‖ = ‖f(1) − f(0)‖ = ‖∫₀¹ f′(t) dt‖ ≤ ∫₀¹ ‖f′(t)‖ dt,

where in the last step we used a version of the triangle inequality for integrals (which is after all just a limit of sums). So let's calculate what f′(t) is, which we will do using the product rule (f₁f₂)′ = f₁′f₂ + f₁f₂′:

f′(t) = (iA) e^{itA} e^{i(1−t)B} + e^{itA} (−iB) e^{i(1−t)B} = i e^{itA} (A − B) e^{i(1−t)B}.

Here in the last step we commuted A past e^{itA} and grouped the terms together. Now, also recall that the norm is unitarily invariant, meaning that ‖UA‖ = ‖AU‖ = ‖A‖ for any unitary U and arbitrary matrix A. As both e^{itA} and e^{i(1−t)B} are unitaries, we can hence simplify the expression of the norm: ‖f′(t)‖ = ‖e^{itA}(A − B)e^{i(1−t)B}‖ = ‖A − B‖. Putting it all together, we see then that:

‖e^{iA} − e^{iB}‖ = ‖f(1) − f(0)‖ ≤ ∫₀¹ ‖f′(t)‖ dt = ∫₀¹ ‖A − B‖ dt = ‖A − B‖.
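The warm-up bound is easy to probe numerically. A hedged sketch of ours in numpy, checking ‖e^{iA} − e^{iB}‖ ≤ ‖A − B‖ in the spectral norm on random self-adjoint samples:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_selfadjoint(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def exp_i(H):
    """e^{iH} for self-adjoint H, via diagonalisation."""
    evals, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(1j * evals)) @ V.conj().T

def spec(M):
    return np.linalg.norm(M, 2)   # spectral norm = largest singular value

# ||e^{iA} - e^{iB}|| <= ||A - B|| on 100 random self-adjoint samples.
for _ in range(100):
    A, B = random_selfadjoint(4), random_selfadjoint(4)
    assert spec(exp_i(A) - exp_i(B)) <= spec(A - B) + 1e-9
```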

Now, for the real deal:

Exercise* 7.5. In this exercise we will show that

‖e^{i(A+B)} − e^{iA} e^{iB}‖ ≤ ½ ‖[A,B]‖,
(7.48)

for self-adjoint A and B, where [A,B] := AB − BA denotes the commutator of A and B. We start by defining the function f(t) = e^{itA} e^{itB} e^{i(1−t)(A+B)}.

1. Using the product rule and commutations of matrix exponentials, show that f′(t) = i e^{itA} [A, e^{itB}] e^{i(1−t)(A+B)}.

2. Show that hence ‖e^{i(A+B)} − e^{iA} e^{iB}‖ ≤ ∫₀¹ ‖[A, e^{itB}]‖ dt.

3. Define another function g(x) = e^{ixB} A e^{−ixB} and show that we can write [A, e^{itB}] = (g(0) − g(t)) e^{itB}, so that ‖[A, e^{itB}]‖ = ‖g(t) − g(0)‖.

4. By writing it as an integral, show that ‖g(t) − g(0)‖ ≤ t‖[A,B]‖.

5. Combine what you have now shown to prove that ‖e^{i(A+B)} − e^{iA} e^{iB}‖ ≤ ∫₀¹ t‖[A,B]‖ dt = ½‖[A,B]‖.

Hence, setting A := tH₁ and B := tH₂ we get

‖e^{it(H₁+H₂)} − e^{itH₁} e^{itH₂}‖ ≤ ½‖[tH₁, tH₂]‖ = ½ t² ‖[H₁,H₂]‖.
(7.49)

Hence, setting κ = ½‖[H₁,H₂]‖ we have what we set out to prove! This basic fact lets us take a Hamiltonian written in terms of Pauli strings, and “peel off” the Pauli exponentials, one-by-one. That is, for H = ½ ∑_k α_k P_k, there exist some constants κ₁,…,κ_m such that:

By triangle inequality, we conclude:

(7.50)

where κ = ∑_k κ_k. So, we have managed to put a bound on our approximation, and hence the error we'll make if we try to simulate H. But is this bound any good? There is no reason κ needs to be small, and in fact usually it won't be.

Exercise 7.6. Give an explicit form of κ in Eq. (7.50) using Eq. (7.48).
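The commutator bound of Eq. (7.48) can likewise be tested numerically. The following sketch of ours (numpy, random self-adjoint matrices of our own choosing) checks it on many samples:

```python
import numpy as np

rng = np.random.default_rng(7)

def random_selfadjoint(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def exp_i(H):
    """e^{iH} for self-adjoint H, via diagonalisation."""
    evals, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(1j * evals)) @ V.conj().T

def spec(M):
    return np.linalg.norm(M, 2)   # spectral norm

# ||e^{i(A+B)} - e^{iA} e^{iB}|| <= (1/2)||[A, B]||   (Eq. 7.48)
for _ in range(100):
    A, B = random_selfadjoint(4), random_selfadjoint(4)
    lhs = spec(exp_i(A + B) - exp_i(A) @ exp_i(B))
    rhs = 0.5 * spec(A @ B - B @ A)
    assert lhs <= rhs + 1e-9
```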

So, our best bet is taking t to be very small. But what if we don't want to take t to be small, because we want to simulate H for a larger amount of time? It turns out there is a nice trick for this: we perform the decomposition (7.50) many times and compose the results. That is, we can take advantage of the fact that

e^{itH} = (e^{i(t/d)H})^d,

then approximate each of the maps e^{i(t/d)H} individually using (7.50). Initially, we might think this is a stupid thing to do, since now we'll be making even more approximations, so maybe things will get worse. The crucial point, however, is that the error in (7.50) depends not on t, but on t². So, even though we have to make this approximation d times, we can still bound the error by d·(t/d)²·κ = (t²/d)·κ. Hence, we get:

(7.51)
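The 1/d behaviour of Eq. (7.51) can be seen directly in a small experiment (our own sketch in numpy; the two-qubit Hamiltonian H₁ + H₂ is an arbitrary non-commuting example, with κ = ½‖[H₁,H₂]‖ as in the two-term bound above):

```python
import numpy as np

Z = np.diag([1.0 + 0j, -1.0])
X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2)

# Two non-commuting terms of a toy two-qubit Hamiltonian H = H1 + H2.
H1 = np.kron(Z, Z)
H2 = np.kron(X, I2) + np.kron(I2, X)

def exp_i(H, t):
    """e^{itH} for self-adjoint H, via diagonalisation."""
    evals, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(1j * t * evals)) @ V.conj().T

t = 1.0
exact = exp_i(H1 + H2, t)

def trotter(d):
    """d repetitions of the single step e^{i(t/d)H1} e^{i(t/d)H2}."""
    step = exp_i(H1, t / d) @ exp_i(H2, t / d)
    return np.linalg.matrix_power(step, d)

errors = [np.linalg.norm(exact - trotter(d), 2) for d in (1, 10, 100)]

# The error shrinks roughly like 1/d and stays below (t^2/d) * kappa.
kappa = 0.5 * np.linalg.norm(H1 @ H2 - H2 @ H1, 2)
assert errors[0] > errors[1] > errors[2]
assert errors[2] <= (t**2 / 100) * kappa + 1e-9
```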

Supposing we wish to approximate within some error ε, we should choose d such that (t²/d)·κ ≤ ε. When t = 1, we should therefore choose some d ≥ κ/ε. So, we now know how to build a quantum circuit C that approximates e^{itH} up to some parameter ε. We can use C to do a pretty good job at predicting how the real system behaves. That is, for a fixed input state |ψ⟩ and any measurement outcome b, the probabilities of seeing b on the “real” system described by H and the “simulated” system described by C are pretty close. To see this, we'll compare the two probabilities:

Prob_real(b) = |r|²,  Prob_sim(b) = |s|²

for amplitudes r := ⟨b|e^{itH}|ψ⟩ and s := ⟨b|C|ψ⟩. First, since |ψ⟩ and |b⟩ are normalised, we have:

e^{itH} ≈^ε C  ⟹  ⟨b|e^{itH}|ψ⟩ ≈^ε ⟨b|C|ψ⟩

so r ≈^ε s. Since taking the adjoint preserves norms, we also have r̄ ≈^ε s̄. Since |r| and |s| must both be less than 1, we have:

|r|² = r̄r ≈^ε r̄s ≈^ε s̄s = |s|²

Applying the triangle inequality, we have |r|² ≈^{2ε} |s|². Hence:

Prob_real(b) ≈^{2ε} Prob_sim(b)

7.6 Simplifying universal diagrams

In this chapter our focus has been on universal circuits. But we of course shouldn’t be neglecting our diagrams. A universal ZX-diagram is one where we put no restriction on the phases the spiders may contain. It turns out that we can apply some of the tricks we’ve been developing for reasoning about universal circuits to universal ZX-diagrams. In particular, phase gadgets are also really useful in this broader context. For one, we can use them to power-up the simplification strategy of Clifford diagrams of the previous chapter. Let’s recap this strategy. Recall from Sections 5.2 and 5.3 that we have two simplification rules, local complementation and pivoting, that allow us to remove certain spiders with a Clifford phase. We then do the following:

1. First we rewrite the diagram to a graph-like diagram, so that all spiders are Z-spiders and they are only connected via Hadamard edges.

2. Then using local complementation we remove all internal spiders with a ±π/2 phase.

3. Similarly, using pivoting we remove any pair of connected spiders that are both internal and have a 0 or π phase.

Since we'll need to refer to spiders with a phase of 0 or π a lot in this section, we'll give them a name. We will call such a spider a Pauli spider. Now, using this strategy, if the diagram is Clifford, the only remaining internal spiders will be Pauli spiders that are only connected to boundary spiders. In Section 5.3.2 it is described how we can also get rid of these spiders, so that all internal spiders are removed, and the diagram will be in GSLC form. Now, let's look at what happens if we apply the same strategy to a non-Clifford diagram, i.e. where our spiders are allowed to be labelled by arbitrary phases. Such a diagram we can still rewrite to graph-like form using Proposition 5.1.8, but now the spiders might carry a Clifford phase (a multiple of π/2), or a non-Clifford phase (any other value). We can apply the local complementation and pivoting rewrites as before to remove any internal spider with a ±π/2 phase and any connected pair of internal Pauli spiders. When we do this, several types of internal spiders can remain. First, none of these rewrites remove spiders that have a non-Clifford phase, so if an internal spider has a phase that is not a multiple of π/2, then it will still be in the diagram. Second, we can only remove the Pauli spiders that are connected to other Clifford spiders (either an internal Pauli spider, or an arbitrary boundary Clifford spider). Hence, we can also have internal Pauli spiders remaining that are connected only to non-Clifford spiders. We cannot actually remove these Pauli spiders, but we can do something interesting with them: transform them into phase gadgets. As we are working here with graph-like ZX-diagrams, the phase gadgets will look slightly different:

(7.52)

Lemma 7.6.1. We can transform an internal Pauli spider connected to another internal spider into a phase gadget using pivoting:

Proof. First push the phase to the right, and then unfuse the γ phase onto its own spider:

Now just apply the regular pivot Lemma 5.2.11 to the two spiders marked by *. □

We see that we end up with the same number of spiders, but that they are connected differently. The γ phase is now isolated on a spider that only has arity 1, i.e. it has become a phase gadget. This turns out to have several benefits. But first, note that if an internal Pauli spider is only connected to boundary spiders, we can do a similar rewrite.

Lemma 7.6.2. The following boundary pivot simplification holds:

Here we take the spider with the γ phase to be a boundary spider with the top wires being inputs and outputs to the diagram.

Proof. First add spiders to the outputs of the γ spider as in Eq. (5.15) and then use Lemma 7.6.1. □

As in Section 5.3.2, doing operations on the boundary introduces Hadamards on the input and output wires of the diagram so that we are working with graph-like diagrams with Hadamards. After we’ve applied the rewrites of Lemmas 7.6.1 and 7.6.2 to all the internal Pauli spiders we see that these internal Pauli spiders are now all part of a phase gadget. As a side-note, in order to not get stuck in an infinite loop in doing these rewrites, we should only be applying these lemmas to Pauli spiders that are not already part of a phase gadget (i.e. we should check if they do not have any single-arity neighbours).

Example 7.6.3. After we apply the rewrite strategy we described above, we might be left with a diagram that looks something like the following:

Every internal spider is either part of a phase gadget or has a non-Clifford phase.

Remark 7.6.4. In the diagrams we get from this rewrite strategy, none of the phase gadgets will be connected to each other. To see why this is, note that in Lemmas 7.6.1 and 7.6.2 the phase gadget we create is connected to precisely those spiders that the original Pauli spider was connected to. Our starting assumption was that no Pauli spider is connected to any other internal Pauli spider, since if that was the case, then we could have just applied a regular pivot to them to remove them. As each phase gadget is hence connected to what a Pauli spider was connected to, none of them will be connected to each other. Conversely, if we were given a diagram where two phase gadgets are connected to each other, then we could apply a regular pivot to them, and this would remove the ‘base’ of these gadgets, and change the phases into ‘regular’ internal spiders.

While these ‘gadgetisation’ rewrites don’t get rid of more spiders, it is still very useful to do them. First, it turns out that we can bound the size of the diagrams we get in terms of the number of non-Clifford phases, and second, there are interesting new rewrites that we can apply to the phase gadgets. For the first point, note first that if the phase γ in a phase gadget is Clifford, then we can remove the phase gadget.

Exercise 7.7. Show that if we have a phase gadget in a graph-like diagram whose phase is a multiple of π/2, then we can remove it using a series of local complementations and pivots.

Hence, the only remaining internal spiders are either non-Clifford or are Pauli spiders part of a non-Clifford phase gadget. As none of our rewrites introduce new non-Clifford phases, this means that the number of internal spiders we end up with is bounded by the number of non-Clifford phases we started out with.

Proposition 7.6.5. Let D be a ZX-diagram with n inputs, m outputs and k non-Clifford phases. Then we can efficiently rewrite D to a diagram D′ which has at most n + m + 2k spiders and n + m Hadamards on the inputs and outputs.

Proof. Transform D into graph-like form and apply the rewrites described above. When we are done with the rewrites we have at most one spider for each of the inputs and outputs, contributing at most n + m spiders. Each of the remaining internal spiders is either non-Clifford, or is a Pauli spider that is part of a non-Clifford phase gadget, so that there are at most 2k internal spiders. Since the resulting diagram is graph-like with Hadamards, we can potentially also have Hadamards on the input and output wires. □

This result is in a sense a more fine-grained version of Proposition 5.3.7 that shows that we can rewrite any Clifford diagram to GSLC form: if k = 0 in the proposition above, then the result is a GSLC form diagram. We can in fact view all the internal spiders and phase gadgets as additional states and effects plugged into a GSLC diagram (see also Exercise 5.5):

(7.53)

This is pointing in the direction of how we can do arbitrary quantum computations by preparing the right graph state and doing the right measurements. We will explore this type of measurement-based quantum computing in detail in Chapter 8. The fact that the size of our diagram is bounded by the number of non-Clifford gates, and not by the overall number of spiders in the starting diagram turns out to be useful when we want to simulate universal quantum circuits (see Section 7.8.1).

Remark* 7.6.6. So if we can rewrite our diagram to a form that is bounded in size by the number of non-Clifford spiders, and if the phase gadget rewrites we need to do to get there don’t remove any spiders, does that mean that we could have already bounded the size of the diagram before doing these rewrites? Well, no. This is because Lemmas 7.6.1 and 7.6.2 can actually result in spiders being removed from the diagram. This happens when we have two Pauli spiders that are connected to exactly the same set of non-Clifford spiders. In this case, when we apply Lemma 7.6.1 to one of the Pauli spiders and one of its neighbours, the other Pauli spider gets completely disconnected and becomes a scalar, which we can then ignore. In general, Lemma 7.6.1 will change the connectivity of the Pauli spiders in the diagram, and hence through a sequence of rewrites we could end up with Pauli spiders that are connected to the same set of non-Clifford spiders, resulting in more spiders getting removed by these rewrites.

7.6.1 Removing non-Clifford spiders

Once we have simplified our diagram to a point where all internal spiders are either non-Clifford or phase gadgets, there are a couple more useful rewrite rules we can apply that remove additional spiders. The first is very simple, as it is just a case of removing an identity spider.

Lemma 7.6.7. We can fuse a one-legged phase gadget with its neighbour:

Proof.  

So whenever we have a phase gadget in our diagram that is connected to exactly one spider, we can fuse it with its neighbour. There is really nothing special about phase gadgets here: as noted in Remark 7.6.4, the phase gadget is connected to whatever the original Pauli spider was connected to. So if we have a one-legged phase gadget in our diagram, then the original Pauli spider must already have been an identity spider, and we could have removed it then and there. The other rewrite rule we can apply to phase gadgets is one we have already seen: gadget fusion.

Lemma 7.6.8. We can fuse two phase gadgets connected to the same set of spiders:

Proof.  

So why do we point out these two rewrite rules for getting rid of phase gadgets? Two reasons: first, note that this phase-gadget fusion rule is a generalisation of the phase-folding we can do in the phase polynomial framework. Second, the rewrites of Lemmas 7.6.7 and 7.6.8 are essentially the only rewrite rules we can use to get rid of additional non-Clifford phases in a diagram. At least, this is the case when we treat the non-Clifford phases as ‘black boxes’, i.e. as arbitrary angles that we don’t know anything else about. See the References of this chapter for more details. If we know more about the phases, for instance that they are all multiples of π/4, then there are other potential rewrites we could apply that simplify the diagram further. We look at this in detail in Chapter 10.
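The connection to phase-folding can be made concrete with a small sketch. A phase gadget in a graph-like diagram is determined by the set of spiders it attaches to, so gadget fusion (Lemma 7.6.8) is just adding phases keyed by that neighbour set, exactly like folding equal-parity terms of a phase polynomial. The representation of a gadget as a (neighbour set, phase) pair below is our own illustrative encoding, not a data structure from the text:

```python
import math

def fuse_gadgets(gadgets):
    """Fuse phase gadgets attached to the same set of neighbours.

    `gadgets` is a list of (neighbour_set, phase) pairs. Fusion adds the
    phases of gadgets with identical neighbour sets, modulo 2*pi; gadgets
    whose phase folds to 0 vanish. (A phase that folds to a multiple of
    pi/2 becomes Clifford and could be removed further, cf. Exercise 7.7,
    but this sketch only performs the fusion itself.)
    """
    folded = {}
    for neighbours, phase in gadgets:
        key = frozenset(neighbours)
        folded[key] = (folded.get(key, 0.0) + phase) % (2 * math.pi)
    return {k: v for k, v in folded.items() if abs(v) > 1e-12}

# Two gadgets on neighbours {1, 2} fuse into a single gadget with phase pi/2.
example = [({1, 2}, math.pi / 4), ({1, 2}, math.pi / 4), ({2, 3}, math.pi / 8)]
fused = fuse_gadgets(example)
```

Running this leaves two gadgets instead of three, mirroring how two T-like phases on the same parity fold into a single Clifford phase.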

7.6.2 Circuits from universal diagrams

In previous chapters we have seen that certain gate sets have a natural diagrammatic counterpart, and that for unitary diagrams we can always transform them back into circuits over this gate set. In Chapter 4 we saw that any unitary phase-free diagram can be rewritten into a CNOT circuit, and in Chapter 5 we saw that any unitary Clifford diagram can be turned back into a Clifford circuit. This raises the question of how we can transform these universal diagrams back into universal circuits when they are unitary. Perhaps disappointingly, but not too surprisingly, this turns out not to be efficiently doable in general.

Exercise* 7.8. In this exercise we will see that if we had some function CircuitExtraction that takes in a poly-size unitary ZX-diagram and spits out a poly-size circuit implementing it, then we could use it to solve NP-complete problems. Hence, there is probably no efficient implementation of CircuitExtraction. In particular, what we will show is how to determine whether a Boolean formula f : 𝔽₂ⁿ → 𝔽₂ has a solution using CircuitExtraction.

1. Let L_f be the linear map defined by L_f|x⟩ = |f(x)⟩. We will see in Chapter 9 how we can construct maps like L_f efficiently as a ZX-diagram. Argue that

(7.54)

where Na is the number of values x for which f(x) = a.

2. If we could calculate the value of Eq. (7.54), then we could determine whether f has a solution. However, CircuitExtraction expects a unitary diagram as input, which Eq. (7.54) is not. So we need to construct a unitary diagram containing Eq. (7.54) that allows us to determine N₁. Using the fact that a superposition λ₀|0⟩ + λ₁|1⟩ can be written as Y(α)|0⟩, show that

(7.55)

and determine α in terms of N1.

3. The left-hand side of Eq. (7.55) is proportional to a unitary diagram, and hence when given to CircuitExtraction it will spit out a 1-qubit circuit equal to the right-hand side of Eq. (7.55). Argue that with such a circuit we can efficiently determine α and hence N₁ (you may ignore issues about numerical precision and the precise gate set that CircuitExtraction is using).

4. Give the full algorithm that takes in a Boolean function f and with a single call to CircuitExtraction tells you whether f has a solution or not.

In the procedure we described we actually learn more than just whether f has a solution: we get N₁, the total number of solutions. CircuitExtraction hence doesn’t just allow us to solve NP-complete problems, it allows us to solve #P-complete problems, which are generally considered to be much harder.
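The classical bookkeeping in this exercise is easy to sketch. Assuming the convention Y(α)|0⟩ = cos(α/2)|0⟩ + sin(α/2)|1⟩ (the exact convention for Y(α) is not fixed by this excerpt), the unnormalised state N₀|0⟩ + N₁|1⟩ determines α = 2·atan2(N₁, N₀), and conversely α together with N₀ + N₁ = 2ⁿ recovers N₁. A hedged sketch, with a brute-forced f standing in for the circuit-extraction step:

```python
import itertools
import math

def counts(f, n):
    """Brute-force N0 and N1: the number of inputs with f(x) = 0 and 1."""
    n1 = sum(f(x) for x in itertools.product([0, 1], repeat=n))
    return 2**n - n1, n1

def alpha_from_counts(n0, n1):
    # N0|0> + N1|1>  is proportional to  cos(a/2)|0> + sin(a/2)|1>
    # (assumed convention for Y(alpha); not fixed by the text).
    return 2 * math.atan2(n1, n0)

def n1_from_alpha(alpha, n):
    # Invert: tan(a/2) = N1/N0 together with N0 + N1 = 2^n.
    r = math.tan(alpha / 2)
    return round(2**n * r / (1 + r))

f = lambda x: x[0] & x[1] & x[2]   # AND of three bits: exactly one solution
n0, n1 = counts(f, 3)
alpha = alpha_from_counts(n0, n1)
```

This only demonstrates the arithmetic relating α and N₁; the hard part, producing the 1-qubit circuit without enumerating all inputs, is exactly what CircuitExtraction would have to do.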

Universal diagrams can simply become too complicated, making it impossible to efficiently see what circuit they are equal to. Luckily for us though, for many diagrams we care about we can tame these complications and efficiently extract a circuit. This is made possible by a strong connection between graph-like ZX-diagrams and measurement patterns. The ability to extract a circuit from the diagram then corresponds to whether the measurement pattern can be made deterministic. We will explain all of this and more in detail in the next chapter on measurement-based quantum computing.

7.7 Summary: What to remember

1. A circuit consisting of CNOT and Z phase gates can be represented by a phase polynomial followed by a linear Boolean map acting on the computational basis states.

2. In the ZX-calculus a phase polynomial looks like a collection of phase gadgets.

3. The phase-polynomial representation can be synthesised back into a CNOT+phases circuit using a variation on the Gaussian elimination algorithm of Chapter 4.

4. The path-sum technique allows us to incorporate Hadamards into the phase-polynomial representation. Each Hadamard gate introduces a new path variable that replaces the current variable on that qubit. By thinking about path sums we can synthesise complex unitaries like the Quantum Fourier Transform.

5. A different way to think about universal circuits is through Pauli exponentials. Any quantum circuit can be written as a series of (non-Clifford) Pauli exponentials, followed by a Clifford circuit.

6. The Pauli exponential representation allows us to bound the size of an n-qubit circuit with k non-Clifford gates by O(kn + n²).

7. We can construct a quantum circuit for a Hamiltonian evolution e^{-itH} by splitting it up into small time-slices e^{-i(t/n)H} and approximating each of these by a series of Pauli exponentials.

8. We can use the Clifford simplification strategy of Chapter 5 to optimise universal diagrams. By using a variation on pivoting we can introduce phase gadgets into such diagrams, and these can then be merged together.

9. In these optimised graph-like diagrams, the only internal spiders either carry non-Clifford phases or form the ‘base’ of a phase gadget.

10. In general, extracting a circuit out of a unitary ZX-diagram is a hard problem (though in certain special cases we can do it efficiently).

7.8 Advanced material*

7.8.1 Simulating universal circuits*

Each of the gates we have considered in this chapter—Z phase gates, CNOTs, and Hadamards—has a different interaction with the path-sum expression of the circuit.

What is interesting about these three types of gates is that we truly need all of them in order to do interesting quantum computations. If we only had phase gates and CNOTs, then as we saw in Section 7.1.1 we can efficiently evaluate what the circuit does on a computational basis state, as there is just one path involved. If we instead only had phase gates and Hadamards, then all the qubits would act on their own, so there wouldn’t be any complex behaviour and we could directly calculate the matrices involved. And finally, if we only had CNOTs and Hadamards, then the circuit would be Clifford, so that we can also efficiently calculate the outcomes. The interesting behaviour then comes from a multitude of paths (because of Hadamards) which non-trivially use all parts of the available Hilbert space (because of CNOTs), each of which can have a different phase, so that when we do a measurement the paths interfere in non-trivial ways.

While these arguments only show that you need at least some of each type of gate, it turns out that we can prove a bit more. Suppose for instance that we have an n-qubit circuit with a polynomial number of CNOT, Hadamard and Z phase gates, but only k log n CNOT gates for some k. Then most qubits don’t have any CNOT on them: at most 2k log n qubits can be connected by CNOTs. The size of the matrix associated to these qubits is then O(2^{2k log n}) = O(n^{2k}). So calculating what this circuit is doing can be done in polynomial time, just by directly representing the matrix and calculating the action of all the gates on it one by one. Analogously, if we have only k log n Hadamard gates, then using the path-sum representation of the circuit we only have to keep track of O(2^{k log n}) = O(n^k) different paths, so that we can efficiently simulate the action of the circuit as well, just by summing over this polynomial number of paths.
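The few-Hadamards path-sum simulation is easy to make concrete for the CNOT+Hadamard+Z(α) gate set: propagate a basis state through the circuit, let each Hadamard introduce a fresh path variable y with a factor (-1)^{xy}/√2, and sum over all 2^h assignments. A self-contained sketch (the gate encoding is our own), checked against a direct statevector simulation:

```python
import cmath
import itertools
import math
import numpy as np

def amplitude_pathsum(gates, n, target):
    """<target| C |0...0> as a sum over one path per assignment of the
    Hadamard path variables; cost O(2^h) rather than O(2^n)."""
    h = sum(1 for g in gates if g[0] == 'H')
    total = 0.0
    for ys in itertools.product([0, 1], repeat=h):
        x, phase, k = [0] * n, 0.0, 0
        for g in gates:
            if g[0] == 'CNOT':
                x[g[2]] ^= x[g[1]]
            elif g[0] == 'Z':                 # Z(alpha): phase term alpha*x_q
                phase += g[2] * x[g[1]]
            else:                             # H: new variable y, term pi*x*y
                y = ys[k]; k += 1
                phase += math.pi * x[g[1]] * y
                x[g[1]] = y
        if x == list(target):                 # only matching paths contribute
            total += cmath.exp(1j * phase) / math.sqrt(2) ** h
    return total

def amplitude_statevector(gates, n, target):
    """Direct dense simulation, used only to check the path sum."""
    psi = np.zeros((2,) * n, dtype=complex)
    psi[(0,) * n] = 1.0
    Hm = np.array([[1, 1], [1, -1]]) / math.sqrt(2)
    CX = np.zeros((2, 2, 2, 2))
    for a in (0, 1):
        for b in (0, 1):
            CX[a, a ^ b, a, b] = 1.0          # out = (c, c XOR t)
    for g in gates:
        if g[0] == 'H':
            psi = np.moveaxis(np.tensordot(Hm, psi, axes=(1, g[1])), 0, g[1])
        elif g[0] == 'Z':
            psi = psi.copy()
            idx = [slice(None)] * n
            idx[g[1]] = 1
            psi[tuple(idx)] *= cmath.exp(1j * g[2])
        else:
            c, t = g[1], g[2]
            psi = np.moveaxis(np.tensordot(CX, psi, axes=([2, 3], [c, t])),
                              [0, 1], [c, t])
    return psi[tuple(target)]

circuit = [('H', 0), ('Z', 0, math.pi / 4), ('CNOT', 0, 1),
           ('H', 1), ('Z', 1, math.pi / 3)]
```

With h Hadamards the loop runs over 2^h paths regardless of n, which is exactly the O(n^k) scaling claimed above when h = k log n.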
These two simulation techniques, with a cost that scales with the number of Hadamards or CNOTs, are not seriously used in practice (though many tricks for tensor-network contraction are based on understanding where and how entanglement is created, so counting where the CNOTs are is definitely implicitly part of that). A technique based on a limited number of non-Clifford phases, however, turns out to work quite nicely.

7.8.1.1 Stabiliser Decompositions

As it turns out, if we only have k log n non-Clifford Z phase gates, then we can also efficiently strongly simulate the circuit. To see this, let’s assume that we have some polynomial-size circuit of CNOT, Hadamard and Z phase gates of which t are non-Clifford phases, and that we wish to calculate a specific amplitude of this circuit (let’s say the amplitude of observing ⟨0⋯0| on input of |0⋯0⟩). Then we can write this as a scalar ZX-diagram. We can rewrite this diagram into graph-like form, and simplify it further to remove all the Clifford spiders. As described in Section 7.6.1, the number of spiders in the resulting diagram is then O(2t), as each spider either carries a non-Clifford phase, or is part of a phase gadget which has a non-Clifford phase. In such a diagram we can’t remove any more Clifford spiders, since there aren’t any. However, it turns out we can decompose a non-Clifford spider into a sum of Cliffords, so that the whole diagram becomes a sum of ‘slightly more Clifford’ diagrams. To do this in a nicely generalisable way we first unfuse the non-Clifford phase onto its own spider:

The remaining phase is then Clifford. Now, the state with the α phase is equal to |0⟩ + e^{iα}|1⟩ and hence can be decomposed into a sum of two Clifford spiders corresponding to the terms |0⟩ and |1⟩:

When we do this unfuse-and-decompose to a single α phase in a bigger diagram, we are left with two diagrams that each have one fewer non-Clifford phase. We can however do this decomposition to all of the non-Clifford phases in the larger diagram at the same time, giving 2^t different diagrams that are now completely Clifford. Each of these diagrams has O(t) spiders, and hence completely ‘simplifying away’ each diagram into a single scalar number using the Clifford simplification strategy of Chapter 5 requires O(t³) graph operations. As we have to do this for each of the 2^t diagrams, the total cost is O(t³2^t). Hence, if we have a logarithmic number of non-Clifford phases t = k log n, then the simulation cost is O(k³ n^k log³ n), which is polynomial in n for fixed k. We could also decide not to decompose all phases simultaneously, but instead try to simplify the diagram further after every decomposition. Since a decomposition makes the diagram look a little more Clifford, the Clifford rewrites could make the diagram a bit simpler. More importantly, they might reveal places where the non-Clifford-removing rewrites of Section 7.6.1 apply, so that there are fewer terms we need to decompose in the first place. If the non-Clifford phases are arbitrary angles, none of them equal to each other, then this decompose-optimise-repeat strategy is pretty much the best we can do with this method. However, if the non-Clifford phases are some specific angles, then there is more interesting stuff we can do! In particular, if all the non-Clifford phases are multiples of π/4, for instance when the circuit we start out with is Clifford+T, then it turns out that we can decompose a group of π/4 phases simultaneously, and this requires fewer terms. For instance, if we have a pair of |T⟩ := √2·T|+⟩ = |0⟩ + e^{iπ/4}|1⟩ states, then:

|T⟩⊗|T⟩ = |00⟩ + e^{iπ/4}|01⟩ + e^{iπ/4}|10⟩ + e^{iπ/2}|11⟩ = (|00⟩ + e^{iπ/2}|11⟩) + e^{iπ/4}(|01⟩ + |10⟩).

Here we’ve grouped the terms together into two Bell-like maximally entangled states. Written in diagrams we hence have:

Naively decomposing these two non-Clifford π/4 phases would give us 4 terms, but by grouping them into these Clifford states we require only 2 terms. Hence, if we have t of these π/4 phases, then the number of terms we get when we decompose all of them in pairs like this is 2^{t/2} (assuming t is even for simplicity). This hence scales quadratically better than decomposing them all separately! This is an example of a stabiliser decomposition, and there are many interesting things to say about them. A stabiliser decomposition is any decomposition of a quantum state into a sum of Clifford states. This is always possible, but for an n-qubit state it will in general require 2^n stabiliser terms. Finding more efficient decompositions is generally a hard problem. However, for very special states, like a tensor product |T⟩⊗⋯⊗|T⟩, much better decompositions are known. For instance, it turns out we can decompose six copies of |T⟩ into just 6 terms, meaning we only require 6^{t/6} = 2^{(t/6)·log₂ 6} ≈ 2^{0.43t} terms. There are also specific configurations of π/4 phases that allow for more efficient decompositions.
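The pair decomposition above is easy to verify numerically. In the sketch below, |T⟩ is the unnormalised vector (1, e^{iπ/4}), matching |T⟩ = |0⟩ + e^{iπ/4}|1⟩:

```python
import numpy as np

T = np.array([1, np.exp(1j * np.pi / 4)])   # |T> = |0> + e^{i pi/4}|1>
TT = np.kron(T, T)                          # |T>|T>, basis order 00,01,10,11

# Two Clifford (stabiliser) terms instead of the naive four:
bell_like = np.array([1, 0, 0, np.exp(1j * np.pi / 2)])  # |00> + e^{i pi/2}|11>
flip_pair = np.array([0, 1, 1, 0])                       # |01> + |10>
decomposition = bell_like + np.exp(1j * np.pi / 4) * flip_pair
```

Both terms on the right are (unnormalised) stabiliser states, which is exactly what makes the 2^{t/2} counting work.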

7.8.1.2 Gate-by-gate simulation

Note that this stabiliser decomposition simulation technique allows us to calculate an amplitude. If we want to calculate a marginal probability (cf. Section 5.4.2), then we need to ‘double’ the diagram. This doubles the number of non-Clifford phases, which is a big problem since this method scales exponentially in the number of non-Cliffords. In practice the interleaved optimisation steps do get rid of some of the redundancy in this doubled representation, but the problem persists. Usually we are doing strong simulation, and hence calculating marginal probabilities, because we are actually trying to do weak simulation (i.e. sampling). In that case there are ways around having to double the diagram. One way is to not even try to calculate amplitudes, but instead directly do sampling by writing our circuit as a stochastic combination of Clifford computations. We will say a bit more about this when we talk about computational universality in Section 10.5.2. When doing this, we no longer care about the number of terms in the decomposition, but instead about the total weight of the terms, i.e. the scalar factors in front of them. This weight is known as the robustness of magic or the stabiliser extent of the state (which one is used depends on which weights exactly we are talking about; see the References of this chapter). But there is another way, where it turns out to be sufficient to calculate just amplitudes, no marginals necessary. This is known as the gate-by-gate simulation method. It does come at the cost of having to calculate a number of amplitudes that scales with the number of gates instead of with the number of qubits. To see how this works, let’s suppose our circuit consists of unitaries U₁, …, U_k and write C_t := U_t⋯U₁ for the circuit consisting of just the first t unitaries. Hence C₀ is the identity, and C_k is the final circuit.
Write P_t(x) = |⟨x|C_t|0⋯0⟩|² for the probability of measuring the outcome x when applying C_t to the all-zero state |0⋯0⟩. Our goal is to sample bit strings from the final distribution P_k. We are going to do that by starting with a sample from P₀ and then transforming it into a sample that matches the distribution P₁, and then P₂, and so on until we get a sample distributed according to P_k. The reason this is beneficial is that sampling from P₀ is trivial (just always output the all-zero bit string 0⋯0), and the update process from a sample of P_{t-1} to one of P_t turns out to not require any marginal probability calculations. The reason for that is that the distributions P_{t-1} and P_t are the same for most qubits. They are based on two circuits C_{t-1} and C_t that are related via C_t = U_t C_{t-1}. Let’s assume for now that U_t is a single-qubit gate acting on the first qubit. Then it turns out that the distributions marginalised over x₁ are actually equal:

(7.56)

Hence, to update a bit string y distributed according to Pt1 to one distributed over Pt, we only need to resample the first bit y1 according to the conditional distribution Pt(x1|x2 = y2,,xn = yn). When we do this we indeed get a correct sample, as

P_t(x₁, x₂, …, x_n) = P_t(x₂, …, x_n) · P_t(x₁ | x₂, …, x_n).

Rewriting this a bit, we recall that the conditional distribution is defined as:

P_t(x₁ = y₁ | x₂ = y₂, …, x_n = y_n) = P_t(y₁, y₂, …, y_n) / ∑_{x₁} P_t(x₁, y₂, …, y_n),

and hence sampling from it is straightforward: we just calculate p_a := P_t(x₁ = a, y₂, …, y_n) for both a = 0 and a = 1, both of which are amplitude calculations, and then we set y₁ = 1 with probability p₁/(p₀ + p₁). We hence don’t require any calculation of marginals! We assumed U_t was a single-qubit gate. If it is instead an l-qubit gate, then in Eq. (7.56) we would have an l-qubit marginal on those l qubits that U_t acts on. Then instead of resampling just a single bit, we would resample these l bits, which requires the calculation of 2^l amplitudes. If our gate set consists of just single-qubit gates and the CNOT, then this requires at most 4 amplitude calculations per gate, with the two-qubit CNOT being the more expensive one. But we can in fact do a bit better. If U_t is a classical gate, like the CNOT, that maps computational basis states to computational basis states, then the distributions P_{t-1} and P_t are related to each other by a simple relabelling of the outcomes:

Here in the last step we absorbed the first CNOT into C_{t-1} to get C_t, and we absorbed the second CNOT into the effect by setting x₂ := x₁ ⊕ x₂. Hence, when we have a sample according to P_{t-1}, we can map it to P_t by just applying the classical gate to the sample directly, no calculation of amplitudes necessary. This also works for, for instance, the Toffoli gate, or even an entire complicated classical oracle. There are also other cases where updating the sample comes for free. If U_t preserves the computational basis states, i.e. it is a diagonal phase gate, then the distributions P_{t-1} and P_t are exactly equal, and no updating is necessary at all! Hence, if our gate set consists of CNOT, Hadamard and Z(α) phase gates, then the only time we actually have to calculate amplitudes in order to update our sample is when we encounter a Hadamard gate. Putting this all together, we get Algorithm 3, which allows us to sample a bit string from a CNOT+Hadamard+Z(α) circuit.

Algorithm 3: Gate-by-gate weak simulation by calculating amplitudes

Input: A circuit C = U_k⋯U₁ consisting of CNOT, Hadamard and Z(α) gates
Output: A sample from the distribution P(x) = |⟨x|C|0⋯0⟩|²

Procedure SAMPLE(U₁, …, U_k)
    let C = I
    let y = 0⋯0
    for t = 1 to k do                      // forward part
        let C := U_t C
        if U_t is CNOT then                // update the sample classically
            y := U_t y
        end
        if U_t is Hadamard then            // calculate some amplitudes in order to update the sample
            let q be the target qubit of U_t
            let z₀ := ⟨y₁⋯y_{q-1} 0 y_{q+1}⋯y_n|C|0⋯0⟩
            let z₁ := ⟨y₁⋯y_{q-1} 1 y_{q+1}⋯y_n|C|0⋯0⟩
            let p := |z₀|²/(|z₀|² + |z₁|²)
            with probability p set y_q := 0, otherwise set y_q := 1
        end
        // if U_t is Z(α) we don’t have to do anything
    end
    return y
end

Theorem 7.8.1. Let C be a CNOT+Hadamard+Z(α) circuit with h Hadamard gates. Then we can produce a sample from the distribution P(x) = |⟨x|C|0⋯0⟩|² using 2h amplitude calculations.

Now recalling that the cost of calculating an amplitude of a Clifford+T circuit with t T gates was O(t³2^{αt}) for some number α ≈ 0.43, we see that we can sample from such a circuit with cost O(h·t³2^{αt}), where h is the number of Hadamard gates. Moreover, assuming that the Hadamards and T gates are roughly evenly distributed throughout the circuit, most of these amplitude calculations will be of circuits that contain significantly fewer T gates, so that they are a lot cheaper to calculate.
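Algorithm 3 can be sketched end to end in a few lines. In the sketch below the amplitude oracle is a plain statevector, kept only as a stand-in for the stabiliser-decomposition method (that oracle is exactly the part a real simulator would replace), and the gate encoding is our own:

```python
import math
import random
import numpy as np

H = np.array([[1, 1], [1, -1]]) / math.sqrt(2)

def apply(psi, gate, n):
    """Apply one gate to a statevector of shape (2,)*n (amplitude stand-in)."""
    if gate[0] == 'H':
        q = gate[1]
        return np.moveaxis(np.tensordot(H, psi, axes=(1, q)), 0, q)
    if gate[0] == 'Z':
        q, alpha = gate[1], gate[2]
        psi = psi.copy()
        idx = [slice(None)] * n
        idx[q] = 1
        psi[tuple(idx)] *= np.exp(1j * alpha)
        return psi
    c, t = gate[1], gate[2]                # CNOT: flip target where control=1
    psi = psi.copy()
    idx = [slice(None)] * n
    idx[c] = 1
    ax = t if t < c else t - 1             # target axis inside the c=1 slice
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=ax).copy()
    return psi

def sample(gates, n, rng):
    """Gate-by-gate sampling in the style of Algorithm 3."""
    psi = np.zeros((2,) * n, dtype=complex)
    psi[(0,) * n] = 1.0                    # C_0 = I applied to |0...0>
    y = [0] * n
    for g in gates:
        psi = apply(psi, g, n)             # C_t := U_t C_{t-1}
        if g[0] == 'CNOT':
            y[g[2]] ^= y[g[1]]             # classical update, no amplitudes
        elif g[0] == 'H':
            q = g[1]
            # two amplitude calls per Hadamard, as in Theorem 7.8.1
            p = [abs(psi[tuple(y[:q] + [a] + y[q + 1:])]) ** 2 for a in (0, 1)]
            y[q] = 0 if rng.random() < p[0] / (p[0] + p[1]) else 1
        # Z(alpha): distribution unchanged, nothing to do
    return tuple(y)

rng = random.Random(7)
bell = [('H', 0), ('CNOT', 0, 1)]          # Bell state: outcomes 00 or 11
samples = {sample(bell, 2, rng) for _ in range(200)}
```

For the Bell circuit the Hadamard costs two amplitude calls and the CNOT is a free classical relabelling, so every sample is a correlated pair.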

7.8.2 Higher-order Trotterisation*

The approximation of e^{-itH} we found in Section 7.5 required a number of decompositions n that scaled as O(ε⁻¹). This is fine. But it does raise the question of whether we could do better. Let’s look back at the case with just two terms H₁ and H₂. The reason we didn’t get better scaling than O(ε⁻¹) is because e^{-itH₁}e^{-itH₂} only approximates e^{-it(H₁+H₂)} up to an O(t²) term (Eq. (7.49)). If we could somehow boost their agreement to some O(t^k) for a k > 2, then we would get better scaling. Let’s expand both e^{-itH₁}e^{-itH₂} and e^{-it(H₁+H₂)} to see where this O(t²) approximation error comes from. Recall that we have by definition:

e^{-itH} := ∑_{j=0}^∞ (-itH)^j / j!
(7.57)

When we apply this to the expression we want, we get:

e^{-i(H₁+H₂)t} = I - i(H₁ + H₂)t - ½t²(H₁ + H₂)² + O(t³) = I - i(H₁ + H₂)t - ½t²(H₁² + H₂² + H₁H₂ + H₂H₁) + O(t³)   (7.58)

Doing the same on the approximation, we get:

e^{-iH₁t}e^{-iH₂t} = (I - iH₁t - ½t²H₁² + O(t³))(I - iH₂t - ½t²H₂² + O(t³)) = I - it(H₁ + H₂) - ½t²(H₁² + H₂²) - t²H₁H₂ + O(t³) = I - it(H₁ + H₂) - ½t²(H₁² + H₂² + 2H₁H₂) + O(t³)   (7.59)

where in the last step we grouped together the t2 terms. We see that Eqs. (7.58) and (7.59) agree on the constant term and the t term, but differ on the t2 term. Subtracting these two expressions gives:

e^{-it(H₁+H₂)} - e^{-itH₁}e^{-itH₂} = -½t²(H₂H₁ - H₁H₂) + O(t³)   (7.60)
‖e^{-it(H₁+H₂)} - e^{-itH₁}e^{-itH₂}‖ ≤ ½t²‖[H₁,H₂]‖ + O(t³).   (7.61)

As we’ve seen in Exercise 7.5, we don’t actually get the O(t³) term, and the first term in the inequality is already enough to bound it. So if this derivation of the bound is worse, why did we do it? Well, looking at these expansions and where they agree and disagree tells us what we need to do to get better agreement. Looking at the t² expressions in Eqs. (7.58) and (7.59), we see that the t² terms are respectively -½(H₁² + H₂² + H₁H₂ + H₂H₁) and -½(H₁² + H₂² + 2H₁H₂). So the only problem is that we get two copies of the term H₁H₂ instead of H₁H₂ + H₂H₁. If we had decomposed e^{-itH₂}e^{-itH₁} instead of e^{-itH₁}e^{-itH₂}, then we would have lacked the H₁H₂ term instead of the H₂H₁ term. This points us towards the solution: we need both e^{-itH₁}e^{-itH₂} and e^{-itH₂}e^{-itH₁} in our decomposition. So let’s see what we get when we expand the product of these two decompositions:

(e^{-itH₁}e^{-itH₂})(e^{-itH₂}e^{-itH₁}) = I - it(2H₁ + 2H₂) - ½t²(2H₁² + 2H₂²) - t²(2H₁H₂ + H₁² + H₂² + 2H₂H₁) + O(t³) = I - i(2t)(H₁ + H₂) - ((2t)²/2)(H₁² + H₂² + H₁H₂ + H₂H₁) + O(t³).

Here in the last equality we suggestively grouped the t² term in terms of (2t)². We see that in this case we get the correct t² term from Eq. (7.58). So replacing t by ½t to ensure we get the correct constants, we see that:

e^{-it(H₁+H₂)} = e^{-½itH₁} e^{-itH₂} e^{-½itH₁} + O(t³).
(7.62)
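Both error orders are easy to check numerically. The sketch below exponentiates small random Hermitian matrices by eigendecomposition (so only NumPy is needed) and compares the errors at t and t/2: halving t should shrink the first-order error by roughly 2² = 4 and the symmetrised error by roughly 2³ = 8, and the first-order error should track ½t²‖[H₁,H₂]‖:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_herm(d):
    """A random Hermitian matrix, normalised so that its operator norm is 1."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    h = (a + a.conj().T) / 2
    return h / np.linalg.norm(h, 2)

def U(h, t):
    """e^{-ith} for Hermitian h, via eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(-1j * t * w)) @ v.conj().T

H1, H2 = rand_herm(4), rand_herm(4)

def err1(t):   # first-order Trotter error, expected O(t^2)
    return np.linalg.norm(U(H1 + H2, t) - U(H1, t) @ U(H2, t), 2)

def err2(t):   # symmetrised (second-order) error, expected O(t^3)
    return np.linalg.norm(
        U(H1 + H2, t) - U(H1, t / 2) @ U(H2, t) @ U(H1, t / 2), 2)

t = 1e-2
r1 = err1(t) / err1(t / 2)     # should be close to 4
r2 = err2(t) / err2(t / 2)     # should be close to 8
comm = err1(t) / (0.5 * t**2 * np.linalg.norm(H1 @ H2 - H2 @ H1, 2))
```

The ratio `comm` approaching 1 is exactly the statement that the leading error term is the commutator term of Eq. (7.60).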

Exercise* 7.9. In this exercise we will find a way to lift Eq. (7.62) to a full-fledged procedure for Hamiltonian simulation. Our goal is to find an approximation of e^{-itH} where H = ∑_{j=1}^l H_j and ‖H_j‖ ≤ 1 for all H_j.

a) Calculate the difference e^{-it(H₁+H₂)} - e^{-½itH₁}e^{-itH₂}e^{-½itH₁} up to O(t⁴) terms, i.e. calculate the t³ term of this difference.

b) Express this difference in terms of (nested) commutators like [H₁,[H₁,H₂]] and give an inequality for the difference in norm.

c) Let S₂({H₁,…,H_l}, t) := e^{-½itH₁}⋯e^{-½itH_l} · e^{-½itH_l}⋯e^{-½itH₁}. Give an upper bound on

‖e^{-it∑_j H_j} - S₂({H_j}, t)‖

which depends on t³ (while ignoring the O(t⁴) terms).

d) Give an upper bound on the norm of the difference e^{-it∑_j H_j} - S₂({H_j}, t/n)ⁿ and use this to show that if we want to make this difference smaller than ε, we can choose n = O(t^{3/2} l^{3/2} ε^{-1/2}).

This approach is known as second-order Trotterisation. It is called second order because it recovers e^{-it(H₁+H₂)} correctly up to second order. It is possible to generalise this technique to kth order. This involves a map S_k({H_j}, t) that approximates e^{-it∑_j H_j} up to an O(t^{k+1}) error. To approximate it up to an error ε we then split t up into n pieces where n = O((tl)^{1+1/k} ε^{-1/k}). While this makes it look like it is better to use higher and higher-order Trotterisations, there are some hidden constants here: the number of terms in S_k({H_j}, t) scales exponentially with k. As a result it is often not beneficial to go beyond k = 2, and essentially never to go beyond k = 8.

7.8.3 Randomised compiling*

It turns out that if we want to approximate a unitary, we can also do this using an ensemble of somewhat randomly chosen unitaries instead of a single approximating unitary. This has several benefits. Let’s see how this could work. Let’s again consider a Hamiltonian H = ∑_{j=1}^l λ_j H_j. Here we choose the decomposition of H such that λ_j ≥ 0 and ‖H_j‖ = 1 for all j. Then the unitary channel corresponding to a 1/n fraction of the total time evolution is

U_n(ρ) = e^{i(t/n)H} ρ e^{-i(t/n)H} = ρ + i(t/n)(Hρ - ρH) + O(t²/n²) = ρ + i ∑_j (λ_j t/n)(H_j ρ - ρH_j) + O(t²/n²). (7.63)

We see that, up to some error term, this unitary channel can be written as a sum of expressions each involving just a single H_j. This suggests we should be able to find a probabilistic channel that approximates U_n just by sampling the H_j. Let’s give this a try. Define E(ρ) = ∑_j p_j e^{iτH_j} ρ e^{-iτH_j}. Here p_j is some probability distribution over the terms H₁, …, H_l, and τ is some fixed number that we will determine later. Now let’s see what happens when we expand E in terms of τ:

E(ρ) = ρ + i ∑_j p_j τ (H_j ρ - ρH_j) + O(τ²)
(7.64)

We see that this matches the expansion in Eq. (7.63) when p_j τ = λ_j t/n. Since the p_j must form a probability distribution, we have ∑_j p_j = 1, and hence

τ = ∑_j p_j τ = ∑_j λ_j t/n = Λt/n,

where we have set Λ = ∑_j λ_j. Then solving for p_j we get p_j = λ_j/Λ. This means we can approximate the unitary channel U_n up to a second-order O(t²/n²) error using a channel that is a probabilistic combination of unitaries, each of which is just a single simple conjugation with e^{iτH_j}, as long as we choose τ = Λt/n and p_j = λ_j/Λ. Here Λ is the sum of all the weights λ_j of the sub-Hamiltonians H_j. But of course, we don’t want to approximate U_n, but U_nⁿ. Using a similar type of analysis as we have done before (but adapted to work with channels instead of unitaries), we can show that we can approximate U_nⁿ with Eⁿ up to an error ε by choosing n ≥ 2Λ²t²/ε. This is interesting because the number of terms we need does not depend on l, the number of terms in H, but rather on Λ, the sum of the weights of the terms. This method hence works better than the non-randomised technique when l is large but Λ is not so large. Even in the worst case, when Λ = l (this is when λ_j = 1 for all j), the scaling is n ≥ 2l²t²/ε, which is of the same order as the non-randomised version (recall that we got n ≥ ½l²t²/ε). But additionally, each iteration of the non-randomised decomposition required every e^{i(t/n)λ_jH_j} to be placed, so that each iteration consists of l terms, giving a total gate count scaling of O(l³t²/ε), instead of the O(l²t²/ε) of the randomised case.
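The channel E is easy to test numerically, because we can apply the mixture exactly (no sampling needed) by averaging the l conjugations with weights p_j, and then iterating n times. A sketch with a hypothetical two-term Hamiltonian H = 0.6X + 0.4Z (our own choice of weights, not from the text); the error should shrink roughly like 1/n:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def U(h, t):
    """e^{ith} for Hermitian h (the sign convention of Eq. (7.63))."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(1j * t * w)) @ v.conj().T

def channel_error(terms, lams, rho0, t, n):
    """Trace-norm distance between E^n(rho0) and the exact evolution
    under H = sum_j lam_j H_j."""
    Lam = sum(lams)
    tau = Lam * t / n                      # tau = Lambda * t / n
    ps = [l / Lam for l in lams]           # p_j = lam_j / Lambda
    Us = [U(h, tau) for h in terms]
    rho = rho0
    for _ in range(n):                     # apply the mixture E exactly, n times
        rho = sum(p * (u @ rho @ u.conj().T) for p, u in zip(ps, Us))
    Ht = sum(l * h for l, h in zip(lams, terms))
    V = U(Ht, t)
    return np.linalg.norm(rho - V @ rho0 @ V.conj().T, 'nuc')

terms = [X, Z]
lams = [0.6, 0.4]                          # hypothetical weights, Lambda = 1
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
e10 = channel_error(terms, lams, rho0, 1.0, 10)
e100 = channel_error(terms, lams, rho0, 1.0, 100)
```

Increasing n from 10 to 100 should cut the error by roughly a factor of 10, matching the n ≥ 2Λ²t²/ε scaling.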

Remark 7.8.2. You might find it very weird or counter-intuitive that we can approximate a specific unitary by creating an ensemble of random ‘bad’ approximations. However, remember that at the end of a quantum circuit we need to do measurements, and that our output, the only thing we can really ‘see’, is just a probability distribution over measurement outcomes. So the only thing we need for something to approximate a quantum circuit well is for all the measurement outcomes to follow the correct probability distribution. So even though every single run of this protocol might not be a good approximation to the unitary, because we use a different one for every measurement run, the errors can cancel out. We will see a similar idea explored in Section 10.5.2.
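This remark can be made concrete with a small Monte-Carlo experiment (again on an illustrative toy Hamiltonian; the terms, weights, and tolerances below are our own choices): every individual run applies a different random sequence of e^{−iτH_j} unitaries, and it is only the *averaged* measurement statistics that reproduce those of the exact evolution.

```python
import numpy as np

# Toy 2-qubit Hamiltonian H = Σ_j h_j H_j (illustrative terms and weights)
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
terms = [np.kron(Z, Z), np.kron(X, I2), np.kron(I2, X)]
hs = np.array([1.0, 0.7, 0.4])
Lam, t, n = hs.sum(), 0.5, 200
tau, ps = Lam * t / n, hs / hs.sum()   # τ = Λt/n,  p_j = h_j/Λ

# Exact outcome probabilities after evolving |00⟩ under e^{-itH}
H = sum(h * P for h, P in zip(hs, terms))
w, v = np.linalg.eigh(H)
U = v @ np.diag(np.exp(-1j * t * w)) @ v.conj().T
psi0 = np.array([1.0, 0, 0, 0], dtype=complex)
p_exact = np.abs(U @ psi0) ** 2

# e^{-iτP} = cos(τ)·I − i·sin(τ)·P for a Pauli string P
Vs = [np.cos(tau) * np.eye(4) - 1j * np.sin(tau) * P for P in terms]

rng = np.random.default_rng(7)
K = 4000                               # number of independent runs
p_avg = np.zeros(4)
for _ in range(K):
    psi = psi0
    for j in rng.choice(len(terms), size=n, p=ps):
        psi = Vs[j] @ psi              # one random 'bad' run of the protocol
    p_avg += np.abs(psi) ** 2          # computational-basis statistics
p_avg /= K

print(np.abs(p_avg - p_exact).max())   # small: only the average matches
```

Any single run's statistics can be far off, but the ensemble average agrees with the exact evolution up to the O(Λ²t²/n) bias plus O(1/√K) statistical noise.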

The technique described here is a variation on Trotterisation. There are however also other techniques that approximate e^{−it Σ_j H_j} in quite a different way and that also result in very favourable scaling. See the References of this chapter for more on this.

7.9 References and further reading

Path-sums It is hard to pinpoint the earliest appearance of phase polynomials and/or path-sum techniques in the literature, since essentially any computation involving Dirac notation and summations is, in some sense, a path-sum calculation. A notable feature of such a calculation is that it makes it clear that one doesn’t need exponential space to compute a single amplitude of a state vector prepared using a polynomial-sized quantum circuit. This insight plays an important role in proving some of the first complexity bounds for quantum computation, as in e.g. [29], [3], and [97]. The “wire labelling” trick that we used for computing phase polynomials in Section 7.1.1 seems to first appear in the work of [72], under the name annotated circuits. The authors use this technique to show that sum-over-path expressions can be efficiently computed from circuits over a universal gate set (in their case Toffoli+Hadamard), yielding a simple way to relate quantum circuit simulation to counting problems involving boolean polynomials. This was used to give a simplified proof of the inclusion BQP ⊆ PP ⊆ PSPACE of [29].

Optimisation with phase polynomials Dawson et al. only consider phase-polynomial expressions of the form (−1)^f for f : 𝔽₂^m → 𝔽₂ a boolean polynomial. More general expressions that can capture T-like phases (i.e. e^{iπf/4}) were used by [15] to represent CNOT+T circuits and synthesise exact depth-optimal circuits for low numbers of qubits. The first algorithm to use phase polynomials to efficiently optimise large circuits was T-par [14], which used the phase polynomial representation and matroid partitioning to reduce T-depth. The circuit synthesis algorithm we gave in Section 7.2.1 is essentially a simplified version of this technique. Subsequently, phase polynomials were used extensively in circuit optimisation. These optimisations essentially fall into two categories: those that are agnostic to the particular values of non-Clifford phases, such as ‘phase folding’ and parity network optimisation, and those that rely on phases taking particular values such as multiples of π/4 (or more generally π/2^k for k ≥ 2). We will discuss the latter kinds of optimisations in more detail in Chapter 10. Good overviews of phase-polynomial-based synthesis and optimisation methods can be found in the PhD theses of [11] and [116]. There are also techniques for constructing a circuit from a phase polynomial that take into account hardware constraints on two-qubit interactions, see for instance [173].

Phase gadgets and Pauli gadgets A unitary exp(−i(α/2) Z ⊗ Z) is sometimes called an Ising-type interaction and is (up to a global basis change) the unitary that is implemented by the natural 2-qubit interaction in ion trap quantum computers, the Mølmer–Sørensen interaction [215]. The diagrammatic form of this expression was first given in [145], and it was called a phase gadget for the first time in [147], where the optimisation routine of Section 7.6 is from. The compilation of an arbitrary circuit to a series of Pauli exponentials followed by a Clifford circuit is described in [164], where it is used to argue that a simple set of fault-tolerant operations on the surface code that correspond to Pauli exponentials is sufficient for implementing any computation. For more about how this works, see Chapter 11. The original form of Pauli exponentials as ZX-diagrams was given in [64]. The observation that we get a unitary out of a measurement box if we plug in a Z-spider, but get a projector if we plug in an X-spider, can be explained in an abstract way as a duality between measurements and observables that occurs for any pair of strongly complementary observables [110].

Trotter decompositions Trotterisation is named after H. F. Trotter, for the techniques he found in [221]. Masuo Suzuki studied these decompositions in a series of papers, culminating in the definition of higher-order Trotterisations that are now also called Suzuki–Trotter decompositions [219]. The result from Lie theory that any basis of the Lie algebra generates the connected part of its Lie group can for instance be found in [124, Corollary 3.47]. The randomised Trotterisation technique is known as qDRIFT and was developed by Campbell [47]. The current state-of-the-art higher-order Trotter decompositions are given in [183], where they find some settings wherein an 8th order decomposition is the best possible for some realistic Hamiltonians. The bound on the error of Hamiltonian approximations of Exercise* 7.5 is from [109, Appendix A].

Other techniques for Hamiltonian simulation Instead of using Suzuki–Trotter decompositions, there are a couple of other techniques that can be used to do Hamiltonian simulation. There is the technique of linear combination of unitaries (LCU). This gives an approach to approximately, and with some probability, implement a linear map M, where M is given as a sum of unitaries M = Σ_{j=1}^m α_j V_j [30]. This technique works as long as we can implement a controlled version of the V_j, and is efficient when these implementations are efficient and m is not too large. As long as M is close to being unitary itself, this technique succeeds with high probability. To use this for Hamiltonian simulation we realise that by cutting off the Taylor expansion of e^{−it Σ_j H_j} at a certain order, we get a linear combination of products H_{j_1} ⋯ H_{j_k}. As long as each of the H_{j_i} is unitary itself (for instance, when it is a Pauli string), we can use the LCU method. This requires a circuit consisting of O(lt log(lt/ε)) components. While this is better than any of the Trotter techniques asymptotically, the circuits themselves are more complex and require ancillae, so that in practice it might not always be better to use this technique.

Stabiliser decompositions Using the efficiency of simulating Clifford operations in order to boost this to an exponential-time universal quantum circuit simulation scheme was first proposed by [1]. However, the idea of grouping together magic states in order to decompose them into fewer terms and get better simulation time is from [37], where they used a simulated annealing algorithm to find a decomposition of 6 |T⟩ states into 7 terms. This was later improved to a 6-term decomposition in [194], where they also showed that more complicated arrangements of magic states into the shape of a ‘cat’ state allow for even better decompositions. Combining stabiliser decomposition methods with ZX-based optimising was introduced by [148]. In the follow-up work [150], ‘cat’ states were shown to be related to phase gadgets, giving a nice way to incorporate the better decompositions for these states into the ZX pipeline. This paper also introduced the idea of a ‘partial magic state decomposition’, where the terms in the decomposition don’t necessarily have to be stabiliser states themselves; they only have to have fewer non-Clifford resources than the ‘mother’ state. Using this idea, they find a decomposition of |T⟩^⊗5 into 3 terms, each of which contains a single |T⟩; this hence ‘effectively’ removes 4 |T⟩’s at the cost of 3 terms. This is currently the best known generic decomposition.

Stabiliser extent and weak simulation A stabiliser decomposition allows for exact strong simulation by just enumerating all the different terms of the computation. However, we don’t need this exactness when we want to do sampling, i.e. weak simulation. In [33] they introduce the concept of an approximate stabiliser decomposition, which only gives the desired state up to some small error. This already greatly improves the simulation time. The approximate stabiliser rank of a state |ψ⟩ can be bounded using the stabiliser extent, which was introduced by [32], and is equal to Σ_i |λ_i| minimised over all stabiliser decompositions |ψ⟩ = Σ_i λ_i |ϕ_i⟩. A closely related concept is the robustness of magic [131], which is defined for a mixed state ρ as the minimum of Σ_i |λ_i| over all decompositions ρ = Σ_i λ_i |ϕ_i⟩⟨ϕ_i|. Robustness of magic directly upper bounds the weak simulation cost of a computation [131]. Note that these techniques based on approximate stabiliser rank and robustness of magic are instances of quasiprobabilistic simulation techniques. Here the ‘quasi’ means that the probabilities are allowed to be negative, and there are also other techniques that belong to this family [190]. In these methods you have a set of states that are ‘free’ (like Clifford states), and then you write your non-free states as a quasiprobabilistic combination over the free states, with the cost of simulation scaling with the 1-norm of the quasiprobability distribution (the sum of the absolute weights, also called the negativity of the distribution). The ‘gate-by-gate’ simulation technique of Section* 7.8.1.2 was introduced in [34], where they also already observed that this combines well with the stabiliser decomposition approach.