In the previous chapters, we have often seen a dichotomy between two classes of quantum computations. On the one hand, we have Clifford computations, which have a great deal of structure we can exploit to efficiently solve problems like circuit synthesis, deciding equality of computations, and strong classical simulation. On the other hand, we have looked at universal quantum computation, typically arising from adding gates for arbitrary angles . While Clifford angles, i.e. integer multiples of , satisfy many identities, we have so far treated non-Clifford angles as “black boxes”, which don’t seem to satisfy any extra rules, except for trivial ones coming from the spider fusion rule like
and variations thereof. It turns out that for generic angles, this is pretty much all we can do. In fact, as we’ll explain in the References, there is a way to make this statement precise. However, for dyadic angles, i.e. angles of the form for some integer , there is a great deal more structure at play. As we’ll see in this chapter, we can take advantage of this dyadic structure in a variety of ways. First, we will see that it simplifies the problem of synthesising generic unitary maps using just Clifford gates and the first non-Clifford dyadic phase gate . We can characterise the set of unitary matrices that we can synthesise exactly using Clifford+T gates as those whose entries are all within a certain subset, called , of the complex numbers. Using some special properties of this set, and a little (light) ring theory, we can figure out precisely how many gates we need to synthesise such a matrix exactly and do this synthesis efficiently. Note, here “efficient” means efficient in the size of the matrix, not the number of qubits. For generic unitaries, this is the best we can hope for. We’ll also see that for any tolerance , we can approximate any unitary matrix within with a -matrix, which we can in turn synthesise exactly using Clifford+T gates. Second, we will see that for dyadic angles, new rules start to hold that wouldn’t for generic angles. In particular, certain complex configurations of phase gadgets can all cancel each other out in ways that are not implied by the gadget-fusion law we met back in Chapter 7. These so-called spider nest identities come from a particular interaction between the parity (i.e. mod-2) structure of the phase gadget itself and the mod- structure of its angle. Similar to the representation from Chapter 7 of CNOT+phase circuits, we can represent phase gadgets as collections of binary vectors representing parities of qubits where phases get applied, and as before we can stick these vectors together into a matrix. We introduce a new Scalable ZX notation so that we can directly represent these matrices in an efficient way in our diagrams. Using these matrices representing collections of phase gadgets we can also recognise which configurations of phase gadgets will cancel out. Namely, it will be those whose “gadget matrix” satisfies a condition called strongly 3-even. In this chapter, we will work out some properties of strongly 3-even matrices and a closely related -linear structure called Reed-Muller codes. Finally, we will put the pieces together by classifying all the spider nest identities and explaining how they can be used for T count optimisation. Clifford+T circuits are especially important for fault-tolerant quantum computation, so in this chapter we will also be (secretly) laying the groundwork for Chapter 11, where we explain techniques for implementing universal quantum computation within a quantum error correcting code. In particular, in Section 10.5 we will see how we can relate circuits over different types of gate sets together using a technique called catalysis that allows you to perform certain ‘hard’ operations in an easy way as long as you have a suitable catalyst state lying around. Together with the classification of spider nest identities this will prove very handy for when we want to implement non-Clifford gates in fault-tolerant architectures.
Back in Chapter 2, we noted that CNOT gates plus arbitrary single-qubit unitaries are universal for quantum computation, in the sense that they can be used to build arbitrary unitaries over qubits. In this section, we will show that, in fact Clifford+T circuits are approximately universal, in the sense that they can get arbitrarily close to any -qubit unitary. One way to show this, is to show how for any single-qubit unitary there exists some arbitrary close to expressed totally in terms of and gates. There are several ways to prove this. The “classic” way is to show that corresponds to a rotation about some axis of the Bloch sphere by an irrational multiple of . Then, by raising to larger and larger powers, we will eventually land close to any possible rotation around that axis. We can do the same around some other axis, e.g. with , to obtain a pair of rotation gates that suffice to build any single-qubit unitary. It takes quite some work to spell out the details of this argument, and this has been done in several standard textbooks on quantum computing. There are also many variations one can use to obtain more or less efficient decompositions of single-qubit unitaries. We’ll give some pointers to where you can find all the gory details of this approach and variations at end of this chapter. However, in this section, we will start with a totally different approach to synthesis of unitaries in Clifford+T, which is based on number theory. This approach starts from the realisation that the numbers appearing in a unitary matrix built from CNOT, H, and T are not just any old complex numbers, but are actually quite special. First, lets have a look at the matrices again:
Clearly any that we can construct from these gates will have as its entries sums and products of integers, , and the complex phase . If we think about where lies on the complex plane:
| (10.1) |
we see that and . Using these facts, it’s not hard to see that we can already build using just integers, , and :
If we look at just the numbers we can build with integers and , this consists of precisely the rational numbers whose denominator is a power of two. We give this set of numbers a special name.
Clearly , and for , and are also in . Hence, forms a ring. However, unlike the full set of rational numbers, not every has a multiplicative inverse in , so it is not a field. It turns out that, for the purposes of circuit synthesis, this is not a bug, but a feature, since rings can have very interesting properties owing to the fact that we cannot arbitrarily divide numbers by each other. To discuss the single-qubit synthesis algorithm, there are two relevant rings based on we need to study. The first is , the ring obtained by allowing arbitrary sums and products of dyadic rationals with the complex number . The second is , which is restricted just to integer multiples of powers of . Such rings are called ring extensions, i.e. rings obtained by adding one or more elements outside of the original ring and closing under sums and products. Since , we can concretely represent elements of and elements of using quadruples of numbers taken respectively from or :
Exercise 10.1. Define ring addition and multiplication for and as operations acting on quadruples of numbers respectively taken from and .
Clearly any unitary matrix built out of CNOT, H, and T will consist of elements from . Perhaps surprisingly, the converse is also true: any unitary matrix whose elements are in can be constructed exactly as a composition of CNOT, H, and T gates. We will show this by giving a concrete algorithm for synthesising unitaries over in the following sections, then conclude with some remarks on how this lifts to an approximate synthesis algorithm for unitaries over all complex numbers.
We’ll begin by considering the simplest non-trivial synthesis problem we can have: namely preparation of an arbitrary single-qubit state . That is, we want to find a sequence of and to transform into . Equivalently, we can find a sequence of and transforming into , then take the adjoint. One way to do this is starting with a state whose entries are in , and then applying gates to it until all of the entries are in . The following lemma should make clear why this helps us.
Lemma 10.1.2. If is normalised and all of its entries are in , then it must be a computational basis vector, up to a global phase.
Proof. For any element in , we can get an explicit form for in terms of :
| (10.2) |
If we suppose that all the are integers, then the only way to have is when the part here is negative. Let be a normalised state with entries in . Writing where and and , we have . Since we know that the total sum is , the components should cancel: . But all the are negative, so the only way this can happen is if all . Similarly, since all the are positive integers and they sum to , they must all be zero except for one which is equal to . But then exactly one of and is , and the rest are zero. Hence for some . □
If we get to , mission accomplished, up to a global phase. If we get to , we simply apply an gate (i.e. ) and again we are done. So, the name of the game is turning the coefficients of into elements of . One potential strategy is to factor as:
then apply gates to to try and make the coefficients and into even numbers. Then, we can factor out a from and and the leading scalar becomes and we’ve made some progress toward getting a state whose coefficients are in . This does work, but it is a bit tricky to find the gates we need to apply to to “make progress”. We can make life a bit easier if we express differently, as:
| (10.3) |
where is a “special” number that has some nice number-theoretic properties that will help with our synthesis algorithm (for more details on the number theory side of things, check out the advanced section 10.7.1). First, we should be able to check that is indeed a vector over . For this, we should check that , which is not immediate because not every element in has a multiplicative inverse. The elements of a ring that do have multiplicative inverses are called units. We can see that is a unit in by letting and calculating . Conversely, we would like to know that any with coefficients in can be written in the form of (10.3). This is true if any can be written as for some and a high enough power of . Put another way, we need to prove the following property of .
Exercise 10.2. Show that, for any , there exists such that . Hint: Using the fact that , first show that , where is in .
The smallest needed to express in the form of equation (10.3) is called the least denominator exponent (lde). If the lde is , then is already has coefficients in , so by Lemma 10.1.2, it must be a basis state, up to a phase. It is easy to see that is minimal precisely when and are not both divisible by , i.e. when there exists no pair such that and . We can easily check divisibility by using . Since , then it is not necessarily the case that for some . In fact, it will be in precisely when divides . The final piece of the puzzle is the following lemma, which lets us decrease the lde by applying H and T gates.
Lemma 10.1.3. Let be a state where are non-zero. Then there exists some such that for divisible by .
Since become divisible by , we can factor a out and reduce the least denominator exponent by 1. If we repeat this process over and over, eventually the lde will be , so by Lemma 10.1.2 it will be a basis vector. The reason why Lemma 10.1.3 works has to do with the special behaviour of within the ring . Namely, if we start computing numbers modulo (or powers of ) in , we will see that and can always be broken down into a particular form that tells us how many gates to apply to make progress. To understand this, we will need to define the notion of a residue class, which lets us formalise what it means to work “modulo” some element of an arbitrary ring. We will do this and provide a proof for Lemma 10.1.3 in Section 10.7.1. However, if we believe the lemma, we now have a synthesis algorithm for qubit states. In fact, this also already gives a synthesis algorithm for single-qubit unitaries. This is because the columns of a unitary matrix must be orthogonal, so if we send the first column to a basis vector, the second column will also get sent to a basis vector, up to a phase. So after transforming the basis vector to the correct positions and phases, we will have synthesised the entire single-qubit unitary. We summarise this procedure in Algorithm 4.
Let and be the columns of . Since is unitary, and must be orthogonal.
Express as:
where is the least denominator exponent.
Try to apply for all to to obtain where divides and .
Factor out a to obtain:
Repeat until the lde is to obtain a sequence of gates sending to . Optionally, apply a final gate to send to .
Since unitaries preserve orthogonality, the same sequence of gates will send to . Perform a final to remove the relative phase between and . Then .
Return , which implements up to a global phase.
Example 10.1.4. Consider the following unitary over :
We need to first express in terms of a matrix multiplied by , where is the lde of . This can be accomplished by multiplying by repeatedly until we get a matrix whose entries are in . For our chosen , we needed to multiply by three times to get a matrix, giving us the following expression:
So, we have an initial lde of 3. We then try to apply for some followed by an gate to reduce the lde. Picking (i.e. just applying an gate) does not reduce the lde, but reduces the lde from 3 to 2:
The lde is not zero yet, so we do another iteration, this time finding we can reduce the lde at . Namely, if we apply followed by , we reduce the lde from 2 to 0:
Our matrix now contains only elements of , so we know the first column must be a basis vector, up to a phase. We find it is . We can change it to by applying an gate:
Finally, we finish by correcting the relative phase with a final application of , obtaining identity, up to a global phase:
Moving everything but to the RHS, we conclude that:
Suppose we want to use the exact synthesis algorithm from the previous section to approximate arbitrary, single-qubit unitaries. We now know that we can build any unitary matrix over the ring , and it is not hard to see that any complex number can be approximated to arbitrarily high precision by some element of . Indeed, if we take for and some suitably high values of , we can get as close as we like to an arbitrary complex number. We seem to be most of the way there on coming up with an approximate synthesis algorithm. The problem is, if we go through a complex-valued unitary matrix element-by-element and approximate each complex number with an element of , odds are we won’t get something unitary but just very nearly unitary. So we need a better idea. Like in the exact synthesis case, inspiration comes from looking at the least denominator exponenent. The idea is, for a target unitary matrix and a fixed error bound , we progressively raise the denominator exponent until we can find some where
is a matrix over ,
is unitary, and
for some global phase , .
It turns out condition 2 is already quite restrictive. Since is a unitary, its columns need to be normalised. Hence, the norm-squared of the columns of must be . Using this fact, we can prove that for fixed , there are only finitely many to choose from.
Exercise 10.3. Show that, for a fixed , there are only finitely many such that for any constant . Show that this implies there are only finitely many matrices over such that is unitary for any fixed . Hint: Use (10.2) to get an explicit form for and in terms of their integer coefficients.
Since we already know that there are finitely many satisfying conditions 1 and 2 above for fixed , we know that there must also be finitely many satisfying all 3 conditions. Hence, we can enumerate them for a fixed . If we find a solution, we accept it. Otherwise, we increase and try again. It turns out that there is a way to do this enumeration of candidates satisfying conditions 1–3 efficiently, and that it terminates for . The way this works has to do with properties of certain discrete subsets of the complex plane, which are a bit advanced for our purposes. Hence, we’ll finish our story on synthesis here, and make a few more remarks about this in the advanced Section* 10.7.3.
So far in this book we have used ZX-diagrams where each wire represents a single qubit. As we have seen, this already allows us to do a lot of interesting stuff. However, sometimes there is repetitive structure in the diagram that we should really try to abstract away, so that we don’t miss the forest for the trees. The way we do that will be with scalable ZX notation. This notation allows us to compactly represent operations on registers of many qubits, while still maintaining much of the flavour of calculations with standard ZX-diagrams. We will represent a register of qubits as a single thick wire:
| (10.4) |
Sometimes we will need to split a register into two registers, peal one qubit off to do something with, or merge registers together again. For that we introduce a little bit of extra notation:
| (10.5) |
We call these operations divide and gather. They don’t actually do anything as linear maps (they are just the identity), but they help us compactly write down more complicated maps. They come with some simple rewrites:
| (10.6) |
Note that in general we will only label the thick wires with the number of qubits they represent when it is necessary for clarity, and leave it implicit otherwise. Just being able to represent a qubit register as a single wire is not that useful, so let’s introduce some things we can do with them. We will use bold spiders (with a thick border) to represent a product of (unconnected) copies of a Z or X spider as a bold spider:
| (10.7) |
Note that in this notation each thick wire connected to the same spider needs to represent the same number of qubits, otherwise it is not a valid diagram. These bolded spiders can be labelled by a phase, and this phase then gets applied to each qubit separately. As the bolded Z- and X-spiders represent non-interacting parallel spiders, all the standard ZX rewrites still apply to them, and hence we can still do spider fusion, strong complementarity, etc. We can push spiders through a dividers and this in fact gives us a type of bialgebra between the Z/X spider and the dividers and gatherers (cf. Section 3.2.4):
| (10.8) |
Note that this also works when the spider has no additional outputs:
| (10.9) |
The most important component of the scalable notation, and what makes all of this worthwhile, is a new piece of notation called the matrix arrow, or just arrow for short, which allows us to represent arbitrary connectivity from Z-spiders to X-spiders using an biadjacency matrix :
| (10.10) |
Taking the convention that represents the entry in the -th column and -th row of the matrix , we have in Eq. (10.10) that if and only if the -th Z-spider on the left is connected to the -th X-spider on the right. Just using a single matrix arrow we can hence write down an arbitrary phase-free ZX-diagram in parity normal form (Definition 4.2.2), by directly writing the biadjacency matrix on the matrix arrow. Concretely, Eq. (10.10) corresponds, up to scalar factors, to a linear map that acts as on computational basis vectors:
| (10.11) |
Note that we treat the bit string as a column vector for the purposes of matrix multiplication. Let’s consider some special cases for the matrix arrow, an all zero matrix and the identity:
| (10.12) |
Using Eq. (10.11), we can see that when we compose matrix arrows, we just calculate the matrix product:
| (10.13) |
We can also prove this diagrammatically by unrolling the definitions and using strong complementarity on all the internal spiders (this is essentially doing the reduction to parity normal form from Chapter 4). Spiders and arrows interact in nice ways. First, we have two “copy” laws allowing us to push arrows through spiders:
| (10.14) |
These can be proven by unrolling the definitions and using strong complementarity and spider fusion.
Second, we can express block matrices in terms of spiders:
| (10.15) |
We can combine these operations to get a decomposition of a matrix into smaller matrices:
| (10.16) |
We could view this as the definition of the matrix arrow, because this allows us to define it inductively, starting from trivial components and combining these into larger matrices. The block matrix and composition rules also imply a graphical rule for the sum of two matrices.
Collections of phase gadgets turn out to look quite simple when using scalable ZX notation. First, let’s see how we can represent a single phase gadget:
| (10.17) |
Here is the row vector , and hence the matrix arrow represents a collection of Z-spiders connected to a single X-spider, as needed. If instead the phase gadget is only connected to a subset of the wires, then we can replace by a vector where iff the gadget is connected to the th qubit. We can then see how we can represent a composition of phase gadgets in scalable notation:
| (10.18) |
This construction generalises to any number of phase gadgets, for which we then get:
| (10.19) |
In this matrix each of the rows describes a phase gadget, and each of the columns corresponds to a qubit. Hence, iff the th phase gadget is connected to the th qubit. Note that we can prove the gadget fusion rule of Section 7.1.2 in the scalable setting using what we have seen. This applies when we have two phase gadgets with the same connectivity, and hence the same row appears twice in the matrix:
| (10.20) |
Conversely, if we have gadgets with phases that are not , but multiples of that, we can also represent them like (10.19), by unfusing them into additional rows of the matrix (doing (10.20) in reverse).
In Chapter 9 in order to construct the Toffoli and CCZ gate using more low-level gates, in particular T gates, we used a Boolean Fourier transform to switch from the multilinear phase polynomial used in the CCZ gate to a phase polynomial built out of XOR terms that can be constructed using phase gadgets. In essence this all boiled down to the equation:
| (10.21) |
We used this equation to argue for the following diagrammatic equality:
| (10.22) |
But there is nothing special about the phase of the CCZ gate here, and in fact we can write a similar equation for a gate:
| (10.23) |
Now when both sides of this equation are obviously equal to the identity (just copy some spiders and cancel some identity spiders). But this should then also hold for , and then this fact becomes less obvious. On the left-hand side we then have so that it is still the identity, but on the right-hand side we get a bunch of phases. In that case we can show this by using the eigenstate identity of Exercise 3.15, the Euler decomposition of the Hadamard and local complementation:
| (10.24) |
In this case we can hence still prove this identity with the tools we have already seen. But when we generalise Eqs. (10.21) and (10.23) to work with more than 3 wires we start getting new and very useful identities. As the number of XOR terms blows up exponentially it will be helpful to introduce a slightly more compact way to talk about them. Note that we can represent a parity like with a bit string since . We can hence write Eq. (10.21) more compactly as
Here and the sum over the goes over all the bit strings except for . We can then easily write down the generalisation of this equation to variables as follows:
| (10.25) |
Now if we take , and we consider the CCCZ gate, which applies the phase polynomial , then applying Eq. (10.25) would result in a bunch of phase gadgets with a phase of . But this is the Clifford+T chapter, so we want phases. We can get those by instead considering the trivial phase polynomial . This then results in a constellation of phase gadgets:
| (10.26) |
While this might look like some sort of confusing alien spacecraft, there is some order to the picture above: it contains all the possible phase gadgets on four qubits: all those with one leg (the phases directly on the qubit wires), two legs, three legs, and the single four-legged one. All the gadgets with an odd number of legs have a phase of , and all the gadgets with an even number of legs have a phase of . Eq. (10.26) is a genuinely new diagrammatic equation, a type of equation we call a spider-nest identity. As we will see in this chapter and the next, there are many uses of such equations. As a first application, note that if we bring the four-legged phase gadget to the other side of the equation that this says that whenever we have a four-legged gadget with a phase, we can replace it by a collection of 14 three-, two- and one-legged phase gadgets involving all of the four-legged phase gadget’s legs. In fact, we can decompose any -legged phase gadget with a phase of into a collection of phase gadgets with a most 3 legs. For instance, when :
| (10.27) |
So while we would a priori think that we could need different phase gadgets, one for each possible parity, we see that we actually only need , only those with at most three legs. A different use-case for Eq. (10.26) is that we can use it to optimise the number of gates needed for a circuit. When we have a collection of phase gadgets with phases we can look for any subset of four qubits that has many phase gadgets and then use a version of Eq. (10.26) where we have those phase gadgets on the left and all the other parities on the right. Then by applying this equation we essentially ‘toggle’ which phase gadgets on these four qubits were present. As long as we started out with at least half of all the possible phase gadgets present, so at least 8, we end up with fewer phase gadgets. If we had 8 phase gadgets, then we get phase gadgets at the end. If we had , then we end up with of them. The exact phases, whether or is not important for this, since if the signs don’t match, this just introduces a Clifford phase gadget.
Exercise 10.4. In Eq. (10.26) the two-legged and four-legged phase gadgets have a phase. Show that by unfusing a phase gadget from the four-legged one and applying a set of rewrites similar to those in Eq. (10.24) that we can rewrite it to a collection of phase gadgets all having a phase.
Using scalable ZX notation, we can represent a collection of phase gadgets using a parity matrix to describe the connectivity of the gadgets as in (10.19). The 4-qubit spider nest identity of Eq. (10.26) with 15 gadgets (more specifically, the modified version of Exercise 10.4 where all the phases are so that we can get rid of the repeated rows) then corresponds to the following matrix:
| (10.28) |
Note that we write here the transpose of the matrix to use the space on the page a bit better. Any spider nest identity will be an identity of the form:
| (10.29) |
This means that when hunting for such identities, we are really looking for a particular type of boolean matrix. To find out what kind of matrices we need, we need to look at the pseudo-Boolean Fourier transform again. This time however, we want to translate phase gadgets back into controlled phase gates. The ‘inverse’ of the equation (10.25) which relates an AND to a sum of XORs is given by:
| (10.30) |
Here , and is the number of elements of . Hence, each term has a weight that is a power of in the decomposition. Now, if we are considering this in a phase polynomial, where our phase gadget has a coefficient of , then in the decomposition the different terms will have a weight of . Since these are phases, we are working modulo so that when each term is a multiple of and disappears:
Hence, each phase gadget corresponds to a collection of gates (the linear terms), CS gates (quadratic terms) and CCZ terms (cubic terms).
| (10.32) |
If instead of a single gadget we have a collection of phase gadgets, then we can add together the respective CS and CCZ gates of their decompositions. It turns out that such a circuit can only be equal to the identity if all the gates cancel in the trivial way. That is, each qubit should have a number of gates that is multiple of 8 since , each pair of qubits should have a multiple of 4 CS gates and each triple should have an even number of CCZ gates.
Exercise 10.5. Let be a circuit consisting of , CS and CCZ gates and suppose we have done all trivial cancellations as described above. Show that if implements the identity, that then must be the empty circuit. Hint: First consider input states with Hamming weight 1 to show using that there can’t be any gates, then consider inputs with Hamming weight 2 to show there can’t be CS gates, and finally consider input states with Hamming weight 3 to rule out CCZ gates.
From Eq. (10.31) we see that if a phase gadget involves the qubit , that then a gate appears on that qubit. If it connects to both qubits and then there will be a CS gate on those qubits, and similarly a CCZ appears when the qubits and are in the phase gadget. Hence, a collection of phase gadget implements the identity when each qubit is part of gadgets (), each pair of qubits is part of gadgets (), and each triple is part of gadgets (). We can formalise this cancellation property as follows.
Definition 10.3.1. We say a matrix is strongly 3-even when
That is, when each column, product of pairs of columns and product of triples of columns has a Hamming weight that is a multiple of respectively 8, 4 and 2. We say is 3-even when all three conditions only hold modulo 2.
When we represent a collection of phase gadgets as a set of bit strings , where means the th phase gadget is connected to the th qubit and set the matrix we see that is strongly 3-even precisely when the gadgets make a spider nest.
Being just 3-even, as opposed to strongly 3-even, is a weaker condition that means the collection of gadgets is not exactly equal to the identity, but instead are equal to some Clifford unitary: instead of the gates exactly cancelling due to them appearing a multiple of times, they only appear a multiple of times on each qubit and hence combine into gates, which are Clifford. Similarly, the CS gates appear an even number of times to create CZ gates, and the CCZ gates still completely cancel. For convenience we will also call a set of gadgets that is equal to a Clifford (and hence corresponds to a 3-even matrix) a spider nest identity. We can find which Clifford it implements by rewriting all the gadgets using Eq. (10.31). However, if we have gadgets with many legs this can involve exponentially many terms. There is also an efficient way to do it.
Exercise 10.6. Describe an efficient procedure that determines the stabiliser tableau for a circuit consisting of phase gadgets with the promise that it implements a Clifford unitary. Hint: It is diagonal, so all the Pauli ’s trivially commute through. Pushing an through the circuit however results in some additional phase gadgets that can be commuted to the end. If the unitary is Clifford these should all be decomposable into Paulis.
We can now see when a collection of phase gadgets is a spider nest identity: check whether its associated parity matrix is strongly 3-even. Diagrammatically this is not very satisfying however, as it doesn’t tell us how to find strongly 3-even matrices or what is required to diagrammatically prove all these identities. In this section we will see that we can build all spider nest identities from one particular one. To do so, we need to go back to the 4-qubit spider nest Eq. (10.26) which we arrived at by decomposing the ‘trivial’ phase polynomial into phase gadgets using Eq. (10.25). We can similarly get a 5-qubit spider nest by decomposing . This then will have different phase gadgets. Note the factor of instead of which is needed to get a collection of phases instead of phases. We can in the same way build an -qubit spider nest by decomposing , which will result in a spider nest with phase gadgets with a phase. Now, let’s take the 5-qubit spider nest, and then ‘subtract’ the 4-qubit spider nest from it:
| (10.33) |
Here on the left-hand side we first have all the phase gadgets on five qubits, but then we subtract from those all the phase gadgets that don’t act on the first qubit. On the right-hand side we are then left with precisely those phase gadgets that do involve the first qubit. These are hence the parities like , , …, . Since these parities can involve any combination of the other four qubits, there are such parities (or alternatively you can see that we subtract parities from parities in Eq. (10.33) to arrive at parities).
Exercise 10.7. Generalise Exercise 10.4 by showing that for any , the collection of all phase gadgets with phase is the identity. Use this to argue that in the identities described above where we subtract two fully connected spider nests from each other we also get an identity that only has positive phases.
It is this 16-gadget identity (10.33) that turns out to generate all the other ones. We need to do some work to see that though. The first step is to construct such collections of phase gadgets in a more systematic way.
Intuitively, this inductive definition results in a phase gadget connecting the single input wire to every subset of the output wires. For example:
where the last step follows from applying strong complementarity to the marked spider pair, and then applying spider fusion as much as possible. Let’s formalise this intuitive explanation of using scalable notation. Let be the matrix whose rows consist of all -bit strings. That is, the matrix defined inductively as follows:
where and are respectively the column vectors of all ’s and all ’s. For example, we have:
| (10.35) |
Proof. First, note that:
| (10.37) |
Using this equation and the scalable rules, we can prove (10.36) from (10.34) by induction on :
Now, we can put in a circuit and it can represent exactly the 16-gadget 5-qubit identity (10.33):
| (10.38) |
By plugging into all the inputs and yanking the first output to be an input we can present this in a slightly simpler way:
| (10.39) |
We call this the (S4) rule, and it is enough to prove all the spider nest identities. First, let’s note that because disconnects, all the for also disconnect.
In Eq. (10.38) we used to represent the spider nest consisting of all gadgets connected to the first qubit. By using for and using Lemma 10.3.5 we see then that the collection of all gadgets on qubits that are connected to the first qubit is also an identity. We can generalise this to the set of all gadgets connected to the first qubits. To do this, note that if we connect to an X-spider on the left that we obtain the following:
| (10.41) |
where is the matrix where every entry is 1. Hence, to represent a connection to the first qubits, we can just compose with an X-spider on its inputs. This then also leads to an identity:
| (10.42) |
Note that the special case of gives us the set of all phase gadgets on a set of wires, like the 15-gadget identity (10.26) (in that case has 16 rows, but the all-zero row of corresponds to an unconnected phase that can be removed as a scalar). As it turns out, every spider nest identity can be decomposed into a composition of the identities of the form (10.42), and so (S4) indeed suffice to prove all of them. To show this we will need some more machinery however.
In the previous sections we saw that we can reduce a collection of phase gadgets to a series of bit strings denoting the connectivity of the gadgets:
Here we are ignoring the phase gadget with a phase, as it is Clifford. We saw in Eq (10.19) that these bit strings can be stored in one big matrix and in this way we can efficiently write down a collection of gadgets in scalable notation. Storing them all in a matrix is however not the only way in which we could capture the information of a set of bit strings. We could instead represent them using its Boolean indicator function. That is, to a set of bit strings we associate the function defined by iff . We should note two important details about representing a collection of gadgets by its indicator function. First, this representation cannot deal with repeated gadgets / bit strings, and so this does not capture the exact phase of the gadgets (whether it is or for instance). This means that when representing a collection of gadgets by its indicator function that we only represent it up to some Clifford information. Second, because corresponds to a phase gadget not interacting with any qubit, we don’t care about the value of the function at this value. It would hence be more accurate to write the functions as , but we will ignore this detail for now. Any Boolean indicator function corresponds to a collection of phase gadgets: a gadget with connectivity is in the collection if . Some collections of gadgets are spider nests, so let’s call a spider-nest function when its corresponding collection of gadgets forms a spider nest (up to a possible Clifford unitary). I.e. if a collection of bit strings forms the rows of a 3-even matrix, then is a spider-nest function. The indicator function of the 4-qubit spider nest of Eq. (10.26) is the constant function that always returns 1, since every phase gadget is part of the spider nest. The 5-qubit spider nest of Eq. (10.38) that contains all the gadgets using the first qubit has as indicator function that maps , since any bit string is part of the set of gadgets if . The spider nest of Eq. (10.42) that uses all gadgets touching the first qubits has indicator function , which acts as . These spider-nest functions are examples of Boolean monomials. A monomial is a function constructed by multiplying simple bit-indicator functions together. For instance . We call the number of bit-indicator functions in such an expression the degree of the monomial. For instance the spider-nest function has degree . It turns out that in general, any monomial of degree at most corresponds to a spider-nest identity. Indeed, for is a spider-nest function. By permuting the qubits these give us all monomials of degree at most . The number 4 in comes from the fact that the smallest spider-nest identity, corresponding to the Boolean function acts on four qubits. Note that the spider-nest functions form a linear space: suppose that both correspond to spider nests. Then if we take the composition of all the phase gadgets with parities in and that of we end up with a new set of phase gadgets covering all the parities in and of . However, the gadgets that are in both and will fuse and hence get a Clifford phase. We are ignoring the Clifford unitaries, so we see then that the collection of non-Clifford phase gadgets corresponds to the symmetric difference . The indicator function is then . As this XOR of functions is just the sum in , we see that any sum of spider-nest functions is again a spider-nest function, so that the spider-nests form a linear subspace of all Boolean functions. In particular, we can take a sum of monomials that are all spider-nest functions and create a Boolean polynomial that is a spider-nest function. Any Boolean function can be written in a unique way as a polynomial where are the coefficients that determine which monomials are in the decomposition of . The degree of a polynomial is then the maximum degree of its monomials. As a sum of spider-nest functions is still a spider-nest function, we then see that any Boolean polynomial of degree at most is a spider-nest function. What about the converse? Does any spider-nest function correspond to a degree at most polynomial?
Lemma 10.3.6. A matrix with columns is 3-even if and only if its indicator polynomial is of degree at most .
Proof. Let be a matrix obtained from by removing all repeated pairs of rows. is 3-even if and only if is, and both matrices have the same indicator polynomial. Hence, we can assume without loss of generality that has no repeated rows. Write for the indicator function of . Let denote the set of -bit polynomials of degree at most . We need to show that . Define an inner product on Boolean functions by . We call and orthogonal when and we define as the space of functions that are orthogonal to all functions in . We claim that . With this claim it then remains for us to show that . Let’s do that first. Let . This is then a polynomial with iff and . Hence is equal to the Hamming weight of the th column of . This also works for products of columns: for , we have equal to the Hamming weight of the element-wise product of the and th rows, which is hence zero mod 2 because of 3-eveness. Now so that is orthogonal to all degree-3 monomials . These span , and hence as desired. Now to prove the claim first note that if a polynomial of variables has degree less than , then . This is easy to check for monomials, as any monomial of degree must omit some variable , so that
By -linearity of the map this then holds for all polynomials. Now, for any polynomial of degree at most and of degree at most , has degree at most . Hence . This implies . To show this is actually an equality, we will do dimension counting. Note that for a -vector space and we have because the dimension of is restricted by independent linear equations specified by a basis of . Since has the monomials of degree as its basis, . By manipulating binomial coefficients, we can then see that:
so that and these spaces must be equal. □
We then see that we have proven the following.
Theorem 10.3.7. Let describe the connectivities of gadgets with a phase acting on qubits. Then the following are equivalent.
Theorem 10.3.8. The Clifford ZX rules plus the (S4) rule suffice to prove all spider-nest identities. That is, given any collection of gadgets that implements a Clifford unitary, we can rewrite this into this Clifford unitary using just the regular Clifford ZX rewrite rules together with (S4).
Proof. Let be the 3-even matrix describing a spider-nest identity. Then its corresponding indicator polynomial is a sum of monomials of degree at most . Let the corresponding matrices of these monomials be . We have already shown how to prove the spider-nest identities corresponding to the in Eq. (10.42), hence we can freely introduce them into the circuit of gadgets described by :
Then, the indicator polynomial of is . Hence, every row in appears an even number of times. Using gadget-fusion, we can therefore reduce all angles to integer multiples of . Hence, the entire diagram is then Clifford and can be rewritten into a Clifford circuit. □
We see then that when we restrict to just thinking about what we can do with diagrams, all the complexity of strongly 3-even matrices and degree Boolean polynomials reduces to just adding (S4) to the Clifford rules. Do note though that in the proof above we needed to know about Boolean polynomials and its decomposition into low-degree monomials in order to find which rewrites we should be applying to prove the spider-nest identity. In addition, the matrices corresponding to the monomials might contain exponentially many rows, and hence this rewriting is not efficient. In fact, it seems very likely that an efficient rewrite strategy for spider nests should not exist (see Exercise 10.9). Forgetting about all these details again, we can see that we can rephrase this result into a completeness result, which very neatly ties in some of the earlier completeness results we have seen.
Theorem 10.3.9. The Clifford rules plus (S4) are complete for CNOT+ circuits. That is, given two CNOT+ circuits and written as ZX-diagrams, we can rewrite into using just the Clifford rules and (S4).
Proof. Note that we can trivially rewrite to by consecutively introducing pairs of cancelling gates from and . Hence, if we can show that can be rewritten to the identity we are done. In Section 7.1 we saw how we can write any CNOT+Phases circuit into a layer of phase gadgets followed by a CNOT circuit. Now since , it must be the case that both the phase gadget part and the CNOT circuit implement the identity. Hence, we can use the completeness of phase-free ZX (Theorem 4.3.6) to rewrite the CNOT circuit into the identity and Theorem 10.3.8 to rewrite the collection of phase gadgets, which necessarily forms a spider-nest identity, into a Clifford. This Clifford still must be equal to the identity, and hence by Clifford completeness (Theorem 5.5.7) it can be rewritten into the identity. □
We have now seen that there is a large number of configurations of gates that actually correspond to Clifford circuits. Getting rid of non-Clifford parts of a circuit is usually a good thing, as we’ve seen that we can do a lot of rewriting and simplification with the Clifford parts of a circuit. In addition, as we will see in Chapter 11, the gate in particular is quite costly to implement in many fault-tolerant architectures, and so we want to include as few of them as possible. The spider-nest identities suggest a simple rewrite strategy to optimise the number of gates in a Clifford+ circuit. First, since spider nest identities apply to a collection of phase gadgets, and hence to CNOT+ circuits, we need to split up our Clifford+ circuit into CNOT+ subcircuits. We can view the Clifford+ gate set as consisting of CNOT, Hadamard, and . Of these, only the Hadamard is not in the CNOT+ set, and so we need to ‘split our circuit on Hadamards’. Then, pick a number of identities, preferably not containing too many gadgets and not acting on too many qubits. We want to see where in the CNOT+ circuit we can apply an identity so that it reduces the -count. This is done as described in the beginning of Section 10.3: for each identity in our list, we check whether more than half of the gadgets are present in the circuit. If this is the case, then we add all the gadgets from the identity to the circuit, which by gadget fusion makes the matching gadgets already present in the circuit into Cliffords, and adds the other gadgets as non-Cliffords. This is repeated greedily until none of the identities has an overlap of more than half with the gadgets present in the circuit. Note that if we have qubits in our circuit, then to match an -qubit spider-nest identity, we need to check different groupings of qubits to see whether the identity ‘matches’ there. Hence, as long as is bounded, the complexity of this algorithm is polynomial: . In practice, only checking spider nests with up to qubits, and using a simple heuristic based on the ‘density’ of the number of gadgets on a set of qubits, the run-time can be made quite reasonable. Of course such an algorithm is only a heuristic, and is heavily dependent on the type of identities we include in our search, and since we are applying the identities greedily, you might get stuck in a local optimum. The problem of optimisation of CNOT+ circuits using spider nests is actually related to two well-known problems in computer science, which offer interesting and useful perspectives on this problem.
We have seen that -qubit spider nests correspond to Boolean polynomials of degree at most . In addition, we have a correspondence between Boolean functions and collections of spider nests specified by their parities as . Let’s call the set of all -bit Boolean functions , and the set of -bit spider-nest functions . Now, if we naively implement the set of gadgets corresponding to by just implementing each of the gadgets in turn, then this will require number of gates. We call the Hamming weight of . It is the number of s in ’s truth table. Note that we do not include in this sum, as this corresponds to the trivial phase gadget not connected to any qubit. However, we don’t have to naively implement , because we know that all are actually free to implement: these correspond to Cliffords and hence do not require any gates. Instead of directly implementing , we can implement , which corresponds to applying the spider nest identity of to the collection of gadgets of . The cost of this implementation will then be . To implement with as few gates as possible we are hence looking for a such that is minimal. Let’s state this problem again in a slightly different way, and a bit more abstractly. We have a vector space with a specified subspace . Given a vector , we want to find the closest to , i.e. such that is as small a vector as possible (in some norm). This is known as a linear decoding problem, where is our code space consisting of code words , and is our ‘noisy message’ we are trying to decode. We will spare you the details for now, as we will have a lot more to say about linear codes in Chapter 11. In our case the subspace consists of the spider-nest functions, which we know to be all the degree Boolean polynomials. This code space actually has a name: it is the degree Reed-Muller code, and hence optimising the -count of a CNOT+ circuit corresponds to decoding the Reed-Muller code. To summarise this optimisation approach now a bit more concretely: We start with a unitary that is built out of phase gadgets with phases that are multiples of . We take its corresponding Boolean function . We then find the ‘closest’ degree polynomial such that is as low as possible (corresponding to decoding the Reed-Muller code). We then implement the circuit corresponding to by composing its phase gadgets. We know that is equal to up to some Clifford. We find the Clifford such that (see Exercise 10.6). Then is our new circuit, and this has -count . Reed-Muller codes are used a lot in practice and their decoding problem has been extensively studied. So in principle we could use such a decoder to optimise the T-count of a circuit. However, the codes that are used in practice mostly have a size, corresponding in our case to the number of qubits , that is not too large. However, we don’t want to restrict to just small , so that is a problem. Decoding Reed-Muller codes for large is believed to be a hard problem, so we don’t expect there to be an efficient algorithm to optimally minimise the T-count of a CNOT+ circuit. For optimising the T-count of general Clifford+ circuits (that are allowed to contain Hadamards), there is a simple argument to see that T-count optimisation is NP-hard.
Exercise* 10.9. In this exercise we will work towards a simple proof that determining whether a Boolean function is satisfiable reduces to -count optimisation of general Clifford+ circuits, and hence is -count optimisation is NP-hard. Let be some Boolean function, which is described as some poly-size Boolean expression consisting of AND, XOR and NOT terms. We we want to determine whether there is a such that . We have seen in Chapter 9 how we can construct , the -qubit unitary acting as , using Clifford+ gates.
Let the circuit be defined as follows:
| (10.43) |
Show that is a diagonal unitary which can be described by the following path-sum expression:
| (10.44) |
Show that if is not satisfiable or everywhere satisfiable (meaning for all ) that then is a Clifford unitary (up to global phase) and hence can be implemented with -count zero.
If is satisfiable but not everywhere satisfiable, then by definition there exist and such that and . Then it is easy to see from Eq. (10.44) that
Show that in this case is not Clifford and hence it’s -count non-zero, by finding a Pauli string such that is not in the Pauli group. Hint: You don’t have to calculate the full operator . For the right choice of it is enough to observe that maps to something that a member of the Pauli group could not.
Now, note that if we could efficiently determine the optimal -count of any circuit, then for a given we could construct and ask whether it’s -count is zero: if it is not then we know it has to be satisfiable. If it is zero, then either the circuit is not satisfiable or everywhere satisfiable. We then just check the value to see which is the case. We can hence in either case efficiently determine whether is satisfiable, an NP-complete task.
There is another way we can formalise the optimisation of the number of gates in a diagonal CNOT+ circuit. To do this, we again need to consider the multilinear decomposition of a phase gadget as in Eq. (10.31):
| (10.45) |
In particular, the action of an arbitrary collection of phase gadgets can be represented by some degree-3 multilinear polynomial
Here the coefficients can be taken modulo 8, modulo 4 and modulo . By Exercise 10.5 two collections of phase gadgets correspond to the same polynomial if and only if they implement the same linear map. Similarly, two phase gadget circuits are equal up to a Clifford when the coefficients of their polynomials have the same parities . Hence, if we don’t care about the Clifford part of a computation, we can forget that the coefficient should be taken modulo 8, and instead take it modulo 2. Similarly we can take modulo 2 instead of 4. This is nice, because this information about the coefficients modulo 2 can be captured in a single object.
Definition 10.4.1. An -dimensional binary 3-tensor is an element where we label the components of the vector by for indices . We say is symmetric when for all indices , i.e. when is invariant under permuting its indices.
We define the symmetric 3-tensor corresponding to a degree-3 multilinear polynomial by setting
for . All other coefficients of are then completely determined by symmetry. It is clear that from any symmetric 3-tensor we can also read of the coefficients of a degree-3 multilinear polynomial, which then corresponds to a phase gadget circuit. Note though that if we start with a phase gadget circuit, find its 3-tensor, and then translate it back into a phase gadget circuit, that we do lose some Clifford information in the process, and the resulting circuit is only equal to the original one up to some Clifford. Let’s look at what the 3-tensor of a single phase gadget looks like. Let describe the connectivity of the phase gadget. Then by Eq. (10.45) the corresponding multilinear polynomial has coefficients , and . That is: there is a gate on all the qubits the gadget is connected to, a on all pairs of qubits it is connected to, and a CCZ on all triples of qubits it is connected to. This means the 3-tensor corresponding to the gadget has a particularly simple form: . 3-tensors that have this form are said to have rank 1. When we have a set of gadgets , the circuit has the corresponding tensor , and hence it is a sum of rank 1 tensors. Conversely, any way to write as a sum of rank 1 tensors corresponds to a way to implement it as a series of phase gadgets.
Definition 10.4.2. A symmetric 3-tensor is rank 1 when it is of the form for some vector . For a general symmetric 3-tensor its rank is the minimal number of terms needed to write as a sum of rank 1 tensors, and we call such a sum a decomposition of .
We see then that we have the following strategy for optimising the -count of a phase gadget circuit: first find its corresponding 3-tensor. Then find a decomposition of this tensor into as few rank 1 tensors as possible. These rank 1 tensors directly correspond to an optimised set of phase gadgets which implements the same linear map as the original circuit up to a Clifford. Find what Clifford this is using the procedure of Exercise 10.6. The resulting -count of the circuit is exactly equal to the rank of the decomposition we found. In particular, determining the optimal -count of a given phase gadget circuit is equivalent to determining the rank of a symmetric 3-tensor. Unfortunately, determining the rank of a (symmetric) 3-tensor is believed to be a hard problem. Fortunately, there are some good heuristics that try to find low rank decompositions that work well in practice. We will say a bit more about these in the References of this chapter.
We saw way back in Section 3.3.1 that we can implement a gate by injecting the magic state into the circuit:
This is useful, because it is sometimes difficult to directly implement the gate (as we will see in the next chapter), and instead having the ability to prepare magic states ‘offline’ and inject them when needed is preferable. However, in this circuit we consume the magic state when we inject the gate. Wouldn’t it be nice if we could preserve this state instead? This does turn out to be possible, if we use a different injection circuit that contains some other non-Clifford gates. In particular, using a CS gate and a single state, we can apply a gate and get the starting state back:
| (10.46) |
Here we used the decomposition of the CS gate written using an H-box (Section 9.2) into elementary gates:
| (10.47) |
Just using Clifford operations and CS gates, it is not possible to construct a gate. We can see this, because the matrices produced by the Clifford+CS gate set have entries in a ring that does not include . However, with Eq. (10.46) we see that as soon as we have just one state available to us, we can use gates to apply as many gates as we want:
This is an example of what we call catalysis: a process that needs some resource to be present, but doesn’t consume that resource. In this case the resource is and the process is the implementation of a gate using a CS gate. Another example of catalysis is using a magic state, Clifford operations and a single gate in order to get 3 states out:
| (10.48) |
So again, if we can perform CCZ and Clifford gates and have just a single available, then we can inject as many gates as we want. There are three things we can do with catalysis of magic states that we will explore in this section. First, in some cases it turns out to be easier or cheaper to perform a CS or CCZ gate than a gate, and then these methods allow us to save resources. Second, they allow us to prove that some gate sets are already computationally universal, even if they are not (obviously) approximately universal for unitary synthesis. And third, catalysis gives us a nice way to extend complete graphical calculi to larger domains.
In this section we will see how catalysis can be used to derive an efficient way to implement small angle rotations. To do that we first need to generalise Eq. (10.46) to allow us to implement controlled-phase gates. To see how this works it will be helpful to first write Eq. (10.46) in circuit notation:
| (10.49) |
Here we wrote a slightly more general circuit where we replace the and controlled- gates with and controlled- gates. As a shorthand we write as a generalisation of . Since this is a circuit equality that holds on the nose (with a correct global phase), it should continue to hold when we add additional control wires:
| (10.50) |
We can prove this is correct using some H-box rules (see Figure 9.0):
Exercise 10.10. Applying Eq. (10.50) to implement a CS gate requires using both a Toffoli and a CCZ. It turns out there is a different construction that only requires a single CCZ and some real Clifford operations (those that do not contain numbers with an imaginary part like CNOT, Hadamard, CZ, ). Prove the following identity:
Hint: First prove the identity for , and then show for how the phase can be pushed through the circuit to cancel
Because we can apply catalysis equally well to controlled phases, we can start iterating the procedure producing bigger and bigger controlled-phase gates, where the phase being controlled is also increasingly large. For instance, if we want to implement a gate, we can do the following:
| (10.51) |
Here in the last step we are left with a controlled operation. But since this does not do anything and we can remove it. So at this point we can stop the iteration of the catalysis. We see then that we can implement a gate just using multiple-controlled Toffoli gates, if we have the right catalysis states lying around. This procedure works to implement any gate: we then get a ladder of Toffoli gates. We have actually seen such a Toffoli ladder before: in Exercise 9.15 we saw that this is actually a controlled-decrementer circuit that decreases the value of an -bit number by , controlled on the top wire. In that same exercise we saw that we can build a subtraction circuit if we make a ladder of these controlled-decrementers. For this reason, when we apply a subtraction circuit to a collection of catalysis states, this implements phase gates on on the top qubits:
| (10.52) |
The reason this is nice, is because in Section 9.5 we found a very efficient construction of the adjoint of the subtraction circuit: the adder. So if we can transform Eq. (10.52) slightly, so that it uses an adder instead of a subtracter, this gives us a way to efficiently implement a whole collection of phase gates at once. The way we do this is by taking Eq. (10.52) and composing both sides on the right by , and on the left by . After cancelling with the adjoints we are then left with the following equation:
| (10.53) |
We showed the construction here for 3 bits, but this works for any number of bits , in which case the smallest phase we implement is . While this is nice and all, this might not seem immediately useful: we have this pattern of phase gates appearing in parallel, where we have a small-angle phase gate, a slightly-large-angle one, and so on. You might wonder, surely it will not happen often that we can use this exact pattern of phases in a real quantum circuit, and you would be wondering right. However, with the magic of ancillae we can pick and choose exactly which phases we want to appear and where. We can transfer the application of a phase gate to a zeroed ancilla:
| (10.54) |
Now when we have a complicated phase, we can decompose it into simple components, and put each of these onto its own ancilla. Suppose for instance we want to implement the phase . We can then write bitwise as so that . We can then put each of these component phases onto their own ancilla to get:
| (10.55) |
We have here also sneakily added a zeroed ancilla that gets a applied that does nothing. We need this qubit though to complete the pattern: we see then that we get the right shape needed to use Eq. (10.53). However, note that Eq. (10.53) has adjoint phases, instead of the actual phases we need. There are multiple ways we can deal with this. One way is to realise that for phase gates, the adjoint is the conjugate: . Hence, if we take the conjugate of both sides of Eq. (10.53) we do get the right phases. Since the Adder is a real matrix, this stays the same, but the states needed for the catalysis also flip: . We then have everything we need to produce the circuit we are after:
| (10.56) |
Well, that certainly seems like overkill. Why would we go through all this trouble just to prepare a single phase rotation. Well, it turns out that in a fault-tolerant setting we can’t just go and do whatever gate we want to do. We are restricted to just a small set of gates we can (cheaply) implement. So if our computation requires us to do some phase rotation on a weird angle, we have to find a way to this with the gates that we have access too. One way to do this would be to approximate the phase rotation by combining together unitary Clifford+T gates like in Section 10.1.2. But as we have now seen, another way to do it is to prepare just a single copy of each state to serve as a catalyst which can be reused, and then apply some CNOTs and an adder. Because the catalysts can be reused, the asymptotic cost of this procedure is just the cost of the adder and the CNOTs. Let’s calculate more accurately what the cost then is. Suppose we want to implement a phase gate with angle up to a precision . We then find the smallest such that . We can then approximate by a phase where is an integer such that . It hence suffices to implement instead. Since is an -bit number, we can implement the gate using a generalisation of Eq. (10.56) to bits. We saw in Section 9.5 that we can implement an -qubit quantum adder using Toffoli gates. In a fault-tolerant architecture the implementation of these Toffoli gates is what dominates the cost. As we have , and hence we can express the cost also in terms of the error budget, and say that we require Toffoli gates. Decomposing each Toffoli with gates we see that we can equivalently say that the cost is gates.
Using catalysis we can replace any occurrence of some gate (like) with some other gate (like CS), as long as we have access to some special catalyst ancilla state. We can use this idea to prove that certain gate sets are also universal for quantum computing. In this section we will demonstrate this idea by showing the Clifford+CS gate set is universal. This notion of universality we will use is however not the approximate universality like that of the Clifford+ gate set we demonstrated in Section 10.1.2. Instead it is what we will call computational universality. Approximate universality requires that we can approximate any unitary and hence quantum circuit arbitrarily well. But such a strong condition is actually not needed for a gate set to be useful. It is sufficient if we can just simulate the run of an arbitrary quantum circuit using some runs of a quantum computer using the restricted gate set. Let’s work through an example to make this more clear. For this section we will say that the purpose of a quantum computer is to estimate the expectation value of some observable . We start with some state , apply some unitary to it, and then do measurements and post-process these measurements to get an estimate of . After many such runs we will get a close approximation of . Mathematically we can represent the exact expectation value as:
| (10.57) |
However, when we are trying to estimate this observable, we don’t have to do this with just a single quantum circuit we run over and over again. Instead we could have a collection of different quantum circuits (potentially acting on a different number of qubits), input states , and observables , such that taking a particular weighted average gets us the outcome we are after:
| (10.58) |
where here we define to be the expectation value of with respect to and . We see then that if we can estimate each of the , that we can also estimate itself, by just summing our estimates like . This might seem like quite a hypothetical situation, so let’s give a concrete example. Suppose we have a Clifford+T circuit applied to the input state . Then we can transform into a circuit containing just Clifford gates and CS gates using catalysis, so that . If we were trying to estimate the observable we can then check that:
| (10.59) |
So instead of just running the circuit , we can run , which doesn’t contain any gates. This is then an example of Eq. (10.58) where the sum is over just one term and we have , , and . But now suppose we don’t even want to use that single we need for the catalysis. What we can do then is decompose this magic state into a sum of Clifford states. Because each term in the sum needs to retain the form of an expectation value like (10.57), we can’t just decompose into pure states , instead we need to decompose into a sum of density matrices. One way to do this is the following:
Hence, we can decompose into four Clifford states , , and with weights and . Starting with the left-hand side of Eq. (10.59) we then have:
| (10.61) |
We see then that this is a case of Eq. (10.58) with and and for all . While we can reduce the calculation of an expectation value to the calculation of a sum of (potentially simpler to calculate) expection values in this way, there is an important issue here that we have however glossed over. We can only ever estimate the expectation value, not get an exact value. Generally, we want to determine an error budget for how close we want the estimate to be, and then that determines how many times we need to sample from the quantum computation. Since we are summing together different expectation values, we need to be careful that we aren’t blowing up the error in the estimates. Suppose for instance that some . Then a small error in our estimate of will blow up by a factor of a 100. On the other hand, if , then any error will also be decreased by a factor of a 100, so that even a large error is not that important. In you want to be efficient and not over-sample a given expectation value, so that we get its estimate at just the right target precision we then need to sample a number of times proportional to . We can calculate that this summing approach gives a total overhead in the number of samples of compared to just determining the desired expectation value with the original circuit. For instance, in the above example where we decomposed into four terms, we have . Hence, if we decompose the magic state in this way we need to collect samples more than we would have needed to if we did use the magic state directly. Summarising the full procedure we see then that we need to do the following:
Start with the Clifford+T computation you want to calculate.
Replace all gates by a CS gate catalysis circuit using a .
Replace the state needed for all the catalysis by the Clifford states .
Run each of the resulting four circuits a number of times proportional to .
Combine the resulting estimates of the observable by scaling by to get the final outcome.
When we have Clifford gates and CS gates, the gate set is generated by CNOT, Hadamard, and CS. Of course, CNOT can be constructed using CS and Hadamard, and if we allow states to be prepared into and , then we can also prepare an using a CS. Hence, this gate set is equivalent to just the CS and Hadamard gate. We see then that we have proven the following.
Theorem 10.5.1. The CS+Hadamard gate set is computationally universal. In particular, a Clifford+T computation can be simulated by a CS+Hadamard computation with a linear overhead in the number of samples, qubits and gates needed.
Remark 10.5.2. Our decomposition (10.60) of is a stabiliser decomposition, a concept we also looked at in Section* 7.8.1. But as we saw here, the simulation overhead was not based on the number of terms in the decomposition, but rather on the weight . This value is known as the stabiliser extent, or equivalently, the robustness of magic of the decomposition. Without using any catalysis, we could have chosen to write each of the gates as a magic state injection, and then replace each of the states by its stabiliser decomposition. When we do this however, the stabiliser extent scales as where is the number of gates, so that the simulation overhead becomes exponential in the number of non-Clifford gates. We expect such an exponential dependence, since replacing all the gates gives us a Clifford circuit, and we don’t expect this gate set to be computationally universal. Note that this however does give us a classical simulation technique: write a Clifford+ circuit as a Clifford circuit where each gate is replaced by a magic state injection using the Clifford states, and then efficiently classically simulate each of these Clifford circuits. The cost of this method is then roughly where is the number of qubits, and is the total number of gates in the circuit.
Remark* 10.5.3. We haven’t actually given a formal definition of what ‘computational universality’ really is. There are multiple ways we could define it that all differ in the details. One particular way we could define it, which we could also call ‘BQP-completeness’ is as follows: for a given gate set define the complexity class as the types of decision problems that can be solved with high probability by a quantum computer just using gates from . Then is BQP-complete if . That is, if a classical computer that can query a quantum computer using gates from can solve the same problems in polynomial time as a universal quantum computer.
Because we can also catalyse gates using a CCZ gate, we can also prove a version of Proposition 10.5.1 for the Clifford+CCZ gate set, showing that Clifford+CCZ is also computationally universal. In fact, we can restrict the gate set a bit more, as Toffoli+Hadamard is itself already computationally universal. This however requires a different argument then we have been using here, and hence we leave this for the advanced section 10.7.4.
We saw in Theorem 10.3.9 that the (S4) spider nest rule combined with the standard ZX rules we have been using throughout the book is enough to get a complete set of rules for CNOT+ circuits. However, this extended rule set is not enough to prove all identities of ZX-diagrams in the -fragment. For instance, a simple identity that cannot be proven is the following rule known as supplementarity:
| (10.62) |
Formally showing that this cannot be proven using the ZX+(S4) rules is difficult, but intuitively we can see this because none of the standard ZX rules have any special behaviour for phases, while all the spider nest identities only deal with at least 15 phases, so there is nothing that says anything about the pair of phases here. Finding a rule set and then proving it is complete is usually a difficult challenge. However, using catalysis it turns out we can make our lives much simpler, and simply extend an existing complete calculus. Recall that in Chapter 9 we introduced a new type of generator for ZX-diagrams we called H-boxes. In particular we found a number of rewrite rules for phase-free H-boxes in Section 9.2.2, summarised in Figure 9.0. We remarked there that when we restrict to spiders with and phases, together with phase-free H-boxes, that these rules give us a complete calculus for postselected Toffoli-Hadamard circuits. Such diagrams correspond to a specific type of matrices. Namely, all such matrices are of the form where is an integer matrix. Hence, the matrix entries are from a subset of the ring . As these matrices are just integer matrices up to a global scalar of we will call this the fragment. Note that in particular there are no complex numbers in this calculus. Using catalysis we can make the set of matrices this calculus can represent larger in a ‘controlled way’ where we can also see which rules we need to add to preserve completeness. To see how this works, we want to first generalise the gate catalysis of Eq. (10.46). First, as our goal will just be to produce states, we can plug into the top wire of Eq. (10.46). We can then simplify the expression to a more symmetric form:
| (10.63) |
We can then identify the underlying reason this catalysis works. It is because:
| (10.64) |
Here we suggestively used Eq. (9.15) to write the phases as H-boxes. We do this because such a rule doesn’t just hold for an H-box with a label that is a complex phase like , it in fact holds for any complex :
| (10.65) |
This then allows us to write down a generalisation of Eq. (10.63) to arbitrary H-boxes:
| (10.66) |
Here we wrote in the 2-ary H-box instead of so that we don’t have to work with square roots. When we take we get Eq. (10.63), but this works for any value. A particularly simple, but still interesting case is when . Translating this back into circuit form gives us a catalysis of states using a CZ. While this might seem trivial (it is after all provable using Clifford rewrite rules) in the context of the phase-free H-boxes it allows us to add complex numbers to the fragment. Namely, we can just add as a generator , a single-ary H-box with label to the calculus, and this in fact turns out to be sufficient to then represent any matrix where now has entries in . We will call this the fragment. However, just getting this universality was the easy part. How do we know what new equations to add to this calculus to make it complete again? Generally, proving completeness is very difficult, as you first need to search for new equations, and then show that those equations are sufficient to prove all true equalities. However, as it turns out, adding the rule Eq. (10.66) for to the already existing rules for the label-free fragment is already almost enough to get a complete calculus for this bigger fragment which includes . To see this, let’s first consider what a generic diagram in the fragment looks like. We added the generator , so now a diagram consists of generators from the old phase-free fragment plus this new generator. These generators could all combine to give us really complicated rewrites, so we want to prevent them from combining. Using Eq. (10.66) we can reduce all these separate instances of into just one of them, reducing the complexity of the diagram. That is, given some diagram in the fragment, we can rewrite it to a diagram containing just generators from the fragment such that:
| (10.67) |
Or, as it turns out, this is possible for most diagrams in the fragment (Can you see for which ones it doesn’t work? If not, don’t worry, the authors of this book also originally forgot about this case. As we said: completeness is hard). We will talk about the failing case later, but for now let’s assume that our diagram satisfies Eq. (10.67). As a shorthand, we will write for the diagram we get when we plug into the bottom input in Eq. (10.67). So here we have . Note that . Hence, if we expand it like this we see that is equal to a sum of two diagrams: where we plugged in into the bottom wire, and where we plugged into the bottom wire: . Now suppose we have two diagrams and in the fragment and that they implement the same linear map: . We can both decompose them as described above to get . Each of these diagrams represents a matrix that is entirely real, so the only way for this equation of complex matrices to hold, is if it holds for the real part and for the complex part separately:
| (10.68) |
We then conclude that and are equal when we input either or into the bottom wire. As these states form a basis, this must then hold for any input. We can then leave this wire open and still have an equality:
| (10.69) |
Now we are in business! We have this equality as linear maps, but both diagrams are in the fragment for which we have completeness. We hence know how to rewrite one into the other. This gives us then a path to rewrite the original into :
| (10.70) |
Here each equality is now a diagrammatic equality, and with (*) we denote we are using rewrites from the original complete calculus for the fragment. We have then very easily proven completeness for this larger fragment, all made possible using a single rule about catalysis. Well..., we would have proven completeness, if it were true we could always rewrite a diagram in the fragment as in Eq. (10.67). We however made a hidden assumption: that there is at least one generator present in the diagram. When that is the case we can use Eq. (10.66) to reduce all these instances of the generator to just a single copy. But if the diagram didn’t contain any to start with, then this rule does not apply. In fact, we currently have no rewrite rules that relate a diagram containing a to one that does not contain any . This means in particular that our current rule set cannot prove the following true equation:
| (10.71) |
However, when we also add Eq. (10.71) as an additional rule, then this problem is solved and it is true that we can then always rewrite a diagram in the fragment as in Eq. (10.67): if the diagram contains at least one we can already use Eq. (10.66) to transform to the form of Eq. (10.67), and if it doesn’t we use Eq. (10.71) once to introduce one , in which case it is also in the form of Eq. (10.67).
Proposition 10.5.4. The fragment of Z- and X-spiders with and phases and phase-free H-boxes, augmented with the generator, the catalysis rule Eq. (10.66) for , and the rule is complete for matrices of the form where has entries in .
This trick for extending the calculus doesn’t just work for : it works for any complex number such that . Let’s for example take . We can then do all the steps as before, translating a diagram containing an arbitrary number of the H-box with label into a diagram in the fragment which just requires a single input of the H-box: . If we then have an equality between two diagrams and in the fragment, we get . Because each of the component diagrams only contains integers, this equation can only hold if the integer part and the part hold separately. We hence again get two equalities and , which allows us to conclude that with the wire left open. We can then use a modified version of Eq. (10.70) to conclude that we have completeness. We also have a modified version of Eq. (10.71) that continues to be true: . Adding the catalysis rule and this scalar rule then gives us a complete calculus for the ring . Although this covers many possible extensions, it does not cover one we care about: extending it with a gate. This is because taking we see that . However, it turns out we can just iterate the catalysis procedure to get larger and larger calculi. Starting now with the calculus for the fragment, we can add the H-box with the label and add its catalysis rule. When we then go through the motions of the completeness proof again we will end up at the equation , where now each of the diagrams has entries in instead of . Luckily for us, is still ‘independent’ of the entries of so that again the only way for this equation to hold is if it holds for each component separately, so that the proof goes through without change. Since , this calculus can represent arbitrary matrices with entries in the ring .
Theorem 10.5.5. The fragment of Z- and X-spiders with and phases and phase-free H-boxes, augmented with H-boxes with a label of and and the catalysis rule Eq. (10.66) and scalar rule for and is complete for matrices with entries in the ring .
Exercise* 10.11. Two copies of a H-box can be used to represent an H-box with an label. So instead of adding the generator, we could only add the generator. We then have to modify the catalysis rules: instead of having two separate ones, we need to stack them together into a single one. Find a modified catalysis rule that works in the label-free ZH-calculus augmented with just an H-box with a label of and find which other rules you need to get completeness.
Exercise* 10.12. Prove supplementarity (10.62) using the phase-free H-box rules and the catalysis rules.
The Clifford+ gate set is approximately universal.
In particular, it can exactly represent unitary matrices with entries in the ring .
We can efficiently approximately synthesise single-qubit unitaries over the Clifford+ gate set. To achieve a precision of requires number of gates.
The scalable ZX notation allows us to represent large collections of parities as a single diagram. This is especially useful in representing large collections of phase gadgets.
Certain collections of phase gadgets with phases that are multiples of correspond to Clifford unitaries or the identity. We call such collections of phase gadgets spider nest identities.
A collection of gadgets represents an identity if its corresponding parity matrix is strongly 3-even, meaning that the Hamming weight of every column is a multiple of , of the product of every pair of columns is a multiple of , and the product of every triple of columns is a multiple of . The collection of gadgets is a Clifford if it’s parity matrix is 3-even, meaning that the three previous conditions only hold modulo 2.
We can instead represent a collection of -qubit gadgets by its indicator function. If this is a polynomial of degree at most , then it is equal to a Clifford.
Using this representation we can show that the standard ZX rewrite rules plus one additional rule (S4) suffice for completeness of CNOT+ circuits.
Optimising the number of gates in a CNOT+ circuit is equivalent to decoding a Reed-Muller code, or equivalently to finding a rank decomposition of a symmetric 3-tensor.
We can relate gate sets involving the CCZ, CS or gate together using the framework of catalysis, where we can interchange the role of one gate with another using a resource state that we call a catalyst. This catalyst is not consumed in the process and hence can be reused.
Using catalysis we can find an efficient way to implement small-angle rotations, prove the computational universality of a gate set, and prove completeness by extending other complete rule sets.
In this section we will take another look at the exact synthesis algorithm for Clifford+ unitaries described in Section 10.1.1, but now we will consider multi-qubit unitaries and fill in the number theory details. As we saw in that section, when we understand how to synthesise a state, an algorithm for synthesising a unitary follows easily, so let’s look at synthesising states first. So let’s suppose we have a normalised vector . Our task is to find a Clifford+ unitary such that . Writing we can represent the vector components as where . In Lemma 10.1.2 we saw that if these coefficients are integers, that then all the entries except one must be zero and hence is a unit vector. For a single-qubit state, the only possible unit vectors are and , so if we got we just apply an gate to get it to be the we want. However, now the state can be any up to global phase. We can map this to by applying gates wherever . While this is fine if we are synthesising a state, this messes things up when we are synthesising this state as part of a bigger unitary synthesis routine where we care about many columns being sent to the right location. In that case we need to apply the appropriate 2-cycle classical gate (see Section* 9.7.2 for how these can decomposed into Clifford+ gates), to transform into , or whatever basis state we need it to be. We can also get rid of its phase by applying the 1-level gate that adds a phase just to the state. Because these 2-level and 1-level operations only change the basis states we are interested in, they do not mess up any of the other columns of the unitary we are synthesising. So as in the single-qubit case, if all the components in the vector are integers we are essentially done. We then just need to find a strategy to make the vectors be ‘closer to being integers’, i.e. elements of . The obvious metric for how far an element in is from being an element in is the smallest power of we have to multiply the element with to get an integer. However, this turns out not to be the best choice. This is because is not a prime number in . The ‘magic number’ we saw in Section 10.1.1 is a prime in .
Definition 10.7.1. Let be a ring and . We say is a unit if there exists such that . We instead say is prime if is not a unit or , and if for any decomposition with we have that either or is a unit.
Example 10.7.2. In the only units are and , while in any field, like , every non-zero element is a unit. The primes of are precisely the prime numbers and their negations (since if is prime, then multiplying by any unit gives you another prime). In examples of units are , because , and , because .
We can prove is prime in by defining a new kind of norm on .
Exercise 10.13. On the ring we have a norm given by . This norm has some nice properties, namely that it is multiplicative, , and that it sends elements to positive elements of . We can define a different norm with similar properties on . For define the conjugate to be , and then define the new norm by .
Show that the conjugate on is multiplicative: . Use this to show that the norm is multiplicative.
For define . Argue that is also multiplicative, and that it maps all to positive integers.
Show that is a unit of if and only if . Hint: For the if direction the definition of the norm already gives you the inverse of .
Show that is prime if is prime in .
Calculate , and , and conclude that is prime, but the others are not.
So now we know that is prime while (and hence ) is not. But it turns out that is also a prime factor of (and hence ).
Write and as for some integers .
Using the fact that , write as for some integers and .
Define the unit . Show that .
So we see that can be decomposed up to units into two copies of , and hence can be decomposed into four copies. Hence, instead of considering the smallest power of we have to multiply a number in with to get something in our integer ring , we instead consider the smallest power of , as this is a more finegrained metric. For a , we call the smallest such that the least denominator exponent (lde) of . For a vector of values , we call its least denominator exponent the smallest such that . Of course if the lde of a vector is , then it already consists of elements in , and we know that such a normalised vector must be very simple. So if we can just find some procedure to iteratively reduce the lde to , then we are happy. The goal then is to find, for a given with lde , a set of unitaries such that has lde at most . Then we could just repeat this procedure until we get to denominator exponent 0. Given a with lde , we can define the vector . After making some modifications to by applying gates to get a , we are interested in whether this modification has reduced the lde. In order to see when this is the case, we hence need to know when we can divide by , and still get a vector in . Of course, when we start caring about divisibility by some number, we will need to talk about calculating modulo this number. So in the same way as we have been talking about parities, which are elements of modulo , now we are going to work with residues, which are elements of modulo . For elements we will write to denote that for some . For instance, in the exercise above we saw that , and hence . It is not hard to see that is an equivalence relation, and that it is preserved by addition and multiplication: if and , then and . We then also have .
Proof. We have , and as , we calculate then that
Hence for any we have , so that . Since furthermore , we see that hence the residue of an element modulo is either or . □
Given some our goal is to apply operations to to make all the components divisible by , and hence have zero residue. The components with residue are then the ‘obstacles’ we want to get rid of.
Lemma 10.7.4. Let be a normalised vector with lde , so that . Then there are at least 2 components and of that have residue .
Proof. is divisible by iff for all . But assuming that was the lde of , then by definition it won’t be divisible, and so there will be at least one with non-zero residue. By normalisation of we have and hence . Since residues are either or , we then know that there are an even number of cases where . The residue is multiplicative, so for these we must then also have . Hence, if is not divisible by there must be at least a pair of elements and that each have non-zero residue. □
The fact that non-zero residues come in pairs is good, because it turns out we can only reduce the residue of elements of in pairs. We are working with Clifford+T gates. The CNOT, and only contain non-zero elements that are units in and have at most one non-zero entry per row, and hence applying these gates does not affect the residues of the state. The only gate then that can affect residues is the Hadamard . We see that the Hadamard creates sums and differences of elements of the vector, which can lead to lower lde, but then it also divides the elements by , which can increase the lde. Because contains two powers of , we need to look at the vector elements modulo (the next power up), to see if applying a Hadamard will result in lower lde.
Exercise 10.15. We have . Show that any element in is equivalent modulo to an element in the set
Hint: First note that , so that we only have to deal with elements where . Then argue that when , the residue is zero, so that you only have to consider the cases where at most two of , , or are 1.
Note that from this above exercise we immediately get the following, just by checking all the possible cases:
Now we have all the tools we need to solve the problem at hand. For simplicity, let’s again first assume we are dealing with a single-qubit vector . If it is not already divisible by then we must have , since the non-zero residues come in pairs. Then by the previous lemma we have and . In other words: and for some . Then we see that if we apply a gate to this vector that we get:
where we have defined . Now comes the magic trick: we apply a Hadamard, and we use the fact that , and hence where is the unit from Exercise 10.14:
| (10.72) |
We see now that every term has at least one factor of , so we can factor it out:
| (10.73) |
Success! Because this means that is divisible by . As , this means that now has lde smaller than . We can now repeat this procedure until we get to lde , in which case we know the vector we have is a basis vector, and we are done. This just covers the single-qubit case, but reducing the lde of a multi-qubit normalised vector is done very similarly. Defining , where is the lde of , we saw that there must be an even number of elements of with non-zero residue. We can hence pick a pair that both have non-zero residue, and then apply the above technique, just ‘targeting’ this pair to zero out their residues. We can do this targetting by replacing the Hadamard and gates above with the 2-level and 1-level operators and that hence only change the residues of the elements and . The constructions in Section 9.7.2 show how to implement these gates using just Clifford+T gates. We do this reduction of lde with every pair that needs it, until all the residues are zero, in which case the modified is divisible by . This then means that the modified has lower lde. We then just repeat until the lde is zero and we are left with a basis vector. Pfew, that was a lot, so let’s summarise what we have actually done to get to the solution:
We started out with a unitary where all the entries are in the ring .
Then we realised in Eq. (10.74) that we wanted to find a unitary that reduces the first column of to a standard basis vector.
Finding such a unitary is equivalent to finding a way to reduce an arbitrary normalised vector to using : .
Instead of writing as a vector over , we write it as a vector over . We find its least denominator exponent : the smallest number such that , where . We picked as the base, since it is prime in .
We look for two components and of such that the residues . If there is such a pair we apply 2-level Hadamard and 1-level gates to zero out their residues.
If there isn’t such a pair of components left, then we have transformed to be divisible by , so that the new we found must have lower least denominator exponent.
We then repeat this procedure until has lde , in which case it is a standard basis vector up to a phase, which is easily permuted into the desired basis vector, and its phase removed by applying the appropriate 1-level gate.
We have now found our desired unitary that we can apply to to simplify its first column. Because is unitary, this means its first row must now also be simplified.
Now just rinse and repeat for all the other columns of , resulting in a series of Clifford+ unitaries such that .
We hence have .
Let’s fill in the details on how to exactly synthesise an entire unitary and not just a single state. We have a unitary with matrix entries in . Let be the first column of , and the Clifford+ unitary satisfying that we can find using the procedure described in the previous section. Then we have:
| (10.74) |
Note that here the first row also becomes a unit vector, because of the orthogonality conditions between the columns of the unitary . Now we can take the first column of the smaller unitary and synthesise it as a state again. We have to be careful to not undo the work we did with reducing the first column of , and hence we need to use 2-level and 1-level operators to only touch the elements of the matrix we want to. Repeating this procedure, we see that we get Clifford+ unitaries satisfying . Hence, we have synthesised as . All of this is a bit reminiscent of the CNOT synthesis algorithm using Gaussian elimination of Chapter 4. However, there we could encode the entire function of the CNOT circuit into an matrix, while here we are working with a matrix. Hence, even if each is a small circuit, the overall circuit synthesising might still be exponentially large. We would not expect to do any better as we are now dealing with an approximately universal gate set, and hence there are simply too many possible unitaries we can synthesise for all circuits implementing them to be small. Let’s record what we have now seen in a Theorem.
Theorem 10.7.6. Let be an -qubit unitary with entries in . Then can be realised by a Clifford+ circuit using at most one zeroed ancilla.
Proof. In these sections we have found a method to write using 2-level Hadamard and operators and 1-level operators. As described in Section 9.7.2, these can be built using gates with controls, which require a single zeroed ancilla to be implemented over the Clifford+ gate set (cf. Section 9.4.2). □
These techniques we have seen for exact synthesis are not unique to Clifford+. They work for many gate sets that can at least express the 2-level operators necessary to move elements of the vector to the place where they are needed. One particularly simple example of this is the correspondence between circuits of Toffoli, CNOT, NOT and gates, and unitaries over the ring . Since all the entries in such a unitary are integers, the normalisation of the column means that there is at most one non-zero element and that this element is . Hence, ignoring the possible phases, such a unitary is just a big permutation of the basis vectors, which we know we can realise using a Toffoli circuit (Section 9.7.1). The resulting unitary is then diagonal and only has phase. The phases can be realised by 1-level operators, and then we are done!
Proposition 10.7.7. Let be an -qubit unitary with entries in the ring of integers . Then can be realised by a quantum circuit consisting of Toffoli, CNOT, NOT, and gates, using at most one zeroed ancilla.
There are several other results like this that make a correspondence between a certain quantum gate set and the set of matrices over a given ring.
Theorem 10.7.8. Let be an -qubit unitary and let be a ring such that all the matrix entries of are in . Then can be synthesised as a quantum circuit over the gate set using at most one zeroed ancilla when:
Hence, we see that the exact synthesis of Clifford+ circuits doesn’t exist in a vacuum, but is in fact part of a ladder of increasingly more powerful gate sets corresponding to larger rings. This ladder can be continued, by replacing the gate by for some . The ring that corresponds to the resulting Clifford+ gate set is .
Using the exact synthesis algorithm for Clifford+ unitaries generally results in very large circuits. The exception is when we apply it to single-qubit unitaries, for which it is in fact optimal in the number of gates needed. As we already saw in Section 10.7.2.1, for a single-qubit unitary, the Clifford+ exact synthesis takes a particular nice form. We start with some unitary
| (10.75) |
Then we find a unitary built out of and gates that synthesises so that . Then
| (10.76) |
for some which is a multiple of . Hence we can further reduce to the identity by applying the appropriate power of . Up to some small constant, synthesising a single qubit unitary hence costs just as much as synthesising a single-qubit state . The cost of synthesising is directly related to its lde . To reduce it to lde 0 we need to apply for each reduction a Hadamard gate and some power of the gate where or . If we can see this as applying a gate and/or an gate. We hence get a sequence of , and gates, where each application of the Hadamard reduces the lde by at least 1. The total number of gates is hence at most , and the number of Hadamard gates and gates is each at most . You could wonder whether there is any way we could do better, but it turns out that this number of gates is actually optimal! Suppose we start from the state and we apply Hadamard, and gates to it. Then we end up with some state that we can write as
| (10.77) |
where is its least denominator exponent. Assuming that we see that , as is normalised. Hence . Furthermore, we necessarily have since otherwise both expressions would be further divisible by contradicting being the lde. Applying an and gate to this state doesn’t change the denominator exponent, but a Hadamard can change the lde. We calculate then:
| (10.78) |
Since , we see then that each of the components of the vector and are divisible by . As furthermore is up to some unit , we see then that
| (10.79) |
for some numbers and . Applying a Hadamard can hence only increase the lde by at most 1. So if we got some state with lde , then we know the circuit building it must contain at least Hadamards. But our synthesis algorithm requires at most Hadamards to synthesise it. Hence the lde is exactly equal to the optimal number of Hadamards needed to synthesise it. Or, well, this is almost true, because our argument above only holds if . If we have , so that is a unit vector up to some phase, applying a Hadamard increases the lde by . So the number of Hadamards is . This is actually what our synthesis algorithm above also finds if we were to analyse it a bit more carefully, because it turns out that there are no normalised vectors that have lde 1, so that in Eq. (10.73) our lde would actually reduce by 2 if we had to start with.
Exercise 10.16. Prove that there are no normalised vectors that have lde 1.
Hint: This corresponds to showing that there are no vectors
for which .
Now use Eq. (10.2) for the component norms ,
and argue that the only possible solution to
is the one where there is a single non-zero component which is in fact
divisible by .
To conclude we hence see that if we have a vector with lde , then our synthesis algorithm requires an optimal number of Hadamards to synthesise it, and up to gates. This last part is because we can have a gate after every Hadamard, but we can also have one appear before the first one.
In Section 10.1.2 we hinted at how we can do approximate synthesis of arbitrary single-qubit unitaries using a Clifford+ circuit. In this section we will fill in some more details on how this works, though some parts are a bit too technical even for this advanced section, and so we will just refer to the References for details on how to do that. So we have some arbitrary single-qubit unitary we want to approximate using Clifford+ gates. First we recall that any single-qubit unitary can be written as for some phases . Additionally, , so if we know how to synthesise gates, then we can synthesise arbitrary single-qubit unitaries. Let’s assume then that our goal is to approximate a gate for some arbitrary using just single-qubit Clifford+ gates. First, it will be useful to work with matrices that have determinant 1, so we write our and our approximating unitary as follows:
| (10.80) |
Here . Note this this means our definition of is different from the one we have been using in this book by a global phase. Note also that we wrote our approximation unitary with a factor of in front of it, instead of a power of . This turns out to be nicer for this algorithm. We will fix an error budget , and require . If we found a with this property, then we know how to synthesise it using the algorithm we described in the previous section, and this synthesis will have an optimal number of Hadamards and gates, given the lde . So our goal then is to find a within of , with as low a as possible. Given that we know how to determine whether a solution exists for a given , we can find the optimal value of by just starting low and increasing it until we find a solution. Since won’t be too big (it will be ), this will still be efficient. We can hence assume that and are fixed, and then we need to determine whether a solution exists, and if it does, what this solution is. In practice the finding of the solution will also tell us whether there is a solution, so we will focus on that. Our goal then is to find a as in Eq. (10.80) for some given and , such that for some given . In order to help us do that, we need to have a more concrete description of the norm in terms of the matrix elements.
Proof. Note first that by the unitary invariance of the operator norm. Now, is unitary, and hence has two eigenvectors with eigenvalues . Because and both have determinant 1 we have , and hence setting we necessarily have . The eigenvectors of are also eigenvectors of and have eigenvalue , so that . Recall that the Hilbert-Schmidt norm of a matrix is given by where the form an orthonormal basis. It is a standard exercise in linear algebra to show that the Hilbert-Schmidt norm is independent of choice of orthonormal basis. Choosing the eigenbasis of we see that , as both eigenvalues have equal magnitude. Instead picking , for our orthonormal basis, we calculate that by just evaluating the matrices. Putting these different expressions for the norm together, we calculate:
Now, using the fact that and , we can expand further and simplify some more:
Here denotes the real part of the complex number . So if is a solution to our approximation problem, then , or equivalently . Interestingly, this does not depend on , but only . Furthermore, if we write and , then we can interpret them as 2-dimensional real vectors and , and then is just a dot-product. We hence want to solve the following problem: given a 2-dimensional real unit-vector , find a subnormalised vector , such that , where and we allow some maximal power of depending on . We call this a grid problem, because we have some target region, the vectors whose inner product with is very close to , and a grid of points, given by , and we want to find a point on this grid that is in the target region. It turns out that this problem is efficiently solvable, although describing in full how to do so is quite lengthy (see the References of this chapter). So let’s suppose that we can solve the grid problem, and hence that we get a satisfying . This then gives us the candidate we are looking for. But we then still need to find a , so that the matrix defined as in Eq. (10.80) is in fact unitary. The only equation we need to satisfy for this matrix to be unitary is , as this means that the columns of the matrix correspond to normalised vectors. We know the expression for a norm for an element in , this is namely given by Eq. (10.2): for we have
Rewriting the equation to isolate our unknown we get But this right-hand side will now be some for integers . Writing , we can also expand the left-hand side. We then have a part of the equation that results in an integer, and a different part that results in an integer multiple of . These need to be independently satisfied, so that we can split the equation up into two parts:
This is a pair of quadratic equations over the integers, and is hence a special case of a Diophantine equation. While these are in general hard to solve, in this particular case it turns out that a large proportion of them are fast to solve. So in practice, we can try a bunch of candidates until we find one for which our solution strategy works. Once the details about solving the grid problem and this Diophantine equation are filled in, this gives an algorithm that gives an approximation of the phase gate using Clifford+T gates that uses an optimal number of Hadamard and gates. The length of the circuit scales quite well with the desired precision. For instance, we can approximate to within with a circuit containing 102 gates. This algorithm turns out to be both efficient and optimal. However, this optimality guarantee only applies when we restrict to unitary ancilla-free circuits. If we allow ancillae, measurements or classical control it is possible to do better by some constant factors. The optimal number in the unitary ancilla-free case scales as for some small constant , while the best known protocol when we allow measurements and classical control scales as (see the References of this chapter).
In this section we will show that the set of real-valued unitaries, i.e. where all matrix entries are real numbers, is computationally universal. Then we will show that in fact the restriction to just Toffoli and Hadamard is already computationally universal. First, obviously real-valued unitaries are not approximately universal as we can never approximate any complex-valued unitary (like the gate). But as we saw in Section 10.5.2 for a different gate set, it turns out we can ‘simulate’ complex-valued unitaries using a real-valued one on a larger set of qubits. For a (complex-valued) -qubit unitary , let be the real part of . That is: . Similarly define the complex part . Then . Now define the -qubit unitary via
While it is clear that is real-valued, it is not immediately obvious that it is unitary.
Exercise 10.17. In this exercise we will show that is indeed unitary for any choice of .
Express and in terms of and .
Express and in terms of , , and .
Show that . Hint: You will need .
Show that . Hint: You will need .
Conclude that is indeed unitary.
The encoding into is also compositional, meaning we can apply it iteratively to a sequence of unitaries.
Note that this construction is in fact an example of catalysis, as:
| (10.81) |
Exercise 10.19. Prove Eq. (10.81).
We can however not use the argument of Section 10.5.2 to use this to show real-valued unitaries are computationally universal as there is no way to write the catalyst state in terms of states that can be prepared by real-valued unitaries. We can however use a different argument to prove computational universality.
Proof. Suppose is an -qubit circuit built out of unitaries as . We then build the real-valued -qubit circuit by . Then note that:
Hence, if we input into and do a measurement marginalising over the first qubit we also get the probabilities:
Now, we can also, with some effort, calculate that . Hence, the probability distribution we get for is the same as that for when we prepare the first qubit in the state and ignore its measurement outcome. □
Because we can simulate complex-valued quantum circuits using real-valued unitaries in this direct manner, we don’t need all the real-valued unitaries. Given some approximately universal gate set we only need to be able to represent for . For instance, let’s take the Clifford+ gate set . Now, and are already real-valued, and it is easy to see that if is real-valued, so that it remains to see what gates we need for . Calculating its matrix, we see that it is
It is straightforward to verify that this is equal to:
| (10.82) |
We can construct the CZ gate using and so that we only additionally need the controlled-Hadamard gate.
Proof. We can use an ancilla in the state and a CH gate to create the Hadamard. Using Hadamard and we can construct the CZ gate. We can hence represent , and . Since is approximately universal, this means that we can use the gate set to arbitrarily closely simulate any unitary. □
The controlled-Hadamard is a bit of an arbitrary choice of gate. It turns out that the gate set of Hadamard and Toffoli is also computationally universal. This is a very pleasing result, because the Toffoli gate is universal for reversible classical computing, while the Hadamard gate is the single-qubit Fourier transform. So this in a way shows that the extra power of quantum computing comes from having access to this Fourier transform. The way we prove its computational universality, is by showing that Toffoli-Hadamard ‘simulates’ the CS-Hadamard gate set, which we know is computationally universal by Proposition 10.5.1. The real-valued encoding of the CS gate is equivalent to a Toffoli up to some swaps. With the Hadamard we just get . Hence, when we encode the CS+Hadamard gate set, we get the Toffoli+Hadamard gate set. We can hence do the following: starting with a Clifford+T computation, we write it as an ensemble of CS+Hadamard circuits. We then encode each of these circuits into a real-valued Toffoli+Hadamard circuit. By doing this we can efficiently simulate the original Clifford+T circuit. We see then that Toffoli+Hadamard circuits are also computationally universal.
So just using Hadamard gates and Toffoli gates we can simulate any quantum computation to any desired precision. The computational universality of real-valued quantum computing has an interesting philosophical consequence. One could wonder why quantum mechanics uses complex numbers. Why not just stick to regular old real numbers? The results of this section show that one plausible answer is in any case not a solution: we don’t need the complex numbers to reach a certain computational complexity, as we could have done the same with real numbers.
Exact Clifford+T synthesis That unitaries over the ring can be exactly synthesised over the Clifford+T gate set was proved in [107]. The presentation we use here was originally given in Seth Greylyn’s Master thesis [118].
Approximating unitaries with Clifford+T gates The first paper to give an efficient algorithm for approximating single-qubit phase gates with Clifford+T gates was given in [207], which found a T-count complexity of . This was improved to the optimal in [201]. A good reference for reading about the improvements made to approximate Clifford+T synthesis by incorporating ancillae, measurements and classical control is [154].
Spider nests That a 4-qubit phase gadget can be decomposed into the collection of all phase gadgets with at most 3 legs first appeared in [12], which hence gives the first appearance of a spider nest identity. The name ‘spider nest’ was coined in [75] which also described the optimisation technique based on toggling gadgets based on small spider nest identities. The notion of a strongly 3-even matrix was introduced in [35]. The relation between strongly 3-even matrices and -count optimisation seems to have been folklore for a number of years as the strongly 3-even matrices were mostly used in the context of quantum error correcting codes to understand when a code has a transversal gate (we will have a lot more to say about all that in the next chapter). The relation between strongly 3-even matrices and spider-nest identities was formally spelled out in [149]. In [16] it was shown that phase gadgets with dyadic angles are the only gadgets that allow non-trivial identities. If we for instance have gadgets with phases that are multiples of , then the only way they can cancel out is if they do so in the trivial way where the gadgets fuse together. See also [224] where they show that in certain settings the only optimisation we can do to ‘black-box’ non-Clifford phases is to fuse them.
T-count optimisation The idea to use small spider nests to optimise -count was introduced in [75] and later improved upon in followup work by the same authors [74]. The relationship between T-count optimisation and Reed-Muller codes was established in [16] and the relation to symmetric 3-tensor factorisation in [126]. It is the representation as a symmetric 3-tensor that is currently the leading approach in T-count optimisation, with the current best methods being [202, 228]. The NP-hardness of T-count optimisation was established in [223].
Scalable ZX The scalable ZX-calculus was formally introduced in [49], though many of the new features appeared in an informal way in an earlier preprint [51]. The CNOT+ completeness in scalable ZX of the Clifford rules plus the (S4) rule appeared in [149]. Note that Clifford+(S4) is not complete for the full fragment of ZX-diagrams where the phases are multiples of . This is because (S4) preserves the same invariant that was used in [191] to argue that the Clifford rules are incomplete for the universal fragment of ZX-diagrams. In [136] a complete rule set for ZX-diagrams with phases was given.
Catalysis The CCZ to 2 catalysis was first described in [105], while the -CS catalysis seems to be more of a folklore result. The idea of using catalysis to relate different gate sets was introduced in [13]. The synthesis of small phase gates using an adder and measurement-based uncomputation was described in [103]. The relation between catalysis, computational universality and completeness was studied by the authors of this book in [144]. This paper serves as the basis for Section 10.5.
Toffoli-Hadamard universality The original proof that Toffoli+Hadamard is computational universal is given in [208] (this also seems to be the paper that coined the term ‘computational universality’). This actually shows this property for a wider set of gates. Namely, if you have a Toffoli gate and any non-computational-basis-preserving gate, then this will lie dense in the group of special orthogonal matrices. A simpler proof of Toffoli+H universality, which is what we use, is given in [4], which also used the term ‘computational universality’. They show this easily by establishing that the gate set consisting of the controlled-S gate and the Hadamard gate maps to the Toffoli+Hadamard gate set. The approximate universality of the CS+H gate set was shown in [152, Lemma 4.6 on p. 1213]. Kitaev proves this using a ‘geometric lemma’ that adding a gate to a set of gates that stabilises a given state creates a larger-dimensional space of gates. This proof is not constructive. He then proves his Solovay-Kitaev algorithm to show how you would do it constructively (that proof only requires that the set of gates does in fact lie dense in the group). Note that what he calls is the Hadamard gate, and is the gate. In this book we instead show that CS+H is ‘just’ computationally universal, as this follows much more directly using catalysis. This proof originally appeared in [144].