Chapter 11
Quantum error correction

Classical error correction is ubiquitous in modern computing. The basic idea is to pad out a message we want to send with some extra redundant data, which gives us enough information to recover the original message as long as no (catastrophically bad) errors occurred. In the 1990s, people started to notice that many of the same concepts could be applied to quantum data. Namely, by encoding some qubits into a higher-dimensional system and making careful choices of measurements, we can often detect whether errors occurred and even correct them (again, if they are not too bad). One of the key ideas behind quantum error correction is identifying analogous notions between classical, linear error correction and stabiliser theory. We have already seen in previous chapters that there are several deep connections between stabiliser (i.e. Clifford) states and 𝔽₂-linear structures, and it just so happens that 𝔽₂-linear structures are the bread and butter of classical error correction. In this chapter, we will start to cash out these connections and show how quantum error correction can be done in a way that is, in many ways, similar to its classical counterpart.

For example, if I want to send you three bits (x, y, z) and actually I send you six bits (x, y, z, x⊕y, y⊕z, x⊕z), then I can always recover the original message even if one of these six bits gets flipped. In this simple example, we encode our message (an element of 𝔽₂³) into a 3D subspace S = {(x, y, z, x⊕y, y⊕z, x⊕z) | x, y, z ∈ 𝔽₂} of 𝔽₂⁶ called the codespace. If a single error occurs in such an encoded bit string, it leaves the codespace, so we can detect it just by checking if the last three bits are indeed the correct parities of the first three. Furthermore, there is at most one element in the codespace that is within one bit flip of any string of six bits, so we can actually correct a single error. Hence, a major aspect of designing good error correcting codes is finding particular subspaces of 𝔽₂ⁿ that have nice properties like this, as we'll discuss in Section 11.1.

Quantum error correcting codes can be defined as subspaces of (ℂ²)^⊗n, the space of n qubits, which have analogous nice properties. Unlike the classical case, these are going to be huge ℂ-linear subspaces, namely 2ᵏ-dimensional subspaces of the 2ⁿ-dimensional space (ℂ²)^⊗n. As a result, it is not practical to work with these spaces explicitly (e.g. by enumerating a spanning set of vectors). However, as we have already seen in Chapter 6, stabiliser theory gives us a powerful tool for efficiently representing complicated subspaces of (ℂ²)^⊗n. Thanks to the Fundamental Theorem of Stabiliser Theory, we can fix a 2ᵏ-dimensional stabiliser subspace of (ℂ²)^⊗n by giving n − k independent generators of the stabiliser group.
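Before moving on, here is a minimal Python sketch of this example (the function names are ours, not from the text): it encodes three bits as above and corrects a single bit-flip by searching for the unique codeword within Hamming distance one.

    from itertools import product

    def encode(x, y, z):
        # Append the three parity bits x+y, y+z, x+z (mod 2).
        return (x, y, z, x ^ y, y ^ z, x ^ z)

    # The codespace: all 8 encodings of 3-bit messages.
    CODESPACE = [encode(*m) for m in product((0, 1), repeat=3)]

    def correct(received):
        # Return the unique codeword within Hamming distance 1, if any.
        for c in CODESPACE:
            if sum(a != b for a, b in zip(c, received)) <= 1:
                return c
        raise ValueError("more than one bit flip: uncorrectable")

    # Flip one bit of an encoded message and recover it.
    sent = encode(1, 0, 1)
    received = tuple(b ^ (i == 2) for i, b in enumerate(sent))
    assert correct(received) == sent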

11.1 Classical codes and parameters

Let’s look again at the codespace

S := {(x, y, z, x⊕y, y⊕z, x⊕z) | x, y, z ∈ 𝔽₂} ⊆ 𝔽₂⁶
(11.1)

We claimed before that S can correct a single error. That is, if I start with a vector in S and flip one bit, this vector will no longer be in S, and furthermore, there is a unique vector in S that is one bit-flip away. For example, if I receive the string (1,1,0,0,0,0), I can tell there is an error somewhere, since x = y = 1 and z = 0, but the received parity bits claim that x⊕z = y⊕z = 0. It could be that either my first 3 data bits are wrong, or it could be that the parity bits are wrong. However, there is only one way to satisfy all the parities by flipping a single bit, namely setting z := 1. Similarly, if I look at the string (0,0,1,0,1,0), the only possibility for a single-bit error is that the final parity bit x⊕z = 0 is wrong. Since S is a linear subspace, it turns out we can succinctly capture the property of being able to detect any of some family of errors (e.g. single bit-flips), and to correct those errors, in a single concept called code distance.

Definition 11.1.1. For a linear subspace S ⊆ 𝔽₂ⁿ, the code distance of S is the smallest non-zero Hamming weight of a vector in S.

Suppose we could flip a single bit of a vector v ∈ S and obtain another vector w ∈ S. Then, their sum v ⊕ w is also in S, and it would just be a unit vector consisting of a single 1 in the location of the flipped bit, e.g.

(1,0,1,1) ⊕ (1,1,1,1) = (0,1,0,0)

The same applies if we flip several bits: the sum will have 1's in precisely the locations of the flipped bits. So, saying that flipping fewer than d bits always leaves S is the same thing as saying there is no non-zero vector in S with Hamming weight less than d. In other words, a space S with code distance d can detect any number of errors strictly less than d. It is also the case that being able to correct a given number of errors can be stated in terms of the code distance. We noted before that the code S in (11.1) has the property that, if we take any vector v ∈ S and flip a single bit, there is no other vector w ∈ S that is a single bit-flip away. In other words, there are no two vectors v, w ∈ S that are at most two bit-flips apart, or equivalently, S has a code distance of (at least) 3. In fact, the code distance is exactly 3, which we can see by enumerating all 8 vectors in this 3D subspace:

S = { (0,0,0,0,0,0), (0,0,1,0,1,1), (0,1,0,1,1,0), (0,1,1,1,0,1), (1,0,0,1,0,1), (1,0,1,1,1,0), (1,1,0,0,1,1), (1,1,1,0,0,0) }

and noting that all the non-zero vectors have Hamming weight 3 or more. This is a general feature: any code with distance d = 2e + 1 can always correct an error consisting of at most e bit flips. To move toward the quantum analogue of classical linear codes, we can proceed by thinking of stabiliser measurements as analogous to classical parity checks. As we already mentioned, for classical codes, we can very quickly check whether v ∈ S by taking XORs of certain bits. For the code S in (11.1), a vector (v₁,v₂,v₃,v₄,v₅,v₆) is in S precisely when the following linear equations are satisfied:

v₁ ⊕ v₂ ⊕ v₄ = 0    v₂ ⊕ v₃ ⊕ v₅ = 0    v₁ ⊕ v₃ ⊕ v₆ = 0

As we saw in Chapter 4, giving a system of homogeneous 𝔽₂-linear equations fixing a linear subspace S is essentially the same as giving a spanning set for S⊥. In this case, this is:

S⊥ = Span { (1,1,0,1,0,0), (0,1,1,0,1,0), (1,0,1,0,0,1) }

To check v ∈ S, it suffices to check wᵢᵀv = 0 for each of the basis vectors wᵢ of S⊥. If this holds, then we can be sure no error of weight < d has occurred. If wᵢᵀv = 1, this is called a syndrome, because it indicates that there is an error somewhere in v.
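Both the parity checks and the distance claim are easy to verify mechanically; here is a brief numpy sketch (the matrix H simply stacks the three checks above):

    import numpy as np
    from itertools import product

    def encode(x, y, z):
        return np.array([x, y, z, x ^ y, y ^ z, x ^ z])

    # Rows are the checks v1+v2+v4, v2+v3+v5, v1+v3+v6 (mod 2).
    H = np.array([[1, 1, 0, 1, 0, 0],
                  [0, 1, 1, 0, 1, 0],
                  [1, 0, 1, 0, 0, 1]])

    def syndrome(v):
        return tuple(H @ v % 2)

    print(syndrome(np.array([1, 1, 0, 0, 0, 0])))   # non-zero: error detected
    assert syndrome(encode(1, 0, 1)) == (0, 0, 0)   # codewords pass all checks

    # Code distance: smallest non-zero Hamming weight in the codespace.
    codewords = [encode(*m) for m in product((0, 1), repeat=3)]
    print(min(int(c.sum()) for c in codewords if c.any()))  # 3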

11.2 Quantum stabiliser codes

In this section, we'll expand on the analogy between parity checks and stabiliser measurements, and show how to define the notion of code distance for a stabiliser subspace. Fix a stabiliser group S = ⟨P₁, …, Pₘ⟩ with associated stabiliser subspace Stab(S) ⊆ (ℂ²)^⊗n. Since we can use stabiliser subspaces to correct errors, we will henceforth use the terms stabiliser subspace and stabiliser code interchangeably. This is analogous to the classical case, where we referred to subspaces of 𝔽₂ⁿ as codes. Thanks to the Fundamental Theorem of Stabiliser Theory from Chapter 6, we know that Stab(S) is 2ᵏ-dimensional, for k := n − m. Hence, we can think of the stabiliser subspace as encoding k logical qubits in n physical qubits. This gives us quantum analogues to the parameters n and k from classical codes. To build up to the notion of code distance, let's have a look at how we can use S to detect and correct errors. We can treat the generators of S as "quantum parity checks" in a very specific way. If we measure an arbitrary state |ψ⟩ ∈ (ℂ²)^⊗n with one of the stabiliser generators, we can compute the Born rule probability as:

If we restrict to the case where |ψ⟩ ∈ Stab(S), then Pⱼ|ψ⟩ = |ψ⟩, so by Proposition 6.2.2 we know Π⁽⁰⁾_Pⱼ|ψ⟩ = |ψ⟩. Hence:

So, we will always get outcome sⱼ = 0 if no error occurred on |ψ⟩. From this, we can also conclude that we'll never get outcome sⱼ = 1:

Now, suppose an error does occur on |ψ⟩. For simplicity, we'll initially assume that this error takes the form of a self-adjoint Pauli string Q. In Section 11.2.2, we'll show that arbitrary errors can be reduced to this case. Namely, if we can detect/correct Pauli errors, then we can detect/correct arbitrary errors. There are two possibilities: either the error commutes with P or it anti-commutes with P. These two possibilities translate into the following commutation rule with respect to the Pauli box P.

Exercise 11.1. Show that:

where k = 0 if PQ = QP and k = 1 if PQ = −QP.

As a consequence of Exercise 11.1, if a commuting error Q happens to |ψ⟩, the outcome of measuring Pⱼ is unaffected:

So, Prob(sⱼ | Q|ψ⟩) = Prob(sⱼ | |ψ⟩). Hence, we'll again get outcome sⱼ = 0 with certainty. However, if Q anti-commutes with Pⱼ, it kicks a π up into the measurement outcome, so the probabilities flip:

Hence, we'll get outcome sⱼ = 1 with certainty. We can measure all m stabilisers generating S. Since all of the stabilisers commute, we can either think of doing each of these measurements in sequence (where the order doesn't matter), or equivalently, doing one big measurement whose outcome is given by:

If |ψ⟩ is of the form Q|ϕ⟩ for some state |ϕ⟩ in the stabiliser subspace and some (possibly trivial) Pauli error Q, this will yield some particular bitstring of outcomes with certainty. If Q = I, the outcome will always be (s₁, …, sₘ) = (0, …, 0). Hence, each time sⱼ = 1, this indicates that (1) some non-trivial error Q has occurred, and (2) that error anti-commutes with the j-th stabiliser Pⱼ. By analogy to the classical case, this vector of measurement outcomes is called the syndrome. If we are lucky, this syndrome will provide us with enough information to send |ψ⟩ back to the error-free state |ϕ⟩. For example, if it uniquely fixes the Pauli string Q, that is definitely good enough, because we can then just apply Q again to get Q|ψ⟩ = Q²|ϕ⟩ = |ϕ⟩. However, we don't even need to hit Q on the nose, because if we correct |ψ⟩ with some Q′ that is the product of Q and some stabiliser S ∈ S, we get:

Q′|ψ⟩ = QSQ|ϕ⟩ = ±Q²S|ϕ⟩ = ±S|ϕ⟩ = ±|ϕ⟩

So, we again recover the state |ϕ⟩, possibly up to an irrelevant global phase. Let's see how this works by means of a simple example. Consider the following stabiliser group on three qubits:

S = ⟨Z₁ := Z⊗Z⊗I, Z₂ := I⊗Z⊗Z⟩

S has two independent stabilisers, so dim(Stab(S)) = 2³⁻² = 2. Hence this code, called the GHZ code, encodes one qubit into three. Fixing a basis for Stab(S), we have

Stab(S) = Span{|000⟩, |111⟩}

For the following discussion, it will be convenient to write Stab(S) as the image of a map from one qubit into three qubits, called the encoder of S. We will say more about encoders in the following sections, including how to build them from stabiliser codes. But for now, note that:

(11.2)

It thus follows that:

Hence, we can write a generic state |Ψ⟩ ∈ Stab(S) as:

As before, we can look at the Born rule probabilities associated with measuring the stabiliser generators Z₁ and Z₂. As was shown in Exercise 6.7, all-Z and all-X Pauli projectors take a particularly simple form:

This generalises to Pauli strings formed just from Z and I (or just X and I) by connecting the projector to just the subset of qubits where a Z (or X) appears. For example, we have:

Applying Π⁽ᵏ⁾_{Z⊗Z⊗I} to an encoded state:

we see that, as expected:

By symmetry, we see the same probabilities for measuring I⊗Z⊗Z. This is consistent with what we saw before: measuring states in Stab(S) with the generators of S will leave the states unchanged, and yield outcome 0 with probability 1. However, suppose we introduce an error to our encoded state, such as a 'bit flip', i.e. a Pauli X applied to the first qubit. Since X⊗I⊗I anti-commutes with Z⊗Z⊗I, we get:

Hence the Born rule probabilities are flipped:

so we obtain s₁ = 1 for our first syndrome bit. On the other hand, if we measure I⊗Z⊗Z instead, this commutes with X⊗I⊗I, so:

so we obtain outcome s₂ = 0 for the second syndrome bit. By symmetry, it is easy to check that each of the single bit-flip errors can be associated with a unique syndrome:

X⊗I⊗I ↦ (s₁,s₂) = (1,0)    I⊗X⊗I ↦ (s₁,s₂) = (1,1)    I⊗I⊗X ↦ (s₁,s₂) = (0,1)

Hence, if any single bit-flip error occurs, we always know how to fix it. We just need to measure the stabiliser generators and apply a Pauli X on the correct qubit to cancel out the error. Unfortunately, the GHZ code is not good enough to correct arbitrary single-qubit errors. Notably, if a Pauli Z error occurs on one of the qubits (which is sometimes called a 'phase-flip' error), it commutes with all of the generators of S. Hence, the syndrome will always be (0,0), and we'll have no idea that the error even happened. In this sense, the GHZ code is not very good at detecting or correcting errors. By analogy to the classical case, this is a distance-one code, meaning that there are even single-qubit errors that can go undetected. In order to understand what this means (and ultimately to find codes which can detect and correct errors), we will now formalise the notion of code distance for quantum stabiliser codes.
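We can reproduce this syndrome table with a few lines of Python by representing each Pauli string as a pair of bit vectors (x, z), so that two Paulis anti-commute exactly when the symplectic form x₁·z₂ + z₁·x₂ is odd (a sketch in our own notation, not from the text):

    import numpy as np

    def pauli(s):
        # Represent a Pauli string like "XIZ" as bit vectors (x, z).
        x = np.array([c in "XY" for c in s], dtype=int)
        z = np.array([c in "ZY" for c in s], dtype=int)
        return x, z

    def anticommute(p, q):
        # 1 iff the two Pauli strings anti-commute (symplectic form is odd).
        return int((p[0] @ q[1] + p[1] @ q[0]) % 2)

    generators = [pauli("ZZI"), pauli("IZZ")]  # the GHZ code stabilisers

    for error in ["XII", "IXI", "IIX", "ZII", "IZI", "IIZ"]:
        s = tuple(anticommute(pauli(error), g) for g in generators)
        print(error, "->", s)
    # X errors give syndromes (1,0), (1,1), (0,1); all Z errors give (0,0).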

11.2.1 Code distance for stabiliser codes

We saw in Section 11.1 that we could quantify the number of detectable errors using the classical code distance. We can do a similar thing for stabiliser codes, where the following notion plays the role of the Hamming weight.

Definition 11.2.1. The weight |P| of a Pauli string P is the number of non-identity Pauli operators that appear in P.

For example, the Pauli string X⊗Z⊗I has weight 2 and the identity string always has weight 0.

Exercise 11.2. Show that we can bound the weight of a product of two Pauli strings as follows:

|PQ| ≤ |P| + |Q|

Using this notion, we can quantify the number of detectable errors. Note that an error produces a non-zero syndrome if and only if it anti-commutes with at least one of the generators of the stabiliser group. So, to be undetectable, it must commute with everything in S. Furthermore, we only care about errors that actually mess up the states in our stabiliser subspace, so we shouldn’t count elements of the stabiliser group itself as errors.

Definition 11.2.2. For a stabiliser group S, we define the distance of the associated stabiliser code as the minimum weight d := |P| over Pauli strings P ∉ S that commute with every element of S.

Note that, since S is a subgroup, I ∈ S, so any such P must have non-zero weight. Also, note that P commutes with everything in S if and only if it commutes with each of the generators, so we can efficiently decide whether P commutes with everything in S. However, like in the classical case, we cannot in general compute the code distance efficiently, since we would need to show that all Pauli strings with |P| < d are either in S or anti-commute with something in S. If a stabiliser code has distance d, it can detect any error Q with |Q| < d. In other words, it can detect up to d − 1 single-qubit errors. Similarly to the classical case, it can also correct up to ⌊(d − 1)/2⌋ errors. More precisely, errors Q with |Q| ≤ ⌊(d − 1)/2⌋ are uniquely fixed, up to stabilisers.

Theorem 11.2.3. Suppose S = ⟨P₁, …, Pₘ⟩ defines a stabiliser code with distance d. If two errors Q₁, Q₂ with |Qᵢ| ≤ ⌊(d − 1)/2⌋ yield the same syndrome (s₁, …, sₘ), then Q₁ = SQ₂ for some S ∈ S.

Proof. If sⱼ = 0, then Q₁ and Q₂ both commute with the j-th stabiliser generator Pⱼ, whereas if sⱼ = 1, then they both anti-commute with Pⱼ. In either case, the product Q₁Q₂ commutes with all of the generators of S. By Exercise 11.2, we know |Q₁Q₂| ≤ |Q₁| + |Q₂| ≤ 2⌊(d − 1)/2⌋ ≤ d − 1. Because S has distance d, Q₁Q₂ must equal some S ∈ S. Hence SQ₂ = Q₁Q₂² = Q₁. □

So, as in the classical case, a quantum code can correct e errors whenever 2e + 1 ≤ d. Putting the three parameters together, we will write [[n, k, d]] to indicate a quantum error correcting code that encodes k logical qubits in n physical qubits, with code distance d.

Example 11.2.4. The following 7-qubit code is called the Steane code:

X₁ := X⊗I⊗I⊗I⊗X⊗X⊗X    Z₁ := Z⊗I⊗I⊗I⊗Z⊗Z⊗Z
X₂ := I⊗X⊗I⊗X⊗I⊗X⊗X    Z₂ := I⊗Z⊗I⊗Z⊗I⊗Z⊗Z
X₃ := I⊗I⊗X⊗X⊗X⊗I⊗X    Z₃ := I⊗I⊗Z⊗Z⊗Z⊗I⊗Z

Since it consists of 7 physical qubits and 6 stabiliser generators, it encodes 7 − 6 = 1 logical qubit. Any Pauli consisting of a Z on one or two qubits anti-commutes with at least one of the Xᵢ generators, whereas any Pauli consisting of an X on one or two qubits anti-commutes with at least one of the Zᵢ generators. Any Pauli string Q of weight at most 2 can be written as a product of an all-X string of weight at most 2 and an all-Z string of weight at most 2. Hence Q must anti-commute with some generator. On the other hand, I⊗I⊗I⊗Z⊗Z⊗Z⊗I commutes with all of the generators above and can't be obtained as a product of them. Hence, this code has a distance of 3, i.e. it is a [[7,1,3]] code.
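For a code this small, we can verify the distance by brute force. The following sketch (our own, reusing the (x, z) bit-vector representation of Paulis from the GHZ example) enumerates all 4⁷ Pauli strings, keeps those that commute with every generator but lie outside the stabiliser group (ignoring signs), and reports the minimum weight:

    import numpy as np
    from itertools import product

    n = 7
    H = np.array([[1, 0, 0, 0, 1, 1, 1],
                  [0, 1, 0, 1, 0, 1, 1],
                  [0, 0, 1, 1, 1, 0, 1]])
    # The six generators as (x|z) bit vectors: three X-type, three Z-type.
    Z0 = np.zeros((3, n), dtype=int)
    G = np.block([[H, Z0], [Z0, H]])

    def commutes_with_all(x, z):
        # Symplectic form against every generator must be even.
        return all((x @ g[n:] + z @ g[:n]) % 2 == 0 for g in G)

    # All 2^6 group elements as (x|z) vectors (signs ignored).
    group = {tuple(np.array(c) @ G % 2) for c in product((0, 1), repeat=6)}

    best = n + 1
    for q in product(range(4), repeat=n):    # 0=I, 1=X, 2=Z, 3=Y per qubit
        x = np.array([int(a in (1, 3)) for a in q])
        z = np.array([int(a in (2, 3)) for a in q])
        w = sum(a != 0 for a in q)
        if 0 < w < best and commutes_with_all(x, z) \
                and tuple(np.concatenate([x, z])) not in group:
            best = w
    print(best)  # prints 3: the Steane code is a [[7,1,3]] code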

If we don’t care about error correction and just want to be able to detect an arbitrary weight-1 error, the following code will work.

Example 11.2.5. The following is a [[4,2,2]] error detecting code:

X₁ := X⊗X⊗X⊗X    Z₁ := Z⊗Z⊗Z⊗Z

Clearly any weight-1 Pauli will anti-commute with one of the two stabilisers above, hence it detects a single error. Furthermore, since it has only 2 stabilisers for 4 physical qubits, it encodes 2 logical qubits.

In the last two examples, the codes were split into all-X and all-Z generators, which, as we'll see in Section 11.3, has some nice consequences that make them easier to work with. However, if we drop this restriction, we can find even smaller distance-3 codes.

Example 11.2.6. The following is a [[5,1,3]] code:

S₁ := X⊗Z⊗Z⊗X⊗I    S₂ := I⊗X⊗Z⊗Z⊗X    S₃ := X⊗I⊗X⊗Z⊗Z    S₄ := Z⊗X⊗I⊗X⊗Z

In the next section, we will see how the code distance enables us to detect and correct not just Pauli errors, but a large class of errors.

Exercise 11.3. Consider the following encoder map, which is a “doubled up” version of the GHZ encoder from the previous section:

(11.3)

It embeds 1 logical qubit into 9 physical qubits, so its image should have 8 stabilisers. What are they? What is the code distance?

11.2.2 Detecting and correcting quantum errors

A natural question arises when one first encounters quantum error correcting codes: why the fixation on Pauli errors? In the classical case, it is quite natural to focus on bit-flips (and sometimes bit-loss) as the only basic kinds of errors we care about. Intuitively, these are the only kinds of errors that can occur for classical data. However, there are many ways for a qubit to experience an error, so why is it we can get away with just Pauli X, Y, and Z? Indeed, quantum theory says that the physical processes that cause quantum errors could act in infinitely many ways on our qubits, and could even couple our qubits to some external systems outside our control, introducing unwanted entanglement. To capture all such possibilities, a generic error can be modelled as a unitary like this:

where the first n qubits represent the system we are actually doing the computation on, and the extra system ℋ represents some environment that those qubits might interact with when the error occurs. The key trick at this point is to realise that the Pauli matrices I, X, Y, Z span the whole 4D space of 2 × 2 matrices.

Exercise 11.4. For a matrix:

M = ( a b
      c d )

find complex numbers λᵢ such that M = λ₀I + λ₁X + λ₂Y + λ₃Z.
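One way to sanity-check an answer numerically is with a few lines of numpy, using the standard fact that the Paulis are orthogonal under the Hilbert-Schmidt inner product with tr(σᵢ²) = 2:

    import numpy as np

    I = np.eye(2)
    X = np.array([[0, 1], [1, 0]])
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.array([[1, 0], [0, -1]])

    M = np.random.randn(2, 2) + 1j * np.random.randn(2, 2)

    # Hilbert-Schmidt inner product: lambda_i = tr(sigma_i M) / 2.
    coeffs = [np.trace(P @ M) / 2 for P in (I, X, Y, Z)]
    M_rec = sum(l * P for l, P in zip(coeffs, (I, X, Y, Z)))
    assert np.allclose(M, M_rec)  # the four Paulis span all 2x2 matrices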

Consequently, Pauli strings of length n span the whole space of 2ⁿ × 2ⁿ matrices. Using this fact, we can decompose E as follows:

where D_P : ℋ → ℋ is some linear map on the rest of the space that can vary with P. We don't expect to be able to correct all errors of this form. For example, if E simply swapped all our qubits out into ℋ and replaced them with garbage from the environment, there would be no way to recover. So, we should put some kind of reasonable restriction on E. First, let's split E into two parts based on the weight of the Pauli strings:

For reasonably well-behaved error processes, the second part of the sum will get exponentially small as we increase d. So, in true physicist style, we will just ignore it! This gives us the following pretty good approximation for E:

(11.4)

This is a non-trivial assumption, which comes from the fact that we assume the interference from the environment is relatively localised (cf. Remark 11.2.7). An important consequence is that as d gets larger, the form (11.4) gives us a better approximation of an arbitrary error process.

Remark 11.2.7. We can motivate the form of a "reasonably well-behaved" error process (11.4) by considering only those that come from local interactions with the environment. Recall from Section 7.5 that unitary evolutions in quantum theory are generated by Hamiltonians, so we can write E = e^{itH} for some Hamiltonian H. Many physical interactions are well-approximated by so-called K-local Hamiltonians, which are those H that can be written as linear combinations of operators with weight at most K. Usually K is small (e.g. 2), so terms of high weight in E all come from higher-order terms in the Taylor expansion (i.e. terms with high j in equation (7.57) from Section 7.5). When t is small, these terms get exponentially close to zero as we increase j.

Let's use this form of an error to talk about a single round of error detection or correction. We'll start by assuming that we can perform stabiliser measurements perfectly (i.e. without introducing more errors). This is of course not realistic, as we'll see when we discuss fault-tolerant measurements in Section 11.4.2. Fix a stabiliser code of distance d given by the stabiliser group S = ⟨P₁, …, Pₘ⟩. Suppose we start with a state |ψ⟩ ∈ Stab(S) at time t₀, then measure the generators of S at time t₁. If we assume an arbitrary error of the form (11.4) could happen between t₀ and t₁, the resulting state, which depends on the measurement outcomes s₁, …, sₘ, looks like this, up to normalisation:

(11.5)

If we did not detect an error, this means we got outcome (s₁, …, sₘ) = (0, …, 0). In that case, the resulting state is the following:

(11.6)

From (11.4), we can expand E as a sum over Pauli strings P with |P| < d:

(11.7)

Then, since S has distance d, every such P is either an element of S or anti-commutes with at least one generator Pⱼ ∈ S. However, the terms in (11.7) corresponding to anti-commuting Pauli strings vanish. To see this, note that stabiliser generators all commute, so we can always move the projector corresponding to Pⱼ to the front of the list of projectors in (11.6). Then:

where (∗) uses the fact that Π⁽⁰⁾_Pⱼ|ψ⟩ = |ψ⟩ since |ψ⟩ ∈ Stab(S). The only remaining terms are those where P ∈ S. But then, P|ψ⟩ = |ψ⟩. Combining this with the fact that |ψ⟩ is also invariant under the projection on to Stab(S), (11.6) reduces to:

where D := Σ_{P∈S, |P|<d} D_P is some (irrelevant) process acting independently on the environment, without disturbing |ψ⟩. Hence, even though E might not be a Pauli string, if we perform a round of stabiliser measurements and detect no errors, we will nevertheless project back on to the error-free state |ψ⟩. But what happens if we do detect errors? Then, we can attempt to decode the error syndrome by finding some Pauli string Q that could have produced that syndrome and undoing it. In fact, for reasons that will soon become clear, we often want to find such a Q with weight as small as possible.

Definition 11.2.8. For a stabiliser code defined by S = ⟨P₁, …, Pₘ⟩ and syndrome (s₁, …, sₘ), a minimum weight decoding is a Pauli string Q of minimal weight that anti-commutes with Pⱼ if and only if sⱼ = 1. Diagrammatically, we have for all j:

Clearly if (s₁, …, sₘ) = (0, …, 0), the minimum weight decoding is the identity. Otherwise, Q will always be a non-trivial Pauli string. If we post-compose the result of a round of stabiliser measurements (11.5) with Q, we can push Q inside, which cancels out the sⱼ:

Hence, we obtain the form of (11.6) for a new error process E′ := QE. If E is a sum over Paulis with weight at most ⌊(d − 1)/2⌋, and |Q| ≤ ⌊(d − 1)/2⌋, then E′ is a sum over Paulis with weight at most d − 1. Hence, if S has distance d, we can use the same argument as before to reduce the expression above to |ψ⟩ together with some irrelevant environment process D, so we have successfully corrected the error E. Note that the weight of the terms in E′ depends on the weight of Q. This is why we try to find Q with weight as small as possible, to give the greatest chance of correcting the error. Naïvely, we can find Q by enumerating all of the Pauli strings of weight 1, 2, 3, … until we find one with the correct syndrome. Clearly this will take exponential time, so it becomes infeasible for large codes. Minimum weight decoding for stabiliser codes is (at least) as hard as for classical codes, which is already known to be NP-hard for some families of error correcting codes. Hence, for a stabiliser code to be useful, it needs to not only have good parameters [[n, k, d]], but also an efficient way to find minimum weight (or at least low weight) decodings.
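The naïve strategy translates directly into code: enumerate Paulis in order of increasing weight and return the first one whose syndrome matches. A toy sketch (our own helper names, exponential time as noted above):

    import numpy as np
    from itertools import combinations, product

    def min_weight_decoding(gens, syndrome, n):
        """Find a minimal-weight Pauli whose anti-commutation pattern
        with the generators matches the given syndrome."""
        for w in range(n + 1):                    # try weights 0, 1, 2, ...
            for qubits in combinations(range(n), w):
                for paulis in product("XYZ", repeat=w):
                    x = np.zeros(n, dtype=int)
                    z = np.zeros(n, dtype=int)
                    for q, p in zip(qubits, paulis):
                        if p in "XY": x[q] = 1
                        if p in "ZY": z[q] = 1
                    s = [int((x @ gz + z @ gx) % 2) for gx, gz in gens]
                    if s == list(syndrome):
                        return qubits, paulis
        return None

    # GHZ code: generators ZZI and IZZ as (x, z) vector pairs.
    gens = [(np.array([0, 0, 0]), np.array([1, 1, 0])),
            (np.array([0, 0, 0]), np.array([0, 1, 1]))]
    print(min_weight_decoding(gens, (1, 1), 3))  # ((1,), ('X',)): X on qubit 2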

11.2.3 Encoders and logical operators

An [[n, k, d]] stabiliser code is defined by a stabiliser group S with m = n − k generators and fixes a 2ᵏ-dimensional subspace Stab(S) ⊆ (ℂ²)^⊗n. Hence, we can think of this as k logical qubits embedded in a space of n physical qubits. Stab(S) tells us where those qubits lie within the big space, but it does not tell us exactly how those qubits are embedded. For example, if we are encoding 2 qubits, which part of Stab(S) corresponds to the first qubit and which to the second qubit? How do unitaries applied to the physical qubits map on to transformations of the logical qubits? If we only intend to use quantum error correction for quantum memory, the answers to these questions are not too important. However, as we will see in Section 11.4, they are of central importance when we start wanting to perform fault-tolerant computations on encoded qubits. For that reason, it is useful to define an explicit isometry E : (ℂ²)^⊗k → (ℂ²)^⊗n such that the image of E is Stab(S). This isometry is called an encoder associated with the code S:

We already saw some explicit examples of encoder maps, built as ZX diagrams in Equations (11.2) and (11.3). On some occasions, we may actually want to implement the encoder physically, e.g. as a unitary circuit with ancillae. However, this need not be the case, and in fact most of the time, it suffices to treat this as a purely mathematical object, tracking how our k logical qubits are embedded into the space of n physical qubits. Most importantly, it tracks the relationship between logical maps performed on our k logical qubits and the associated physical maps performed on the n physical qubits.

Definition 11.2.9. A map F : (ℂ²)^⊗n → (ℂ²)^⊗n is said to implement a map f : (ℂ²)^⊗k → (ℂ²)^⊗k in a stabiliser code with encoder E if:

F ∘ E = E ∘ f    (11.8)

Intuitively, F acts like f on the codespace of S, which is isomorphic to the space of k qubits. To formalise exactly what "acts like f" means, we need to choose an isomorphism between this subspace and (ℂ²)^⊗k, which is exactly what E does for us. This is a very important concept, since in order for quantum error correction to work, our quantum data needs to be continuously protected from errors. Hence, we cannot decode and re-encode qubits every time we want to apply a gate. As we will see in Section 11.4, finding an F satisfying equation (11.8) will help us to implement fault-tolerant operations on encoded qubits. Given a generating set of stabilisers for S, we can always derive an encoder by following the procedure we used to prove the FTST in Section 6.2.1. Namely, we can split the projector Π on to Stab(S) by finding an isometry E such that Π = EE†:

(11.9)

Furthermore, an encoder derived this way will always be a Clifford isometry, so it is easy to reason about the propagation of Pauli errors. Note that the stabiliser group S only fixes the image of E (or equivalently, the projector Π). There are many different ways to split a projector, so following the procedure from the proof of the FTST gives us just one possible choice. In particular, for any unitary U : (ℂ²)^⊗k → (ℂ²)^⊗k, the encoders E and E′ := EU have the same image in (ℂ²)^⊗n, hence they correspond to the same stabiliser code S. However, whenever E is Clifford, we can perform a nice trick to describe E fully in terms of Pauli strings: we bend the wires! If we bend the input wires around using cups, we obtain an (n + k)-qubit state |E⟩:

Now, we know from the Fundamental Theorem of Stabiliser Theory that n + k independent stabilisers will uniquely fix |E⟩, and hence will uniquely fix E. We already have n − k stabilisers given by the generators of S = ⟨P₁, …, P_{n−k}⟩:

(11.10)

So, the question is: where can we find 2k more stabilisers? The first thing to note is, assuming E is a Clifford isometry, we can always push arbitrary Pauli strings from its inputs to its outputs, thanks to Proposition 6.1.6. In particular, we can always push single-qubit Pauli X and Z gates through E. As a consequence, we can always find a Pauli string X̄ᵢ that implements (in the sense of Definition 11.2.9) a Pauli X gate applied to the i-th logical qubit, i.e.

(11.11)

Similarly, for a Z gate on the i-th qubit, there exists a different Pauli string Z̄ᵢ such that:

(11.12)

This gives us 2k Pauli strings {X̄₁, …, X̄ₖ, Z̄₁, …, Z̄ₖ} called the logical operators associated with E.

Exercise 11.5. Show that the logical operators {X̄₁, …, X̄ₖ, Z̄₁, …, Z̄ₖ} are self-adjoint and commute with every element of the stabiliser group S. Furthermore, show that the pair of logical operators X̄ᵢ and Z̄ᵢ anti-commute for all i, and all other pairs of logical operators commute.

Exercise 11.6. Let S = ⟨S₁, …, Sₘ⟩ be a stabiliser group and let X̄₁, …, X̄ₖ, Z̄₁, …, Z̄ₖ be the logical operators. Show that the group C(S) of Pauli strings that commute with all elements of S is ⟨S₁, …, Sₘ, X̄₁, …, X̄ₖ, Z̄₁, …, Z̄ₖ⟩. Hint: If an element T ∈ C(S) commutes with all Z̄ᵢ, then T is an element of the maximal stabiliser group ⟨S₁, …, Sₘ, Z̄₁, …, Z̄ₖ⟩, and hence a product of these elements. If instead it does not commute with some Z̄ᵢ, show that we can still write it as a product of the logicals and stabilisers.

By moving the X gate to the right-hand side of (11.11) and bending the wires, we see that each logical operator X̄ᵢ gives us a stabiliser for the Choi state |E⟩:

(11.13)

Similarly, each Z̄ᵢ gives us a stabiliser for |E⟩:

(11.14)

Note that the operators of the form (11.13) and (11.14) all pairwise commute. This is because the −1 coming from anti-commuting an X and Z on the i-th qubit is always cancelled out by anti-commuting X̄ᵢ with Z̄ᵢ on the last n qubits (cf. Exercise 11.5). If we combine the stabilisers coming from S in equation (11.10) with those coming from the logical operators, we obtain n − k + 2k = n + k stabilisers for |E⟩, which is by construction an (n + k)-qubit state. Hence, E is totally fixed up to a scalar factor. So, if we perform the method from Section 6.2.1 for the full set of n + k stabilisers for |E⟩, then bend the wires, we'll get the encoder associated with a stabiliser group and a set of logical operators. In general, this will be a relatively deep ZX-diagram, with approximately one layer for each stabiliser and logical operator. We could then try to simplify things to get a more compact form, e.g. by reducing to GSLC or AP form, but in general this picture of the encoder might be a bit awkward to work with. However, as we'll see in Section 11.3, for a certain family of codes, called CSS codes, things get a lot simpler!

11.2.4 The decoder

The encoder has a dual operation, which we call the ideal decoder, or simply the decoder. This is slightly more difficult to work with than the encoder, so whenever possible we will state things in terms of the encoder rather than the decoder. However, having a way to, at least in principle, decode n physical qubits back into k logical qubits while detecting/correcting errors is still a useful thing to have in our back pocket. The encoder is an isometry, so we know how to implement it straight away as a unitary circuit with ancillae: apply some unitary U to the input qubits together with ancillae prepared in the state |0⟩. In fact, such a U is already evident from the definition of the encoder in equation (11.9) in terms of the splitting of the projector onto the stabiliser subspace. However, we can't deterministically implement the adjoint E† this way, because that would require performing the unitary U† and then post-selecting each of the ancilla qubits on to ⟨0|. We can, however, measure those ancillae instead:

But what are these measurement outcomes (s₁, …, sₘ)? Recall the following equation from Exercise* 6.10:

This implies that for an encoded state |Ψ⟩, performing U† and measuring the ancillae will give us the same outcome s as measuring all of the stabilisers. Hence, we can use the measurement outcomes from the ancillae to do error detection or correction. If we get outcome s = (0, …, 0), we can conclude that no error of weight < d has occurred. If we get any other syndrome, we can do minimum weight decoding to find the smallest-weight physical error Q with error syndrome s, and then figure out the appropriate correction to perform on the k logical qubits we have left. Since U is Clifford, we can do this efficiently.

Exercise 11.7. Fix a stabiliser code with generators {P₁, …, Pₘ} and distance d. For a syndrome s, let Qₛ be a Pauli of minimal weight with that syndrome, and let Pₛ be the restriction of the Pauli string U†QₛU to the first k qubits. Show that for any Pauli string Q with syndrome s and weight t, where 2t + 1 ≤ d, we have:

Given this, we therefore define the decoder as a non-deterministic process consisting of a measurement yielding syndrome s, followed by the associated Pauli correction Pₛ defined as in Exercise 11.7:

(11.15)

When s = 0, we have Pₛ = I, so D₀ = E†. Hence this process acts like E† in the error-free case, but it additionally "swallows" any correctable errors when s ≠ 0.

Remark* 11.2.10. If we don't care about the outcome of the syndrome measurement and just want the error-corrected state out, we can represent the decoder as a single quantum channel (cf. Section* 2.7.1) with Kraus operators {Dₛ}_{s∈𝔽₂ᵐ}, i.e.

D(ρ) := Σ_{s∈𝔽₂ᵐ} Dₛ ρ Dₛ†

One can check this is trace-preserving and that for any correctable Pauli error Q, we have D(QρQ†) = D(ρ).

11.3 CSS codes

In this section, we’ll define a family of stabiliser codes called Calderbank-Shor-Steane codes, or CSS codes for short. As you may have already noticed from Examples 11.2.4 and 11.2.5, some stabiliser codes have generators that split nicely into an all-X part and an all-Z part. In a sense, such codes behave like two classical error correcting codes mushed together, where the parity checks of one code become the Z stabilisers (which detect bit errors) and those of the other code become the X stabilisers (which detect phase errors). That is pretty much all there is to defining a CSS code, once you additionally account for the fact that all of the stabilisers need to commute. It turns out there is a very natural way to impose this in terms of bitstrings.

Exercise 11.8. Suppose we define two Pauli strings from the bitstrings v and w as follows:

X := X^{v₁} ⊗ ⋯ ⊗ X^{vₙ}    Z := Z^{w₁} ⊗ ⋯ ⊗ Z^{wₙ}

Show that X and Z commute if and only if v and w are orthogonal, i.e. vᵀw = 0 (mod 2).

Hence, if we fix two classical codespaces, one for the X-stabilisers and one for the Z-stabilisers, all the stabilisers will commute precisely when those subspaces are mutually orthogonal. We say two subspaces S, T are orthogonal if vᵀw = 0 for all v ∈ S, w ∈ T, or equivalently T ⊆ S⊥.

Definition 11.3.1. The CSS code generated by orthogonal subspaces S,T 𝔽2n is a stabiliser code whose generators are of the following form:

Xᵢ := X^{(vᵢ)₁} ⊗ ⋯ ⊗ X^{(vᵢ)ₙ}    Zⱼ := Z^{(wⱼ)₁} ⊗ ⋯ ⊗ Z^{(wⱼ)ₙ}

where {v₁, …, v_p} is a basis spanning S and {w₁, …, w_q} is a basis spanning T. A CSS code is called maximal if T = S⊥.

That is, we let the basis vectors of S define the X generators and we let the basis vectors of T define the Z generators. Since addition in 𝔽₂ is taken modulo 2, orthogonality guarantees that each X generator overlaps with each Z generator in an even number of places, which makes the group commutative. Using this fact, it is easy to verify that the resulting group is a stabiliser group.
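In code, the CSS construction boils down to a single matrix condition: every basis vector for S must be orthogonal (mod 2) to every basis vector for T. A small sketch, with helper names of our own choosing:

    import numpy as np

    def css_stabilisers(S_basis, T_basis):
        """Build CSS generators from bases of two subspaces S, T of F_2^n.
        Requires every vector of S to be orthogonal (mod 2) to every
        vector of T, so that all generators commute."""
        S, T = np.array(S_basis), np.array(T_basis)
        assert not (S @ T.T % 2).any(), "subspaces are not orthogonal"
        xs = ["".join("X" if b else "I" for b in v) for v in S]
        zs = ["".join("Z" if b else "I" for b in w) for w in T]
        return xs, zs

    # The (self-orthogonal) Hamming subspace from Example 11.3.2 below.
    H = [(1,0,0,0,1,1,1), (0,1,0,1,0,1,1), (0,0,1,1,1,0,1)]
    print(css_stabilisers(H, H))  # the six Steane code generators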

Example 11.3.2. Let S be a 3D subspace of 𝔽₂⁷ spanned by:

{(1,0,0,0,1,1,1),(0,1,0,1,0,1,1),(0,0,1,1,1,0,1)}

This particular subspace is known as a Hamming code. Its classical code distance is 3, and it has the nice property that S is orthogonal to itself. Hence, we can use S to derive both the X and Z generators of a CSS code. In fact, we have already seen this code: it is the Steane code from Example 11.2.4.

X₁ := X⊗I⊗I⊗I⊗X⊗X⊗X    Z₁ := Z⊗I⊗I⊗I⊗Z⊗Z⊗Z
X₂ := I⊗X⊗I⊗X⊗I⊗X⊗X    Z₂ := I⊗Z⊗I⊗Z⊗I⊗Z⊗Z
X₃ := I⊗I⊗X⊗X⊗X⊗I⊗X    Z₃ := I⊗I⊗Z⊗Z⊗Z⊗I⊗Z

11.3.1 Stabilisers and Pauli ZX diagrams

Recall from Section 4.3 that phase-free ZX-diagrams can always be put into Z-X or X-Z normal form, which correspond respectively to representations of an 𝔽₂-linear subspace S using a basis for S or a basis for S⊥:

If we combine this knowledge with these equations, which follow from (π):

(11.16)

we can see immediately how to derive a generating set of stabilisers for a phase-free ZX-diagram. As we’ll see in this section, those stabilisers always generate a CSS code, and conversely any CSS code can be presented using a phase-free ZX-diagram.

11.3.2 Maximal CSS codes as ZX diagrams

Eq. (11.16), plus the two normal forms from Section 4.3.1, will give us everything we need to prove our main theorem.

Theorem 11.3.3. For any 𝔽₂-linear subspace S ⊆ 𝔽₂ⁿ, |ψ⟩ is stabilised by the maximal CSS code (S, S⊥) if and only if it is equivalent, up to a scalar factor, to a phase-free ZX diagram with a Z-X normal form given by S (or equivalently, with an X-Z normal form given by S⊥).

Proof. Suppose |ψ⟩ is described by a phase-free ZX-diagram. Then it can be translated into Z-X normal form, for some basis {v₁, …, v_p} of an 𝔽₂-linear space S. Then, for each vᵢ, we can apply Eq. (11.16) to introduce an X phase of π on every wire adjacent to the Z spider labelled by vᵢ and commute it to the output using (sp'):

This shows that |ψ⟩ is invariant under the action of the Pauli operator Xᵢ := X^{(vᵢ)₁} ⊗ ⋯ ⊗ X^{(vᵢ)ₙ}. Hence, |ψ⟩ is the +1 eigenstate of all of the p independent X stabilisers of the maximal CSS code (S, S⊥). Similarly, we can compute the X-Z normal form of |ψ⟩ and show that it is the +1 eigenstate of all q independent Z stabilisers Zⱼ := Z^{(wⱼ)₁} ⊗ ⋯ ⊗ Z^{(wⱼ)ₙ}, where {w₁, …, w_q} is a basis for S⊥. This gives p + q = n independent stabilisers for |ψ⟩, hence it is uniquely fixed by the FTST, Theorem 6.2.11. Conversely, any maximal CSS code fixes a state whose stabilisers are given by the Xᵢ and Zⱼ as before, so it will be equal to a phase-free ZX diagram with Z-X normal form given by S, or equivalently, with X-Z normal form given by S⊥. □

This proof gives us an evident way of translating the stabiliser generators of a maximal CSS code into a ZX diagram. In fact, it gives us two equivalent ways: using the Z-X normal form, which gives us a generating set of X stabilisers, or the X-Z normal form, which gives us the Z stabilisers. Interestingly, we only ever need to represent one kind of stabiliser for a maximal CSS code diagrammatically, because S⊥ is uniquely fixed by S and vice-versa.

11.3.3 Non-maximal CSS codes as ZX encoder maps

For a non-maximal CSS code, we should end up with an encoder map from k logical qubits to n physical qubits. Since we already know how to turn CSS stabilisers into phase-free diagrams, we can use the wire-bending trick from Section 11.2.3 to treat logical operators as stabilisers on a bigger (n + k)-qubit state and therefore add them to the diagram as well. As an example, let's return to the 7-qubit Steane code from Example 11.3.2. We will switch to a more compact notation for writing its stabilisers, where Xᵢ (resp. Zᵢ) corresponds to an n-qubit operator acting non-trivially on the i-th qubit with a Pauli X (resp. Z):

X₁ := X₁X₅X₆X₇    Z₁ := Z₁Z₅Z₆Z₇
X₂ := X₂X₄X₆X₇    Z₂ := Z₂Z₄Z₆Z₇
X₃ := X₃X₄X₅X₇    Z₃ := Z₃Z₄Z₅Z₇

This CSS code is non-maximal, and encodes 7 − 6 = 1 logical qubit. Hence we should fix 2 additional logical operators: X̄ := X₄X₅X₆ and Z̄ := Z₄Z₅Z₆. As we noted before, we only need one kind of stabiliser to build the ZX diagram, so applying the recipe from the previous section to the X stabilisers and logical operator, we obtain the following picture, where we label the logical qubit 0 and the physical qubits 1-7. We can then put the logical qubit on the left and rearrange some of the physical qubits to obtain the following:

Note that, if we follow this recipe, we will always get identity spiders on the inputs, which are redundant. If we simplify them away, we’ll always end up with something in generalised parity form (for the X form of the encoder) or the colour-reverse of generalised parity form (for the Z form of the encoder).

(11.17)

As a result, we can always write a CSS encoder map in generalised parity form using just its X-stabilisers and X-logical operators as follows:

1. Place an output X-spider on all n outputs.

2. For each of the k X-logical operators, add an input Z-spider connected to the output X-spiders of its support.

3. For each X-stabiliser generator, add an internal Z-spider connected to the output X-spiders of its support.

Equation (11.17) gives the X-form of the Steane code encoder. If we use the Z-stabilisers and logicals instead, we obtain the Z-form. In the case of the Steane code, this is the exact same picture, but with the colours reversed:

(11.18)

This will not always be the case, and it comes from the fact that the Steane code is self-dual. We’ll discuss self-dual codes more in Section 11.4.1.1.

Example 11.3.4. Recall the [[4,2,2]] error detecting code from Example 11.2.5:

X₁ := X⊗X⊗X⊗X    Z₁ := Z⊗Z⊗Z⊗Z

This has two stabilisers on 4 qubits, so it encodes 2 logical qubits. Hence, in order to fully specify the encoder map, we should fix two pairs of anti-commuting logical operators which commute with the stabilisers and each other. There are multiple solutions to this problem, so we will just choose one:

X̄₁ := X⊗X⊗I⊗I    Z̄₁ := Z⊗I⊗Z⊗I
X̄₂ := X⊗I⊗X⊗I    Z̄₂ := Z⊗Z⊗I⊗I

These all have weight 2, so they have even overlap with both stabilisers (and hence commute with them). It is also easy to check that X̄ᵢZ̄ᵢ = −Z̄ᵢX̄ᵢ and that all other pairs of logicals commute. We can now use this data to construct the X-form and Z-form of the encoder for the [[4,2,2]] code:

(11.19)
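These commutation requirements are all just parity conditions on the supports, so they can be checked mechanically. A brief sketch for the [[4,2,2]] data above, encoding each operator by its support bitstring:

    import numpy as np

    stab_X = np.array([[1, 1, 1, 1]])              # supports of X-stabilisers
    stab_Z = np.array([[1, 1, 1, 1]])              # supports of Z-stabilisers
    LX = np.array([[1, 1, 0, 0], [1, 0, 1, 0]])    # X-logical supports
    LZ = np.array([[1, 0, 1, 0], [1, 1, 0, 0]])    # Z-logical supports

    # X- and Z-type operators anti-commute iff their supports overlap oddly.
    assert not (stab_X @ stab_Z.T % 2).any()   # stabilisers commute
    assert not (LX @ stab_Z.T % 2).any()       # logicals commute with stabilisers
    assert not (LZ @ stab_X.T % 2).any()
    # X-logicals anti-commute with Z-logicals iff same index (identity matrix).
    assert ((LX @ LZ.T) % 2 == np.eye(2)).all()

Up to transposes, the last line is exactly the condition L_Zᵀ L_X = I that reappears in Section 11.3.5 (Exercise* 11.10).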

So, we have seen how to turn any CSS code described by stabilisers and logical operators into a phase-free ZX-diagram of its encoder. Conversely, we can treat any isometry described by a phase-free ZX diagram as a CSS code just by computing its generalised parity form. Recall from Chapter 4 that the generalised parity form looks like this:

Furthermore, if E is an isometry, we can conclude from Proposition 4.2.8 that j = 0, giving us:

From this picture, we can immediately read off the m X-stabilisers and k X-logical operators associated with E. If we do everything colour-reversed, we'll get a different normal form for the encoder:

from which we can read off the Z-stabilisers and Z-logical operators. In summary, we have given an efficient procedure for writing a phase-free isometry from a CSS code with logical operators and for turning any phase-free isometry into its associated CSS code and logical operators.

11.3.4 The surface code

One particular family of CSS codes has been so well-studied in recent years that it deserves some special attention. The surface code is a family of CSS codes that encode a single logical qubit in a square (or rectangular) lattice of physical qubits. Larger lattices define codes with a higher code distance. While they aren't the most memory-efficient codes we know about, surface codes have a number of nice properties coming from this regular geometric structure. For one thing, they can be implemented using only nearest-neighbour gates on a 2D architecture, so there is never any need to perform swap gates or otherwise break planarity. A second nice feature is that even for surface codes with very high code distances, we have efficient algorithms for decoding error syndromes, as we'll see in Section 11.3.4.1 below. Finally, they serve as a useful baseline for fault-tolerant computation, as we know (at least in principle) how to implement all the ingredients needed for universal quantum computation on surface-code-encoded qubits in a fault-tolerant manner. In fact, fault-tolerant computation in the surface code has one of the best thresholds we know about. That is, it can tolerate quite a bit of noise at the hardware level while still being able to suppress errors. More on that later.

In this section, we will use the slightly more compact, "rotated" version of the surface code. Stabilisers are defined as follows. We start with a rectangular lattice of d × e vertices, corresponding to qubits. We aim to encode a single logical qubit, so we need to fix de − 1 stabilisers. To do so, we first colour in the lattice in a chequerboard pattern, where each red (darker) area corresponds to an X stabiliser on all of its adjacent qubits, whereas each green (lighter) area corresponds to a Z stabiliser. Colouring the inside of the lattice in this way gives (d − 1)(e − 1) stabilisers. To get all de − 1 stabilisers, we still need to fix d + e − 2 additional stabilisers. To get these, we introduce weight-2 stabilisers along the boundaries at every other edge, which we depict as "blobs". We colour these blobs in the opposite colour to the nearest tile, obtaining the following picture:

(11.20)

There are 2(d − 1) + 2(e − 1) edges around the whole boundary, so adding a "blob" to every other edge gives us (2(d − 1) + 2(e − 1))/2 = d + e − 2 more stabilisers, as required. By design, all stabilisers of different types overlap on two qubits, so they commute. Since we alternate edges, one pair of opposite boundaries (in this case the left and right) will end up with X-blobs and one pair (top and bottom) with Z-blobs. In the literature, these are called X-boundaries and Z-boundaries, respectively. The logical X operator consists of a line of Pauli X operators connecting the two X-boundaries, whereas the logical Z operator consists of a line of Pauli Z operators connecting the two Z-boundaries:

Note that the specific choice of path between the boundaries is not important, and we are even allowed to cross tiles diagonally. However, it is important that the path touches each area of the opposite colour an even number of times. This ensures that the logical operator commutes with all of the stabilisers. In the example above we could have equivalently chosen X₄X₅X₆ or X₅X₆X₇ for X̄, but not X₇X₈X₅X₆, because the latter touches a green tile 3 times. Using the stabilisers and the logical operators for the surface code, we can apply the recipe from the previous section to construct its encoder. In fact, we can represent the encoder using two equivalent ZX diagrams, one based on the X-stabilisers and one on the Z-stabilisers:

(11.21)

Note how the diagrams of the encoders have a direct visual relationship to the picture (11.20): to draw the X-stabilisers, we put an X spider on every vertex, place a Z spider in the centre of each red region in (11.20), and connect it to all of the adjacent vertices. Finally, we "embed" the logical X operator by placing a Z spider on the input and connecting it to each of the vertices where X̄ has support. For the Z-stabilisers, we apply the same routine, reversing the roles of X and Z. In the surface code, we can topologically deform a logical operator by multiplying it by any stabiliser. We can perform the same calculation graphically using strong complementarity. As we saw in the proof of Lemma 4.3.4, we can treat spiders as bit-vectors, and by applying strong complementarity, we can "add" the neighbourhood of one spider to that of another, modulo 2. Applying this concept to the surface code, we obtain for example:

(11.22)

This just amounts to changing the basis for the linear space S represented by this normal form, which has the same effect as changing the generators for the associated stabiliser group.
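Concretely, multiplying two same-type Pauli operators corresponds to taking the symmetric difference (XOR) of their supports, so deforming a logical operator by a stabiliser is one line of code. A tiny sketch, assuming the qubit labelling of the example above and a weight-2 boundary stabiliser on qubits 4 and 7 (hypothetical, but consistent with the two equivalent representatives mentioned):

    def multiply(support_a, support_b):
        # Product of same-type Pauli operators: XOR of their supports.
        return support_a ^ support_b

    logical_X = {4, 5, 6}   # the representative X4 X5 X6 from above
    blob = {4, 7}           # assumed weight-2 boundary X stabiliser X4 X7
    print(multiply(logical_X, blob))  # {5, 6, 7}: the representative X5 X6 X7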

11.3.4.1 Decoding errors in the surface code

Aside from the fact that they can be implemented with nearest-neighbour operations in a 2D architecture, surface codes are also popular because error syndromes can be efficiently decoded by taking advantage of their geometric structure. Recall from Section 11.2.2 that the decoding problem for stabiliser codes refers to the problem of mapping the outcome of a syndrome measurement to a Pauli correction Q that is most likely to recover our error-free state. There are many proposals for fast decoders for the surface code. Here, we will sketch how one of the simplest ones, based on minimum weight perfect matching, works. First note that since surface codes are CSS codes, we can identify Z-errors in Q using the outcomes of X stabiliser measurements, and vice-versa. Hence, we can treat these two types of errors independently. Suppose a certain configuration of Z errors occurs on our physical qubits. Much as we did in Section 11.2.2, we can figure out where the error syndrome will be by "pushing" the error through the Pauli projectors corresponding to X measurements, using the π-copy rule, and seeing which projectors end up with a π:

This is the same thing as figuring out which stabilisers anti-commute with the given error. If an even number of Z errors lie in the support of an X projector, the π phases will cancel out in pairs, yielding a 0 measurement outcome. Hence, the syndrome only reveals the places where an odd number of errors overlap with a stabiliser. As a consequence, error syndromes will always appear as the endpoints of a "path of errors" through the lattice:

(11.23)

These paths will always either connect a pair of locations where the error syndrome is 1, as we see above, or they will connect a single syndrome-1 location to a boundary of the surface, as in this example:

Since Z boundaries don't have any X stabiliser measurements, this also works. Hence, if we get just two syndrome-1 outcomes far from the boundary, the most likely place errors occurred is along the shortest path between those two spots (assuming, as we have been, that errors occur independently and with the same probability on every qubit, so a longer path must have involved more errors occurring). If we get just one syndrome-1 outcome, the most likely place errors occurred is along the shortest path between that syndrome and the nearest Z-boundary, where there are no X stabiliser measurements to detect the path's endpoint. "Hang on!" you might say, "there can be multiple shortest paths between points!" And you'd be right. However, all of these paths are equivalent, up to stabilisers. For example, there are 3 shortest paths between the syndromes in (11.23). The 2 other ones can be obtained from the one shown by multiplying by Z stabilisers:

(11.24)

Here we started with the diagram (11.23), composed with one of the Z stabilisers that is not shown explicitly in this Z-X normal form. Since stabilisers don't change the encoded state, correcting errors along any of these 3 paths (or indeed any path between these endpoints) will have the same effect. The reason this works is that, as we showed for the logical embedding at the end of the previous section, the stabilisers of the surface code relate topologically-equivalent paths through the surface, provided they have the same endpoints. Of course, this only tells us what to do if we see zero, one, or two syndrome-1 measurements. However, we can extend this to an algorithm that does a pretty good job decoding any syndrome. First, we make a graph whose nodes correspond to places where we observe a syndrome-1 outcome (and an extra dummy node for the boundary). We label the edges with a weight indicating the length of the shortest path between those locations, e.g.

(11.25)

A perfect matching is then a subset of the edges of this graph, where every node (except the "dummy" node for the boundary) is adjacent to exactly one chosen edge. A minimum weight perfect matching (MWPM) is a perfect matching where the sum of the weights of the chosen edges is minimal. An example of an MWPM for the graph above is:

The simplest minimum weight perfect matching decoder therefore proceeds as follows. For the X stabiliser measurements, make a syndrome graph as in equation (11.25) and compute its MWPM. Then, correct Z errors along the paths indicated by the MWPM. After that, rinse and repeat for Z stabiliser measurements and X errors. It is worth noting that this algorithm does not always find the minimum weight error associated with a syndrome, but it often gets pretty close, especially in the case where there aren't too many syndrome-1 outcomes. It is also very fast, particularly if we are allowed to just approximate the MWPM and/or compute in parallel. We will briefly discuss various implementations of this algorithm and their performance in the References of this chapter.
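To make this concrete, here is a toy sketch using networkx (not from the text). It uses Manhattan distance between defect coordinates as a stand-in for the true shortest-path metric, and handles the boundary by giving each defect a "virtual" boundary twin; twins can pair up with each other at zero cost, a standard trick:

    import networkx as nx

    def mwpm_pairs(defects, dist_to_boundary):
        """defects: list of (row, col) positions with syndrome 1."""
        G = nx.Graph()
        for i, p in enumerate(defects):
            for j, q in enumerate(defects[:i]):
                # Manhattan distance as a stand-in for the path metric.
                G.add_edge(("d", i), ("d", j),
                           weight=abs(p[0] - q[0]) + abs(p[1] - q[1]))
            # Virtual boundary twin for each defect.
            G.add_edge(("d", i), ("b", i), weight=dist_to_boundary(p))
        for i in range(len(defects)):
            for j in range(i):
                G.add_edge(("b", i), ("b", j), weight=0)  # twins pair freely
        # Min-weight perfect matching = max-weight matching on negated weights.
        for _, _, data in G.edges(data=True):
            data["weight"] = -data["weight"]
        return nx.max_weight_matching(G, maxcardinality=True)

    defects = [(1, 1), (1, 4), (5, 2)]
    print(mwpm_pairs(defects, lambda p: min(p[0], p[1]) + 1))

In practice one would use a dedicated blossom-algorithm implementation tuned for surface codes, but the structure is the same: build the syndrome graph, match, then correct along the matched paths.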

11.3.5 Scalable ZX notation for CSS codes

As might be clear from diagrams like (11.17), writing CSS encoder maps explicitly can start to get unwieldy for large numbers of stabilisers or stabilisers with relatively high weight. Later on, we will also want to prove some generalities about all CSS codes, so it would be useful to come up with some notation for writing a generic CSS code as a ZX-diagram. Thankfully, the scalable ZX-calculus, which was introduced in Section 10.2, can solve both of these problems. To see how this works, we can start from the recipe given in the previous section for building an encoder for a CSS code from its X-logical operators and X-stabilisers. Suppose we represent this data as a pair of boolean matrices LX and SX, whose columns correspond to each of the operators and whose rows correspond to qubits, where a 1 indicates the presence of an X. For example, the matrices (LX,SX) associated with the Steane code are the following:

X̄ := I⊗I⊗I⊗X⊗X⊗X⊗I    ⇝    L_X := ( 0 0 0 1 1 1 0 )ᵀ

X₁ := X⊗I⊗I⊗I⊗X⊗X⊗X
X₂ := I⊗X⊗I⊗X⊗I⊗X⊗X    ⇝    S_X := ( 1 0 0 0 1 1 1 ; 0 1 0 1 0 1 1 ; 0 0 1 1 1 0 1 )ᵀ
X₃ := I⊗I⊗X⊗X⊗X⊗I⊗X

Whereas in the previous section, we needed to say in words how to build the encoder out of X operators, we can now do it succinctly as a single diagram, which works for any CSS code described by the pair (L_X, S_X):

(11.26)

Inspecting this picture, we can indeed see that it is saying to introduce k input spiders for the X-logicals, and connect them according to where X’s appear, then introduce m internal spiders for the X-stabilisers and again connect them to outputs according to where X’s appear. This gives us the X form for the encoder, but of course we should also be able to construct a Z form by reversing the role of the two colours of spider. To get the colour reverse of a matrix arrow, we can just reverse the direction and take the transpose of the matrix:

(11.27)

Hence, if we equivalently present a CSS code as a pair of matrices (L_Z, S_Z) describing the Z-logical operators and Z-stabilisers, respectively, we can write the associated encoder map as:

(11.28)

This not only gives us convenient notation for the encoder, but also for the stabiliser measurements themselves. It was shown in Exercise 6.7 that the Pauli projectors associated to all-X or all-Z Pauli stabiliser measurements take a particularly simple form:

If s = 0, the projector on to a stabiliser Zⱼ consists of an isolated X-spider connected to Z-spiders on the support of Zⱼ. If we compose all of the projectors on to Z-stabilisers together and fuse spiders, we'll obtain this picture:

(11.29)

Similarly, if we project on to the 0 outcome with all of the X stabilisers, we’ll obtain:

(11.30)

Exercise 11.9. Show that for any A, the following maps are always projectors, up to a scalar:

If we simply compose maps (11.29) and (11.30), we will get the projection onto the stabiliser subspace Stab(S). This is what happens if we measure all of the stabilisers and get outcome 0. To represent other outcomes, we should place a 0 or a π on each of the internal spiders of the Pauli projections. To do this for generic outcomes, we need to extend the scalable notation a bit. Back when we introduced the scalable notation, we interpreted a scalable spider labelled by a phase α as n spiders, all with the same phase α on them. However, we could just as well label a spider with a vector of different phases α = (α₁, …, αₙ) and define scalable spiders as:

Of particular interest for us are vectors of 0s and πs, i.e. vectors of the form bπ := (b₁π, …, bₙπ) for some boolean vector b ∈ 𝔽₂ⁿ. Using this extended notation, a full stabiliser measurement for a CSS code with outcome syndrome s = (x, z) can be written compactly as:

(11.31)

For a CSS code, we can assume without loss of generality that all of the X-logical operators and X-stabilisers are linearly independent. In other words, we can assume the block matrix ( L_X | S_X ) is injective. Consequently, there exists some parity matrix J such that:

(11.32)

Since injective parity maps are always isometries, we have:

(11.33)

This implies that the encoder (11.28) is an isometry, and is in fact a strictly stronger condition.
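As a sanity check, for the Steane code data above the block matrix ( L_X | S_X ) is a 7 × 4 matrix, and we can confirm it is injective by computing its rank over 𝔽₂. A short sketch with our own Gaussian-elimination helper:

    import numpy as np

    def gf2_rank(M):
        M = M.copy() % 2
        r = 0
        for c in range(M.shape[1]):
            pivots = [i for i in range(r, M.shape[0]) if M[i, c]]
            if not pivots:
                continue
            M[[r, pivots[0]]] = M[[pivots[0], r]]   # swap pivot row up
            for i in range(M.shape[0]):
                if i != r and M[i, c]:
                    M[i] = (M[i] + M[r]) % 2        # clear the column
            r += 1
        return r

    L_X = np.array([[0, 0, 0, 1, 1, 1, 0]]).T
    S_X = np.array([[1, 0, 0, 0, 1, 1, 1],
                    [0, 1, 0, 1, 0, 1, 1],
                    [0, 0, 1, 1, 1, 0, 1]]).T
    block = np.hstack([L_X, S_X])             # 7 x 4 block matrix (L_X | S_X)
    assert gf2_rank(block) == block.shape[1]  # full column rank: injective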

Exercise* 11.10. In Exercise 11.5, you were asked to show that for logical operators, X̄ᵢ anti-commutes with Z̄ⱼ if and only if i = j. We can express this condition in terms of boolean matrices by stating that L_Zᵀ L_X = I. In terms of scalable notation, this implies that:

(11.34)

We also know that stabilisers should commute with logical operators and with each other. In terms of matrices, we can express this as S_Zᵀ L_X = S_Zᵀ S_X = L_Zᵀ S_X = 0. Pictorially:

(11.35)

Give a graphical derivation of (11.34) and (11.35) using just (11.28), the scalable ZX rules, and the fact that the encoder is an isometry. Hint: Start by composing the Z-form of the encoder with its adjoint and applying idempotence to remove one projector from the middle before applying (11.28).
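Before attempting the graphical derivation, it can be reassuring to verify the matrix identities behind (11.34) and (11.35) numerically. A minimal sketch (ours, not from the original text), again using the Steane code, whose self-duality lets us take L_Z = L_X and S_Z = S_X:

```python
import numpy as np

# Steane code data; self-duality lets us take L_Z = L_X and S_Z = S_X
L_X = np.ones((7, 1), dtype=np.uint8)
S_X = np.array([[1, 0, 0, 0, 1, 1, 1],
                [0, 1, 0, 1, 0, 1, 1],
                [0, 0, 1, 1, 1, 0, 1]], dtype=np.uint8).T
L_Z, S_Z = L_X, S_X

print((L_Z.T @ L_X) % 2)   # [[1]]  : logical X and Z anti-commute
print((S_Z.T @ L_X) % 2)   # zeros  : Z-stabilisers commute with X-logicals
print((S_Z.T @ S_X) % 2)   # zeros  : stabilisers commute with each other
print((L_Z.T @ S_X) % 2)   # zeros  : Z-logicals commute with X-stabilisers
```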

This gives us some nice tools for working with CSS codes generically, which we’ll use several times when it comes to deriving fault-tolerant operations in Section 11.4.

11.4 Fault-tolerance

If we want to perform quantum computations in a way that can withstand errors, coming up with a good quantum error correcting code is only half the story. We also need to figure out how to perform state preparations, gates, and measurements on encoded qubits. Suppose that every time we performed a gate we had to decode our n physical qubits into k logical qubits, perform the gate, then re-encode. This seems like a lot of work just to do one gate, but what is actually worse is that our logical state will be completely unprotected between when we decode and re-encode our qubits. We will run into similar problems with state preparations and measurements. We saw half of a solution to this problem already in Section 11.2.3 with Definition 11.2.9. If we want to perform a unitary gate U on our logical qubits, we can find some other U~ acting on the physical qubits, satisfying:

Essentially the same idea applies to state preparations, except there are no input wires, so we don't need an encoder on the RHS. That is, we can implement the preparation of a logical state |ψ⟩ by preparing some physical state |ψ̃⟩ satisfying:

Of course, one could prepare |ψ̃⟩ by preparing |ψ⟩ and then performing the encoder isometry as an actual quantum operation, which we could implement using unitaries and ancillae. However, as we'll see later, there are various reasons we might want to prepare |ψ̃⟩ directly in a different way. Finally, implementing measurements on encoded qubits is also a similar idea, but with a little twist accounting for the fact that our physical measurement might have more outcomes than our logical one. While we might in general want to implement arbitrary measurements on encoded qubits, let's focus just on ONB measurements to make things a bit simpler. Let [m] := {1, 2, …, m} be a finite set of indices for any m ∈ ℕ. We say a physical ONB measurement M̃ = {|ϕ̃_j⟩}_j on n qubits implements a logical ONB measurement M = {|ϕ_i⟩}_i on k qubits if there exists a function f : [2^n] → [2^k] where, for all states |ψ⟩, we have:

(11.36)

This condition says that, for any logical state, the Born rule probabilities obtained from measuring M can be computed in terms of the probabilities of M̃ measurements. Normally the function f is very simple, e.g. if k = 1, it could simply compute the parity of the boolean outcomes on the n physical qubits. In Section 11.2.3, we mentioned that we usually don't actually perform the encoder map on our quantum computer, but use it purely as a mathematical object to relate the logical qubits to the physical ones. Now we can see why that is the case. If we can find encoded states, gates, and measurements, then we can simulate any logical computation on k qubits using a physical one on n qubits without ever explicitly performing E. Suppose we are interested in the results of a logical computation involving applying a sequence of gates U₁, …, U_t to a state |ψ⟩ and measuring in the logical ONB {|ϕ_i⟩}_i. Then we would like to sample from this distribution:

Then, we can translate the logical measurement into a physical measurement using (11.36):

We can keep pushing logical gates through the encoder to get physical ones:

Finally, when we push the logical state through, the encoder is gone, and we’re left with a fully encoded computation that simulates our logical computation:

Great! We’re doing everything on physical qubits encoded in our error correcting code the whole time. We only forgot one thing: we need to do error correction! That was of course the point of this whole exercise. If we are being particularly conservative, we can do one or more rounds of error correction, i.e. measuring all of the stabilisers and applying a Pauli correction, at each step of the computation:

We should also try to engineer our encoded operations, and the error correction itself, in such a way that they avoid spreading errors uncontrollably. For example, if Ũ_i involves multi-qubit gates, a single error can turn into multiple errors. In this section, we will see how to implement (almost) everything we need for universal computation in such a way that we can keep errors under control. This property of encoded gates is what is known as fault tolerance. There are several definitions of fault tolerance in the literature, with various levels of mathematical rigour. Here's the one we will use. Note that we refer to a single collection of n physical qubits encoded in an error correcting code as a code block.

Definition 11.4.1 (Fault tolerance). An operation O~ on a single code block is called fault tolerant if, whenever there are errors on at most u qubits before applying O~, and then v faults occur while performing O~, there are errors on at most u + v qubits afterwards.

An example of an operation that doesn’t spread errors is a transversal gate. For a single code block, transversal gates are simply tensor products of single-qubit operations. An example of something that does spread errors is a multi-qubit gate involving physical qubits in the same code block. For example, a CNOT gate can spread Pauli X errors from its control to its target qubit:

In this case, there was originally 1 error on the physical qubits, but afterwards there are 2. Note that we have not been too specific about what it means for a fault to occur during the performance of Õ. What counts as a single fault depends on how we represent the operation Õ and how we choose to model potential errors that could happen while attempting to perform Õ, i.e. the error model we are using. By their nature, error models are always a simplification of what happens in reality. One of the simplest error models is the phenomenological error model with Pauli errors. In this model, we decompose Õ into basic gates and assume that at each time step a Pauli error can occur on each qubit with some fixed (hopefully small) probability p. Consider implementing a swap gate on our physical qubits with three CNOT gates:

The swap gate never multiplies Pauli errors. So, if u Pauli errors occur at t₀, then only u errors will come out at t₃. However, if an error happens in the process of implementing Õ, e.g. an X error on qubit 2 at t₂, then it will propagate to X errors on qubits 2 and 5 at t₃. In some circumstances, this model is somewhat overly optimistic. For example, one would expect that most physical implementations of 2-qubit gates can actually lead to correlated errors on the qubits involved, whereas we just assume that an error happens on each of the qubits independently with some fixed probability. However, this will be good enough for our purposes of illustrating the basic principles of constructing fault-tolerant implementations. For more elaborate and realistic error models, we point to the References of this chapter. We'll conclude this section by noting that Definition 11.4.1 extends to a slightly more flexible notion of "not spreading errors" when multiple code blocks are involved.

Definition 11.4.2 (Multi-block fault tolerance). An operation Õ on b code blocks is called fault tolerant if, whenever at most u₁, …, u_b errors occur within the respective code blocks, and v faults occur while performing Õ, then afterwards at most Σᵢ uᵢ + v errors occur in each code block.

This definition allows errors to propagate between the code blocks, as long as the number of errors appearing in a single block after the operation doesn't exceed the total number of errors from before the operation. Consider for example applying a CNOT gate between the i-th qubit of one code block and the i-th qubit of another one:

(11.37)
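As a sanity check on these propagation rules, the following sketch (ours, not from the original text) tracks X errors as a boolean error frame, with each CNOT copying the X error on its control onto its target. It reproduces both the three-CNOT swap from above and the two-block transversal CNOT of (11.37):

```python
import numpy as np

def apply_cnot(xerr, c, t):
    """A CNOT copies an X error from its control c onto its target t."""
    out = xerr.copy()
    out[t] ^= out[c]
    return out

# Swap built from three CNOTs on qubits (0, 1): with no mid-circuit
# fault, a single incoming X error stays a single error.
xerr = np.array([1, 0], dtype=np.uint8)      # X error on qubit 0
for c, t in [(0, 1), (1, 0), (0, 1)]:
    xerr = apply_cnot(xerr, c, t)
print(xerr)                                   # [0 1]: one error, moved

# Transversal CNOT between two 4-qubit blocks (qubits 0-3 and 4-7),
# starting with two X errors in the first block, as in (11.37).
xerr = np.zeros(8, dtype=np.uint8)
xerr[[0, 2]] = 1
for i in range(4):
    xerr = apply_cnot(xerr, i, 4 + i)
print(xerr.reshape(2, 4))  # two errors per block: multi-block FT respected
```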

Then errors could indeed multiply:

Even though there are a total of 4 errors after performing this tower of CNOT gates, there are only 2 errors in each code block, which is the same as the number of errors we started with. This is still okay for fault tolerance, as we can then perform error correction independently on each code block. The map (11.37) is an example of a transversal, multi-block operation. We will look at these, and transversal gates in general, in the next section. After that, we will turn to the somewhat trickier issue of fault-tolerant stabiliser measurements. We call a recipe for constructing a universal set of fault-tolerant operations in a given code or code family a fault-tolerant scheme. It makes intuitive sense that having a fault-tolerant scheme is A Good Thing if we want to get to the end of a computation with a relatively low probability of suffering an uncorrectable error. But we can make precise why this is the case by discussing the threshold of a fault-tolerant scheme. While we can never push the probability of a logical error occurring all the way to zero, one could imagine having an infinite series of fault-tolerant simulations {FT_l(C)}_l of a given circuit C, each producing a better and better approximation of a logical circuit C on physical hardware, at the cost of extra resources needed to do more and more error correction. The main example of such a series would be an infinite family of quantum error correcting codes with increasing distance, such as the surface code, together with recipes for implementing a universal family of fault-tolerant operations Õ on any code in that family.

Definition 11.4.3. The threshold of a fault-tolerant scheme is a probability p_th such that for any circuit C, if all physical operations produce a fault with probability p < p_th, the family of fault-tolerant simulations {FT_l(C)}_l has the property that for any ε > 0, there exists some l such that the output probabilities of FT_l(C) are ε-close to those of the ideal circuit C.

The notion of a threshold is an important one, perhaps the most important in fault-tolerant quantum computation. Performing encoded operations and error correction involves more basic quantum operations, which of course will produce more errors. However, below some critical hardware error rate, a fault-tolerant scheme will start to suppress more errors than it generates. This means that, if we can demonstrate a threshold, it is possible (at least in principle) to perform arbitrarily large quantum computations on noisy hardware. The threshold theorem states that, under certain assumptions, any universal set of fault-tolerant operations can be turned into a family of fault-tolerant simulations with a threshold p_th > 0, and furthermore the overhead scales reasonably in the size of the circuit C and error parameter ε. Proving this theorem is complicated, so we will leave it to other resources that have done this in detail (see the References), and we will proceed now to the practical construction of fault-tolerant operations in a given error-correcting code.

11.4.1 Fault-tolerant computation with transversal gates

Transversal unitary gates are the easiest kind of fault-tolerant operation to understand, although for a given error correcting code, it can be highly non-trivial to characterise the set of transversal gates that can be implemented in that code.

Definition 11.4.4. A transversal implementation Ũ of a logical unitary U on a single code block is a gate of the form Ũ = U₁ ⊗ ⋯ ⊗ Uₙ where the Uᵢ are all single-qubit unitaries. A transversal implementation of a logical unitary acting on multiple code blocks consists of n unitaries, where the i-th acts only on the i-th qubit of each code block, for i ∈ {1, …, n}.

An example of a transversal unitary on multiple code blocks is the transversal CNOT we saw in (11.37). It is easy to check that transversal gates satisfy the fault-tolerance conditions laid out in Definitions 11.4.1 and 11.4.2. So, if we can find a universal set of transversal gates, life is good! Unfortunately, there is this little nugget:

Theorem 11.4.5 (Eastin-Knill). For any error-correcting code capable of detecting an arbitrary single-qubit error, the set of logical unitaries U with a transversal implementation U~ is finite. In particular, no non-trivial error-correcting code has a universal set of transversal gates.

The proof of this theorem uses some tricks from Lie theory which are a bit technical for our purposes, so we won’t reproduce it here (see the References). Very roughly, the proof shows that the error-detection property of the code forces the group of distinct unitaries U with transversal implementations to be discrete: there is no way to “nudge” U a small distance in any direction, while keeping U~ transversal. But then the group of all logical unitaries must also be compact, and any discrete subgroup of a compact group is finite. If that explanation didn’t do much for you, don’t worry. The main thing you should get is that Theorem 11.4.5 is Bad News if we were hoping to do all of our fault-tolerant computation with transversal gates. However, many codes still have lots of useful transversal gates. Also, as we’ll see in Section 11.5, we can still achieve universal computation fault-tolerantly, but we’ll need to go beyond transversal unitaries to find a bit of extra magic.

11.4.1.1 Transversal Clifford gates

Transversal Clifford gates are usually the easiest fault-tolerant operations to understand for stabiliser codes. In the general case, we can use stabiliser theory to find the transversal gates of a stabiliser code and compute how they act on logical qubits.

Exercise* 11.11. Let E be the encoder of a stabiliser code S = ⟨S₁, …, S_m⟩ and Ũ a Clifford unitary. Show that we have:

for some Clifford unitary U if and only if we can push stabilisers through Ũ. That is, for all S ∈ S there exists T ∈ S such that SŨ = ŨT.

In particular, as soon as we have a Clifford unitary U~ that has the “stabiliser pushing” property, we can compute U just by looking at how U~ acts on the logical operators Xi,Zi. We can perform all the computations involving stabiliser groups efficiently, e.g. using the symplectic matrix representation from Section 6.4.4. Hence, a naïve way to find all the transversal Clifford gates in a code is to just enumerate every local Clifford U~ and check if it sends all the generators of the stabiliser group to stabilisers. Of course, there are exponentially many local Clifford operations, so it may not be practical to enumerate them all for large n. However, there are many results that show that some families of codes always have certain transversal Clifford gates. To start, the definitions of the logical X and Z operators (11.11) and (11.12) essentially say that any stabiliser code has transversal implementations of all the logical Z and X operators:

A bit less trivially, one of the most well-known results about transversality says that CNOT gates are transversal for CSS codes. In its simplest version, this means that applying a CNOT gate transversally across all physical qubits of two identical codeblocks has the effect of applying a CNOT across all logical qubits. We can formalise this using the scalable notation as follows.

Theorem 11.4.6. CNOT gates are transversal for CSS codes. That is:

(11.38)

Proof. We begin with the right-hand side of (11.38) and rewrite the encoder of the second code block into Z-form:

We can then push the CNOT through the encoders using strong complementarity and the scalable ZX rules:

then apply equations (11.34) and (11.35):

Rewriting the second encoder back into X-form yields the left-hand side of (11.38). □

All CSS codes have transversal CNOT gates. However, it might not be so convenient to implement a CNOT between each qubit in the code block. For example, as the surface code is a CSS code, we know it has a transversal CNOT, which looks like this:

Note, however, that implementing it would ruin the nice 2D structure which makes the surface code so attractive for some hardware platforms. We'll see in Section 11.4.3 that there is another way to implement 2-qubit gates in the surface code and friends without destroying planarity. Not all CSS codes admit transversal H gates, but quite a few do. To see what conditions we need, let's start with a transversal H and see what happens when we push it into the encoder. First, we can introduce a scalable H gate to mean n copies of the H gate:

Then the usual colour change rules work just like they would for the normal H gate, but we also get an additional rule. Using (11.27), we can push a scalable H gate through an arrow as follows:

(11.39)

Using this new rule, we can introduce H gates on all of the physical qubits of the encoder in Z-form and push it to the left:

(11.40)

The RHS almost looks like a layer of Hadamard gates followed by the X-form of the encoder, but not quite: the X-form should be labelled by L_X and S_X, not L_Z and S_Z. We could therefore consider codes where L_X = L_Z and S_X = S_Z and we'd be done. However, it turns out this is stricter than we need, so we'll consider codes where just the X-stabilisers and Z-stabilisers are the same.

Definition 11.4.7. A CSS code is self-dual if SX = SZ.

Examples of self-dual CSS codes are the ⟦4,2,2⟧ code from Example 11.2.5 and the Steane code. Continuing from (11.40) with a self-dual code, we can see that we almost get to the form we need:

(11.41)

but the logical operators on the RHS are still the wrong type to get an encoder in X-form. The trick now is to realise that even though the X-logical and Z-logical operators might still be different in a self-dual CSS code, they always generate the same subspace together with the stabilisers. Hence, we can prove the following proposition.

Proposition 11.4.8. For any self-dual CSS code, there exist matrices M,N such that LZ = LXM + SXN.

Proof. Suppose S_Z = S_X consists of m independent stabilisers; then the whole CSS code has 2m stabiliser generators. It must then have n − 2m X-logical operators. Since X-logicals and X-stabilisers commute with all Z-stabilisers, the columns of L_X and S_X together consist of (n − 2m) + m = n − m independent vectors in cols(S_Z)^⊥. Since cols(S_Z)^⊥ is (n − m)-dimensional, that means they span the whole of cols(S_Z)^⊥. But S_Z = S_X, so each column of L_Z, being orthogonal to every X-stabiliser, lies in cols(S_X)^⊥ = cols(S_Z)^⊥. Hence, we can write each column of L_Z as a linear combination of columns from L_X and S_X, i.e. we can find matrices M, N such that L_Z = L_X M + S_X N. □

Exercise 11.12. Show using the scalable rules from Section 10.2 and Proposition 11.4.8 that, for a self-dual code, we have for some matrix M:

Hint: Recall Proposition 10.2.2.
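Proposition 11.4.8 becomes constructive if we set it up as an 𝔽₂-linear system: stacking M and N, we need to solve [L_X | S_X]·[M; N] = L_Z. Here is a sketch of such a solver (our own code, not from the text; for the Steane code the solution is simply M = (1), N = 0):

```python
import numpy as np

def gf2_solve(A, B):
    """Solve A X = B over F2 by Gaussian elimination on the augmented
    matrix; assumes a solution exists (free variables are set to 0)."""
    A = np.array(A, dtype=np.uint8) % 2
    B = np.array(B, dtype=np.uint8) % 2
    n, p = A.shape
    aug = np.hstack([A, B])
    pivots, row = [], 0
    for col in range(p):
        hits = np.nonzero(aug[row:, col])[0]
        if len(hits) == 0:
            continue
        piv = row + hits[0]
        aug[[row, piv]] = aug[[piv, row]]
        for r in range(n):
            if r != row and aug[r, col]:
                aug[r] ^= aug[row]
        pivots.append(col)
        row += 1
    X = np.zeros((p, B.shape[1]), dtype=np.uint8)
    for r, col in enumerate(pivots):
        X[col] = aug[r, p:]
    return X

# Steane code: self-dual with identical X- and Z-logicals
L_X = np.ones((7, 1), dtype=np.uint8)
S_X = np.array([[1, 0, 0, 0, 1, 1, 1],
                [0, 1, 0, 1, 0, 1, 1],
                [0, 0, 1, 1, 1, 0, 1]], dtype=np.uint8).T
L_Z = L_X

MN = gf2_solve(np.hstack([L_X, S_X]), L_Z)
M, N = MN[:1], MN[1:]            # split the solution into M and N
print(M, N.T)                    # M = [[1]], N = 0
print(((np.hstack([L_X, S_X]) @ MN) % 2 == L_Z).all())   # True
```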

Theorem 11.4.9. H gates are transversal for self-dual CSS codes. That is, for any self-dual CSS code, there exists a CNOT circuit with parity matrix M such that:

(11.42)

Proof. Starting from the RHS of (11.42), we can switch to the Z-form of the encoder and apply the derivation in (11.41) and Exercise 11.12:

By Proposition 4.2.12, M is the parity matrix of a CNOT circuit if and only if it is invertible. That M is invertible follows immediately from the fact that the map above is an isometry, and hence in particular non-singular. □

This means applying a transversal Hadamard at the physical level is not necessarily the same thing as applying a transversal Hadamard at the logical level. However, it always does something non-trivial to our logical qubits, consisting of a layer of Hadamard gates followed by some (possibly trivial) CNOT circuit. In the case of the Steane code, the X-logical and Z-logical operators are identical, so applying H to every physical qubit is the same as applying H to the single logical qubit:

However, the ⟦4,2,2⟧ code from Example 11.3.4 has X-logical operators and Z-logical operators that differ by a logical swap:

In this case, the parity matrix from Theorem 11.4.9 is

\[
M := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\]

corresponding to a swap gate (or equivalently, 3 CNOTs). Note that the surface code almost has a transversal H gate. When we apply H everywhere on the X-form of the encoder, we get the following:

We almost get back to where we started, except the encoder has "rotated 90°", which follows from applying the colour-reverse of equation (11.21). Technically this is a different error-correcting code. But of course this new code has all the same properties as before, since it is just a permutation of the physical qubits. As long as we keep track of where the qubits are, this is "close enough" to a transversal H for many purposes. We could at this point complete the Clifford story and give the conditions for a CSS code to have transversal S gates. Rather than working this out explicitly, we will go first to the harder case of transversal T gates, which, as we will explain in Remark 11.4.13, generalises readily to a characterisation of transversal Z(π/2^(ℓ−1)) gates for all ℓ ≥ 1.

11.4.1.2 Transversal T gates

We now turn to the somewhat thornier question of when stabiliser codes admit transversal non-Clifford gates. For the sake of simplicity, we will start with T gates, but in fact the story will be much the same in the next section when we talk about all diagonal gates on the third level of the Clifford hierarchy. Once we pass to non-Clifford gates, we can’t expect to be able to just conjugate each of the generators of the stabiliser group and get Paulis again. Hence, we no longer have such an easy way to check if a given operation on physical qubits preserves the codespace. Anyway, we still have the ZX-calculus at our disposal, so let’s be brave. Just like we did with transversal Hadamards, we can start with a transversal application of T gates on the physical qubits and try to push it through the encoder of a CSS code. If it comes out as some non-trivial unitary on the k logical qubits, we’re golden. Since we’ve left Clifford-land, we don’t expect life to be that easy, but we will just start and see where we get stuck. We begin by applying strong complementarity and the scalable rules:

(11.43)

This gets us pretty far, but we still have that pesky S_X connecting our phase gadgets to some wires that don't correspond to logical qubits. In order to preserve the codespace, we want to end up with some phase gadgets supported only on the logical qubits. Just like in the case of Hadamards, this could act differently on the logical space from the physical space, so we don't know a priori what this action is. We will start with the simplest case, where T^⊗n on the physical qubits acts as (T†)^⊗k on the logical qubits. That is, we want to satisfy this equation:

(11.44)

The reason it is easier to use T† rather than T is that we can now move everything to one side and we'll only have π/4 phase gadgets:

Now, we can “zip up” this equation into a single matrix over a register of k + m qubits:

(11.45)

Thanks to Section 10.3.1, we know that a configuration of π/4 phase gadgets equals the identity if and only if its associated matrix is strongly 3-even. Hence, we are ready to characterise when CSS codes have transversal T gates.

Definition 11.4.10. A CSS code (L_X, S_X) is called triorthogonal if the following matrix is 3-even:

\[
M = \begin{pmatrix} I & 0 \\ L_X & S_X \end{pmatrix}
\tag{11.46}
\]

and a code is called strongly triorthogonal if the matrix above is strongly 3-even.
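These conditions are easy to check mechanically. Below is a sketch of a checker (our own code, not from the text), using the strongly 3-even conditions as spelled out in Remark 11.4.13 for ℓ = 3: column weights vanish mod 8, pairwise column overlaps mod 4, and triple overlaps mod 2:

```python
import numpy as np
from itertools import combinations

def strongly_3_even(M):
    """Strongly 3-even (Remark 11.4.13, l = 3): column weights are
    0 mod 8, pairwise overlaps 0 mod 4, triple overlaps 0 mod 2."""
    M = np.array(M, dtype=np.int64) % 2
    cols = range(M.shape[1])
    return (all(M[:, i].sum() % 8 == 0 for i in cols)
            and all((M[:, i] * M[:, j]).sum() % 4 == 0
                    for i, j in combinations(cols, 2))
            and all((M[:, i] * M[:, j] * M[:, k]).sum() % 2 == 0
                    for i, j, k in combinations(cols, 3)))

print(strongly_3_even(np.ones((8, 1))))   # True: a single weight-8 column
print(strongly_3_even(np.ones((4, 1))))   # False: weight 4 is not 0 mod 8
```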

Theorem 11.4.11. A CSS code (LX,SX) admits a transversal T in the sense that:

(11.47)

if and only if it is strongly triorthogonal.

Proof. First assume the CSS code is strongly triorthogonal. From the calculations above, this is equivalent to equation (11.44). Hence, following calculation (11.43), we have:

Conversely, if (LX,SX) satisfies Eq. (11.47), then:

We can then apply Eq. (11.33) to cancel most of the encoder from both sides:

Note that the right-hand side consists of a diagonal unitary applied to |+…+⟩, but diagonal unitaries send the state |+…+⟩ to itself if and only if they are the identity. Hence, we can conclude M from Definition 11.4.10 is strongly 3-even and hence (LX,SX) is strongly triorthogonal. □

Exercise 11.13. Show that triorthogonal codes implement a transversal T gate, possibly up to a diagonal Clifford map on the outputs. That is, for some A:

Hint: Use equation (11.32) to move a diagonal Clifford map on the logical side to the physical side.

Example 11.4.12. The degree-1 monomial x₁ ∈ RM(1,5) gives the following strongly 3-even matrix:

\[
\begin{pmatrix}
1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1\\
0&1&0&1&0&1&0&1&0&1&0&1&0&1&0&1\\
0&0&1&1&0&0&1&1&0&0&1&1&0&0&1&1\\
0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&1\\
0&0&0&0&0&0&0&0&1&1&1&1&1&1&1&1
\end{pmatrix}^{\!T}
\]

The matrices (L_X, S_X) on the right define a ⟦15,1,3⟧ quantum Reed-Muller code. This is, by definition, a strongly triorthogonal code, so applying T^⊗15 on the physical qubits results in a logical T†. This property will be used to perform a protocol called magic state distillation in Section 11.5.
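We can reproduce this example in a few lines: the 16 × 5 matrix above has rows (1, b) for every b ∈ 𝔽₂⁴, and deleting the b = 0 row leaves (L_X S_X). The checks below (our own sketch, using the strongly 3-even conditions as before) confirm the claim directly:

```python
import numpy as np
from itertools import combinations

# Rows of M are (1, b) for every 4-bit string b
M = np.array([[1] + [(b >> i) & 1 for i in range(4)] for b in range(16)])

cols = range(M.shape[1])
print(all(M[:, i].sum() % 8 == 0 for i in cols)                 # weights
      and all((M[:, i] * M[:, j]).sum() % 4 == 0                # pairs
              for i, j in combinations(cols, 2))
      and all((M[:, i] * M[:, j] * M[:, k]).sum() % 2 == 0      # triples
              for i, j, k in combinations(cols, 3)))            # True

# Dropping the b = 0 row leaves (L_X S_X) for the [[15,1,3]] code
L_X, S_X = M[1:, :1], M[1:, 1:]
print(L_X.T)         # the all-ones X-logical on 15 qubits
```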

Remark 11.4.13. Note that almost the exact same argument as above can be given for transversal Z(π/2^(ℓ−1)) gates for any ℓ ≥ 1. We can define strongly ℓ-even matrices as those whose columns sum to 0 mod 2^ℓ, pairs of columns sum to 0 mod 2^(ℓ−1), and so on, up to groups of ℓ columns summing to 0 mod 2. Following essentially the same reasoning as Section 10.3.1, we can conclude that

for any strongly ℓ-even matrix M. Hence, any CSS code whose matrix

\[
M = \begin{pmatrix} I & 0 \\ L_X & S_X \end{pmatrix}
\]

is strongly ℓ-even has a transversal Z(π/2^(ℓ−1)) gate. Furthermore, we can generalise Exercise 11.13 to show that if M is only ℓ-even rather than strongly ℓ-even, the gate is transversal up to Z(π/2^(ℓ−2)) phase gadgets. In particular, if the matrix M is 2-even, the code has transversal S gates up to Paulis, and hence it can implement a logical S transversally.

11.4.1.3 Transversal Clifford hierarchy gates

We can generalise from characterising transversal T or T† gates to transversal implementations of more general unitaries in the third level of the Clifford hierarchy. Recall from Section 6.5 that the third level of the Clifford hierarchy C₃ consists of all the unitary maps U with the property that, for all Pauli strings P, UPU† is Clifford. While it seems to be a difficult problem to characterise all transversal gates in C₃, it is much easier if we just focus on the diagonal unitaries D₃ ⊆ C₃. As noted also in Section 6.5, D₃ forms a group which is generated by the π/4 phase gadgets. Consequently, an arbitrary unitary in D₃ can be written as

(11.48)

for some boolean matrix M. We can also capture what it means to be a transversal unitary on n qubits in D₃. These consist precisely of some power T^p on each qubit. Hence, a transversal gate in D₃ is of the form D_P for some matrix P where each row contains exactly one 1. Such a matrix represents a T gate on the j-th qubit by containing the j-th unit vector as a row, and higher powers T^p appear as repeated rows.
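For instance, here is one way (our own illustrative encoding, in the spirit of Example 11.4.14 below) to build the P-matrix for a transversal gate applying T^(p_j) to qubit j:

```python
import numpy as np

def p_matrix(powers, n):
    """Rows are unit vectors: T^p on qubit j contributes p copies of e_j."""
    rows = []
    for j, p in enumerate(powers):
        rows += [np.eye(n, dtype=np.uint8)[j]] * (p % 8)   # T^8 = I
    return np.array(rows, dtype=np.uint8)

print(p_matrix([1, 3, 0], 3))   # T on qubit 0, T^3 on qubit 1
# [[1 0 0]
#  [0 1 0]
#  [0 1 0]
#  [0 1 0]]
```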

Example 11.4.14. The following is a transversal application of powers of T and its associated P-matrix:

From here, the story goes much like in the previous section. Rather than focusing just on T^⊗n, we can start with any transversal D₃ unitary and try to push it past the encoder:

(11.49)

Now, in order to preserve the codespace, the resulting diagonal unitary should be supported only on the logical qubits. That is, for some H, we should have:

As in the previous section, it is convenient to use −π/4 phase gadgets on the right-hand side; note that, since H can now be arbitrary, these generate the same set of unitaries as π/4 phase gadgets. Moving everything to one side, we get:

...or equivalently, in “zipped up” form:

(11.50)

Hence, we can now state a generalised version of Theorem 11.4.11.

Theorem 11.4.15. A CSS code with X-logical operators and X-stabilisers LX and SX admits a transversal implementation of a logical gate

if and only if there exists a matrix P whose rows are unit vectors, such that the matrix

\[
M = \begin{pmatrix} H & 0 \\ P L_X & P S_X \end{pmatrix}
\tag{11.51}
\]

is strongly 3-even.

Exercise 11.14. Generalise the proof of Theorem 11.4.11 to a proof of Theorem 11.4.15.

Example 11.4.16. The constant polynomial 1 ∈ RM(0,4) gives us a strongly 3-even matrix whose rows are all the 4-bitstrings. If we partition the matrix as follows:

\[
\begin{pmatrix}
0&1&0&1&0&1&0&1&0&1&0&1&0&1&0&1\\
0&0&1&1&0&0&1&1&0&0&1&1&0&0&1&1\\
0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&1\\
0&0&0&0&0&0&0&0&1&1&1&1&1&1&1&1
\end{pmatrix}^{\!T}
\tag{11.52}
\]

then this gives a matrix of the form (11.51) with P = I, with (L_X, S_X) defining the X-logical operators and X-stabilisers of an ⟦8,3,2⟧ CSS code, which is sometimes called the "smallest interesting colour code" (see the References for why). It has three X-logical operators and one X-stabiliser:

\[
X_1 := I\,X\,I\,X\,I\,X\,I\,X \qquad
X_2 := I\,I\,X\,X\,I\,I\,X\,X \qquad
X_3 := I\,I\,I\,I\,X\,X\,X\,X \qquad
S_1 := X\,X\,X\,X\,X\,X\,X\,X
\]

We can draw this encoder as a cube, with the X-logical operators connecting to 3 adjacent faces and the X-stabiliser connecting to everything:

In the upper left corner of the matrix (11.52), we see the phase gadgets of a CCZ gate as in equation (9.6); hence this code admits a logical CCZ† = CCZ, implemented via 8 physical T gates:
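The partition of (11.52) can also be verified mechanically. In the sketch below (ours, not from the text), the rows of M are all 4-bitstrings; rows whose last bit is 0 form the (H 0) block, whose first three columns are the phase-gadget supports of a CCZ (plus one trivial all-zero row), and rows whose last bit is 1 give (L_X S_X):

```python
import numpy as np
from itertools import combinations

# Rows of M are all 4-bit strings (b1, b2, b3, b4)
M = np.array([[(b >> i) & 1 for i in range(4)] for b in range(16)])

# Strongly 3-even, checked exactly as before
cols = range(4)
print(all(M[:, i].sum() % 8 == 0 for i in cols)
      and all((M[:, i] * M[:, j]).sum() % 4 == 0
              for i, j in combinations(cols, 2))
      and all((M[:, i] * M[:, j] * M[:, k]).sum() % 2 == 0
              for i, j, k in combinations(cols, 3)))            # True

# Partition on the last bit: b4 = 0 rows give (H 0), b4 = 1 rows (L_X S_X)
H   = M[M[:, 3] == 0][:, :3]   # CCZ phase-gadget supports (row 000 is trivial)
L_X = M[M[:, 3] == 1][:, :3]   # columns IXIXIXIX, IIXXIIXX, IIIIXXXX
S_X = M[M[:, 3] == 1][:, 3:]   # the all-ones column XXXXXXXX
print(L_X.T)
```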

Remark 11.4.17. While Theorem 11.4.15 gives a complete characterisation of the transversal D₃ gates in a CSS code, it is not obvious from the statement whether it can be used to efficiently find such gates. However, it turns out that this is possible. In fact, there are three related problems: (1) for a fixed logical D_H, find transversal gates D_P; (2) for a fixed D_P, find D_H; and (3) compute a generating set of all logical gates and their associated transversal implementations. All three of these problems can be posed as a system of linear equations over the ring ℤ₈, and can be solved in polynomial time using a generalisation of Gaussian elimination that works over more general kinds of rings. We will say a bit more about this in the References.

11.4.2 Fault-tolerant Pauli measurements

In order to do error correction, we should be able to do stabiliser measurements on our physical qubits. Since any non-trivial stabiliser generator will have support on multiple qubits, these will in general be entangling operations involving multiple physical qubits, so it is not immediately obvious that we can perform such measurements fault-tolerantly. Before we think about a fault-tolerant implementation of a Pauli measurement, we should think about how we would even implement a generic Pauli measurement in the first place. For the first part of this section, we will focus on measuring Z ⊗ ⋯ ⊗ Z, but everything we say below will apply to measuring any Pauli string just by conjugating by the appropriate local Cliffords:

It could be that our hardware natively implements multi-qubit Z measurements; however, at the time of writing, this is not possible (or at least quite challenging) on most quantum hardware platforms. Even if it were possible to implement high-weight Pauli measurements on physical qubits, we might not want to, as it could introduce errors on many qubits at once. Hence, the first thing we should do is decompose this measurement into more basic operations like basic gates and single-qubit demolition measurements. This is particularly nice in ZX, because it just amounts to coming up with a good way to unfuse the big spider in the projector. One such unfusion is:

(11.53)

We can now interpret the right-hand side as preparing an ancilla in the |0⟩ state, applying a series of CNOT gates, then measuring the ancilla in the Z basis. In terms of not spreading errors on the physical qubits, this is actually pretty good. Z-errors will just commute through, whereas X-errors will copy onto the ancilla and flip the measurement outcome:

...which is what they are supposed to do! The problem comes from errors that might occur on the ancilla qubit while implementing the measurement. If an X error occurs on the ancilla, the whole measurement will report the wrong outcome:

This is bad of course, because doing error correction depends on being able to report the results of syndrome measurements reliably. However, this kind of error is not fatal. As long as the error rate is not too high, we can increase the reliability of our stabiliser measurements just by repeating the measurement multiple times and reporting whichever outcome we got the majority of the time. The worst thing that can happen is Z-errors, i.e. "phase flips", occurring on the ancilla, because these can propagate out to errors on data qubits:

Worse still, such an error doesn't change the measurement outcome, so this measurement has introduced a new, undetected multi-qubit error. We hence see that in the naïve implementation of a multi-qubit Z measurement, a single fault causes multiple errors on the data qubits. Hence, it doesn't satisfy the fault-tolerance criterion from Definition 11.4.1. To solve this problem, we need to come up with a "better unfusion" of this Pauli measurement. One pretty good solution is, rather than unfusing the X-spider sequentially, to unfuse it in parallel:

(11.54)

where s₁ ⊕ s₂ ⊕ s₃ ⊕ s₄ = s. Now, rather than having a single ancilla prepared in the |0⟩ state, we have w ancillae for a Pauli measurement of weight w, prepared in a generalised GHZ state with respect to the X basis: |+⟩^⊗w + |−⟩^⊗w. Especially in the context of fault-tolerance, this is often called a cat state. Then, we perform a CNOT between the i-th data qubit and the i-th qubit of the cat state and measure each of the ancilla qubits in the Z basis. The XOR of all the measurement outcomes then gives the result of the syndrome measurement. This implementation of a multi-qubit measurement is called a Shor fault-tolerant measurement. For the sake of concreteness, we have shown the 4-qubit Shor measurement, but this generalises in the obvious way to Pauli measurements of higher weight. There are a few things to say about Shor measurements. First, notice how in the process of unfusing spiders, we can turn one measurement into multiple measurements. Second, we have now fixed the problem that our naïve implementation had: single Z errors in the RHS of (11.54) now result only in single errors on the data qubits, e.g.

However, this doesn't fully solve the problem yet: it has shifted the difficulty of fault-tolerantly measuring Z ⊗ ⋯ ⊗ Z to fault-tolerantly preparing cat states. If we assume that our basic operations are single-qubit preparations and measurements, as well as basic 2-qubit gates like CNOT, then we can prepare cat states using some ladder of CNOTs like this:

(11.55)

There are many basic circuits that can prepare a cat state, but they all have one thing in common: there is always some location where a single error in the preparation can propagate out to multiple errors on the cat state, and hence into the measurement (11.54). For example:

To solve this issue, we can perform some rudimentary error detection or correction on the cat state. The first thing to note is that, because of the orientation of the CNOT gates, X errors can only possibly mess up the measurement outcome, but never otherwise affect the data qubits. As mentioned before, we can mitigate this by simply repeating the measurement and taking the outcome we get the majority of the time. Hence, we can focus just on Z errors. For a cat state prepared as in (11.55), the only place a single Z error during preparation can result in 2 errors on the cat state is the one shown above. We can therefore detect this error by measuring any stabiliser of the cat state that anti-commutes with this error. X ⊗ I ⊗ I ⊗ X will work:

If we get outcome t = 1, we can just throw out the cat state and try again. This is called a repeat-until-success strategy for state preparation. But then, what if this nested error-detection measurement propagates errors? Do we need to do error-detection on it? Is it Turtles All the Way Down? Thankfully no. Due to the direction of the CNOT gates, there is no way for this nested measurement to propagate additional Z errors into the cat state. This nested measurement could indeed introduce more bit-flip errors on the cat state, but as in the case of our naïve measurement, bit errors on the ancillae are not a big deal. They can lead to an erroneous syndrome, which can be mitigated by repeated measurements, but crucially they won't cause any additional errors on the data qubits. There exist several refinements to the idea behind Shor measurements, which consist of preparing (possibly elaborate) ancilla states, performing some transversal CNOTs and doing single-qubit measurements. One that is particularly nice to analyse in the scalable ZX-calculus is the Steane fault-tolerant measurement protocol. This protocol works for any CSS code, and allows one to extract syndrome information for all the stabilisers of a single kind (X or Z) at once. First, it relies on being able to reliably prepare the encoded |0…0⟩ and |+…+⟩ states associated with a CSS code. We can see what these look like by plugging the all-0 and all-plus logical states into the CSS encoder:

(11.56)

(11.57)

Now, suppose we start with an ancilla system prepared in the encoded |0…0⟩ state, perform a transversal CNOT between the ancilla and a block of our CSS code, then measure all the qubits of the ancilla system. Here is what we'll get:

(11.58)

We can reduce this further once we know how the arrow acts on the x·π effect corresponding to our measurement outcome x, arising from doing a Z-measurement on all the physical qubits. First, note that an X-spider labelled by x·π corresponds to the computational basis state |x⟩. We already know that a map described by parity matrix A sends a state |x⟩ to the state |Ax⟩. Hence:

Applying the colour-change rule for the arrow (11.39), we get:

Continuing from (11.58), we have:

We now have a protocol for measuring all of the X-stabilisers at once. We apply the fault-tolerant circuit on the left-hand side above, obtain an outcome x, then compute the X part of the error syndrome as s = S_X^T x. Similarly, we can measure the Z part of the syndrome by reversing all the colours:

Thus we obtain the Z part of the error syndrome as t = S_Z^T z. Performing these two protocols in sequence therefore gives us a full round of syndrome extraction:
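The classical post-processing in this protocol is just a couple of matrix-vector products over 𝔽₂. In the sketch below (our own, with the Steane code data again), a fault-free outcome string x satisfies S_X^T x = 0, so flipping a bit of x shifts the syndrome by the corresponding row of S_X:

```python
import numpy as np

L_X = np.ones((7, 1), dtype=np.uint8)
S_X = np.array([[1, 0, 0, 0, 1, 1, 1],
                [0, 1, 0, 1, 0, 1, 1],
                [0, 0, 1, 1, 1, 0, 1]], dtype=np.uint8).T

rng = np.random.default_rng(1)

# A fault-free outcome: any F2-combination of the columns of (L_X S_X)
# satisfies S_X^T x = 0 (mod 2), since these columns are mutually
# orthogonal and self-orthogonal over F2.
x = (np.hstack([L_X, S_X]) @ rng.integers(0, 2, size=4)) % 2
print((S_X.T @ x) % 2)          # [0 0 0]: trivial syndrome

# A single flipped bit shifts the syndrome by the matching row of S_X
e = np.zeros(7, dtype=np.uint8)
e[4] = 1
print((S_X.T @ ((x + e) % 2)) % 2)   # equals S_X^T e, independent of x
```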

Assuming we have fault-tolerant protocols to prepare the two ancilla states, everything else in sight is fault-tolerant. There are a handful of ways to do this. For example, to prepare the encoded all-0 state, one can start with |0⟩^⊗n and do several rounds of S_X measurements using some other fault-tolerant protocol, such as the Shor method. One could also attempt to detect errors that occurred during preparation with an extra layer of error detection, or do some sort of "distillation" procedure akin to the magic state distillation we will discuss in Section 11.5. One limitation of the Steane method is that it only works for CSS codes. The following exercise describes a procedure that works for all stabiliser codes.

Exercise* 11.15. Consider the following “logical Bell state”, prepared on two blocks of an ⟦n,k,d⟧ stabiliser code with encoder E:

The Knill fault-tolerant measurement protocol measures the full error syndrome by performing a multi-qubit Bell measurement between our data and one code block of this state:

Show that this acts the same on the quantum state as measuring all of the stabilisers, and show how the syndrome can be computed from the measurement outcome (x,z).

11.4.3 Lattice surgery

As we mentioned in Section 11.4.1.1, the transversal CNOT gate might not be the most convenient way to implement multi-qubit operations in the surface code, because it breaks the 2D planar structure. It would be nice if there were a way to implement multi-qubit operations between neighbouring patches of surface code that only touch the boundaries of the code patches, hence not requiring lots of non-local gates, which might be hard to implement on platforms where qubits are embedded in the plane. This is where lattice surgery comes in: a particular technique for implementing multi-qubit operations in the surface code (or CSS codes with similar structure) fault-tolerantly. Unlike the transversal gates we considered before, these operations are not unitaries. The first class of operation splits a logical qubit into two, and comes in two varieties:

As these are both isometries, they can be performed deterministically. However, the dual operation that merges two logical qubits into one is non-deterministic:

In other words, each of these operations is a degenerate measurement that projects the 4D space of two logical qubits onto a 2D space, which we can then treat as a single logical qubit. These operations can be combined to produce multi-qubit operations such as a CNOT on the logical level:

up to a possible Pauli error, which can be corrected for in subsequent operations. If one is additionally able to prepare single logical qubits in a handful of fixed states, one can use merge operations to obtain arbitrary single-qubit gates, and hence a universal model of computation. This explains how lattice surgery works on the logical level. To explain what is happening at the physical level, we can push these operations through the encoder. However, unlike previous examples, these operations can actually change the code we are using to encode our logical qubits. The surface code works for any d × e grid of qubits, and has distance min(d, e). To explain the split and merge operations, we will use not one encoder but a whole family of encoders of the form E_{d×e}, which embed a single logical qubit into a grid of the appropriate size. Then, the physical operations SPLIT and {MERGE_k}_{k=0,1} should commute with this family of encoders as follows:

We'll demonstrate these operations concretely on 3 × 3 surface code patches, but in fact the same derivation will work for surface code patches of any size. Let's start with the Z-split, which is performed on a d × 2e patch of surface code by performing X ⊗ X measurements down the e-th column, as if this were the rightmost X boundary of a d × e surface code patch. This will have the overall effect of splitting the patch in twain. For this derivation, it will be most convenient to use the X-form of the encoder. To perform the split itself, we do X ⊗ X measurements down the 3rd column as if this were the right boundary of a 3 × 3 surface code patch. In this case, there is only one X ⊗ X measurement to do. We can then use the π-copy rule to push the phase coming from the measurement outcome onto the outputs:

Note we write e.c. to mean the two diagrams are equal up to "error correction", i.e. Pauli operators applied just on the outputs. These can be treated as errors on the physical qubits and corrected later, so we will disregard them in our calculation. Using the complementarity rule (c), we see that the existing X^⊗4 stabiliser becomes a pair of X^⊗2 stabilisers. This can then be translated into a Z-copy followed by two encoders by unfusing the bottom spider:

Next we do an X-merge by performing X stabiliser measurements along the boundary between two vertically-stacked surface code patches, as if these were stabilisers of one big 6 × 3 patch. We can eliminate the π phases from the encoder by error correction, but this time we pick up a phase of kπ on the input, where k = j₁ ⊕ j₂ (or more generally k = j₁ ⊕ ⋯ ⊕ j_q for bigger patches). We can then use the "deformation" trick from equation (11.22) to move the two logical operators on top of each other and apply strong complementarity:

Note that the applications of the spider law, complementarity, strong complementarity, and π-copy rules in these two derivations extend naturally to larger surface code sizes. Also note that reversing the colours of these two derivations and rotating 90° gives recipes for the remaining two operations of X-split and Z-merge, using the Z-representation of the surface code embedding rather than the X-representation. This gives us a nice 2D way to build CNOT gates, as well as more general multi-qubit operations described by phase-free ZX diagrams, using just local measurements in the surface code. We already noted in Section 11.4.1.1 that we nearly have transversal Hadamards in the surface code, as long as we are willing to account for the fact that our surface might get rotated 90°. If we additionally had T gates, we would have a universal set of gates which can be conveniently performed on surface-code-encoded qubits. However, we can check that the surface code is not a triorthogonal code, so it definitely doesn't have a transversal T gate. More generally, due to the Eastin-Knill theorem 11.4.5, we know it doesn't have any universal set of transversal one-qubit gates. So, we need a different idea. This is where magic state distillation comes in.

11.5 Magic state distillation

We saw all the way back in Section 3.3.1 that if we have access to states of the form:

then we can do magic state injection using just CNOT, S, and a Pauli measurement:

In fact, with lattice surgery in the surface code, we can do this even more directly, since we can just MERGE the magic state right in:

In the case of the surface code, we don’t have a transversal S gate to perform the classical correction. However, if we do state injection for S we only need to do a Pauli correction:

Paulis are just logical operators, so they are always transversal in any error correcting code. Even better, if we have an |S⟩ state and access to H and CNOT, we can use it to catalyse as many S gates as we want, following Section 10.5:

Thus, all we need to boost the surface code (and actually quite a wide family of stabiliser codes) to computational universality is access to one |S⟩ state and lots of |T⟩ states. In fact, as soon as we have |T⟩ states, we can build |S⟩ states, at least probabilistically, so all we really need is enough |T⟩ states. That is what makes these states so magic. We found a code with a transversal T gate (Example 11.4.12) and one with a transversal CCZ (Example 11.4.16), but these codes are actually not that good: they are not high distance, and they can't perform some other useful gates transversally. However, even though these codes are not any good for doing all of our quantum computation, they can help us prepare magic states |T⟩ (and later |CCZ⟩) using a technique called magic state distillation. Magic state distillation works by taking many noisy copies of a state and turning them into one less noisy copy. To do this, we first form the encoded |+⟩ state, then apply a noisy T gate to each of our qubits using a bunch of noisy |T⟩ states and magic state injection, then decode back to a 1-qubit state to produce a less noisy |T⟩ magic state. To convince ourselves this could work, let's start with an idealised case where we want to produce a |T⟩ magic state on a single qubit and we have access to two things:

1. perfect, noiseless Clifford operations, and

2. a procedure that produces a "good" magic state |T⟩ with probability 1 − p_e and an erroneous magic state Z|T⟩ with probability p_e.

We'll see how to get something approximating these assumptions on our actual hardware later, but this simple case will be enough to get the main idea. Now, let's start with an S|+⟩ state, which we prepare using our "perfect" Clifford operations, and encode it with some error correcting code with a transversal T gate, such as the ⟦15,1,3⟧ code from Example 11.4.12. We can then do magic state injections on all 15 of the physical qubits using 15 (possibly erroneous) magic states. We can then decode back to a single qubit using the decoder described in Section 11.2.4, post-selected onto syndrome 0:

(11.59)

If no errors occurred in this process, we will indeed always get syndrome 0, which is what we want. If instead we get a non-zero syndrome s ≠ 0, then we throw the resulting state away and try again. Note that, since we are assuming all Clifford operations are perfect and noiseless, the only way we could get a non-zero syndrome is because of some fault in the |T⟩ states. Hence, if we indeed got a syndrome of 0, then the resulting 'distilled' |T⟩ state only contains an error if the noisy |T⟩ states introduced some undetected error. Since the code has distance three, at least three magic states need to be erroneous to produce an undetected error. If we assume the errors occur independently, this probability is at most (15 choose 3)·p_e³ (ignoring the much less likely scenario of getting more than 3 errors). In fact, even distance-3 codes can detect some 3-qubit errors, so the probability can actually be a bit lower than that. For the case of the ⟦15,1,3⟧ code, there are just 35 undetectable triplets of errors, and so the probability is 35·p_e³. Hence, while we started with 15 |T⟩ states that have probability p_e of getting an error, we ended up with 1 |T⟩ state with probability 35·p_e³ of having an error. Hence, as long as p_e > 35·p_e³, we have made progress. In fact, for reasonably small p_e, we have made a LOT of progress. Let's do some calculations to see how much we have accomplished. First, note that p_e > 35·p_e³ when p_e < 1/√35 ≈ 0.17. So even if we start with |T⟩ states that have an error rate of up to 17%, we can use this protocol to improve them. But let's suppose we start with p_{e,0} = 10⁻², a 1% error rate. There are already many quantum devices that are below this error rate. We see that after a single successful round of distillation, we have p_{e,1} = 3.5·10⁻⁵. If this isn't good enough, we just take 15 of these better states and use them in another round of distillation. After a second round we have p_{e,2} ≈ 1.5·10⁻¹². A third round gives p_{e,3} ≈ 1.18·10⁻³⁴. This is already a much better error rate than we would ever need to do any realistic computation. So, doing successive rounds of magic state distillation exponentially suppresses the probability of an error. However, it also has an exponential cost in noisy |T⟩ states. For example, doing three rounds of this 15-to-1 protocol requires at least 15·15·15 = 3375 noisy |T⟩ states. However, we haven't yet accounted for the states we have to throw away when the protocol fails. Our protocol aborts when we get a non-trivial syndrome. By far the most likely reason to get a non-trivial syndrome is when we see 1 error. The probability of this happening is roughly 15·p_e. So in the first layer of distillation, when we have p_{e,0} = 0.01, the probability the protocol fails is about 15%. To get the actual cost estimate, we hence need to multiply the cost of running the protocol once by the estimated number of runs we need to get a successful run. For the first layer this is 15/(1 − 0.15) ≈ 17.6. Hence, the expected |T⟩ cost to produce one |T⟩ state of error p_{e,1} = 3.5·10⁻⁵ is 17.6 instead of 15. By the time we get to the second and third layers, the probability of getting an error is already so small that the expected cost is very close to 15. Multiplying 17.6·15·15, we see we can distill a very, very good |T⟩ state for an expected cost of around 4000 noisy ones. Inserting some magic into our computation sure seems to come at a cost. But while this is expensive, it is not totally crazy. This was one of the first protocols for doing magic state distillation, and it's possible to bring this cost down quite a bit by finding better codes and protocols.
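The arithmetic in this paragraph is easy to script. A minimal sketch (ours, not from the text) iterating the error suppression p_e ↦ 35·p_e³ together with the expected-cost estimate:

```python
# 15-to-1 distillation: p_e -> 35 p_e^3 per successful round, and each
# attempt is accepted with probability roughly 1 - 15 p_e.
p, cost = 1e-2, 1.0          # input error rate and |T>-count per state
for r in (1, 2, 3):
    accept = 1 - 15 * p      # chance of seeing a trivial syndrome
    cost = 15 * cost / accept
    p = 35 * p**3
    print(f"round {r}: p_e = {p:.2e}, expected cost = {cost:.1f}")
# round 1: p_e = 3.50e-05, expected cost = 17.6
# round 3: p_e = 1.18e-34, expected cost around 4000 noisy |T>'s
```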
In the next section, we will show a state-of-the-art distillation protocol using almost all of the ingredients we've covered in this book so far, but before we get there, let's revisit the two simplifying assumptions we made at the start. The first was that we can do perfect, noiseless Clifford operations. Of course, nothing in quantum-computing land is noiseless, but if we have a family of error correcting codes of increasing distance with fault-tolerant implementations of all Clifford operations, we can perform Cliffords with arbitrarily low levels of noise. To benefit from this, we can do magic state distillation inside of another code Ec that has high distance and fault-tolerant implementations of Clifford gates. Rather than starting with 15 physical |T⟩ states and performing a round of distillation using physical gates and measurements, we prepare 15 noisy logical |T⟩ states:

and then perform the whole protocol (11.59) encoded within Ec. Since Ec doesn't have transversal T gates, we can't prepare |T̃⟩ fault-tolerantly, but we can still prepare possibly faulty |T̃⟩ states, which we then distill to get better ones. Generally, to make this procedure as efficient as possible, we want to pick an 'ambient' code Ec whose probability of producing a logical error during the distillation protocol is just a bit lower than the logical error of the produced distilled states. This makes the Clifford operations 'just good enough' so that we can treat them as perfect, while not being overkill by using an overly expensive code.

Remark 11.5.1. The surface code doesn't have transversal S gates, so we don't know yet that it implements the full Clifford group. Nevertheless, we can still perform the protocol analogous to (11.59) using only CNOT, H, and Paulis in order to distill an |S⟩ state. As we already remarked at the end of Section 11.4.3, once we have one good |S⟩ state, we can use it to catalyse as many fault-tolerant S gates as we need, and hence the full Clifford group.

The second assumption was that the only kind of error we get is a Z|T⟩ in place of |T⟩ with some probability p_e. Of course, many kinds of errors could happen when trying to prepare |T̃⟩, not just those that make a logical error of Z̃|T̃⟩ = Z|T̃⟩. However, there is a nice trick, called twirling, which we can use to project all the other kinds of errors that might happen down to either "no error" or a Z error. Start with a |T̃⟩ state subject to arbitrary noise, and then with 50% probability apply the logical unitary S̃X̃. After we do this, we can treat the resulting state as either staying the same (|T⟩ is an eigenstate of SX after all) or flipping to Z|T⟩ with some small probability. It probably seems rather counter-intuitive that this works. The best way to understand it is in the language of quantum mixed states and channels from Section* 2.7.1. We leave the details as a starred exercise:

Exercise* 11.16. In this exercise we will find out why we may assume that the error in preparing a |T⟩ state is only a Z, and nothing else.

a) Show that |T⟩ and Z|T⟩ are eigenstates of the operator SX.

b) Let Φ be the quantum channel acting on a qubit density matrix ρ via Φ(ρ) = ½ρ + ½(SX)ρ(SX)†. Show that |T⟩⟨T| and Z|T⟩⟨T|Z are fixed points of Φ and that it sends |T⟩⟨T|Z and Z|T⟩⟨T| to zero.

c) Conclude that Φ is a projector onto the space spanned by |T⟩⟨T| and Z|T⟩⟨T|Z.

d) Argue that we hence have Φ(ρ) = (1 − p_e)|T⟩⟨T| + p_e Z|T⟩⟨T|Z for some probability p_e.

We see then that the resulting mixed state is a convex combination of the pure states |T⟩ and Z|T⟩, which behaves identically to a perfect |T⟩ state that gets a Z-flip with probability p_e.
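For the sceptical reader, the twirling argument can also be checked numerically. The sketch below (our own code, not from the text) verifies that |T⟩ is an eigenstate of SX and that twirling an arbitrary noisy state really does produce a mixture of |T⟩ and Z|T⟩:

```python
import numpy as np

S = np.diag([1, 1j])
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
T_state = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)

SX = S @ X
# part (a): |T> is an eigenstate of SX (eigenvalue e^{i pi/4})
print(np.allclose(SX @ T_state, np.exp(1j * np.pi / 4) * T_state))

def twirl(rho):
    """Part (b): apply SX with probability 1/2."""
    return 0.5 * rho + 0.5 * SX @ rho @ SX.conj().T

# Twirl an arbitrary noisy pure state...
rng = np.random.default_rng(0)
v = rng.normal(size=2) + 1j * rng.normal(size=2)
v /= np.linalg.norm(v)
rho = twirl(np.outer(v, v.conj()))

# ...and check it is a mixture of |T><T| and Z|T><T|Z, as in part (d)
P_T = np.outer(T_state, T_state.conj())
P_ZT = Z @ P_T @ Z
p_e = np.real(np.trace(rho @ P_ZT))
print(np.allclose(rho, (1 - p_e) * P_T + p_e * P_ZT))   # True
```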

In summary, we have managed to convince ourselves that magic state distillation works, even starting from fairly realistic assumptions about what we can do on a real quantum computer. However, 4000-to-1 is still pretty expensive, so let’s see if we can bring that cost down.

11.5.1 CCZ distillation and catalysis

We can use magic state distillation to build other kinds of non-Clifford states as well. A particularly handy one is the CCZ magic state:

In this section, we'll build a distillation protocol for it based on the ⟦8,3,2⟧ code given in Example 11.4.16. In this code we can use 8 T gates to implement a CCZ gate:

Using this property, we can start with an encoded |+++⟩ state and apply transversal T gates to end up with a |CCZ⟩ magic state. This in turn can be used to inject a CCZ gate at will, using just Clifford operations, via the procedure described in Exercise 9.5. We can use this ⟦8,3,2⟧ code to distil a |CCZ⟩ with a lower error probability than the input |T⟩ states using the repeat-until-success approach we described above. Because the code is distance 2, it can detect up to one error. As there is just one X stabiliser, with support on all 8 qubits, we know that any combination of two errors leads to a trivial syndrome and hence will not be detectable, so any pair of two errors will lead to the protocol failing. An input error probability of p_{e,0} for the T gates hence gets mapped to (8 choose 2)·p_{e,0}² = 28·p_{e,0}². This procedure then has a threshold of p_e < 1/28 ≈ 0.036, or 3.6%, for improving the error in the CCZ. Note that this protocol is not directly stackable: it takes in |T⟩ magic states, but outputs a |CCZ⟩ magic state. For this reason (and another one we will see soon), this distillation protocol is mostly used as a final stage, where we already have relatively high-quality |T⟩ states, and we wish to convert them into a still higher quality |CCZ⟩. Suppose for instance that we start with |T⟩ states with an error rate of p_{e,0} = 10⁻³ (current machines get pretty close to being able to prepare states with this precision). Then one round of the 15-to-1 distillation described above gives us p_{e,1} = 35·p_{e,0}³ = 3.5·10⁻⁸. Using these states in our CCZ distillation protocol gives us a CCZ gate which has an error probability of p_{e,2} = 28·p_{e,1}² = 3.43·10⁻¹⁴. That is: we can expect to do about 30 trillion CCZ gates before we encounter one that is faulty. The cost for this is only about 15·8 = 120 noisy |T⟩ states per distilled CCZ. Instead of using the |CCZ⟩ magic states to implement a CCZ gate, we can also use them to implement T gates. As we saw in Section 10.5, a CCZ gate can be converted back into two separate |T⟩ states using catalysis. If we do that with the CCZs produced by this protocol, this gives us a pipeline to convert 8 T gates into one better CCZ, which is then converted into 2 T gates. This is then effectively an 8-to-2 protocol with an error improvement of p_e ↦ 28·p_e². But there is an important detail here we shouldn't forget: when the protocol fails, i.e. more than one error happens so that we get a CCZ with an error, then when we convert this CCZ into two |T⟩ states, both of these states will potentially inherit this error. Hence, when we do this we will get correlated errors. If these T gates are directly used in the final computation we wish to execute, this is not a problem, since there we would consider any single error already catastrophic, so the probability of a double error being higher doesn't affect the overall success probability of the computation. However, if we were to reuse these potentially "damaged" |T⟩ states in another round of distillation, we might get a much higher chance of the protocol failing, and hence not get as good a quality of |T⟩ states out of it as we would expect. Despite these problems, a rate of 8-to-2 is clearly a lot better than 15-to-1, and as such this catalysis-based protocol, or variations of it, forms part of some of the best-performing proposals for magic state distillation out there.
In particular, as it takes 120 |T⟩ states for one CCZ, catalysing this back into 2 |T⟩ states gives an effective rate of 60 |T⟩ states with error $10^{-3}$ being converted into 1 |T⟩ state with error $3.43 \times 10^{-14}$.
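To make these numbers concrete, here is a minimal Python sketch of the error bookkeeping (the function names are our own, purely illustrative): it chains the error map $p \mapsto 35p^3$ of the 15-to-1 protocol with the map $p \mapsto 28p^2$ of the catalysed CCZ protocol and tallies the cost in noisy |T⟩ states.

def distil_15_to_1(p):
    # Output error rate of one round of 15-to-1 distillation.
    return 35 * p**3

def distil_8_to_2(p):
    # Output error rate of the catalysed CCZ protocol (needs p < 1/28).
    return 28 * p**2

p0 = 1e-3                            # raw |T> error rate
p1 = distil_15_to_1(p0)              # 3.5e-08
p2 = distil_8_to_2(p1)               # ~3.43e-14

t_cost_per_ccz = 15 * 8              # 120 noisy |T> states per distilled CCZ
t_cost_per_t = t_cost_per_ccz // 2   # catalysis yields 2 |T> per CCZ: 60

print(p1, p2, t_cost_per_t)

Running this reproduces the 60-to-1 effective rate quoted above.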

11.6 Summary: What to remember

1.

A quantum error correcting code is a subspace of n-qubit space. It lets us detect and correct errors by performing measurements to see if we have left the subspace.

2.

Stabiliser codes are a family of quantum error correcting codes we can efficiently represent and manipulate using stabiliser theory and/or the Clifford ZX-calculus. They are subspaces of the form:

$\mathrm{Stab}(S) := \{\, |\psi\rangle \mid P|\psi\rangle = |\psi\rangle \ \text{for all}\ P \in S \,\}$

for a stabiliser group $S = \langle P_1, \ldots, P_m \rangle$.

3.

An ⟦n,k,d⟧ stabiliser code encodes k logical qubits in n physical qubits. It can detect any Pauli error of weight < d and correct any error of weight at most t whenever 2t + 1 ≤ d (see the syndrome-checking sketch after this list).

4.

We can relate logical qubits to physical qubits via an isometry called the encoder map:

5.

For CSS codes, the encoder can always be written in one of two simple equivalent forms: the X-form and the Z-form. Using the scalable ZX-calculus, these are:

where the columns of LX and SX represent X-logical operators and X-stabilisers, respectively, and similarly the columns of LZ and SZ represent Z-logical operators and Z-stabilisers.

6.

The surface code is a well-studied family of codes, with many nice properties such as high levels of noise resistance, efficient decoding, and the ability to implement it using just 2D nearest-neighbour gates.

7.

In order to achieve fault-tolerance, we must find ways to implement error correction, as well as the building blocks of universal quantum computation (e.g. state preparation, unitaries, and measurements) on encoded qubits without spreading errors.

8.

Transversal unitary operations are fault tolerant because they only involve tensor products of operations on one qubit from each code block. However, no single code admits a universal set of transversal gates, due to the Eastin-Knill theorem.

9.

CNOT gates are transversal for CSS codes and H gates are transversal for self-dual CSS codes:

10.

Strongly triorthogonal codes have transversal T gates:

and triorthogonal codes have transversal T gates up to a (possibly non-transversal) Clifford unitary on the physical qubits.

11.

Fault-tolerant syndrome-extraction protocols perform stabiliser measurements without spreading errors on the data qubits. Many of these protocols, like Shor’s, can be seen as “unfusing” the Pauli projector:

12.

Magic state distillation lets us turn many noisy copies of a magic state into fewer, less-noisy copies:

This can be very costly, requiring thousands of noisy |T⟩ states to distil a very clean one.

13.

Using a high-distance code with transversal Clifford operations, such as the surface code, along with magic state distillation, is a promising method for implementing universal fault-tolerant quantum computation.
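To make points 2 and 3 above concrete, here is a minimal Python sketch of syndrome extraction for the 7-qubit Steane code, assuming the standard binary-symplectic representation of Pauli operators (the helper names are ours, purely illustrative): an error of weight less than the distance d = 3 anticommutes with at least one stabiliser generator and is therefore detected.

import numpy as np

# Parity-check matrix of the classical [7,4,3] Hamming code; the Steane
# code uses it for both its X-type and Z-type stabiliser generators.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]], dtype=np.uint8)

Z3 = np.zeros_like(H)
# Each generator is a binary vector (x | z); X-type rows are (h | 0),
# Z-type rows are (0 | h).
gens = np.block([[H, Z3], [Z3, H]])

def syndrome(err_x, err_z):
    # A generator (gx | gz) anticommutes with an error (ex | ez) iff
    # the symplectic product gx.ez + gz.ex is 1 (mod 2).
    return gens @ np.concatenate([err_z, err_x]) % 2

# A single Z error on qubit 4 (weight 1 < d = 3) gives a non-trivial
# syndrome, so it is detected.
ex, ez = np.zeros(7, dtype=np.uint8), np.zeros(7, dtype=np.uint8)
ez[4] = 1
print(syndrome(ex, ez))   # [1 0 1 0 0 0]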

11.7 References and Further Reading

Origins of quantum error correction All the core ideas of quantum error correction—quantum codes, fault-tolerant computation, the threshold theorem, stabiliser theory—originated in a flurry of activity between 1995 and 1999, with especially significant contributions by Shor, Steane, Gottesman and Kitaev. Many ideas were discovered independently by several people at the same time. The original idea of quantum error correction was introduced independently by [211] and [216]. General conditions that any (not necessarily stabiliser) code must satisfy to correct errors were independently found in [27] and [156] and are now sometimes called the Knill-Laflamme conditions. The “History and further reading” section of Chapter 10 of [188] provides many more references to the early developments in quantum error correction.

Origin of some quantum codes The 9-qubit Shor code was discovered by, well, Shor [211] and the 7-qubit Steane code by, you guessed it, Steane [216]. The 5-qubit code was discovered independently (again) by both [27] and [161]. Stabiliser theory and the idea of stabiliser codes were introduced by [112]. The idea to combine a pair of orthogonal classical codes into a single quantum code was discovered independently by [45] and [217], and hence these are named Calderbank-Shor-Steane (CSS) codes in their honour. The ⟦8,3,2⟧ ‘smallest interesting colour code’ was described as such by [46], where Campbell shows that this code has a transversal CCZ gate (note that the 7-qubit Steane code is the smallest non-trivial colour code, but since it has no non-Clifford transversal gates it was not interesting according to that post).

Fault tolerance An early statement and proof of a fault-tolerant threshold theorem goes back to [212]. This was improved upon by many other groups, including [5], [157], [153], and [8] under various assumptions and error models. Shor’s protocol for fault-tolerant stabiliser measurements is from [212], Steane’s from [218], and Knill’s from [155].

Transversal gates A characterisation of when a stabiliser code has a transversal Hadamard, S or CNOT gate was given, for instance, in the thesis of [112], where he in fact shows that if a stabiliser code has a transversal CNOT, then it must be a CSS code. The characterisation of transversal diagonal gates from the Clifford hierarchy is based on [235], though the ZX version we present here is from [149]. In [235] they also give an efficient algorithm for finding codes with transversal diagonal gates.

Surface codes The surface code was introduced by [38], based on the slightly older toric code of Kitaev [152, 153], which considers a 2D lattice defined on a torus, i.e. a donut. So whereas the surface code has an actual boundary where the lattice ends, in the toric code the surface loops around to create the surface of a donut. An in-depth study of correcting errors on the surface code by identifying connecting lines, and of doing universal computation using transversal CNOT gates and magic state injection, was done in [81]. There they also found a first estimate of a threshold for the surface code. Transversal CNOTs are of course not practical for surface codes. In [200, 197] they use the method of introducing punctures, i.e. holes, into a surface code in order to encode multiple qubits into a single surface. The distance of the code is then the distance between two holes and the boundary. We can then perform two-qubit gates by deforming the code and ‘rotating the holes around each other’. They find that this way of performing computations gives a threshold of 0.75% (later improved to > 1% in [231]). A more accessible description of these results is given in [100], and an extensive review of this topic in [99], where they also give an estimate that running Shor’s algorithm to factor a 2000-bit number would take about 200 million qubits and a full day of computation. The more compact, ‘rotated’ version of the surface code we use was first introduced in [130, Section 7.1].

Decoding and perfect matching Decoding classical linear codes is in general NP-hard [28], and even approximating the minimal-weight decoding remains NP-hard [18]. This remains the case for quantum (stabiliser) codes [132] (note that this does not follow immediately, since for stabiliser codes we only care about decoding up to stabilisers, and this kind of degeneracy is not present in the classical case, so that a priori the problem might become easier). However, this hardness only holds for arbitrary codes with non-local stabiliser generators. The minimum-weight perfect matching problem can be efficiently solved using the blossom algorithm [93]. However, even though this is efficient in the asymptotic sense, a practical implementation must be really fast, and hence people have spent a lot of effort on refined algorithms that sacrifice optimality but run very fast or use only local data [230, 128, 80, 134, 214]. PyMatching is an open-source Python package that implements several methods for decoding topological codes [127].
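For a flavour of how this looks in practice, here is a hedged sketch using PyMatching to decode a toy example: the check matrix below is the 5-bit repetition code (the Z-checks of a distance-5 quantum repetition code), our own choice for illustration, not a code from the references above.

import numpy as np
from pymatching import Matching

# Each row checks that two neighbouring bits agree: the parity checks
# of the 5-bit repetition code.
H = np.array([[1, 1, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=np.uint8)

matching = Matching(H)                    # matching graph from the checks

error = np.array([0, 0, 1, 0, 0], dtype=np.uint8)   # a single bit-flip
syndrome = H @ error % 2                  # violated checks: [0 1 1 0]
correction = matching.decode(syndrome)    # minimum-weight explanation

assert np.array_equal(correction, error)  # the weight-1 error is recovered

Here the decoder pairs up the two violated checks along the cheapest path, which is exactly the minimum-weight perfect matching problem discussed above.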

Lattice surgery As might be clear from those latter numbers, performing CNOTs by rotating qubits around each other tends to be expensive. The idea of merging and splitting rectangular patches of surface codes by lattice surgery was introduced by [130]. That this indeed seems to be much more efficient than braiding was argued in [98]. An experimental demonstration of lattice surgery on real hardware was presented in [95].

Error correcting codes and ZX The surface code was first presented in the ZX-calculus by [129], who also found that the logical function of the merging and splitting operations is actually just Z- and X-spiders [77]. [89] was the first paper to use the ZX-calculus to verify the correctness of an error correcting code (the Steane code), which was followed up by a verification of ‘the smallest interesting colour code’ [101], which we gave in Example 11.4.16. In [51] they used a proto-version of scalable ZX notation to find a new class of quantum codes. The correspondence between phase-free ZX-diagrams and CSS codes was established by [142], where he also proved the correctness of lattice surgery. The ZX description of transversal non-Clifford gates in triorthogonal codes is from [149].

Magic state distillation The concept of distilling a noisy non-Clifford state by encoding it in a code with a transversal non-Clifford gate was introduced by [36], where they found the 15-to-1 protocol. This was generalised to an entire family of protocols based on triorthogonal codes in [35]. A protocol based on using the transversal Hadamard in the ⟦4,2,2⟧ code was given in [172]. This works a bit differently: we use the transversal Hadamard to perform a logical Hadamard-eigenbasis measurement, which distils the Hadamard eigenstate, which is Clifford-equivalent to |T⟩. The CCZ distillation used to implement T gates via catalysis was introduced in [105]. A comprehensive analysis of running Shor’s algorithm to factor a 2048-bit number using surface code lattice surgery with this improved catalysed CCZ distillation scheme was given in [104], where they find you require 20 million qubits and 8 hours of computation time, a large improvement over the older scheme using braiding and iterated 15-to-1 distillation.

Even further reading An excellent overview of quantum error correction and fault-tolerance can be found in the book of [114], which at the time of this writing was available as a preprint freely online. A standard, comprehensive, and approachable text on classical error correction is [167]. An accessible and quite comprehensive fault-tolerant quantum computing scheme based on surface code lattice surgery and magic state distillation is A Game of Surface Codes by [164].