But it’s not enough to know that there exist special vectors whose pairings are easy to compute. After all, as a trivial example, the pairing between two $2^n$-vectors supported on only polynomially many dimensions is also computable in polynomial time. What distinguishes the vectors that feature in holographic algorithms from these is the **surprising availability of the former in complexity theory problems**. Whereas far too many vectors fail to be supported on only polynomially many dimensions, the codimensions of the Grassmannian in low dimensions and of the spinor variety over low dimensions are quite small, and just as we’ve seen with matchgates, the vector spaces involved can often be decomposed into small enough “local” components (this local decomposition is key, because the abovementioned codimensions blow up very quickly).

In what follows, I hope to give a rigorous treatment of these ideas. My main references are Chevalley’s *The Algebraic Theory of Spinors and Clifford Algebras* and the fifth chapter of Varadarajan’s lecture notes on supersymmetry for the development of the basic theory of Clifford algebras and spinors, and Landsberg’s *Tensors: Geometry and Applications* and Manivel’s *On Spinor Varieties and Their Secants* for establishing the connection between the matchgate identities and the variety of pure spinors. As always, any mistakes in the presentation below are entirely my fault.

**1. Clifford Algebras and Spinors**

“No one fully understands spinors. Their algebra is formally understood but their general significance is mysterious. In some sense they describe the “square root” of geometry and, just as understanding the square root of −1 took centuries, the same might be true of spinors.” -Michael Atiyah

As a remark, I make no attempt to cover technical material related to spinors and Clifford algebras beyond the scope of their relevance to holographic algorithms, so I apologize for failing to include certain key ideas (e.g. anything more than a basic mention of the graded structure of the Clifford algebra).

The notion of a spinor, while not named at the time, first came up in Cartan’s famous classification of the simple representations of all simple Lie algebras, when he discovered an “exotic” representation of $\mathfrak{so}(n)$ not obtainable from tensors over the defining representation (the reason being that the weights of representations obtained from the usual tensor constructions are integral, while those of the spin representation are odd multiples of 1/2). We see the spin representation come up immediately, for example, from the fact that $\mathrm{Spin}(3) \cong SU(2)$ (the belt trick): the spin representation is merely the faithful representation of degree 2 of $SU(2)$. Spinors are just the vectors acted on by the spin representation.

In any case, the cleanest route to developing these notions will be from the point of view of Clifford algebras, which can be thought of in one way as providing a generalization of the multiplicative structure of the complex numbers and the quaternions to higher dimensions. In our discussion below, we will for convenience restrict ourselves to the field $\mathbb{R}$, though the results are valid over all fields of characteristic not equal to 2. Let $V$ denote a real vector space of dimension $n$, let $Q$ be some quadratic form on $V$, and let $\beta$ denote the symmetric bilinear form associated with $Q$ (namely, $\beta(u,v) = \tfrac{1}{2}(Q(u+v) - Q(u) - Q(v))$).
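As a quick sanity check of the polarization identity, here is the recovery of the bilinear form from the quadratic form in code (a minimal sketch; the particular quadratic form and all names are my own example, not anything from the references):

```python
import numpy as np

# Q(v) = v^T M v for a symmetric matrix M (an arbitrary example), and the
# bilinear form recovered from Q by polarization:
#   beta(u, v) = (Q(u + v) - Q(u) - Q(v)) / 2
M = np.array([[2.0, 1.0], [1.0, -3.0]])
Q = lambda v: float(v @ M @ v)

def beta(u, v):
    return (Q(u + v) - Q(u) - Q(v)) / 2

u, v = np.array([1.0, 2.0]), np.array([-1.0, 4.0])
assert np.isclose(beta(u, v), u @ M @ v)   # polarization recovers u^T M v
assert np.isclose(beta(v, v), Q(v))        # and beta(v, v) = Q(v)
```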

The **Clifford algebra** $C(Q)$ of $V$ associated to $Q$ is defined to be the $\mathbb{R}$-algebra $T(V)/I(Q)$, where $I(Q)$ is the ideal in the tensor algebra $T(V)$ generated by all elements of the form $v \otimes v - Q(v)\cdot 1$.

$C(Q)$ is $\mathbb{Z}/2\mathbb{Z}$-graded, so denote by $C^+(Q)$ and $C^-(Q)$ the direct sums of the respectively even and odd components of $C(Q)$, and write $C^+$ and $C^-$ for these spaces when $Q$ is clear from context.

As a first observation, note that there is a natural map $V \to C(Q)$, and it’s easy to see that this natural map is in fact an injection; so we’ll regard $V$ as sitting inside $C(Q)$. Now let’s prove some results on the structure of $C(Q)$ which will allow us to define spin representations.

As a second observation, we can characterize $C(Q)$ in terms of the exterior algebra $\Lambda V$ of $V$:

The underlying vector space of $C(Q)$ is isomorphic to $\Lambda V$.

*Proof:* There is a natural filtration on $T(V)$ corresponding to tensor degree, and projecting this filtration down to $C(Q)$ gives a filtration on $C(Q)$. But the graded ring associated to this filtration is isomorphic to $\Lambda V$. Indeed, $v \otimes v$ and $Q(v)$ have the same image in $C(Q)$, and $Q(v)$ lies in lower filtration degree, so $v^2 = 0$ in the associated graded ring, from which we conclude that we have a natural isomorphism of graded algebras $\mathrm{gr}\,C(Q) \cong \Lambda V$. But as vector spaces, $\mathrm{gr}\,C(Q)$ and $C(Q)$ are isomorphic, so we’re done.

As a third observation, real Clifford algebras are parametrized quite nicely. By Sylvester’s law of inertia, a real quadratic form can be diagonalized into the form $x_1^2 + \cdots + x_p^2 - x_{p+1}^2 - \cdots - x_{p+q}^2$ for some $p, q$ which are invariants of $Q$, so the real Clifford algebras are parametrized by these invariants. We will thus denote them by $C_{p,q}$ when we wish, in which case if $p + q = n$, we can choose generators $e_1, \dots, e_n$ of $C_{p,q}$ satisfying $e_i^2 = \pm 1$ (with signs determined by the signature) and $e_ie_j = -e_je_i$ for $i \neq j$.
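To make these relations concrete, here is a small computational sketch (function names and conventions are my own): basis monomials $e_S$ of $C_{p,q}$ are indexed by subsets $S$ of $\{1, \dots, n\}$, and the product of two monomials is again a monomial up to sign, determined entirely by the anticommutation relations and the signs $e_i^2 = \pm 1$.

```python
from itertools import combinations

def mono_mul(S, T, sq):
    """Multiply basis monomials e_S * e_T in C(p,q); sq[i] is e_i^2 = +/-1.
    S, T are sorted tuples of generator indices; returns (sign, monomial)."""
    sign, word = 1, list(S)
    for t in T:
        # slide e_t leftward past every generator in `word` larger than t
        sign *= (-1) ** sum(1 for r in word if r > t)
        if t in word:
            word.remove(t)
            sign *= sq[t]          # e_t * e_t = sq[t]
        else:
            word.append(t)
            word.sort()
    return sign, tuple(word)

# C(1,1): e_1^2 = +1, e_2^2 = -1
sq = {1: 1, 2: -1}
assert mono_mul((1,), (1,), sq) == (1, ())      # e_1^2 = 1
assert mono_mul((2,), (2,), sq) == (-1, ())     # e_2^2 = -1
assert mono_mul((1,), (2,), sq) == (1, (1, 2))  # e_1 e_2
assert mono_mul((2,), (1,), sq) == (-1, (1, 2)) # e_2 e_1 = -e_1 e_2

# the 2^n monomials e_S form a basis, matching dim of the exterior algebra
n = 4
monomials = [S for k in range(n + 1) for S in combinations(range(1, n + 1), k)]
assert len(monomials) == 2 ** n
```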

If $n$ is even, then $C_{p,q}$ is a central simple algebra.

*Proof:* Say that $n = 2k$, and pick the generators described above. Pairs of generators generate 4-dimensional subalgebras: a pair with $e_i^2 = 1$ and $e_j^2 = -1$ generates a copy of the real matrix algebra $M_2(\mathbb{R})$, while a pair with $e_i^2 = e_j^2 = -1$ generates the quaternions $\mathbb{H}$. For those familiar with the proof of Bott periodicity with Clifford algebras, it is easy to check using this same pairing trick that while the pairs don’t consistently generate the same algebra, the intermediate algebras generated at each step tensor together nicely, so that $C_{p,q}$ decomposes as a tensor product of $k$ central simple algebras; a tensor product of central simple algebras is central simple, and we’re done.

While we cannot make the same claim for $n$ odd, we can make this claim about the even part $C^+$:

If $n$ is odd, then $C^+_{p,q}$ is a central simple algebra.

*Proof:* $C^+_{p,q}$ is generated by the products $e_ie_n$ for $i < n$, which square to $\pm 1$ and pairwise anticommute, so that $C^+_{p,q}$ is isomorphic to a Clifford algebra of even rank $n - 1$ (in fact just $C_{p,q-1}$ or $C_{q,p-1}$), which we know from the previous discussion to be central simple.

Now let $O(Q)$ denote the orthogonal group of transformations of $V$ preserving $Q$.

The **Clifford group** $\Gamma$ of $C(Q)$ is the set of invertible $x \in C(Q)$ for which conjugation of any $v \in V$ gives something inside $V$. Let $\rho$ be the linear representation of $\Gamma$ sending $x$ to this conjugation map $v \mapsto xvx^{-1}$; call this the **vector representation of $\Gamma$**.

In fact, $\rho$ sends $\Gamma$ not just into $GL(V)$ but into $O(Q)$: conjugation preserves $Q$. Moreover, it is known that if the rank of $Q$ is even (resp. odd), the image of $\rho$ is all of $O(Q)$ (resp. the elements of determinant one), though we’ll omit this proof.

Moving on to the spin representation, assume first that $n$ is even. We know $C(Q)$ is simple, and simple representations of simple algebras are equivalent, so pick any such representation; this is the **spin representation**. Vectors in the space on which it acts are called **spinors**.

On the other hand, assume that $n$ is odd. We can no longer say that $C(Q)$ is simple, but we do know that $C^+(Q)$ is, so again, all simple representations of it are equivalent. By Lemma 1 below, there are two ways to extend this unique spin representation of $C^+(Q)$ to a representation of $C(Q)$, and these are the spin representations of $C(Q)$, with their corresponding spinors defined as above.

If $C(Q)$ is not simple, there are exactly two ways to extend the spin representation on $C^+(Q)$ to one on $C(Q)$.

*Proof:* Before we jump to the proof of this, let’s prove a basic but key fact about the center of $C(Q)$:

**Claim 1**: If $V$ is of odd dimension, then the center of $C(Q)$ is spanned by $1$ and some odd element $z$ for which $z^2$ is a nonzero scalar.

*Proof:* For an orthogonal basis $e_1, \dots, e_n$ and $S \subseteq \{1, \dots, n\}$, we will denote by $e_S$ the product of the $e_i$ for $i \in S$. If $i \in S$, then $e_ie_S = (-1)^{|S|-1}e_Se_i$; otherwise $e_ie_S = (-1)^{|S|}e_Se_i$. Pick $x$ in the center, and write it as $x = \sum_S c_Se_S$. Then one can check that $e_ix = xe_i$ for all $i$ implies that $c_S = 0$ for all $S$ except $\emptyset$ and $\{1, \dots, n\}$, and $z = e_1\cdots e_n$ is the odd element that we’re looking for, up to scaling.
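One can see Claim 1 concretely in the smallest odd-dimensional case. Taking three generators realized (over $\mathbb{C}$, purely for illustration) by the Pauli matrices, the product of all three generators commutes with everything and squares to a scalar:

```python
import numpy as np

# Pauli matrices give generators with s_i^2 = I and s_i s_j = -s_j s_i (i != j)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
gens = [s1, s2, s3]

for i in range(3):
    assert np.allclose(gens[i] @ gens[i], np.eye(2))
    for j in range(3):
        if i != j:
            assert np.allclose(gens[i] @ gens[j], -gens[j] @ gens[i])

# z = e_1 e_2 e_3 is the odd central element: it commutes with every generator
z = s1 @ s2 @ s3
assert np.allclose(z, 1j * np.eye(2))    # here z^2 = -1, a nonzero scalar
for g in gens:
    assert np.allclose(z @ g, g @ z)
```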

Now write any $x \in C(Q)$ uniquely as $x = a + bz$ for $a, b \in C^+(Q)$. Choosing a scalar $\lambda$ with $\lambda^2 = z^2$ (passing to $\mathbb{C}$ if necessary), the maps $x \mapsto a + \lambda b$ and $x \mapsto a - \lambda b$ are homomorphisms $C(Q) \to C^+(Q)$. Now simply compose the spin representation of $C^+(Q)$ with these two maps to obtain two distinct representations of $C(Q)$.

Now say that we had another representation of $C(Q)$ extending the spin representation; call it $\tau$, and let $T = \tau(z)$. Since $z^2$ is a scalar, the space of spinors decomposes into the two eigenspaces of $T$. But the action of $C^+(Q)$ preserves both of these subspaces (as $z$ is central), and by the fact that the spin representation is simple, one eigenspace must be everything; we conclude that $\tau$ must be one of the two representations constructed above.

**2. Pure Spinors**

The particular kinds of spinors which form the bridge between geometry and holographic algorithms are the so-called “pure spinors.” We begin by exploring these in the classical sense. The intuition is that pure spinors corresponding to a certain kind of subspace of $V$ are the analogue of the “decomposable vectors” $v_1 \wedge \cdots \wedge v_k$ associated to a subspace of $V$. They’re particularly nice because they correspond to maximal linear subvarieties of the quadric cut out by $Q$.

Before we can define pure spinors, here is the key preliminary result. For simplicity, we will assume that $V$ is of even dimension $2k$. Pick a splitting $V = W \oplus W'$ into maximal isotropic subspaces. Denote the uniquely defined spin representation of $C(Q)$ by $\sigma$.

The intersection between a minimal left ideal and a minimal right ideal of $C(Q)$ is a 1-dimensional vector space.

*Proof:* A theorem due to Brauer tells us that the minimal left ideal is of the form $C(Q)e$ and the minimal right ideal of the form $fC(Q)$ for some idempotents $e$ and $f$.

If is the kernel of and is the image of under the projection , then we see first off that for all , maps to zero. On the other hand, for any that maps to zero, ( acts as the identity on ), from which we conclude that is the set of elements in that kill . But the set of elements in that kill any bigger than likewise corresponds to a left ideal smaller than , contradicting minimality. We conclude that is a hyperplane.

The argument for is analogous. If is the image of under the projection , then we see first off that for all , sends to itself. On the other hand, for any that does this, ( acts as the identity on ). The set of elements of that project into a smaller than corresponds to a right ideal smaller than , contradicting minimality. We conclude that is a line.

is thus the set of which kill and project into . For , lies in a line. If we pick a basis element for , is determined exactly by the for which , so we conclude that is of dimension 1.

Now take $W$ to be a maximal isotropic subspace of $V$, and say $W$ generates the subalgebra $C_W$ of $C(Q)$. Pick a basis $w_1, \dots, w_k$ of $W$ and define $f_W$ to be the product $w_1\cdots w_k$. Also define $f_{W'}$ to be the analogous product of basis elements of $W'$. By Lemma 1, we note that both $f_W$ and $f_{W'}$ are defined up to a multiplicative factor, independent of the choice of bases. But one can show that the corresponding left and right ideals they generate are minimal left and right ideals respectively.

Now because these two ideals intersect in a line, that line corresponds to a one-dimensional space $S_W$ of spinors; $S_W$ is called the line of **representative spinors** of $W$.

A **pure spinor** is any spinor representative of *some* maximal isotropic subspace.

These spinors are called “representative” for the following reason:

If is a representative spinor of , then is precisely the set of all which kill .

*Proof:* Recall that for of even dimension, , meaning there must exist some by which can be conjugated to obtain . As such, we’ll assume that , in which case the corresponding line of pure spinors contains . But for any is multiplication by the component of inside , and this map is zero iff this component is zero, i.e. iff .

There is thus a correspondence between maximal isotropic subspaces of and lines of pure spinors. As such, we’ll define the so-called **spinor variety** as the variety of all maximal isotropic subspaces of .

This variety consists of two connected components and such that two maximal isotropic subspaces lie in the same component iff their intersection has dimension of the same parity as .

As an aside, recall that one of the ingredients for holographic algorithms to work is that “efficient pairings can be computed quite often.” Once we establish the connection between standard signatures and points on the spinor variety, this statement translates to the fact that “the spinor varieties over low dimensions have low codimension.” Indeed, the spinor varieties over dimensions 2, 3, and 4 are , , and .

**3. Sub-Pfaffians and Matchgates Revisited**

Having developed the foundations for the theory of spinors, we will now show how they relate to the matchgate identities. In this section, we will eventually characterize points on the spinor variety in terms of sub-Pfaffians.

Before we do this, however, let’s look at the analogous correspondence in the case of the Grassmannian manifold $G(k,n)$.

The Plucker embedding sends $G(k,n)$ to the projectivization of $\Lambda^kV$ via $\mathrm{span}(v_1, \dots, v_k) \mapsto [v_1 \wedge \cdots \wedge v_k]$, so that the cone over the Grassmannian is the set of decomposable vectors inside $\Lambda^kV$.

Write $V$ as $W \oplus W'$ for a $k$-plane $W$. Say that $W$ has basis elements $w_1, \dots, w_k$ and $W'$ has basis elements $w_1', \dots, w_{n-k}'$. The neighbors of $W$ in $G(k,n)$ can be locally parametrized by $k \times (n-k)$ matrices $B$ of coordinates (i.e. the tangent space at the point $W$ of $G(k,n)$). Then one can check that for index sets $I$ and $J$ of the same size, the coordinate of the basis element of $\Lambda^kV$ obtained from $w_1 \wedge \cdots \wedge w_k$ by exchanging the $w_i$ for $i \in I$ with the $w_j'$ for $j \in J$ equals, up to sign, the determinant of the submatrix of $B$ with rows indexed by $I$ and columns indexed by $J$.
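This minor computation is easy to verify numerically (the matrix $B$ below is my own arbitrary example):

```python
import numpy as np
from itertools import combinations

# A 2-plane W in R^4, spanned by the rows of [I | B]
B = np.array([[1.0, 2.0], [3.0, 5.0]])
M = np.hstack([np.eye(2), B])

# Plucker coordinates: maximal minors of M, one per pair of columns
plucker = {c: np.linalg.det(M[:, c]) for c in combinations(range(4), 2)}

assert np.isclose(plucker[(0, 1)], 1.0)               # the base point W itself
assert np.isclose(plucker[(2, 3)], np.linalg.det(B))  # exchanging both basis vectors
# exchanging a single basis vector gives a single entry of B, up to sign:
assert np.isclose(abs(plucker[(0, 2)]), abs(B[1, 0]))
assert np.isclose(abs(plucker[(1, 3)]), abs(B[0, 1]))
```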

The relationship between the spinor variety and sub-Pfaffians of some coordinate matrix is *completely analogous*. But first, we give a slightly different, cleaner formulation of spinors as sub-Pfaffians using an **exponential map**. Pick a basis $w_1, \dots, w_k$ of $W$. If $a = \sum_{i<j}a_{ij}w_iw_j$, define $\exp(a) = \sum_{m \geq 0}a^m/m!$ (a finite sum). This map has all the properties one would expect of an exponential map: 1) $\exp(0) = 1$, 2) it is approximated by $1 + a$ in the sense that if $a$ is a multiple of a basis element $w_iw_j$, or in fact anything decomposable, then $\exp(a) = 1 + a$ exactly, and 3) it sends sums to products of commuting factors. Furthermore, one can check that conjugation by any individual factor preserves $V$, so $\exp(a)$ is always in the Clifford group. The reason we introduce this map is that the coefficients of $\exp(a)$ in the basis of monomials $w_S$ are precisely the Pfaffians of the submatrices of $(a_{ij})$ with rows and columns indexed by the elements of $S$.
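This last fact, that the coefficients of $\exp(a)$ are exactly the sub-Pfaffians of the coefficient matrix, can be checked directly with a toy exterior-algebra implementation (all names and the test matrix are mine):

```python
import math
from itertools import combinations

def wedge(x, y):
    """Wedge product of exterior-algebra elements, represented as dicts
    mapping sorted index tuples to coefficients."""
    out = {}
    for S, a in x.items():
        for T, b in y.items():
            if set(S) & set(T):
                continue
            merged = tuple(sorted(S + T))
            # sign = parity of the shuffle interleaving S and T
            sign = (-1) ** sum(1 for s in S for t in T if s > t)
            out[merged] = out.get(merged, 0) + sign * a * b
    return out

def exp_wedge(a, n):
    """exp(a) = sum a^k / k! -- terminates since a^k = 0 once 2k > n."""
    total, power = {(): 1.0}, {(): 1.0}
    for k in range(1, n // 2 + 1):
        power = wedge(power, a)
        for S, c in power.items():
            total[S] = total.get(S, 0) + c / math.factorial(k)
    return total

def pfaffian(A, idx):
    """Pfaffian of the submatrix of skew A with rows/cols `idx`,
    by expansion along the first row."""
    if not idx:
        return 1.0
    i, rest = idx[0], idx[1:]
    return sum((-1) ** j * A[i][rest[j]] * pfaffian(A, rest[:j] + rest[j + 1:])
               for j in range(len(rest)))

n = 4
A = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]
a = {(i, j): A[i][j] for i in range(n) for j in range(i + 1, n)}

# every even-weight coefficient of exp(a) is the matching sub-Pfaffian
e = exp_wedge(a, n)
for k in range(0, n + 1, 2):
    for S in combinations(range(n), k):
        assert abs(e.get(S, 0) - pfaffian(A, S)) < 1e-9
```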

The main result of this section is the following characterization due to Chevalley of all pure spinors:

A spinor is pure iff it can be represented as for some linearly independent in and scalar . If represents , are a basis of .

*Proof:* To prove this, we’ll need two auxiliary results.

Let be the space spanned by , and let be the -component of ‘s conjugate. Then is a maximal isotropic subspace, and represents .

If are all maximal isotropic subspaces, and , then there exists a for which sends to and fixes every point of .

Let’s first see how these two lemmas prove Theorem 3. We know that represents a space whose -component has basis . By our characterization of the space that a pure spinor represents as the space of all things killing it, the space that represents is conjugated by . But the elements of are fixed by conjugation by , so we also get the second part of the theorem.

In the other direction, start with any maximal isotropic . Take , , and in the second lemma to be , , and as defined above. As we’ve already noted, represents , and by the second lemma we have some for which sends to , so the line of representative spinors consists of multiples of , as desired.

It remains to prove our lemmas. For the first lemma, and clearly have dimensions which sum to that of , and the bilinear form associated to is nondegenerate, so , which is clearly isotropic, is maximal. represents it because is the space of all elements killing it.

For the second lemma, we will show that any map in the orthogonal group fixing the points of is of the form for some . After that, it suffices to construct a map sending to while fixing .

Start with any fixing points in . It turns out that it’s possible to construct bases for and , call them and , such that , for all for some , and for all , where the depends on the parity of . The we are looking for is then simply .

Finally, to construct the desired map sending to , define to be the subspace of spanned by , and to be . It suffices to show that and can both be sent to while fixing points of . We will omit the details, but the map from to sends via the identity and via taking the -component.

The big conclusion is that **the matchgate identities cut out the variety of pure spinors**!

Anyways, for the sake of concreteness and completeness, let’s continue the analogy we started with. Instead of a point on , consider a point on the spinor variety of . In other words, rather than a -plane , let’s take a maximal isotropic subspace , and let as before. Whereas the Plucker map embeds into the projectivization of , our discussion of the spinor variety tells us we have an embedding of the projectivization of (where denotes the sum of the even graded components of and denotes the sum of the odd ones). Whereas the neighbors of a -plane in were parametrized by any matrix of coordinates inside , as one can check, the neighbors of a maximal isotropic subspace in the spinor variety are parametrized by skew-symmetric matrices :

The space generated by is maximal isotropic iff is skew-symmetric.
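This criterion is a one-line computation: with respect to the split form, the Gram matrix of the spanning vectors is $B + B^T$, which vanishes iff $B$ is skew-symmetric. A quick numeric check (conventions and the random matrix are mine):

```python
import numpy as np

k = 3
# Gram matrix of the split form on W + W': the two factors pair with each other
G = np.block([[np.zeros((k, k)), np.eye(k)],
              [np.eye(k), np.zeros((k, k))]])

def gram_of_span(B):
    U = np.hstack([B, np.eye(k)])   # rows: w_i' + sum_j B[i, j] w_j
    return U @ G @ U.T              # works out to B + B^T

X = np.random.default_rng(1).standard_normal((k, k))
skew, nonskew = X - X.T, X + X.T
assert np.allclose(gram_of_span(skew), 0)        # isotropic subspace
assert not np.allclose(gram_of_span(nonskew), 0) # not isotropic
```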

Whereas the decomposable vector in $\Lambda^kV$ had as coordinates the determinants of submatrices of $B$, the pure spinor corresponding to $B$, by Chevalley’s result, has as coordinates the *Pfaffians* of certain submatrices of $B$. Specifically, a bit of computation gives us the following observation.

For even, the th coordinate of this pure spinor is the Pfaffian of the submatrix of with rows and columns indexed by .

**4. Conclusion**

In this post, we began with a classical presentation of the matchgate identities and proceeded to develop the appropriate theory for understanding where these identities really come from. To reiterate, this latter discussion in some sense “demystifies” holographic algorithms by uncovering the *underlying geometric reason* why they work: realizable signatures of a given arity are parametrized by points on the variety of pure spinors over the corresponding dimension, tensor contractions are often easy when there is this kind of additional structure, and in low dimensions such varieties have low codimension, so that this happens nontrivially often.

Of course, the bigger question is still whether holographic algorithms could ever be used to give a proof that (in fact, such a proof would actually imply the more astounding result that ), and if not, how geometry could somehow “know” that certain problems are intractable.

I don’t know if anyone has a convincing answer yet. To further explore the full potential of holographic algorithms, I will take a detour from these more theoretical considerations in the next post and cover some more of Cai’s work, namely his recent work on holographic algorithms over domains of larger size than just the Boolean domain, asking whether larger domains, unlike larger bases, can add power to holographic algorithms.

We claimed last time that the following identities are necessary for a vector to be the standard signature of a planar matchgate. In fact, we will show that they are also sufficient.

**Theorem 1**: The vector is the standard signature of a matchgate of arity iff given any ,

where denotes the bit vector with support at the th bit, denotes the position of the th nonzero bit of , and is the Hamming weight of .

**1. Necessity**

We begin by showing necessity. The basic idea is that the matchgate identities are really just identities that sub-Pfaffians must satisfy, and if we impose upon our matchgate a Pfaffian orientation, the possible sign differences between the Pfaffian and in any of the subgraphs will cancel each other out in those identities so that satisfies these same identities.

Let’s first prove the necessity of these identities for sub-Pfaffians:

**Lemma 2**: The identities of Theorem 1 hold if is replaced by , i.e.

Here, for a string denotes the Pfaffian of the weight matrix of , where denotes the input/output nodes indexed by the nonzero bits of . For convenience, if , denote the Pfaffian of the submatrix whose rows and columns are indexed by these by .

*Proof:* As we alluded to previously, the matchgate identities arise from the so-called Grassmann-Plucker identities. Specifically, if and are indices from , then because for any indices , it is trivial to verify that

We can write this in a nicer way in terms of the symmetric difference , which we will denote by :

Lemma 2 follows.
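The smallest instance of Lemma 2 is easy to verify numerically: for a skew-symmetric matrix, the identity on four indices is just the first-row expansion of the Pfaffian (the random matrix and index conventions below are mine):

```python
import numpy as np

def pfaffian(A, idx):
    # Pfaffian of the skew submatrix with rows/cols `idx`, by first-row expansion
    if not idx:
        return 1.0
    i, rest = idx[0], idx[1:]
    return sum((-1) ** j * A[i][rest[j]] * pfaffian(A, rest[:j] + rest[j + 1:])
               for j in range(len(rest)))

X = np.random.default_rng(7).standard_normal((6, 6))
A = X - X.T

# the simplest Grassmann-Plucker identity, on indices {0,1,2,3}:
#   Pf() Pf(0123) - Pf(01) Pf(23) + Pf(02) Pf(13) - Pf(03) Pf(12) = 0
lhs = (pfaffian(A, ()) * pfaffian(A, (0, 1, 2, 3))
       - pfaffian(A, (0, 1)) * pfaffian(A, (2, 3))
       + pfaffian(A, (0, 2)) * pfaffian(A, (1, 3))
       - pfaffian(A, (0, 3)) * pfaffian(A, (1, 2)))
assert abs(lhs) < 1e-9
```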

**2. Convenient Sign Cancellation**

Now, it suffices to show that the extra signs mentioned above cancel in such a way that the Grassmann-Plucker identities still hold for . Fix some Pfaffian orientation of and define for to be .

Now compare equation (2), which we know, to equation (1), which we want. Because and differ only in sign for any string , the nonzero terms are in bijection. If the th summands of (2) and (1) differ in sign by , we want to show that for all , , i.e. . Cai/Gorenstein’s proof of this is a bit involved, so we will only sketch the key ideas in proving this.

Because for all by definition, really the claim we are making is the following:

**Lemma 3**: For any and strings , , and , and bits ,

*Proof:* We will prove this for , though the proof for follows analogously. We will assume that the matchgate is connected. For convenience, for each attach a path of length 2 to the th input/output node (with the two edges of weight 1) and define the new input/output node to be the other ends of these paths. Call it node . Note that this doesn’t affect the standard signature of the matchgate.

A Pfaffian orientation on induces one for each , which we’ll denote by . Denote by (resp. ) the sign of the summand in the Pfaffian (resp ) of (endowed with a particular Pfaffian orientation) corresponding to a matching of .

Fix values for , and denote and by and respectively. Define and to be matchings of and . To get the desired result, we need to show that the value of the left-hand side of (5) is independent of the choice of . Let be the edge connecting input/output nodes and , define to be , with the extra edge oriented to preserve the Pfaffian orientation.

Define to be the matching . We show that the relationships between the sign of and the signs of and are the same regardless of .

First of all because doesn’t contain , and by definition of a Pfaffian orientation: all summands of the Pfaffian must have the same sign. We conclude that the signs of and are always the same.

The analysis for is a bit trickier. and are off by a factor of , where is the permutation corresponding to the matching . The sign of the permutation is where is merely the number of edges from an output/input node between and to a point outside of this range, i.e. the number of zeros in between bits and . For , consider its value when is zero only at and . Now with every bit switched to zero, we’re including one more counterclockwise edge in the cycle formed by the new face created by the addition of . In other words, for a quantity not depending on , so we’re done.

**3. Sufficiency**

Next, we want to show that the matchgate identities are in fact sufficient conditions, i.e. given a -vector satisfying these identities, we can construct a planar graph with that vector as its standard signature. The idea will be to construct a graph, not necessarily planar, whose standard signature agrees with the desired signature in enough places that the matchgate identities force a single possible value for each of the other entries of its standard signature. We then want to alter this graph by replacing all intersection points with so-called “crossover gadgets” in order to obtain a planar graph with the same signature.

First let’s simplify the problem by assuming that . If this isn’t the case, define for some for which , and we’ll construct a matchgate with as its standard signature. After this, we can simply add an extra edge of length disjoint from the rest of the matchgate, and attach extra edges to the appropriate input/output nodes, i.e. indexed by for which , and we’re done.

Let’s now set out constructing with signature under these assumptions. For now, let’s drop the condition of planarity. Consider the complete graph on vertices. For each edge , assign the weight so that as long as . On the other hand, defining to be the index of the th bit at which is zero, then applying the matchgate identities to the strings and tells us that

But of the complete graph is clearly 1, and the summands on the right-hand side are products of entries in the standard signature of higher weight, so we conclude that it indeed suffices to know the entries of weight .

**4. Planarity Assumption**

Now we want to get rid of the planarity assumption by replacing all of the places where edges intersect with the crossover gadget shown in the figure below. Among the non-gadget edges corresponding to an edge between vertices of the original graph, pick one to carry the weight of the original edge. Call the resulting matchgate .

**Lemma 4**: .

*Proof:* First note that the standard signature of is given by , for all other . Next, denote by all edges in which do not belong to a copy of ; we’ll call these edges “passage edges.” Note that any perfect matching of corresponds to a subset of passage edges, call it .

We can restrict our attention to perfect matchings of for which for some perfect matching of . The reason is that if one cannot reconstruct from , there is one copy of for which there is an odd number of passage edges incident to or else exactly two passage edges incident to vertices on the same side of . But recall that the crossover gadget was designed to have standard signature equal to zero at these entries and thus not contribute to .

But observe that of all the four valid combinations (indexed by ) of corner vertices to remove from , only in the case that all are removed (i.e. the crossover gadget is the site of an actual crossover in the original matching of ) is , where the sum is taken over all perfect matchings of with these vertices removed. In particular, this sum is -1. Thus, the sum in (6) is , where is the number of crossovers. But a matching with an odd number of crossovers corresponds to an odd permutation, so (6) is precisely equal to . Now just pick a Pfaffian orientation, and we’re done.
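As a concrete reminder of the machinery in play here, the smallest interesting check of the Pfaffian-orientation story: for a 4-cycle with unit edge weights and a Pfaffian orientation, the Pfaffian of the signed adjacency matrix counts the perfect matchings (the graph and orientation below are my own toy example):

```python
import numpy as np

def pfaffian(A):
    n = len(A)
    if n == 0:
        return 1.0
    # expansion along row 0, deleting rows/columns 0 and j+1 at each step
    return sum((-1) ** j * A[0][j + 1]
               * pfaffian(np.delete(np.delete(A, [0, j + 1], 0), [0, j + 1], 1))
               for j in range(n - 1))

# 4-cycle 0-1-2-3-0, all edge weights 1, oriented 0->1, 1->2, 2->3, 0->3;
# the inner face then has an odd number of clockwise edges (a Pfaffian orientation)
A = np.zeros((4, 4))
for (i, j) in [(0, 1), (1, 2), (2, 3), (0, 3)]:
    A[i][j], A[j][i] = 1, -1

assert pfaffian(A) == 2.0   # the 4-cycle has exactly 2 perfect matchings
```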

In the next post, we will look at the geometric meaning behind the matchgate identities.

Cai/Lu later showed, however, that this problem can actually be solved using only matchgates over a basis of size 1. We sketch both algorithms in the second section. They subsequently showed that in fact all holographic algorithms using bases of size 2 can be “collapsed” to ones using bases of size 1. (The reason they stopped at 2 at the time was that for higher sizes, there exists an additional constraint for realizability of signatures, namely the so-called “Grassmann-Plucker identities,” which adds considerable complexity to the problem.) They later extended their ideas for the original collapse theorem to prove a general **basis collapse theorem** by discovering that for so-called “non-degenerate bases” (degenerate bases give rise to trivial holographic algorithms and are uninteresting), it’s possible to break any such basis down into an “embedded basis” of 2-vectors capturing all the expressiveness of the higher-dimensional basis. Long story short, Cai/Lu showed that in fact higher-dimensional bases do not add power to holographic algorithms. The main goal of this post will be to present the proof of their final basis collapse theorem.

**1. Bases of Size**

In general, if a tensor of arity is realized over a basis of size , the corresponding matchgate has arity , and it’ll be convenient to think of its input/output nodes as split into “blocks” of size . Call the basis’ transformation matrix , and denote by the entry in row and column .

We can then extend the tensor-theoretic formulation covered last time as follows. As usual, the contravariant tensor **G** of a generator has signature over iff . Concretely, for ,

where denotes the standard signature of . Likewise, the covariant tensor **R** of a recognizer has signature over iff . Concretely,

where denotes the standard signature of .

**2. Solutions Modulo 7**

First note that the constraints of Pl-Rtw-Mon-3CNF can be simulated using generators with signature and recognizers with signature . The generators represent variables and the recognizers clauses. The symmetric signature captures the read-twice condition because the generator has arity 2, and it captures the constraint that the truth values of the two instances of the variable are consistent because and are emitted with weight 0. The symmetric signature captures the 3CNF condition because the recognizer has arity 3, and it captures the problem of satisfiability because it only recognizes instances where at least one of the three variables being transmitted is true.

Unfortunately, these signatures are not realizable in or , but Valiant showed that they are realizable in , specifically via the basis and .

Cai/Lu showed that these signatures are in fact realizable over the basis and . (For now we will defer the details of proving realizability in both cases; time permitting, I’ll write up a post some time on Cai/Lu’s successful characterization of realizability for all symmetric signatures.)

With this first instance of a dimension “collapse” result in mind, we will now begin to explore Cai/Lu’s remarkable result saying that such a collapse can always occur for nontrivial holographic algorithms.

**3. Degenerate Generators**

A new notion that Cai/Lu’s two papers on basis collapse introduced in moving beyond size-1 bases was that of degenerate and non-degenerate generators and bases. As it turns out, recognizers over bases of size more than 1 are indeed more powerful. The saving grace is that such recognizers can only ever appear in conjunction with essentially “useless” generators that give rise to trivial holographic algorithms. Before introducing the actual definitions, let’s get a feel for what such generators would look like.

For starters, generators of arity 1 are pretty useless. Intuitively, without additional nodes to emit particles, there can be no holographic interference, defeating the purpose of using FKT to achieve exponential cancellations. More formally, assume that all the generators in a matchgrid were of arity 1; denote by the number of input nodes of recognizer , and denote by the generator connected to the th input node of . Then the Holant is merely ; in other words, the Holant of all of is the product of the Holants of the subgraphs of containing each recognizer and all generators attached to it. What this tells us is that the particular counting problem in question can be solved merely by looking at its local parts; for instance, if our problem is satisfiability and the recognizers represent the clauses in a CNF, generators of arity 1 would imply that this CNF is read-once so that the total number of satisfying assignments is merely the product of the number of satisfying assignments for each clause.
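The factorization of the Holant in this arity-1 situation is easy to see in tensor language (the random tensors, numpy, and all names below are mine): contracting each recognizer against its own arity-1 generators gives a scalar per recognizer, and the global contraction is their product.

```python
import numpy as np

rng = np.random.default_rng(3)

# two recognizers, of arity 2 and arity 3, each input wired to its own
# arity-1 generator (a 2-vector)
R1, R2 = rng.standard_normal((2, 2)), rng.standard_normal((2, 2, 2))
gens1 = [rng.standard_normal(2) for _ in range(2)]
gens2 = [rng.standard_normal(2) for _ in range(3)]

def local_holant(R, gens):
    # sum over all 0/1 edge assignments of R[x] * prod_j g_j[x_j]
    total = R
    for g in gens:
        total = np.tensordot(total, g, axes=([0], [0]))
    return float(total)

# the disjoint union of the two pieces has signature R1 (x) R2, and its
# Holant is the product of the two local Holants
joint = np.multiply.outer(R1, R2)
assert np.isclose(local_holant(joint, gens1 + gens2),
                  local_holant(R1, gens1) * local_holant(R2, gens2))
```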

If arity 1 generators are uninteresting, anything that “decomposes” into such a generator is also uninteresting. In tensor language, we mean decomposition in the sense that the signature tensor of an arity- generator is equal to , where is the signature tensor of an arity-1 generator. In such a case, we could replace with arity-1 generators without changing the Holant, and we’re back in the situation of the previous paragraph.

**Definition 1**: A generator whose signature tensor has such a decomposition into signature tensors of arity-1 generators is said to be a **degenerate generator**.

To ignore degenerate generators, we will work only in bases for which at least one realizable tensor is non-degenerate. Cai/Lu’s first step is to give a necessary condition for such a basis. Let be the transformation matrix corresponding to a rank-2 basis of size . Denote by and the Hamming weight of and its parity, respectively.

**Definition 2**: is a **degenerate basis** if for all such that is of a particular parity.

**Theorem 3**: If is degenerate, then all signatures realizable over are degenerate.

We will need the following collection of identities regarding standard signatures, the above-mentioned Grassmann-Plucker identities (also called matchgate identities), which are central to the theory of matchgates. In a future post, we will prove the necessity and sufficiency of these identities and explore the observation by Landsberg, Morton, and Norine that in fact these identities are merely the defining equations for the spinor varieties obtained by Chevalley. For now, we state these identities without proof:

If is the standard signature of a matchgate of arity , then given any , where denotes the bit vector with support at the th bit, denotes the position of the th nonzero bit of , and is the Hamming weight of .

**Example 1**: If is the standard signature of a generator of arity 4, then one of the matchgate identities must satisfy is (we obtain this from setting and ).
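Since the identity itself is stated symbolically above, here is a numerical sanity check. In one common sign convention, the arity-4 identity reads G^{0000}G^{1111} - G^{1100}G^{0011} + G^{1010}G^{0101} - G^{1001}G^{0110} = 0, where the bits of each index record which output nodes are deleted. The sketch below is my own illustration (the 4-cycle and its edge weights are arbitrary choices, not from the original text): it computes each signature entry by brute-force PerfMatch and checks the identity.

```python
def perf_match(vertices, weighted_edges):
    """Brute-force PerfMatch: sum over perfect matchings of the product of edge weights."""
    vs = sorted(vertices)
    if not vs:
        return 1
    if len(vs) % 2:
        return 0
    v = vs[0]  # match the smallest remaining vertex along each incident edge
    total = 0
    for (a, b), wt in weighted_edges.items():
        if (v == a and b in vs) or (v == b and a in vs):
            total += wt * perf_match([u for u in vs if u not in (a, b)], weighted_edges)
    return total

# 4-cycle on vertices 1..4 with generic edge weights; all four vertices are output nodes.
weights = {(1, 2): 2, (2, 3): 3, (3, 4): 5, (1, 4): 7}

def sig(removed):  # signature entry: PerfMatch of the gate minus the removed output nodes
    return perf_match([v for v in (1, 2, 3, 4) if v not in removed], weights)

lhs = (sig(()) * sig((1, 2, 3, 4))
       - sig((1, 2)) * sig((3, 4))
       + sig((1, 3)) * sig((2, 4))
       - sig((1, 4)) * sig((2, 3)))
print(lhs)  # 0
```

Changing the four weights to any other values leaves the left-hand side at zero, as the identity predicts.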

*Proof:* Assume that for all odd (the proof for even is essentially identical).

Now pick any tensor realizable on ; also call the generator that realizes it . Denoting the standard signature of the generator as , we have by definition that . If the standard signature is already the zero vector, then we’d conclude that and be done. If not, assume for some . In particular, by multiplying our tensor by a scalar, we can assume , and by permuting indices, we can assume .

Denote the th “block” of as . More precisely, this is the matchgate whose output nodes are the output nodes of in the th block. Note that it suffices to show that the standard signature decomposes, in an appropriate sense, as a product of the standard signatures of these blocks. By definition, . We claim the following.

**Claim 1**: For all ,

Before we prove this, let’s make sure that this implies we’re done. Recall that we need to express as a tensor product of signature tensors of arity-1 generators. Pick a matrix for which is the identity matrix. Then simply define , and we’re done.

*Proof:* By our assumption that for all odd , we can assume that is even for all , or else the claim is trivially true.

We proceed by induction on . Weight 0 is obvious, and for weight 2, the two nonzero bits have to be in the same block or is odd for two values of and the claim is trivially true.

For weight , the inductive step is basically just some careful bookkeeping using the identities in Lemma 3. Let be the position of the first nonzero bit of (we’ve assumed implicitly that is not all zero, but we could very well have done this analysis for any not all zero). We will take in that lemma to be and to be . By pulling out the first term in the matchgate identity and noting that , we get

But for such that lies outside of the first block, , so we’ll ignore those to get , where . The point now is that by restricting our attention to the first block, we can apply our inductive hypothesis to the tensors in each of the summands to conclude that it suffices to show , but one more application of Lemma 3, this time for and , gives the desired result.

**4. Embedded Basis Theorem **

Now that we know what sorts of bases to ignore, we can establish the first crucial step towards the collapse theorem: any transformation matrix for a non-degenerate basis has row rank 2! In our discussion below, we will refer to the row of indexed by by .

**Theorem 4**: If is non-degenerate, then and are linearly dependent if and have the same parity.

*Proof:* Pick any of even weight and of odd weight. Denote by the matrix with rows and , and the matrix with rows and . The claim is that for all possible and defined this way, .

Because is non-degenerate, Theorem 3 tells us that there exists some non-degenerate tensor realizable on . Also call the generator realizing it . Assume first that has an odd number of vertices (we’ll remark how to drop this assumption at the end), so that entries of whose indices are of even weight are zero.

If the arity of is even, then and are zero vectors because their entries are entries in the standard signature of whose indices are of even weight (the sum of an even number of integers of the same parity is even), but has an odd number of vertices. Because we’re assuming is non-degenerate and thus certainly not zero, this implies that and are not invertible, as desired.

If the arity of is odd, then still consists of entries in the standard signature of whose indices are of even weight (the sum of an odd number of even integers is even), so the reasoning above tells us and is not invertible. We can no longer say that , but instead we will “compress” into a nonzero tensor in for which .

Pick any for which and is even (the crucial assumption here is that is non-degenerate). Then let’s replace by so that . Let’s rewrite this; for each , define by so that .

The punch line is that one of these is the we’re looking for, i.e. nonzero, or else we get a contradiction of the fact that is non-degenerate. Indeed, if to the contrary for all , then if , the definition of tells us that , i.e. we can basically “walk” from to by switching off bits in the index at a cost of a factor of at each step. In other words, , so , a contradiction. Likewise, if , we could say that .

**5. Collapse Theorem for Generators **

Now the end is in sight; we’ve shown there exist 2-vectors and for which every row in is a scalar off from either or . Define the transformation matrix for this basis by where and ; Cai/Lu call this an **embedded basis of size 1**.

In other words, **there is a ton of redundant information in signatures realizable over high-dimensional bases**. Is it possible for such a signature to be realizable over the embedded basis? Specifically, we can basically fold into a particular -vector , and the question is whether we can collapse of arity defined above into a generator of arity whose standard signature is this .

Before we jump into proving this in the affirmative, let’s clarify what this is. For any , let denote the scalar for which . Then

So if we define by , this would imply that

**Theorem 5**: is the standard signature of some generator.

First, some simplifying assumptions that will be useful for the collapse theorem for recognizers. Pick and of minimal Hamming distance such that , , and and are nonzero. By simple tweaks to , we can assume that and . Let . Note by minimality of the distance between and that for any between and , . As another piece of notation, denote the th external node in block of by .

*Proof:* Modify into a new generator as follows: for each block , attach to node a node via an edge of weight and attach to this a node via an edge of weight . Add edges of weight 1 between and for all . Take the output nodes to be all of these .

The point of connecting every other pair of adjacent nodes is as follows: for each , if is not present in the matching, then the edge connecting and must be. The only matchings of the original block which omit necessarily omit , again by minimality of distance between and . In other words, the only such matchings of the original block correspond to . The matching of the new block minus thus includes all the edges connecting adjacent pairs of nodes.

On the other hand, if is present in the matching, then the edge connecting and must be, as well as an edge from the original block that is incident to . The only matchings of the original block containing such an edge correspond to for the same reasons as above.

It thus follows that , and thus has standard signature as desired.

**6. Collapse Theorem for Recognizers **

We conclude this post by showing a similar collapsing result for recognizers. Let’s first find an analogue of from the previous section. For signature realized over basis by a recognizer that we also call , we know by definition that . But recall that is defined up to a multiplicative constant by , so group the by parity to get .

The analogue of is thus . Our claim is the following:

**Theorem 6**: is the standard signature of some recognizer.

*Proof:* The following result essentially finishes the proof.

**Theorem 7**: The collection of all forms the condensed signature of an even matchgate of arity .

Before we see how to prove this (the argument is very similar to that of Theorem 5), let’s see how this can be used to prove Theorem 6. The idea is to attach a copy of to each block of , essentially to scale its contents and collapse each block to a single input node. Specifically, for each block, attach the input nodes to the first input nodes of by edges of weight 1 (you should check that this can be done in a planar fashion), and denote the remaining node of as an input node. Call this new recognizer . Then . But note all the summands for which there exists one such that are zero: if , so that the input node at block is included in the matching, then with this node omitted is odd, so an odd number of the remaining vertices must also be omitted for there to exist a perfect matching of , meaning must be even. Likewise, if , we conclude that must be odd, and we’re done.

*Proof:* We will build the desired matchgate out of . The following claim provides an anchor for our construction: if a standard signature has more than one nonzero entry, there exist entries whose indices are of Hamming distance 2 away from each other (perhaps more on this in another post, but for now, we’ll take this for granted).

Applying this claim to , assume wlog that we can pick entries and . Now for our construction, do nothing to block 1 but call the nodes . Modify block 2 as we did to construct above, but instead of weights and , assign weights and to be decided later. Call the node that we referred to as in the previous proof instead. For blocks , if , then (by the usual minimality argument) is already the only possible assignment of bits to the th block’s output nodes for which there is at least one perfect matching in that block, so don’t change those blocks. If , then we want to ensure that is the only possible assignment, so attach an edge of weight to the output node and pair up vertices to as before.

It’s straightforward to check that is even. By construction, we have for even and odd, respectively, that

Pick and , and we’re done.

In the next post, we will cover matchgate identities as they relate to spinors.

At a high level, just as FKT solves a counting problem, , related to a graph by tweaking until that counting problem becomes something we know how to solve efficiently, namely the Pfaffian, the Holant theorem will allow us to solve counting problems related to by building out of a (planar) grid of “gadgets” called **matchgates** which capture the constraints related to the counting problem in question, so that the answer to the counting problem becomes something we know how to solve efficiently, namely . In particular, the counting problem is encoded by so that its answer is by construction equal to a particular function of the weighted edges of , an exponential sum of terms known as the **Holant**. We should think of the Holant as a global consolidation of all the local data needed to solve the counting problem at hand.

The whole point is that whereas we can rig the Holant to capture all this information because it’s defined as this huge exponential sum that we can’t hope to brute-force any more than we can our original problem, the FKT and Holant theorems imply that there are *exponential cancellations* among these terms, and **solving the counting problem is as easy as computing the Holant is as easy as computing **.

Of course, the tricky part is to define what the abovementioned “gadgets” are and how they can encode the local constraints we want. Before we introduce these key players, however, let’s play around with how we might encode the constraints of perfect matching locally.

**1. Matchgates and Matchgrids **

Our discussion at the beginning might come off as somewhat slow, but I’ll include it for those seeking motivation for all the definitions I’m throwing at the reader. For the time being, the following definitions will suffice.

**Definition 1**: A **matchgate** is the data of a weighted, undirected planar graph , a set of **output nodes** , and a set of **input nodes** . is said to be of **arity** .

We will restrict our attention to matchgates exclusively containing output nodes, called **generators**, and exclusively containing input nodes, called **recognizers**. All other matchgates are called **transducers**.

**Definition 2**: A **matchgrid** is the planar graph defined as the union of a set of generators , a set of recognizers , and a set of edges of weight 1, called **wires**, such that each input and output node in the matchgrid has exactly one incident wire.

Any perfect matching assigns to each of the wires a bit depending on whether that wire belongs in the matching. Consider the set of all possible bit vectors that can be assigned to the wires of . For a fixed , say that decomposes as the concatenation and for and denoting whether the input nodes in and respectively belong to wires assigned either 0 or 1.

Let be one of these wires, with output node belonging to generator and input node belonging to recognizer . If is assigned 0 (resp. 1) so that does not (resp. does) belong to the matching, the matching restricts to a perfect matching of a subgraph of including (resp. not including) and a perfect matching of including (resp. not including) .

The point is the following. For fixed , the contribution to of perfect matchings of corresponding to is where denotes the subset of output nodes of for which is an indicator bit vector, and is defined likewise.

More generally, for matchgate with output nodes and input nodes , define the **standard signature of ** to be the matrix whose th entry is . Note that the standard signature of a generator (resp. recognizer) is a column (resp. row) vector. It follows that .

Our informal discussion is almost done; we just need a bit of formality to help transition into our actual treatment of matchgates. Let denote the standard basis . In the decomposition of for generator as a linear combination of all , the coefficient of is obviously . Define this as , and define the vector of all such values as . In other words, the signature of with respect to the standard basis is rigged to be the standard signature . The picture you should keep in mind is that generators emit from their output nodes all possible combinations of and particles, where each combination is weighted by the appropriate entry in .

What’s another fancy way to express the standard signature of a matchgate “dual” to the one we’ve just given (more on this in the last section)? The inner product of with for recognizer is also obviously . Define this as , and define as the vector of all such values; call this vector the **signature of with respect to **. Again, the signature of with respect to the standard basis is the standard signature . The picture here is that recognizers take in all combinations of and particles into their input nodes and spit out their values specified by the appropriate entries in .

To complete the analogy, wires are the medium by which these particles are transferred. As such, we will occasionally regard the vector as an element of . In any case, the moral is that

**2. Valiant’s Holant Theorem and Changes of Basis **

Of course, this is pretty trivial given that we picked to be the standard basis. We could likewise have defined and for arbitrary bases consisting of vectors of length 2 (in fact we can also consider vectors of dimension for arbitrary , but that’s a topic for another post). Define the **Holant** of to be the quantity on the right-hand side of (1). The content of Valiant’s Holant Theorem is the following miracle:

**Theorem 3**: Identity (1) holds **regardless of the choice of basis**. In other words, .

Why should we care? Recall that our ultimate goal is to take our input graph , construct a matchgrid whose matchgates capture the local constraints of the counting problem at hand so that equals the solution to our counting problem, and compute the solution by efficiently computing . Different signatures encode different kinds of constraints, and the issue with the standard basis is that it can’t realize all signatures. The Holant theorem tells us this isn’t an issue because we can just use other bases.

Whether some basis can realize a particular signature for a given matchgate boils down to whether the corresponding system of polynomial equations is solvable. We will give a nice formulation of such a system in a future post, but for now let’s illustrate with an extended example.

**Example 1**: Let’s just consider matchgates with arity 2. What kinds of standard signatures exist? For starters, has an odd or even number of vertices, so if we take out an even or odd number of vertices respectively, we’re left with zero perfect matchings. In other words, must be a vector such that either 1) whenever is even, or 2) whenever is odd. For arity 2, not only is this a necessary condition for a standard signature to be realizable, but it’s sufficient.

For all , and are realizable as standard signatures.

*Proof:* For the former signature, consider the path graph with three vertices whose endpoints are the output nodes and whose edge weights are and . For the latter signature, consider the path graph with four vertices, whose endpoints are the output nodes and whose edge weights are , , and 1.
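To double-check this proof, here is a brute-force computation of the two standard signatures. The helper functions and the particular edge weights below are my own scaffolding: each signature entry is PerfMatch of what remains of the path after deleting the output nodes whose index bit is 1.

```python
from itertools import product

def perf_match(vertices, weighted_edges):
    """Brute-force PerfMatch: sum over perfect matchings of the product of edge weights."""
    vs = sorted(vertices)
    if not vs:
        return 1
    if len(vs) % 2:
        return 0
    v = vs[0]
    total = 0
    for (a, b), wt in weighted_edges.items():
        if (v == a and b in vs) or (v == b and a in vs):
            total += wt * perf_match([u for u in vs if u not in (a, b)], weighted_edges)
    return total

def standard_signature(vertices, weighted_edges, outputs):
    """Entry at bit string s: PerfMatch of the graph minus the outputs with s_i = 1."""
    return {s: perf_match([v for v in vertices
                           if v not in {o for o, bit in zip(outputs, s) if bit}],
                          weighted_edges)
            for s in product((0, 1), repeat=len(outputs))}

# Path on three vertices, endpoints as outputs: only the odd-weight entries survive.
print(standard_signature([1, 2, 3], {(1, 2): 4, (2, 3): 9}, [1, 3]))
# {(0, 0): 0, (0, 1): 4, (1, 0): 9, (1, 1): 0}

# Path on four vertices, endpoints as outputs: only the even-weight entries survive.
print(standard_signature([1, 2, 3, 4], {(1, 2): 4, (2, 3): 9, (3, 4): 6}, [1, 4]))
# {(0, 0): 24, (0, 1): 0, (1, 0): 0, (1, 1): 9}
```

The zero patterns match the parity constraint from Example 1, and the nonzero entries are exactly the products of edge weights described in the proof.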

Now what kinds of signatures are realizable by these matchgates in other bases? Consider the basis and a signature . For a generator with standard signature , the constraint that either or , combined with the definition of , implies that one of the following two polynomial systems must have a solution in :

So for example, for , the first equation is satisfied as long as the signature is of the form , while the second equation is satisfied as long as the signature is of the form .

**3. A First Proof **

The following is the original proof of Valiant’s Holant theorem. The idea is to blow the matchgrid up into a family of exponentially many matchgrids consisting of generators and recognizers of arity 1, each representing a possible combination of bits passed along the wires of , such that equals the contribution of to the Holant. We then build the matchgrid back up out of these components, arguing by linearity at each step that of each component equals its respective contribution to the Holant. At the end, we obtain a graph for which and agree, and we’re done.

*Proof:* Pick a fixed vector of particles assigned to the wires. We first claim that there exists a generator of arity 1 whose standard signature, if we allow “omittable nodes,” can be any 2-vector that we want. Consider the path graph on three vertices with edge weights and from left to right and output node the rightmost vertex. Normally, we’d say that the standard signature of this generator is . As a hackish fix, let’s also count matchings which potentially omit the leftmost vertex . Specifically, if we assign weight to vertex and weight to vertex , then as desired. For a given 2-vector , denote the corresponding arity-one generator by .

Say that emits some string of and particles from its output nodes. Then replace by the disjoint union of , and for each , rehook the wire originally going into the th output of into the output node of . We’ve purposely scaled up one of these arity-1 generators by so that these generators will emit with the same weighting as . Call this new matchgrid .

The point of all of this is that now,

(note that implicitly we’re weighting each recognizer vertex by 0).

To be explicit, any matching corresponds to a bitstring depending on whether the omittable nodes in our arity-1 generators are present, and in particular corresponds to an entry in , namely . If denotes so that , and denotes the input nodes of already occupied in our matching by a wire, the matchings corresponding to a particular such entry thus contribute to a total of . But by definition of as the inner product of with , is precisely .

The rest of the argument basically follows by linearity. Split into groups of elements each, where in each group all elements assign the same particles to the output nodes of . For any one group, say that the assignment to these output nodes is . For any , let , and define to be with the arity-1 generators corresponding to replaced once more by . We can now split the matchings of this new matchgrid up based on which particles outputs, concluding that .

Now we can keep playing this game; define to be for any , with the arity-1 generators corresponding to replaced once more by , so that . Continuing thus, we conclude that is precisely the Holant. But because we’ve gotten rid of all the omittable nodes we initially introduced.

**4. **

Here’s our first example of a holographic algorithm. As a bit of background, recall the remark from the last post that while counting perfect matchings of planar graphs is easy, counting arbitrary matchings of planar graphs is #P-complete. In fact, a theorem by Vadhan says that counting matchings even in bipartite graphs of degree at most six is just as hard!

Let’s consider a slightly less ambitious problem. Say I wanted to compute the number of matchings in any bipartite graph where the left vertices have degree 2 and the right vertices have degree . Then I claim there exists a holographic algorithm solving this problem .

In fact, there exists a holographic algorithm for the so-called problem which subsumes the above as a special case: given a planar weighted bipartite graph whose left vertices have degree 2, compute the sum of the “masses” of all matchings. Here, the “mass” of a matching is the product of with . (For the previous problem, weight all the edges by 1 and note that ).

Given such a bipartite graph , we want to replace each left vertex by a generator of arity 2 which sends out at most 1 particle, i.e. ; this translates to the local constraint that a matching will contain at most one edge incident to any given left vertex. We want to replace each right vertex by a recognizer of arity which 1) outputs 0 when sent more than 2 particles, 2) outputs when sent exactly 1 particle along a wire of weight , 3) outputs when sent no particles. 1) captures the local constraint that a matching will contain at most one edge incident to any given right vertex. 2) captures the constraint of the first multiplicand defining “mass.” 3) captures the constraint of the second multiplicand.

By construction, the Holant will equal the sum of the masses of all matchings: subsets of which aren’t matchings contribute nothing, while those that are matchings contribute precisely their mass to the Holant.

It remains to check that we can construct matchgates with the abovementioned signatures under some basis. For the generators, consider the single edge with output nodes both vertices and weight -1. Its standard signature is ( of the empty graph we will take to be 1), so we want 2-vectors and satisfying relevant polynomial constraints. I won’t bore you any further by writing these down, but suffice it to say that the basis with and works.

For the recognizers, consider the star graph on vertices and edge weights . Its standard signature is 0 at all entries corresponding to removing at most input nodes, and at the entry corresponding to removal of all but the input node corresponding to , and at the entry corresponding to removal of all nodes. The signature with respect to and is the one we want.

By Valiant’s Holant theorem and the FKT algorithm, we conclude that there exists a polynomial time algorithm for .

The key takeaway of this example is that the primary challenge of understanding the power of holographic algorithms is **determining which collections of combinatorial constraints encoded by signatures are simultaneously realizable under a choice of bases for each wire**. As the above discussion about polynomial systems suggests, the question really boils down to determining whether the corresponding subvarieties of have nonempty intersection. More on this in a future post.

**5. A Tensor Formulation **

We now give an alternative formulation of the theory of matchgrids in terms of tensors, due to Cai/Choudhary. The motivation is that the statement of the Holant theorem and the fact that it holds trivially for the standard basis seems to suggest that if we stare closely at the definitions, there should probably be a nice, coordinate-free proof. Indeed, if we think about , , and the Holant in the right way, we can get a very quick proof of the Holant theorem.

The exposition will follow Cai/Choudhary’s 2005 paper “Valiant’s Holant Theorem and Matchgate Tensors.”

As above, let denote a basis of two column 2-vectors for and , and the standard basis, and regard the standard signature of a matchgate with outputs and inputs as a matrix so that is a column (resp. row) vector if is a generator (resp. recognizer). Also let be the transformation matrix defined by . We’ll assume is invertible for now.

If is a generator with outputs, then is the column vector defined by the equation . Writing as , we conclude that . On the other hand, if is a recognizer with inputs, then the original definition of tells us it’s merely .

In what sense are these two constructions dual to each other? Let denote the basis dual to such that ; define analogously. Whereas is the vector of coefficients by which can be expressed in the basis , we can show that is the vector of coefficients by which can be expressed in the basis . Then implies that as desired.

Now consider a matchgrid with generators , recognizers , and wires . If and is its dual space, one observes that the Holant is merely the application of dual vector to . Alternatively, we could write the Holant as

The proof of the Holant Theorem is now quite straightforward: by our alternative definitions of and , we can write the two arguments of the inner product defining as and , from which we conclude that in any basis is equal to . Recall that this equals for combinatorial reasons, by our earlier discussion.
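A numerical sanity check of this argument (everything below, from the choice of T to the random signatures, is my own arbitrary setup, not notation from the paper): transforming a generator signature by the inverse of T applied tensor-power-wise and a recognizer signature by T tensor-power-wise leaves their contraction unchanged.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
n = 3                                    # arity of a single generator/recognizer pair
T = np.array([[1.0, 2.0], [3.0, 5.0]])   # an arbitrary invertible change of basis
Tn = reduce(np.kron, [T] * n)            # T tensored with itself n times (2^n x 2^n)

G_std = rng.standard_normal(2 ** n)      # standard signature of a generator (column vector)
R_std = rng.standard_normal(2 ** n)      # standard signature of a recognizer (row vector)

G_b = np.linalg.solve(Tn, G_std)         # generator signature in the new basis
R_b = R_std @ Tn                         # recognizer signature in the new basis

# The contraction (the Holant contribution) is basis-independent:
print(np.isclose(R_std @ G_std, R_b @ G_b))  # True
```

The cancellation of Tn against its inverse in the middle of the contraction is exactly the one-line proof above.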

For the case where is not invertible, we just need that the span of contains for each generator , because then we can still define to be any one of the vectors for which . The Holant theorem follows in essentially the same way.

If denotes the tensor product of copies of and copies of , where , what the above discussion tells us is that should be thought of as living inside , should be thought of as living inside , and **the Holant is merely the contraction of with **. Specifically,

where and denote copies of the standard basis and dual basis indexed by the input/output nodes of the matchgate in question. Valiant’s Holant theorem then immediately follows merely for the abovementioned combinatorial reasons.

**6. Two Unrelated Remarks **

The first is that this is my first post using Prof. Luca Trevisan’s nifty LaTeX2WP script for converting TeX into something suitable for pasting into WordPress’s editor; highly recommended for anyone fed up with typing “” and hand-numbering section and theorem numbers in the tiny WP editor window!

The second is that my statement about Gaussian elimination as it relates to FKT from the previous post was incorrect. The polynomial-time algorithm we should be using is Berkowitz’ algorithm, because for numerical reasons we don’t want to use division.

I’ve recently been reading about Valiant’s framework of **holographic algorithms**, and I thought it might be a good way to revive this blog by posting a bit about what I’ve learned. Let’s begin with some motivation with respect to the famous P vs. NP problem: it is a widely held conjecture that the class of languages decidable in deterministic polynomial time can be separated from that of languages decidable in nondeterministic polynomial time. One of the prevailing reasons is simply the intuition that the algorithmic methods available to us for solving problems deterministically in polynomial time don’t seem sufficient for solving certain hard problems. In some sense, Valiant’s holographic approach serves as a potential warning against this kind of intuition: by giving rise to exotic polynomial-time algorithms for problems which previously might have been deemed intractable, it suggests that perhaps the reason we haven’t been able to come up with efficient solutions to NP-complete problems is that our understanding of the power of polynomial-time computation is currently limited. Roughly speaking, these algorithms make use of so-called **holographic reductions** which, unlike classical reductions in complexity theory that map one problem instance to another, map the sum of solution fragments of multiple instances of a problem to the sum of solution fragments of multiple instances of another problem in such a way that while the problem instances do not correspond one-to-one, the sums agree. The utility of holographic algorithms mainly comes from reducing to counting problems which are tractable because their solutions involve making “cancellations” (think Strassen’s algorithm or Gaussian elimination) that arise for linear algebraic reasons and in some cases can yield exponential speedups in computation.
In this way, we can essentially import the power of cancellations to situations even where these linear algebraic reasons are no longer obvious and get a host of efficient solutions for seemingly difficult problems.

This blog series will build up to an exotic algorithm for the following bizarre-sounding problem: determine the number modulo 7 of satisfying assignments in any planar, read-twice, monotone 3-CNF, a problem Valiant refers to as “**#7Pl-Rtw-Mon-3CNF**,” and will seek to explain, using the machinery subsequently developed by Cai, where the number 7 comes from. Before we get to the core of the subject in subsequent posts, however, I’d like to spend one brief post discussing the famous algorithm due to Fisher, Kasteleyn, and Temperley for counting perfect matchings in planar graphs which takes advantage of the abovementioned exponential cancellations. The algorithms we will discuss in later posts will hinge on this algorithm.

**Perfect Matchings**

Recall that a perfect matching of a graph is a set of disjoint edges in the union of whose endpoints is all of . The physical motivations for studying perfect matchings lie in the realm of chemistry and statistical mechanics: how many ways can you position a layer of diatomic molecules adsorbed on a surface, a collection of water molecules bonded in the form of ice, or a mixture of dimers covering a lattice? It turns out that the number of perfect matchings of the relevant graphs roughly corresponds to the energy of such systems. So exactly how hard is it to compute this quantity?

For general graphs , finding a single perfect matching is easy. **Edmonds’ blossom algorithm** is a polynomial-time procedure for doing so; the basic idea is to start with an empty matching and iteratively build up the matching by finding “augmenting paths,” paths which end in uncovered vertices and which consist of edges alternating between lying inside and outside of the current matching, and flipping membership inside our matching of all edges in the augmenting path. It turns out that a matching is maximum if and only if no augmenting path exists, and furthermore these paths can be *found efficiently* by looking for certain kinds of odd cycles called “blossoms” which, once contracted to a point, preserve the number of augmenting paths left in .
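In the bipartite case there are no odd cycles, so no blossoms arise and the augmenting-path idea alone already yields a maximum matching. The sketch below is a standard textbook formulation (sometimes called Kuhn’s algorithm), not Edmonds’ full procedure, and the example graph is my own; it illustrates the augment-until-stuck loop described above.

```python
def max_bipartite_matching(left, adj):
    """Repeatedly search for an augmenting path from each uncovered left
    vertex and flip the edges along it (Kuhn's algorithm)."""
    match = {}                       # right vertex -> left vertex currently matched to it

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its partner can be rematched elsewhere:
            # an augmenting path ends here, so flip it.
            if v not in match or try_augment(match[v], seen):
                match[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in left)

# A 6-cycle viewed as a bipartite graph: left {0,1,2}, right {3,4,5}.
adj = {0: [3, 5], 1: [3, 4], 2: [4, 5]}
print(max_bipartite_matching([0, 1, 2], adj))  # 3, i.e. a perfect matching
```

Each call to `try_augment` either fails (the matching is already maximum on the explored part) or increases the matching size by exactly one, which is the invariant driving Edmonds’ algorithm as well.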

The problem of finding perfect matchings is, however, the canonical (and indeed the first known) example of a problem in P whose corresponding counting problem, quite shockingly, is #P-complete. In fact, by the #P-completeness of the permanent, this is the case even for counting perfect matchings in bipartite graphs. To get a grasp for the hardness of #P-complete problems, recall Toda’s theorem: give a deterministic Turing machine running in polynomial time a #P oracle, and with just a single query to that oracle it can solve any problem in the polynomial hierarchy.

The miracle that holographic algorithms exploit is that while the problem of counting perfect matchings in general graphs is intractable, if we restrict our attention to the case of planar graphs, all of a sudden the problem admits a *polynomial-time solution*. This is the famous FKT algorithm discovered independently by Kasteleyn and by Fisher and Temperley for the case of lattice graphs and then generalized by Kasteleyn to the case of planar graphs. We will spend the first part of this blog post describing the algorithm.

**Remark 1**: **Kuratowski’s Theorem** (the prototypical example of a forbidden graph characterization) says that a finite graph is planar so long as it contains no subdivision of the complete graph on five vertices and no subdivision of the complete bipartite graph with three vertices on each side. Vijay Vazirani has shown that we can in fact drop the former condition and obtain a generalization of the FKT algorithm to all graphs avoiding a subdivision of this complete bipartite graph.

**Remark 2**: Counting the number of *all* matchings is unfortunately #P-complete even for planar graphs.

**The Pfaffian**

We will think of a perfect matching first as a partition of an even number of elements into pairs. The first observation is that the number of perfect matchings of a graph is a polynomial in the entries of its {0,1}-valued adjacency matrix A: summing the products ∏_{{i,j} ∈ M} A_{ij} over all partitions M of the vertex set into pairs yields exactly the number of perfect matchings.

Now let’s slightly tweak this polynomial. We can also think of a perfect matching as a permutation: the matching given by the partition {{i_1, j_1}, …, {i_n, j_n}} (with i_k &lt; j_k) corresponds to the permutation sending 1, 2, …, 2n to i_1, j_1, …, i_n, j_n. So consider the following polynomial, the **Pfaffian** of A, obtained by weighting each matching’s product of entries A_{i_k j_k} by the sign of the corresponding permutation.

Now the conundrum is that while these two polynomials are the same up to the signs of their coefficients, the former is ridiculously hard to compute whereas the latter, I claim, can be determined in polynomial time. Indeed, the famous result of Muir tells us that in fact Pf(A)² = det(A), and we know how to compute the determinant in polynomial time (Gaussian elimination).
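To see this identity concretely, here is a small sketch that computes the Pfaffian by brute-force expansion along the first row and checks that its square equals the determinant; both routines are naive exponential-time implementations for illustration only, not the polynomial-time Gaussian elimination mentioned above:

```python
import random

def pfaffian(A):
    """Pfaffian of a skew-symmetric matrix via expansion along the first row."""
    n = len(A)
    if n == 0:
        return 1
    total = 0
    for j in range(1, n):
        rest = [k for k in range(n) if k not in (0, j)]
        minor = [[A[r][c] for c in rest] for r in rest]
        total += (-1) ** (j + 1) * A[0][j] * pfaffian(minor)
    return total

def det(A):
    """Determinant via cofactor expansion (exponential; fine for tiny matrices)."""
    n = len(A)
    if n == 0:
        return 1
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(n))

# Random skew-symmetric 6x6 integer matrix: Pf(A)^2 == det(A).
n = 6
A = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        A[i][j] = random.randint(-3, 3)
        A[j][i] = -A[i][j]
assert pfaffian(A) ** 2 == det(A)
```

Note that with all entries above the diagonal set to 1, the unsigned sum over pairings counts the perfect matchings of the complete graph, while the signed Pfaffian sum collapses to 1; that collapse is exactly the sign cancellation discussed above.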

In other words, the Pfaffian is easier to compute because all those different signs in its coefficients yield exponential cancellations that facilitate computation. These are the sorts of “accidental” cancellations that holographic algorithms are after.

**FKT’s Algorithm**

Now the way to count matchings more efficiently becomes clear: modify the signs of the entries of the adjacency matrix by orienting the edges of the graph (an edge oriented from i to j contributes +1 to entry (i, j) and −1 to entry (j, i)) and hope that we get an orientation, a so-called **Pfaffian orientation**, such that in the Pfaffian of our new skew-symmetric adjacency matrix the signs of the matchings are all the same. The point of FKT is that all planar graphs have such a Pfaffian orientation.

**Theorem 1 (Kasteleyn):** An orientation in which each internal face has an odd number of edges oriented clockwise is a Pfaffian orientation.

Assuming this is true, it is straightforward to check that such an orientation can be found in polynomial time: basically just start with an arbitrary orientation of sufficiently many edges and fill in the remaining orientations in the right order. The algorithm is as follows:

- Compute a spanning tree of the graph and orient its edges arbitrarily.
- Construct the tree whose vertices correspond to the faces of the graph, connecting vertices whose corresponding faces share an edge not in the spanning tree.
- Starting at the leaves of this tree, orient the missing edges in each face appropriately.

It remains to justify the theorem. First, some terminology: for a matching M, denote the neighbor of a vertex v inside M by M(v), and define the merging of two perfect matchings M and M′ to be the cycle cover defined as follows: v is contained in the cycle through the vertices v, M(v), M′(M(v)), M(M′(M(v))), and so on. It is clear that each such cycle is necessarily even (otherwise some vertex would be adjacent to two vertices in the same matching). Next, we say an even cycle is **clockwise odd** if an odd number of its edges are clockwise, and we say a cycle is **removable** if the subgraph induced by removing the cycle from the graph still admits a perfect matching. It’s clear that for any two perfect matchings M and M′, the cycles in their merging are removable.

Theorem 1 follows from the following two lemmas:

**Lemma 1:** If every cycle in the merging of two perfect matchings M and M′ is clockwise odd, then the signs of M and M′ are the same.

**Lemma 2:** If every face has an odd number of clockwise edges, then every removable cycle is clockwise odd.

**Proof of lemma 1:** Consider any cycle in the cycle cover arising from a merging. Because the cycle has an odd number of clockwise edges, after swapping an even number of edges’ orientations we get a cycle with a single clockwise edge and all other edges counterclockwise. Such swaps preserve whether the matchings’ signs differ, and in this normalized situation one can check directly that the cycle’s contribution to the relative sign of the two matchings is positive, so the signs of the two matchings agree.

**Proof of lemma 2:** Let C be a removable cycle. Let f, v, and e respectively denote the number of faces, vertices, and edges *inside* C. Because the planar graph has Euler characteristic 2, and the total number of faces of the subgraph consisting of C and its interior is f + 1 (including the single face outside C), accounting for the vertices and edges on C itself gives v − e + f = 1. Let c denote the number of clockwise edges on C, and let c_i denote the number of clockwise edges on face i inside C. We want to show c is odd.

Because each face has an odd number of edges oriented clockwise, the sum of the c_i and f have the same parity. Furthermore, the sum of the c_i equals c + e (each edge inside C is clockwise for exactly one of the two faces sharing it), so c + e and f have the same parity, meaning c and v have opposite parities by Euler’s formula. But v is even because once you remove C, the subgraph inside should still have a perfect matching, so c is odd and we’re done.

In the next post, we will put this algorithm to work by introducing the tools central to holographic reductions, namely the theory of matchgrids and Valiant’s Holant Theorem.


**Preliminaries and AGHP’s Construction**

Characters of a finite abelian group are homomorphisms from the group to the nonzero complex numbers. It is easy to see that the only characters of a cyclic group of order n are given by g^k ↦ e^{2πijk/n} for j = 0, …, n − 1, where g is a generator of the group. We have the following useful theorem due to Weil:

**Theorem 0**: (Weil) For q an odd prime power, f a polynomial with m distinct zeroes in the algebraic closure of F_q, and χ a multiplicative character whose order does not divide the greatest common multiplicity of f‘s roots, |Σ_{x ∈ F_q} χ(f(x))| ≤ (m − 1)√q.

For AGHP’s construction, we restrict our attention to the only nontrivial character of F_p^× taking values only in {±1}, namely the quadratic character. Concretely, we have the following definition:

**Definition 1**: The **quadratic character** χ: F_p → {−1, 0, 1} sends a nonzero element to 1 if it is a quadratic residue and −1 otherwise. Assign χ(0) to be 0.

AGHP’s construction is remarkably simple: define the bitstring-valued map r: F_p → {0,1}^n by r(x)_i = 0 if χ(x + i) = 1, and otherwise let r(x)_i = 1. The motivation for this is that (−1)^{r(x)_i} = χ(x + i) (for x + i ≠ 0).
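A sketch of this construction, evaluating the quadratic character by Euler’s criterion; the exact indexing convention in `aghp_string` is my reading of the construction, not necessarily AGHP’s:

```python
def quad_char(a, p):
    """Quadratic character of F_p via Euler's criterion: a^((p-1)/2) mod p, mapped to {-1,0,1}."""
    if a % p == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def aghp_string(x, n, p):
    """Bitstring r(x) with r(x)_i = 0 iff chi(x + i) = 1, so (-1)^{r(x)_i} = chi(x + i)."""
    return [0 if quad_char(x + i, p) == 1 else 1 for i in range(n)]

# The sample space: one n-bit string per point of F_p.
sample_space = [aghp_string(x, 5, 11) for x in range(11)]
```

The character is completely multiplicative on nonzero elements, which is what lets the bias computation below reduce to a character sum over a product polynomial.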

**Theorem 1**: The uniform distribution over {r(x) : x ∈ F_p} is ((n − 1)/√p + n/p)-biased.

**Proof**: The bias with respect to a nonzero test vector α is by definition |E_x[(−1)^{⟨α, r(x)⟩}]|. For the x for which there is no i such that α_i = 1 and x + i = 0, (−1)^{⟨α, r(x)⟩} = χ(∏_{i: α_i = 1}(x + i)). For the x for which there is such an i, this expression takes on a value of zero; we’ll just bound the corresponding summands by 1, and because there are at most n such x, we get the desired bound on the bias once we apply Weil’s theorem to the polynomial ∏_{i: α_i = 1}(x + i), whose roots are distinct.

**AMN’s Generalization**

We first define AMN’s generalization of bias to :

**Definition 2**: A probability distribution is an **-biased space** if 1) is uniformly distributed over for all , and 2) for all test vectors and corresponding , .

A bit of computation tells us that the Fourier coefficient of an -biased distribution with respect to is bounded above by . Another useful fact is that the -distance between any distribution on and the uniform distribution is at most .

We now present AMN’s generalization of AGHP’s characters construction for .

Pick and a character of such that the order of is . Fix elements . Define a map as follows: for , take to be the -residue of the exponent such that if , and to the -residue of the exponent such that .

The motivation is that for a primitive th root of unity (taking the place of in the original construction), and for , with the discrete log of with respect to generator , we have that .

**Theorem 2**: is -biased.

We make use of a corollary of Weil’s theorem:

**Proposition**: If denotes the number of solutions to , then . This follows from a bit of computation following the observation that for all .

We now proceed with the proof that the image of is a small-bias space:

**Proof**: Denote the random variables given by the coordinates of the image of by . It is clear that each is equidistributed over and thus is equidistributed over because . As in the proof of our first theorem, consider the cases of for all and for some separately. The latter case incurs an extra in bias by our reasoning in the AGHP proof. In the former case, . If , then the greatest common multiplicity of the polynomial is relatively prime to , and we can apply the corollary to Weil’s theorem to get that if there are values of such that , , and we get the desired bound on bias.

In the case that , we know from the above argument that if , has the desired level of bias, and a bit of modular arithmetic gives the same bound for .

In our next post, we will examine the problem of fooling not just linear tests, but low-degree polynomial tests.

In this post we examine four constructions, including an extremely simple construction over Z_q by Alon/Mansour (2002) which generalizes one due to Alon/Goldreich/Hastad/Peralta (1992), another construction based on linear feedback shift registers due to AGHP, and Naor/Naor’s original construction (1990) based on k-wise independence (or alternatively Justesen codes) and random walks on expander graphs.

**First definitions and Alon/Mansour’s -biased set**

We first formalize the notion of bias over Z_q and present Alon/Mansour’s construction before restricting our attention to the case of q = 2 for our remaining three constructions.

**Definition 1**: A probability distribution D over Z_q^n is an **ε-biased space** if |E_{x∼D}[ω^{⟨α, x⟩}]| ≤ ε for all nonzero “test vectors” α ∈ Z_q^n. Here, ω is the qth root of unity e^{2πi/q}.

In particular, an equivalent condition in the case of q = 2 is that for any nonzero α, |P_{x∼D}[⟨α, x⟩ = 0] − P_{x∼D}[⟨α, x⟩ = 1]| ≤ ε. Any x for which ⟨α, x⟩ = 1 we will call a **distinguisher** for α.

**Definition 2**: A set is an **-biased set** if the uniform distribution over is an -biased space.

Alon/Mansour give an (n − 1)/q-biased set of size q².

For x, y ∈ Z_q (q prime), let v_{x,y} denote the tuple (y, xy, x²y, …, x^{n−1}y), and let S be the collection of all such v_{x,y} for x, y ∈ Z_q.

**Theorem 1**: S is an (n − 1)/q-biased set.

**Proof**: From our definitions we know the bias of S with respect to a test vector α is |q^{−2} Σ_{x,y} ω^{⟨α, v_{x,y}⟩}|, where ⟨α, v_{x,y}⟩ = y·p_α(x) for the polynomial p_α(x) = Σ_i α_i x^i. We can group the summands by x, and the key observation is that massive cancellation happens: the x with p_α(x) ≠ 0 contribute nothing to the bias. Indeed, for a fixed x with nonzero value c = p_α(x), for each a ∈ Z_q there is exactly one y such that cy = a, so Σ_y ω^{cy} = Σ_a ω^a = 0. For the remaining x, the nonzero polynomial p_α has at most n − 1 zeroes, and each such x contributes Σ_y ω^0 = q. So our sum has absolute value at most (n − 1)q, and the bias is at most (n − 1)/q, concluding the proof.
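As a sanity check, here is a brute-force computation of the maximum bias of a set of this shape for small parameters; the tuple (y, xy, x²y, …, x^{n−1}y) is my reading of the construction, and the asserted bound (n − 1)/q matches the zero-counting argument in the proof:

```python
import cmath
from itertools import product

def max_bias(q, n):
    """Max over nonzero test vectors alpha in Z_q^n of |E_v[omega^<alpha, v>]|."""
    omega = cmath.exp(2j * cmath.pi / q)
    # The set: tuples (y, x*y, x^2*y, ..., x^(n-1)*y) over all x, y in Z_q.
    vectors = [[(pow(x, i, q) * y) % q for i in range(n)]
               for x in range(q) for y in range(q)]
    worst = 0.0
    for alpha in product(range(q), repeat=n):
        if all(a == 0 for a in alpha):
            continue
        s = sum(omega ** (sum(a * vi for a, vi in zip(alpha, v)) % q) for v in vectors)
        worst = max(worst, abs(s) / len(vectors))
    return worst

print(max_bias(5, 3))  # should be at most (n - 1)/q = 2/5
```

The bias with respect to a test vector here is exactly (number of roots of the associated polynomial)/q, so the bound is tight whenever some test polynomial has n − 1 distinct roots in Z_q.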

In a later blog post, we will show how this simple construction can be used to derandomize Mansour’s algorithm for interpolating sparse polynomials (1992).

Unfortunately, this construction is useless unless q is much larger than n, and in particular it’s useless when q = 2, the case that the initial study of bias focused on. That said, we can slightly tweak the argument by replacing Z_q with the field F_{2^k} and the entries x^i·y with the bits ⟨x^i, y⟩, where we’ve naturally identified F_{2^k} with the vector space F_2^k to obtain an inner product, getting one of AGHP’s original constructions:

**Theorem 2**: There is an (n − 1)/2^k-biased subset of {0,1}^n of size 2^{2k}.

The next construction by AGHP is motivated by linear feedback shift register sequences (LFSRs) and likewise gets ε-bias out of a set of size roughly (n/ε)²:

**LFSRs and another construction by AGHP**

Our analysis of the bias in the following construction follows from two facts about polynomials over F_2: 1) the number of irreducible monic polynomials of degree k is roughly 2^k/k, and 2) (more obviously) there are at most n/k irreducible monic polynomials of degree k which divide a given polynomial of degree n.
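Fact 1 is the classical Möbius-inversion count of monic irreducibles; a quick sketch to check it for small degrees (function names mine):

```python
def num_irreducibles(q, n):
    """Number of monic irreducible polynomials of degree n over F_q:
    (1/n) * sum over divisors d of n of mu(d) * q^(n/d)."""
    def mobius(k):
        result, d = 1, 2
        while d * d <= k:
            if k % d == 0:
                k //= d
                if k % d == 0:
                    return 0  # k had a squared prime factor
                result = -result
            d += 1
        return -result if k > 1 else result

    return sum(mobius(d) * q ** (n // d) for d in range(1, n + 1) if n % d == 0) // n

# Over F_2, degrees 1..4 have 2, 1, 2, 3 monic irreducibles (x, x+1; x^2+x+1; ...).
print([num_irreducibles(2, n) for n in range(1, 5)])  # [2, 1, 2, 3]
```

Dividing by 2^k shows the count is within a factor of two of 2^k/k, which is all fact 1 needs.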

Now consider the following construction inspired by LFSRs and more fundamentally by polynomial long division:

**Definition 3**: For start sequence s ∈ {0,1}^k and feedback rule f ∈ {0,1}^k, the *shift register sequence generated by s and f* is the bitstring r ∈ {0,1}^n where r_i = s_i for i &lt; k and r_i = Σ_{j&lt;k} f_j·r_{i−k+j} mod 2 for i ≥ k.
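In code, Definition 3 reads as follows; I take the recurrence over F_2 with the convention r_i = f_0·r_{i−k} + … + f_{k−1}·r_{i−1} mod 2, which may differ from the original’s exact indexing:

```python
def shift_register_sequence(start, feedback, n):
    """Shift register sequence: r_i = start_i for i < k, and
    r_i = sum_j feedback[j] * r[i-k+j] mod 2 afterwards."""
    k = len(start)
    r = list(start)
    for i in range(k, n):
        r.append(sum(f * b for f, b in zip(feedback, r[i - k:i])) % 2)
    return r

# Feedback rule (1, 1) corresponds to x^2 + x + 1, which is irreducible over F_2,
# so it is non-degenerate in the sense defined below; the sequence cycles with period 3.
print(shift_register_sequence([0, 1], [1, 1], 8))
```

Each output bit is a fixed F_2-linear function of the start bits, which is the fact the bias analysis below exploits.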

The point is that if we think of the s_j’s as monomials x^j in some indeterminate x, so that the r_i’s are polynomials in x, then r_i as a polynomial in x is precisely the remainder of x^i when divided by the feedback polynomial x^k + Σ_{j&lt;k} f_j x^j.

Given this view, we will only concern ourselves with feedback rules whose feedback polynomial is irreducible. These are called *non-degenerate feedback rules*.

**Theorem 3**: The set (of size roughly 2^k·2^k/k by fact 1 above) of all shift register sequences generated by any starting sequence and non-degenerate feedback rule is an ε-biased set for ε on the order of n/2^k.

**Proof**: By our discussion above, ⟨α, r⟩ is a linear combination of the remainders of the x^i modulo the feedback polynomial, with coefficients all zero iff the feedback polynomial divides Σ_i α_i x^i. If the coefficients are not all zero, ⟨α, r⟩ is some nontrivial linear combination of the bits of the uniformly random start sequence and thus has zero bias. So the bias is precisely the probability that a randomly chosen irreducible monic polynomial of degree k divides Σ_i α_i x^i. By facts 1) and 2) above, we’re done.

There is one more ε-biased set of comparable size due to AGHP, obtained using the “randomness” of quadratic characters via Weil’s character sum estimate, that we will present in our next blog post. For now, we see how to achieve linear dependence on n at the cost of worse (but still polynomial) dependence on 1/ε.

**Naor/Naor’s Construction**

The argument here is much more involved and takes place in two steps: 1) construct a polynomial-sized family of bitvectors such that for any test vector , the probability that is a distinguisher is at least some constant , 2) using a walk on an expander, sample enough vectors from such that for any , with probability at most all of the are 0.

**Proposition 1**: The vectors generated by our expander walk span an -biased set.

**Proof**: It suffices to show that when some vector in our collection is a distinguisher of the test vector, then for a random subset S of the collection, the sum of the vectors in S is a distinguisher with probability 1/2. Fix a distinguisher v in the collection and condition on which of the remaining vectors lie in S. If their sum is a distinguisher, then with probability 1/2, v is not in S and the total sum remains a distinguisher. If their sum is not a distinguisher, then with probability 1/2, v is in S, and it is easy to check that the sum of a distinguisher and a non-distinguisher is a distinguisher, so we’re done.

We next show that we can generate such a collection of vectors using a random walk on an expander. From step 1), we have a constant probability of picking a distinguisher from the family, and we’d like to reduce the probability of failing to draw a distinguisher to ε over many steps. See https://formalreasons.wordpress.com/2013/12/25/random-efficient-error-reduction-part-2-expander-graphs/ for our discussion from a while back about how to do this with an expander walk. The only subtlety in applying this method is that the family is a bit too small, so we should let each vertex of our expander correspond to a sufficiently large set of assignments. By our discussion about error reduction with expanders, the number of random bits we need is logarithmic in the size of the vertex set plus a term proportional to the number of walk steps times the log of the expander’s degree.

Finally, we need to show that we can construct the family itself. There is a very nice way to do this using Justesen codes, but here we present an argument that does not depend on the fact that we’re working over F_2.

At a high level, we will split the possible test vectors into buckets based on the number of nonzero coordinates, sample with high probability a distinguisher for each scenario, and add a random subcollection of these distinguishers together so that with high probability what we end up with distinguishes the actual test vector . Throughout, we’d like to use -wise rather than complete independence to save on randomness.

Our buckets will be the sets of test vectors for which the number of nonzero coordinates lies in a given dyadic range. Fix such a bucket and say that we know our test vector comes from that bucket. We would like to somehow remove all but a constant number of nonzero coordinates from the test vector and then generate our distinguisher k-wise independently to obtain a distinguisher with constant probability. Because we don’t actually know our adversary’s test vector, however, we achieve the same effect by approximating the removal from a k-wise independently generated vector.

Specifically, pick a random bitstring whose entries are individually uniformly random but jointly k-wise independent, and let the “mask” that approximates the removal procedure be a bitstring whose entries are jointly pairwise independent and each equal to 1 with probability roughly the reciprocal of the bucket’s scale.

**Proposition 2**: Define the bitstring by . is a distinguisher of with probability 1/4.

**Proof**: This is equivalent to saying that the masked test vector is distinguished by the k-wise independent string with probability 1/4. If we let the random variable be the number of nonzero coordinates of the test vector surviving the mask, it suffices to show that this count lands in a fixed constant range, chosen at the beginning, with high probability. By pairwise independence of the mask bits we can compute the count’s mean and variance, and a bit of calculation using Chebyshev’s inequality gives that the count falls in the desired range with probability at least 1/2, and we’re done.

We generate one such for each of the buckets, using the same random bits to generate the ‘s for each bucket. We finish the proof by noting that by the same argument of Proposition 1, a sum of a random subcollection of these ‘s is a distinguisher of with probability at least .

In total, we thus require random bits, giving an -biased set of size .

**Conclusion**

So far, we’ve seen that we can achieve small bias using polynomials, -wise independence, linear codes, and expander graphs. In the next section, we’ll look at exploiting the randomness of quadratic characters and examine a third construction by AGHP using this, as well as an extension, due to Azar/Motwani/Naor, of this character approach to .

The goal now is to find such an encoding/decoding. We reduce this problem further to the problem of finding an algorithm that adjusts a word in to a codeword (rather than a message) with nontrivial accuracy, and the reduction will be valid as long as we can efficiently translate between the messages and codewords. Formally,

**Definition 7**: A **local -correcting algorithm** for an encoding is a probabilistic algorithm that, given oracle access to a received word and an input position , computes with probability 2/3 the value for any codeword -close to . For coin tosses , the output of the algorithm will be denoted by , and probability is taken over all .

**Definition 8**: An encoding is **systematic** if there exists a “translation” map running in time polynomial in the length of the domain and a map such that for all messages and corresponding codewords , .

We can get a decoder by sending input position through such a translation map and then through a local corrector:

**Result 4**: If is a systematic encoding and the code has a local -correcting algorithm in time , then has a local -decoding algorithm running in time .

We will design a local corrector and systematic encoding for the Reed-Muller code. We then concatenate with the Hadamard code to obtain a binary alphabet, and this will turn out to give the desired hardness amplification.

**Reed-Muller Corrector**

**Result 5**: For and , the -ary, degree-, -dimensional Reed-Muller code has a local -correcting algorithm running in time polynomial in and .

**Proof**: The basic picture for our corrector is that we reduce the dimension of the input space to 1 by evaluating the oracle close to our target codeword only on a **line** (a map from F_q to F_q^m given by t ↦ z + ty for z, y ∈ F_q^m). The resulting univariate polynomial is a Reed-Solomon codeword that we then globally decode at a distance close enough that the result is likely the codeword itself restricted to the line in question. Evaluating at zero then gives the codeword’s value at z, as desired.

Specifically, if we receive a Reed-Muller codeword , to compute it with nontrivial accuracy at an input index using our oracle where , randomly pick some and define by . This is a -ary, -dimensional Reed-Solomon codeword, so run the global decoder given by Result 6 of our post on list decoding, for distance (this is where we use the fact that ). Evaluate the resulting message at zero.

We verify that this procedure outputs with probability at least 2/3. The expected distance over all lines through is less than rather than because the input to and is no longer uniformly distributed over all of . Markov’s inequality then tells us that the probability that this distance is at least is at most , and we’re done by uniqueness of the polynomial less than away from Reed-Solomon codeword .
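The skeleton of this corrector, with the Reed-Solomon decoding step replaced by plain Lagrange interpolation, i.e. assuming a noiseless oracle, just to show the line-restriction trick (helper names are mine):

```python
import random

def correct_via_line(oracle, z, d, p):
    """Recover oracle(z) by restricting to a random line l(t) = z + t*y,
    interpolating the degree-<=d univariate restriction from d+1 points,
    and evaluating it back at t = 0."""
    y = [random.randrange(p) for _ in z]
    points = []
    for t in range(1, d + 2):
        pt = tuple((zi + t * yi) % p for zi, yi in zip(z, y))
        points.append((t, oracle(pt)))
    # Lagrange interpolation of the restriction, evaluated at t = 0.
    value = 0
    for i, (ti, vi) in enumerate(points):
        num = den = 1
        for j, (tj, _) in enumerate(points):
            if j != i:
                num = num * (0 - tj) % p
                den = den * (ti - tj) % p
        value = (value + vi * num * pow(den, -1, p)) % p  # modular inverse (Python 3.8+)
    return value

p = 101
f = lambda x: (3 * x[0] ** 2 + x[0] * x[1] + 5 * x[1] + 7) % p  # total degree 2
assert correct_via_line(f, (4, 9), 2, p) == f((4, 9))
```

With a noisy oracle, the interpolation step is where Reed-Solomon decoding takes over, since d + 1 raw evaluations may include corrupted values.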

To convert our corrector to a decoder, we need a systematic encoding.

**Systematic Encoding**

We begin by encoding messages f: {0,1}^m → F as extensions to the domain F^m of degree at most 1 in each variable. Indeed, for any such f, define the extension by

f̂(x_1, …, x_m) = Σ_{a ∈ {0,1}^m} f(a) · ∏_{i: a_i = 1} x_i · ∏_{i: a_i = 0} (1 − x_i).
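A sketch of this extension for a function on the Boolean cube, working over a prime field (names mine):

```python
from itertools import product

def multilinear_extension(f, m, p):
    """Return f_hat: F_p^m -> F_p, the unique extension of f: {0,1}^m -> F_p
    of degree at most 1 in each variable."""
    def f_hat(x):
        total = 0
        for a in product((0, 1), repeat=m):
            term = f(a)
            for xi, ai in zip(x, a):
                term = term * (xi if ai else 1 - xi) % p
            total = (total + term) % p
        return total
    return f_hat

p = 13
f = lambda a: a[0] ^ a[1]  # XOR on {0,1}^2
f_hat = multilinear_extension(f, 2, p)
assert all(f_hat(a) == f(a) for a in product((0, 1), repeat=2))  # agrees on the cube
# Degree at most 1 in each variable: fixing x_2, the map t -> f_hat((t, x_2)) is affine.
assert (f_hat((2, 5)) - f_hat((1, 5))) % p == (f_hat((1, 5)) - f_hat((0, 5))) % p
```

Each indicator product ∏(x_i a_i + (1 − x_i)(1 − a_i)) vanishes off its cube point, which is why the sum interpolates f.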

This construction generalizes from {0,1} to arbitrary subsets of the field, giving:

**Result 6**: For all m, finite fields F, subsets S ⊆ F, and maps f: S^m → F, there exists an extension f̂: F^m → F of degree at most |S| − 1 in each variable.

Our encoding will send any first to such an by sending each input in efficiently to some element of and then to the given by result 6. Let’s check that our criteria for encoding are met:

1) This encoding is certainly systematic.

2) To specify completely requires additions for each of the inputs. Taking to be of size and to be gives additions for each of the inputs, so encoding time is correct.

3) Codeword length is .

Because the total degree of is at most , we need so that .

**Converting to Boolean**

The last step will be to make the alphabet of the code binary. We have the following “concatenation” result:

**Lemma**: If has minimum distance and has minimum distance , then their concatenation given by has minimum distance .

**Proof**: Take any two distinct messages. Their outer encodings differ in at least the outer code’s minimum-distance fraction of all positions, and in each of these positions the corresponding inner codewords differ in at least the inner code’s minimum-distance fraction of all letters, so distance is multiplicative under concatenation.

To locally decode a concatenation of and , then, we simply apply the -decoding algorithm for the inner code to retrieve the letters and then run the local -decoding algorithm on the -codeword these letters form.

We’d like to concatenate with a code with a binary alphabet, and the most convenient one that has a local decoding is the Hadamard code:

**Lemma**: For all and , the Hadamard code of message length has a local -decoding algorithm running in time .

**Proof**: The identity map is a systematic encoding, so it suffices to find a correcting algorithm running in the desired time. We have some codeword c that we’d like to compute at any given input x with high probability, given oracle access to some word close to it. By linearity of Hadamard codewords, c(x) = c(x ⊕ r) ⊕ c(r), where ⊕ denotes exclusive-or. If we pick a random r, the probability that the received word differs from c at either x ⊕ r or r is at most twice the corruption rate, so if we run this sufficiently many times on different r and take a majority vote on the outputs, then we can compute c(x) with the desired probability.
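A toy implementation of this self-correction procedure, representing inputs as n-bit integers and taking inner products over F_2 (with an uncorrupted oracle here, so every vote is exact):

```python
import random

def hadamard_codeword(msg, n):
    """Oracle for the Hadamard encoding of msg: c(x) = <msg, x> mod 2."""
    return lambda x: bin(msg & x).count("1") % 2

def local_decode(oracle, x, n, trials=31):
    """Estimate c(x) as the majority over random r of oracle(x ^ r) ^ oracle(r)."""
    votes = sum(oracle(x ^ r) ^ oracle(r)
                for r in (random.randrange(2 ** n) for _ in range(trials)))
    return int(2 * votes > trials)

oracle = hadamard_codeword(0b101, 3)
assert all(local_decode(oracle, x, 3) == oracle(x) for x in range(8))
```

The crucial point is that each of the two queries x ⊕ r and r is individually uniform, so a union bound controls the chance of hitting a corrupted position.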

By concatenating our Reed-Muller code with a Hadamard code, we get a code with codeword length polynomial in the message length and for which there exists a local 1/48-decoder running in the stated time, giving us our first result in hardness amplification.

**Result 7**: If there exists an in that is worst-case hard in nonuniform time , there is an in that is -average-case-hard.

We now want to amplify worst-case hardness into arbitrarily large average-case hardness . To get input length linear in the original input length, we will use so-called local *list*-decoding. Morally, given oracle access to some , a list-decoding algorithm probabilistically outputs a bunch of guesses for codewords close to , one of which can be decoded to be equal to the target message close to with high probability. It will turn out to be helpful to define a list-variant of local correcting as well, and again, its definition will essentially be the same as that of decoding with all instances of the word “message” replaced by “codeword.” Formally:

**Definition 9**: A **local -list-decoding algorithm** for an encoding is a probabilistic algorithm that, given oracle access to a received word , and an input position , takes place in two steps using algorithms where probabilistically outputs advice strings such that for any message whose encoding is -close to , with probability at least 2/3 for one of the advice strings , where probability is taken over the distribution of collections of advice strings output by .

**Definition 10:** A **local -list-correcting algorithm** for an encoding is a probabilistic algorithm that, given oracle access to a received word , and an input position , takes place in two steps using algorithms where probabilistically outputs advice strings such that for any codeword -close to , with probability at least 2/3 for one of the advice strings , where probability is taken over the distribution of collections of advice strings output by .

Analogously to result 4 we get local list-decoders from local list-correctors given a systematic encoding, with the encoding adding an overhead polylogarithmic in the message length. Analogously to result 3 we get hardness amplification out of local list-decoding, with a time slowdown by a factor of the running time of the advice-generating stage. The proof of the former remains the same, whereas the proof of the latter differs slightly from its counterpart because it makes use of nonuniformity not in the form of fixed coin tosses but in the form of an advice string.

The good news is that our discussion on systematic encoding applies just as soundly in the case of list-decoding, and moreover we already know the Hadamard code can be list-decoded globally (see result 3 here). What remains is to construct a local list-corrector, and it turns out we can do this with the Reed-Muller code as well.

**Result 8**: There is some fixed such that for all , the -ary, -dimensional, degree- RM code has a local -list-corrector running in time polynomial in and .

**Proof**: Call our unknown codeword . The corrector is very similar to the one used in result 5: reduce the dimension of the code to 1 by restricting our oracle to a line through and use Reed-Solomon (global) list-decoding on this line. In particular, will pass to advice strings including a point on , and will use Reed-Solomon list decoding to find all univariate polynomials close to , and the intuition is that if only one of these polynomials contains the point we know to be on , we’ve probably properly corrected received word .

Specifically, the algorithm is as follows: in order to ensure passes in a point on , just pick some and, because it doesn’t cost much, pass all points of the form so that one of the advice strings will be the right one. then -list-decodes restricted to the line going through and to get polynomials -close to . If exactly one of these passes through , output its value at 0.

As long as we can compute something -close to , we can use our unique corrector of result 5 to compute the actual with probability 2/3.

We will show that for at least half of all , for more than of the lines passing through we have that is the unique univariate polynomial which is close to and which, evaluated at 1, is equal to .

It suffices to prove that for a randomly chosen and line through , we can tune the parameter in to make the probability that this condition holds arbitrarily large.

We first prove that the probability that and agree on at most inputs is arbitrarily small. The point is that if we pick our line randomly, the samples of points from this line are certainly not independent, but they are pairwise independent. Recalling Result 3 from here and noting that the expected agreement between and is at least , we obtain that , which we can make arbitrarily small by making arbitrarily large.

Lastly, we prove that the probability that there exists a univariate polynomial other than the true restriction that is close to the restricted received word and agrees with it at the chosen point is also arbitrarily small. The key is that 1) there are only a bounded number of univariate polynomials that are close to the restricted word to begin with, and 2) each agrees with the true restriction at only a bounded number of points. So if we first pick an arbitrary line and then a random point on it, the union bound tells us that the probability that any of these polynomials agrees with the true restriction at that point is small, so again, we can take the parameter arbitrarily small to make this probability small as well.

As we stated earlier, we can directly use the results on systematic encoding and concatenation above to get an explicit code with a local -list-decoder. Specifically, as before we set dimension to be so that degree is bounded above by and . The running time depends linearly on alphabet size , so our decoding algorithm runs in time .

**Conclusion**

With this Reed-Muller code, we can send a function that is worst-case hard for time to a function that is -average-case hard, where . Such a function is certainly also -average-case hard, and result 3 thus tells that:

**Corollary**: Fix a time function that is computable in time exponential in its input. If there is some in **E** that on length- inputs is worst-case hard for time , there is a mildly explicit -generator with seed length .

In particular, if the hardness in the above result could be taken to be exponential, then we would get that **BPP** = **P**! Even if the latter doesn’t turn out to be true and no such function in **E** has such high circuit complexity, this would suggest that problems in **E** like satisfiability don’t have as high circuit complexity as previously thought.

**Preliminaries**

Roughly speaking, generators are deterministic functions that stretch uniformly random seeds into longer strings of bits that appear random. It turns out that our methods of quantifying “randomness” using both min-entropy and statistical difference from the uniform distribution are insufficient, because for any random variable X and deterministic function G, the output G(X) has min-entropy no greater than that of X, and once G stretches its input its output must be statistically far from uniform. In place of statistical or information-theoretic measures, we thus use a complexity-theoretic one: a variable will seem random if it is impossible for a so-called nonuniform algorithm running within a certain time limit to tell it apart from a uniformly random variable. We first define nonuniformity:

**Definition 0**: An algorithm deciding a language is **-nonuniform** for if there exist **advice bitstrings** each of length at most such that there exists a language decided by some algorithm where iff .

We will not make use of this definition in its full generality; it will suffice to talk about algorithms which operate with some “advice” hardwired in, and the length of this advice will be reasonable enough that we won’t bother with the data of the length function . We can now formalize our abovementioned ideas of “randomness” and pseudorandom generators.

**Definition 1**: Random variables X and Y are (nonuniformly) **computationally indistinguishable** if for every nonuniform algorithm A running in time at most t, |P[A(X) = 1] − P[A(Y) = 1]| ≤ ε.

**Definition 2**: A **-pseudorandom generator** is a deterministic function G stretching a short uniformly random seed to a longer output such that G’s output distribution and the uniform distribution on strings of the output length are -computationally indistinguishable.

**Derandomizing BPP with Generators**

It turns out that if we can find such a generator, we can effectively derandomize BPP. We will refer to generators as “computable in ” if we can determine the number of random bits needed to stretch to bits and run a uniform deterministic algorithm to calculate the generator’s output both in time . Later on, we will refer to generators as **mildly explicit** if they are computable in time polynomial in and .

**Result 1**: If for every output length there exists a pseudorandom generator computable in time , then **BPP** is contained in **DTIME**.

**Proof**: Take a **BPP** algorithm with input x and coin tosses, which we simulate WLOG by a circuit of size polynomial in the running time. To decide x, we can just write down the algorithm’s answer on the generator’s output for every possible seed and take the majority decision to be the output of our derandomized algorithm; enumerating all seeds takes the claimed time. To show that this decides correctly for every input, it suffices to observe that if it does not for some x, then the probability that the circuit outputs 1 changes by a constant when the true coin tosses are replaced by the generator’s output (the constant coming from our definition of the error bound for **BPP** algorithms), contradicting the pseudorandomness of the generator.
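The derandomization in Result 1 is just brute-force enumeration of seeds followed by a majority vote, as in this sketch; the “generator” here is a placeholder that merely spreads the seed’s bits, since actually constructing a pseudorandom one is the whole problem:

```python
def derandomize(algorithm, x, generator, seed_len, output_len):
    """Run the BPP algorithm on G(s) for every seed s and take the majority vote."""
    ones = sum(algorithm(x, generator(s, output_len)) for s in range(2 ** seed_len))
    return int(2 * ones > 2 ** seed_len)

# Placeholder "generator": lay out the seed's bits, zero-padded (NOT pseudorandom;
# purely to illustrate the enumeration).
toy_gen = lambda s, m: [(s >> i) & 1 for i in range(m)]

# A toy "BPP algorithm" that errs only when its first two coins are both 1,
# i.e. on a 1/4 fraction of coin-toss sequences; the majority vote corrects this.
algo = lambda x, coins: int(sum(coins[:2]) < 2)
assert derandomize(algo, None, toy_gen, 4, 6) == 1
```

The cost is 2^(seed length) runs of the algorithm, which is why logarithmic seed length would land us in **P**.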

Tantalizingly, by substituting in asymptotics, this statement tells us that if we can find a mildly explicit -generator with seed logarithmic in output length, we can derandomize all of **BPP** into **P**! We will proceed beginning with the weakest possible assumptions about generators, and our final result will tell us either that we can indeed derandomize all of **BPP** into a subset **E**=**DTIME** of the **DTIME **class given above, or at least guarantee the existence of significantly faster algorithms for SAT and other common NP-complete problems!

To do this, we reduce the problem of finding mildly explicit generators to finding a function in **E **that is “hard on average to compute,” then use hardness amplification to show that it suffices to find a function in **E** that is only “hard to compute” in the worst case. We first define both notions of hardness:

**Definitions 3,4**: A Boolean function is **-average-case hard** if for every nonuniform probabilistic algorithm running within time , . On the other hand, is **-worst-case hard** if for all such algorithms , there exists some input such that .

To show how to get a good generator out of an average-case hard function in , we first introduce the notion of next-bit unpredictability and its connection to generators:

**Definition 5**: A random variable on is -next-bit-unpredictable if for every and every nonuniform probabilistic algorithm (“predictor”) running within time , , where denotes the th bit of the value that takes on and where the probability is taken over all of and coin tosses.

**Result 2**: If on is -next-bit-unpredictable, it is -pseudorandom.

**Proof**: Assume to the contrary that is not -pseudorandom so that there exists some nonuniform algorithm in time that distinguishes between and the uniform distribution on with advantage greater than . The key is that this advantage allows to predict future bits better than a random coin toss. What follows is a so-called “hybrid argument”: define to be the bitstring-valued random variable , where is the uniform distribution on . Because , there is some such that . Rewriting as

,

we find that . Now define a predictor by , where are uniformly random bits. By construction, the probability that the predictor succeeds exceeds , and by using the random bits as advice, we get a predictor that runs in the same time as , a contradiction of next-bit unpredictability.

**Remark**: In fact the converse is also true: -pseudorandomness implies -next-bit-unpredictability; the contrapositive is given by constructing an algorithm that outputs 1 when the predictor succeeds and 0 otherwise, and the additive constant comes from adding logical gates to ’s circuit to check for this success.

We can now prove our first main result, the reduction of finding mildly explicit generators to finding average-case-hard functions in **E**:

**Result 3**: Fix a time function that is computable in time exponential in its input. If there is some in **E** that on length- inputs is -average-case hard, there is a mildly explicit -generator with seed length .

**Proof**: As it turns out, with the above result we’re almost done; we’ll just need appropriate asymptotics on seed length. The gist will be to evaluate our hard function on inputs that share only a few bits. Define an design to be a family of subsets of each of cardinality and mutually intersecting in at most elements. The key to getting the desired asymptotic bounds is the following lemma:

**Lemma**: For every and , there exists an -design such that and .

**Proof**: The proof is nonconstructive. We will show that an -design will exist as long as is bounded above by . Fix , and the probability that intersect in at least elements is . This tells us that for any , the expected number of sets among , by our choice of bound on , is strictly less than 1, so an -design on subsets is indeed possible. Stirling’s approximation on our inequality then gives the desired result.
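Although the lemma is proved nonconstructively, small designs of this kind are easy to find in practice with a randomized greedy search. The parameters in the sketch below are chosen purely for illustration:

```python
import random
from itertools import combinations

def greedy_design(universe_size, set_size, max_intersect, num_sets, trials=10000):
    """Randomized greedy search for a design: `num_sets` subsets of the
    universe, each of size `set_size`, pairwise intersecting in at most
    `max_intersect` elements. Returns None if the search fails (this is
    only a heuristic; the lemma itself is nonconstructive)."""
    design = []
    for _ in range(trials):
        if len(design) == num_sets:
            break
        candidate = frozenset(random.sample(range(universe_size), set_size))
        if all(len(candidate & s) <= max_intersect for s in design):
            design.append(candidate)
    return design if len(design) == num_sets else None

random.seed(0)
d = greedy_design(universe_size=40, set_size=10, max_intersect=3, num_sets=8)
assert d is not None
assert all(len(a & b) <= 3 for a, b in combinations(d, 2))
```

The greedy search succeeds easily here because two random 10-element subsets of a 40-element universe intersect in about 2.5 elements on average, so most candidates already satisfy the intersection constraint.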

Given an design and a function , we can get a generator by evaluating on the input restricted to each subset in the design. The generator given by is called the **Nisan-Wigderson generator**.
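The evaluation map of the generator is simple to write down concretely. In this Python sketch, parity stands in for the hard function; parity is of course not hard and serves only as a placeholder, and the design is a toy example:

```python
def nw_generator(f, design, seed):
    """Nisan-Wigderson generator: one output bit per design set, obtained by
    evaluating f on the seed bits indexed by that set."""
    return tuple(f(tuple(seed[i] for i in sorted(s))) for s in design)

# Toy design over a 6-bit seed; parity is only a placeholder for a hard function.
parity = lambda bits: sum(bits) % 2
design = [frozenset({0, 1, 2}), frozenset({2, 3, 4}), frozenset({0, 4, 5})]
print(nw_generator(parity, design, (1, 0, 1, 1, 0, 0)))  # → (0, 0, 1)
```

Note how the small pairwise intersections of the design sets are what keep the output bits "almost independent" evaluations of the hard function.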

**Lemma**: If is hard, then is -pseudorandom.

**Proof**: As usual, we prove the contrapositive. Say is not pseudorandom so that by our result on next-bit unpredictability, we have a -predictor and an index such that with probability more than . We can reduce the random choice of to a random choice of . Specifically, there must exist some fixed choice of bits for such that if we define to be the random variable such that equals this fixed choice and is uniformly distributed over bitstrings of length , then the algorithm computes the function with error less than . Because can be computed by storing as advice a lookup table of size at most and doing lookups with a circuit of size , the total time of is at most , so is not -hard, as desired.

The proof of Result 3 then follows by picking the appropriate parameters.

**Hardness Amplification I: Worst-Case Hard to Constant Average-Case**

Our next goal will be to construct a constant average-case-hard function out of a worst-case hard function using list decoding.

It turns out that if we view a worst-case hard function as a message and pass this through an encoding for which there exists an efficient decoding, then the codeword will be average-case hard. Unfortunately, decoding as we’ve defined it requires reading in all of the bits of the received word and writing out all of the bits of the decoded message, taking time that we might as well have used to make a gigantic truth table to compute to begin with. Instead, we introduce the following variant of decoding:

**Definition 6**: A **local -decoding algorithm** for an encoding is a probabilistic algorithm that, given oracle access to a received word and an input position , computes with probability 2/3 the value for any message whose encoding is -close to . For coin tosses , the output of the algorithm will be denoted by , and probability is taken over all .

**Result 3**: If the message in the above definition is worst-case hard for nonuniform time and runs in nonuniform time at most , then is -average-case hard.

**Proof: **Say that is not average-case hard so that there exists some algorithm running in time such that . There must exist some choice of coin tosses such that, defining deterministic nonuniform algorithm , we have . But this is just another word in the space , so this bound translates into . By definition, the algorithm computes to within constant error, and the circuit representing this algorithm is the circuit for with oracle gates replaced by the circuit for , so is not -worst-case-hard.

Our hardness amplification will reduce to a worst-case-hard function running in asymptotically the same time as the average-case-hard function, so we will want an encoding . Because our final result will be a statement about functions in exponential deterministic time, we also want our encoding to be of this complexity . We will also want the decoding not to take too much time, but we will determine later what the runtime for should be precisely. Lastly, to get rid of the term in our lemma on Nisan-Wigderson generators, we will want the code to have a binary alphabet.


In this post, I’ll cover some of what I think are the coolest ideas in Vadhan’s survey, the principles unifying the main pseudorandom objects studied. We will continue where we left off in the last post by illustrating a connection between list encoding and expanders. We then introduce the concept of randomness extractors and relate these to hash functions, expanders, and list encoding.

NOTE: This post only scratches the surface of what Vadhan covers here.

**List Encoding and Expanders**

For , a target subset , and probability parameter , define to be the set of for which . Likewise, for a “filtering” function , define to be the set of for which . Taking to be the characteristic function on , we get .
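Over a finite domain, the set LIST from this definition can be computed by brute force. In the Python sketch below, the function, domains, target set, and threshold are all toy choices made for illustration:

```python
def list_set(f, domain_x, domain_y, target, eps):
    """LIST: the set of x whose neighbor f(x, y) lands in `target` with
    probability greater than eps over a uniform choice of y."""
    out = set()
    for x in domain_x:
        hits = sum(1 for y in domain_y if f(x, y) in target)
        if hits > eps * len(domain_y):
            out.add(x)
    return out

# Toy example: f(x, y) = (x + y) mod 8 with 8 left inputs and 4 choices of y.
f = lambda x, y: (x + y) % 8
print(sorted(list_set(f, range(8), range(4), {0, 1}, 0.25)))  # → [0, 6, 7]
```

The same brute-force computation, with different choices of f, is exactly what the list-decoding and expansion reformulations below specialize.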

As a sanity check, taking for a given encoding to be given by , where denotes the bits of given by and where we view as , we readily get another definition for list-decodability:

**Result 1**: For an , define to be the set of all pairs for . Then the encoding is -list decodable iff for all .

Now taking to be the neighbor function for a -regular bipartite graph with left vertices and right vertices, i.e. so that is the th neighbor of left-vertex , we readily get another definition of vertex expansion in the case that only sets of *exactly* vertices expand; we will refer to such graphs as -vertex expanders:

**Result 2: ** with neighbor function is an -vertex expander iff for every set of fewer than right-vertices, .

With these connections in mind, we proceed to prove that we can get a very good expander, albeit with nonconstant degree, out of a Parvaresh-Vardy code.

For the -ary Parvaresh-Vardy code of degree , redundancy , irreducible , and power , consider the -regular bipartite graph with left vertices indexed by and right vertices indexed by , with neighbor function given by .

**Result 3: ** is an -expander.

**Proof**: We will tune the parameters for maximum set size and expansion as we go. We will first prove that is an expander. For this, Result 2 tells us that we want to show that for all of cardinality less than .

The gist of the proof is essentially the same as our proofs of efficient list-decodability: we will find a “characteristic” polynomial that all elements of solve, then observe that the roots of include all elements of . Once we have shown that is an expander, we can tweak our proof to get expansion for all sets of cardinality less than as well.

Take polynomial of degree less than in and less than in each of the other variables. We want to be solved by all elements in , which is possible as long as the number of unknowns exceeds the number of constraints . So it’ll be convenient to take . As usual, we’ll assume that has no factors of .

Now we want that for every , , but this is already true for all choices of . In order for this univariate polynomial in to be the zero polynomial, we’d like the number of these roots to exceed the degree of the polynomial, so let’s take to be . With this in mind, we can simply factorize the polynomial to get our elements . In particular, the number of such elements is bounded above by the degree of in , which is as desired.

We can easily tweak the above argument to work for smaller values of than : build our out of monomials where and instead of , where is the monomial of degree in so that . By construction, our bound for the existence of such a still holds, and the degree of that bounds the cardinality of is still strictly bounded above by .

By picking our parameters , , , carefully, we get a remarkably good expander.

**Result 4**: For all , maximum subset size , , and tradeoff parameter , there exists a fully explicit , expander with left-vertices, left-degree , and right vertices.

**Proof**: Because the number of left vertices is , we take to be . The number of right-vertices is . We’d like to be close to where . Take to be something (we’ll specify this later) in , and thus take to be , which will give us the desired upper bound on . To get the desired expansion factor, we must show that . But and , so taking gives the desired lower bound on expansion. By this choice of , we also get the desired asymptotic growth for left-degree .

What remains is to choose an appropriate . We want an explicit expander, so taking to be a power of 2 allows us to describe the field we’re working with in time polynomial in and , completing the proof.

Note that the parameter gives a tradeoff between the size of the right half of the graph and the left-degree. More importantly, we get arbitrarily-close-to-optimal expansion at the cost of polylogarithmic degree.

**Randomness Extractors – Preliminaries**

We now introduce the notion of randomness extractors. Morally, these will be maps which smear “weakly random” variables into variables whose distributions are pretty close to perfectly random. These will be helpful in running BPP algorithms using weakly random sources of bits, say atmospheric radio noise or the low-order bits of the system clock. Formally, a **source** on will denote a random variable taking values in . We’ll first need a notion of statistical “closeness”:

**Definition 1**: Two random variables on have **statistical difference** given by , where the maximum is taken over all subsets of . If , we say that and are -close.

It turns out this condition is equivalent to the following:

**Definition 1’**: , where here denotes the norm.
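The equivalence of the two definitions (maximum advantage over subsets versus half the l1 distance) can be checked numerically. In this Python sketch, the two example distributions are arbitrary toy choices:

```python
from itertools import chain, combinations

def statistical_difference(p, q):
    """Statistical difference between two distributions given as dicts mapping
    outcomes to probabilities; equals half the l1 distance (Definition 1')."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

uniform = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
biased  = {0: 0.50, 1: 0.25, 2: 0.25, 3: 0.00}
print(statistical_difference(uniform, biased))  # → 0.25

# Cross-check against Definition 1: maximize |P(X in T) - P(Y in T)| over T.
outcomes = sorted(uniform)
subsets = chain.from_iterable(combinations(outcomes, r) for r in range(len(outcomes) + 1))
max_gap = max(abs(sum(uniform[x] for x in t) - sum(biased[x] for x in t)) for t in subsets)
assert abs(max_gap - statistical_difference(uniform, biased)) < 1e-12
```

The maximizing subset is the set of outcomes where one distribution exceeds the other, which is exactly why the two definitions agree.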

With this in mind, we can define a simple class of extractors:

**Definition 2**: A map is a **deterministic -extractor** for a class of sources if is -close to the uniform distribution over for all sources .

We will restrict our attention to sources which we know contain at least a certain amount of randomness.

**Definition 3**: A **-source** is a random variable such that for all . In other words, a -source has min-entropy bounded below by .
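For an explicitly given distribution, min-entropy is a one-liner: the negative log of the largest point probability. The example distributions in this Python sketch are toy choices:

```python
import math

def min_entropy(dist):
    """Min-entropy of a distribution: -log2 of the largest point probability.
    A k-source is any distribution with min_entropy(dist) >= k."""
    return -math.log2(max(dist.values()))

flat = {x: 1.0 / 8 for x in range(8)}                 # flat source on 8 points
print(min_entropy(flat))  # → 3.0

skewed = {0: 0.5, **{x: 0.5 / 8 for x in range(1, 9)}}  # one heavy outcome
print(min_entropy(skewed))  # → 1.0
```

Note that the skewed source has nine outcomes but min-entropy only 1, since min-entropy is governed entirely by the single most likely outcome.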

It turns out that for any -source, random functions are good extractors. To prove this, it suffices to prove it for the “basis elements” of the class of -sources, so we begin with a lemma and a definition.

**Definition 4**: The **flat -source on ** for is the uniform distribution on .

**Lemma 1**: Every -source on is a convex combination of flat -sources , where .

**Proof**: The picture is of a circle broken up into half-open intervals of length . Pick a collection of points evenly spaced around the circle. Because is a -source, each contains at most one point of , and let denote the set of such that contains a point. Picking a point from where is a randomly chosen rotation of gives with probability . Then .

Then we get the following out of the Chernoff bound given in a previous post.

**Result 5**: For every , every flat -source , and every error parameter , a randomly chosen function for is a deterministic extractor for with probability .

Unfortunately, we can’t find extractors that work for all -sources.

**Result 6**: There are -sources for which there exist no deterministic -extractors for arbitrarily small .

**Proof**: Specifically, for any map , we can easily construct an -source for which is constant. The image of hits 0 or 1 at least times, so the uniform distribution on the preimage of this bit is our desired -source.
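The adversarial source from this proof is easy to construct explicitly. In this Python sketch, parity plays the role of the candidate one-bit extractor, but any map would do:

```python
def adversarial_source(f, n):
    """Given any candidate one-bit extractor f on n-bit inputs, return the
    uniform distribution over the larger preimage; it has min-entropy at
    least n - 1, yet f is constant on it."""
    preimages = {0: [], 1: []}
    for x in range(2 ** n):
        preimages[f(x)].append(x)
    bad = max(preimages.values(), key=len)   # has at least 2^(n-1) points
    return {x: 1.0 / len(bad) for x in bad}

f = lambda x: bin(x).count("1") % 2          # parity, as a toy candidate
src = adversarial_source(f, 4)
assert len(src) >= 2 ** 3                    # an (n-1)-source...
assert len({f(x) for x in src}) == 1         # ...on which f is constant
```

Since f is constant on this source, its output is as far from uniform as a one-bit distribution can be, which is the failure the proof exploits.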

Morally, the reason for this is that there are too many flat -sources. To get around this, we can loosen our deterministic definition by introducing a purely random seed:

**Definition 5**: A function is a **-extractor of seed ** if for any -source on , is -close to .

Now we can show (nonconstructively) that there are extractors that work for all -sources, provided we use a long enough random seed.

**Result 7**: For all , there is an -extractor of seed with image in for .

**Proof**: Again, we just need to show that there is an extractor that maps all *flat* -sources -close to . We want to show that the probability that a random extractor fails to do this is below 1. By the union bound and the previous result for deterministic extractors, we get an upper bound on the probability of failure of for , and Stirling’s approximation gives an upper bound of , and tuning the parameters to make this upper bound less than 1 gives the desired bound on seed length.

Having motivated seeded extractors, we now proceed to give a straightforward connection between extractors and expanders and then a result due to Impagliazzo, Levin, and Luby relating extractors to hash functions.

**Extractors and Expanders**

The signature of any given seeded extractor is highly suggestive of a neighbor function for a bipartite -regular graph which we will denote by , with left vertices and right vertices. The th neighbor of vertex in is .

It turns out that we can translate the condition of being a -extractor into something beautifully (and usefully) reminiscent of the Expander Mixing Lemma:

**Result 8**: is an -extractor iff satisfies, for every set of left-vertices and every set of right-vertices ,

.

**Proof**: The condition for a function to be a -extractor is that for every flat -source and , we have

.

We can translate into the probability that an edge leaving in is in the cut from to , i.e. . Likewise, we can translate to the probability that a right vertex lies in , i.e. , and we get the desired result.

This result tells us we can import results on spectral expanders to constructions of extractors. In particular, by taking the on the right-hand side of the Expander Mixing Lemma to be at most , it suffices to take a spectral expander with expansion .

Applying this to a Ramanujan graph, i.e. one of optimal spectral expansion , we find when we solve for that taking the corresponding extractor gives a -extractor with an output of .

**Extractors and Hash Functions**

The connection between extractors and hash functions, as considered in a previous post about randomness-efficient error reduction, is equally evocative: an extractor for the flat -sources for set can be thought of as a hash function that maps almost uniformly to its image. We have the following result due to Impagliazzo, Levin, and Luby saying that if we use the purely random bits needed to choose a hash function out of a pairwise independent family, we get a good seeded extractor.

**Result 9** (Leftover Hash Lemma): The map for given by is a -extractor, where is a hash function taken from the pairwise independent family .

**Proof**: Take to be any -source and to be the random variable uniformly distributed on the family . The key is to recall the alternative definition of statistical difference in terms of the -norm and show that the collision probability, which is closely related to the -norm, of is close to .

Recall that . We have a collision of when the hash functions agree and either the ’s agree or they disagree but are hashed to the same value. This occurs with probability at most . So the -norm of the difference is bounded above by .

Cauchy-Schwarz tells us that the ratio of the -norm to the -norm is bounded above by , where is the dimension of the underlying vector space, so the statistical difference between and is bounded above by , so we’re done.
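One standard concrete pairwise independent family (not necessarily the one intended in the post) is the affine family h(x) = Mx + b over GF(2), with M a uniformly random bit matrix and b a uniformly random bit vector. The following Python sketch samples from it and empirically checks pairwise independence, which is exactly the property the Leftover Hash Lemma consumes:

```python
import random

def hash_family_member(n_bits, m_bits, rng):
    """Sample h(x) = Mx + b over GF(2): a uniform m x n bit matrix M and bit
    vector b. This family is pairwise independent from {0,1}^n to {0,1}^m."""
    M = [[rng.randrange(2) for _ in range(n_bits)] for _ in range(m_bits)]
    b = [rng.randrange(2) for _ in range(m_bits)]
    def h(x):
        return tuple((sum(M[i][j] * x[j] for j in range(n_bits)) + b[i]) % 2
                     for i in range(m_bits))
    return h

# Empirical check of pairwise independence: for fixed distinct x and y, the
# pair (h(x), h(y)) should be (nearly) uniform over all 16 output pairs.
rng = random.Random(0)
x, y = (0, 0, 1, 1), (1, 0, 1, 0)
samples, counts = 40000, {}
for _ in range(samples):
    h = hash_family_member(4, 2, rng)
    key = (h(x), h(y))
    counts[key] = counts.get(key, 0) + 1
assert len(counts) == 16
assert all(abs(c / samples - 1 / 16) < 0.01 for c in counts.values())
```

Pairwise independence holds because b makes h(x) uniform, while h(x) xor h(y) = M(x xor y) is uniform and independent of b whenever x differs from y.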

The result gets its name from the fact that morally, this result says that if an adversary has acquired a certain number of bits of a secret key of ours, we can still create a key out of the remaining bits of which the adversary has very little knowledge.

**List Encoding and Randomness Extractors**

Lastly, we place randomness extractors under this unifying framework of list-decodability. Our will be our extractor .

**Result 10a**: If is a -extractor, then for all , .

**Proof**: Say that and let be the -source uniformly distributed on . Then by construction the expected value of is more than , implying that and are -far, a contradiction!

There is a “converse” that involves coarser functions which only take discrete bit values (for this reason, we will consider the set formulation of instead), more entropy, and more error; these losses turn out to be insignificant for extractors.

**Result 10b**: If for every we have that , then is a .

**Proof**: lies in either if falls in or if it does not but lands there anyway, where is any -source. Some calculation tells us that and are indeed -close.

We conclude with an explicit connection between list encoding and extractors. The extractor corresponding to an encoding is . One direction of correspondence we get just out of the parallel between the list-decoding formulation of list-decodability and randomness extractors:

**Result 11a**: If is a -extractor, then is -list decodable.

There is again a slightly worse converse which says that in the other direction, the error blows up unless the alphabet of our code is small.

**Result 11b**: If is -list decodable, then is a -extractor.

**Proof**: Take a -source and the uniform distribution over . We want to show that and are -close. The statistical difference is equal to the expected value of taken over all from , and the definition of statistical difference tells us this is bounded above by , and by list decodability, defining to be the set of where is the choice that maximizes the probability in this expression, we find the statistical difference is bounded above by . We can split this probability up into the cases that lies or does not lie in , and then a bit of computation gives us the desired -closeness.