A small-bias space is a distribution over the Boolean hypercube $\{0,1\}^n$ which fools parity tests. Since they were first introduced by Naor/Naor, these objects have been of central importance to the study of pseudorandomness and have found applications in constructing pseudorandom generators fooling small-width branching programs, derandomizing certain algorithms, constructing almost $k$-wise independent hash functions, and obtaining short PCPs.

In this post we examine four constructions, including an extremely simple construction over $\mathbb{Z}_p$ by Alon/Mansour (2002) which generalizes one due to Alon/Goldreich/Hastad/Peralta (1992), another construction based on linear feedback shift registers due to AGHP, and Naor/Naor’s original construction (1990) based on $k$-wise independence (or alternatively Justesen codes) and random walks on expander graphs.

**First definitions and Alon/Mansour’s $\epsilon$-biased set**

We first formalize the notion of bias over $\mathbb{Z}_m^n$ and present Alon/Mansour’s construction before restricting our attention to the case of $\mathbb{F}_2$ for our remaining three constructions.

**Definition 1**: A probability distribution $\mathcal{D}$ over $\mathbb{Z}_m^n$ is an **$\epsilon$-biased space** if $\left|\mathbb{E}_{x \sim \mathcal{D}}\left[\omega^{\langle \alpha, x \rangle}\right]\right| \le \epsilon$ for all nonzero “test vectors” $\alpha \in \mathbb{Z}_m^n$. Here, $\omega$ is the $m$th root of unity $e^{2\pi i/m}$.

In particular, an equivalent condition in the case of $m = 2$ is that for any nonzero $\alpha$, $\left|\Pr_x[\langle \alpha, x \rangle = 0] - \Pr_x[\langle \alpha, x \rangle = 1]\right| \le \epsilon$. Any $x$ for which $\langle \alpha, x \rangle = 1$ we will call a **distinguisher** of $\alpha$.
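To make the $m = 2$ case concrete, here is a small Python sketch (illustrative; the function names are ours, not from any reference implementation) that computes the bias of the uniform distribution over a set of bitvectors against one test vector, and the maximum bias over all nonzero tests by brute force:

```python
from itertools import product

def bias(S, alpha):
    """Bias of the uniform distribution over S, a list of vectors in {0,1}^n,
    with respect to the test vector alpha: |E_x[(-1)^<alpha, x>]|."""
    total = sum((-1) ** (sum(a * x for a, x in zip(alpha, v)) % 2) for v in S)
    return abs(total) / len(S)

def max_bias(S, n):
    """Maximum bias over all nonzero test vectors (brute force over 2^n - 1 tests)."""
    return max(bias(S, alpha) for alpha in product([0, 1], repeat=n) if any(alpha))
```

For example, the full hypercube has maximum bias 0, while the two-point set $\{00, 11\}$ has bias 1 against the test $\alpha = (1,1)$: every point of the set is a non-distinguisher of that $\alpha$.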

**Definition 2**: A set $S \subseteq \mathbb{Z}_m^n$ is an **$\epsilon$-biased set** if the uniform distribution over $S$ is an $\epsilon$-biased space.

Alon/Mansour give an $(n/p)$-biased subset of $\mathbb{Z}_p^n$ of size $p^2$, for any prime $p$.

For $x, y \in \mathbb{Z}_p$, let $v_{x,y}$ denote the tuple $(yx, yx^2, \ldots, yx^n)$ and $S$ the collection of all such $v_{x,y}$ for $x, y \in \mathbb{Z}_p$.

**Theorem 1**: $S$ is an $(n/p)$-biased set.

**Proof**: From our definitions we know the bias of $S$ with respect to a test vector $\alpha$ is $\frac{1}{p^2}\left|\sum_{x,y \in \mathbb{Z}_p} \omega^{y \cdot p_\alpha(x)}\right|$, where $p_\alpha(x) := \sum_{i=1}^n \alpha_i x^i$. We can group the summands by the exponent of $\omega$ to get $\frac{1}{p^2}\left|\sum_{k \in \mathbb{Z}_p} N_k \omega^k\right|$, where $N_k$ is the number of pairs $(x, y)$ with $y \cdot p_\alpha(x) = k$, and the key observation is that massive cancellation happens among these terms: the terms with a factor of $\omega^k$ for $k \neq 0$ contribute nothing to the bias. Indeed, for a fixed nonzero exponent $k$ and nonzero test vector $\alpha$, for each $x$ for which $p_\alpha(x) \neq 0$ there is exactly one $y$ such that $y \cdot p_\alpha(x) = k$, so $N_k = |\{x : p_\alpha(x) \neq 0\}|$ is the same for every $k \neq 0$. For $k = 0$, $p_\alpha$ has at most $n$ zeroes so that $N_0 \le np + p$. So because $\sum_{k \in \mathbb{Z}_p} \omega^k = 0$, our sum reduces to $\frac{1}{p^2}\left|\sum_{k \in \mathbb{Z}_p}(N_k - N_j)\,\omega^k\right| = \frac{1}{p^2}(N_0 - N_j)$ for any $j \neq 0$, and this is at most $\frac{np}{p^2} = \frac{n}{p}$, concluding the proof.
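To see the bound concretely, here is a direct Python translation of the construction (a sketch; the function names are ours) together with a brute-force bias check over the complex roots of unity:

```python
import cmath
from itertools import product

def alon_mansour_set(p, n):
    """The vectors v_{x,y} = (y*x, y*x^2, ..., y*x^n) mod p, for x, y in Z_p."""
    return [tuple(y * pow(x, i, p) % p for i in range(1, n + 1))
            for x in range(p) for y in range(p)]

def bias_mod(S, alpha, m):
    """|E_v[omega^<alpha, v>]| for omega the m-th root of unity e^{2 pi i / m}."""
    omega = cmath.exp(2j * cmath.pi / m)
    total = sum(omega ** (sum(a * c for a, c in zip(alpha, v)) % m) for v in S)
    return abs(total) / len(S)
```

For $p = 5$ and $n = 2$ the set has $25$ vectors, and the maximum bias over the 24 nonzero tests is exactly $2/5 = n/p$: the bias against $\alpha$ equals $z/p$, where $z$ is the number of roots of $p_\alpha$ in $\mathbb{Z}_p$.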

In a later blog post, we will show how this simple construction can be used to derandomize Mansour’s algorithm for interpolating sparse polynomials (1992).

Unfortunately, this construction is useless unless $p > n/\epsilon$, and in particular, it’s useless when $p = 2$, the case that the initial study of bias focused on. That said, we can slightly tweak the argument by replacing $\mathbb{Z}_p$ with $\mathbb{F}_{2^k}$ and the products $y x^i$ with inner products $\langle y, x^i \rangle$, where we’ve naturally identified $\mathbb{F}_{2^k}$ with the vector space $\mathbb{F}_2^k$ to obtain an inner product, getting one of AGHP’s original constructions:

**Theorem 2**: There is an $\epsilon$-biased subset of $\mathbb{F}_2^n$ of size $O((n/\epsilon)^2)$.
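A Python sketch of this $\mathbb{F}_{2^k}$ variant (the representation choices and modulus below are ours, for illustration): field elements are $k$-bit integers, multiplication is carry-less modulo a fixed irreducible polynomial, and the $i$th bit of the vector indexed by $(x, y)$ is the $\mathbb{F}_2$ inner product $\langle x^i, y \rangle$ of bit representations.

```python
from itertools import product

def gf_mul(a, b, k, mod_poly):
    """Multiply two elements of F_{2^k}, represented as k-bit ints, modulo the
    irreducible polynomial mod_poly (an int with bit k set)."""
    r = 0
    for _ in range(k):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << k):
            a ^= mod_poly
    return r

def aghp_powering_set(n, k, mod_poly):
    """Bitvectors with i-th bit <x^i, y> (inner product of bit representations
    over F_2), i = 1..n, for all x, y in F_{2^k}."""
    S = []
    for x in range(1 << k):
        for y in range(1 << k):
            xp, r = 1, []
            for _ in range(n):
                xp = gf_mul(xp, x, k, mod_poly)       # xp runs over x^1, ..., x^n
                r.append(bin(xp & y).count("1") % 2)  # <xp, y> over F_2
            S.append(tuple(r))
    return S
```

With $k = 3$ and modulus $x^3 + x + 1$ (the integer `0b1011`), the resulting $64$ vectors of length $n = 3$ have maximum bias at most $n/2^k = 3/8$.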

The next construction by AGHP is motivated by linear feedback shift register sequences (LFSRs) and likewise gets $\epsilon$-bias out of a set of size roughly $(n/\epsilon)^2$:

**LFSRs and another construction by AGHP**

Our analysis of the bias in the following construction follows from two facts about polynomials over $\mathbb{F}_2$: 1) the number of irreducible monic polynomials of degree $k$ is roughly $2^k/k$, and 2) (more obviously) there are at most $n/k$ irreducible monic polynomials of degree $k$ which divide a given polynomial of degree $n$.
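Both facts are easy to sanity-check for small degrees. The following sketch (our own illustration) counts monic irreducible polynomials over $\mathbb{F}_2$ by brute-force trial division, encoding a polynomial as the integer whose bits are its coefficients:

```python
def poly_mod(a, b):
    """Remainder of polynomial a modulo b over F_2 (polynomials as bitmask ints)."""
    db = b.bit_length() - 1
    while a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def is_irreducible(f):
    """True iff f (a polynomial over F_2 as a bitmask int) has no factor of
    degree between 1 and deg(f)/2."""
    deg = f.bit_length() - 1
    return deg >= 1 and all(poly_mod(f, g) != 0
                            for d in range(1, deg // 2 + 1)
                            for g in range(1 << d, 1 << (d + 1)))

def count_irreducible(k):
    """Number of monic irreducible polynomials of degree k over F_2."""
    return sum(is_irreducible(f) for f in range(1 << k, 1 << (k + 1)))
```

The counts for degrees 2 through 5 are 1, 2, 3, 6, tracking $2^k/k$ as fact 1 promises.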

Now consider the following construction inspired by LFSRs and more fundamentally by polynomial long division:

**Definition 3**: For start sequence $s = (s_0, \ldots, s_{k-1}) \in \{0,1\}^k$ and feedback rule $f = (f_0, \ldots, f_{k-1}) \in \{0,1\}^k$, the \emph{shift register sequence generated by $s$ and $f$} is the bitstring $r = r_0 r_1 \cdots r_{n-1}$ where $r_i = s_i$ for $0 \le i < k$ and $r_i = \sum_{j=0}^{k-1} f_j \cdot r_{i-k+j} \pmod 2$ for $k \le i < n$.
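In Python, the recurrence is a direct transcription of Definition 3 (a sketch; the function name is ours):

```python
def shift_register_sequence(s, f, n):
    """The length-n shift register sequence generated by start sequence s and
    feedback rule f (both length-k lists of bits): r_i = s_i for i < k, and
    r_i = sum_j f_j * r_{i-k+j} mod 2 thereafter."""
    k = len(s)
    r = list(s)
    for i in range(k, n):
        r.append(sum(f[j] * r[i - k + j] for j in range(k)) % 2)
    return r
```

For example, with $k = 3$, $f = (1, 1, 0)$ and $s = (0, 0, 1)$, the sequence $0, 0, 1, 0, 1, 1, 1, 0, 0, 1, \ldots$ cycles through all seven nonzero states before repeating.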

The point is that if we think of the $s_j$’s as monomials in some indeterminate $x$ (identifying $s_j$ with $x^j$), so that we can view the $r_i$’s as polynomials in $x$, then $r_i$ as a polynomial in $x$ is precisely the remainder of $x^i$ when divided by $f(x) := x^k + f_{k-1}x^{k-1} + \cdots + f_1 x + f_0$.

Given this view, we will only concern ourselves with feedback rules $f$ for which $f(x)$ is irreducible. These are called \emph{non-degenerate feedback rules}.

**Theorem 3**: The set (of size roughly $2^{2k}/k$ by fact 1 above) of all shift register sequences $r \in \{0,1\}^n$ generated by any starting sequence $s \in \{0,1\}^k$ and nondegenerate feedback rule $f$ is an $(n/2^k)$-biased set.

**Proof**: By our discussion above, $\langle \alpha, r \rangle = \sum_{i=0}^{n-1} \alpha_i r_i$ is a linear combination of the remainders of the $x^i$ modulo $f(x)$, with coefficients all zero iff $f(x)$ divides $p_\alpha(x) := \sum_{i=0}^{n-1} \alpha_i x^i$. If it is a linear combination with coefficients not all zero, it’s some nontrivial linear combination of the uniformly random bits $s_0, \ldots, s_{k-1}$ and thus has zero bias. So the bias is precisely the probability that a randomly chosen irreducible monic polynomial of degree $k$ divides $p_\alpha$. By facts 1) and 2) above, this is at most $\frac{n/k}{2^k/k} = \frac{n}{2^k}$, and we’re done.
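For small parameters the whole construction fits in a short Python sketch (ours): enumerate feedback polynomials, keep the irreducible ones by trial division, generate all sequences, and check the bias bound by brute force.

```python
from itertools import product

def poly_mod(a, b):
    """Remainder of a modulo b, polynomials over F_2 encoded as bitmask ints."""
    db = b.bit_length() - 1
    while a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def lfsr_biased_set(n, k):
    """All shift register sequences r in {0,1}^n generated by a start sequence
    s in {0,1}^k and a non-degenerate feedback rule, i.e. one whose feedback
    polynomial f(x) = x^k + f_{k-1} x^{k-1} + ... + f_0 is irreducible."""
    S = []
    for fpoly in range(1 << k, 1 << (k + 1)):   # monic, degree exactly k
        if any(poly_mod(fpoly, g) == 0          # reducible: skip
               for d in range(1, k // 2 + 1)
               for g in range(1 << d, 1 << (d + 1))):
            continue
        f = [(fpoly >> j) & 1 for j in range(k)]
        for s in product([0, 1], repeat=k):
            r = list(s)
            for i in range(k, n):
                r.append(sum(f[j] * r[i - k + j] for j in range(k)) % 2)
            S.append(tuple(r))
    return S
```

With $n = 6$ and $k = 4$ there are 3 irreducible quartics, hence $3 \cdot 16 = 48$ sequences, and the maximum bias over all 63 nonzero tests respects the $n/2^k = 3/8$ bound.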

There is one more $\epsilon$-biased set of comparable size due to AGHP that we will present in our next blog post about using the “randomness” of quadratic characters via Weil’s character sum estimate. For now, we see how to improve the dependence on $n$ at the cost of a worse (but still polynomial) dependence on $1/\epsilon$.

**Naor/Naor’s Construction**

The argument here is much more involved and takes place in two steps: 1) construct a polynomial-sized family $F$ of bitvectors such that for any test vector $\alpha$, the probability that a uniformly random $v \in F$ is a distinguisher of $\alpha$ is at least some constant $\beta > 0$, 2) using a walk on an expander, sample enough vectors $v_1, \ldots, v_k$ from $F$ such that for any $\alpha$, with probability at most $\epsilon$ all of the $\langle \alpha, v_i \rangle$ are 0.

**Proposition 1**: The vectors $v_1, \ldots, v_k$ generated by our expander walk span an $\epsilon$-biased set: the sum of a uniformly random subcollection of them is $\epsilon$-biased.

**Proof**: It suffices to show that when there is some distinguisher of $\alpha$ among $v_1, \ldots, v_k$, say $v_j$, then for a random subset $S \subseteq [k]$, $\sum_{i \in S} v_i$ is a distinguisher with probability 1/2. Condition on the choices in $S \setminus \{j\}$ and let $w = \sum_{i \in S \setminus \{j\}} v_i$. If $w$ is a distinguisher, then with probability 1/2, $v_j$ is already not in the sum. If $w$ is not a distinguisher, then with probability 1/2, $v_j$ is in the sum, and it is easy to check that the sum of a distinguisher and a non-distinguisher is a distinguisher, so we’re done.
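The halving step is exactly tight and easy to verify exhaustively. A quick Python check (ours): for any collection containing at least one distinguisher of $\alpha$, exactly half of the $2^k$ subset sums distinguish $\alpha$, since $S \mapsto \langle \alpha, \sum_{i \in S} v_i \rangle$ is a nonzero linear functional of the indicator vector of $S$.

```python
from itertools import product

def distinguishing_fraction(vectors, alpha):
    """Fraction of subsets of the index set whose subset sum has odd inner
    product with alpha over F_2, i.e. distinguishes alpha."""
    inner = [sum(a * c for a, c in zip(alpha, v)) % 2 for v in vectors]
    k = len(vectors)
    hits = sum(1 for subset in product([0, 1], repeat=k)
               if sum(i * b for i, b in zip(inner, subset)) % 2 == 1)
    return hits / 2 ** k
```

If no vector in the collection distinguishes $\alpha$, the fraction is of course 0, which is why step 2) drives the probability of that event down to $\epsilon$.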

We next show that we can generate such a collection of vectors using a random walk on an expander. From step 1), we have a probability $\beta$ of picking a distinguisher from $F$, and we’d like to reduce the probability of failing to draw a distinguisher to $\epsilon$ over $O(\log\frac{1}{\epsilon})$ steps. See \url{https://formalreasons.wordpress.com/2013/12/25/random-efficient-error-reduction-part-2-expander-graphs/} for our discussion from a while back about how to do this with an expander walk. The only subtlety in applying this method is that $\beta$ is a bit too small, so we should let each vertex of our expander correspond to a set of $\ell$ assignments for $\ell$ sufficiently large. By our discussion about error reduction with expanders, we need $\ell \log|F| + O(\log\frac{1}{\epsilon} \cdot \log d)$ random bits, where $d$ is the degree of the expander.

Finally, we need to show that we can construct $F$. There is a very nice way to do this using Justesen codes, but here we present an argument that does not depend on the fact that we’re working with $\mathbb{F}_2$.

At a high level, we will split the possible test vectors into buckets based on the number of nonzero coordinates, sample with high probability a distinguisher for each scenario, and add a random subcollection of these candidate distinguishers together so that with high probability what we end up with distinguishes the actual test vector $\alpha$. Throughout, we’d like to use $O(1)$-wise rather than complete independence to save on randomness.

Our buckets will be the set of $\alpha$ for which the number of nonzero coordinates is in $[2^t, 2^{t+1})$, for $t = 0, 1, \ldots, \lfloor \log n \rfloor$. Fix such a bucket and say that we know our test vector comes from that bucket. For convenience, denote $2^t$ by $s$. We would like to somehow remove all but a constant number $c$ of nonzero coordinates from $\alpha$ and then generate our distinguisher $c$-wise independently to obtain a distinguisher with probability $1/2$. Because we don’t actually know our adversary’s test vector, however, we achieve the same effect by approximating the removal procedure with a mask applied to a $c$-wise independently generated vector.

Specifically, using $O(\log n)$ random bits, pick a random bitstring $v$ of length $n$ such that the entries are individually purely random but are jointly $c$-wise independent, and let $m \in \{0,1\}^n$ be the “mask” that approximates the removal procedure, a bitstring whose entries are jointly pairwise independent but such that $\Pr[m_i = 1] = 1/s$ for each $i$.
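One standard way to get such a pairwise independent mask from $O(\log n)$ random bits is to threshold a random affine map over a prime field; this particular construction is our choice for illustration, not necessarily the one Naor/Naor used. For distinct coordinates $i, j$ (mod $q$), the pair $(ai + b, aj + b)$ is uniform over $\mathbb{Z}_q^2$ as $(a, b)$ varies, so the bits are exactly pairwise independent with marginal $t/q$:

```python
def all_masks(n, q, t):
    """All masks m in {0,1}^n with m_i = 1 iff (a*i + b) mod q < t, as (a, b)
    ranges over Z_q^2 (q prime, q >= n). Bits are pairwise independent with
    Pr[m_i = 1] = t / q; choosing t/q close to 1/s gives the mask in the text."""
    return [tuple(int((a * i + b) % q < t) for i in range(n))
            for a in range(q) for b in range(q)]
```

Enumerating all $q^2$ seeds for small $q$ confirms that the joint probability $\Pr[m_i = m_j = 1]$ factors exactly as $(t/q)^2$.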

**Proposition 2**: Define the bitstring $w$ by $w_i = v_i \cdot m_i$. Then $w$ is a distinguisher of $\alpha$ with probability at least 1/4.

**Proof**: This is equivalent to saying that the vector $\alpha'$ defined by $\alpha'_i = \alpha_i \cdot m_i$ is distinguished by $v$ with probability at least 1/4. If $X$ is the random variable corresponding to the number of nonzero coordinates in $\alpha'$, it suffices to show that for the constant $c$ chosen at the beginning, $\Pr[1 \le X \le c]$ is high: conditioned on this event, the $c$-wise independence of $v$ makes $\langle \alpha', v \rangle$ a uniformly random bit, so $v$ distinguishes $\alpha'$ with probability 1/2. By pairwise independence of the $m_i$, $\mathbb{E}[X] = |\alpha|/s \in [1, 2)$ and $\mathrm{Var}[X] \le \mathbb{E}[X]$, and a bit of calculation using Chebyshev’s gives that for $c$ a large enough constant, our $X$ falls in the desired range with probability at least 1/2, and we’re done.

We generate one such $w$ for each of the $\lfloor \log n \rfloor + 1$ buckets, using the same random bits to generate the $v$’s and $m$’s for each bucket. We finish the proof by noting that by the same argument as Proposition 1, a sum of a random subcollection of these $w$’s is a distinguisher of $\alpha$ with probability at least $\frac{1}{2} \cdot \frac{1}{4} = \frac{1}{8}$.

In total, we thus require $O(\log n + \log\frac{1}{\epsilon})$ random bits, giving an $\epsilon$-biased set of size polynomial in $n$ and $1/\epsilon$.

**Conclusion**

So far, we’ve seen that we can achieve small bias using polynomials, -wise independence, linear codes, and expander graphs. In the next section, we’ll look at exploiting the randomness of quadratic characters and examine a third construction by AGHP using this, as well as an extension, due to Azar/Motwani/Naor, of this character approach to .