Some Elementary Constructions of Small-Bias Spaces, Part 1

A small-bias space is a distribution over the Boolean hypercube \{0,1\}^n which fools parity tests. Since they were first introduced by Naor/Naor, these objects have been of central importance to the study of pseudorandomness and have found applications in constructing pseudorandom generators fooling small-width branching programs, derandomizing certain algorithms, constructing (k,\epsilon)-independent hash functions, and obtaining short PCPs.

In this post we examine four constructions, including an extremely simple construction over \mathbb{Z}^n_p by Alon/Mansour (2002) which generalizes one due to Alon/Goldreich/Hastad/Peralta (1992), another construction based on linear feedback shift registers due to AGHP, and Naor/Naor’s original construction (1990) based on k-wise independence (or alternatively Justesen codes) and random walks on expander graphs.

First definitions and Alon/Mansour’s \epsilon-biased set

We first formalize the notion of bias over \mathbb{Z}^n_p and present Alon/Mansour’s construction before restricting our attention to the case of p = 2 for our remaining three constructions.

Definition 1: A probability distribution X over \mathbb{Z}^n_p is an \epsilon-biased space if \left|E_{x\leftarrow X}\left[\omega^{<\alpha,x>}\right]\right|\le\epsilon for all nonzero “test vectors” \alpha\in\mathbb{Z}^n_p. Here, \omega is the pth root of unity e^{2\pi i/p}.

In particular, an equivalent condition in the case of p = 2 is that for any nonzero \alpha\in\{0,1\}^n, \left|\Pr_{x\leftarrow X}(<\alpha,x> = 0) - \Pr_{x\leftarrow X}(<\alpha,x> = 1)\right|\le\epsilon. We will call any x for which <\alpha,x>\neq 0 a distinguisher of \alpha.

Definition 2: A set S\subset\mathbb{Z}^n_p is an \epsilon-biased set if the uniform distribution over S is an \epsilon-biased space.
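
To make these definitions concrete, here is a quick brute-force sketch (not from any of the papers; the function name and interface are ours) that computes the bias of a finite set S\subset\{0,1\}^n by checking every nonzero test vector. It is only feasible for tiny n, but it is handy for sanity-checking the constructions below.

```python
from itertools import product

def bias(S, n):
    """Maximum bias of the uniform distribution over S (a list of n-bit tuples)
    against all nonzero parity tests alpha in {0,1}^n.  Brute force: O(2^n * |S|)."""
    worst = 0.0
    for alpha in product((0, 1), repeat=n):
        if not any(alpha):
            continue  # skip the zero test vector
        # count samples where <alpha, x> = 0 versus = 1 (mod 2)
        zeros = sum(1 for x in S if sum(a * b for a, b in zip(alpha, x)) % 2 == 0)
        ones = len(S) - zeros
        worst = max(worst, abs(zeros - ones) / len(S))
    return worst

# Example: the uniform distribution over all of {0,1}^n has bias 0.
print(bias(list(product((0, 1), repeat=4)), 4))  # -> 0.0
```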

Alon/Mansour give an \frac{n-1}{p-1}-biased set of size (p-1)^2.

For x,y\in\mathbb{Z}^*_p, let v_{x,y} denote the tuple (y,xy,x^2y,...,x^{n-1}y) and A^n_p the collection of all such v_{x,y} for x,y\in\mathbb{Z}^*_p.

Theorem 1: A^n_p is an \frac{n-1}{p-1}-biased set.

Proof: From our definitions we know the bias of A^n_p with respect to a test vector \alpha is \frac{1}{(p-1)^2}\left|\sum_{x,y\in\mathbb{Z}^*_p}\omega^{<\alpha,v_{x,y}>}\right|. We can group the summands by the exponent of \omega to get \frac{1}{(p-1)^2}\left|\sum_{a\in\mathbb{Z}_p}n_{a,\alpha}\omega^{a}\right|, where n_{a,\alpha} is the number of pairs (x,y) with <\alpha,v_{x,y}> = a, and the key observation is that massive cancellation happens among the terms with a\neq 0. Indeed, <\alpha,v_{x,y}> = y\sum_i\alpha_ix^i, so for a fixed nonzero exponent a and nonzero test vector \alpha, each x for which \sum_i\alpha_ix^i\neq 0 yields exactly one y such that <v_{x,y},\alpha> = a; hence n_{a,\alpha} = n_{a',\alpha} for all nonzero a, a'. On the other hand, the nonzero polynomial \sum_i\alpha_ix^i has at most n-1 roots, so n_{0,\alpha}\le (n-1)(p-1). Because \sum^{p-1}_{a=1}\omega^a = -1, the sum inside the absolute value reduces to n_{0,\alpha}-n_{a,\alpha} for any fixed a\neq 0, and since n_{a,\alpha}\le p-1, this difference has absolute value at most (n-1)(p-1), giving bias at most \frac{n-1}{p-1} and concluding the proof.
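
As a sanity check on the theorem, here is a small sketch that builds A^n_p for toy parameters and measures its bias directly from Definition 1. The function names and the parameter choices (n = 3, p = 7) are ours and purely illustrative.

```python
import cmath
from itertools import product

def alon_mansour_set(n, p):
    """All vectors v_{x,y} = (y, x*y, ..., x^(n-1)*y) over Z_p with x, y nonzero."""
    vectors = []
    for x in range(1, p):
        for y in range(1, p):
            vectors.append(tuple((y * pow(x, i, p)) % p for i in range(n)))
    return vectors

def bias_mod_p(S, n, p):
    """Max over nonzero alpha in Z_p^n of |E_{v in S}[omega^<alpha,v>]|, omega = e^(2*pi*i/p)."""
    omega = cmath.exp(2j * cmath.pi / p)
    worst = 0.0
    for alpha in product(range(p), repeat=n):
        if not any(alpha):
            continue
        total = sum(omega ** (sum(a * v for a, v in zip(alpha, vec)) % p) for vec in S)
        worst = max(worst, abs(total) / len(S))
    return worst

# Tiny example: n = 3, p = 7 should give bias at most (n-1)/(p-1) = 1/3.
S = alon_mansour_set(3, 7)
print(len(S), bias_mod_p(S, 3, 7))  # 36, some value <= 0.333...
```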

In a later blog post, we will show how this simple construction can be used to derandomize Mansour’s algorithm for interpolating sparse polynomials (1992).

Unfortunately, this construction is useless unless n<p, and in particular, it’s useless when p=2, the case that the initial study of bias focused on. That said, we can slightly tweak the argument by replacing \mathbb{Z}_p with the field \mathbb{F}_{2^m} and v_{x,y} with (<y,1>, <y,x^1>,...,<y,x^{n-1}>), where x,y range over \mathbb{F}_{2^m} and we’ve naturally identified \mathbb{F}_{2^m} with the vector space \{0,1\}^m to obtain an inner product (so that addition in the field is coordinatewise XOR), getting one of AGHP’s original constructions:

Theorem 2: There is an \frac{n-1}{2^m}-biased set of size 2^{2m}.
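
Here is a minimal sketch of this \mathbb{F}_{2^m} “powering” construction for m = 4, with the irreducible polynomial x^4 + x + 1 hard-coded as our (standard, but arbitrary) choice for the field arithmetic; field elements are m-bit integers and the inner product is the parity of the bitwise AND. The bias of the resulting set can be checked with the brute-force helper from earlier.

```python
M = 4
MODULUS = 0b10011  # x^4 + x + 1, a standard irreducible polynomial over GF(2) (our choice)

def gf_mul(a, b):
    """Multiply two elements of GF(2^M), represented as M-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):       # reduce modulo x^4 + x + 1
            a ^= MODULUS
    return result

def dot(a, b):
    """Inner product of two field elements viewed as bit vectors in {0,1}^M."""
    return bin(a & b).count("1") % 2

def aghp_power_set(n):
    """The 2^(2M) vectors ( <y, x^0>, <y, x^1>, ..., <y, x^(n-1)> ) for x, y in GF(2^M)."""
    vectors = []
    for x in range(1 << M):
        for y in range(1 << M):
            powers, cur = [], 1
            for _ in range(n):
                powers.append(cur)
                cur = gf_mul(cur, x)
            vectors.append(tuple(dot(y, xe) for xe in powers))
    return vectors

# For n = 6 the guarantee is bias <= (n-1)/2^M = 5/16; the set has 2^(2M) = 256 vectors.
S = aghp_power_set(6)
print(len(S))  # 256
```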

The next construction by AGHP is motivated by linear feedback shift registers (LFSRs) and likewise gets \epsilon-bias out of a set of size roughly (n/\epsilon)^2:

LFSRs and another construction by AGHP

Our analysis of the bias in the following construction follows from two facts about polynomials over \mathbb{F}_2: 1) the number of irreducible monic polynomials of degree m is (1+O(2^{-m/2}))\frac{2^m}{m}\approx\frac{2^m}{m}, and 2) (more obviously) at most \frac{n-1}{m} irreducible monic polynomials of degree m can divide a given nonzero polynomial of degree at most n-1.

Now consider the following construction inspired by LFSRs and more fundamentally by polynomial long division:

Definition 3: For start sequence s\in\{0,1\}^m and feedback rule f\in\{0,1\}^m, the \emph{shift register sequence generated by f and s} is the bitstring r\in\{0,1\}^n where r_i = s_i for 0\le i<m and r_i = \sum^{m-1}_{j=0}f_jr_{i-m+j} \bmod 2 for i\ge m.

The point is that if we think of s_i as the monomial t^i in some indeterminate t, so that each r_i becomes a polynomial in t, then r_i as a polynomial in t is precisely the remainder of t^i when divided by f(t) = t^m + \sum^{m-1}_{j=0}f_jt^j.
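
A small sketch illustrating this: we generate the shift register sequence by the recurrence of Definition 3 for an arbitrarily chosen f and s, and double-check that each r_i agrees with pairing s against the remainder of t^i modulo f(t). The particular f and s below are just examples of ours.

```python
def shift_register(f, s, n):
    """Shift register sequence of length n over GF(2) from feedback rule f and start s
    (both length-m bit lists), following r_i = sum_j f_j * r_{i-m+j} (mod 2) for i >= m."""
    m = len(f)
    r = list(s)
    for i in range(m, n):
        r.append(sum(f[j] * r[i - m + j] for j in range(m)) % 2)
    return r

def remainder_of_ti(f, i):
    """Coefficients of t^i mod f(t), where f(t) = t^m + sum_j f_j t^j, as a length-m bit list."""
    m = len(f)
    if i < m:
        rem = [0] * m
        rem[i] = 1
        return rem
    rem = remainder_of_ti(f, i - 1)
    top = rem[m - 1]                      # multiply by t, then reduce t^m -> sum_j f_j t^j
    rem = [0] + rem[:-1]
    if top:
        rem = [(c ^ fj) for c, fj in zip(rem, f)]
    return rem

# Example with m = 4, n = 12; f corresponds to f(t) = t^4 + t + 1 (assumed irreducible).
f, s, n = [1, 1, 0, 0], [0, 1, 1, 0], 12
r = shift_register(f, s, n)
assert all(r[i] == sum(c * sj for c, sj in zip(remainder_of_ti(f, i), s)) % 2 for i in range(n))
print(r)
```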

Given this view, we will only concern ourselves with feedback rules such that f(t) is irreducible. These are called \emph{non-degenerate feedback rules}.

Theorem 3: The set A^{2m}_n (of size \frac{2^{2m}}{m}(1+O(2^{-m/2})) by fact 1 above) of all shift register sequences generated by any starting sequence s\in\{0,1\}^m and nondegenerate feedback rule f\in\{0,1\}^m is an \frac{n-1}{2^m}(1+O(2^{-m/2}))-biased set.

Proof: By our discussion above, <\alpha,r> = \sum^{n-1}_{i=0}\alpha_ir_i is, as a function of the start sequence s, a linear combination of the s_j’s whose coefficient vector is the remainder of \sum^{n-1}_{i=0}\alpha_it^i modulo f(t); in particular, the coefficients are all zero iff \sum^{n-1}_{i=0}\alpha_it^i\equiv 0\pmod{f(t)}. If the coefficients are not all zero, then <\alpha,r> is a nontrivial linear combination of the uniformly random bits s_j and thus contributes zero bias. So the bias is precisely the probability that a randomly chosen irreducible monic polynomial f(t) of degree m divides \sum^{n-1}_{i=0}\alpha_it^i. By facts 1) and 2) above, we’re done.
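
For completeness, here is a self-contained brute-force sketch of the construction behind Theorem 3 for toy parameters: it enumerates the nondegenerate feedback rules of degree m by trial division over \mathbb{F}_2 and collects all resulting shift register sequences. The names and parameter choices are ours, and the approach is only feasible for very small m and n.

```python
from itertools import product

def is_irreducible(f, m):
    """Trial division: a degree-m polynomial (integer bitmask f, bit i = coeff of t^i)
    is irreducible over GF(2) iff no polynomial of degree 1..m//2 divides it."""
    for g in range(2, 1 << (m // 2 + 1)):
        rem = f
        while rem and rem.bit_length() >= g.bit_length():
            rem ^= g << (rem.bit_length() - g.bit_length())
        if rem == 0:
            return False
    return True

def lfsr_sequence(f_bits, s, n, m):
    """Shift register sequence from feedback rule f (low m coefficients of f(t)) and start s."""
    r = list(s)
    for i in range(m, n):
        r.append(sum(((f_bits >> j) & 1) * r[i - m + j] for j in range(m)) % 2)
    return tuple(r)

def aghp_lfsr_set(n, m):
    """All sequences generated by a nondegenerate feedback rule and any start sequence."""
    vectors = []
    for f_bits in range(1 << m):
        if is_irreducible(f_bits | (1 << m), m):   # f(t) = t^m + sum_j f_j t^j
            for s in product((0, 1), repeat=m):
                vectors.append(lfsr_sequence(f_bits, s, n, m))
    return vectors

# Toy parameters (ours): m = 4, n = 8.  There are 3 irreducible degree-4 polynomials over
# GF(2), so the set has 3 * 2^4 = 48 vectors, with guaranteed bias roughly (n-1)/2^m.
print(len(aghp_lfsr_set(8, 4)))  # 48
```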

There is one more \epsilon-biased set of comparable size due to AGHP, based on the “randomness” of quadratic characters via Weil’s character sum estimate, that we will present in our next blog post. For now, we see how to improve to linear dependence on n at the cost of worse (but still polynomial) dependence on 1/\epsilon.

Naor/Naor’s Construction

The argument here is much more involved and takes place in two steps: 1) construct a polynomial-sized family \mathcal{F} of bitvectors such that for any test vector \alpha, the probability that x\leftarrow\mathcal{F} is a distinguisher is at least some constant \beta, 2) using a walk on an expander, sample enough vectors r_1,...,r_l from \mathcal{F} such that for any \alpha, with probability at most \epsilon all of the <\alpha,r_i> are 0.

Proposition 1: The uniform distribution over the sums \sum_{i\in S} r_i, for subsets S\subseteq[l], of the vectors r_1,...,r_l generated by our expander walk is an \epsilon-biased space.

Proof: Since the r_i fail to contain a distinguisher of \alpha with probability at most \epsilon, it suffices to show that whenever some r_j is a distinguisher of \alpha, a random subset S\subseteq[l] yields a distinguisher \sum_{i\in S} r_i with probability exactly 1/2. Condition on S\setminus\{j\}. If \sum_{i\in S\setminus\{j\}}r_i is a distinguisher, then with probability 1/2, j is not in S; if it is not a distinguisher, then with probability 1/2, j is in S, and it is easy to check that the sum of a distinguisher and a non-distinguisher is a distinguisher, so we’re done.
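
The parity argument in this proof can be checked by brute force: for any vectors r_1,...,r_l at least one of which distinguishes \alpha, exactly half of the subsets S\subseteq[l] give a distinguishing sum. A tiny sketch of ours with arbitrary example vectors:

```python
from itertools import product

def distinguishes(x, alpha):
    """True if <alpha, x> != 0 over GF(2)."""
    return sum(a * b for a, b in zip(alpha, x)) % 2 == 1

# Arbitrary example: l = 4 vectors of length 5, and a test vector alpha that
# at least one of them distinguishes.
rs = [(1, 0, 1, 0, 0), (0, 1, 1, 1, 0), (0, 0, 0, 0, 1), (1, 1, 0, 0, 1)]
alpha = (1, 0, 0, 1, 1)
assert any(distinguishes(r, alpha) for r in rs)

good = 0
for S in product((0, 1), repeat=len(rs)):           # indicator vector of the subset S
    total = tuple(sum(s * r[i] for s, r in zip(S, rs)) % 2 for i in range(len(alpha)))
    good += distinguishes(total, alpha)
print(good, "of", 2 ** len(rs), "subsets distinguish")  # exactly half of them
```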

We next show that we can generate such a collection of vectors r_1,...,r_l using a random walk on an expander. From step 1), we have a probability \beta of picking a distinguisher from \mathcal{F}, and we’d like to reduce the probability of failing to draw a distinguisher to \epsilon over l steps. See \url{https://formalreasons.wordpress.com/2013/12/25/random-efficient-error-reduction-part-2-expander-graphs/} for our discussion from a while back about how to do this with an expander walk. The only subtlety in applying this method is that \beta is a bit too small, so we let each vertex of our expander correspond to a set of h samples from \mathcal{F} for h a sufficiently large constant. By our discussion about error reduction with expanders, we need O(\log|\mathcal{F}|) + \log d\cdot O(\log(1/\epsilon)) random bits, where d is the degree of the expander.

Finally, we need to show that we can construct \mathcal{F}. There is a very nice way to do this using Justesen codes, but here we present an argument that does not depend on the fact that we’re working with \mathbb{Z}_2.

At a high level, we will split the possible test vectors \alpha into buckets based on the number of nonzero coordinates, sample a candidate distinguisher for each scenario, and add together a random subcollection of these candidates so that with constant probability what we end up with distinguishes the actual test vector \alpha. Throughout, we’d like to use k-wise rather than complete independence to save on randomness.

Our buckets B_i will be the set of \alpha for which the number l of nonzero coordinates is in [2^{i-1},2^i]. Fix such a bucket B_i and say that we know our test vector comes from that bucket. For convenience, denote 2^{i-1} by k. We would like to somehow remove all but a constant c nonzero coordinates from \alpha and then generate our distinguisher c-wise independently to obtain a distinguisher with probability 1/2. Because we don’t actually know our adversary’s test vector, however, we achieve the same effect by applying an approximate version of this removal to a c-wise independently generated vector.

Specifically, using O(\log n) random bits, pick a bitstring u\in\{0,1\}^n whose entries are individually uniform but jointly c-wise independent, and let w\in\{0,1\}^n be the “mask” that approximates the removal procedure: a bitstring whose entries are jointly pairwise independent but such that \Pr(w_i = 1) = 2/k for each 1\le i\le n.
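
To make the mask concrete, here is one possible way (a sketch of ours, not necessarily the construction from the paper) to get pairwise independent bits with marginal 2/k from a random \mathbb{F}_2-affine hash, assuming k is a power of two as in our buckets. For brevity the sketch draws u with full independence; the actual construction would use a c-wise independent family for u to stay within O(\log n) random bits.

```python
import random

def pairwise_independent_mask(n, k):
    """Mask w in {0,1}^n with Pr[w_i = 1] = 2/k and pairwise independent entries,
    built from a random GF(2)-affine hash h(i) = A*i + b on s-bit indices:
    w_i = 1 iff the first log2(k/2) output bits of h(i) all vanish.
    Assumes k is a power of 2 with k/2 <= 2^s; uses only about s^2 + s random bits."""
    s = max((n - 1).bit_length(), 1)
    A = [random.getrandbits(s) for _ in range(s)]   # random s x s matrix over GF(2), one row per int
    b = random.getrandbits(s)                       # random offset vector
    zero_bits = (k // 2 - 1).bit_length()           # log2(k/2)
    w = []
    for i in range(n):
        h = [(bin(row & i).count("1") + ((b >> j) & 1)) % 2 for j, row in enumerate(A)]
        w.append(int(all(bit == 0 for bit in h[:zero_bits])))
    return w

n, k = 64, 8
w = pairwise_independent_mask(n, k)
u = [random.getrandbits(1) for _ in range(n)]   # stand-in: the real construction would use a
                                                # c-wise independent family here to save bits
r = [ui & wi for ui, wi in zip(u, w)]           # r as in Proposition 2 below
print(sum(w), "coordinates survive the mask; expected around", 2 * n // k)
```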

Proposition 2: Define the bitstring r by r_i = u_i \wedge w_i. Then r is a distinguisher of \alpha with probability at least some positive constant.

Proof: This is equivalent to saying that \alpha' defined by \alpha'_i = \alpha_i \wedge w_i is distinguished by u with at least constant probability. Let h be the random variable counting the nonzero coordinates of \alpha'. Conditioned on 0<h\le c, the c-wise independence of u makes <\alpha',u> a uniform bit, so it suffices to show that \Pr(0<h\le c) is at least a constant for some constant c chosen at the beginning. By pairwise independence of the w_i, E[h] = 2l/k\in[2,4] and Var[h]\le E[h], and a bit of calculation using Chebyshev’s inequality gives that for a suitably large constant c, our h falls in the desired range with constant probability, and we’re done.
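
Spelling out one way to do the Chebyshev calculation (with deliberately loose constants, and only as a sketch): since l\in[k,2k] and \Pr(w_i=1)=2/k, we have E[h] = 2l/k\in[2,4] and Var[h]\le E[h]\le 4. Since h = 0 forces |h-E[h]|\ge E[h], Chebyshev gives \Pr(h=0)\le\frac{Var[h]}{E[h]^2}\le\frac{1}{E[h]}\le\frac{1}{2}, and likewise \Pr(h>c)\le\frac{Var[h]}{(c-E[h])^2}\le\frac{4}{(c-4)^2} for any c>4. Taking, say, c = 8 gives \Pr(0<h\le c)\ge 1-\frac{1}{2}-\frac{1}{4} = \frac{1}{4}, a positive constant, which is all we need.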

We generate one such r for each of the \log n buckets, using the same random bits to generate the w’s for each bucket. We finish the proof by noting that, by the same argument as in Proposition 1, the sum of a random subcollection of these r’s is a distinguisher of \alpha with probability at least half the constant from Proposition 2; this is the constant \beta we wanted in step 1).

In total, we thus require O(\log n + \log(1/\epsilon)) random bits, giving an \epsilon-biased set of size \frac{n}{\text{poly}(\epsilon)}.

Conclusion

So far, we’ve seen that we can achieve small bias using polynomials, k-wise independence, linear codes, and expander graphs. In the next post, we’ll look at exploiting the randomness of quadratic characters and examine a third construction by AGHP using this, as well as an extension, due to Azar/Motwani/Naor, of this character approach to \mathbb{Z}^n_p.
