# Some Elementary Constructions of Small-Bias Spaces, Part 1

A small-bias space is a distribution over the Boolean hypercube $\{0,1\}^n$ that fools parity tests. Since their introduction by Naor/Naor, these objects have been of central importance to the study of pseudorandomness and have found applications in constructing pseudorandom generators that fool small-width branching programs, derandomizing certain algorithms, constructing $(k,\epsilon)$-independent hash functions, and obtaining short PCPs.

In this post we examine four constructions, including an extremely simple construction over $\mathbb{Z}^n_p$ by Alon/Mansour (2002) which generalizes one due to Alon/Goldreich/Hastad/Peralta (AGHP, 1992), another construction by AGHP based on linear feedback shift registers, and Naor/Naor’s original construction (1990) based on $k$-wise independence (or alternatively Justesen codes) and random walks on expander graphs.

## First definitions and Alon/Mansour’s $\epsilon$-biased set

We first formalize the notion of bias over $\mathbb{Z}^n_p$ and present Alon/Mansour’s construction before restricting our attention to the case of $p = 2$ for our remaining three constructions.

Definition 1: A probability distribution $X$ over $\mathbb{Z}^n_p$ is an $\epsilon$-biased space if $\left|E_{x\leftarrow X}\left[\omega^{\langle\alpha,x\rangle}\right]\right|\le\epsilon$ for all nonzero “test vectors” $\alpha\in\mathbb{Z}^n_p$. Here, $\omega$ is the primitive $p$th root of unity $e^{2\pi i/p}$.

In particular, an equivalent condition in the case of $p = 2$ is that for any nonzero $\alpha\in\{0,1\}^n$, $\left|\Pr_{x\leftarrow X}(\langle\alpha,x\rangle = 0) - \Pr_{x\leftarrow X}(\langle\alpha,x\rangle = 1)\right|\le\epsilon$. We call any $x$ for which $\langle\alpha,x\rangle\neq 0$ a distinguisher for $\alpha$.
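To make the $p = 2$ definition concrete, here is a small brute-force check (a sketch; the function name and the tiny example sets are ours) that computes the maximum bias of a set over all nonzero parity tests:

```python
from itertools import product

def max_bias(S, n):
    """Maximum over nonzero alpha of |E_{x in S}[(-1)^<alpha,x>]|."""
    worst = 0.0
    for alpha in product((0, 1), repeat=n):
        if not any(alpha):
            continue  # skip the zero test vector
        e = sum((-1) ** (sum(a * x for a, x in zip(alpha, v)) % 2) for v in S)
        worst = max(worst, abs(e) / len(S))
    return worst

# The full hypercube is perfectly unbiased; a single point has bias 1.
cube = list(product((0, 1), repeat=3))
assert max_bias(cube, 3) == 0.0
assert max_bias([(0, 0, 0)], 3) == 1.0
```

The constructions below aim to achieve small bias with exponentially fewer points than the full hypercube.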

Definition 2: A set $S\subset\mathbb{Z}^n_p$ is an $\epsilon$-biased set if the uniform distribution over $S$ is an $\epsilon$-biased space.

Alon/Mansour give an $\frac{n-1}{p-1}$-biased set of size $(p-1)^2$.

For $x,y\in\mathbb{Z}^*_p$, let $v_{x,y}$ denote the tuple $(y,xy,x^2y,...,x^{n-1}y)$ and $A^n_p$ the collection of all such $v_{x,y}$ for $x,y\in\mathbb{Z}^*_p$.

Theorem 1: $A^n_p$ is an $\frac{n-1}{p-1}$-biased set.

Proof: From our definitions, the bias of $A^n_p$ with respect to a test vector $\alpha$ is $\frac{1}{(p-1)^2}\left|\sum_{x,y\in\mathbb{Z}^*_p}\omega^{\langle\alpha,v_{x,y}\rangle}\right|$. We can group the summands by the exponent of $\omega$ to get $\frac{1}{(p-1)^2}\left|\sum_{a\in\mathbb{Z}_p}n_{a,\alpha}\omega^{a}\right|$, where $n_{a,\alpha}$ counts the pairs $(x,y)$ with $\langle\alpha,v_{x,y}\rangle = a$; the key observation is that massive cancellation happens among the terms with $a\neq 0$, so they contribute almost nothing to the bias. Indeed, $\langle\alpha,v_{x,y}\rangle = y\sum_i\alpha_ix^i$, so for a fixed nonzero exponent $a$ and nonzero test vector $\alpha$, each $x$ with $\sum_i\alpha_ix^i\neq 0$ admits exactly one $y$ such that $y\sum_i\alpha_ix^i = a$; hence $n_{a,\alpha} = n_{a',\alpha}$ for all nonzero $a, a'$. For $a = 0$, the polynomial $\sum_i\alpha_ix^i$ has at most $n-1$ roots, so $n_{0,\alpha}\le (n-1)(p-1)$. So because $\sum^{p-1}_{i=1}\omega^i = -1$, our sum reduces to $n_{0,\alpha}-n_{a,\alpha}$ for any fixed $a\neq 0$, and since also $n_{a,\alpha}\le p-1$, this difference is at most $(n-1)(p-1)$ in absolute value, concluding the proof.
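As a sanity check, one can construct $A^n_p$ for tiny parameters and verify the bound of Theorem 1 by brute force (a sketch; helper names are ours):

```python
from cmath import exp, pi
from itertools import product

def am_set(n, p):
    """Alon/Mansour set: all tuples (y, xy, ..., x^{n-1} y) mod p, x, y in Z_p^*."""
    return [tuple(pow(x, i, p) * y % p for i in range(n))
            for x in range(1, p) for y in range(1, p)]

def max_bias_mod_p(S, n, p):
    """Maximum over nonzero alpha in Z_p^n of |E[omega^<alpha,x>]|."""
    omega = exp(2j * pi / p)
    worst = 0.0
    for alpha in product(range(p), repeat=n):
        if not any(alpha):
            continue
        e = sum(omega ** (sum(a * v for a, v in zip(alpha, x)) % p) for x in S)
        worst = max(worst, abs(e) / len(S))
    return worst

n, p = 3, 5
S = am_set(n, p)
assert len(S) == (p - 1) ** 2
assert max_bias_mod_p(S, n, p) <= (n - 1) / (p - 1) + 1e-9
```

(The `1e-9` slack only absorbs floating-point error in the complex arithmetic.)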

In a later blog post, we will show how this simple construction can be used to derandomize Mansour’s algorithm for interpolating sparse polynomials (1992).

Unfortunately, this construction is useless unless $n < p$, and in particular, it’s useless when $p=2$, the case on which the initial study of bias focused. That said, we can slightly tweak the argument by replacing $p$ with $2^m+1$ and $v_{x,y}$ with $(\langle x^0,y\rangle, \langle x^1,y\rangle,...,\langle x^{n-1},y\rangle)$, where we’ve naturally identified $\mathbb{Z}^*_p$ with the vector space $\{0,1\}^m$ to obtain an inner product, getting one of AGHP’s original constructions:

Theorem 2: There is an $\frac{n-1}{2^m}$-biased set of size $2^{2m}$.

The next construction by AGHP is motivated by linear feedback shift register sequences (LFSRs) and likewise gets $\epsilon$-bias out of a set of size roughly $(n/\epsilon)^2$:

## LFSRs and another construction by AGHP

Our analysis of the bias in the following construction follows from two facts about polynomials over $\mathbb{F}_2$: 1) the number of irreducible monic polynomials of degree $m$ is $(1+O(2^{-m/2}))\frac{2^m}{m}\approx\frac{2^m}{m}$, and 2) (more obviously) there are at most $\frac{n-1}{m}$ irreducible monic polynomials of degree $m$ dividing any given polynomial of degree $n-1$.

Now consider the following construction inspired by LFSRs and more fundamentally by polynomial long division:

Definition 3: For start sequence $s\in\{0,1\}^m$ and feedback rule $f\in\{0,1\}^m$, the \emph{shift register sequence generated by $f$ and $s$} is the bitstring $r\in\{0,1\}^n$ where $r_i = s_i$ for $0\le i < m$ and $r_i = \sum^{m-1}_{j=0}f_jr_{i-m+j}$ for $i\ge m$.

The point is that if we think of each $s_i$ as the monomial $t^i$ in some indeterminate $t$, so that each $r_i$ becomes a polynomial in $t$, then $r_i$ as a polynomial in $t$ is precisely the remainder of $t^i$ upon division by $f(t) = t^m + \sum^{m-1}_{j=0}f_jt^j$.

Given this view, we will only concern ourselves with feedback rules such that $f(t)$ is irreducible. These are called \emph{non-degenerate feedback rules}.

Theorem 3: The set $A^{2m}_n$ (of size $\frac{2^{2m}}{m}(1+O(2^{-m/2}))$ by fact 1 above) of all shift register sequences generated by any starting sequence $s\in\{0,1\}^m$ and non-degenerate feedback rule $f\in\{0,1\}^m$ is an $\frac{n-1}{2^m}(1+O(2^{-m/2}))$-biased set.

Proof: By our discussion above, $\langle\alpha,r\rangle = \sum^{n-1}_{i=0}\alpha_ir_i$ is, as a function of the starting sequence, a linear combination of remainders with respect to $f(t)$, with coefficients all zero iff $\sum^{n-1}_{i=0}\alpha_it^i\equiv 0\pmod{f(t)}$. If $\langle\alpha,r\rangle$ is a linear combination with coefficients not all zero, it’s some nontrivial linear combination of the uniformly random bits $s_i$ and thus has zero bias. So the bias is precisely the probability that a randomly chosen irreducible monic polynomial $f(t)$ of degree $m$ divides $\sum^{n-1}_{i=0}\alpha_it^i$. By facts 1) and 2) above, we’re done.
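For small parameters the whole construction fits in a few lines, and we can verify the bias empirically (a sketch; the bitmask encoding of polynomials and all helper names are ours). With $m=4$, $n=6$ there are exactly three irreducible monic polynomials of degree 4 over $\mathbb{F}_2$, at most one of which can divide a nonzero polynomial of degree at most 5, so the worst-case bias comes out to exactly $1/3$, on the order of $n/2^m$:

```python
from itertools import product

def gf2_mod(a, b):
    """Remainder of polynomial a modulo b over GF(2); ints encode coefficients."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def irreducible(f, m):
    """A degree-m polynomial is irreducible iff nothing of degree <= m//2 divides it."""
    return all(gf2_mod(f, g) != 0 for g in range(2, 1 << (m // 2 + 1)))

def sr_seq(f, s, n, m):
    """Shift register sequence: r_i = s_i for i < m, r_i = sum_j f_j r_{i-m+j} after."""
    r = list(s)
    for i in range(m, n):
        r.append(sum((f >> j & 1) * r[i - m + j] for j in range(m)) % 2)
    return tuple(r)

m, n = 4, 6
rules = [f | 1 << m for f in range(1 << m) if irreducible(f | 1 << m, m)]
assert len(rules) == 3  # x^4+x+1, x^4+x^3+1, x^4+x^3+x^2+x+1
S = [sr_seq(f, s, n, m) for f in rules for s in product((0, 1), repeat=m)]

# empirical bias over all nonzero parity tests
worst = max(abs(sum((-1) ** (sum(a * b for a, b in zip(alpha, r)) % 2) for r in S))
            / len(S)
            for alpha in product((0, 1), repeat=n) if any(alpha))
assert abs(worst - 1 / 3) < 1e-9
```

Note that $1/3$ slightly exceeds $(n-1)/2^m = 5/16$, which is exactly the role of the $(1+O(2^{-m/2}))$ factor in Theorem 3.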

There is one more $\epsilon$-biased set of comparable size due to AGHP, which uses the “randomness” of quadratic characters via Weil’s character sum estimate; we will present it in our next blog post. For now, we see how to achieve linear dependence on $n$ at the cost of a worse (but still polynomial) dependence on $1/\epsilon$.

## Naor/Naor’s Construction

The argument here is much more involved and takes place in two steps: 1) construct a polynomial-sized family $\mathcal{F}$ of bitvectors such that for any test vector $\alpha$, the probability that $x\leftarrow\mathcal{F}$ is a distinguisher is at least some constant $\beta$; 2) using a walk on an expander, sample enough vectors $r_1,...,r_l$ from $\mathcal{F}$ that for any $\alpha$, the probability that all of the $\langle\alpha,r_i\rangle$ are 0 is at most $\epsilon$.

Proposition 1: The set of all subset sums $\sum_{i\in S} r_i$, $S\subseteq[l]$, of the vectors $r_1,...,r_l$ generated by our expander walk is an $\epsilon$-biased set.

Proof: It suffices to show that when there is some distinguisher $r_j$ of $\alpha$, then for a random subset $S\subseteq[l]$, $\sum_{i\in S} r_i$ is a distinguisher with probability 1/2: then a random subset sum distinguishes $\alpha$ with probability at least $(1-\epsilon)/2$, which bounds the bias by $\epsilon$. Condition on $S\setminus\{j\}$. If $\sum_{i\in S\setminus\{j\}}r_i$ is a distinguisher, then with probability 1/2, $j$ is not in $S$ and the sum remains a distinguisher. If $\sum_{i\in S\setminus\{j\}}r_i$ is not a distinguisher, then with probability 1/2, $j$ is in $S$, and it is easy to check that the sum of a distinguisher and a non-distinguisher is a distinguisher, so we’re done.
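The counting behind this proof is easy to check exhaustively: fix any test vector and any collection containing at least one distinguisher, and exactly half of the subset sums distinguish (the particular vectors below are our own toy example):

```python
def parity(alpha, x):
    """Inner product <alpha, x> over GF(2)."""
    return sum(a * b for a, b in zip(alpha, x)) % 2

alpha = (1, 0, 1)
rs = [(1, 1, 0), (0, 1, 0), (1, 1, 1)]  # only the first distinguishes alpha
assert [parity(alpha, r) for r in rs] == [1, 0, 0]

hits = 0
for mask in range(1 << len(rs)):       # enumerate all subsets S of [l]
    total = [0] * len(alpha)
    for i, r in enumerate(rs):
        if mask >> i & 1:
            total = [(t + b) % 2 for t, b in zip(total, r)]
    hits += parity(alpha, total)
assert hits == (1 << len(rs)) // 2     # exactly half the subset sums distinguish
```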

We next show that we can generate such a collection of vectors $r_1,...,r_l$ using a random walk on an expander. From step 1), we have probability $\beta$ of picking a distinguisher from $\mathcal{F}$, and we’d like to reduce the probability of failing to draw a distinguisher to $\epsilon$ over $l$ steps. See \url{https://formalreasons.wordpress.com/2013/12/25/random-efficient-error-reduction-part-2-expander-graphs/} for our discussion from a while back about how to do this with an expander walk. The only subtlety in applying this method is that $\beta$ is a bit too small, so we should let each vertex of our expander correspond to a set of $h$ assignments for $h$ sufficiently large. By our discussion about error reduction with expanders, we need $\log|\mathcal{F}| + O(\log(1/\epsilon))\cdot\log d$ random bits, where $d$ is the degree of the expander.

Finally, we need to show that we can construct $\mathcal{F}$. There is a very nice way to do this using Justesen codes, but here we present an argument that does not depend on the fact that we’re working with $\mathbb{Z}_2$.

At a high level, we will split the possible test vectors $\alpha$ into buckets based on the number of nonzero coordinates, sample with high probability a distinguisher for each scenario, and add a random subcollection of these distinguishers together so that with high probability what we end up with distinguishes the actual test vector $\alpha$. Throughout, we’d like to use $k$-wise rather than complete independence to save on randomness.

Our buckets $B_i$ will be the sets of $\alpha$ for which the number $l$ of nonzero coordinates lies in $[2^{i-1},2^i]$. Fix such a bucket $B_i$ and suppose we know our test vector comes from that bucket. For convenience, denote $2^{i-1}$ by $k$. We would like to somehow remove all but a constant number $c$ of the nonzero coordinates from $\alpha$ and then generate our distinguisher $c$-wise independently to obtain a distinguisher with probability $1/2$. Because we don’t actually know our adversary’s test vector, however, we achieve the same effect by approximating the removal with a random mask applied to a $c$-wise independently generated vector.

Specifically, using $O(\log n)$ random bits, pick a bitstring $u\in\{0,1\}^n$ whose entries are individually uniform but jointly $c$-wise independent, and let $w\in\{0,1\}^n$ be the “mask” that approximates the removal procedure: a bitstring whose entries are jointly pairwise independent but satisfy $\Pr(w_i = 1) = 2/k$ for each $1\le i\le n$.

Proposition 2: Define the bitstring $r$ by $r_i = u_i \wedge w_i$. Then $r$ is a distinguisher of $\alpha$ with probability at least 1/4.

Proof: This is equivalent to saying that $\alpha'$ defined by $\alpha'_i = \alpha_i \wedge w_i$ is distinguished by $u$ with probability at least 1/4. Let $h$ be the random variable counting the nonzero coordinates of $\alpha'$. Conditioned on $0 < h\le c$, the $c$-wise independence of $u$ makes $\langle u,\alpha'\rangle$ uniform, so it suffices to show that $\Pr(0 < h\le c)$ is at least 1/2 for the constant $c$ chosen at the beginning. By pairwise independence of the $w_i$, writing $p = 2/k$, we have $E[h] = pl\in[2,4]$ and $\mathrm{Var}[h] = p(1-p)l$, and a bit of calculation using Chebyshev’s inequality gives that for a suitable constant $c$, our $h$ falls in the desired range with probability at least 1/2, and we’re done.
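To see why the mask works, here is an exact back-of-the-envelope computation, simplified (our simplification, purely for illustration) by taking the mask bits fully independent rather than merely pairwise independent; the cutoff $c$ enters the real proof only because $u$ is merely $c$-wise independent, while a fully uniform $u$ distinguishes with probability exactly $1/2$ as soon as any nonzero coordinate survives:

```python
# alpha has l nonzero coordinates with k <= l <= 2k; each mask bit w_i is 1
# with probability 2/k. With fully independent mask bits, the chance the mask
# kills every nonzero coordinate is (1 - 2/k)^l <= e^{-2}, so a uniform u
# distinguishes with probability (1 - (1 - 2/k)^l) / 2 >= 1/4.
for k in (8, 16, 64):
    for l in range(k, 2 * k + 1):
        pr_all_dead = (1 - 2 / k) ** l
        pr_distinguish = (1 - pr_all_dead) / 2
        assert pr_distinguish >= 1 / 4
```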

We generate one such $r$ for each of the $\log n$ buckets, using the same random bits to generate the $w$’s for each bucket. We finish the proof by noting that, by the same argument as in Proposition 1, the sum of a random subcollection of these $r$’s is a distinguisher of $\alpha$ with probability at least $(1/4)\cdot(1/2) = 1/8$.

In total, we thus require $O(\log n + \log(1/\epsilon))$ random bits, giving an $\epsilon$-biased set of size $\frac{n}{\text{poly}(\epsilon)}$.

## Conclusion

So far, we’ve seen that we can achieve small bias using polynomials, $k$-wise independence, linear codes, and expander graphs. In the next post, we’ll look at exploiting the randomness of quadratic characters, examining a third construction by AGHP that uses this idea, as well as an extension of the character approach to $\mathbb{Z}^n_p$ due to Azar/Motwani/Naor.