Randomness Extractors and Relationships Between Pseudorandom Objects

“Everything is vitally connected, not merely artificially related, so that when I see anything, I am seeing everything. When I speak to anyone, I am speaking to everybody. When I touch anything, I touch all things, and when I know one thing, I know everything.” – Brihadaranyaka Upanishad

In this post, I’ll cover some of what I think are the coolest ideas in Vadhan’s survey: the principles unifying the main pseudorandom objects studied there. We will continue where we left off in the last post by illustrating a connection between list encoding and expanders. We then introduce the concept of randomness extractors and relate these to hash functions, expanders, and list encoding.

NOTE: This post only scratches the surface of what Vadhan covers here.

List Encoding and Expanders

For $\Gamma: [M] \times[D]\to[N]$, a target subset $T\subset [N]$, and probability parameter $\epsilon>0$, define $\ell_{\Gamma}(T,\epsilon)$ to be the set of $x$ for which $P_y(\Gamma(x,y)\in T)>\epsilon$. Likewise, for a “filtering” function $f: [N]\to[0,1]$, define $\ell_{\Gamma}(f,\epsilon)$ to be the set of $x$ for which $E_y[f(\Gamma(x,y))]>\epsilon$. Taking $f$ to be the characteristic function of $T$, we recover $\ell_{\Gamma}(T,\epsilon)$.
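Both notions are easy to compute by brute force for any finite $\Gamma$. A quick sketch (the toy map `gamma` and all function names here are my own, not from the survey):

```python
from fractions import Fraction

def list_set(gamma, M, D, T, eps):
    """l_Gamma(T, eps): all x in [M] whose random neighbor Gamma(x, y)
    lands in T with probability greater than eps (y uniform over [D])."""
    return {x for x in range(M)
            if Fraction(sum(gamma(x, y) in T for y in range(D)), D) > eps}

def list_fn(gamma, M, D, f, eps):
    """l_Gamma(f, eps): all x with E_y[f(gamma(x, y))] > eps,
    for a [0,1]-valued filtering function f."""
    return {x for x in range(M)
            if Fraction(sum(f(gamma(x, y)) for y in range(D)), D) > eps}

# Sanity check: with f the characteristic function of T, the two notions agree.
gamma = lambda x, y: (x * y + x + y) % 7   # arbitrary toy map [5] x [3] -> [7]
T = {0, 2, 3}
f = lambda z: 1 if z in T else 0
assert list_set(gamma, 5, 3, T, Fraction(1, 2)) == list_fn(gamma, 5, 3, f, Fraction(1, 2))
```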

As a sanity check, taking $\Gamma$ for a given encoding $E: [M]\to [q]^D$ to be given by $\Gamma(x,y) = (y,E(x)_y)$, where $E(x)_y$ denotes the $y$th symbol of $E(x)$ and where we view $[q]\times[D]$ as $[N] = [qD]$, we readily get out another characterization of list-decodability:

Result 1: For an $r\in [q]^D$, define $T_r$ to be the set of all pairs $(y,r_y)$ for $y\in[D]$. Then the encoding $E$ is $(1-\epsilon,K)$-list decodable iff $|\ell_{\Gamma}(T_r,\epsilon)|\le K$ for all $r$.

Now taking $\Gamma$ to be the neighbor function for a $D$-regular bipartite graph $G$ with $M$ left vertices and $N$ right vertices, i.e. so that $\Gamma(x,y)$ is the $y$th neighbor of left-vertex $x$, we readily get another definition of vertex expansion in the case that only sets of exactly $K$ vertices expand; we will refer to such graphs as $(=K,A)$-vertex expanders:

Result 2: $G$ with neighbor function $\Gamma$ is an $(=K,A)$-vertex expander iff for every set $T$ of fewer than $KA$ right-vertices, $|\ell_{\Gamma}(T,1)| < K$.

With these connections in mind, we proceed to prove that we can get a very good expander, albeit with nonconstant degree, out of a Parvaresh-Vardy code.

For the $q$-ary Parvaresh-Vardy code with messages the polynomials $f\in\mathbb{F}_q[Y]$ of degree at most $m-1$, redundancy $n$, irreducible modulus $E(Y)$ of degree $m$, and power $h$, consider the $q$-regular bipartite graph $G$ with $q^m$ left vertices indexed by $\mathbb{F}^m_q$ and $q^{n+1}$ right vertices indexed by $\mathbb{F}_q\times\mathbb{F}^n_q$, with neighbor function given by $\Gamma(f,y) = (y,f(y),f^{h}(y),\ldots,f^{h^{n-1}}(y))$, where each power $f^{h^i}$ is reduced modulo $E(Y)$ before evaluating at $y$.

Result 3:  $G$ is an $(h^n, q - mhn)$-expander.

Proof: We will tune the parameters for maximum set size $K$ and expansion $A$ as we go. We will first prove that $G$ is an $(=h^n,q-mhn)$ expander. For this, Result 2 tells us that we want to show that $|\ell_{\Gamma}(T,1)| < K$ for all $T\subset\mathbb{F}_q\times\mathbb{F}^n_q$ of cardinality less than $AK$.

The gist of the proof is essentially the same as our proofs of efficient list-decodability: we will find a low-degree polynomial $Q$ that vanishes on all elements of $T$, then observe that the roots of an associated univariate polynomial $Q^*$ include all elements of $\ell_{\Gamma}(T,1)$. Once we have shown that $G$ is an $(=h^n,q-mhn)$ expander, we can tweak our proof to get expansion for sets of cardinality smaller than $h^n$ as well.

Take a nonzero polynomial $Q(Y,Z_0,\ldots,Z_{n-1})$ of degree less than $A$ in $Y$ and less than $h$ in each of the other variables. We want every element of $T$ to be a root of $Q$; such a $Q$ exists as long as the number $Ah^n$ of unknowns (coefficients) exceeds the number $|T|$ of constraints. So it’ll be convenient to take $K = h^n$. As usual, we’ll assume that $Q$ has no factors of $E(Y)$.

Now for every $f\in\ell_{\Gamma}(T,1)$, all $q$ neighbors of $f$ lie in $T$, so $Q(y,f(y),f^h(y),\ldots,f^{h^{n-1}}(y)) = 0$ for every $y\in\mathbb{F}_q$. In order for this univariate polynomial in $Y$ to be the zero polynomial, we’d like the number $q$ of roots to exceed its degree, which is at most $A - 1 + nmh$, so let’s take $A$ to be $q - nmh$. With this in mind, we can simply factorize the polynomial $Q^*(Z) = Q(Y,Z,Z^h,\ldots,Z^{h^{n-1}})\pmod{E(Y)}$, viewed over the extension field $\mathbb{F}_q[Y]/(E(Y))$, to get our elements $f\in\ell_{\Gamma}(T,1)$. In particular, the number of such elements is bounded above by the degree of $Q^*$ in $Z$, which is $(h-1)\frac{h^n-1}{h-1} = h^n-1 = K-1 < K$ as desired.

We can easily tweak the above argument to work for smaller values of $K$ than $h^n$: build our $Q$ out of monomials $Y^ip_j(Z_0,\ldots,Z_{n-1})$ where $0\le i<A$ and $0\le j<K$ (instead of $0\le j<h^n$), where $p_j(Z_0,\ldots,Z_{n-1})$ is the monomial of degree $d_k$ in $Z_k$ so that $\sum_kh^kd_k = j$. By construction, our bound $AK > |T|$ for the existence of such a $Q$ still holds, and the degree of $Q^*$, which bounds the cardinality of $\ell_{\Gamma}(T,1)$, is still strictly less than $K$.
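Result 3 can be sanity-checked by brute force on tiny parameters. In the sketch below, the parameters $q=17$, $m=2$, $n=2$, $h=2$ and the irreducible $E(Y) = Y^2 - 3$ over $\mathbb{F}_{17}$ are my own choices, giving $K = h^n = 4$ and $A = q - nmh = 9$; the code checks $|N(S)|\ge A|S|$ for all singletons, a slice of the pairs, and random larger sets:

```python
import itertools
import random

q, m, n, h = 17, 2, 2, 2            # small test parameters (my choice)
K, A = h ** n, q - n * m * h        # Result 3: a (h^n, q - nmh)-expander, here (4, 9)

def polymulmod(f, g):
    """Multiply f, g (coefficient lists, low degree first) in F_q[Y],
    reducing modulo E(Y) = Y^2 - 3, which is irreducible mod 17."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] = (prod[i + j] + a * b) % q
    while len(prod) > 2:            # substitute Y^2 = 3 to reduce the degree
        c = prod.pop()
        prod[len(prod) - 2] = (prod[len(prod) - 2] + 3 * c) % q
    return prod

def neighbors(f):
    """The q neighbors (y, f(y), (f^h mod E)(y), ..., (f^{h^{n-1}} mod E)(y))."""
    powers = [f]
    for _ in range(n - 1):
        g = powers[-1]
        for _ in range(h - 1):      # raise the last power to the h-th power mod E
            g = polymulmod(g, powers[-1])
        powers.append(g)
    return {(y,) + tuple((p[0] + p[1] * y) % q for p in powers) for y in range(q)}

left = [[a0, a1] for a0 in range(q) for a1 in range(q)]   # F_17^2: degree <= 1 polys
nbr = [neighbors(f) for f in left]

def check(S):
    assert len(set().union(*(nbr[i] for i in S))) >= A * len(S)

random.seed(0)
for i in range(len(left)):
    check([i])                                            # all singletons
for pair in itertools.combinations(range(60), 2):
    check(pair)                                           # a slice of the pairs
for _ in range(300):                                      # random sets of size <= K
    check(random.sample(range(len(left)), random.randint(2, K)))
```

Note that for these parameters the guarantee is comfortably met: two distinct degree-$\le 1$ polynomials can share at most one neighbor, so even a pair already has at least $2q-1 = 33 \ge 2A$ neighbors.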

By picking our parameters $n$, $m$, $h$, $q = D$ carefully, we get a remarkably good expander.

Result 4: For all $M$, maximum subset size $K\le M$, $\epsilon>0$, and tradeoff parameter $\alpha>0$, there exists a fully explicit $(K,(1-\epsilon)D)$ expander with $M$ left-vertices, left-degree $D = O((\log M)(\log K)/\epsilon)^{1+1/\alpha}$, and $N\le D^2K^{1+\alpha}$ right vertices.

Proof: Because the number of left vertices is $q^m$, we take $m$ to be $\lceil\log_q M\rceil$. The number of right-vertices is $q^{n+1} = q^2(q^{n-1})$. We’d like $q^{n-1}$ to be at most $K^{1+\alpha}$, where $K\le h^n$. Take $q$ to be something (we’ll specify this later) in $(h^{1+\alpha}/2, h^{1+\alpha}]$ and take $n$ to be $\lceil\log_hK\rceil$; then $q^{n-1}\le h^{(1+\alpha)\log_hK} = K^{1+\alpha}$, which gives us the desired upper bound on $N$. To get the desired expansion factor, we must show that $A = q-nmh\ge (1-\epsilon)D$, i.e. that $nmh\le\epsilon q$. Since $q > h^{1+\alpha}/2$, it suffices that $h^{\alpha}\ge 2nm/\epsilon$, so take $h = \lceil(2nm/\epsilon)^{1/\alpha}\rceil$. By this choice of $h$, we also get the desired asymptotic growth for the left-degree $D = q\le h^{1+\alpha}$.

What remains is to choose an appropriate $q$. We want an explicit expander, so taking $q$ to be a power of 2 allows us to describe the field $\mathbb{F}_q$ we’re working with in time polynomial in $\log N$ and $\log D$, completing the proof.

Note that the $\alpha$ parameter gives a tradeoff between the size of the right half of the graph and its left-degree. More importantly, we get arbitrarily-close-to-optimal expansion at the cost of polylogarithmic degree.

Randomness Extractors – Preliminaries

We now introduce the notion of randomness extractors. Morally, these will be maps which smear “weakly random” variables into variables whose distributions are very close to perfectly random. These will be helpful, for example, in running BPP algorithms with weakly random sources of bits, say atmospheric radio noise or the low-order bits of the system clock. Formally, a source on $\{0,1\}^n$ will denote a random variable taking values in $\{0,1\}^n$. We’ll first need a notion of statistical “closeness”:

Definition 1: Two random variables $X,Y$ on $\mathcal{U}$ have statistical difference $\Delta(X,Y)$ given by $\max_T|P(X\in T) - P(Y\in T)|$, where the maximum is taken over all subsets $T$ of $\mathcal{U}$. If $\Delta(X,Y)\le\epsilon$, we say that $X$ and $Y$ are $\epsilon$-close.

It turns out this condition is equivalent to the following:

Definition 1’: $\Delta(X,Y) = \frac{1}{2}|X-Y|_1$, where $|\cdot|_1$ denotes the $\ell_1$ norm and we identify $X$ and $Y$ with their probability mass functions.
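The equivalence of the two definitions can be checked by brute force on a small universe. A minimal sketch (function names mine):

```python
from fractions import Fraction
from itertools import chain, combinations

def delta_max(p, q):
    """Definition 1: max over all subsets T of |P(X in T) - P(Y in T)|."""
    U = range(len(p))
    subsets = chain.from_iterable(combinations(U, r) for r in range(len(p) + 1))
    return max(abs(sum(p[i] for i in T) - sum(q[i] for i in T)) for T in subsets)

def delta_l1(p, q):
    """Definition 1': half the l1 distance between the mass functions."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
u = [Fraction(1, 4)] * 4                      # uniform on a 4-element universe
assert delta_max(p, u) == delta_l1(p, u) == Fraction(1, 4)
```

The maximizing $T$ in Definition 1 is the set of points where $p$ exceeds $q$, which is why the $\ell_1$ sum double-counts it.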

With this in mind, we can define a simple class of extractors:

Definition 2: A map $E: \{0,1\}^m\to \{0,1\}^n$ is a deterministic $\epsilon$-extractor for a class of sources $\mathcal{C}$ if $E(X)$ is $\epsilon$-close to the uniform distribution $U_n$ over $\{0,1\}^n$ for all sources $X\in\mathcal{C}$.

We will restrict our attention to sources which we know contain at least a certain amount of randomness.

Definition 3: A $k$-source is a random variable $X$ such that $P(X=x)\le 2^{-k}$ for all $x$. In other words, a $k$-source has min-entropy bounded below by $k$.

It turns out that for any fixed $k$-source, a random function is a good extractor. To prove this, it suffices to prove it for the “basis elements” of the class of $k$-sources, so we begin with a definition and a lemma.

Definition 4: The flat $k$-source on $S\subset \{0,1\}^m$ for $|S|=2^k$ is the uniform distribution on $S$.

Lemma 1: Every $k$-source on $\{0,1\}^m$ is a convex combination $\sum_i a_iX_i$ of flat $k$-sources $\{X_i\}$, where $a_i\ge 0$ and $\sum_i a_i = 1$.

Proof: The picture is of a circle of circumference 1 broken up into $2^m$ half-open intervals $I_t$ of length $P(X=t)$. Pick a collection $S$ of $2^k$ points evenly spaced around the circle. Because $X$ is a $k$-source, each $I_t$ contains at most one point of any rotation of $S$; let $T(S)$ denote the set of $t$ such that $I_t$ contains a point of $S$. Now let $R$ be a uniformly random rotation of $S$: picking a uniformly random point of $R$ gives $t$ with probability $|I_t| = P(X=t)$, so $X = \sum_{T}P(T(R) = T)U_T$, where $U_T$ denotes the flat source on $T$.
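When the probabilities are rational, the rotation argument is effective: there are only finitely many distinct configurations $T(R)$, delimited by the rotations at which a point of $S$ crosses an interval boundary, so we can enumerate them and read off the convex combination. A sketch of this (function names mine):

```python
import math
from fractions import Fraction

def flat_decomposition(p, k):
    """Write a k-source p (list of Fractions, each <= 2^-k, summing to 1) as a
    convex combination of flat k-sources, via the rotation argument above."""
    K = 2 ** k
    assert sum(p) == 1 and all(pi <= Fraction(1, K) for pi in p)
    cum = [Fraction(0)]                 # interval boundaries: I_t = [cum[t], cum[t+1])
    for pi in p:
        cum.append(cum[-1] + pi)
    # Rotating S = {j/K : j} by r in [0, 1/K) covers every configuration; T(R)
    # only changes when some point j/K + r crosses a boundary cum[t].
    breaks = {Fraction(0), Fraction(1, K)}
    breaks |= {c - Fraction(math.floor(c * K), K) for c in cum[:-1]}  # c mod 1/K
    breaks = sorted(breaks)
    pieces = []
    for lo, hi in zip(breaks, breaks[1:]):
        if lo == hi:
            continue
        r = (lo + hi) / 2               # representative rotation for this piece
        T = set()
        for j in range(K):              # locate the interval containing each point
            x = (Fraction(j, K) + r) % 1
            T.add(max(t for t in range(len(p)) if cum[t] <= x))
        pieces.append(((hi - lo) * K, sorted(T)))   # (weight, support of flat source)
    return pieces

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
pieces = flat_decomposition(p, 1)
assert sum(w for w, _ in pieces) == 1
for t in range(len(p)):                 # reconstruct: p_t = sum of w * (1/2^k)
    assert sum(w for w, T in pieces if t in T) / 2 == p[t]
```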

Then we get the following out of the Chernoff bound given in a previous post.

Result 5: For every $m$, every flat $k$-source $X$ on $\{0,1\}^m$, and every error parameter $\epsilon$, a randomly chosen function $\{0,1\}^m\to\{0,1\}^n$ for $n = k - 2\log(1/\epsilon) - O(1)$ is a deterministic $\epsilon$-extractor for $X$ with probability $1 - 2^{-\Omega(2^k\epsilon^2)}$.
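A quick experiment supports this; the particular parameters (a flat 6-source on $\{0,1\}^8$, output length $n = 2$, $\epsilon = 1/4$) are my own and are chosen only so that the exact statistical difference is cheap to compute:

```python
import random

random.seed(0)
m, k, n, eps = 8, 6, 2, 0.25
S = random.sample(range(2 ** m), 2 ** k)      # support of a flat 6-source on {0,1}^8

def stat_diff_from_uniform(f):
    """Exact statistical difference between f(X), X flat on S, and U_n."""
    counts = [0] * (2 ** n)
    for x in S:
        counts[f[x]] += 1
    return sum(abs(c / len(S) - 1 / 2 ** n) for c in counts) / 2

trials = 200
good = sum(
    stat_diff_from_uniform([random.randrange(2 ** n) for _ in range(2 ** m)]) <= eps
    for _ in range(trials)
)
assert good / trials >= 0.9   # the vast majority are eps-extractors for this source
```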

Unfortunately, we can’t find extractors that work for all $k$-sources.

Result 6: No single function is a deterministic $\epsilon$-extractor for the class of all $k$-sources, even for $k = m-1$, a single output bit, and any $\epsilon < 1/2$.

Proof: For any map $E:\{0,1\}^m\to\{0,1\}$, we can easily construct an $(m-1)$-source $X$ for which $E(X)$ is constant. The image of $E$ hits 0 and/or 1 at least $2^{m-1}$ times, so the uniform distribution on the preimage of this bit is our desired $(m-1)$-source.
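The counterexample is completely constructive. A sketch, with parity as my arbitrary stand-in for the candidate extractor:

```python
m = 6
E = lambda x: bin(x).count("1") % 2        # any candidate extractor {0,1}^6 -> {0,1}

# The majority output bit has a preimage of size >= 2^(m-1); the flat source on
# that preimage is an (m-1)-source on which E is constant.
preimages = {b: [x for x in range(2 ** m) if E(x) == b] for b in (0, 1)}
b = max(preimages, key=lambda bit: len(preimages[bit]))
S = preimages[b]
assert len(S) >= 2 ** (m - 1)              # so uniform-on-S is an (m-1)-source
assert all(E(x) == b for x in S)           # and E(X) = b with probability 1
```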

Morally, the reason for this is that there are too many flat $k$-sources. To get around this, we can loosen our deterministic definition by introducing a purely random seed:

Definition 5: A function $E: \{0,1\}^m\times\{0,1\}^d\to \{0,1\}^n$ is a $(k,\epsilon)$-extractor with seed length $d$ if for every $k$-source $X$ on $\{0,1\}^m$, $E(X,U_d)$ is $\epsilon$-close to $U_n$.

Now we can show (nonconstructively) that there are extractors that work for all $k$-sources, provided we use a long enough random seed.

Result 7: For all $m, k, \epsilon$, there is a $(k,\epsilon)$-extractor with seed length $d = \log(m-k)+2\log(1/\epsilon)+O(1)$ and output in $\{0,1\}^n$ for $n = k+d - 2\log(1/\epsilon)-O(1)$.

Proof: Again, we just need to show that there is an extractor that maps all flat $k$-sources $\epsilon$-close to $U_n$. We want to show that the probability that a random function fails to do this is below 1. Write $K = 2^k$ and $D = 2^d$. By the union bound and the previous result for deterministic extractors (now with $KD$ samples), we get an upper bound on the probability of failure of $\binom{2^m}{2^k}2^{-\Omega(KD\epsilon^2)}$ for $n = k + d - 2\log(1/\epsilon) - O(1)$. Stirling’s approximation gives an upper bound of $\left(\frac{2^me}{2^k}\right)^{2^k}2^{-\Omega(KD\epsilon^2)}$, and tuning the parameters to make this upper bound less than 1 gives the desired bound on seed length.

Having motivated seeded extractors, we now proceed to give a straightforward connection between extractors and expanders and then a result due to Impagliazzo, Levin, and Luby relating extractors to hash functions.

Extractors and Expanders

The signature $\{0,1\}^m\times\{0,1\}^d\to\{0,1\}^n$ of any given seeded extractor $E$ is highly suggestive of a neighbor function for a bipartite $2^d$-regular graph, which we will denote by $G_E$, with $2^m$ left vertices and $2^n$ right vertices: the $y$th neighbor of left-vertex $v$ in $G_E$ is $E(v,y)$.

It turns out that we can translate the condition of being a $(k,\epsilon)$-extractor into something beautifully (and usefully) reminiscent of the Expander Mixing Lemma:

Result 8: $E: \{0,1\}^m\times \{0,1\}^d\to \{0,1\}^n$ is a $(k,\epsilon)$-extractor iff $G_E$ satisfies, for every set of left-vertices $S$ with $|S|\ge 2^k$ and every set of right-vertices $T$,

$\left|\frac{e(S,T)}{2^{m+d}}-\mu(S)\mu(T)\right|\le\epsilon\mu(S)$.

Proof: The condition for a function to be a $(k,\epsilon)$-extractor is that for every flat $k$-source, say uniform on $S\subset\{0,1\}^m$ with $|S| = 2^k$, and every $T\subset\{0,1\}^n$, we have

$\left|P(E(U_S,U_{[2^d]})\in T) - P(U_{[2^n]}\in T)\right|\le\epsilon$.

We can translate $P(E(U_S,U_{[2^d]})\in T)$ into the probability that an edge leaving $S$ in $G_E$ lands in $T$, i.e. $\frac{e(S,T)}{2^d|S|}$. Likewise, we can translate $P(U_{[2^n]}\in T)$ into the probability that a right vertex lies in $T$, i.e. $\frac{|T|}{2^n}$. Dividing the displayed mixing inequality through by $\mu(S)$ shows the two conditions agree, and we get the desired result.
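The identity underlying this proof, that the normalized mixing error equals the extractor error for flat sources, can be checked mechanically for any small map. A brute-force sketch (the map `E` below is an arbitrary toy of mine, not claimed to be a good extractor):

```python
from fractions import Fraction
from itertools import combinations

m, d, n = 4, 2, 2

def E(x, y):
    """Arbitrary toy map {0,1}^4 x {0,1}^2 -> {0,1}^2."""
    return (x * (y + 1) + (x >> 2)) % (2 ** n)

def cut_edges(S, T):
    return sum(1 for x in S for y in range(2 ** d) if E(x, y) in T)

def mixing_error(S, T):
    """|e(S,T)/2^(m+d) - mu(S)mu(T)| / mu(S): the eps the cut condition needs."""
    muS = Fraction(len(S), 2 ** m)
    muT = Fraction(len(T), 2 ** n)
    return abs(Fraction(cut_edges(S, T), 2 ** (m + d)) - muS * muT) / muS

def extractor_error(S, T):
    """|P(E(U_S, U_d) in T) - mu(T)| for the flat source on S."""
    return abs(Fraction(cut_edges(S, T), len(S) * 2 ** d) - Fraction(len(T), 2 ** n))

# The two error notions coincide exactly, which is the content of Result 8.
for S in combinations(range(2 ** m), 4):          # all flat 2-sources
    for r in range(2 ** n + 1):
        for T in combinations(range(2 ** n), r):
            assert mixing_error(S, T) == extractor_error(S, T)
```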

This result tells us we can import results on spectral expanders into constructions of extractors. In particular, by requiring the $\lambda\sqrt{\mu(S)\mu(T)}$ term on the right-hand side of the Expander Mixing Lemma to be at most $\epsilon\mu(S)$ whenever $\mu(S)\ge 2^k/2^n$, we see that it suffices to take a spectral expander with $\lambda\le\epsilon\sqrt{2^k/2^n}$, i.e. spectral expansion $\gamma\ge 1 - \epsilon\sqrt{2^k/2^n}$.

Applying this to a Ramanujan graph, i.e. one of optimal spectral expansion $\gamma\ge 1 - \frac{2\sqrt{2^d-1}}{2^d}$, we find upon solving for $d$ that the corresponding extractor is a $(k,\epsilon)$-extractor with output length $n = k+d - 2\log(1/\epsilon)-O(1)$.

Extractors and Hash Functions

The connection between the hash functions considered in a previous post about randomness-efficient error reduction and extractors is equally evocative: an extractor for the flat $k$-source on a set $S$ can be thought of as a hash function that maps $S$ almost uniformly onto its range. We have the following result due to Impagliazzo, Levin, and Luby, saying that if we use the seed to choose a hash function from a pairwise independent family, we get a good seeded extractor.

Result 9 (Leftover Hash Lemma): The map $E: \{0,1\}^m\times\{0,1\}^d\to\{0,1\}^n\times\{0,1\}^d$ for $n = k - 2\log(1/\epsilon)$ given by $E(x,h) = (h(x),h)$ is a $(k,\epsilon)$-extractor, where we identify seeds $h\in\{0,1\}^d$ with the hash functions $h: \{0,1\}^m\to \{0,1\}^n$ of a pairwise independent family $\mathcal{H}$.

Proof: Take $X$ to be any $k$-source and $H$ to be the random variable uniformly distributed on the family $\mathcal{H}$. The key is to recall the alternative definition of statistical difference in terms of the $\ell_1$-norm and show that the collision probability of $(H(X),H)$, which is closely related to the $\ell_2$-norm, is close to that of $U_n\times U_d$.

Recall that $|(H(X),H)-U_n\times U_d|^2_2 = CP(H(X),H)-\frac{1}{2^{n+d}}$, where $CP$ denotes collision probability. We have a collision of $(H(X),H)$ when the hash functions agree and either the $X$’s agree or they disagree but are hashed to the same value. This occurs with probability at most $\frac{1}{2^d}\left(\frac{1}{2^k}+\frac{1}{2^n}\right)\le\frac{1+\epsilon^2}{2^{n+d}}$. So the $\ell_2$-norm of the difference is bounded above by $\sqrt{\frac{\epsilon^2}{2^{n+d}}}$.

Cauchy-Schwarz tells us that the ratio of the $\ell_1$-norm to the $\ell_2$-norm is bounded above by $\sqrt{D}$, where $D$ is the dimension of the underlying vector space, so the statistical difference between $(H(X),H)$ and $U_n\times U_d$ is bounded above by $\frac{\sqrt{2^{n+d}}}{2}\sqrt{\frac{\epsilon^2}{2^{n+d}}} = \frac{\epsilon}{2}$, so we’re done.
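The lemma can be verified exactly on small parameters. In this sketch (parameters mine) the pairwise independent family is all affine maps $h(x) = Ax \oplus b$ over $\mathrm{GF}(2)$ from $\{0,1\}^4$ to $\{0,1\}^2$, the source is a flat 3-source, and we compute the statistical difference exactly and check it against the $\epsilon/2$ bound from the proof:

```python
from fractions import Fraction
from itertools import product

m, n, k = 4, 2, 3

def apply_hash(A, b, x):
    """h(x) = Ax + b over GF(2): A is n rows of m bits each, b an n-bit vector."""
    out = 0
    for i, row in enumerate(A):
        bit = bin(row & x).count("1") % 2      # inner product of row and x mod 2
        out |= bit << i
    return out ^ b

# All (A, b) pairs form a pairwise independent family from {0,1}^m to {0,1}^n.
family = [(A, b) for A in product(range(2 ** m), repeat=n) for b in range(2 ** n)]

S = list(range(2 ** (k)))                      # flat k-source, k = 3
eps = 2 ** ((n - k) / 2)                       # from n = k - 2 log(1/eps)

# Exact joint distribution of (h(X), h) with X flat on S and h uniform.
joint = {}
for hi, (A, b) in enumerate(family):
    for x in S:
        key = (apply_hash(A, b, x), hi)
        joint[key] = joint.get(key, Fraction(0)) + Fraction(1, len(S) * len(family))

uniform = Fraction(1, 2 ** n * len(family))
delta = (sum(abs(p - uniform) for p in joint.values())
         + uniform * (2 ** n * len(family) - len(joint))) / 2
assert delta <= eps / 2                        # Leftover Hash Lemma: delta <= eps/2
```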

The result gets its name from the fact that, morally, it says that if an adversary has acquired a certain number of bits of a secret key of ours, we can still distill from the leftover entropy a key about which the adversary has very little knowledge.

List Encoding and Randomness Extractors

Lastly, we place randomness extractors under this unifying framework of list-decodability. Our $\Gamma$ will be our extractor $E: \{0,1\}^m\times\{0,1\}^d\to\{0,1\}^n$.

Result 10a: If $E$ is a $(k,\epsilon)$-extractor, then for all $f: \{0,1\}^n\to [0,1]$, $|\ell_{\Gamma}(f,\mu(f)+\epsilon)|<2^k$.

Proof: Say that $|\ell_{\Gamma}(f,\mu(f)+\epsilon)|\ge 2^k$ and let $X$ be the flat $k$-source uniformly distributed on a size-$2^k$ subset of $\ell_{\Gamma}(f,\mu(f)+\epsilon)$. Then by construction the expected value of $f(E(X,U_d))$ is more than $\mu(f)+\epsilon$, implying that $E(X,U_d)$ and $U_n$ are $\epsilon$-far, a contradiction!

There is a “converse” that involves coarser functions $f$ which only take discrete bit values (for this reason, we will consider the set formulation of $\ell(\cdot,\cdot)$ instead), more entropy, and more error; these losses turn out to be insignificant in applications of extractors.

Result 10b: If for every $T\subset \{0,1\}^n$ we have $|\ell_{\Gamma}(T,\mu(T)+\epsilon)|\le 2^k$, then $E$ is a $(k+\log(1/\epsilon),2\epsilon)$-extractor.

Proof: Let $X$ be any $(k+\log(1/\epsilon))$-source and $T\subset\{0,1\}^n$. $E(X,U_d)$ lies in $T$ either if $X$ falls in $\ell(T,\mu(T)+\epsilon)$ or if it doesn’t and $E(X,U_d)\in T$ anyway. The first event has probability at most $2^k\cdot2^{-(k+\log(1/\epsilon))} = \epsilon$, and in the second case the conditional probability of landing in $T$ is at most $\mu(T)+\epsilon$, so $P(E(X,U_d)\in T)\le\mu(T)+2\epsilon$. Hence $E(X,U_d)$ and $U_n$ are indeed $2\epsilon$-close.

We conclude with an explicit connection between list encoding and extractors. The extractor corresponding to an encoding $e$ with codewords of length $2^d$ over the alphabet $[2^m]$ is $E(x,y) = (y,e(x)_y)$, with seed $y\in[2^d]$ and output in $[2^d]\times[2^m]$. One direction of the correspondence we get just out of the parallel between the list-decoding formulation of list-decodability and randomness extractors:

Result 11a: If $E$ is a $(k,\epsilon)$-extractor, then $e$ is $(1 - 1/2^m - \epsilon, 2^k)$-list decodable.

There is again a slightly worse converse which says that in the other direction, the error blows up unless the alphabet of our code is small.

Result 11b: If $e$ is $(1-1/2^m-\epsilon,2^k)$-list decodable, then $E$ is a $(k+\log(1/\epsilon), 2^m\epsilon)$-extractor.

Proof: Take $X$ a $(k+\log(1/\epsilon))$-source and $Y$ the uniform distribution over $\{0,1\}^d$, and write $M = 2^m$. We want to show that $(Y,E(X,Y))$ and $Y\times U_{\{0,1\}^m}$ are $M\epsilon$-close. The statistical difference is equal to the expected value of $\Delta(e(X)_y, U_{\{0,1\}^m})$ taken over all $y$ from $Y$, and the $\ell_1$ definition of statistical difference tells us this is bounded above by $\frac{M}{2}E_y[\max_z P(e(X)_y = z) - 1/M]$. Defining $T_r$ by taking $r_y$ to be the choice of $z$ maximizing the probability in this expression, the bound becomes $\frac{M}{2}\left(P((Y,e(X)_Y)\in T_r) - 1/M\right)$. We can split this probability up into the cases that $X$ lies or does not lie in $\ell_{\Gamma}(T_r,1/M+\epsilon)$: list decodability gives $P(X\in\ell_{\Gamma}(T_r,1/M+\epsilon))\le 2^k\cdot 2^{-(k+\log(1/\epsilon))} = \epsilon$, so the statistical difference is at most $\frac{M}{2}(\epsilon + \epsilon) = M\epsilon$, the desired $M\epsilon$-closeness.