Randomness-Efficient Error Reduction, Part 1: Preliminaries

The following series of posts will document things that I’ve learned from a neat survey on pseudorandomness by Prof. Salil Vadhan at Harvard. They’re mostly to help me digest the material, but I hope they’ll be of interest to anyone reading this blog.

Specifically, this first set of two posts will be about the power of expander graphs in randomness-efficient error reduction in randomized algorithms; this first post will present two naive approaches to error reduction. We begin with the following definition.

Definition 1: A language $L$ lies in the class BPP (bounded-error probabilistic polynomial-time) if there exists a probabilistic polynomial-time algorithm $A$ such that the probability that $A$ accepts an input $x$ is at least $1/2+1/p(n)$ if $x\in L$ and at most $1/2-1/p(n)$ if $x\not\in L$, where $p$ is some polynomial in the length $n$ of the input. Stated another way, BPP is the class of languages with probabilistic polynomial-time algorithms with two-sided error at most $1/2-1/p(n)$.

Intuitively, running any such $A$ multiple times and taking a majority vote should push the error arbitrarily low, thanks to the following standard result from probability theory:

Result 1 ((A) Chernoff Bound): Let $X_1,...,X_n$ be independent random variables from $[0,1]$. Let $X$ be their average $(\sum_i X_i)/n$, and let $\mu$ be the expected value of $X$. Then $P(|X-\mu|\ge\epsilon)\le 2e^{-n\epsilon^2/4}$, that is, the probability of the average differing from its expected value by some quantity $\epsilon$ falls exponentially in the number of repetitions $n$.

For an algorithm $A$ with error at most $1/2-1/p(n)$, consider the corresponding algorithm $A'$ which runs $A$ for $t(n)$ repetitions (with fresh random bits each time) and outputs the majority answer. Let $X$ be the average of the $t(n)$ indicator variables $X_i$, where $X_i$ is 1 if $A$ outputs the right answer on the $i$th repetition and 0 otherwise, so that $\mu = E[X]\ge 1/2+1/p(n)$. Since $A'$ errs only when $X\le 1/2$, the Chernoff bound above with $\epsilon = 1/p(n)$ gives $P(X\le 1/2)\le 2e^{-t(n)/(4p(n)^2)}$. Taking $t(n)$ to be $4kp(n)^2$, this is at most $2e^{-k}\le 2^{-k}$ as long as $k$ is large enough ($k\ge 3$ suffices). Thus reducing the error to $2^{-k}$ costs $O(k\,p(n)^2)$ repetitions and, since each run of $A$ consumes $m$ fresh random bits, $O(k\,p(n)^2\,m)$ random bits; suppressing the $p(n)^2$ factor, that is $O(k)$ repetitions and $O(km)$ truly random bits, because our repetitions are completely independent.
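This amplification is easy to watch in a simulation. Here is a small Python sketch (entirely my own illustration: the success probability 0.55 stands in for $1/2+1/p(n)$, and the repetition counts are arbitrary): the majority vote's empirical failure rate drops rapidly as the number of repetitions $t$ grows.

```python
import random

def noisy_algorithm(p_correct=0.55):
    """Stand-in for one run of A: correct with probability 1/2 + 1/p(n) = 0.55."""
    return random.random() < p_correct

def majority_of(t):
    """The amplified algorithm A': run A independently t times, take the majority vote."""
    correct = sum(noisy_algorithm() for _ in range(t))
    return correct > t / 2

def failure_rate(t, trials=2000):
    """Empirical probability that A' gives the wrong answer."""
    return sum(not majority_of(t) for _ in range(trials)) / trials

# Per the Chernoff bound, the failure probability falls roughly exponentially in t.
for t in (1, 25, 101, 401):
    print(t, failure_rate(t))
```

The cost, as noted above, is that each of the $t$ runs burns its own fresh random bits.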

The second approach to error reduction trades time for randomness by loosening the requirement of full independence into one of pairwise independence.

Observation 1: Let $X_1,...,X_k$ be independent random bits, and for each nonempty set of indices $S\subseteq\{1,...,k\}$, let $R_S$ denote the XOR of all $X_i$ with $i\in S$. These $2^k-1$ random variables are pairwise independent. In other words, it takes only $O(\log N)$ truly random bits to generate $N$ pairwise independent bits.
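Observation 1 can be checked by brute force for small $k$. The Python sketch below (my own illustration; the function name is made up) expands $k$ seed bits into the $2^k-1$ XOR-subset bits; enumerating all $2^k$ seeds then confirms that every pair of outputs is uniform on $\{0,1\}^2$.

```python
def pairwise_independent_bits(seed_bits):
    """Expand k truly random bits into 2^k - 1 pairwise independent bits:
    one output R_S per nonempty subset S, where R_S = XOR of seed_bits[i] for i in S."""
    k = len(seed_bits)
    out = []
    for mask in range(1, 2 ** k):       # each mask encodes a nonempty subset S
        r = 0
        for i in range(k):
            if (mask >> i) & 1:
                r ^= seed_bits[i]
        out.append(r)
    return out

# 3 truly random seed bits -> 7 pairwise independent bits
print(pairwise_independent_bits([1, 0, 1]))  # -> [1, 0, 1, 1, 0, 1, 0]
```

The outputs are of course far from fully independent (e.g. $R_{\{1\}}\oplus R_{\{2\}} = R_{\{1,2\}}$ always), which is exactly the weakening being exploited.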

Definition 2: A family of (hash) functions $\{f: [N]\to [M]\}$, where $[k]$ denotes the set $\{0,...,k-1\}$, is said to be pairwise independent if for all distinct $x_1,x_2\in [N]$ and all (not necessarily distinct) $y_1,y_2\in [M]$, a function $F$ chosen uniformly at random from the family satisfies $P(F(x_1)=y_1,F(x_2)=y_2) = \frac{1}{M^2}$.

Result 2: For all $m,n$, we can construct a family of pairwise independent hash functions $\{0,1\}^m\to\{0,1\}^n$ where we can pick a random function from this family using $\max(m,n)+n$ independent random bits.

Proof: Indeed, for any finite field $\mathbb{F}_q$, the functions $h_{a,b}:\mathbb{F}_q\to\mathbb{F}_q$ given by $h_{a,b}(x) = ax+b$, with $a,b$ ranging over $\mathbb{F}_q$, form a pairwise independent family: for distinct $x_1,x_2$, the map $(a,b)\mapsto(h_{a,b}(x_1),h_{a,b}(x_2))$ is a bijection on $\mathbb{F}_q^2$, so the output pair is uniform.
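As a quick sanity check of this family (my own sketch, not code from the survey), we can work over a prime field, where the arithmetic is just modular: for any fixed distinct inputs, ranging over all seeds $(a,b)$ should hit every output pair exactly once.

```python
Q = 257  # a prime, so the integers mod Q form a finite field F_q
         # (a prime field is assumed here only to keep the arithmetic simple;
         #  the post's later constructions instead use fields of size 2^n)

def h(a, b, x):
    """The pairwise independent hash h_{a,b}(x) = a*x + b over F_q."""
    return (a * x + b) % Q

# For distinct x1, x2, the map (a, b) -> (h(x1), h(x2)) is a bijection on F_q^2,
# so every output pair should occur exactly once across all Q^2 seed choices.
x1, x2 = 3, 7
hits = {}
for a in range(Q):
    for b in range(Q):
        key = (h(a, b, x1), h(a, b, x2))
        hits[key] = hits.get(key, 0) + 1
print(len(hits))  # -> 66049, i.e. Q * Q distinct output pairs, each hit once
```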

By taking $q$ to be $2^n$, we can construct a family $\mathcal{F}_{n,n}$ of pairwise independent hash functions $\{f: \{0,1\}^n\to \{0,1\}^n\}$. Choosing a function from this family requires $n$ random bits for each of $a$ and $b$, so $2n$ in total.

To construct a family $\mathcal{F}_{m,n}$ of hash functions where the input length exceeds the output length, $m>n$, restrict the output of each function from $\mathcal{F}_{m,m}$ to its first $n$ bits; this requires the $2m$ random bits needed for $\mathcal{F}_{m,m}$. In fact, we can do better: rather than restricting the output of arbitrary functions $f\in\mathcal{F}_{m,m}$, i.e. of the form $ax+b$ for $a,b\in\mathbb{F}_{2^m}$, restrict functions of the form $ax+b'$ where $a\in\mathbb{F}_{2^m}$ and $b'\in\mathbb{F}_{2^m}$ is supported on the first $n$ bits (so $b'$ is specified by only $n$ random bits). This requires only $m+n$ random bits.
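This truncation trick is easy to verify exhaustively for tiny parameters. The sketch below is my own illustration (the survey contains no code, and I truncate to the low-order bits rather than the "first" bits, which only relabels outputs): it implements $\mathbb{F}_{16}$ arithmetic directly and checks the $m=4$, $n=2$ family, whose seed costs $m+n=6$ bits rather than $2m=8$.

```python
def gf16_mul(a, b):
    """Multiply in F_16 = F_2[x] / (x^4 + x + 1), with elements as 4-bit ints."""
    r = 0
    for i in range(4):            # carry-less ("XOR") schoolbook multiplication
        if (b >> i) & 1:
            r ^= a << i
    for i in range(6, 3, -1):     # reduce modulo x^4 + x + 1 (binary 10011)
        if (r >> i) & 1:
            r ^= 0b10011 << (i - 4)
    return r

def h(a, b2, x):
    """h_{a,b'}(x): compute a*x in F_16, keep 2 output bits, add a 2-bit b'.
    Seed: 4 bits for a plus 2 bits for b', i.e. m + n = 6 bits in total."""
    return (gf16_mul(a, x) & 0b11) ^ b2

# Pairwise independence check: for distinct inputs x1, x2, each output pair in
# {0,1}^2 x {0,1}^2 should occur equally often over all 16 * 4 = 64 seeds.
x1, x2 = 1, 2
counts = {}
for a in range(16):
    for b2 in range(4):
        key = (h(a, b2, x1), h(a, b2, x2))
        counts[key] = counts.get(key, 0) + 1
print(counts)  # each of the 16 output pairs occurs exactly 4 times
```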

To construct $\mathcal{F}_{m,n}$ where $m<n$, pad the input with $n-m$ extra zeroes: for each function $f\in\mathcal{F}_{n,n}$, define $f'(x) = f(x\circ 0^{n-m})$, where $\circ$ denotes concatenation. This requires the $2n$ random bits needed for $\mathcal{F}_{n,n}$, completing the proof.

Before we discuss how this means we require fewer truly random bits, we state the pairwise-independence analogue of the Chernoff bound (which is accordingly weaker, and follows from a straightforward application of Chebyshev's inequality); it will tell us the price in running time we pay for using fewer random bits.

Result 3: Let $X_1,...,X_n$ be pairwise independent random variables taking values in $[0,1]$. Let $X$ be their average $(\sum_i X_i)/n$, and let $\mu$ be the expected value of $X$. Then $P(|X-\mu|\ge\epsilon)\le\frac{1}{n\epsilon^2}$; that is, the probability of the average differing from its expected value by some quantity $\epsilon$ now falls only linearly (rather than exponentially) in the number of repetitions $n$.

What do we get out of all this with regards to error reduction? Suppose, as above, we have a BPP algorithm whose error drops to $2^{-k}$ after $k$ independent repetitions with $m$ random bits per repetition. Instead, we can select a random function from $\mathcal{F}_{k,m}$ using $O(m+k)$ random bits and run the algorithm on the pairwise independent strings obtained by evaluating that function on every input in $\{0,1\}^k$. Because Result 3 tells us we now need $2^k$ repetitions as opposed to $k$, running over all $2^k$ inputs achieves the same amount of error reduction.
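To make the tradeoff concrete, here is a toy end-to-end sketch (entirely my own construction: the "algorithm", the bad set, and all parameters are made up for illustration, and a prime field stands in for $\mathbb{F}_{2^m}$). A seed of just two field elements replaces $t$ independent random strings, and a majority vote over the $t$ pairwise independent strings $h_{a,b}(0),\dots,h_{a,b}(t-1)$ still succeeds almost always.

```python
import random

Q = 10007  # prime; the integers mod Q stand in for the space of random strings
random.seed(0)

# Toy BPP-style algorithm: A(r) answers correctly iff its random string r
# avoids a "bad" set of density 0.45 (so A is correct w.p. about 0.55).
BAD = set(random.sample(range(Q), int(0.45 * Q)))

def A(r):
    return r not in BAD  # True means "correct answer"

def amplified(a, b, t=512):
    """Majority vote of A over the t pairwise independent strings
    h_{a,b}(0), ..., h_{a,b}(t-1): the seed costs only the 2 log Q bits
    for (a, b), rather than t log Q fully independent bits."""
    correct = sum(A((a * x + b) % Q) for x in range(t))
    return correct > t / 2

# Estimate the failure probability of the amplified algorithm over random seeds;
# Result 3 bounds it by 1 / (t * eps^2), and in practice it is far smaller.
failures = sum(not amplified(random.randrange(Q), random.randrange(Q))
               for _ in range(200))
print(failures / 200)
```

The catch, as the next paragraph notes, is the running time: driving the error down to $2^{-k}$ this way requires $t$ on the order of $2^k$ evaluations rather than $O(k)$.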

In other words, while we've reduced the number of random bits needed from $O(km)$ in the case of fully independent repetitions to $O(m+k)$ in the case of pairwise independent repetitions, we've bumped the number of repetitions needed from $O(k)$ up to $O(2^k)$.

In the next post in this series, we’ll discuss how expander graphs do away with this tradeoff by taking time comparable to independent repetitions and randomness comparable to pairwise independent repetitions.