Random-Efficient Error Reduction Part 1- Preliminaries

The following series of posts will document things that I’ve learned from a neat survey on pseudorandomness by Prof. Salil Vadhan at Harvard. They’re mostly to help me digest the material, but I hope they’ll be of interest to anyone reading this blog.

Specifically, this first set of two posts will be about the power of expander graphs in randomness-efficient error reduction for randomized algorithms; this post presents two naive approaches to the problem. We begin with the following definition.

Definition 1: A language L lies in the class BPP (bounded-error probabilistic polynomial time) if there exists a probabilistic polynomial-time algorithm A such that the probability that A accepts an input x is at least 1/2+1/p(n) if x\in L and at most 1/2-1/p(n) if x\not\in L, where p is some polynomial in the length n of the input. Stated another way, BPP is the class of languages with probabilistic polynomial-time algorithms with two-sided error at most 1/2-1/p(n).

Intuitively, repeating any such A multiple times and taking a majority vote should push the error arbitrarily low, thanks to the following standard result from probability theory:

Result 1 ((A) Chernoff Bound): Let X_1,...,X_n be independent random variables taking values in [0,1]. Let X be their average (\sum_i X_i)/n, and let \mu be the expected value of X. Then P(|X-\mu|\ge\epsilon)\le 2e^{-n\epsilon^2/4}; that is, the probability of the average deviating from its expected value by at least \epsilon falls exponentially in the number of repetitions n.
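As a quick sanity check (my own illustration, not from the survey), here is a short Python sketch that empirically estimates the deviation probability of an average of independent coin flips and compares it with the 2e^{-n\epsilon^2/4} bound above; the bias, n, \epsilon, and number of trials are arbitrary choices.

```python
import math
import random

def empirical_deviation(n, eps, bias=0.6, trials=5000):
    """Estimate P(|X - mu| >= eps), where X is the average of n independent
    Bernoulli(bias) samples and mu = bias is its expected value."""
    exceed = 0
    for _ in range(trials):
        avg = sum(random.random() < bias for _ in range(n)) / n
        if abs(avg - bias) >= eps:
            exceed += 1
    return exceed / trials

if __name__ == "__main__":
    n, eps = 500, 0.15
    print("empirical estimate:", empirical_deviation(n, eps))
    print("Chernoff bound 2exp(-n*eps^2/4):", 2 * math.exp(-n * eps ** 2 / 4))
```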

For an algorithm A with error at most 1/2-1/p(n), consider the corresponding algorithm A' which runs A for t(n) repetitions and outputs the majority answer. Let X be the average of the t(n) indicator variables X_i, where X_i takes the value 1 if A outputs the right answer on the ith repetition and 0 otherwise. Since the expected value of X is at least 1/2+1/p(n), A' errs only if X deviates from its mean by at least 1/p(n), so the Chernoff bound above with \epsilon = 1/p(n) gives P(X\le 1/2)\le 2e^{-t(n)/4p(n)^2}. Taking t(n) to be 4kp(n)^2, this probability is bounded above by 2^{-k} (for, say, k\ge 3). Thus, ignoring the fixed polynomial factor p(n)^2, reducing the error to 2^{-k} comes at a cost of O(k) repetitions and, if our algorithm A requires m random bits per run, A' requires O(km) random bits, because our repetitions are completely independent.
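To make this concrete, here is a minimal Python sketch of A' under the assumptions above; toy_base is a hypothetical stand-in for A (not anything from the survey) that simply answers correctly with probability 1/2+1/p(n).

```python
import random

def amplified(base_algorithm, x, p_n, k):
    """Run the base algorithm t(n) = 4*k*p(n)^2 times on input x, each time with
    fresh independent randomness, and output the majority answer."""
    t = 4 * k * p_n ** 2
    accepts = sum(base_algorithm(x) for _ in range(t))
    return accepts > t / 2  # majority vote

def toy_base(x, p_n=10):
    """Hypothetical stand-in for A: accepts with probability 1/2 + 1/p(n) on
    inputs in L and 1/2 - 1/p(n) otherwise."""
    threshold = 0.5 + (1 / p_n if x == "in_L" else -1 / p_n)
    return random.random() < threshold

if __name__ == "__main__":
    # With k = 5, the majority vote is wrong with probability at most 2^{-5}.
    print(amplified(toy_base, "in_L", p_n=10, k=5))
```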

The second approach to error reduction trades time for randomness by loosening the requirement of full independence to mere pairwise independence.

Observation 1: Let X_1,...,X_k be independent random bits, and for each nonempty set of indices S\subseteq\{1,...,k\}, let R_S denote the XOR of all X_i with i\in S. These 2^k-1 random variables are pairwise independent. In other words, it takes only O(\log N) truly random bits to generate N pairwise independent bits.
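As a small illustration of Observation 1 (a sketch of my own, with arbitrary parameters), the following Python snippet expands k truly random bits into 2^k-1 pairwise independent bits by XORing over every nonempty subset of indices.

```python
import random
from itertools import combinations

def pairwise_independent_bits(k):
    """Expand k independent random bits into 2^k - 1 pairwise independent bits,
    one bit R_S for each nonempty subset S of {0, ..., k-1}, via XOR over S."""
    seed = [random.randint(0, 1) for _ in range(k)]
    bits = {}
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            value = 0
            for i in subset:
                value ^= seed[i]
            bits[subset] = value
    return seed, bits

if __name__ == "__main__":
    seed, bits = pairwise_independent_bits(4)
    print("seed bits:", seed)
    print("number of pairwise independent bits:", len(bits))  # 2^4 - 1 = 15
```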

Definition 2: A family of (hash) functions \{f: [N]\to [M]\}, where [k] denotes the set \{0,...,k-1\}, is said to be pairwise independent if for all distinct x_1,x_2\in [N] and all y_1,y_2\in [M], when F is chosen uniformly at random from the family, P(F(x_1)=y_1,F(x_2)=y_2) = \frac{1}{M^2}.

Result 2: For all m,n, we can construct a family of pairwise independent hash functions \{f:\{0,1\}^m\to\{0,1\}^n\} from which we can pick a random function using \max(m,n)+n independent random bits.

Proof: Indeed, for any finite field \mathbb{F}_q, the family of functions \{h_{a,b}:\mathbb{F}_q\to\mathbb{F}_q\} given by h_{a,b}(x) = ax+b, with a,b ranging over \mathbb{F}_q, is pairwise independent: for distinct x_1,x_2 and any y_1,y_2, the system ax_1+b=y_1, ax_2+b=y_2 has exactly one solution (a,b), so a uniformly random choice of (a,b) satisfies it with probability 1/q^2.

By taking q to be 2^n and identifying \mathbb{F}_{2^n} with \{0,1\}^n, we can construct a family \mathcal{F}_{n,n} of pairwise independent hash functions \{f: \{0,1\}^n\to \{0,1\}^n\}. Choosing a function from this family requires n random bits for each of a and b, for 2n bits in total.

To construct a family \mathcal{F}_{m,n} of hash functions where the input length exceeds the output length (m>n), restrict the output of each function in \mathcal{F}_{m,m} to its first n bits; this requires the 2m random bits needed for \mathcal{F}_{m,m}. In fact, we can do better: rather than truncating functions f\in\mathcal{F}_{m,m} of the form ax+b for a,b\in\mathbb{F}_{2^m}, use functions of the form h_{a,b'}(x) = (ax)|_n + b', where a\in\mathbb{F}_{2^m}, b'\in\{0,1\}^n, and (ax)|_n denotes the first n bits of ax. This requires only m+n random bits.

To construct \mathcal{F}_{m,n} where m<n, pad the input with n-m extra zeroes, so that for each function f\in\mathcal{F}_{n,n} we can define a function f'(x) = f(x0^{n-m}); this requires the 2n random bits needed for \mathcal{F}_{n,n}, completing the proof.
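The following Python sketch (mine, not from the survey) instantiates the construction for tiny parameters: it implements multiplication in \mathbb{F}_{2^3} modulo the irreducible polynomial x^3+x+1, builds the family h_{a,b}(x)=ax+b, and exhaustively checks the pairwise independence condition of Definition 2.

```python
def gf_mult(a, b, n=3, modulus=0b1011):
    """Multiply a and b in F_{2^n}, reducing modulo an irreducible degree-n
    polynomial given by its bit representation (default: x^3 + x + 1)."""
    result = 0
    for i in range(n):
        if (b >> i) & 1:
            result ^= a << i
    for i in range(2 * n - 2, n - 1, -1):  # reduce the degree back below n
        if (result >> i) & 1:
            result ^= modulus << (i - n)
    return result

def hash_family(n=3):
    """The pairwise independent family {h_{a,b}(x) = a*x + b} over F_{2^n},
    as a dict mapping (a, b) to the function's value table."""
    q = 1 << n
    return {(a, b): [gf_mult(a, x, n) ^ b for x in range(q)]
            for a in range(q) for b in range(q)}

def check_pairwise_independence(n=3):
    """Verify Definition 2 exhaustively: for distinct x1, x2 and any y1, y2,
    exactly one of the q^2 functions (a 1/q^2 fraction) maps x1 -> y1, x2 -> y2."""
    q = 1 << n
    family = hash_family(n)
    for x1 in range(q):
        for x2 in range(q):
            if x1 == x2:
                continue
            for y1 in range(q):
                for y2 in range(q):
                    count = sum(1 for table in family.values()
                                if table[x1] == y1 and table[x2] == y2)
                    assert count == 1
    return True

if __name__ == "__main__":
    print(check_pairwise_independence())  # prints True
```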

Before we discuss how this lets us get away with fewer truly random bits, we need the following analogue of the Chernoff bound for pairwise independent variables (necessarily a weaker bound), which follows from a straightforward application of Chebyshev's inequality; it tells us the price in time we pay for using fewer random bits.

Result 3: Let X_1,...,X_n be pairwise independent random variables taking values in [0,1]. Let X be their average (\sum_i X_i)/n, and let \mu be the expected value of X. Then P(|X-\mu|\ge\epsilon)\le\frac{1}{n\epsilon^2}; that is, the probability of the average deviating from its expected value by at least \epsilon now falls only linearly (like 1/n) in the number of repetitions n, rather than exponentially.
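To see where this comes from: pairwise independence is exactly what is needed for variances to add, so \mathrm{Var}(X) = \frac{1}{n^2}\sum_i\mathrm{Var}(X_i)\le\frac{1}{4n}, since each X_i\in[0,1] has variance at most 1/4. Chebyshev's inequality then gives P(|X-\mu|\ge\epsilon)\le\mathrm{Var}(X)/\epsilon^2\le\frac{1}{4n\epsilon^2}, which is even slightly stronger than the bound stated above.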

What do we get out of all this with regards to error reduction? Suppose we have a BPP algorithm that, via the independent repetitions above, requires k repetitions with m random bits per repetition to reduce its error to 2^{-k}. We can instead first select a function from \mathcal{F}_{k,m} using O(m+k) random bits, evaluate it on all 2^k inputs from \{0,1\}^k to obtain 2^k pairwise independent m-bit strings, and run the algorithm once with each string, taking a majority vote. Because Result 3 gives only an inverse-linear bound, we now need 2^k repetitions as opposed to k to achieve the same amount of error reduction.

In other words, while we've reduced the number of random bits needed from O(km) in the case of purely independent repetitions to O(m+k) in the case of pairwise independent repetitions, we've bumped the number of repetitions needed from O(k) up to O(2^k).
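Putting the pieces together, here is a minimal Python sketch of the pairwise independent scheme with tiny, arbitrary parameters (k = m = 3, repeating gf_mult from the previous sketch so the snippet is self-contained); toy_base is again a hypothetical stand-in for the underlying algorithm. We spend 2·3 = 6 truly random bits choosing a and b, deterministically enumerate all 2^3 seeds a\cdot i + b, and take a majority vote; in the actual analysis k and m would of course be much larger.

```python
import random

def gf_mult(a, b, n=3, modulus=0b1011):
    """Multiplication in F_{2^3} modulo x^3 + x + 1, as in the previous sketch."""
    result = 0
    for i in range(n):
        if (b >> i) & 1:
            result ^= a << i
    for i in range(2 * n - 2, n - 1, -1):
        if (result >> i) & 1:
            result ^= modulus << (i - n)
    return result

def toy_base(x, r):
    """Hypothetical stand-in for the base algorithm: reads a 3-bit random string r
    and accepts on 5 of the 8 strings when x is in L, and on only 3 otherwise."""
    good = {0, 1, 2, 3, 4} if x == "in_L" else {0, 1, 2}
    return r in good

def pairwise_amplified(base, x, n=3):
    """Pairwise independent error reduction: draw a, b in F_{2^n} (2n truly random
    bits in total), use a*i + b as the random string for the ith run of the base
    algorithm for every i in {0,1}^n, and output the majority answer."""
    q = 1 << n
    a, b = random.randrange(q), random.randrange(q)
    accepts = sum(base(x, gf_mult(a, i) ^ b) for i in range(q))
    return accepts > q / 2

if __name__ == "__main__":
    print(pairwise_amplified(toy_base, "in_L"))
```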

In the next post in this series, we’ll discuss how expander graphs do away with this tradeoff by taking time comparable to independent repetitions and randomness comparable to pairwise independent repetitions.
