# PRGs and Hardness Amplification Part 1- Preliminaries

In this series, we’ll be exploring some of the material from the last chapter of Vadhan’s survey, in particular results of amplifying the hardness of computing functions and obtaining pseudorandom generators out of hard functions. In this relatively short post, by providing preliminary definitions and a motivation related to derandomization. We then close with a glimpse of our first step in hardness amplification.

Preliminaries

Roughly speaking, generators are deterministic functions that stretch uniformly random seeds into bits that appear random. It turns out that our methods of quantifying “randomness” using both min-entropy $H_{\infty}$ and statistical difference from the uniform distribution are insufficient, because $H_{\infty}(f(X))\le H_{\infty}(X)$ for any random variable $X$ and deterministic $f$. In place of statistical or information theoretic measures, we thus use a complexity theoretical one: a variable will seem random if it is impossible for a so-called nonuniform algorithm running within a certain time limit to tell it apart from a uniformly random variable. We first define nonuniformity:

Definition 0: An algorithm $T$ deciding a language $L$ is $a$-nonuniform for $a: \mathbb{N}\to\mathbb{N}$ if there exist advice bitstrings $\{\alpha_k\}_{k\in\mathbb{N}}\in\{0,1\}^*$ each of length at most $a(k)$ such that there exists a language $L'$ decided by some algorithm $T'$ where $A(x)=1$ iff $A'(x,\alpha_{|x|})$.

We will not make use of this definition in its full generality; it will suffice to talk about algorithms which operate with some “advice” hardwired in, and the length of this advice will be reasonable enough that we won’t bother with the data of the length function $a$. We can now formalize our abovementioned ideas of “randomness” and pseudorandom generators.

Definition 1: Random variables $X$ and $Y$ are (nonuniformly) $(t,\epsilon)$ computationally indistinguishable if for every nonuniform algorithm $T$ running in time at most $t$, $|P(T(X)=1) - P(T(Y)=1)|\le\epsilon$.

Definition 2: A $(t,\epsilon)$-pseudorandom generator is a deterministic function $G: \{0,1\}^d\to\{0,1\}^m$ such that $G(U_d)$ and $U_m$ are $(t,\epsilon)$-computationally indistinguishable.

Derandomizing BPP with Generators

It turns out that if we can find such a generator, we can effectively derandomize BPP. We will refer to generators as “computable in $t(m)$” if we can determine the number of random bits needed to stretch to $m$ bits and run a uniform deterministic algorithm to calculate the generator’s output both in time $t(m)$. Later on, we will refer to generators $G: \{0,1\}^{d(m)}\to\{0,1\}^m$ as mildly explicit if they are computable in time polynomial in $m$ and $2^{d(m)}$.

Result 1: If for every $m$ there exists a $(m,1/8)$ pseudorandom generator $G_m: \{0,1\}^{d(m)}\to\{0,1\}^m$ computable in time $t(m)$, then BPP is strictly contained in $\cup_c$ DTIME$(2^{d(n^c)}\cdot (n^c+t(n^c))$.

Proof: Take a BPP algorithm $A(x;r)$ for input $x$ and coin tosses $r$, which we will take WLOG to be the circuit size $n^c$ simulating $A$ for some constant $c$. To decide for $x$, we can just write down $A(x,G(r))$ for all $2^{d(n^c)}$ sets of coin tosses $r$ and take the majority decision to be the output of our derandomized algorithm, and this takes $2^{d(n^c)}(t(n^c)+n^c)$ as desired. To show that $A$ decides correctly more than half the time for every input $x$, it suffices to observe that if it does not for some $x$, the probability that $A(x,G(U_{d(n^c)})$ outputs 1 exceeds $A(x,U_{n^c})$ by at least $1/2-1/3>1/8$ (the 1/3 comes from our definition of the error bound for BPP algorithms), contradicting the pseudorandomness of $G$.

Tantalizingly, by substituting in asymptotics, this statement tells us that if we can find a mildly explicit $(m,1/8)$-generator with seed logarithmic in output length, we can derandomize all of BPP into P! We will proceed beginning with the weakest possible assumptions about generators, and our final result will tell us either that we can indeed derandomize all of BPP into a subset E=DTIME$(2^{O(n)})$ of the DTIME class given above, or at least guarantee the existence of significantly faster algorithms for SAT and other common NP-complete problems!

To do this, we reduce the problem of finding mildly explicit generators to finding a function in that is “hard on average to compute,” then use hardness amplification to show that it suffices to find a function in E that is only “hard to compute” in the worst case. We first define both notions of hardness:

Definitions 3,4: A Boolean function $f: \{0,1\}^{\ell}\to\{0,1\}$ is $(s,\delta)$-average-case hard if for every nonuniform probabilistic algorithm $A$ running within time $s$, $P_{x,r}(A(x;r) = f(x))\le 1-\delta$. On the other hand, $f$ is $s$-worst-case hard if for all such algorithms $A$, there exists some input $x$ such that $P_r(A(x;r) = f(x)) \le 2/3$.

We first show how to get a good generator out of an average-case hard function in $E$, we first introduce the notion of next-bit unpredictability and its connection to generators:

Definition 5: A random variable $X$ on $\{0,1\}^m$ is $(t,\epsilon)$-next-bit-unpredictable if for every $1\le i and every nonuniform probabilistic algorithm (“predictor”) $A$ running within time $t$, $P(A(X_1\cdots X_i;r))=X_{i+1})\le \frac{1}{2}+\epsilon$, where $X_i$ denotes the $i$th bit of the value that $X$ takes on and where the probability is taken over all of $X$ and coin tosses $r$

Result 2: If $X$ on $\{0,1\}^m$ is $(t,\epsilon)$-next-bit-unpredictable, it is $(t,m\epsilon)$-pseudorandom.

Proof: Assume to the contrary that $X$ is not $(t,m\epsilon)$-pseudorandom so that there exists some nonuniform algorithm $A$ in time $t$ that distinguishes between $X$ and the uniform distribution $U$ on $\{0,1\}^m$ with advantage greater than $\epsilon$. The key is that this advantage allows $A$ to predict future bits better than a random coin toss. What follows is a so-called “hybrid argument”: define $H_i$ to be the bitstring-valued random variable $X_1\cdots X_iU_{i+1}\cdots U_m$, where $U$ is the uniform distribution on $\{0,1\}^m$. Because $\sum^m_{i=1}P(A(H_i)=1)-P(A(H_{i-1})=1) = P(A(X)=1)-P(A(U)=1) > m\epsilon$, there is some $i$ such that $P(A(H_i)=1)-P(A(H_{i-1})=1)>\epsilon$. Rewriting $P(A(H_{i-1})=1)$ as

$\frac{1}{2}\left(P(A(X_1\cdots X_{i-1}X_{i}U_{i+1}\cdots U_m)=1) + P(A(X_1\cdots X_{i-1}\bar{X_i}U_{i+1}\cdots U_m)=1)\right)$,

we find that $\frac{1}{2}P(A(X_1\cdots X_{i-1}X_{i}U_{i+1}\cdots U_m)=1) - \frac{1}{2}P(A(X_1\cdots X_{i-1}\bar{X_i}U_{i+1}\cdots U_m)=1)>\epsilon$. Now define a predictor $B$ by $B(x_1\cdots x_{i-1}) = (A(x_1,...,x_{i-1},u_i,...,u_m) = u_i)$, where $u_i,...,u_m$ are uniformly random bits. By construction, the probability that the predictor succeeds exceeds $\frac{1}{2}+\epsilon$, and by using the random bits $u_i,...,u_m$ as advice, we get a predictor that runs in the same time as $A$, a contradiction of next-bit unpredictability.

Remark: In fact the converse is also true: $(t,\epsilon)$-pseudorandomness implies $(t-3,\epsilon)$-next-bit-unpredictability; the contrapositive is given by constructing an algorithm that outputs a 1 when the predictor succeeds and a zero otherwise, and the additive constant comes from adding logical gates to $P$‘s circuit to check for this success.

We can now prove our first main result, the reduction of finding mildly explicit generators to finding average-case-hard functions in E:

Result 3: Fix a time function $s: \mathbb{N}\to\mathbb{N}$ that is computable in time exponential in its input. If there is some $f$ in E that on length-$\ell$ inputs is $(s(\ell),1/2-s(\ell))$-average-case hard, there is a mildly explicit $(m,1/m)$-generator $G: \{0,1\}^{d(m)}\to\{0,1\}^m$ with seed length $d(m) = O(s^{-1}(\text{poly}(m))^2/\log m)$.

Proof: As it turns out that with the above result, we’re almost done; we’ll just need appropriate asymptotics on seed length. The gist will be to evaluate our hard function on inputs that share only a few bits. Define an $(\ell,a)$ design to be a family of subsets of $\{0,...,d-1\}$ each of cardinality $\ell$ and mutually intersecting in at most $a$ elements. The key to getting the desired asymptotic bounds is the following lemma:

Lemma: For every $\gamma>0$ and $\ell, m$, there exists an $(\ell, a)$-design $S_1...,S_m\subset\{0,...,d-1\}$ such that $d = O(\ell^2/a)$ and $a = \gamma\log m$.

Proof: The proof is nonconstructive. We will show that an $(\ell,a)$-design will exist as long as $m$ is bounded above by $\binom{d}{a}/\binom{\ell}{a}^2$. Fix $i,j$, and the probability that $S_i,S_j$ intersect in at least $a$ elements is $\frac{\binom{d}{a}\binom{d-a}{\ell-a}^2}{\binom{d}{\ell}^2} = \frac{\binom{\ell}{a}^2}{\binom{d}{a}^2}$. This tells us that for any $i$, the expected number of sets among $S_1,...,S_{i-1}$, by our choice of bound on $m$, is strictly less than 1, so an $(\ell,a)$-design on $m$ subsets is indeed possible. Stirling’s approximation on our inequality then gives the desired result.

Given an $(\ell,a)$ design $S_1,...,S_m$ and a function $f: \{0,1\}^{\ell}\to\{0,1\}$, we can get a generator $\{0,1\}^d\to\{0,1\}^m$ by evaluating $f$ on the input restricted to each subset in the design. The generator given by $G(x) = f(x|_{S_1})\cdots f(x|_{S_m})$ is called the Nigan-Wigderson generator

Lemma: If $f$ is $(s,1/2-\epsilon/m)$ hard, then $G$ is $(s-ma2^a,\epsilon)$-pseudorandom.

Proof: As usual, we prove the contrapositive. Say $G$ is not pseudorandom so that by our result on next-bit unpredictability, we have a $(t,\epsilon/m)$-predictor $A$ and an index $i$ such that $P(A(f(X_{S_1})\cdots f(X_{S_{i-1}}))=f(X_{S_i}))$ with probability more than $1/2+\epsilon/m$. We can reduce the random choice of $X$ to a random choice of $S_{X_i}$. Specifically, there must exist some fixed choice of bits for $X|_{\bar{S_i}}$ such that if we define $Y$ to be the random variable such that $Y|_{\bar{S_i}}$ equals this fixed choice and $Y|_{S_i}$ is uniformly distributed over bitstrings of length $\ell$, then define the algorithm $B = A(f(Y|_{S_1})\cdots f(Y|_{S_{i-1}}))$, which will compute the function $f$ with error less than $1/2-\epsilon m$. Because $f(Y|_{S_{j}})$ can be computed by storing as advice a lookup table of size at most $|S_i\cap S_j|\le a$ and doing lookups with a circuit of size $a2^a$, the total time of $B$ is at most $t+ma2^a = s$, so $f$ is not $(s, 1/2-\epsilon/m)$-hard, as desired.

The proof of result 2 follows by picking the appropriate parameters.

Hardness Amplification I: Worst-Case Hard to Constant Average-Case

Our next goal will be to construct a constant average-case-hard function out of a worst-case hard function using list decoding.

It turns out that if we view a worst-case hard function $\{0,1\}^{\ell}\to\{0,1\}$ as a message and pass this through an encoding $E$ for which there exists an efficient decoding, then the codeword $E(f) = f': \{0,1\}^{\ell'}\to\{0,1\}$ will be average-case hard. Unfortunately, decoding as we’ve defined it requires reading in all of the bits of the received word and writing out all of the bits of the decoded message, taking time that we might as well have used to make a gigantic truth table to compute $f$ to begin with. Instead, we introduce the following variant of decoding:

Definition 6: A local $\delta$-decoding algorithm $D$ for an encoding $E$ is a probabilistic algorithm that, given oracle access to a received word $g$ and an input position $x$, computes with probability 2/3 the value $f(x)$ for any message $f$ whose encoding $E(f) = f'$ is $\delta$-close to $g$. For coin tosses $r$, the output of the algorithm will be denoted by $D^g(x;r)$, and probability is taken over all $r$.

Result 3: If the message $f$ in the above definition is worst-case hard for nonuniform time $t$ and $D$ runs in nonuniform time at most $t_D$, then $f'$ is $(t/t_D,\delta)$-average-case hard.

Proof: Say that $f'$ is not average-case hard so that there exists some algorithm $A$ running in time $t/t_D$ such that $P(A(x;r) = f'(x)) > 1-\delta$. There must exist some choice of coin tosses $\tilde{r}$ such that, defining deterministic nonuniform algorithm $A'(x) = A(x;r)$, we have $P(A'(x) = f'(x))>1-\delta$. But this $A'$ is just another word in the space $\Sigma^{L'}$, so this bound translates into $d_H(A',f')<\delta$. By definition, the algorithm $D^{A'}$ computes $f$ to within $1/3$ constant error, and the circuit representing this algorithm is the circuit for $D$ with oracle gates replaced by the circuit for $A'$, so $f$ is not $t$-worst-case-hard.

Our hardness amplification will reduce to a worst-case-hard function running in asymptotically the same time $t(\ell) = t(\ell')$ as the average-case-hard function, so we will want an encoding $\{0,1\}^{L}\to\Sigma^{L'}$. Because our final result will be a statement about functions in exponential deterministic time, we also want our encoding to be of this complexity $2^{O(\ell')}$. We will also want the decoding not to take too much time, but we will determine later what the runtime for $D$ should be precisely. Lastly, to get rid of the $m$ term in our lemma on Nigan-Wigderson generators, we will want the code to have a binary alphabet.