# Random-Efficient Error Reduction, Part 2: Expander Graphs

(This is a continuation of a previous post.)

We begin by building up some of the canonical technology of expander graphs. Morally, we can think of an expander graph as a building block characterized, somewhat paradoxically, by both connectivity and sparseness, out of which we can construct families of graphs that grow in the number of vertices while barely growing in degree, e.g. by taking some combination of graph powers, tensor products, zigzag products, etc. Here are a few measures of the connectivity characteristic of expander graphs:

Definition 1 (vertex expansion): A directed multigraph (the word “graph” will be used to denote these from now on unless otherwise stated) is a $(K,A)$-vertex expander if all sets $S$ of at most $K$ vertices have at least $A|S|$ neighbors. In other words, all small enough collections of vertices are well connected to the entire graph.

Definition 2 (spectral expansion): A regular graph $G$ is a $\gamma$-spectral expander if the spectral gap $\gamma(G) = 1-\lambda(G)$ of its Laplacian is at least $\gamma$ (throughout this post, “Laplacian” refers to the random walk matrix $M = A/D$ of a $D$-regular graph with adjacency matrix $A$, and $\lambda(G)$ denotes the second largest absolute value of an eigenvalue of $M$). Morally, the algebraic “mixing” of $G$ is sufficiently fast.

Definition 3 (edge expansion): For subsets $S,T$ of the vertices of a graph $G$, let $e(S,T)$ denote the cut size between $S$ and $T$, i.e. the number of directed edges from vertices in $S$ to vertices in $T$. A $D$-regular graph $G$ is a $(K,\epsilon)$-edge expander if $e(S,\bar{S})\ge\epsilon D|S|$ for all $S$ such that $|S|\le K$.

In other words, all small enough collections of vertices have lots of edges emanating from them. It turns out that these three criteria for expansion are closely related (the following results aren’t immediately relevant to error reduction, but as a newbie learning about expander graphs for the first time, I think they form a neat connection between the algebraic and the combinatorial).
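To make these measures concrete, here is a quick numerical sketch; the choice of the complete graph $K_6$, and the reading of “Laplacian” as the random walk matrix $M = A/D$, are illustrative assumptions of mine rather than anything forced by the definitions:

```python
import itertools

import numpy as np

# Sanity-check two of the expansion measures on the complete graph K_6
# (6 vertices, 5-regular, no self-loops).
N, D = 6, 5
A = np.ones((N, N)) - np.eye(N)   # adjacency matrix of K_6
M = A / D                         # random walk matrix ("Laplacian" here)

# Spectral expansion: gamma = 1 - lambda, with lambda the second
# largest absolute eigenvalue of M.
lam = np.sort(np.abs(np.linalg.eigvalsh(M)))[::-1][1]
gamma = 1 - lam                   # K_6: lambda = 1/5, so gamma = 4/5

# Edge expansion: minimize e(S, S-bar) / (D|S|) over all |S| <= N/2.
def edge_ratio(S):
    Sbar = [v for v in range(N) if v not in S]
    return sum(A[i, j] for i in S for j in Sbar) / (D * len(S))

eps = min(edge_ratio(S)
          for k in range(1, N // 2 + 1)
          for S in itertools.combinations(range(N), k))
print(gamma, eps)                 # 0.8 and 0.6 (up to floating point)
```

For $K_6$ the edge-expansion constant $0.6$ comfortably exceeds $\gamma/2 = 0.4$, which is the general guarantee we will extract from the Expander Mixing Lemma below.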

Result 1: If $G$ is a $\gamma$-spectral expander with $N$ vertices, it is also an $\left(\alpha N,\frac{1}{(1-\alpha)(1-\gamma)^2+\alpha}\right)$-vertex expander.

Proof: Let $M$ denote the Laplacian of $G$ and $\pi$ the uniform distribution on a given set of vertices $S\subset V(G)$. The key to this proof is that we can express the size of $S$ in terms of the collision probability $CP(\pi)$ of $\pi$ (i.e. the probability that two independent samples from $\pi$ are the same) and the neighborhood size $|N(S)|$ in terms of the size of the support $S(\pi M)$ of $\pi M$ (i.e. its set of nonzero coordinates; note that this $S(\cdot)$ notation is nonstandard and only used for typographical simplicity). We can quickly verify that for any probability distribution $\pi$, i) $CP(\pi) = |\pi-u|^2+1/N$, and ii) Cauchy-Schwarz gives that $CP(\pi)\ge 1/|S(\pi)|$. Fact i) tells us that

$\lambda^2\ge\left(\frac{|\pi M-u|}{|\pi - u|}\right)^2=\frac{CP(\pi M)-1/N}{CP(\pi)-1/N}$,

and because, as we mentioned above, $CP(\pi)=1/|S|$, we can use fact ii) and the aforementioned observation that $|S(\pi M)| = |N(S)|$ to see that our above inequality becomes

$\lambda^2\ge \frac{1/|N(S)|-1/N}{1/|S|-1/N}$,

and solving for an appropriate lower bound on $|N(S)|$ gives the desired result.
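As a sanity check of Result 1 (my own illustration; the Petersen graph, built below as the Kneser graph $K(5,2)$, and the density $\alpha = 1/2$ are arbitrary choices), we can brute-force the vertex-expansion guarantee over every small subset:

```python
import itertools

import numpy as np

# The Petersen graph as the Kneser graph K(5,2): vertices are 2-subsets
# of {0,...,4}, with edges between disjoint subsets; it is 3-regular.
verts = list(itertools.combinations(range(5), 2))
N, D = len(verts), 3
A = np.array([[1.0 if not set(a) & set(b) else 0.0 for b in verts]
              for a in verts])
M = A / D

lam = np.sort(np.abs(np.linalg.eigvalsh(M)))[::-1][1]   # 2/3 for Petersen
gamma = 1 - lam

alpha = 0.5
expansion = 1 / ((1 - alpha) * (1 - gamma) ** 2 + alpha)  # about 1.385

# Result 1: every S with |S| <= alpha*N has at least expansion*|S|
# neighbors, where N(S) is the union of the neighborhoods of S.
for k in range(1, int(alpha * N) + 1):
    for S in itertools.combinations(range(N), k):
        nbrs = {j for i in S for j in range(N) if A[i, j]}
        assert len(nbrs) >= expansion * len(S) - 1e-9
print("Result 1 verified on the Petersen graph")
```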

Result 2 (Alon): If $G$ is a $D$-regular $(N/2,1+\alpha)$-vertex expander, then $G$ is a $\left(1 - \frac{\alpha^2}{D(4\alpha^2+8)}\right)$-spectral expander.

Proof: We defer this to the next post.

Before we state the connection between edge expansion and vertex/spectral expansion, we introduce a famously useful result, the Expander Mixing Lemma.

Result 3: If $G$ is a $D$-regular $(1-\lambda)$-spectral expander with $N$ vertices, then for any sets $S,T\subset V(G)$ of densities $\alpha = |S|/N$ and $\beta = |T|/N$, we have

$\left|\frac{e(S,T)}{ND}-\alpha\beta\right|\le\lambda\sqrt{\alpha\beta(1-\alpha)(1-\beta)}$

Proof: This neat result follows from a bit of computation after observing that if $\chi_S$ and $\chi_T$ are the characteristic row vectors for $S$ and $T$ and $A$ is the adjacency matrix of $G$, then $e(S,T) = \chi_SA\chi^t_T$ (the $i$th coordinate of $\chi_SA$ gives the number of edges from vertices in $S$ to vertex $i$, so taking the dot product of this with $\chi^t_T$ gives the number of edges from $S$ to $T$).

Denoting by $u$ the uniform distribution, we break up $\chi_S$ and $\chi_T$ into orthogonal components $(\alpha N)u+\chi^{\perp}_S$ and $(\beta N)u+\chi^{\perp}_T$ for some $\chi^{\perp}_S$ and $\chi^{\perp}_T$. Substituting these into our expression for $e(S,T)$ in terms of the characteristic row vectors of $S,T$, we can get rid of cross terms by orthogonality (we leave these manipulations as an exercise), ending up with

$\frac{e(S,T)}{ND} = \alpha\beta+\frac{\chi^{\perp}_SM(\chi^{\perp}_T)^t}{N}$,

where $M$ denotes the Laplacian of $G$. By definition of spectral expansion, we get that $\left|\frac{e(S,T)}{ND}-\alpha\beta\right|\le\frac{1}{N}\lambda|\chi^{\perp}_S|\cdot|\chi^{\perp}_T|,$ and we can rewrite the norms of the two characteristic vectors using the Pythagorean theorem to get the desired inequality.

Observe that by taking $T$ to be $\bar{S}$ for sets $S$ with $|S|\le N/2$ (so that $|\bar{S}|\ge N/2$), we get that every $D$-regular $\gamma$-spectral expander with $N$ vertices is an $(N/2,\gamma/2)$-edge expander.
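The mixing lemma is also easy to stress-test numerically. In this sketch (the 4-regular circulant graph on 12 vertices and the random choice of sets are my own illustrative assumptions), we check the inequality on a few hundred random pairs $S,T$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 4-regular circulant graph: vertex i is joined to i +/- 1 and i +/- 2
# (mod 12); it is connected and non-bipartite, so lambda < 1.
N, D = 12, 4
A = np.zeros((N, N))
for i in range(N):
    for d in (1, 2, N - 1, N - 2):
        A[i, (i + d) % N] = 1
lam = np.sort(np.abs(np.linalg.eigvalsh(A / D)))[::-1][1]

# Check the Expander Mixing Lemma on random pairs of subsets S, T.
for _ in range(300):
    S = rng.choice(N, size=rng.integers(1, N), replace=False)
    T = rng.choice(N, size=rng.integers(1, N), replace=False)
    a, b = len(S) / N, len(T) / N
    e = sum(A[i, j] for i in S for j in T)   # directed edges from S to T
    assert abs(e / (N * D) - a * b) <= lam * np.sqrt(
        a * b * (1 - a) * (1 - b)) + 1e-9
print("mixing lemma verified; lambda =", round(lam, 3))
```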

The intuition behind the Expander Mixing Lemma is that the fraction of all directed edges in the graph that are going from a set $S$ of vertices to another set $T$ should be roughly the product of the densities of those two sets, suggesting that perhaps it’s good to think of expander graphs as random graphs in the following sense:

Result 4 (Pinsker): For any $D$, there is some $\alpha>0$ so that a random $D$-regular undirected graph with $N$ vertices is an $(\alpha N,D-1.01)$-vertex expander at least half the time.

Proof: Most sources online only give the proof for the simple case in which the graph is $D$-left-regular bipartite. We will present Pinsker’s original proof for the general case in a future post. For now, again, this result isn’t immediately relevant to random-efficient error reduction.

Now let’s move on to how to achieve random-efficient error reduction using expander graphs. We will get the randomness needed in our BPP algorithm out of a random walk on an expander graph: morally, the good mixing property of spectral expanders gives us a probability distribution that converges quickly to the uniform one, so by taking the vertices visited in our walk to be the “coin tosses” in our BPP algorithm, it turns out we can get both the random-efficiency of pairwise independent sampling and the time-efficiency of purely independent sampling. Specifically, we have the following Chernoff bound for random walks on expander graphs:

Result 5 ((Another) Chernoff Bound): If $G$ is a regular $(1-\lambda)$-spectral expander with $N$ vertices and $f: [N]\to[0,1]$ is the function we are approximating, then for a random walk $V_1,\dots,V_t$ on the vertices of $G$ where $V_1$ is chosen uniformly at random,

$P\left(\left|\frac{1}{t}\sum_i f(V_i)-\mu(f)\right|\ge\lambda+\epsilon\right)\le 2e^{-\Omega(\epsilon^2t)}$

where $\mu(f)$ denotes the expected value of $f$. In other words, if $\lambda$ is sufficiently small (e.g. by taking $G$ to a sufficiently large power), then the probability that our approximation has error at least $\epsilon$ decreases exponentially in the length $t$ of the walk.
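To see the bound qualitatively (this simulation is my own illustration, and the Petersen graph is actually a mediocre expander with $\lambda = 2/3$, so the constants here shouldn’t be taken seriously), we can run many walks and record how often the empirical mean of $f$ strays far from $\mu(f)$:

```python
import numpy as np

rng = np.random.default_rng(7)

# Petersen graph as an adjacency list; f is the indicator of the outer
# 5-cycle {0,...,4}, so mu(f) = 1/2.
petersen = {0: [1, 4, 5], 1: [0, 2, 6], 2: [1, 3, 7], 3: [2, 4, 8],
            4: [3, 0, 9], 5: [0, 7, 8], 6: [1, 8, 9], 7: [2, 9, 5],
            8: [3, 5, 6], 9: [4, 6, 7]}
t, trials = 100, 2000
bad = 0
for _ in range(trials):
    v = int(rng.integers(10))               # V_1 uniform over the vertices
    hits = 0
    for _ in range(t):
        hits += v < 5                       # f(V_i)
        v = petersen[v][rng.integers(3)]    # one uniform step of the walk
    bad += abs(hits / t - 0.5) >= 0.25      # deviation of at least 0.25
print(bad / trials)                         # a small fraction of the walks
```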

Before we prove this, we give a convenient lemma that morally says that at each step of our random walk, with probability $\gamma$ we jump to a uniformly random vertex of the graph, and with probability $\lambda$ we move in a way that won’t change the current probability distribution too much.

Lemma: We can break up the Laplacian $M$ of a regular $(1-\lambda)$-spectral expander $G$ into $(1-\lambda) J + \lambda E$, where $J$ is the $N\times N$ matrix with all entries $1/N$ (the Laplacian of the complete graph on $N$ vertices with self-loops) and $E$ is an “error” matrix with norm bounded above by 1.

Proof: As usual, let $\gamma$ denote $1-\lambda$. We need to show that $|E| = \left|\frac{M-\gamma J}{\lambda}\right|\le 1$. Every vector can be written as a linear combination of the uniform distribution $u$ and a vector perpendicular to $u$, so it suffices to show that right-multiplication of $u$ or any such orthogonal vector $v$ by $E$ preserves orthogonality and does not increase norm. Firstly, $uE = (uM-\gamma uJ)/\lambda = (u-\gamma u)/\lambda = u$. On the other hand, for $v$ perpendicular to $u$, regularity of $G$ tells us that the sum of the coordinates of $vM$ is still zero, and $vJ = 0$, so $vE = (vM-\gamma vJ)/\lambda = vM/\lambda$ is still orthogonal to $u$; its norm is at most $|v|$ by definition of spectral expansion, completing our proof of the lemma.
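A quick numerical check of the lemma (the 5-cycle is an arbitrary illustrative choice of mine): the matrix $E = (M-\gamma J)/\lambda$ should have spectral norm at most 1, and indeed its norm is exactly 1, attained on $u$:

```python
import numpy as np

# The 5-cycle C_5: 2-regular, with lambda = cos(pi/5) ~ 0.809.
N, D = 5, 2
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[i, (i - 1) % N] = 1
M = A / D

lam = np.sort(np.abs(np.linalg.eigvalsh(M)))[::-1][1]
J = np.full((N, N), 1 / N)        # "Laplacian" of the complete graph
E = (M - (1 - lam) * J) / lam     # the error matrix of the lemma
print(np.linalg.norm(E, 2))       # spectral norm is at most 1, as claimed
```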

Returning to our proof of the Chernoff bound for expander walks, we try to bound the moment-generating function of $X = \sum_iX_i = \sum_if(V_i)$ and then use Markov’s inequality, though this is a bit hairy because our random variables $\{X_i\}$ aren’t independent. We construct the bound as follows:

Let $P$ be the diagonal matrix with entries $e^{rf(i)}$. We can verify with a simple inductive argument that $E[e^{rX}]$ equals the sum of the coordinates of $uP(MP)^{t-1}$, and because $G$ is regular (so that $uM = u$), this is the same as the sum of the coordinates of $u(MP)^t$, which we can bound above by $\sqrt{N}|u|\cdot|MP|^t$ using Cauchy-Schwarz; since $|u| = 1/\sqrt{N}$, this gives $E[e^{rX}]\le |MP|^t$.

To bound $|MP|$, use the above lemma to write $MP = (1-\lambda)JP+\lambda EP$, so that $|MP|\le(1-\lambda)|JP|+\lambda|EP|$ by the triangle inequality. A bit of computation and Taylor expansion gives us that $|JP|=1+r\mu(f)+O(r^2)$, while the fact that $|E|\le 1$ gives us that $|EP|\le |P|\le e^r=1+r+O(r^2)$. Some more Taylor expansion after plugging these inequalities into our expression for the moment-generating function gives us that $E[e^{rX}]\le e^{(\mu(f)+\lambda)rt+O(r^2t)}$. Markov’s inequality tells us that

$P(X\ge(\mu(f)+\lambda+\epsilon)t)\le e^{-\epsilon rt+O(r^2t)}$

For $r$ chosen to be a sufficiently small fraction of $\epsilon$, we get an upper bound of $e^{-\Omega(\epsilon^2t)}$. By symmetry, if we had taken our function to be $1-f$ instead, we would get that $P(X\le(\mu(f)-\lambda-\epsilon)t)\le e^{-\Omega(\epsilon^2t)}$, and we’re done.

The Chernoff bound on expander walks tells us that a random walk of length $O(k)$ suffices to cut error to $2^{-k}$. More precisely, if we have a BPP algorithm requiring $k$ repetitions with $m$ random bits per repetition, we can now pay $m$ random bits once to pick the first vertex of the walk and then a constant number of bits (depending on the degree of the expander) per subsequent step of the walk, meaning that random-efficient error reduction requires only $k$ repetitions and $m+O(k)$ random bits!
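The randomness accounting can be sketched as follows; the sampler, the parameters $m$, $D$, $k$, and the 8-cycle used in the demo are all illustrative assumptions of mine (a real construction would use an explicit constant-degree expander on $2^m$ vertices):

```python
import math

import numpy as np

rng = np.random.default_rng(1)

def walk_sample(neighbors, t, rng):
    """Return the t vertices of a random walk on a regular graph given
    as an adjacency list; each vertex label stands in for one
    repetition's worth of coin tosses."""
    v = int(rng.integers(len(neighbors)))
    walk = [v]
    for _ in range(t - 1):
        v = neighbors[v][rng.integers(len(neighbors[v]))]
        walk.append(v)
    return walk

# Demo on a toy graph (an 8-cycle; a terrible expander, but it shows
# the interface): each step costs log2(degree) = 1 bit here.
neighbors = [[(i - 1) % 8, (i + 1) % 8] for i in range(8)]
walk = walk_sample(neighbors, 5, rng)

# Bit accounting for k repetitions of an algorithm using m coins each,
# walking on a D-regular expander with 2^m vertices:
m, D, k = 20, 8, 50
bits_independent = m * k                     # fresh coins every repetition
bits_walk = m + (k - 1) * int(math.log2(D))  # m + O(k), as promised
print(bits_independent, bits_walk)           # 1000 vs 167
```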

I’ll tie up some loose ends in the next two posts, including the results due respectively to Alon and Pinsker that 1) edge expansion implies spectral expansion and 2) (general) random regular graphs are good expanders.

EDIT: I got a bit ahead of myself and went ahead with blogging about later chapters of the survey instead of proving Alon’s and Pinsker’s results, but I promise I’ll tie up all loose ends soon!