List-Decoding Algorithms and Rate-Distance Tradeoff

This post will cover some cool results in coding theory I’ve learned from Vadhan’s survey on pseudorandomness, with a focus on list-decoding algorithms and the tradeoff between rate and distance, culminating in a discussion of the folded Reed-Solomon code and its relationship to the Parvaresh-Vardy code.

We begin with some basic definitions.

Definition 1: For alphabet \Sigma of size q and block length \hat{n}, a q-ary code is a multiset \mathcal{C} inside the set of \hat{n}-words \text{Maps}([\hat{n}],\Sigma). The members of \mathcal{C} are called codewords.

Definition 2: An encoding function for a code \mathcal{C} is a bijection from the messages \{1,...,|\mathcal{C}|\} to the set of codewords, thought of as a labeling.
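As a toy illustration of Definitions 1 and 2 (the binary repetition code here is my own running example, not one taken from the survey), a code of block length 4 together with an encoding function might look like:

```python
# A toy binary (q = 2) code with block length 4: the repetition code.
# The code is the set of codewords; the encoding function labels them by messages 1..|C|.
codewords = [(0, 0, 0, 0), (1, 1, 1, 1)]

def encode(message: int) -> tuple:
    """Bijection from the messages {1, ..., |C|} to the codewords (Definition 2)."""
    return codewords[message - 1]

print(encode(1))  # (0, 0, 0, 0)
print(encode(2))  # (1, 1, 1, 1)
```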

The set of \hat{n}-words from the alphabet \Sigma has an obvious structure as a metric space, and this notion of distance will help us in developing a model of decoding.

Definition 3: The relative Hamming distance d(x,y) for x,y\in \text{Maps}([\hat{n}],\Sigma) is the fraction of positions i\in[\hat{n}] such that x(i) \neq y(i). Under this metric, denote by B(x,\delta) the open ball of \hat{n}-words centered at x with radius \delta; this is called the Hamming ball of radius \delta around x.
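Here is a minimal Python sketch of this metric (helper names like rel_hamming are my own, just for illustration):

```python
from typing import Sequence

def rel_hamming(x: Sequence, y: Sequence) -> float:
    """Relative Hamming distance: fraction of positions where x and y differ."""
    assert len(x) == len(y), "words must have the same block length"
    return sum(a != b for a, b in zip(x, y)) / len(x)

def in_ball(center: Sequence, word: Sequence, delta: float) -> bool:
    """Is `word` inside the open Hamming ball B(center, delta)?"""
    return rel_hamming(center, word) < delta

# Example over the binary alphabet with block length 4:
print(rel_hamming("0110", "0111"))    # 0.25
print(in_ball("0110", "1111", 0.5))   # False (distance 0.5 is not strictly less than 0.5)
```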

With these definitions in mind, we will focus not on unique decoding, but on decoding algorithms that output lists of potential candidates, which allows us to handle a larger fraction of errors.

Definition 4: Let \ell(r,\epsilon) denote the set of codewords lying inside the Hamming ball of radius 1-\epsilon centered at r. An encoding E: [|\mathcal{C}|]\to\mathcal{C} is said to be (\delta, L)-list-decodable if |\ell(r,1-\delta)|\le L for all \hat{n}-words r, i.e. if every Hamming ball of radius \delta contains at most L codewords.
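For tiny examples, Definition 4 can be checked by brute force over all received words r. The following sketch (again with my own helper names, and only feasible for very small codes and block lengths) does exactly that for the repetition code above:

```python
from itertools import product

def rel_hamming(x, y):
    return sum(a != b for a, b in zip(x, y)) / len(x)

def is_list_decodable(code, alphabet, n, delta, L):
    """Brute force: check that every Hamming ball of radius delta
    contains at most L codewords.  Exponential in n -- toy sizes only."""
    for r in product(alphabet, repeat=n):
        # codewords strictly within relative distance delta of r
        candidates = [c for c in code if rel_hamming(r, c) < delta]
        if len(candidates) > L:
            return False
    return True

# Toy example: the binary repetition code of block length 4.
code = [(0, 0, 0, 0), (1, 1, 1, 1)]
print(is_list_decodable(code, [0, 1], 4, 0.5, 1))  # True: any ball of radius 1/2 sees at most one codeword
```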


Randomness-Efficient Error Reduction Part 2: Expander Graphs

(This is a continuation of a previous post.)

We begin by building up some of the canonical technology of expander graphs. Morally, we can think of an expander graph as a building block characterized, somewhat paradoxically, by both connectivity and sparseness, out of which we can construct families of graphs growing in the number of vertices but barely growing in degree, e.g. by taking some combination of graph powers, tensor products, zigzag products, and so on. Here are a few measures of the connectivity characteristic of expander graphs:

Definition 1 (vertex expansion): A directed multigraph (the word “graph” will be used to denote these from now on unless otherwise stated) is a (K,A)-vertex expander if all sets S of at most K vertices have at least A|S| neighbors. In other words, all small enough collections of vertices are well connected to the entire graph.

Definition 2 (spectral expansion): A regular graph G is a \gamma-spectral expander if the spectral gap \gamma(G) of its normalized Laplacian is at least \gamma. Morally, the algebraic “mixing” of G is sufficiently fast.

Definition 3 (edge expansion): For subsets S,T of the vertices of a graph G, let e(S,T) denote the cut size between S and T, i.e. the number of directed edges from vertices in S to vertices in T. A D-regular graph G is a (K,\epsilon)-edge expander if e(S,\bar{S})\ge\epsilon D|S| for all S such that |S|\le K.

In other words, all small enough collections of vertices have lots of edges emanating from them. It turns out that these three criteria for expansion are closely related (the following results aren’t immediately relevant to error reduction, but as a newbie learning about expander graphs for the first time, I think they form a neat connection between the algebraic and the combinatorial).
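To make these measures concrete, here is a brute-force Python sketch (function names and conventions are my own; in particular, I compute the spectral gap from the random walk matrix adj/D, one common normalization for D-regular graphs) that evaluates all three quantities on a tiny graph:

```python
import numpy as np
from itertools import combinations

def expansion_stats(adj, K, D):
    """adj: adjacency matrix (numpy array) of a D-regular graph on N vertices.
    Brute-forces the vertex- and edge-expansion ratios over all sets of size <= K
    and computes a spectral gap.  Exponential in N -- toy examples only."""
    N = adj.shape[0]
    # One common convention for the spectral gap: 1 minus the second-largest
    # absolute eigenvalue of the random walk matrix adj / D.
    eigs = np.sort(np.abs(np.linalg.eigvals(adj / D)))[::-1]
    gamma = 1.0 - eigs[1]

    vertex_ratio = np.inf   # worst |N(S)| / |S| over sets S with |S| <= K
    edge_ratio = np.inf     # worst e(S, S-bar) / (D |S|) over the same sets
    for k in range(1, K + 1):
        for S in combinations(range(N), k):
            S = list(S)
            neighbors = set(np.nonzero(adj[S].sum(axis=0))[0])
            vertex_ratio = min(vertex_ratio, len(neighbors) / len(S))
            Sbar = [v for v in range(N) if v not in S]
            cut = adj[np.ix_(S, Sbar)].sum()
            edge_ratio = min(edge_ratio, cut / (D * len(S)))
    return gamma, vertex_ratio, edge_ratio

# Toy example: the 4-cycle (2-regular), a poor expander.
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]])
print(expansion_stats(C4, K=2, D=2))  # spectral gap ~ 0, since the 4-cycle is bipartite
```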


Randomness-Efficient Error Reduction Part 1: Preliminaries

The following series of posts will document things that I’ve learned from a neat survey on pseudorandomness by Prof. Salil Vadhan at Harvard. They’re mostly to help me digest the material, but I hope they’ll be of interest to anyone reading this blog.

Specifically, this first set of two posts will be about the power of expander graphs in randomness-efficient error reduction for randomized algorithms; this post presents two naive approaches to error reduction. We begin with the following definition.

Definition 1: A language L lies in the class BPP (bounded-error probabilistic polynomial-time) if there exists a probabilistic polynomial-time algorithm A such that the probability that A accepts an element x is at least 1/2+1/p(n) if x\in L and at most 1/2-1/p(n) if x\not\in L, where p is some polynomial in the length n of the input. Stated another way, BPP is the class of languages with probabilistic polynomial-time algorithms with two-sided error at most 1/2-1/p(n).

Intuitively, running multiple independent repetitions should push the error of any such A arbitrarily low, thanks to the following standard result from probability theory:

Result 1 ((A) Chernoff Bound): Let X_1,...,X_n be independent random variables from [0,1]. Let X be their average (\sum_i X_i)/n, and let \mu be the expected value of X. Then P(|X-\mu|\ge\epsilon)\le 2e^{-n\epsilon^2/4}, that is, the probability of the average differing from its expected value by some quantity \epsilon falls exponentially in the number of repetitions n.
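As a quick numerical sanity check (my own illustration, not from the survey), we can compare the empirical deviation probability of an average of fair coin flips against the bound 2e^{-n\epsilon^2/4}:

```python
import numpy as np

rng = np.random.default_rng(0)

n, eps, trials = 2000, 0.1, 100_000
mu = 0.5                              # each X_i is a fair coin in {0, 1}, so E[X] = 1/2

# X is the average of n independent fair coins; estimate P(|X - mu| >= eps)
X = rng.binomial(n, mu, size=trials) / n
empirical = np.mean(np.abs(X - mu) >= eps)

bound = 2 * np.exp(-n * eps**2 / 4)
print(f"empirical: {empirical:.5f}   Chernoff bound: {bound:.5f}")
# the empirical frequency comes out essentially 0, comfortably below the bound of roughly 0.013
```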

For an algorithm A with error at most 1/2-1/p(n), consider the corresponding algorithm A' which runs A for t(n) repetitions and outputs the majority answer. Let X be the average of the t(n) indicator variables X_i, which each take the value 1 if A outputs the right answer on the ith repetition and 0 otherwise. Since the expected value \mu of X is at least 1/2+1/p(n), the event X\le 1/2 implies |X-\mu|\ge 1/p(n), so the Chernoff bound above with \epsilon = 1/p(n) gives P(X\le 1/2)\le 2e^{-t(n)/4p(n)^2}. Taking t(n) to be 4kp(n)^2, this probability is bounded above by 2e^{-k}\le 2^{-k} as long as k is at least a small constant (k\ge 3 suffices). Here, reducing the error to 2^{-k} comes at a cost of t(n)=O(k\,p(n)^2) repetitions and, if our algorithm A requires m random bits, A' requires t(n)\cdot m = O(k\,p(n)^2\,m) random bits, because our repetitions are completely independent.
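Here is a minimal sketch of this naive amplification (the base algorithm A, its input, and the parameters are placeholders of my own): run A for t(n) = 4kp(n)^2 independent repetitions with fresh random bits each time and output the majority answer.

```python
import random

def amplify(A, x, m, k, p_of_n):
    """Naive error reduction for a BPP-style algorithm.

    A(x, coins) -- the base algorithm; takes the input x and a list of m random bits,
                   returns its accept/reject decision, and is correct with
                   probability >= 1/2 + 1/p(n) on every input.
    Runs t = 4 * k * p(n)^2 independent repetitions and takes a majority vote."""
    t = 4 * k * p_of_n ** 2
    votes = 0
    bits_used = 0
    for _ in range(t):
        coins = [random.getrandbits(1) for _ in range(m)]  # fresh randomness each repetition
        bits_used += m
        votes += 1 if A(x, coins) else 0
    return votes > t / 2, bits_used
```

Note that the total randomness consumed is t(n)\cdot m bits; cutting down this cost is exactly the point of the randomness-efficient techniques in the rest of this series.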
