List-Decoding Algorithms and Rate-Distance Tradeoff

This post will cover some cool results in coding theory I’ve learned from Vadhan’s survey on pseudorandomness, with a focus on list-decoding algorithms and the tradeoff between rate and distance and culminating in a discussion of the folded Reed-Solomon code and its relationship to the Parvaresh-Vardy code.

We begin with some basic definitions.

Definition 1: For alphabet \Sigma of size q and block length \hat{n}, a q-ary code is a multiset \mathcal{C} inside the set of \hat{n}-words \text{Maps}([\hat{n}],\Sigma). The members of \mathcal{C} are called codewords.

Definition 2: An encoding function for a code \mathcal{C} is a bijection from the messages \{1,...,|\mathcal{C}|\} to the set of codewords, thought of as a labeling.

The set of \hat{n}-words from the alphabet \Sigma has an obvious structure as a metric space, and this notion of distance will help us in developing a model of decoding.

Definition 3: The relative Hamming distance d(x,y) for x,y\in \text{Maps}([\hat{n}],\Sigma) is the fraction of positions i\in[\hat{n}] such that x(i) \neq y(i). Under this metric, denote B(x,\delta) to be the open ball of \hat{n}-words centered at x with radius \delta. This happens to be called the Hamming ball of radius \delta around x.

With these in mind, we will focus not on unique decoding, but on decoding algorithms that output lists of potential candidates and allow for handling of bigger batches of errors.

Definition 4: Let \ell(r,\epsilon) denote the set of codewords lying inside the Hamming ball of radius 1-\epsilon centered about r. An encoding E: [|\mathcal{C}|]\to\mathcal{C} is said to be (\delta, L)-list-decodable if |\ell(r,1-\delta)|\le L for all \hat{n}-words r.

Continue reading

Random-Efficient Error Reduction Part 2- Expander Graphs

(This is a continuation of a previous post.)

We begin by building up some of the canonical technology of expander graphs Morally, we can think of an expander graph as a building block characterized paradoxically by connectivity and sparseness, out of which we can construct families of increasingly larger graphs increasing in the number of vertices but barely growing in degree, i.e. by taking some combination of graph powers, tensor products, zigzag products, etc. Here are a few measures of the connectivity characteristic of expander graphs:

Definition 1 (vertex expansion): A directed multigraph (the word “graph” will be used to denote these from now on unless otherwise stated) is a (K,A)-vertex expander if all sets S of at most K vertices have at least A|S| neighbors. In other words, all small enough collections of vertices are well connected to the entire graph.

Definition 2 (spectral expansion): A regular graph G is a \gamma-spectral expander if the spectral gap \gamma(G) of its Laplacian is at least \gamma. Morally, the algebraic “mixing” of G is sufficiently fast.

Definition 3 (edge expansion): For subsets S,T of the vertices of a graph G, let e(S,T) denote the cut size between S and T, i.e. the number of directed edges from vertices in S to vertices in T. A D-regular graph G is a (K,\epsilon)-edge expander if e(S,\bar{S})\ge\epsilon D|S| for all S such that |S|\le K.

In other words, all small enough collections of vertices have lots of edges emanating from them. It turns out that these three criteria for expansion are closely related (the following results aren’t immediately relevant to error reduction, but as a newbie learning about expander graphs for the first time, I think they form a neat connection between the algebraic and the combinatorial).

Continue reading

Random-Efficient Error Reduction Part 1- Preliminaries

The following series of posts will document things that I’ve learned from a neat survey on pseudorandomness by Prof. Salil Vadhan at Harvard. They’re mostly to help me digest the material, but I hope they’ll be of interest to anyone reading this blog.

Specifically, this first set of two posts will be about the power of expander graphs in randomness-efficient error reduction in randomized algorithms; this first post will present two naive approaches to error reduction. We begin with the following definition.

Definition 1: A language L lies in class BPP (bounded-error probabilistic polynomial-time) if there exists a probabilistic polynomial-time algorithm A such that the probability that A accepts an element x is 1/2+1/p(n) if x\in L and is 1/2-1/p(n) if x\not\in L, where p is some polynomial in the length n of the input. Stated another way, BPP is the class of languages with probabilistic polynomial-time algorithms with two-sided error at most 1/2-1/p(n).

Intuitively, taking multiple repetitions should push the error in any such A arbitrarily low, thanks to the following standard result from probability theory:

Result 1 ((A) Chernoff Bound): Let X_1,...,X_n be independent random variables from [0,1]. Let X be their average (\sum_i X_i)/n, and let \mu be the expected value of X. Then P(|X-\mu|\ge\epsilon)\le 2e^{-n\epsilon^2/4}, that is, the probability of the average differing from its expected value by some quantity \epsilon falls exponentially in the number of repetitions n.

For an algorithm A with error at most 1/2-1/p(n), consider the corresponding algorithm A' which runs A for t(n) repetitions. Let X be the average of the t(n) indicator variables X_i which each take on a value of 1 if the algorithm outputs the right answer on the ith repetition and 0 otherwise. By the Chernoff bound above for \epsilon = 1/p(n), we get that P(X\le 1/2)\le 2e^{-t(n)/4p(n)^2}. Taking t(n) to be 4kp(n)^2, we get that the probability is bounded above by 2^{-k} as long as n is large enough. Here, reducing error to 2^{-k} comes at a cost of k repetitions and, if our algorithm A requires m random bits, A' requires km random bits, because our repetitions are completely independent.

Continue reading

The Isoperimetric Inequality

This will be a cool application of the concepts introduced in the last post. We call a curve rectifiable if there exists finite M such that for any partition a=t_0<t_1<\cdots<t_N=b, \sum^N_{j=1}|z(t_j)-z(t_{j-1})\le M, i.e. the curve has some notion of “finite length,” where length is the supremum of this quantity taken over all partitions or equivalently the infimum over all upper bounds M. It follows by definition that if the x– and y-parametrizing functions of the curve are of bounded variation, the curve is rectifiable. It is not, however, true that the classical arc length formula holds for all curves of bounded variation: just consider a curve where the x and y functions are the Cantor-Lebesgue function. We get a straight line from the origin to (1,1), but the derivative of the function is zero almost everywhere.

Result 4: The arc length formula does work if we assume absolute continuity of x(t) and y(t).

Proof: We will prove that the total variation of a complex valued function over [a,b] is \int^b_a|F'(t)|dt, because then we can just substitute F(t) = x(t)+iy(t). As usual, we will prove that there is inequality in both directions. But recall that for absolutely continuous functions, the fundamental theorem of calculus holds, so pick a partition a = t_0<t_1<\cdots<t_N=b so that \sum^N_{j=1}|F(t_j)-F(t_{j-1}| = \sum^N_{j=1}\left|\int^{t_j}_{t_{j-1}}F'(t)\right|dt\le\int^b_a|F'(t)|dt. In the other direction, recall that step functions are dense in L^1, so we can find an approximating step function g to F' so that h = F'-g has integral arbitrarily small. By the triangle inequality, T_F(a,b)\ge T_G(a,b)-T_H(a,b)>T_G(a,b)-\epsilon, where G(x) = \int^x_a g(t)dt and H(x) = \int^x_ah(t)dt. We can bound T_G(a,b) by taking a partition where the adjacent intervals are over constant parts of the step function g so that T_G(a,b)\ge \sum^N_{j=1}\left|\int^{t_j}_{t_{j-1}}g(t)\right|dt=\int^b_a|g(t)|dt. But recall that we picked g to be extremely close to F' so that \int^b_a|g(t)|dt\ge\int^b_a|F'(t)dt|-\epsilon, so T_F(a,b)\ge\int^b_a|F'(t)|dt - 2\epsilon and we get inequality in the other direction.

Now before we proceed to state and prove the isoperimetric inequality, let’s get some vocabulary under our belt. Define the one-dimensional Minkowski content M(K) of a curve K to be \lim_{\delta\to 0}\frac{m(K^{\delta}}{2\delta}, where K^{\delta} denotes the set of points which are at most \delta away from any point in K. Define a simple curve to be a curve that doesn’t double over on itself, a quasi-simple curve to be a curve such that t\to z(t) is injective except for finitely many points, and a closed curve to be one that starts where it ends. As the name suggests, the Minkowski content of a curve turns out to be precisely its length if the curve is rectifiable and quasi-simple.

Continue reading

Integrating the Derivative- Filling in the Gaps

This will be a short continuation of the last post. Note that in our discussions in the previous post, we assumed throughout that F was continuous. Let’s drop that assumption and take care of the case of jump discontinuities. Fortunately there aren’t too many: between any jump discontinuity we can squeeze in a distinct rational number, so a bounded increasing function can have at most countably many discontinuities.

Define F(x^-) and F(x^+) to be F except defined on the left and right sides of each jump discontinuity respectively so that for F increasing, F(x^-)\le F(x)\le F(x^+). Define the jump function J(x) to be the sum of all jumps \alpha_n F(x^+_n)-F(x^-_n) at and to the left of the point, where x_n are points of discontinuity. In other words, we are constructing an increasing step functions where the discontinuities are exactly the same points as in the original function. Note that if F is bounded, the sum of these jumps and thus J is absolutely and uniformly convergent.

Why did we construct this? Well by design, F-J is continuous and increasing. Basically, the motivation graphically is to “drop down” all the discontinuities so that we get one continuous curve F-J, and then we add the jumps separately in the form of J. We can do this because it turns out:

Result 3: J'(x) exists and equals zero almsot everywhere.

Proof: Define E to be the set of x for fixed \epsilon for which the limit superior of the differential \frac{J(x+h)-J(x)}{h}>\epsilon. We want to show that m(E)=0. Because the series of jumps \sum\alpha_n converges, we can find N for any \nu such that the sum past the Nth jump is less than \nu. Then the jump function J_0 corresponding to all jumps past the Nth jump changes by less than \nu from J_0(a) to J_0(b). But J differs from J_0 by a finite number of summands, so the set E' for which the limit superior of \frac{J_0(x+h)-J(x)}{h} exceeds \epsilon differs from E by finitely many points. Pick a compact subset of E so that when we take out these points, the resulting compact subset K\subset E' is at least half the measure of E. For each x\in K, we have a neighborhood (a_x,b_x) where J_0(b_x)-J_0(a_x)>\epsilon(b_x-a_x). By compactness, we can pick a finite subcover, and then we can apply our old covering argument to get a disjoint sub-collection for which \sum^n_{j=1}m(I_n)\ge m(K)/3.

So we have \nu > J_0(b)-J_0(a)\ge \sum J_0(b_k)-J_0(a_k)>\epsilon\sum(b_k-a_k)\ge \frac{\epsilon m(E)}{6}, and because \nu can be anything, we have proven m(E) to be zero.

Integrating the Derivative

Dual to the problem addressed in the previous post is that of when the result of the fundamental theorem of calculus holds true, namely that F(b)-F(a) = \int^b_a F'(x)dx. It turns out the condition of absolute continuity defined at the end of the last post is sufficient. First, define the variation of a complex-valued function F over a partition a = t_0< \cdots < t_N=b to be \sum^N_{i=1}|F_i-F_{i-1}|. It is easy to see that variation increases in the “fineness” of the partition; we say that a function is of bounded variation if the supremum of the variation over all partitions, the total variation T_F(a,b), is finite. This will be the case, roughly speaking, for functions that do not oscillate too widely or too frequently. We can see that real, monotonic, bounded functions, as well as differentiable and bounded functions, are of bounded variation.

The first result of this post will show some of the motivation for studying these functions.

Result 1: Bounded variation implies differentiability almost everywhere.

Proof: To prove this, let’s first narrow our focus solely to increasing functions. We can do this because of the following characterization of functions of bounded variation: they are precisely the functions that are differences of two increasing bounded functions. One direction is obvious, that the difference of two increasing bounded functions is of bounded variation. To prove the other direction, define positive and negative variation P_F(a,b) and N_F(a,b) over the interval \sup\sum_{(+)}F(t_j)-F(t_{j-1}) and \sup\sum_{(-)}F(t_j)-F(t_{j-1}), where the sums are taken over all positive and negative differences, respectively. We immediately have that F(b)-F(a) = P_F(a,b) - N_F(a,b) and T_F(a,b) = P_F(a,x)+N_F(a,x). Then to prove the other direction of our claim, simply take the functions to be P_F(a,b)+F(a) and N_F(a,b).

Continue reading

Differentiating the Integral

Now let’s put the machinery we’ve built up to use in making precise the familiar notion of differentiation and integration being dual to each other. It is easy to see in one direction why this makes sense: roughly speaking, the derivative of an integral is the limit of the average value of a function over a neighborhood as the measure of that neighborhood approaches zero. For this reason the first problem we will address in this post is called the averaging problem, namely, if f is Lebesgue-integrable, do we have that \lim_{m(B)\to 0, x\in B}\frac{1}{m(B)}\int_Bf(y)dy = f(x) for almost all x?

Result 1: We can answer the averaging problem in the affirmative.

Well it’s certainly in the affirmative for continuous functions; one of the keys is to observe that continuous functions of compact support are dense in the space L^1 of Lebesgue-integrable functions. We already know simple functions are, so step functions are as well, and we can easily find arbitrarily close approximations to the most basic step function, namely the characteristic function of a rectangle, by continuous functions of compact support, so now for any f in L^1, approximate by such a continuous function g so that \left|\left|f-g\right|\right| can be made arbitrarily small. Then we can rewrite \frac{1}{m(B)}\int_Bf(y)dy - f(x) as

\frac{1}{m(B)}\int_B(f(y)-g(y))dy + \left(\frac{1}{m(B)}\int_Bg(y)dy - g(x)\right) +(g(x)-f(x)).

Take the limit superior of both sides over balls that contain x, and because g is continuous, the middle term on the right vanishes. We want to show that for any given \alpha, the measure of the set E_{\alpha} for which the limit superior of the left, i.e. the difference between f and the limit of its average value, exceeds \alpha is zero. By Chebyshev’s inequality, |g(x)-f(x)|>\alpha on a set of measure O(\left|\left|f-g\right|\right|), and \limsup\frac{1}{m(B)}\int_B(f(y)-g(y))dy\le (f-g)^*(x), where (f-g)^*=\sup\frac{1}{m(B)}\int_B|f(y)-g(y)|dy, the so-called Hardy-Littlewood maximal function.

Continue reading

Lebesgue Integration Part 4- Complex Functions

If we have a complex measurable f = u+iv, where u,v are real and measurable, then we define the Lebesgue integral of f over measurable set E\subset X as follows:

\displaystyle\int_E f \ d\mu = \displaystyle\int_E u^+ \ d\mu - \displaystyle\int_E u^- \ d\mu+i\displaystyle\int_E v^+ \ d\mu - i\displaystyle\int_E v^- \ d\mu,

where for a real function g, g^+ and g^- denote \max(g,0) and -\min(-g,0) respectively so that g = g^+ - g^-. We say that f\in L^1(\mu), that is, f is Lebesgue-integrable, if \int_X |f| \ d\mu<\infty. It is easy to verify the usual additive and scalar multiplicative properties of the complex Lebesgue integral.

Continue reading

Lebesgue Integration Part 3- Integration of Nonnegative Functions

Let’s first define the integral of a nonnegative simple function given by s=\sum^{n}_{i=1}\alpha_i\chi_{A_i}, where s takes on the values \alpha_1,...,\alpha_n on sets A_1,...,A_n. Then the integral of s over a subset E in the \sigma-algebra is given by the geometrically intuitive \int_E s \ d\mu = \sum^n_{i=1}\alpha_i\mu(A_i\cap E). We can then extend this to arbitrary nonnegative measurable functions f to get \int_E f \ d\mu = \sup\int_E s \ d\mu, where the supremum is taken over all simple functions s such that 0\le s\le f. We can easily deduce familiar properties that if 0\le f\le g over E, the integral of g over E is greater than that of f, that the the Lebesgue integral is increasing by set inclusion and multiplicative by scalar factors, and that \int_E f \ d\mu = 0 if E is of measure 0 or f is identically zero over E.

Recall from the previous post that to every nonnegative function f corresponds some increasing sequence of simple nonnegative functions that converges to f. This result and the following result empower us to prove many results on the integral of arbitrary nonnegative functions simply by proving them for the integral of simple functions.

Result 8 (Monotone Convergence): If \{f_n\} is an increasing sequence of nonnegative, measurable functions, then

f_n\to f\Rightarrow \int_X f_n \ d\mu\to\int_X f \ d\mu.

Proof: By a previous result, f is certainly measurable. Let’s say that the sequence of integrals converges to some L. We know that L\le\int_X f \ d\mu and thus want to prove that this inequality is true in the reverse. First observe that the function \phi(E)=\in_E s \ d\mu is a measure for s simple (we leave it as an exercise to the reader to verify that \phi is countably additive and that \phi(\emptyset)=0). We will proceed to construct an increasing sequence of sets whose union is X and use this to show that \alpha is greater than the integral of any simply function bounded between 0 and f. Fix a constant 0<c<1 and any simple measurable function 0\le s\le f, and define E_n = \{x: f_n(x)\ge cs(x). We can verify that \{E_n\} is an increasing sequence of measurable sets whose union is the whole space. So if we take the limit of both sides of \int_X f_n \ d\mu\ge c\int_{E_n} s \ d\mu, we get that \alpha\ge c\int_X s \ d\mu. This holds for all 0<c<1 and all simple 0\le s\le f, so we have equality. Note that if we had picked c=1, the union of the sets E_n would not necessarily have been the entire space (pick f=s), so the strict inequality of 0<c<1 was necessary.

Continue reading

Lebesgue Integration Part 2- Borel Sets, Simple Functions, Measures

Given a topological space X, call the sets in the \sigma-algebra generated by the open sets of X the Borel sets, and call X endowed with this structure a Borel space . So in other words, the Borel sets consist of all open and closed sets and their countable intersections and unions. The Borel functions are measurable functions from a Borel space to a topological space. Observe that a continuous function from a Borel space to a topological space is Borel, but the converse is not necessarily true.

It is clear that if f is continuous and g is measurable, then f\circ g is measurable, and in fact, the same is true if f is Borel. To see this, note that a measurable function pulls all Borel sets back to measurable sets. This observation will be key to our definition of the Lebesgue integral.

Next, define a simple function to be a complex function s on a measurable space with an image of finitely many values from [0,\infty). Denote these values by \alpha_1,...,\alpha_n and denote \{x: s(x)=\alpha_i by E_i. The function s can then be rewritten as s=\sum^n_{i=1}\alpha_i\chi_{A_i}, which is measurable iff A_i is measurable for all i.

In Riemann integration, roughly speaking, the familiar approximating blocks heights can be thought of as given by a simple function. And indeed, the next result says that every nonnegative measurable function has a corresponding increasing sequence of approximating simple functions.

Continue reading