List-Decoding Algorithms and Rate-Distance Tradeoff

This post will cover some cool results in coding theory I’ve learned from Vadhan’s survey on pseudorandomness, focusing on list-decoding algorithms and the tradeoff between rate and distance, and culminating in a discussion of the folded Reed-Solomon code and its relationship to the Parvaresh-Vardy code.

We begin with some basic definitions.

Definition 1: For alphabet \Sigma of size q and block length \hat{n}, a q-ary code is a multiset \mathcal{C} inside the set of \hat{n}-words \text{Maps}([\hat{n}],\Sigma). The members of \mathcal{C} are called codewords.

Definition 2: An encoding function for a code \mathcal{C} is a bijection from the messages \{1,...,|\mathcal{C}|\} to the set of codewords, thought of as a labeling.

The set of \hat{n}-words from the alphabet \Sigma has an obvious structure as a metric space, and this notion of distance will help us in developing a model of decoding.

Definition 3: The relative Hamming distance d(x,y) for x,y\in \text{Maps}([\hat{n}],\Sigma) is the fraction of positions i\in[\hat{n}] such that x(i) \neq y(i). Under this metric, denote by B(x,\delta) the open ball of \hat{n}-words centered at x with radius \delta; this is called the Hamming ball of radius \delta around x.
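As a concrete sketch, relative Hamming distance and Hamming-ball membership are one-liners in Python; the function names here are illustrative, not standard.

```python
def rel_hamming(x, y):
    """Fraction of positions where two equal-length words differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y)) / len(x)

def in_ball(x, center, delta):
    """Membership in the open Hamming ball B(center, delta)."""
    return rel_hamming(x, center) < delta
```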

With these in mind, we will focus not on unique decoding, but on decoding algorithms that output lists of potential candidates and allow for handling of bigger batches of errors.

Definition 4: For an \hat{n}-word r, let \ell(r,\epsilon) denote the set of codewords agreeing with r on more than an \epsilon fraction of positions, i.e. lying inside the Hamming ball of radius 1-\epsilon centered at r. A code \mathcal{C} is said to be (\delta, L)-list-decodable if |\ell(r,1-\delta)|\le L for all \hat{n}-words r, i.e. if every Hamming ball of radius \delta contains at most L codewords.

The \delta parameter in the above definition can be thought of as capturing the difficulty of decoding: the larger the radius \delta, the more errors the decoder must tolerate while still keeping the list of candidate codewords small.

A related parameter is the minimum distance of a code, defined to be the minimum relative Hamming distance between any two distinct codewords.
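Minimum distance can be computed by brute force over all pairs, which is feasible for the small example codes later in this post; a sketch (`min_distance` is my own name):

```python
from itertools import combinations

def rel_hamming(x, y):
    """Fraction of positions where two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def min_distance(code):
    """Minimum relative Hamming distance over all pairs of distinct codewords."""
    return min(rel_hamming(u, v) for u, v in combinations(code, 2))
```

For instance, the binary repetition code {0000, 1111} has minimum distance 1, and the even-weight code of length 3 has minimum distance 2/3.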

One of the tradeoffs central to coding theory is that between the efficient decodability captured by \delta and efficient encodability. This will serve as the motivation for the progression of codes we will present leading up to the FRS codes. We will formalize efficient encodability as follows:

If we imagine each message as a binary string, then the length of that binary string gives what one can think of as a measure of the “breadth” of the code. If we further think of the elements of the alphabet as binary strings, then the length of these strings times the block length gives a measure of the “verbosity“ of the code. With these images in mind, we can define the following parameters; in particular, we can think of the latter as the “expressiveness” of the code.

Definition 5: Call n = \log|\mathcal{C}| the message length of code \mathcal{C} with alphabet \Sigma. Define the rate of \mathcal{C} to be \rho = \frac{n}{\hat{n}\log|\Sigma|}.

We introduce a slight modification to this, here solely for the purpose of giving a probabilistic proof of the existence of very good codes. The slight modification can be thought of both as the rate of all \hat{n}-words lying within a Hamming ball of fixed radius, and, more mysteriously, as a convergent to a measure of the entropy of an alphabet of size q for a given distance \delta.

Definition 6: H_q(\delta,\hat{n}) = \frac{\log|B(x,\delta)|}{\hat{n}\log q} (this quantity is independent of the choice of center x).
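The ball volume in Definition 6 can be computed exactly by summing over the possible number of corrupted positions; a sketch assuming the alphabet has size exactly q (the helper names are mine):

```python
from math import comb, log, ceil

def ball_volume(q, n_hat, delta):
    """|B(x, delta)|: the number of n_hat-words over a q-ary alphabet at relative
    distance strictly less than delta from a fixed center (independent of x)."""
    k = ceil(delta * n_hat) - 1  # largest count i of differing positions with i/n_hat < delta
    return sum(comb(n_hat, i) * (q - 1) ** i for i in range(k + 1))

def H(q, delta, n_hat):
    """H_q(delta, n_hat) = log|B(x, delta)| / (n_hat * log q)."""
    return log(ball_volume(q, n_hat, delta)) / (n_hat * log(q))
```

For example, over the binary alphabet with \hat{n} = 10 and \delta = 0.3, the ball consists of the words differing in at most 2 positions: 1 + 10 + 45 = 56 words.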

Result 1: For fixed block length \hat{n}, base q, list size L, and distance \delta<1-1/q, there exists a (\delta, L)-list-decodable q-ary code with a rate of at least \rho = 1-H_q(\delta,\hat{n})-1/(L+1).

Proof: Randomly and independently pick N words from the set of \hat{n}-words to be our code, for some N that we will determine later to guarantee the existence of a good code. The probability that there exists a Hamming ball of radius \delta containing more than L codewords, i.e. the probability that we run into problems trying to decode, is bounded above, according to the union bound (over all q^{\hat{n}} possible centers and all (L+1)-subsets of the codewords), by

q^{\hat{n}}\binom{N}{L+1}\left(\frac{|B(x,\delta)|}{q^{\hat{n}}}\right)^{L+1} < q^{\hat{n}}\, N^{L+1}\, q^{(H_q(\delta,\hat{n})-1)(L+1)\hat{n}},

and by taking N to be \left\lceil q^{(1-H_q(\delta,\hat{n})-1/(L+1))\hat{n}} \right\rceil, the right-hand side is at most 1 (the slack in \binom{N}{L+1}<N^{L+1} absorbs the rounding), so with positive probability no such ball exists. The resulting code has rate \frac{\log N}{\hat{n}\log q}\ge 1-H_q(\delta,\hat{n})-1/(L+1), and we’re done.
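The union bound above can be checked numerically for one small parameter setting; this is a sanity check of the argument, not a construction from the post:

```python
from math import comb, log, ceil

def ball_volume(q, n_hat, delta):
    """Words at relative distance < delta from a fixed center."""
    k = ceil(delta * n_hat) - 1
    return sum(comb(n_hat, i) * (q - 1) ** i for i in range(k + 1))

def failure_bound(q, n_hat, delta, L, N):
    """Union bound on Pr[some radius-delta Hamming ball contains more than L
    of N independently random n_hat-words]."""
    B = ball_volume(q, n_hat, delta)
    return q ** n_hat * comb(N, L + 1) * (B / q ** n_hat) ** (L + 1)

# One small setting: q = 2, n_hat = 10, delta = 0.3, L = 4.
q, n_hat, delta, L = 2, 10, 0.3, 4
Hq = log(ball_volume(q, n_hat, delta)) / (n_hat * log(q))
N = ceil(q ** ((1 - Hq - 1 / (L + 1)) * n_hat))
```

In this setting N comes out to 5 and the failure bound is far below 1, so a (0.3, 4)-list-decodable binary code with 5 codewords of length 10 exists.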

We conclude our preliminaries with a result on minimum distance that will prove useful later on.

Result 2 (Johnson Bound): i) If \mathcal{C} has minimum distance 1-\epsilon, it is (1-O(\sqrt{\epsilon}),O(1/\sqrt{\epsilon}))-list-decodable.

ii) If \mathcal{C} is binary and has minimum distance 1/2-\epsilon, it is (1/2-O(\sqrt{\epsilon}),O(1/\epsilon))-list-decodable.

Proof: For part i), we will show that no Hamming ball of radius 1-2\sqrt{\epsilon} can contain s=\left\lceil 1/\sqrt{\epsilon}\right\rceil+1 codewords. Suppose B(x,1-2\sqrt{\epsilon}) contained codewords c_1,...,c_s; each agrees with x on more than a 2\sqrt{\epsilon} fraction of positions, while any two of them agree on at most an \epsilon fraction, by the assumption on minimum distance. We count the fraction of positions where x agrees with at least one of these codewords. By inclusion-exclusion, this is at least

\sum_i 2\sqrt{\epsilon}-\sum_{i<j}\epsilon = 2s\sqrt{\epsilon}-\binom{s}{2}\epsilon > 1,

a contradiction, since a fraction of positions cannot exceed 1.

We defer our proof of part ii) until after our discussion of randomness extractors in a future post.

For the remainder of our discussion, we will be interested in codes with time-efficient encoding and decoding.

Definition 7: An encoding E is fully explicit if for any message m, E(m)(i) can be found in time polynomial in n, \log\hat{n}, and \log|\Sigma|.

We start with an explicit code with a small alphabet and the best distance we could hope for:

Definition 8: The binary Hadamard code of message length n is the code with alphabet \Sigma = \{0,1\} and block length 2^n consisting of all linear maps \mathbb{Z}^n_2\to \mathbb{Z}_2, where we regard \mathbb{Z}^n_2 as a \mathbb{Z}_2-module.

Remark: For the purposes of comparison to the RS code given below and of generalization to the RM code, these linear maps can also be thought of as homogeneous polynomials of degree 1 in as many variables as there are additive generators of \mathbb{Z}^n_2, the coefficient of each variable specifying where our map sends the corresponding generator.

Result 3: The Hadamard code has minimum distance 1/2 and is (1/2-\epsilon,O(1/\epsilon^2))-list-decodable for all \epsilon>0. The encoding E given by E(m)(x) = m\cdot x\pmod{2} is fully explicit.

Proof: The first part follows from the fact that any two distinct linear maps \mathbb{Z}^n_2\to\mathbb{Z}_2 differ on exactly half of all inputs, since their difference is a nonzero linear map, which is nonzero on half of its domain. The second part follows immediately from the second part of the Johnson Bound. The third part follows from our discussion of finite field arithmetic in a previous post on pairwise independent hash functions.
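A sketch of the Hadamard encoding E(m)(x) = m\cdot x\pmod{2}, listing all 2^n output bits in lexicographic order of x:

```python
from itertools import product

def hadamard_encode(m_bits):
    """Output the bit m . x (mod 2) for every x in Z_2^n: block length 2^n."""
    return [sum(mi * xi for mi, xi in zip(m_bits, x)) % 2
            for x in product([0, 1], repeat=len(m_bits))]
```

As the proof notes, any two codewords of distinct messages differ in exactly half of the 2^n positions, since their difference is a nonzero linear map.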

Observe that the rate of the Hadamard code, n/2^n, decreases exponentially in the message length. By considering fields of higher order and viewing codewords as single-variable polynomial maps into a bigger alphabet, we get a constant rate, at the cost of this bigger alphabet.

Definition 9: The q-ary, degree-d Reed-Solomon (RS) code is the code with alphabet \mathbb{F}_q and block length q consisting of all polynomials \mathbb{F}_q\to\mathbb{F}_q of degree at most d.

Result 4: The RS code has minimum distance 1 - d/q and is (1-O(\sqrt{d/q}),O(\sqrt{q/d}))-list-decodable. The encoding E given by E(m)(x) = \sum^d_{i=0}m_ix^i is fully explicit.

Proof: The difference of two degree-d polynomials has at most d zeroes, giving the first part of the result on minimum distance. The second comes from the first part of the Johnson Bound.
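A sketch of RS encoding for prime q (so that \mathbb{F}_q is just the integers mod q), together with a brute-force check of the minimum distance 1 - d/q on a tiny instance:

```python
from itertools import product, combinations

def rs_encode(coeffs, q):
    """Evaluate the polynomial with the given coefficients (constant term
    first) at every point of F_q; q is assumed prime here."""
    return [sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q
            for x in range(q)]

# All 25 degree-<=1 polynomials over F_5: a tiny RS code with q = 5, d = 1.
q, d = 5, 1
code = [rs_encode(c, q) for c in product(range(q), repeat=d + 1)]
min_dist = min(sum(a != b for a, b in zip(u, v)) / q
               for u, v in combinations(code, 2))
```

Two distinct lines over \mathbb{F}_5 agree in at most one point, so the minimum distance is exactly 1 - 1/5 = 4/5, matching Result 4.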

We can generalize these two codes in orthogonal directions into the following: the Hadamard code by considering larger alphabets and higher-degree polynomial maps, and the RS code by considering polynomial maps in more variables.

Definition 10: The q-ary, degree-d, D-dimensional Reed-Muller (RM) code is the code with alphabet \mathbb{F}_q and block length q^D consisting of all polynomials \mathbb{F}^D_q\to\mathbb{F}_q of total degree at most d.

Result 5: The RM code has minimum distance at least 1 - d/q and is (1-O(\sqrt{d/q}),O(\sqrt{q/d}))-list-decodable. The encoding function taking a \binom{D+d}{d}-word of coefficients to the corresponding polynomial is fully explicit.

Proof: This result is essentially identical to the one for RS, except because we are dealing with multivariate polynomials, we invoke Schwartz-Zippel instead of the Fundamental Theorem of Algebra.
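The same check works for RM codes: since a nonzero D-variate polynomial of total degree at most d vanishes on at most a d/q fraction of \mathbb{F}_q^D by Schwartz-Zippel, any two distinct codewords are at distance at least 1 - d/q. A sketch for prime q (the coefficient values below are arbitrary):

```python
from itertools import product

def monomials(D, d):
    """Exponent tuples of D variables with total degree at most d."""
    return [e for e in product(range(d + 1), repeat=D) if sum(e) <= d]

def rm_encode(coeffs, q, D, d):
    """Evaluate sum_e coeffs[e] * prod_i x_i^{e_i} at every point of F_q^D (q prime)."""
    mons = monomials(D, d)
    word = []
    for pt in product(range(q), repeat=D):
        val = 0
        for e, c in zip(mons, coeffs):
            term = c
            for xi, ei in zip(pt, e):
                term = term * pow(xi, ei, q) % q
            val = (val + term) % q
        word.append(val)
    return word

# Two distinct bivariate polynomials of total degree <= 2 over F_5,
# given as coefficient vectors indexed by monomials(2, 2).
q, D, d = 5, 2, 2
c1 = [1, 2, 0, 3, 0, 1]
c2 = [1, 2, 0, 3, 4, 1]
```

The two codewords differ wherever the difference polynomial (here a multiple of x_0 x_1) is nonzero, so their distance is at least 1 - 2/5 = 3/5.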

We can now discuss list-decoding algorithms. We want to efficiently determine \ell(r,\epsilon) for any \hat{n}-word r. We begin with a discussion of decoding RS using an algorithm due to Sudan. This method turns out to be useful in decoding a general class of “algebraic geometric” codes.

Result 6: For the q-ary, degree-d RS code, there exists a polynomial-time (1-2\sqrt{d/q})-list-decoding algorithm.

Proof: Given a q-word r, we want to find the polynomials of degree at most d which agree with r on more than \epsilon q elements of \mathbb{F}_q. The approach will be to find a low-degree bivariate polynomial Q which vanishes at (y,r(y)) for all y\in\mathbb{F}_q. By construction, factoring this polynomial will give us the desired polynomials.

In order for our “characteristic polynomial” Q(Y,Z) to be solved by all pairs (y,r(y)), we must satisfy a system of q equations in (d_Y+1)(d_Z+1) unknown coefficients, which can be done as long as (d_Y+1)(d_Z+1)>q.

Once we have such a Q, consider Q(Y,f(Y)) for any codeword f. This polynomial vanishes at every point where f agrees with r, of which there are more than \epsilon q, so if \epsilon q exceeds the degree d_Y+dd_Z of Q(Y,f(Y)), then Q(Y,f(Y)) is the zero polynomial, i.e. Z-f(Y) divides Q. We can then tune d_Y, d_Z, and \epsilon to satisfy these two constraints, resulting in the \epsilon = 2\sqrt{d/q} parameter. It suffices then to factor Q and pick out the factors of the form Z-f(Y).
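The interpolation step of Sudan’s algorithm is just linear algebra: one nonzero solution of a homogeneous system with more unknowns than equations. A sketch for prime q; the bivariate factoring step is omitted, and all function names are mine:

```python
def nullspace_vector(A, p):
    """A nonzero v with A v = 0 (mod p), assuming more columns than rows."""
    rows, cols = len(A), len(A[0])
    A = [row[:] for row in A]
    pivot_of, r = {}, 0          # pivot column -> its row, in reduced row-echelon form
    for c in range(cols):
        piv = next((i for i in range(r, rows) if A[i][c]), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        inv = pow(A[r][c], p - 2, p)
        A[r] = [a * inv % p for a in A[r]]
        for i in range(rows):
            if i != r and A[i][c]:
                f = A[i][c]
                A[i] = [(a - f * b) % p for a, b in zip(A[i], A[r])]
        pivot_of[c], r = r, r + 1
        if r == rows:
            break
    free = next(c for c in range(cols) if c not in pivot_of)
    v = [0] * cols
    v[free] = 1
    for c, pr in pivot_of.items():
        v[c] = -A[pr][free] % p
    return v

def sudan_interpolate(r_word, q, dY, dZ):
    """Nonzero Q(Y,Z) = sum_{i,j} q_ij Y^i Z^j with Q(y, r(y)) = 0 for all y;
    requires (dY+1)(dZ+1) > q so the homogeneous system is underdetermined."""
    A = [[pow(y, i, q) * pow(r_word[y], j, q) % q
          for i in range(dY + 1) for j in range(dZ + 1)]
         for y in range(q)]
    v = nullspace_vector(A, q)
    return {(i, j): v[i * (dZ + 1) + j]
            for i in range(dY + 1) for j in range(dZ + 1)}

def q_eval(Q, y, z, p):
    return sum(c * pow(y, i, p) * pow(z, j, p) for (i, j), c in Q.items()) % p
```

With q = 7, d = 1, d_Y = 3, d_Z = 1 we have (d_Y+1)(d_Z+1) = 8 > 7, and any degree-1 polynomial agreeing with r in more than d_Y + dd_Z = 4 positions must satisfy Q(Y, f(Y)) \equiv 0.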

We can adapt this approach to a code with a better rate-distance tradeoff than the quadratic one for RS. This construction will involve a characteristic polynomial of r in more than two variables that is defined as being solved by tuples of powers of r so that morally, we are getting more information out of our r. These powers will be reduced modulo an irreducible polynomial in order to make sure our characteristic polynomial isn’t trivial.

Definition 11: For prime power q, the q-ary Parvaresh-Vardy (PV) code of degree d, power h, redundancy m, and irreducible E(Y)\in\mathbb{F}_q[Y] is the code with alphabet \mathbb{F}^m_q, block length q, and messages represented by polynomials f(Y)\in\mathbb{F}_q[Y] of degree at most d, with encoding function e given by

e(f)(y) = \left(f(y), (f^h \bmod E)(y), (f^{h^2} \bmod E)(y),\ldots,(f^{h^{m-1}} \bmod E)(y)\right).
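A sketch of the PV encoding for prime q, using school-book polynomial arithmetic modulo E (coefficient lists with the constant term first; the helper names are mine):

```python
def poly_mul_mod(a, b, E, p):
    """Product of coefficient lists a, b over F_p, reduced mod the monic polynomial E."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    dE = len(E) - 1
    while len(res) > dE:                 # cancel the leading term with a shift of E
        lead = res.pop()
        for k in range(dE):
            res[len(res) - dE + k] = (res[len(res) - dE + k] - lead * E[k]) % p
    return res

def poly_pow_mod(a, e, E, p):
    res = [1]
    for _ in range(e):
        res = poly_mul_mod(res, a, E, p)
    return res

def pv_encode(f, q, h, m, E):
    """e(f)(y) = (f(y), (f^h mod E)(y), ..., (f^{h^{m-1}} mod E)(y)) for y in F_q."""
    fs = [list(f)]
    for _ in range(m - 1):
        fs.append(poly_pow_mod(fs[-1], h, E, q))  # (f^{h^i})^h = f^{h^{i+1}} mod E
    ev = lambda g, y: sum(c * pow(y, i, q) for i, c in enumerate(g)) % q
    return [tuple(ev(g, y) for g in fs) for y in range(q)]
```

For example, with q = 5, h = 2, m = 2, E(Y) = Y^2 + 2 (irreducible over \mathbb{F}_5, since -2 = 3 is a non-residue), and f(Y) = Y + 3, we have f^2 \bmod E = Y + 2, so e(f)(0) = (3, 2).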

Result 7: For 0\le d< q, irreducible E of degree d+1, redundancy \left\lceil\log(q/d)\right\rceil, power h = 2, the q-ary PV code can be \delta-list-decoded in polynomial time for \delta\le 1 - \tilde{O}(d/q) at a rate of \rho = \tilde{\Omega}(d/q).

Proof: The approach is essentially the same as in the case of the RS code. Take some q-word r, represented as a map \mathbb{F}_q\to \mathbb{F}^m_q. First, we can assume without loss of generality that E(Y) does not divide our characteristic polynomial, by irreducibility of E(Y). We want our characteristic polynomial to have low degree; denote by d_Y the degree of Q in Y, and let the degree of Q in each of the other variables be less than the power h of the code. In order for there to be a characteristic polynomial Q(Y,Z_0,...,Z_{m-1}) such that Q(y,r(y)_0,r(y)_1,...,r(y)_{m-1})=0 for all y\in\mathbb{F}_q, the number of unknown coefficients, roughly d_Y\cdot h^m, must exceed the number of equations q.

As before, we can conclude that Q(Y,f(Y),f(Y)^h,...,f(Y)^{h^{m-1}})\cong 0\pmod{E(Y)} as long as the number of agreement points \epsilon q exceeds the degree d_Y+(h-1)dm, and it then suffices to find the roots of Q^*(Z) = Q(Y,Z,Z^h,...,Z^{h^{m-1}})\pmod{E(Y)}, viewed as a polynomial over the extension field \mathbb{F}_q[Y]/(E(Y)). Our assumption that E does not divide Q, together with the fact that the degree of Q in each Z_i is less than h (we’ve “separated” our Z_i variables so that the substituted powers of Z cannot cancel one another), implies that Q^* is nonzero.

The parameters can be tuned to get the desired power and message length, and it turns out that we get a rate and distance of \rho = \frac{d}{mq}=\tilde{\Omega}(d/q) and 1-\epsilon = 1-\frac{1}{h^m}-\frac{dhm}{q}=1 - \tilde{O}(d/q) respectively.

We’re extremely close, but it turns out that to close the gap in the rate-distance tradeoff, we need to get rid of some redundancy. Consider the irreducible E(Y) = Y^{q-1}-\gamma for a multiplicative generator \gamma of \mathbb{F}^*_q. Then for h = q, we have e(f)(y)_i = (f^{h^i}\bmod E)(y) = f(\gamma^i y), since f(Y)^q \cong f(Y^q) \cong f(\gamma Y)\pmod{E(Y)}.
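The identity f(Y)^q \cong f(\gamma Y)\pmod{Y^{q-1}-\gamma} can be checked mechanically for a small prime; the polynomial helpers below are the same school-book routines as in the PV sketch, repeated so the snippet stands alone:

```python
def poly_mul_mod(a, b, E, p):
    """Product of coefficient lists a, b over F_p, reduced mod the monic polynomial E."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    dE = len(E) - 1
    while len(res) > dE:
        lead = res.pop()
        for k in range(dE):
            res[len(res) - dE + k] = (res[len(res) - dE + k] - lead * E[k]) % p
    return res

def poly_pow_mod(a, e, E, p):
    res = [1]
    for _ in range(e):
        res = poly_mul_mod(res, a, E, p)
    return res

q, gamma = 5, 2                 # 2 is a multiplicative generator of F_5^*
E = [(-gamma) % q, 0, 0, 0, 1]  # E(Y) = Y^{q-1} - gamma = Y^4 + 3 over F_5
f = [1, 2, 3, 4]                # f(Y) = 1 + 2Y + 3Y^2 + 4Y^3
lhs = poly_pow_mod(f, q, E, q)                             # f(Y)^q mod E(Y)
rhs = [c * pow(gamma, i, q) % q for i, c in enumerate(f)]  # f(gamma Y)
```

The two sides come out equal, as the Frobenius argument predicts: raising to the q-th power fixes the coefficients and sends Y to Y^q \cong \gamma Y.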

Observe that e(f)(y) and e(f)(\gamma y) overlap in m-1 positions, suggesting we can skip m factors of \gamma at a time and get a “compressed” encoding with m times the rate of the PV code. In particular, we can get rid of the factor of m in the denominator of \rho = \frac{d}{mq} above, so that \rho \approx d/q.

Definition 12: For prime power q, the q-ary folded RS (FRS) code of degree d, folding parameter m, and generator \gamma of \mathbb{F}^*_q is the code with alphabet \mathbb{F}^m_q, block length \left\lfloor\frac{q-1}{m}\right\rfloor, and messages represented by polynomials f(Y)\in\mathbb{F}_q[Y] of degree at most d, with encoding function e given by

e(f)(k) = \left(f(\gamma^{(k-1)m}), f(\gamma^{(k-1)m+1}), f(\gamma^{(k-1)m+2}),..., f(\gamma^{km-1})\right).
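A sketch of FRS encoding for prime q, indexing blocks by k = 1, ..., \lfloor(q-1)/m\rfloor as in the definition:

```python
def frs_encode(f, q, m, gamma):
    """Block k holds (f(gamma^{(k-1)m}), ..., f(gamma^{km-1})); q is assumed
    prime and gamma a generator of F_q^*."""
    ev = lambda y: sum(c * pow(y, i, q) for i, c in enumerate(f)) % q
    n_hat = (q - 1) // m
    return [tuple(ev(pow(gamma, (k - 1) * m + j, q)) for j in range(m))
            for k in range(1, n_hat + 1)]
```

For example, with q = 17, \gamma = 3 (a generator of \mathbb{F}^*_{17}), and m = 4, the block length is 4 and each block packs four consecutive evaluations of f along the orbit of \gamma.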

Result 8: For 0\le d,m< q, the q-ary folded Reed-Solomon code of degree d, generator \gamma, and folding parameter m can be \delta-list-decoded in time q^{O(m)} for \delta\le 1 - d/q - O(\sqrt{1/m}) at a rate of \rho = d/q.

Proof: Take a received word r with symbols in \mathbb{F}^m_q. To get rid of the redundancy, we will extract out of the FRS code a PV code of redundancy m' and power h = q, where each symbol of the folded code contains m-m'+1 overlapping PV symbols. Indeed, as we discussed above, the tuple e(r)(k) contains the m-m'+1 contiguous subsequences of length m' of the form

\left(r\left(\gamma^{(k-1)m+j}\right),...,r\left(\gamma^{(k-1)m+j+m'-1}\right)\right), \qquad 0\le j\le m-m'.

Taking m' to be O(\sqrt{m}) will turn out to allow us to extract the most information in total (this is also where the extra O(\sqrt{1/m}) term in the upper bound on \delta comes from).

Unfolding r into the resulting r' gives a received word such that for every symbol on which r agrees with f under FRS, r' agrees with m-m'+1 symbols of f under PV, for a total of more than \epsilon\hat{n}(m-m'+1)\ge(\epsilon - O(\sqrt{1/m}))q positions. The problem reduces to finding \ell(r',\epsilon') for \epsilon' = \epsilon - O(\sqrt{1/m}) under PV.

The approach is identical to the one taken in our proof of the previous result, except we will take Q of total degree at most 1 in the Z_i variables rather than of individual degree at most h-1 in each of them. This will yield a bigger redundancy parameter but ultimately the desired improvement in the rate-distance tradeoff. Our constraints are (d_Y+1)m'>q, so that a characteristic polynomial Q exists, as well as \epsilon' q\ge d_Y+d, to ensure that Q(Y,f(Y),f(Y)^q,...,f(Y)^{q^{m'-1}})\cong 0\pmod{Y^{q-1}-\gamma} for all f\in\ell(r',\epsilon'). Tuning \epsilon' to be at least 1/m'+d/q gives us the desired upper bound on \delta and the desired rate. Constructing and factoring the Q^* described in the previous proof takes time polynomial in q and h^{m'}, but because h=q, the runtime grows exponentially in the folding parameter m.

In fact, because the power of the PV code is nonconstant, the list size also increases exponentially in the folding parameter, as does the alphabet size. Despite these drawbacks, we are able to get the desired rate-distance tradeoff \rho\approx 1 - \delta.

