(This is a continuation of a previous post.)
We begin by building up some of the canonical technology of expander graphs. Morally, an expander graph is a building block characterized by the seemingly paradoxical combination of connectivity and sparseness; out of such blocks we can construct families of graphs growing in the number of vertices but barely growing in degree, by taking some combination of graph powers, tensor products, zigzag products, and so on. Here are a few measures of the connectivity characteristic of expander graphs:
Definition 1 (vertex expansion): A directed multigraph (the word “graph” will be used to denote these from now on unless otherwise stated) is a $(K, A)$-vertex expander if all sets $S$ of at most $K$ vertices have at least $A \cdot |S|$ neighbors. In other words, all small enough collections of vertices are well connected to the entire graph.
Definition 2 (spectral expansion): A regular graph $G$ is a $\gamma$-spectral expander if the spectral gap of its Laplacian is at least $\gamma$. Morally, the algebraic “mixing” of $G$ is sufficiently fast.
Definition 3 (edge expansion): For subsets $S, T$ of the vertices of a graph $G$, let $e(S, T)$ denote the cut size between $S$ and $T$, i.e. the number of directed edges from vertices in $S$ to vertices in $T$. A $d$-regular graph is a $(K, \epsilon)$-edge expander if $e(S, \bar{S}) \ge \epsilon \cdot |S| \cdot d$ for all $S$ such that $|S| \le K$.
In other words, all small enough collections of vertices have lots of edges emanating from them. It turns out that these three criteria for expansion are closely related (the following results aren’t immediately relevant to error reduction, but as a newbie learning about expander graphs for the first time, I think they form a neat connection between the algebraic and the combinatorial).
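The spectral criterion is easy to probe numerically. The sketch below (all names are my own; it uses the common convention of measuring the gap via the second-largest absolute eigenvalue of the random-walk matrix $A/d$, which for a $d$-regular graph carries the same information as the Laplacian spectrum) compares a cycle, which is sparse but barely connected, against a complete graph, which is superbly connected but dense:

```python
import numpy as np

def spectral_gap(adj):
    """1 - (second-largest absolute eigenvalue) of the random-walk matrix
    A/d of a d-regular graph; a larger gap means faster mixing."""
    d = adj.sum(axis=1)[0]                    # regular: every row sums to d
    eigs = np.sort(np.abs(np.linalg.eigvalsh(adj / d)))[::-1]
    return 1.0 - eigs[1]

n = 21
# Cycle C_n: sparse but barely connected -- a poor expander.
cycle = np.zeros((n, n))
for i in range(n):
    cycle[i, (i + 1) % n] = cycle[i, (i - 1) % n] = 1.0

# Complete graph K_n: superbly connected, but dense.
complete = np.ones((n, n)) - np.eye(n)

gap_cycle, gap_complete = spectral_gap(cycle), spectral_gap(complete)
```

The whole point of expander constructions is to approach the complete graph's large gap while keeping the cycle's sparseness.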
The following series of posts will document things that I’ve learned from a neat survey on pseudorandomness by Prof. Salil Vadhan at Harvard. They’re mostly to help me digest the material, but I hope they’ll be of interest to anyone reading this blog.
Specifically, this first set of two posts will be about the power of expander graphs in randomness-efficient error reduction for randomized algorithms; the present post covers two naive approaches to error reduction. We begin with the following definition.
Definition 1: A language $L$ lies in the class BPP (bounded-error probabilistic polynomial time) if there exists a probabilistic algorithm $A$, running in time polynomial in the length of the input, such that the probability that $A$ accepts an element $x$ is at least $2/3$ if $x \in L$ and at most $1/3$ if $x \notin L$. Stated another way, BPP is the class of languages with probabilistic polynomial-time algorithms with two-sided error at most $1/3$.
Intuitively, taking multiple repetitions should push the error in any such algorithm arbitrarily low, thanks to the following standard result from probability theory:
Result 1 ((A) Chernoff Bound): Let $X_1, \ldots, X_k$ be independent random variables taking values in $[0, 1]$. Let $X$ be their average $\frac{1}{k} \sum_{i=1}^k X_i$, and let $\mu$ be the expected value of $X$. Then $\Pr[|X - \mu| \ge \epsilon] \le 2e^{-\Omega(k \epsilon^2)}$; that is, the probability of the average differing from its expected value by some quantity $\epsilon$ falls exponentially in the number of repetitions $k$.
For an algorithm $A$ with error at most $1/3$, consider the corresponding algorithm $A'$ which runs $A$ for $k$ repetitions and outputs the majority answer. Let $X$ be the average of the indicator variables $X_1, \ldots, X_k$, where $X_i$ takes on a value of 1 if $A$ outputs the right answer on the $i$th repetition and 0 otherwise, so that $\mu = \mathbb{E}[X] \ge 2/3$. For the majority vote to fail, we need $X \le 1/2$ and hence $|X - \mu| \ge 1/6$. By the Chernoff bound above with $\epsilon = 1/6$, we get that $\Pr[A' \text{ errs}] \le 2e^{-\Omega(k)}$. Taking $k$ to be $O(\log(1/\delta))$, we get that the error probability is bounded above by $\delta$ as long as $k$ is large enough. Here, reducing error to $\delta$ comes at a cost of $k = O(\log(1/\delta))$ repetitions and, if our algorithm requires $m$ random bits, requires $km$ random bits, because our repetitions are completely independent.
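This naive error reduction is easy to simulate. The sketch below (all names and parameters are illustrative, not from the survey) models a BPP-style algorithm with error $1/3$ and measures the empirical error of a majority vote over $k$ independent repetitions:

```python
import random

def noisy_algorithm(correct_answer, rng, error=1/3):
    """Stand-in for a BPP algorithm: answers correctly with probability 2/3."""
    return correct_answer if rng.random() > error else (not correct_answer)

def majority_of_repetitions(correct_answer, k, rng):
    """Run k independent repetitions and return the majority answer."""
    votes = sum(noisy_algorithm(correct_answer, rng) == correct_answer
                for _ in range(k))
    return correct_answer if votes > k / 2 else (not correct_answer)

def empirical_error(k, trials=2000):
    rng = random.Random(0)                    # fixed seed for reproducibility
    wrong = sum(majority_of_repetitions(True, k, rng) != True
                for _ in range(trials))
    return wrong / trials

# The error should drop rapidly as k (kept odd, to avoid ties) grows.
err_1, err_15, err_61 = empirical_error(1), empirical_error(15), empirical_error(61)
```

Note that the simulation consumes fresh randomness for every one of the $k$ repetitions, which is exactly the $km$ random-bit cost the next post will improve on.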
This will be a cool application of the concepts introduced in the last post. We call a curve $z(t) = (x(t), y(t))$, $a \le t \le b$, rectifiable if there exists a finite $M$ such that for any partition $a = t_0 < t_1 < \cdots < t_n = b$, $\sum_{j=1}^n |z(t_j) - z(t_{j-1})| \le M$, i.e. the curve has some notion of “finite length,” where the length is the supremum of this sum taken over all partitions, or equivalently the infimum over all such upper bounds $M$. It follows by definition that if the $x$- and $y$-parametrizing functions of the curve are of bounded variation, the curve is rectifiable. It is not, however, true that the classical arc length formula $\int_a^b \sqrt{x'(t)^2 + y'(t)^2}\, dt$ holds for all curves of bounded variation: just consider the curve where the $x$ and $y$ functions are both the Cantor–Lebesgue function. We get a straight line from the origin to $(1, 1)$, but the derivative of the Cantor–Lebesgue function is zero almost everywhere, so the formula gives zero rather than $\sqrt{2}$.
Result 4: The arc length formula does work if we assume absolute continuity of $x(t)$ and $y(t)$.
Proof: We will prove that the total variation of an absolutely continuous complex-valued function $F$ over $[a, b]$ is $\int_a^b |F'(t)|\, dt$, because then we can just substitute $F(t) = x(t) + iy(t)$. As usual, we will prove that there is inequality in both directions. Recall that for absolutely continuous functions, the fundamental theorem of calculus holds, so for any partition $a = t_0 < \cdots < t_n = b$ we have $\sum_j |F(t_j) - F(t_{j-1})| = \sum_j \left| \int_{t_{j-1}}^{t_j} F'(t)\, dt \right| \le \int_a^b |F'(t)|\, dt$, which gives $T_F(a, b) \le \int_a^b |F'(t)|\, dt$. In the other direction, recall that step functions are dense in $L^1$, so we can find an approximating step function $g$ to $F'$ so that $h = F' - g$ has $\int_a^b |h| < \epsilon$ for $\epsilon$ arbitrarily small. By the triangle inequality, $\int_a^b |F'| \le \int_a^b |g| + \int_a^b |h| < \int_a^b |g| + \epsilon$. We can evaluate $\int_a^b |g|$ by taking a partition whose adjacent intervals lie over the constant parts of the step function, so that $\int_a^b |g| = \sum_j \left| \int_{t_{j-1}}^{t_j} g \right|$. But recall that we picked $g$ to be extremely close to $F'$, so that $\left| \int_{t_{j-1}}^{t_j} g \right| \le \left| \int_{t_{j-1}}^{t_j} F' \right| + \int_{t_{j-1}}^{t_j} |h| = |F(t_j) - F(t_{j-1})| + \int_{t_{j-1}}^{t_j} |h|$; summing over $j$ gives $\int_a^b |g| \le T_F(a, b) + \epsilon$, hence $\int_a^b |F'| \le T_F(a, b) + 2\epsilon$, and we get inequality in the other direction.
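As a sanity check of this result, for a smooth (hence absolutely continuous) curve the variation over a fine partition should agree with $\int_a^b |F'(t)|\, dt$. Here is a minimal numerical sketch for $F(t) = e^{it}$ on $[0, 2\pi]$, whose total variation is $2\pi$ (variable names are mine):

```python
import numpy as np

# F(t) = exp(i t) on [0, 2*pi] traces the unit circle once; it is smooth,
# hence absolutely continuous, with |F'(t)| = 1, so both the variation over
# a fine partition and the integral of |F'| should be close to 2*pi.
t = np.linspace(0.0, 2.0 * np.pi, 100001)
F = np.exp(1j * t)

# Variation over the partition: sum of chord lengths |F(t_j) - F(t_{j-1})|.
variation = np.abs(np.diff(F)).sum()

# Trapezoid-rule approximation of the arc length integral of |F'(t)|.
speeds = np.abs(1j * np.exp(1j * t))
integral = (0.5 * (speeds[:-1] + speeds[1:]) * np.diff(t)).sum()
```

The chord sum sits just below the integral, as the proof's first inequality predicts, and the two agree in the limit of fine partitions.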
Now before we proceed to state and prove the isoperimetric inequality, let’s get some vocabulary under our belt. Define the one-dimensional Minkowski content of a curve $\Gamma$ to be $\mathcal{M}(\Gamma) = \lim_{\delta \to 0} \frac{m(\Gamma^\delta)}{2\delta}$ (when the limit exists), where $\Gamma^\delta$ denotes the set of points which are at most $\delta$ away from some point of $\Gamma$. Define a simple curve to be a curve that doesn’t double over on itself, i.e. one whose parametrization $z(t)$ is injective; a quasi-simple curve to be one such that $z$ is injective except at finitely many points; and a closed curve to be one that starts where it ends. As the name suggests, the Minkowski content of a curve turns out to be precisely its length if the curve is rectifiable and quasi-simple.
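To see the Minkowski content in action, we can estimate the length of the unit circle by grid-counting the area of its $\delta$-neighborhood (an annulus of width $2\delta$) and dividing by $2\delta$; the answer should approach $2\pi$. The function name and grid parameters below are my own choices:

```python
import numpy as np

def minkowski_length(delta, cell=0.002):
    """Estimate the one-dimensional Minkowski content of the unit circle:
    grid-count the area of its delta-neighborhood (an annulus of width
    2*delta), then divide by 2*delta."""
    xs = np.arange(-1.5, 1.5, cell)
    X, Y = np.meshgrid(xs, xs)
    dist_to_curve = np.abs(np.hypot(X, Y) - 1.0)   # distance to the circle
    area = np.count_nonzero(dist_to_curve <= delta) * cell**2
    return area / (2.0 * delta)

estimate = minkowski_length(delta=0.05)   # should be close to 2*pi
```

Here the annulus has exact area $\pi\left((1+\delta)^2 - (1-\delta)^2\right) = 4\pi\delta$, so dividing by $2\delta$ recovers $2\pi$ even before taking $\delta \to 0$; for a general curve the limit is what matters.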
This will be a short continuation of the last post. Note that in our discussions in the previous post, we assumed throughout that $F$ was continuous. Let’s drop that assumption and take care of the case of jump discontinuities. Fortunately there aren’t too many: to each jump discontinuity of a bounded increasing function we can assign a distinct rational number (one lying strictly between the left and right limits at the jump), so such a function can have at most countably many discontinuities.
Define $F(x^-)$ and $F(x^+)$ to be the left and right limits of $F$ at each jump discontinuity, so that for $F$ increasing, $F(x^-) \le F(x) \le F(x^+)$. Define the jump function $J_F(x) = \sum_{x_n \le x} \alpha_n$ to be the sum of all jumps $\alpha_n = F(x_n^+) - F(x_n^-)$ at and to the left of the point $x$, where the $x_n$ are the points of discontinuity. In other words, we are constructing an increasing step function whose discontinuities are exactly the same points as in the original function. Note that if $F$ is bounded, the sum of these jumps converges, and thus the series defining $J_F$ is absolutely and uniformly convergent.
Why did we construct this? Well, by design, $F - J_F$ is continuous and increasing. Basically, the motivation graphically is to “drop down” all the discontinuities so that we get one continuous curve $F - J_F$, and then we add the jumps back separately in the form of $J_F$. We can do this because it turns out:
Result 3: $J_F'$ exists and equals zero almost everywhere.
Proof: For fixed $\epsilon > 0$, define $E$ to be the set of $x$ for which the limit superior of the difference quotient $\frac{J_F(x+h) - J_F(x)}{h}$ exceeds $\epsilon$; we want to show that $m(E) = 0$. Because the series of jumps converges, for any $\eta > 0$ we can find $N$ such that the sum past the $N$th jump satisfies $\sum_{n > N} \alpha_n < \eta$. Then the jump function $J_0$ corresponding to all jumps past the $N$th jump changes by less than $\eta$ from $a$ to $b$. But $J_F$ differs from $J_0$ by a finite number of summands, so the set on which the limit superior of $\frac{J_0(x+h) - J_0(x)}{h}$ exceeds $\epsilon$ differs from $E$ by finitely many points. Pick a compact subset of $E$ so that when we take out these points, the resulting compact subset $E'$ is at least half the measure of $E$. For each $x \in E'$, we have an interval $(x, x + h_x)$ on which $J_0(x + h_x) - J_0(x) > \epsilon h_x$. These intervals cover $E'$; by compactness, we can pick a finite subcover, and then we can apply our old covering argument to get a disjoint subcollection of intervals $I_1, \ldots, I_k$ for which $\sum_i m(I_i) \ge \frac{1}{3} m(E')$.
So we have $\epsilon \sum_i m(I_i) \le \sum_i \big(\text{increase of } J_0 \text{ over } I_i\big) < \eta$, which forces $m(E) \le \frac{C\eta}{\epsilon}$ for an absolute constant $C$, and because $\eta$ can be arbitrarily small, we have proven $m(E) = 0$. Letting $\epsilon \to 0$ along a countable sequence shows $J_F' = 0$ almost everywhere.
Dual to the problem addressed in the previous post is that of when the result of the fundamental theorem of calculus holds true, namely that $F(b) - F(a) = \int_a^b F'(t)\, dt$. It turns out the condition of absolute continuity defined at the end of the last post is sufficient. First, define the variation of a complex-valued function $F$ over a partition $a = t_0 < t_1 < \cdots < t_n = b$ to be $\sum_{j=1}^n |F(t_j) - F(t_{j-1})|$. It is easy to see that variation increases with the “fineness” of the partition; we say that a function is of bounded variation if the supremum of the variation over all partitions, the total variation $T_F(a, b)$, is finite. This will be the case, roughly speaking, for functions that do not oscillate too widely or too frequently. We can see that real, monotonic, bounded functions, as well as functions with bounded derivative, are of bounded variation.
The first result of this post will show some of the motivation for studying these functions.
Result 1: Bounded variation implies differentiability almost everywhere.
Proof: To prove this, let’s first narrow our focus solely to increasing functions. We can do this because of the following characterization of real-valued functions of bounded variation: they are precisely the functions that are differences of two increasing bounded functions. One direction is obvious: the difference of two increasing bounded functions is of bounded variation. To prove the other direction, define the positive and negative variation of $F$ over $[a, x]$ by $P_F(a, x) = \sup \sum_j [F(t_j) - F(t_{j-1})]^+$ and $N_F(a, x) = \sup \sum_j [F(t_j) - F(t_{j-1})]^-$, where the sums collect the positive and negative differences, respectively, and the suprema are over partitions of $[a, x]$. We immediately have that $F(x) - F(a) = P_F(a, x) - N_F(a, x)$ and $T_F(a, x) = P_F(a, x) + N_F(a, x)$. Then to prove the other direction of our claim, simply take the two increasing bounded functions to be $F(a) + P_F(a, x)$ and $N_F(a, x)$.
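The decomposition in this proof is easy to compute numerically over a fine partition: the cumulative sums of the positive and negative increments play the roles of $P$ and $N$. A sketch for $F(t) = \sin t$ on $[0, 2\pi]$, whose total variation is 4 (variable names mine):

```python
import numpy as np

# Jordan-style decomposition over a fine partition: accumulate the positive
# and negative increments of F(t) = sin(t) on [0, 2*pi] separately.
t = np.linspace(0.0, 2.0 * np.pi, 10001)
F = np.sin(t)
diffs = np.diff(F)

P = np.concatenate([[0.0], np.cumsum(np.maximum(diffs, 0.0))])   # positive variation
N = np.concatenate([[0.0], np.cumsum(np.maximum(-diffs, 0.0))])  # negative variation

G = F[0] + P    # increasing
H = N           # increasing
reconstruction_error = np.max(np.abs(F - (G - H)))   # F = G - H up to rounding
total_variation = P[-1] + N[-1]                      # approaches 4 on this interval
```

Both $G$ and $H$ are increasing by construction, and their difference recovers $F$ exactly, mirroring the identities $F(x) - F(a) = P - N$ and $T = P + N$.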
Now let’s put the machinery we’ve built up to use in making precise the familiar notion of differentiation and integration being dual to each other. It is easy to see in one direction why this makes sense: roughly speaking, the derivative of an integral is the limit of the average value of a function over a neighborhood as the measure of that neighborhood approaches zero. For this reason the first problem we will address in this post is called the averaging problem: if $f$ is Lebesgue-integrable, do we have that $\lim_{m(B) \to 0,\ x \in B} \frac{1}{m(B)} \int_B f(y)\, dy = f(x)$ for almost all $x$, where the limit is taken over balls $B$ containing $x$?
Result 1: We can answer the averaging problem in the affirmative.
Well, it’s certainly in the affirmative for continuous functions; one of the keys is to observe that continuous functions of compact support are dense in the space $L^1$ of Lebesgue-integrable functions. We already know simple functions are dense, so step functions are as well, and we can easily find arbitrarily close approximations to the most basic step function, namely the characteristic function of a rectangle, by continuous functions of compact support. So now for any $f$ in $L^1$, approximate $f$ by such a continuous function $g$ so that $\int |f - g|$ can be made arbitrarily small. Then we can rewrite $\frac{1}{m(B)} \int_B f(y)\, dy - f(x)$ as $\frac{1}{m(B)} \int_B (f(y) - g(y))\, dy + \left( \frac{1}{m(B)} \int_B g(y)\, dy - g(x) \right) + (g(x) - f(x))$.
Take the limit superior of both sides over balls $B$ containing $x$ as $m(B) \to 0$; because $g$ is continuous, the middle term on the right vanishes. We want to show that for any given $\alpha > 0$, the measure of the set on which the limit superior of the left-hand side, i.e. the difference between $f(x)$ and the limit of its average values, exceeds $\alpha$ is zero. By Chebyshev’s inequality, $m(\{x : |f(x) - g(x)| > \alpha\}) \le \frac{1}{\alpha} \|f - g\|_{L^1}$, and by the maximal inequality, $m(\{x : (f - g)^*(x) > \alpha\}) \le \frac{C}{\alpha} \|f - g\|_{L^1}$, where $h^*(x) = \sup_{B \ni x} \frac{1}{m(B)} \int_B |h(y)|\, dy$ is the so-called Hardy–Littlewood maximal function and $C$ is a dimensional constant. Since the bad set is contained in the union of these two sets and $\|f - g\|_{L^1}$ can be made arbitrarily small, it has measure zero.
Let’s first define the integral of a nonnegative simple function $s = \sum_{k=1}^n c_k \chi_{E_k}$, where $s$ takes on the values $c_1, \ldots, c_n$ on the sets $E_1, \ldots, E_n$. The integral of $s$ over a subset $E$ in the $\sigma$-algebra is given by the geometrically intuitive $\int_E s\, d\mu = \sum_{k=1}^n c_k\, \mu(E_k \cap E)$. We can then extend this to an arbitrary nonnegative measurable function $f$ to get $\int_E f\, d\mu = \sup_s \int_E s\, d\mu$, where the supremum is taken over all simple functions $s$ such that $0 \le s \le f$. We can easily deduce the familiar properties: if $f \le g$ over $E$, the integral of $f$ over $E$ is at most that of $g$; the Lebesgue integral is increasing with respect to set inclusion and commutes with multiplication by scalar factors; and $\int_E f\, d\mu = 0$ if $E$ is of measure 0 or $f$ is identically zero over $E$.
Recall from the previous post that to every nonnegative measurable function $f$ corresponds some increasing sequence of simple nonnegative functions that converges to $f$. This result and the following result empower us to prove many results on the integral of arbitrary nonnegative functions simply by proving them for the integral of simple functions.
Result 8 (Monotone Convergence): If $\{f_n\}$ is an increasing sequence of nonnegative, measurable functions converging pointwise to $f$, then $\lim_{n \to \infty} \int f_n\, d\mu = \int f\, d\mu$.
Proof: By a previous result, $f = \lim_n f_n$ is certainly measurable. The sequence of integrals $\int f_n\, d\mu$ is increasing, so it converges to some $\alpha$ (possibly infinite). We know that $\alpha \le \int f\, d\mu$ and thus want to prove that this inequality is true in reverse. First observe that the function $\varphi(E) = \int_E s\, d\mu$ is a measure for simple measurable $s$ (we leave it as an exercise to the reader to verify that $\varphi$ is countably additive and that $\varphi(\emptyset) = 0$). We will proceed to construct an increasing sequence of sets whose union is the whole space and use this to show that $\alpha$ is at least the integral of any simple function bounded between 0 and $f$. Fix a constant $0 < c < 1$ and any simple measurable $s$ with $0 \le s \le f$, and define $E_n = \{x : f_n(x) \ge c\, s(x)\}$. We can verify that $E_n$ is an increasing sequence of measurable sets whose union is the whole space. So if we take the limit of both sides of $\int f_n\, d\mu \ge \int_{E_n} f_n\, d\mu \ge c \int_{E_n} s\, d\mu = c\, \varphi(E_n)$, we get that $\alpha \ge c \int s\, d\mu$. This holds for all $c < 1$ and all simple $s \le f$, so we have equality. Note that if we had picked $c = 1$, the union of the sets $E_n$ would not necessarily have been the entire space (consider $f_n = (1 - 1/n) f$ and $s = f$ for some strictly positive $f$), so the strict inequality $c < 1$ was necessary.
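A concrete instance of monotone convergence: the truncations $f_n = \min(f, n)$ of the unbounded but integrable $f(x) = 1/\sqrt{x}$ on $(0, 1]$ increase pointwise to $f$, and their integrals $2 - 1/n$ climb to $\int_0^1 f = 2$. A minimal numerical sketch using right-endpoint Riemann sums (all names mine):

```python
import numpy as np

# f(x) = 1/sqrt(x) on (0, 1] is unbounded but integrable, with integral 2.
# Its truncations f_n = min(f, n) increase pointwise to f, so by monotone
# convergence their integrals should climb toward 2 (they equal 2 - 1/n).
x = np.linspace(0.0, 1.0, 2_000_001)[1:]   # right endpoints of a uniform grid; 0 excluded
dx = 1.0 / 2_000_000
f = 1.0 / np.sqrt(x)

# Right-endpoint Riemann sums of the (bounded) truncations.
integrals = [np.sum(np.minimum(f, n)) * dx for n in (1, 2, 4, 8, 16)]
```

Each truncation is bounded, so its Riemann sum is unproblematic even though $f$ itself blows up at the origin; the increasing sequence of integrals approaches 2 from below.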
Given a topological space $X$, call the sets in the $\sigma$-algebra generated by the open sets of $X$ the Borel sets, and call $X$ endowed with this structure a Borel space. So in other words, the Borel sets consist of all open and closed sets together with their countable intersections and unions. The Borel functions are measurable functions from a Borel space to a topological space. Observe that a continuous function from a Borel space to a topological space is Borel, but the converse is not necessarily true.
It is clear that if $f$ is continuous and $g$ is measurable, then $f \circ g$ is measurable, and in fact, the same is true if $f$ is merely Borel. To see this, note that $(f \circ g)^{-1}(V) = g^{-1}(f^{-1}(V))$: a Borel function pulls an open set back to a Borel set, and a measurable function pulls all Borel sets back to measurable sets. This observation will be key to our definition of the Lebesgue integral.
Next, define a simple function to be a complex function $s$ on a measurable space whose image consists of finitely many values $c_1, \ldots, c_n$ from $\mathbb{C}$. Denote by $E_k$ the set $\{x : s(x) = c_k\}$. The function can then be rewritten as $s = \sum_{k=1}^n c_k \chi_{E_k}$, which is measurable iff $E_k$ is measurable for all $k$.
In Riemann integration, roughly speaking, the heights of the familiar approximating blocks can be thought of as given by a simple function. And indeed, the next result says that every nonnegative measurable function has a corresponding increasing sequence of approximating simple functions.
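The usual construction behind this result truncates $f$ at height $n$ and rounds down to the nearest multiple of $2^{-n}$, so that each approximant is simple, the sequence increases in $n$, and it converges to $f$ pointwise from below. A sketch of that construction, evaluated on a grid for illustration (names mine):

```python
import numpy as np

def dyadic_approximation(f_values, n):
    """Standard approximating simple function: truncate f at height n and
    round down to the nearest multiple of 2**-n.  The result increases in n
    and converges pointwise (from below) to f."""
    return np.minimum(np.floor(f_values * 2.0**n) / 2.0**n, float(n))

x = np.linspace(0.0, 3.0, 1001)
f = np.exp(x)    # a nonnegative (measurable) function to approximate

s2, s4, s8 = (dyadic_approximation(f, n) for n in (2, 4, 8))
gap_below_cap = np.max((f - s8)[f <= 8])   # at most 2**-8 where the cap is inactive
```

Wherever $f(x) \le n$, the $n$th approximant is within $2^{-n}$ of $f(x)$; where $f(x) > n$, the cap at $n$ eventually releases as $n$ grows, which is exactly why the convergence is pointwise rather than uniform for unbounded $f$.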