# Integrating the Derivative

Dual to the problem addressed in the previous post is the question of when the conclusion of the fundamental theorem of calculus holds, namely that $F(b)-F(a) = \int^b_a F'(x)\,dx$. It turns out the condition of absolute continuity defined at the end of the last post is sufficient. First, define the variation of a complex-valued function $F$ over a partition $a = t_0< \cdots < t_N=b$ to be $\sum^N_{i=1}|F(t_i)-F(t_{i-1})|$. It is easy to see that variation increases with the “fineness” of the partition; we say that a function is of bounded variation if the supremum of the variation over all partitions, the total variation $T_F(a,b)$, is finite. This will be the case, roughly speaking, for functions that do not oscillate too widely or too frequently. We can see that real, monotonic, bounded functions, as well as functions with bounded derivative, are of bounded variation.
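To make the definition concrete, here is a minimal numerical sketch (not from the post; the helper `variation` is hypothetical) computing the variation over a uniform partition: for a monotone function the sum telescopes to $F(b)-F(a)$ on every partition, while an oscillating function accumulates much more.

```python
import math

def variation(F, partition):
    """Variation of F over a partition a = t_0 < ... < t_N = b."""
    return sum(abs(F(t1) - F(t0)) for t0, t1 in zip(partition, partition[1:]))

n = 1000
ts = [i / n for i in range(n + 1)]  # uniform partition of [0, 1]

# Monotone: on every partition the variation telescopes to F(1) - F(0) = 1.
v_mono = variation(lambda x: x ** 2, ts)

# Oscillating: sin(8*pi*x) runs through 4 full periods, each contributing
# variation 4, so the total variation (the sup over partitions) is 16.
v_osc = variation(lambda x: math.sin(8 * math.pi * x), ts)
```

Refining the partition can only push `v_osc` up toward the total variation 16, while `v_mono` stays at 1, illustrating why bounded monotone functions are of bounded variation.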

The first result of this post will show some of the motivation for studying these functions.

Result 1: Bounded variation implies differentiability almost everywhere.

Proof: To prove this, let’s first narrow our focus solely to increasing functions. We can do this because of the following characterization of functions of bounded variation: they are precisely the differences of two increasing bounded functions. One direction is obvious: the difference of two increasing bounded functions is of bounded variation. To prove the other direction, define the positive and negative variation $P_F(a,x)$ and $N_F(a,x)$ over the interval $[a,x]$ to be $\sup\sum_{(+)}F(t_j)-F(t_{j-1})$ and $\sup\sum_{(-)}-\left(F(t_j)-F(t_{j-1})\right)$, where the sums are taken over the positive and negative differences of a partition, respectively. We immediately have that $F(x)-F(a) = P_F(a,x) - N_F(a,x)$ and $T_F(a,x) = P_F(a,x)+N_F(a,x)$. Then to prove the other direction of our claim, simply take the two increasing functions to be $P_F(a,x)+F(a)$ and $N_F(a,x)$.
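A quick numerical check of the decomposition over one fixed partition (a sketch, not from the post; `pos_neg_variation` is a hypothetical helper): the positive and negative differences separate cleanly, and their difference telescopes to $F(b)-F(a)$.

```python
import math

def pos_neg_variation(F, partition):
    """Positive and negative variation of F over one fixed partition."""
    diffs = [F(t1) - F(t0) for t0, t1 in zip(partition, partition[1:])]
    P = sum(d for d in diffs if d > 0)    # sum over the positive differences
    N = -sum(d for d in diffs if d < 0)   # minus the sum over the negative ones
    return P, N

def F(x):
    # increasing drift plus an oscillation, so both P and N are nonzero
    return math.sin(2 * math.pi * x) + x

ts = [i / 1000 for i in range(1001)]
P, N = pos_neg_variation(F, ts)
gap = abs((P - N) - (F(1) - F(0)))  # P - N telescopes to F(b) - F(a)
```

Taking suprema over all partitions turns these partition-level quantities into $P_F$ and $N_F$, the two increasing functions of the characterization.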

Next, we’ll prove the lemma that forms the crux for our proof of almost-everywhere differentiability.

Riesz’ Lemma: For a real, continuous function $G$ on $[a,b]$, let $E$ be the set of points $x$ such that there exists some $x+h_x$ to the right of $x$ with $G(x+h_x)>G(x)$. Then $E$ is a union of disjoint open intervals $(a_k,b_k)$ such that $G$ takes the same value at both endpoints of each interval, except possibly for an interval starting at $a_1 = a$, for which we can only conclude $G(a_1)\le G(b_1)$.

Proof: The proof is evocative and beautiful: think of the function as a series of rolling hills, and imagine a sun shining from the right. The shadows cast on the hills are precisely the intervals composing $E$, and each contiguous shadow starts at the same height at which it ends, except possibly the first one! To make this rigorous, one simply applies the intermediate value theorem repeatedly.
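The sun-and-shadows picture can be simulated on a grid. Below is a rough sketch (all names are hypothetical, and the grid introduces error of order $1/n$): a sample point is “in shadow” if some strictly later sample is higher, and on each maximal shadow interval away from the left edge, $G$ takes approximately equal values at the two endpoints.

```python
import math

def shadow_intervals(G, a, b, n=20000):
    """Grid approximation of E = {x : G(y) > G(x) for some y > x},
    decomposed into maximal runs ('shadows')."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    gs = [G(x) for x in xs]
    # suffix_max[i] = highest sample strictly to the right of xs[i]
    suffix_max = [-math.inf] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_max[i] = max(suffix_max[i + 1], gs[i + 1])
    lit = [suffix_max[i] > gs[i] for i in range(n + 1)]
    intervals, start = [], None
    for i, flag in enumerate(lit):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            intervals.append((xs[start], xs[i - 1]))
            start = None
    if start is not None:
        intervals.append((xs[start], xs[n]))
    return intervals

def G(x):
    return math.exp(-x / 4) * math.sin(x)  # rolling hills with decaying peaks

ivs = shadow_intervals(G, 0.0, 3 * math.pi)
interior = [iv for iv in ivs if iv[0] > 0.0]  # shadows not starting at a
```

For this $G$, the first shadow starts at $a=0$ (the exceptional interval of the lemma), while every interior shadow has nearly equal heights at its endpoints, up to the grid resolution.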

Returning to the proof, define the difference quotient $\Delta_h(F)(x) = \frac{F(x+h)-F(x)}{h}$ and the four Dini numbers $D^+$, $D^-$, $D_+$, $D_-$ to be the limit superior of $\Delta_h(F)(x)$ from the right and from the left, and the limit inferior of $\Delta_h(F)(x)$ from the right and from the left, respectively. By definition $D_+\le D^+$ and $D_-\le D^-$, so it suffices to prove that $D^+$ is finite and that $D^+\le D_-$ almost everywhere: applying the latter to the reflected function $-F(-x)$ gives $D^-\le D_+$ as well, and the resulting chain $D^+\le D_-\le D^-\le D_+\le D^+$ forces all four to be equal and finite almost everywhere.
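As a sanity check on the definitions, here is a numerical sketch (the helper `dini_numbers` is hypothetical) of the four Dini numbers at the corner of $F(x)=|x|$, a single point where $D^+\le D_-$ fails, consistent with the theorem, which only holds almost everywhere.

```python
def dini_numbers(F, x, hs):
    """Approximate the four Dini numbers at x by sampling difference
    quotients over a decreasing sequence hs; the tail sup/inf stand in
    for limsup/liminf."""
    right = [(F(x + h) - F(x)) / h for h in hs]
    left = [(F(x) - F(x - h)) / h for h in hs]
    tail = len(hs) // 2  # only look at the tail, where h is smallest
    return (max(right[tail:]), min(right[tail:]),   # D^+, D_+
            max(left[tail:]), min(left[tail:]))     # D^-, D_-

hs = [2.0 ** -k for k in range(4, 40)]
Dp_up, Dp_lo, Dm_up, Dm_lo = dini_numbers(abs, 0.0, hs)
# At the corner of |x|: every right quotient is 1 and every left quotient
# is -1, so D^+ = D_+ = 1 while D^- = D_- = -1.
```

At this one point $D^+ = 1 > -1 = D_-$, but the set of such corners of a monotone (or bounded-variation) function has measure zero.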

It is time to use our lemma: for a given $\gamma>0$, let $E_{\gamma}$ be the set of $x$ for which $D^+(F)(x)>\gamma$. We want to show that the measure of $E_{\gamma}$ shrinks to zero as $\gamma$ becomes large, which would imply that $D^+<\infty$ almost everywhere. Apply Riesz’ lemma to the function $G(x) = F(x) - \gamma x$ and note that $E_{\gamma}$ is part of the union of open intervals described in the lemma: if a point satisfies $D^+(F)(x)>\gamma$, then $F(x+h)-F(x)>h\gamma$ for some $h$, and thus there exists to the right of $x$ some $x+h$ such that $G(x+h)>G(x)$. So the measure of $E_{\gamma}$ is bounded by $\sum_k m((a_k,b_k))$, where $G(a_k)\le G(b_k)$, so that $F(b_k)-F(a_k)\ge\gamma(b_k-a_k)$. Thus $\sum_k m((a_k,b_k))\le\frac{1}{\gamma}\sum_k \left(F(b_k)-F(a_k)\right)$, and because $F$ is increasing, this is bounded above by $\frac{1}{\gamma}(F(b)-F(a))$, proving as desired that $m(E_{\gamma})$ becomes arbitrarily small as $\gamma$ increases.

We use the same approach to prove that $D^+\le D_-$ almost everywhere: define a set that can be used with Riesz’ lemma and prove it’s of measure zero. For fixed $R>r$, let $E$ be the set where $D^+>R$ and $D_-<r$. Because the set where $D^+>D_-$ is a countable union of such sets over rational $r<R$, we just need to prove $m(E)=0$. To the contrary, assume that $m(E)>0$. Then there is some open set $O$ with $E\subset O\subset (a,b)$ such that $m(O)<\frac{R}{r}m(E)$. $O$ is the union of disjoint open intervals $I_n$, so pick one; we’ll use Riesz’ lemma twice. Basically, in each $I_n$, we’re going to construct a set $O_n$ which is small relative to $I_n$ (in fact, by a factor of $\frac{r}{R}$) but still contains $E\cap I_n$. Then $m(E) = \sum_n m(E\cap I_n)\le\sum_n m(O_n)\le\frac{r}{R}\sum_n m(I_n) = \frac{r}{R}m(O)<m(E)$, a contradiction stemming from the strict inequality in our choice of $O$, which was possible only under the false assumption that $m(E)>0$.

Because we are approaching from the left in the case of $D_-$ while our lemma considers high values lying on the right, we reflect through the origin: apply the lemma to $G(x) = F(-x)+rx$ on the reflected interval to obtain a union of open intervals, and reflecting these back through the origin, we get some $\cup_k(a_k,b_k)$ on which $G$ takes equal values at the endpoints and thus $F(b_k)-F(a_k)\le r(b_k-a_k)$. For each of these subintervals $(a_k,b_k)$, further apply the Riesz lemma using $G(x) = F(x) - Rx$ to get another layer of subintervals $(a_{k,j},b_{k,j})$ so that $F(b_{k,j})-F(a_{k,j})\ge R(b_{k,j}-a_{k,j})$. Taking the union of all these innermost subintervals, we get a set $O_n$ containing $E\cap I_n$; because $F$ is increasing, $m(O_n)\le\frac{1}{R}\sum_{k,j}\left(F(b_{k,j})-F(a_{k,j})\right)\le\frac{1}{R}\sum_k\left(F(b_k)-F(a_k)\right)\le\frac{r}{R}\sum_k(b_k-a_k)\le\frac{r}{R}m(I_n)$, as desired.

Now that we have differentiability almost everywhere, we can approximate $F'$ by the difference quotients $G_n(x) = \frac{F(x+1/n)-F(x)}{1/n}$ and then invoke Fatou’s lemma on these to get that for $F$ increasing and continuous, $\int^b_a F'(x)\,dx \le F(b)-F(a)$. By our characterization of functions of bounded variation, it follows that $F'$ is integrable for those functions as well.

But it turns out we can do no better than this inequality without narrowing our focus further. The Cantor-Lebesgue function is one such counterexample; the basic premise is to construct a sequence of continuous “stairstep functions” whose steps come at the endpoints of the Cantor set convergents. In this case the derivative is almost everywhere zero, but $F(1)-F(0)= 1$. Specifically, define the convergents $F_n$ of the Cantor-Lebesgue function so that $F_n(0) = 0$, $F_n(x) = \frac{k}{2^n}$ on the $k$-th interval (counting from the left) of the complement of the Cantor set convergent $C_n$, and $F_n$ is linear on $C_n$ itself. An image I like to use to visualize this is that we’re climbing from height 0 to height 1 in such a way that even when we make it to 1 at the end, our vertical progress has been so slow that it might seem like we’d traveled nowhere.
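A standard way to compute the limiting function (a closed form assumed here, not derived in the post): write $x$ in base 3, truncate the expansion at its first digit 1, replace every 2 with 1, and read the result in base 2. A sketch:

```python
def cantor_function(x, depth=50):
    """Approximate Cantor-Lebesgue function on [0, 1] via base-3 digits."""
    if x >= 1.0:
        return 1.0
    total, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)   # next base-3 digit of x
        x -= digit
        if digit == 1:
            return total + scale  # x fell in a removed middle third: step height
        total += scale * (digit // 2)  # digit 0 -> bit 0, digit 2 -> bit 1
        scale /= 2
    return total

steps = [cantor_function(i / 200) for i in range(201)]
```

The function climbs from 0 to 1 yet is constant on each removed middle third, e.g. equal to $\frac{1}{2}$ on all of $(\frac{1}{3},\frac{2}{3})$, which is where the almost-everywhere-zero derivative lives.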

What it takes to get the fundamental theorem of calculus to hold is the stronger condition of absolute continuity. Recall the generalized definition: for an integrable function $f$ and any $\epsilon>0$, there exists a $\delta>0$ such that $m(E)<\delta$ implies $\int_E |f|< \epsilon$. In the one-dimensional case, a function $F$ is absolutely continuous if for every $\epsilon>0$ there exists a $\delta>0$ such that $\sum^N_{k=1}|F(b_k)-F(a_k)|<\epsilon$ for all finite collections of disjoint intervals $(a_k,b_k)$ whose combined length is less than $\delta$.
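For instance, any Lipschitz function with constant $L$ is absolutely continuous with $\delta = \epsilon/L$. The following sketch (hypothetical helper, randomly generated disjoint intervals) checks the one-dimensional definition for the 1-Lipschitz function $\sin x$:

```python
import math
import random

def ac_sum(F, intervals):
    """The sum appearing in the definition of absolute continuity."""
    return sum(abs(F(b) - F(a)) for a, b in intervals)

random.seed(0)
eps = 1e-3

# Build random disjoint intervals in (0, 10), then scale their lengths
# so the combined length is below delta = eps (L = 1 for sin).
pts = sorted(random.uniform(0, 10) for _ in range(40))
intervals = list(zip(pts[::2], pts[1::2]))
total_len = sum(b - a for a, b in intervals)
shrink = (eps / 2) / total_len
small = [(a, a + (b - a) * shrink) for a, b in intervals]

combined_length = sum(b - a for a, b in small)
variation_sum = ac_sum(math.sin, small)  # at most combined_length, hence < eps
```

Since $|\sin b-\sin a|\le b-a$, the sum is controlled by the combined length no matter how the disjoint intervals are placed, which is exactly the uniformity the definition demands.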

It is easy to see that absolute continuity implies uniform continuity and, more importantly, bounded variation and thus differentiability almost everywhere. We will prove that if the derivative of an absolutely continuous function is zero almost everywhere, the function is constant. This gives our desired theorem, one direction of which was proven in the last post:

Result 2: If $F$ is absolutely continuous, then $\int^x_a F'(y)\,dy = F(x)-F(a)$. On the other hand, if $f$ is integrable, then $F(x) = \int^x_a f(y)\,dy$ is absolutely continuous and $F'(x) = f(x)$ almost everywhere.

Proof: In the former direction, the equation to be proven makes sense: functions of bounded variation are differentiable almost everywhere, and $F'$ is integrable by our bound from Fatou’s lemma; for convenience, denote its integral $\int^x_a F'(y)\,dy$ by $G(x)$. But then we can use the first main result of our last post: local integrability implies that $\frac{1}{m(B)}\int_B F'(y)\,dy \to F'(x)$ almost everywhere as the balls $B$ shrink to $x$, so in other words, $G'(x) = F'(x)$ almost everywhere. Since $G-F$ is absolutely continuous with derivative zero almost everywhere, it is constant by what we will prove later. So $F(b)-F(a) = G(b)-G(a) =\int^b_a F'(x)\,dx$.

In the other direction, we already know this by the Lebesgue differentiation theorem of the last post.

To conclude this post, we prove that an absolutely continuous function whose derivative is zero almost everywhere is constant.

Proof: It suffices to prove that $F$ takes the same value at the endpoints of the interval $[a,b]$, because then we can restrict to any subinterval we wish. Let $E$ be the set of $x$ for which $F'(x) = 0$; by hypothesis it has measure $b-a$. Since the derivative vanishes on $E$, for every $\epsilon>0$ we can find, for every $x\in E$ and every $\nu>0$, an interval $(a_x,b_x)$ containing $x$ of width less than $\nu$ such that $|F(b_x)-F(a_x)|\le \epsilon(b_x-a_x)$. This is an example of a Vitali covering: a covering by balls such that for every point in the set and every $\nu>0$, there is a ball containing that point of measure less than $\nu$. It turns out that, roughly speaking, for any Vitali covering we can find a disjoint sub-collection of balls whose combined measure is arbitrarily close to that of the set being covered, i.e. for any $\delta>0$ there are finitely many disjoint $B_1,\ldots,B_N$ in the covering such that $\sum^N_{i=1}m(B_i) \ge m(E)-\delta$.

If we can prove this, then we can find a disjoint sub-collection of intervals $I_i = (a_i,b_i)$ such that $\sum^N_{i=1}m(I_i)\ge m(E)-\delta = b-a-\delta$ and such that $\sum^N_{i=1}|F(b_i)-F(a_i)|\le\epsilon(b-a)$, which can be made arbitrarily small. The complement of these intervals in $[a,b]$ has measure at most $\delta$, which we can shrink as we wish by our Vitali result, so by absolute continuity the analogous sum $\sum_k|F(\beta_k)-F(\alpha_k)|$ over the intervals making up the complement can also be made arbitrarily small. Since $|F(b)-F(a)|$ is bounded by these two sums together, we are done.

So it remains to show the result for Vitali coverings. In fact we can make use of our covering argument from the previous post: given a finite collection of balls, there is a disjoint sub-collection which, after dilating each ball by a factor of 3, covers the original union, and hence captures at least a $3^{-d}$ fraction of its measure. The argument is then simply: i) pick a compact subset $E'\subset E$ with $m(E')\ge m(E)-\delta$, ii) cover it by finitely many balls of the Vitali covering and invoke the $3^d$ result to find a disjoint sub-collection of total measure at least $3^{-d}m(E')$, iii) if the measure captured so far is at least $m(E)-\delta$, we’re done; otherwise, repeat the argument on $E$ minus the union $\cup_k \bar{B_k}$ of the current balls’ closures, using only the balls of the original Vitali covering that do not intersect any of the $B_k$. Since each round captures a fixed fraction of what remains, finitely many rounds suffice.
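The $3^d$ covering step can be sketched in one dimension (so the factor is $3 = 3^1$): greedily keep the longest remaining interval, discard everything it meets, and repeat; tripling each kept interval then covers the original union, because every discarded interval meets a kept interval at least as long. A rough illustration (the helper `greedy_disjoint` is hypothetical):

```python
def greedy_disjoint(intervals):
    """Process intervals in decreasing length; keep each one that is
    disjoint from everything kept so far."""
    kept = []
    for a, b in sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True):
        if all(b <= c or d <= a for c, d in kept):  # disjoint from all kept
            kept.append((a, b))
    return kept

intervals = [(0, 4), (3, 5), (4.5, 6), (7, 8), (7.5, 9.5)]
kept = greedy_disjoint(intervals)
# Dilate each kept interval by 3: same center, triple the length.
dilated = [(a - (b - a), b + (b - a)) for a, b in kept]
covered = all(any(c <= a and b <= d for c, d in dilated)
              for a, b in intervals)
```

The kept intervals are disjoint, yet their threefold dilations swallow every original interval, so the kept intervals alone carry at least a third of the measure of the union.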