Differentiating power series

I’m writing this post as a way of preparing for a lecture. I want to discuss the result that a power series \sum_{n=0}^\infty a_nz^n is differentiable inside its circle of convergence, and the derivative is given by the obvious formula \sum_{n=1}^\infty na_nz^{n-1}. In other words, inside the circle of convergence we can think of a power series as like a polynomial of degree \infty for the purposes of differentiation.

A preliminary question about this is why it is not more or less obvious. After all, writing f(z)=\sum_{n=0}^\infty a_nz^n, we have the following facts.

  1. Writing S_N(z)=\sum_{n=0}^Na_nz^n, we have that S_N(z)\to f(z).
  2. For each N, S_N'(z)=\sum_{n=1}^Nna_nz^{n-1}.

If we knew that S_N'(z)\to f'(z), then we would be done.

Ah, you might be thinking, how do we know that the sequence (S_N'(z)) converges? But it turns out that that is not the problem: it is reasonably straightforward to show that it converges. (Roughly speaking, inside the circle of convergence the series \sum_n a_nz^{n-1} converges at least as fast as a GP, and multiplying the nth term by n doesn’t stop a GP converging, as can easily be seen with the help of the ratio test.) So, writing g(z) for \sum_{n=1}^\infty na_nz^{n-1}, we have the following facts at our disposal.

  1. S_N(z)\to f(z)
  2. S_N'(z)\to g(z)

Doesn’t it follow from that that f'(z)=g(z)?

We are appealing here to a general principle, which is that if some functions converge to f and their derivatives converge to g, then f is differentiable with f'=g. Is this general principle correct?

Unfortunately, it isn’t. Suppose we take some continuous functions g_N that converge to a step function. (Roughly speaking, you make g_N be 0 up to 0, then linear with gradient N until it hits 1, then 1 from that point onwards.) And suppose we then let f_N be the function that differentiates to g_N and is 0 up to 0. Then the f_N converge to the function that is 0 up to 0 and x for positive x. This function almost differentiates to the step function, but it isn’t differentiable at 0.
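If you want to see this failure concretely, here is a quick Python sketch of the example just described (my own illustration; the function names and sample points are arbitrary choices of mine):

```python
# g_N ramps from 0 to 1 with gradient N; f_N is its antiderivative
# that vanishes for x <= 0.  As N grows, f_N converges pointwise to
# max(x, 0), which is not differentiable at 0.

def g(N, x):
    # continuous approximation to the step function
    if x <= 0:
        return 0.0
    if x <= 1.0 / N:
        return N * x
    return 1.0

def f(N, x):
    # antiderivative of g(N, .) that is 0 up to 0
    if x <= 0:
        return 0.0
    if x <= 1.0 / N:
        return N * x * x / 2
    return x - 1.0 / (2 * N)

print(f(10**6, 0.5))   # very close to 0.5 = max(0.5, 0)
print(f(10**6, -0.3))  # 0.0

# The limit function max(x, 0) has mismatched one-sided difference
# quotients at 0, so it is not differentiable there.
limit = lambda x: max(x, 0.0)
print((limit(1e-9) - limit(0)) / 1e-9)  # 1.0 from the right
print((limit(0) - limit(-1e-9)) / 1e-9)  # 0.0 from the left
```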

So we’ve somehow got to use particular facts about power series in order to prove our result — we can’t appeal to general considerations, because then we are appealing to a principle that isn’t true. (Actually, in principle some compromise might be possible, where we show that functions defined by power series have a certain property and then use nothing apart from that property from that point on. But as it happens, we shall not do this.)

Why can’t we just jump in and prove it with a big calculation?

We have a formula for f(z). Why don’t we write out a formula for (f(z+h)-f(z))/h and see if we can tell what happens when h\to 0?

That is certainly a sensible first thing to try, so let’s see what happens.

(f(z+h)-f(z))/h = \sum_{n=0}^\infty a_n((z+h)^n-z^n)/h

What can we do with that? Perhaps we’d better apply the binomial theorem. Then we find that the right-hand side is equal to

\displaystyle \sum_{n=1}^\infty a_n\bigl(nz^{n-1}+\binom n2hz^{n-2}+\binom n3h^2z^{n-3}+\dots+h^{n-1}\bigr)

Part of the above expression gives us what we want, namely \sum_{n=1}^\infty na_nz^{n-1}. So we’re left wanting to prove that

\displaystyle \sum_{n=1}^\infty a_n\bigl(\binom n2hz^{n-2}+\binom n3h^2z^{n-3}+\dots+h^{n-1}\bigr)

tends to 0 as h\to 0.

Unfortunately, as n gets big, some of those binomial coefficients get pretty big too. Indeed, when n is bigger than 1/h, the growth in the binomial coefficients seems to outstrip the shrinking of the powers of h. What can we do?

A useful trick

Fortunately, there is a better (for our purposes at least) way of writing (z+h)^n-z^n. We just expanded out (z+h)^n using the binomial theorem. But we could instead have used the expansion

a^n-b^n=(a-b)(a^{n-1}+a^{n-2}b+\dots+ab^{n-2}+b^{n-1})

Applying that with a=z+h and b=z, we get

(z+h)^n-z^n=h\bigl((z+h)^{n-1}+(z+h)^{n-2}z+\dots+z^{n-1}\bigr)

Just before we continue, note that this gives us an alternative, and in my view nicer, way to see that the derivative of z^n is nz^{n-1}, since if you divide the right-hand side by h and let h\to 0 then each of the n terms tends to z^{n-1}.
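Both the identity and this way of seeing the derivative are easy to check numerically. A short Python sanity check (my own; the sample values are arbitrary):

```python
# Check (z+h)^n - z^n = h((z+h)^{n-1} + (z+h)^{n-2}z + ... + z^{n-1})
# at an arbitrary complex point, then use it to approximate the
# derivative of z^n.

def difference_via_identity(z, h, n):
    # h times the sum of (z+h)^j * z^(n-1-j) for j = 0, ..., n-1
    return h * sum((z + h) ** j * z ** (n - 1 - j) for j in range(n))

z, h, n = 0.3 + 0.4j, 0.01 - 0.02j, 7
lhs = (z + h) ** n - z ** n
rhs = difference_via_identity(z, h, n)
print(abs(lhs - rhs))  # zero up to rounding error

# Dividing by h and letting h -> 0: each of the n terms tends to
# z^(n-1), so the quotient tends to n * z^(n-1).
quotient = difference_via_identity(z, 1e-9, n) / 1e-9
print(abs(quotient - n * z ** (n - 1)))  # very small
```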

Anyhow, if we use this trick, then (f(z+h)-f(z))/h works out to be

\sum_{n=1}^\infty a_n\bigl((z+h)^{n-1}+(z+h)^{n-2}z+\dots+z^{n-1}\bigr)

Now let’s subtract the thing we want this to tend to, which is \sum_{n=1}^\infty na_nz^{n-1}. (This is not valid unless we know that this series converges. So at some stage we will need to prove that.) If we think of nz^{n-1} as a sum of n copies of z^{n-1}, then we can write the difference as

\sum_{n=1}^\infty a_n\sum_{j=0}^{n-1}(z^j(z+h)^{n-1-j}-z^{n-1})

which equals

\sum_{n=1}^\infty a_n\sum_{j=0}^{n-1}z^j((z+h)^{n-1-j}-z^{n-1-j})

Now (z+h)^{n-1-j}-z^{n-1-j} is another example of the expansion we had above. That is, we can write it as

h\bigl((z+h)^{n-2-j}+(z+h)^{n-3-j}z+\dots+z^{n-2-j}\bigr)

We haven’t yet mentioned the radius of convergence of the original power series, but let’s do so now. Suppose it is R, that r is such that |z|<r<R, and that we have chosen h small enough that |z|+|h|<r. Then the modulus of the expression above is at most |h|(n-1-j)r^{n-2-j}.

It follows that

|\sum_{n=1}^\infty a_n\sum_{j=0}^{n-1}z^j((z+h)^{n-1-j}-z^{n-1-j})|
\leq|h|\sum_{n=2}^\infty |a_n|\sum_{j=0}^{n-1}(n-1-j)r^jr^{n-2-j}

Since \sum_{j=0}^{n-1}(n-1-j)=\sum_{j=0}^{n-1}j=\binom n2, this is equal to |h|\sum_{n=2}^\infty\binom n2|a_n|r^{n-2}.

So this will tend to zero as h\to 0 as long as we can prove that the sum \sum_{n=2}^\infty\binom n2|a_n|r^{n-2} converges.
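Both the binomial-coefficient identity used to collapse the double sum and the displayed bound can be checked in a few lines of Python (my own sanity check; the sample series a_n = 2^{-n}, with radius of convergence 2, is an arbitrary choice):

```python
from math import comb

# The identity sum_{j=0}^{n-1} (n-1-j) = C(n, 2):
assert all(sum(n - 1 - j for j in range(n)) == comb(n, 2)
           for n in range(1, 20))

# The displayed bound, for a_n = 2^{-n}, truncated at N = 60 terms.
# Note |z| + |h| < r < 2, as the argument requires.
z, h, r, N = 0.8, 0.05, 0.9, 60
a = lambda n: 2.0 ** (-n)
double_sum = sum(a(n) * sum(z ** j * ((z + h) ** (n - 1 - j) - z ** (n - 1 - j))
                            for j in range(n))
                 for n in range(1, N))
bound = abs(h) * sum(comb(n, 2) * a(n) * r ** (n - 2) for n in range(2, N))
print(abs(double_sum) <= bound)  # True
```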

Convergence of the power series you get when you differentiate term by term

Let’s prove a lemma to deal with that last point. It says that if r is smaller than the radius of convergence of the power series \sum_{n=0}^\infty a_nz^n, then the power series \sum_{n=1}^\infty n|a_n|r^{n-1} converges.

The proof is very similar to an argument we have seen already. Let R be the radius of convergence, and pick w with r<|w|<R. Then the power series \sum_{n=0}^\infty a_nw^n converges, so the terms |a_nw^n| are bounded above, by M, say. Then n|a_n|r^n=n|a_nw^n|(r/|w|)^n\leq Mn(r/|w|)^n.

But the series \sum_{n=1}^\infty Mn(r/|w|)^n converges, by the ratio test. Therefore, by the comparison test, the series \sum_{n=1}^\infty n|a_n|r^{n-1} converges.

This shows also that if |z|=r then the power series \sum_{n=1}^\infty na_nz^{n-1} converges (since we have just proved that it converges absolutely). So if we differentiate a power series term by term, we get a new power series that has the same radius of convergence, something we needed earlier.

If we apply this lemma a second time, we get that the power series \sum_{n=2}^\infty n(n-1)|a_n|r^{n-2} converges, and dividing by 2 that gives us what we wanted above, namely that \sum_{n=2}^\infty\binom n2|a_n|r^{n-2} converges.
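The lemma is easy to illustrate numerically. A Python sketch of the comparison in the proof (my own example; again a_n = 2^{-n}, with radius of convergence 2):

```python
# Take r = 1 < |w| = 1.5 < R = 2, as in the proof of the lemma.
r, w = 1.0, 1.5
a = lambda n: 2.0 ** (-n)

# The terms |a_n w^n| = 0.75^n are bounded above, here by M = 1:
M = max(abs(a(n) * w ** n) for n in range(200))

# Each term n|a_n|r^n is dominated by the convergent series M n (r/w)^n:
assert all(n * abs(a(n)) * r ** n <= M * n * (r / w) ** n
           for n in range(1, 200))

# And the partial sums of n|a_n|r^{n-1} do converge (here to 2,
# since sum n/2^n = 2):
partial = [sum(n * abs(a(n)) * r ** (n - 1) for n in range(1, N))
           for N in (10, 50, 200)]
print(partial)
```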

A couple of applications

An obvious way of applying the result is to take some of your favourite power series and differentiate them term by term. This illustrates the very important general point that if you can compute the same thing in two different ways, then you usually end up proving something interesting.

So let’s take the function e^z=\sum_{n=0}^\infty \frac{z^n}{n!}, which we have shown converges everywhere. Then we can obtain the derivative either by differentiating the function itself or by differentiating the power series term by term. That tells us that

\frac d{dz}e^z=\sum_{n=1}^\infty \frac{nz^{n-1}}{n!}, which simplifies to \sum_{n=1}^\infty\frac{z^{n-1}}{(n-1)!}, which in turn simplifies to \sum_{n=0}^\infty\frac{z^n}{n!}, which equals e^z.
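If you like, you can watch this happen numerically. A quick Python check (mine; the evaluation point is arbitrary):

```python
from math import exp, factorial

# Differentiating the exponential series term by term and summing at a
# sample point recovers e^z itself.
z, N = 0.7, 30
termwise = sum(n * z ** (n - 1) / factorial(n) for n in range(1, N))
print(abs(termwise - exp(z)))  # zero up to rounding error
```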

Earlier we proved this result by writing (e^{z+h}-e^z)/h as e^z(e^h-1)/h and proving that (e^h-1)/h\to 1. I still prefer that proof, but you are at liberty to disagree.

As another example, let us consider the power series \sum_{n=0}^\infty z^n. When |z|<1 this equals 1/(1-z), by the formula for summing a GP. We can now differentiate the power series term by term, and we can also differentiate the function 1/(1-z). Doing so tells us the interesting fact that

\frac 1{(1-z)^2}=\sum_{n=1}^\infty nz^{n-1}

We can see that in another way as well. By our result on multiplying power series, the product of \sum_{n=0}^\infty z^n with itself is the power series \sum_{n=0}^\infty c_nz^n, where (c_n) is the convolution of the constant sequence (1) with itself. That is, c_n=a_0b_n+a_1b_{n-1}+\dots+a_nb_0 with every a_r and b_s equal to 1, which gives us n+1. (This agrees with the previous answer, since \sum_{n=0}^\infty(n+1)z^n is the same as \sum_{n=1}^\infty nz^{n-1}.)
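Both observations can be checked in a few lines of Python (my own sanity check; the sample point is arbitrary):

```python
# Term-by-term derivative of the geometric series at a point with |z| < 1:
z, N = 0.4, 200
termwise = sum(n * z ** (n - 1) for n in range(1, N))
print(abs(termwise - 1 / (1 - z) ** 2))  # zero up to truncation and rounding

# Convolution of the constant sequence (1, 1, 1, ...) with itself:
a = [1] * 10
c = [sum(a[k] * a[n - k] for k in range(n + 1)) for n in range(10)]
print(c)  # c_n = n + 1
```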

Tidying up the proof

In the proof above, we used the identity

a^n-b^n=(a-b)(a^{n-1}+a^{n-2}b+\dots+ab^{n-2}+b^{n-1})

with a=z+h and b=z, and then we used it again to calculate what happened when we subtracted hnz^{n-1}. Can we get those calculations out of the way in advance? That is, can we begin by finding a nice formula for a^n-b^n-n(a-b)b^{n-1}?

We obviously can, by subtracting n(a-b)b^{n-1} from the right-hand side and simplifying, much as we did in the proof above (with z+h and z). However, we can do things a bit more slickly as follows. Start with the identity

\displaystyle \frac{a^n-b^n}{a-b}=a^{n-1}+a^{n-2}b+\dots+ab^{n-2}+b^{n-1}

Differentiating both sides with respect to b, we get

\displaystyle \frac{a^n-b^n-nb^{n-1}(a-b)}{(a-b)^2}=a^{n-2}+2a^{n-3}b+3a^{n-4}b^2+\dots+(n-1)b^{n-2}

If we now take z+h for a and z for b, we deduce that (z+h)^n-z^n-hnz^{n-1} is equal to

h^2\bigl((z+h)^{n-2}+2(z+h)^{n-3}z+\dots+(n-1)z^{n-2}\bigr)

In particular, if |z| and |z+h| are both at most r, then |(z+h)^n-z^n-hnz^{n-1}|\leq |h|^2\binom n2 r^{n-2}, which is the main fact we needed in the proof.
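Here is a brute-force Python check of that inequality (mine; the sample values are arbitrary, chosen so that |z| and |z+h| are both at most r):

```python
from math import comb

# Check |(z+h)^n - z^n - h n z^{n-1}| <= |h|^2 C(n,2) r^{n-2}
# for a range of n, at an arbitrary complex point.
z, h, r = 0.5 + 0.2j, 0.03 - 0.01j, 0.6
assert abs(z) <= r and abs(z + h) <= r
for n in range(2, 40):
    lhs = abs((z + h) ** n - z ** n - h * n * z ** (n - 1))
    rhs = abs(h) ** 2 * comb(n, 2) * r ** (n - 2)
    assert lhs <= rhs + 1e-12  # tiny tolerance for rounding
print("inequality verified for n = 2, ..., 39")
```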

Armed with this fact, we could argue as follows. We want to show that

\sum_{n=0}^\infty a_n((z+h)^n-z^n)-h\sum_{n=1}^\infty na_nz^{n-1}

is o(h). By the inequality we have just proved, if |z| and |z+h| are at most r, then the modulus of this expression is at most

|h|^2\sum_{n=2}^\infty |a_n|\binom n2 r^{n-2}

and an earlier lemma told us that the sum \sum_{n=2}^\infty |a_n|\binom n2 r^{n-2} converges whenever r is smaller than the radius of convergence. So the quantity we want to be o(h) is in fact bounded above by a multiple of |h|^2. (Sometimes people use the notation O(h^2) for this. The O means “bounded above in modulus by a constant multiple of the modulus of”.)

Was the “trick” a trick?

The proof in this post has relied heavily on the idea, which appeared to come from nowhere, of writing (z+h)^n-z^n not in the obvious way, which is

nhz^{n-1}+\binom n2h^2z^{n-2}+\dots+h^n

but in a “clever” way, namely

h\bigl((z+h)^{n-1}+(z+h)^{n-2}z+\dots+z^{n-1}\bigr)

Is this something one just has to remember, or can it be regarded as the natural thing to do?

I chose the words “can it be regarded as” quite carefully, since I want to argue that it is the natural thing to do, but when I was preparing this lecture, I didn’t find it the natural thing to do, as I shall now explain. I came to this result with the following background. Many years ago, I lectured a IB course called Further Analysis, which was a sort of combination of the current courses Metric and Topological Spaces and Complex Analysis, all packed into 16 lectures. (Amazingly, it worked quite well, though it was a challenge to get through all the material.) As a result of lecturing that, I learnt a proof that power series can be differentiated term by term inside their circle of convergence, but the proof uses a number of results from complex analysis. I then believed what some people say, which is that the complex analysis proof of this result is a very good advertisement for complex analysis, since a direct proof is horrible. And then at some point I was chatting to Imre Leader about the reorganization of various courses, and he told me that it was a myth that proving the result directly was hard. It wasn’t trivial, he said, but it was basically fine. In fact, it may even be thanks to him that the result is in the course.

Until a few days ago, I didn’t bother to check for myself that the proof wasn’t too bad — I just believed what he said. And then with the lecture coming up, I decided that the time had finally come to check it: something that I assumed would be a reasonably simple exercise. I duly did the obvious thing, including expanding (z+h)^n using the binomial theorem, and got stuck.

I would like to be able to say that I then thought hard about why I was stuck, and after a while thought of the idea of expanding (z+h)^n-z^n using the expansion of a^n-b^n. But actually that is not what happened. What happened was that I thought, “Damn, I’m going to have to look up the proof.” I found a few proofs online that looked dauntingly complicated and I couldn’t face reading them properly, apart from one that was quite nice and that for a while I thought I would use. But one thing all the proofs had in common was the use of that expansion, so that was how the idea occurred to me.

So what follows is a rational reconstruction of what I wish had been my thought processes, rather than of what actually went on in my mind.

Let’s go back to the question of how to differentiate z^n. I commented above that one could do it using the a^n-b^n expansion, and said that I even preferred that approach. But how might one think of doing it that way? There is a very simple answer to that, which is to use one of the alternative definitions of differentiability, namely that f is differentiable at z with derivative \lambda if \frac{f(w)-f(z)}{w-z}\to\lambda as w\to z. This is simply replacing z+h by w, but that is nice because it has the effect of making the expression more symmetrical. (One might argue that since we are talking about differentiability at z, the variables z and w are playing different roles, so there is not much motivation for symmetry. And indeed, that is why calling one point z and the other z+h is often a good idea. But symmetry is … well … sort of good to have even when not terribly strongly motivated.)

If we use this definition, then the derivative of z^n is the limit as w\to z of \frac{w^n-z^n}{w-z}, and now there is no temptation to use the binomial expansion (we would first have to write w as z+(w-z) and the whole thing would be disgusting) and the absolutely obvious thing to do is to observe that we have a nice formula for the ratio in question, namely

w^{n-1}+zw^{n-2}+z^2w^{n-3}+\dots+z^{n-1},

which obviously tends to nz^{n-1} as w\to z.

In fact, the whole proof is arguably nicer if one uses z and w rather than z and z+h.

Thus, the “clever” expansion is the natural one to do with the symmetric definition of differentiation, whereas the binomial expansion is the natural one to do with the z+h definition. So in the presentation above, I have slightly obscured the origins of the argument by applying the clever expansion to the z+h definition.

Another way of seeing that it is natural is to think about how we prove the statement that a product of limits is the limit of the products. The essence of this is to show that if a is close to a' and b is close to b', then ab is close to a'b'. This we do by arguing that ab is close to a'b, and that a'b is close to a'b'.

Suppose we apply a similar technique to try to show that (z+h)^n is close to z^n. How might we represent their difference? A natural way of doing it would be to convert all the (z+h)s into zs in a sequence of n steps. That is, we would argue that (z+h)^n is close to (z+h)^{n-1}z, which is close to (z+h)^{n-2}z^2, and so on.

But the difference between (z+h)^rz^{n-r} and (z+h)^{r-1}z^{n-r+1} is h(z+h)^{r-1}z^{n-r}, so if we adopt this approach, then we will end up showing precisely that

(z+h)^n-z^n=h\bigl((z+h)^{n-1}+(z+h)^{n-2}z+\dots+z^{n-1}\bigr).
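This step-by-step conversion telescopes, and that is easy to confirm numerically. A short Python sketch (mine; the sample values are arbitrary):

```python
# Converting the factors (z+h) into z one at a time: the r-th step
# changes (z+h)^r z^{n-r} into (z+h)^{r-1} z^{n-r+1}, a difference of
# h (z+h)^{r-1} z^{n-r}.  The step differences sum to (z+h)^n - z^n.
z, h, n = 1.1 + 0.3j, 0.02j, 8
telescoped = sum(h * (z + h) ** (r - 1) * z ** (n - r) for r in range(1, n + 1))
direct = (z + h) ** n - z ** n
print(abs(telescoped - direct))  # zero up to rounding error
```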


10 Responses to “Differentiating power series”

  1. Joel Says:

    We are appealing here to a general principle, which is that if some functions converge to f and their derivatives converge to g, then f is differentiable with f'=g. Is this general principle correct?

    If you assume uniform convergence of the functions, this principle becomes rather usefully true, doesn’t it?
    However this is presumably dealt with later (perhaps along with the Riemann integral?).

    • Joel Says:

      To be more precise, we can get away with assuming pointwise convergence of f_n to a continuous function f as long as the derivatives f'_n are continuous and converge uniformly to g, at least on some open interval/set around the point we are looking at.
      My favourite proof of this uses the Riemann integral and the Fundamental Theorem of Calculus, but there are also some relatively elementary proofs available using e.g. suitable Mean Value Theorem estimates.

    • gowers Says:

      You’re right that we haven’t done uniform convergence or the Riemann integral yet, and in fact uniform convergence isn’t on the syllabus for this course, but if I have time at the end of the course, having done integration, I might mention this proof.

  2. Pierre Says:

    Another way would be to state a general principle with stronger hypotheses that make it true ;). I suggest the following: let S_n be a sequence of functions, twice differentiable, such that
    (i) S_n converges pointwise to f
    (ii) S_n' converges pointwise to g
    (iii) S_n'' is bounded by A, for some A independent of n
    then f is differentiable and f' = g.
    This applies easily to power series, since the pointwise convergence and the bound can be established by the ratio test.

    PROOF.
    Let x be a point in the domain and \epsilon > 0.
    For all n we have
    |f(x+\epsilon)-f(x) - \epsilon g(x)| \leq |S_n(x+\epsilon)-f(x+\epsilon)| + |S_n(x)-f(x)| + \epsilon |S_n'(x)-g(x)| + |S_n(x+\epsilon)-S_n(x)-\epsilon S_n'(x)|

    The first and second terms tend to zero as n\to\infty, by hypothesis (i). The third term does as well, by hypothesis (ii). By Taylor’s theorem with the Lagrange form of the remainder (the one you told us about a few days ago!), the last term is bounded by \epsilon^2 A. Letting n\to\infty, we get |f(x+\epsilon)-f(x)-\epsilon g(x)|\leq\epsilon^2 A, which proves that f is differentiable at x and that f'(x) = g(x).

    • gowers Says:

      Thanks for this. I’m thinking of adding a section to the post above, giving your argument in full detail, but if any of my students are reading this, I recommend trying to fill in the details for yourself.

  3. Christopher McClain Says:

    “So what follows is a rational reconstruction of what I wish had been my thought processes, rather than of what actually went on in my mind.”

    I absolutely love this statement, and I think this admission makes this post much more beautiful than if you had presented the “trick” from the start. I try to make my students realize that slick textbook and article presentations frequently obscure the problem-solving process itself. While such education is not necessarily the point of journal articles, textbooks for real analysis and other subjects too often present “nice” proofs without discussing how on Earth a person actually thinks to do these things. I truly appreciate your attempts to motivate proofs and to minimize the use of “magic bullets” (e.g. use of Cauchy’s generalized MVT with specially chosen functions.)

    Thank you for this blog. I enjoy it.

  4. tomcircle Says:

    Reblogged this on Math Online Tom Circle.

  5. mwildon Says:

    Thank you for a very interesting post. This is a bit late, but I’d like to give another real variable proof of the result in which the heavy work is done by uniform convergence of power series strictly within their circle of convergence.

    Fix z such that |z| < R and choose s such that |z| + s < R. Given \epsilon > 0, there exists N such that, whenever |h| < s, the errors in approximating (i) \sum_{n=0}^\infty a_n (z+h)^n by \sum_{n=0}^N a_n (z+h)^n, (ii) \sum_{n=0}^\infty a_n z^n by \sum_{n=0}^N a_n z^n and (iii) \sum_{n=0}^\infty na_n z^{n-1} by \sum_{n=0}^N na_n z^{n-1} are all < \epsilon. Since

    \displaystyle \frac{1}{h} \sum_{n=0}^N (a_n(z+h)^n - a_n z^n - h na_n z^{n-1}) = h g(h)

    for some (complicated) polynomial g, the left-hand side is < \epsilon in modulus for all sufficiently small h. Hence

    \displaystyle \Bigl| \frac{f(z+h) - f(z)}{h} - \sum_{n=0}^\infty na_n z^{n-1} \Bigr| < 4\epsilon

    for all sufficiently small h.

    • mwildon Says:

      Sorry, the second paragraph and the LaTeX have got completely mangled. Please could you delete everything and I will put a post on my blog with the idea.

  6. Term-by-term differentiation of power series | Wildon's Weblog Says:

    […] of convergence then its derivative is , where this series has radius of convergence at least . An interesting post on Gowers’ weblog gives a direct proof of this (in the real variable case, but it all goes […]
