I’m writing this post as a way of preparing for a lecture. I want to discuss the result that a power series $\sum_{n=0}^\infty a_nz^n$ is differentiable inside its circle of convergence, and the derivative is given by the obvious formula $\sum_{n=1}^\infty na_nz^{n-1}$. In other words, inside the circle of convergence we can think of a power series as like a polynomial of degree $\infty$ for the purposes of differentiation.
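A preliminary question about this is why it is not more or less obvious. After all, writing $f(z)=\sum_{n=0}^\infty a_nz^n$, we have the following facts.
- Writing $S_N(z)=\sum_{n=0}^N a_nz^n$, we have that $S_N(z)\to f(z)$ as $N\to\infty$.
- For each $N$, $S_N'(z)=\sum_{n=1}^N na_nz^{n-1}$.
If we knew that $S_N'(z)\to f'(z)$, then we would be done.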
Ah, you might be thinking, how do we know that the sequence $S_N'(z)$ converges? But it turns out that that is not the problem: it is reasonably straightforward to show that it converges. (Roughly speaking, inside the circle of convergence the series converges at least as fast as a GP, and multiplying the $n$th term by $n$ doesn’t stop a GP converging, as can easily be seen with the help of the ratio test.) So, writing $g(z)$ for $\sum_{n=1}^\infty na_nz^{n-1}$, we have the following facts at our disposal.
- $S_N(z)\to f(z)$ as $N\to\infty$.
- $S_N'(z)\to g(z)$ as $N\to\infty$.
Doesn’t it follow from that that $f'(z)=g(z)$?
We are appealing here to a general principle, which is that if some functions $f_n$ converge to $f$ and their derivatives $f_n'$ converge to $g$, then $f$ is differentiable with $f'=g$. Is this general principle correct?
Unfortunately, it isn’t. Suppose we take some continuous functions $g_n$ that converge to a step function. (Roughly speaking, you make $g_n$ be 0 up to 0, then linear with gradient $n$ until it hits 1, then 1 from that point onwards.) And suppose we then let $f_n$ be the function that differentiates to $g_n$ and is 0 up to 0. Then the $f_n$ converge to the function that is 0 up to 0 and $x$ for positive $x$. This function almost differentiates to the step function, but it isn’t differentiable at 0.
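If you would like to see the counterexample in action, here is a minimal numerical sketch. It assumes one concrete choice of ramp, namely $\min(1,\max(0,nx))$, and the names `g`, `f` and `limit_f` are purely illustrative.

```python
def g(n, x):
    """Ramp: 0 for x <= 0, gradient n until it reaches 1, then 1."""
    return min(1.0, max(0.0, n * x))


def f(n, x):
    """Antiderivative of g(n, .) that is 0 for x <= 0."""
    if x <= 0:
        return 0.0
    if x < 1.0 / n:
        return n * x * x / 2.0       # integral of n*t from 0 to x
    return x - 1.0 / (2.0 * n)       # the ramp contributes 1/(2n), then slope 1


def limit_f(x):
    """Pointwise limit of f(n, x) as n grows: 0 for x <= 0, x for x > 0."""
    return max(0.0, x)


# The gap between f_n and its limit is at most 1/(2n) on a grid of sample points.
for n in (10, 100, 1000):
    gap = max(abs(f(n, k / 1000.0) - limit_f(k / 1000.0)) for k in range(-1000, 1001))
    print(n, gap)

# The limit has different one-sided slopes at 0, so it is not differentiable there.
h = 1e-6
left_slope = (limit_f(0.0) - limit_f(-h)) / h
right_slope = (limit_f(h) - limit_f(0.0)) / h
print(left_slope, right_slope)
```

So the functions converge, their derivatives converge, and yet the limit fails to be differentiable at the one point where the derivatives converge to something discontinuous.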
So we’ve somehow got to use particular facts about power series in order to prove our result — we can’t appeal to general considerations, because then we are appealing to a principle that isn’t true. (Actually, in principle some compromise might be possible, where we show that functions defined by power series have a certain property and then use nothing apart from that property from that point on. But as it happens, we shall not do this.)
Why can’t we just jump in and prove it with a big calculation?
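We have a formula for $f(z)$. Why don’t we write out a formula for $\frac{f(z+h)-f(z)}{h}$ and see if we can tell what happens when $h\to 0$?
That is certainly a sensible first thing to try, so let’s see what happens.
$$\frac{f(z+h)-f(z)}{h}=\frac{1}{h}\sum_{n=0}^\infty a_n\bigl((z+h)^n-z^n\bigr).$$
What can we do with that? Perhaps we’d better apply the binomial theorem. Then we find that the right-hand side is equal to
$$\sum_{n=1}^\infty a_n\Bigl(nz^{n-1}+\binom{n}{2}z^{n-2}h+\dots+h^{n-1}\Bigr).$$
Part of the above expression gives us what we want, namely $\sum_{n=1}^\infty na_nz^{n-1}$. So we’re left wanting to prove that
$$\sum_{n=2}^\infty a_n\Bigl(\binom{n}{2}z^{n-2}h+\binom{n}{3}z^{n-3}h^2+\dots+h^{n-1}\Bigr)$$
tends to 0 as $h\to 0$.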
Unfortunately, as $n$ gets big, some of those binomial coefficients get pretty big too. Indeed, when $n$ is bigger than $1/|h|$, the growth in the binomial coefficients seems to outstrip the shrinking of the powers of $h$. What can we do?
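To get a feel for the difficulty, here is a rough sketch that fixes a small $h$ and looks at the largest of the quantities $\binom{n}{k}|h|^{k-1}$ for $2\le k\le n$. The function name `largest_coefficient` and the choice $h=0.01$ are assumptions made for illustration, and the sketch ignores the factors $a_n$ and $z^{n-k}$, so it only shows why the naive term-by-term estimate looks worrying.

```python
from math import comb


def largest_coefficient(n, h):
    """Largest value of C(n, k) * |h|**(k - 1) over k = 2, ..., n."""
    return max(comb(n, k) * abs(h) ** (k - 1) for k in range(2, n + 1))


h = 0.01
for n in (10, 50, 100, 200, 400):
    # For small n these are small multiples of |h|; for large n they are not small at all.
    print(n, largest_coefficient(n, h))
```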
A useful trick
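Fortunately, there is a better (for our purposes at least) way of writing $(z+h)^n-z^n$. We just expanded out $(z+h)^n$ using the binomial theorem. But we could instead have used the expansion
$$a^n-b^n=(a-b)(a^{n-1}+a^{n-2}b+\dots+ab^{n-2}+b^{n-1}).$$
Applying that with $a=z+h$ and $b=z$, we get
$$(z+h)^n-z^n=h\bigl((z+h)^{n-1}+(z+h)^{n-2}z+\dots+(z+h)z^{n-2}+z^{n-1}\bigr).$$
Just before we continue, note that this gives us an alternative, and in my view nicer, way to see that the derivative of $z^n$ is $nz^{n-1}$, since if you divide the right-hand side by $h$ and let $h\to 0$ then each of the $n$ terms tends to $z^{n-1}$.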
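As a sanity check, the factorisation of $(z+h)^n-z^n$ can be tested numerically. This is just a sketch: the helper name `factorised_difference` and the sample values of $z$, $h$ and $n$ are arbitrary choices of mine.

```python
def factorised_difference(z, h, n):
    """h * ((z+h)^(n-1) + (z+h)^(n-2) z + ... + z^(n-1))."""
    return h * sum((z + h) ** (n - 1 - k) * z ** k for k in range(n))


z, h, n = 0.3 + 0.4j, 1e-6 * (1 - 1j), 7

# The factorised form should agree with the direct difference up to rounding.
direct = (z + h) ** n - z ** n
print(abs(direct - factorised_difference(z, h, n)))

# Dividing by h leaves n terms, each close to z^(n-1) when h is small.
quotient = factorised_difference(z, h, n) / h
print(abs(quotient - n * z ** (n - 1)))
```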
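Anyhow, if we use this trick, then $\frac{f(z+h)-f(z)}{h}$ works out to be
$$\sum_{n=1}^\infty a_n\bigl((z+h)^{n-1}+(z+h)^{n-2}z+\dots+z^{n-1}\bigr).$$
Now let’s subtract the thing we want this to tend to, which is $\sum_{n=1}^\infty na_nz^{n-1}$. (This is not valid unless we know that this series converges. So at some stage we will need to prove that.) If we think of $nz^{n-1}$ as a sum of $n$ copies of $z^{n-1}$, then we can write the difference as
$$\sum_{n=1}^\infty a_n\sum_{k=0}^{n-1}z^{n-1-k}\bigl((z+h)^k-z^k\bigr).$$
Now $(z+h)^k-z^k$ is another example of the expansion we had above. That is, we can write it as
$$h\bigl((z+h)^{k-1}+(z+h)^{k-2}z+\dots+z^{k-1}\bigr).$$
We haven’t yet mentioned the radius of convergence of the original power series, but let’s do so now. Suppose it is $R$, that $r$ is such that $|z|<r<R$, and that we have chosen $h$ small enough that $|z+h|\le r$. Then the modulus of the expression above is at most $k|h|r^{k-1}$.
It follows that
$$\Bigl|\frac{f(z+h)-f(z)}{h}-\sum_{n=1}^\infty na_nz^{n-1}\Bigr|\le\sum_{n=1}^\infty|a_n|\sum_{k=0}^{n-1}k|h|r^{n-2}.$$
Since $\sum_{k=0}^{n-1}k=\frac{n(n-1)}{2}$, this is equal to $|h|\sum_{n=2}^\infty\frac{n(n-1)}{2}|a_n|r^{n-2}$.
So this will tend to zero as $h\to 0$ as long as we can prove that the sum $\sum_{n=2}^\infty\frac{n(n-1)}{2}|a_n|r^{n-2}$ converges.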
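Here is a small numerical sketch of the estimate just obtained, namely that the difference between $\frac{f(z+h)-f(z)}{h}$ and the term-by-term derivative is at most $|h|\sum_{n\ge 2}\frac{n(n-1)}{2}|a_n|r^{n-2}$ when $|z|$ and $|z+h|$ are at most $r$. The example coefficients $a_n=1$ (so that the radius of convergence is 1), the truncation at 200 terms and all the function names are assumptions made for illustration.

```python
def power_series(coeffs, z):
    """Evaluate the truncated power series sum of coeffs[n] * z**n."""
    return sum(a * z ** n for n, a in enumerate(coeffs))


def termwise_derivative(coeffs, z):
    """Evaluate sum of n * coeffs[n] * z**(n-1), the term-by-term derivative."""
    return sum(n * a * z ** (n - 1) for n, a in enumerate(coeffs) if n >= 1)


def error_bound(coeffs, r, h):
    """|h| * sum of n(n-1)/2 * |a_n| * r**(n-2)."""
    return abs(h) * sum(n * (n - 1) / 2 * abs(a) * r ** (n - 2)
                        for n, a in enumerate(coeffs) if n >= 2)


coeffs = [1.0] * 200          # hypothetical example: a_n = 1, truncated
z, r = 0.2 + 0.3j, 0.5        # |z| < r < 1, and |z + h| <= r for the h below

for h in (0.1, 0.01, 0.001):
    quotient = (power_series(coeffs, z + h) - power_series(coeffs, z)) / h
    error = abs(quotient - termwise_derivative(coeffs, z))
    print(h, error, error_bound(coeffs, r, h))
```

In each row the second number should be no bigger than the third, and both shrink in proportion to $|h|$.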
Convergence of the power series you get when you differentiate term by term
Let’s prove a lemma to deal with that last point. It says that if $r$ is smaller than the radius of convergence of the power series $\sum_{n=0}^\infty a_nz^n$, then the power series $\sum_{n=1}^\infty n|a_n|r^{n-1}$ converges.
The proof is very similar to an argument we have seen already. Let $R$ be the radius of convergence, and pick $s$ with $r<s<R$. Then the power series $\sum_{n=0}^\infty a_ns^n$ converges, so the terms $|a_n|s^n$ are bounded above, by $M$, say. Then $n|a_n|r^{n-1}\le\frac{M}{s}\,n\bigl(\frac{r}{s}\bigr)^{n-1}$.
But the series $\sum_{n=1}^\infty n\bigl(\frac{r}{s}\bigr)^{n-1}$ converges, by the ratio test. Therefore, by the comparison test, the series $\sum_{n=1}^\infty n|a_n|r^{n-1}$ converges.
This shows also that if $|z|<R$ then the power series $\sum_{n=1}^\infty na_nz^{n-1}$ converges (since we have just proved that it converges absolutely). So if we differentiate a power series term by term, we get a new power series that has the same radius of convergence, something we needed earlier.
If we apply this lemma a second time, we get that the power series $\sum_{n=2}^\infty n(n-1)|a_n|r^{n-2}$ converges, and dividing by 2 then gives us what we wanted above, namely that $\sum_{n=2}^\infty\frac{n(n-1)}{2}|a_n|r^{n-2}$ converges.
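The comparison in the proof of the lemma can be watched happening in a concrete case. The following sketch assumes the example coefficients $a_n=(-2)^n$, for which the radius of convergence is $1/2$, together with $r=0.3$ and $s=0.4$; it checks that each term $n|a_n|r^{n-1}$ is dominated by $\frac{M}{s}n(r/s)^{n-1}$, where $M$ bounds the terms $|a_n|s^n$. All names and values here are illustrative choices.

```python
def a(n):
    """Hypothetical coefficients with radius of convergence 1/2."""
    return (-2.0) ** n


r, s = 0.3, 0.4               # r < s < R = 1/2
N = 200                       # number of terms examined

# M bounds |a_n| s^n over the terms examined.
M = max(abs(a(n)) * s ** n for n in range(N))

terms = [n * abs(a(n)) * r ** (n - 1) for n in range(1, N)]
majorants = [(M / s) * n * (r / s) ** (n - 1) for n in range(1, N)]

# Every term sits below the corresponding term of the convergent comparison series.
dominated = all(t <= m * (1 + 1e-12) for t, m in zip(terms, majorants))
print(dominated)

# Partial sum of n |a_n| r^(n-1); it settles down as more terms are added.
partial_sum = sum(terms)
print(partial_sum)
```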
A couple of applications
An obvious way of applying the result is to take some of your favourite power series and differentiate them term by term. This illustrates the very important general point that if you can obtain something in two different ways, then you usually end up proving something interesting.
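So let’s take the function $\exp(z)=\sum_{n=0}^\infty\frac{z^n}{n!}$, which we have shown converges everywhere. Then we can obtain the derivative either by differentiating the function itself or by differentiating the power series term by term. That tells us that
$$\exp'(z)=\sum_{n=1}^\infty\frac{nz^{n-1}}{n!},$$
which simplifies to $\sum_{n=1}^\infty\frac{z^{n-1}}{(n-1)!}$, which in turn simplifies to $\sum_{n=0}^\infty\frac{z^n}{n!}$, which equals $\exp(z)$.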
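The simplification is really a statement about coefficients: differentiating term by term sends the coefficient $\frac{1}{n!}$ of $z^n$ to the coefficient $\frac{n}{n!}=\frac{1}{(n-1)!}$ of $z^{n-1}$. A short sketch in exact arithmetic, with the cut-off `N` an arbitrary choice, confirms that the coefficient list is reproduced.

```python
from fractions import Fraction
from math import factorial

N = 20
# Coefficients of the exponential series: a_n = 1/n!.
a = [Fraction(1, factorial(n)) for n in range(N)]

# After differentiating term by term, the coefficient of z^(n-1) is n * a_n.
derived = [n * a[n] for n in range(1, N)]

# The derived coefficients are the original ones again.
print(derived == a[:N - 1])
```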
Earlier we proved this result by writing $\exp(z+h)$ as $\exp(z)\exp(h)$ and proving that $\frac{\exp(h)-1}{h}\to 1$ as $h\to 0$. I still prefer that proof, but you are at liberty to disagree.
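As another example, let us consider the power series $\sum_{n=0}^\infty z^n$. When $|z|<1$ this equals $\frac{1}{1-z}$, by the formula for summing a GP. We can now differentiate the power series term by term, and we can also differentiate the function $\frac{1}{1-z}$. Doing so tells us the interesting fact that
$$\sum_{n=1}^\infty nz^{n-1}=\frac{1}{(1-z)^2}.$$
We can see that in another way as well. By our result on multiplying power series, the product of $\sum_{n=0}^\infty z^n$ with itself is the power series $\sum_{n=0}^\infty c_nz^n$, where $(c_n)$ is the convolution of the constant sequence $(1,1,1,\dots)$ with itself. That is, $c_n=\sum_{k=0}^n a_kb_{n-k}$ with every $a_k$ and $b_{n-k}$ equal to 1, which gives us $c_n=n+1$. (This agrees with the previous answer, since $\sum_{n=0}^\infty(n+1)z^n$ is the same as $\sum_{n=1}^\infty nz^{n-1}$.)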
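Both routes to this fact are easy to check by machine. The sketch below computes the convolution of the all-ones sequence with itself and then compares the resulting truncated series with $\frac{1}{(1-z)^2}$ at a sample point. The helper `cauchy_product`, the truncation at 60 terms and the sample point are illustrative assumptions.

```python
def cauchy_product(a, b):
    """First len(a) coefficients of the product of two power series.

    Assumes b has at least as many coefficients as a.
    """
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(len(a))]


ones = [1] * 60
c = cauchy_product(ones, ones)
print(c[:6])                  # the first few convolution coefficients

# Compare the truncated series with 1/(1-z)^2 at a point inside the unit circle.
z = 0.3 - 0.2j
gap = abs(sum(cn * z ** n for n, cn in enumerate(c)) - 1 / (1 - z) ** 2)
print(gap)
```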
Tidying up the proof
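In the proof above, we used the identity
$$a^n-b^n=(a-b)(a^{n-1}+a^{n-2}b+\dots+ab^{n-2}+b^{n-1})$$
with $a=z+h$ and $b=z$, and then we used it again to calculate what happened when we subtracted $nz^{n-1}$. Can we get those calculations out of the way in advance? That is, can we begin by finding a nice formula for $\frac{(z+h)^n-z^n}{h}-nz^{n-1}$?
We obviously can, by dividing through by $a-b$, subtracting $nb^{n-1}$ from the right-hand side and simplifying, much as we did in the proof above (with $a=z+h$ and $b=z$). However, we can do things a bit more slickly as follows. Start with the identity
$$\frac{a^n-b^n}{a-b}=a^{n-1}+a^{n-2}b+\dots+ab^{n-2}+b^{n-1}.$$
Differentiating both sides with respect to $a$, we get
$$\frac{na^{n-1}(a-b)-(a^n-b^n)}{(a-b)^2}=(n-1)a^{n-2}+(n-2)a^{n-3}b+\dots+2ab^{n-3}+b^{n-2}.$$
If we now take $z$ for $a$ and $z+h$ for $b$, so that $a-b=-h$, we deduce that $\frac{(z+h)^n-z^n}{h}-nz^{n-1}$ is equal to
$$h\bigl((n-1)z^{n-2}+(n-2)z^{n-3}(z+h)+\dots+2z(z+h)^{n-3}+(z+h)^{n-2}\bigr).$$
In particular, if $|z|$ and $|z+h|$ are both at most $r$, then $\bigl|\frac{(z+h)^n-z^n}{h}-nz^{n-1}\bigr|\le\frac{n(n-1)}{2}|h|r^{n-2}$, which is the main fact we needed in the proof.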
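The formula for $\frac{(z+h)^n-z^n}{h}-nz^{n-1}$ and the resulting bound $\frac{n(n-1)}{2}|h|r^{n-2}$ can both be checked on an example. In this sketch the helper name `remainder_formula` and the sample values are my own choices, and $r$ is taken to be the larger of $|z|$ and $|z+h|$.

```python
def remainder_formula(z, h, n):
    """h * ((n-1) z^(n-2) + (n-2) z^(n-3) (z+h) + ... + (z+h)^(n-2))."""
    return h * sum((n - 1 - j) * z ** (n - 2 - j) * (z + h) ** j
                   for j in range(n - 1))


z, h, n = 0.4 - 0.1j, 0.05 + 0.02j, 6

# Left-hand side computed directly from the definition.
lhs = ((z + h) ** n - z ** n) / h - n * z ** (n - 1)
print(abs(lhs - remainder_formula(z, h, n)))

# The bound n(n-1)/2 * |h| * r^(n-2) with r = max(|z|, |z+h|).
r = max(abs(z), abs(z + h))
bound = n * (n - 1) / 2 * abs(h) * r ** (n - 2)
print(abs(lhs), bound)
```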
Armed with this fact, we could argue as follows. We want to show that
$$f(z+h)-f(z)-h\sum_{n=1}^\infty na_nz^{n-1}=\sum_{n=1}^\infty a_n\bigl((z+h)^n-z^n-hnz^{n-1}\bigr)$$
is $o(h)$. By the inequality we have just proved, if $|z|$ and $|z+h|$ are at most $r$, then the modulus of this expression is at most
$$|h|^2\sum_{n=2}^\infty\frac{n(n-1)}{2}|a_n|r^{n-2},$$
and an earlier lemma told us that this converges within the circle of convergence. So the quantity we want to be $o(h)$ is in fact bounded above by a multiple of $|h|^2$. (Sometimes people use the notation $\ll|h|^2$ for this. The $\ll$ means “bounded above in modulus by a constant multiple of the modulus of”.)
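To see the quadratic bound at work, one can take the exponential function as a test case, using the fact established earlier that it is its own derivative, and watch the ratio of $|f(z+h)-f(z)-hf'(z)|$ to $|h|^2$ as $h$ shrinks. This is only a sketch: the sample point and the step sizes are arbitrary, and for very small $h$ rounding error would eventually spoil the computation.

```python
import cmath

z = 0.5 + 0.5j
ratios = []
for k in range(1, 6):
    h = 10.0 ** (-k)
    # f(z+h) - f(z) - h f'(z) for f = exp, whose derivative is exp again.
    remainder = cmath.exp(z + h) - cmath.exp(z) - h * cmath.exp(z)
    ratios.append(abs(remainder) / abs(h) ** 2)

# The ratios stay bounded as h shrinks, which is what a bound of the form C|h|^2 predicts.
print(ratios)
```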
Was the “trick” a trick?
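The proof in this post has relied heavily on the idea, which appeared to come from nowhere, of writing $(z+h)^n-z^n$ not in the obvious way, which is
$$nz^{n-1}h+\binom{n}{2}z^{n-2}h^2+\dots+h^n,$$
but in a “clever” way, namely
$$h\bigl((z+h)^{n-1}+(z+h)^{n-2}z+\dots+z^{n-1}\bigr).$$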
Is this something one just has to remember, or can it be regarded as the natural thing to do?
I chose the words “can it be regarded as” quite carefully, since I want to argue that it is the natural thing to do, but when I was preparing this lecture, I didn’t find it the natural thing to do, as I shall now explain. I came to this result with the following background. Many years ago, I lectured an IB course called Further Analysis, which was a sort of combination of the current courses Metric and Topological Spaces and Complex Analysis, all packed into 16 lectures. (Amazingly, it worked quite well, though it was a challenge to get through all the material.) As a result of lecturing that, I learnt a proof that power series can be differentiated term by term inside their circle of convergence, but the proof uses a number of results from complex analysis. I then believed what some people say, which is that the complex analysis proof of this result is a very good advertisement for complex analysis, since a direct proof is horrible. And then at some point I was chatting to Imre Leader about the reorganization of various courses, and he told me that it was a myth that proving the result directly was hard. It wasn’t trivial, he said, but it was basically fine. In fact, it may even be thanks to him that the result is in the course.
Until a few days ago, I didn’t bother to check for myself that the proof wasn’t too bad; I just believed what he said. And then with the lecture coming up, I decided that the time had finally come to check it: something that I assumed would be a reasonably simple exercise. I duly did the obvious thing, including expanding $(z+h)^n$ using the binomial theorem, and got stuck.
I would like to be able to say that I then thought hard about why I was stuck, and after a while thought of the idea of expanding $(z+h)^n-z^n$ using the expansion of $a^n-b^n$. But actually that is not what happened. What happened was that I thought, “Damn, I’m going to have to look up the proof.” I found a few proofs online that looked dauntingly complicated and I couldn’t face reading them properly, apart from one that was quite nice and that for a while I thought I would use. But one thing all the proofs had in common was the use of that expansion, so that was how the idea occurred to me.
So what follows is a rational reconstruction of what I wish had been my thought processes, rather than of what actually went on in my mind.
Let’s go back to the question of how to differentiate $z^n$. I commented above that one could do it using the $a^n-b^n$ expansion, and said that I even preferred that approach. But how might one think of doing it that way? There is a very simple answer to that, which is to use one of the alternative definitions of differentiability, namely that $f$ is differentiable at $z$ with derivative $\lambda$ if $\frac{f(w)-f(z)}{w-z}\to\lambda$ as $w\to z$. This is simply replacing $z+h$ by $w$, but that is nice because it has the effect of making the expression more symmetrical. (One might argue that since we are talking about differentiability at $z$, the variables $z$ and $w$ are playing different roles, so there is not much motivation for symmetry. And indeed, that is why calling one point $z$ and the other $z+h$ is often a good idea. But symmetry is … well … sort of good to have even when not terribly strongly motivated.)
If we use this definition, then the derivative of $z^n$ is the limit as $w\to z$ of $\frac{w^n-z^n}{w-z}$, and now there is no temptation to use the binomial expansion (we would first have to write $w$ as $z+(w-z)$ and the whole thing would be disgusting) and the absolutely obvious thing to do is to observe that we have a nice formula for the ratio in question, namely
$$w^{n-1}+w^{n-2}z+\dots+wz^{n-2}+z^{n-1},$$
which obviously tends to $nz^{n-1}$ as $w\to z$.
In fact, the whole proof is arguably nicer if one uses $z$ and $w$ rather than $z$ and $z+h$.
Thus, the “clever” expansion is the natural one to do with the symmetric definition of differentiation, whereas the binomial expansion is the natural one to do with the $z+h$ definition. So in the presentation above, I have slightly obscured the origins of the argument by applying the clever expansion to the $z+h$ definition.
Another way of seeing that it is natural is to think about how we prove the statement that a product of limits is the limit of the products. The essence of this is to show that if $a$ is close to $a'$ and $b$ is close to $b'$, then $ab$ is close to $a'b'$. This we do by arguing that $ab$ is close to $a'b$, and that $a'b$ is close to $a'b'$.
Suppose we apply a similar technique to try to show that $w^n$ is close to $z^n$. How might we represent their difference? A natural way of doing it would be to convert all the $w$s into $z$s in a sequence of $n$ steps. That is, we would argue that $w^n$ is close to $w^{n-1}z$, which is close to $w^{n-2}z^2$, and so on.
But the difference between $w^{n-k}z^k$ and $w^{n-k-1}z^{k+1}$ is $(w-z)w^{n-k-1}z^k$, so if we adopt this approach, then we will end up showing precisely that
$$w^n-z^n=(w-z)(w^{n-1}+w^{n-2}z+\dots+wz^{n-2}+z^{n-1}).$$
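The step-by-step conversion of $w$s into $z$s can be followed in exact arithmetic. In this sketch the rational values of $w$ and $z$ and the exponent $n$ are arbitrary illustrative choices; the point is that each step changes the product by $(w-z)w^{n-k-1}z^k$, and that the steps telescope to $w^n-z^n$.

```python
from fractions import Fraction

w, z, n = Fraction(3, 2), Fraction(-2, 5), 6

# Intermediate products w^n, w^(n-1) z, ..., z^n: one w becomes a z at each step.
chain = [w ** (n - k) * z ** k for k in range(n + 1)]

# The change at step k is (w - z) * w^(n-k-1) * z^k.
steps = [chain[k] - chain[k + 1] for k in range(n)]
steps_match = all(steps[k] == (w - z) * w ** (n - k - 1) * z ** k for k in range(n))
print(steps_match)

# Adding up the steps telescopes to w^n - z^n.
telescopes = sum(steps) == w ** n - z ** n
print(telescopes)
```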