There are countless situations in mathematics where it helps to expand a function as a power series. Therefore, Taylor’s theorem, which gives us circumstances under which this can be done, is an important result of the course. It is also the one result that I was dreading lecturing, at least with the Lagrange form of the remainder, because in the past I have always found that the proof is one that I have not been able to understand properly. I don’t mean by that that I couldn’t follow the arguments I read. What I mean is that I couldn’t reproduce the proof without committing a couple of things to memory, which I would then forget again once I had presented them. Briefly, an argument that appears in a lot of textbooks uses a result called the Cauchy mean value theorem, and applies it to a cleverly chosen function. Whereas I understand what the mean value theorem is for, I somehow don’t have the same feeling about the Cauchy mean value theorem: it just works in this situation and happens to give the answer one wants. And I don’t see an easy way of predicting in advance what function to plug in.
I have always found this situation annoying, because a part of me said that the result ought to be a straightforward generalization of the mean value theorem, in the following sense. The mean value theorem applied to the interval tells us that there exists such that , and therefore that . Writing for some we obtain the statement . This is the case of Taylor’s theorem. So can’t we find some kind of “polynomial mean value theorem” that will do the same job for approximating by polynomials of higher degree?
Now that I’ve been forced to lecture this result again (for the second time actually — the first was in Princeton about twelve years ago, when I just suffered and memorized the Cauchy mean value theorem approach), I have made a proper effort to explore this question, and have realized that the answer is yes. I’m sure there must be textbooks that do it this way, but the ones I’ve looked at all use the Cauchy mean value theorem. I don’t understand why, since it seems to me that the way of proving the result that I’m about to present makes the whole argument completely transparent. I’m actually looking forward to lecturing it (as I add this sentence to the post, the lecture is about half an hour in the future), since the demands on my memory are going to be close to zero.
A higher order Rolle theorem
We know that we want a statement that will involve the first derivatives of at , the th derivative at some point in the interval , and the value of at . The idea with Rolle’s theorem is to make a whole lot of stuff zero, and then with the mean value theorem we take a more general function and subtract a linear part to obtain a function to which Rolle’s theorem applies. So let’s try a similar trick here: we’ll make as much as we can equal to zero. In fact, I’ll go even further and make the values of and zero.
So here’s what I’ll assume: that and also that . That’s as much as I can reasonably set to be zero. And what should be my conclusion? That there is some such that . Note that if we set then we are assuming that and trying to find such that , so this result really does generalize Rolle’s theorem. (I’m also assuming that is times differentiable on an open interval that contains . This is a slightly stronger condition than necessary, but it will hold in the situations where we want to use Taylor’s theorem.)
The proof of this generalization is almost trivial, given Rolle’s theorem itself. Since , there exists such that . But as well, so by Rolle’s theorem, this time applied to , we find such that . Continuing like this, we eventually find such that . So we can set and we are done.
For what it’s worth, I didn’t use the fact that , but just that .
A higher-order mean value theorem
Now let’s take an arbitrary function that is -times differentiable on an open interval containing . To prove the mean value theorem, we subtracted a linear function so as to obtain a function that satisfied the hypotheses of Rolle’s theorem. Here, the obvious thing to do is to subtract a polynomial of degree to obtain a function that satisfies the hypotheses of our higher-order Rolle theorem.
The properties we need to have are that , , and so on all the way up to , and finally . It turns out that we can more or less write down such a polynomial, once we have observed that the polynomial has the convenient property that except when when it is 1. This allows us to build a polynomial that has whatever derivatives we want at . So let’s do that. Define a polynomial by
Then for . A more explicit formula for is
Now doesn’t necessarily equal , so we need to add a multiple of to correct for this. (Doing that won’t affect the derivatives we’ve got at .) So we want our polynomial to be of the form
and we want . So we want to equal , which gives us . That is,
A quick check: if we substitute in for we get , which does indeed equal .
For the moment, we can forget the formula for . All that matters is its properties, which, just to remind you, are these.
- is a polynomial of degree .
- for .
The second and third properties tell us that if we set , then for and . Those are the conditions needed for our higher-order Rolle theorem. Therefore, there exists such that , which implies that .
Let us just highlight what we have proved here.
Theorem. Let be continuous on the interval and -times differentiable on an open interval that contains . Let be the unique polynomial of degree such that for and . Then there exists such that .
Note that since is a polynomial of degree , the function is constant. In the case , the constant is , the gradient of the line joining to , and the theorem is just the mean value theorem.
Actually, the result we have just proved is Taylor’s theorem! To see that, all we have to do is use the explicit formula for and a tiny bit of rearrangement. To begin with, let us use the formula
Note that for every , so the theorem tells us that there exists such that
Rearranging, that gives us that
Finally, using the formula for , which was
and setting , we can rewrite our conclusion as
which is Taylor’s theorem with the Lagrange form of the remainder.
Using Taylor’s theorem
I think it is quite rare for a proof of Taylor’s theorem to be asked for in the exams. However, pretty well every year there is a question that requires you to understand the statement of Taylor’s theorem. (I am writing this post without any knowledge of what will be in this year’s exam, and the examiners will be entirely within their rights to ask for anything that’s on the syllabus. So I certainly don’t recommend not learning the proof of Taylor’s theorem.)
You may at school have seen the following style of reasoning. Suppose we want to calculate the power series of . Then we write
Taking we deduce that . Differentiating we get that
and taking we deduce that . In general, differentiating times and setting we deduce that if is even, if mod 4, and if mod 4. Therefore,
There are at least two reasons that this argument is not rigorous. (I’ll assume that we have defined trigonometric functions and proved rigorously that their derivatives are what we think they are. Actually, I plan to define them using power series later in the course, in which case they have their power series by definition, but it is possible to define them in other ways — e.g. using the differential equation — so this discussion is not a complete waste of time.) One is that we assumed that could be expanded as a power series. That is, at best what we have just shown is that if can be expanded as a power series, then the power series must be that one.
A second reason is that we just assumed that the power series could be differentiated term by term. That holds under certain circumstances, as we shall see later in the course, and those circumstances hold for this particular power series, but until we’ve proved that is given by this particular power series we don’t know that the conditions hold.
Taylor’s theorem helps us to clear up these difficulties. Applying it with replaced by 0 and replaced by , we find that
for some . All the terms apart from the last one are just the expected terms in the power series for , so we get that is equal to the partial sum of the power series up to the term in plus a remainder term.
The remainder term is , so its magnitude is at most . It is not hard to prove that tends to zero as . (One way to do this is to observe that the ratio of successive terms has magnitude at most 1/2 once is bigger than .) Therefore, the power series converges for every , and converges to .
The basic technique here is as follows.
(i) Write down what Taylor’s theorem gives you for your function.
(ii) Prove that for each (in the range where you want to prove that the power series converges) the remainder term tends to zero as tends to infinity.
Taylor’s theorem with the Peano form of the remainder
The material in this section is not on the course, but is still worth thinking about. It begins with the definition of a derivative, which, as I said in lectures, can be expressed as follows. A function is differentiable at with derivative if
We can think of as the best linear approximation to for small .
Once we’ve said that, it becomes natural to ask for the best quadratic approximation, and in general for the best approximation by a polynomial of degree .
Let’s think about the quadratic case. In the light of Taylor’s theorem it is natural to expect that
in which case would indeed be the best quadratic approximation to for small .
What Taylor’s theorem as stated above gives us is
for some . If we know that is continuous at , then as , so we can write , where . But then , as we wanted, since .
However, this result does not need the continuity assumption, so let me briefly prove it. To keep the expressions simple I will prove only the quadratic case, but the general case is pretty well exactly the same.
I’ll do the same trick as usual, by which I mean I’ll first prove it when various things are zero and then I’ll deduce the general case. So let’s suppose that . We want to prove now that .
Since , we have that
Therefore, for every we can find such that for every with .
This gives us several inequalities, one of which is that for every such that . If we now set to be , then we have that for every . So by the mean value theorem, for every such , which implies that .
If we run a similar argument using the fact that we get that . And we can do similar arguments with as well, and the grand conclusion is that whenever we have .
What we have shown is that for every there exists such that whenever , which is exactly the statement that as , which in turn is exactly the statement that .
That does the proof when . Now let’s take a general and define a function by
Then , so , from which it follows that
which after rearranging gives us the statement we wanted:
As I said above, this argument generalizes straightforwardly and gives us Taylor’s theorem with what is known as Peano’s form of the remainder, which is the following statement.
For that we need to exist but we do not need to exist anywhere else, so we certainly don’t need any continuity assumptions on .
This version of Taylor’s theorem is not as useful as versions with an explicit formula for the remainder term, as you will see if you try to use it to prove that can be expanded as a power series: the information that the remainder term is is, for fixed , of no use whatever. But the information that it is gives us an expression that we can prove tends to zero.
An expression for
However, one amusing (but not, as far as I know, useful) thing it gives us is a direct formula for the second derivative. By direct I mean that we do not go via the first derivative. Let us take the quadratic result and apply it to both and . We get
From this it follows that
Dividing through by we get that
I’m not claiming the converse, which would say that if this limit exists, then is twice differentiable at . In fact, doesn’t even have to be once differentiable at . Consider, for example, the following function. For every integer (either positive or negative) and every in the interval we set equal to . We also set , and we take when . (That is, for negative we define so as to make it an odd function.)
Then for every , so for every , and in particular it tends to 0 as . However, is not differentiable at 0. To see this, note that when we have , whereas when is close to we have close to . Therefore, the ratio does not converge as , which tells us that is not differentiable at 0.
If you want an example that is continuous everywhere, then take . This again has the property that for every , and it is not differentiable at 0.
Even if we assume that is differentiable, we can’t get a proper converse. For example, the condition
does not imply that exists and equals 0. For a counterexample, take a function such as (and 0 at 0). Then must lie between and therefore certainly be . But the oscillations near zero are so fast that is unbounded near zero, so doesn’t exist at 0.