There are countless situations in mathematics where it helps to expand a function as a power series. Therefore, Taylor’s theorem, which gives us circumstances under which this can be done, is an important result of the course. It is also the one result that I was dreading lecturing, at least with the Lagrange form of the remainder, because in the past I have always found that the proof is one that I have not been able to understand properly. I don’t mean by that that I couldn’t follow the arguments I read. What I mean is that I couldn’t reproduce the proof without committing a couple of things to memory, which I would then forget again once I had presented them. Briefly, an argument that appears in a lot of textbooks uses a result called the Cauchy mean value theorem, and applies it to a cleverly chosen function. Whereas I understand what the mean value theorem is for, I somehow don’t have the same feeling about the Cauchy mean value theorem: it just works in this situation and happens to give the answer one wants. And I don’t see an easy way of predicting in advance what function to plug in.

I have always found this situation annoying, because a part of me said that the result ought to be a straightforward generalization of the mean value theorem, in the following sense. The mean value theorem applied to the interval $[x,x+h]$ tells us that there exists $z\in(x,x+h)$ such that $f'(z)=\frac{f(x+h)-f(x)}{h}$, and therefore that $f(x+h)=f(x)+hf'(z)$. Writing $z=x+\theta h$ for some $\theta\in(0,1)$ we obtain the statement $f(x+h)=f(x)+hf'(x+\theta h)$. This is the $n=1$ case of Taylor’s theorem. So can’t we find some kind of “polynomial mean value theorem” that will do the same job for approximating $f$ by polynomials of higher degree?

Now that I’ve been forced to lecture this result again (for the second time actually — the first was in Princeton about twelve years ago, when I just suffered and memorized the Cauchy mean value theorem approach), I have made a proper effort to explore this question, and have realized that the answer is yes. I’m sure there must be textbooks that do it this way, but the ones I’ve looked at all use the Cauchy mean value theorem. I don’t understand why, since it seems to me that the way of proving the result that I’m about to present makes the whole argument completely transparent. I’m actually looking forward to lecturing it (as I add this sentence to the post, the lecture is about half an hour in the future), since the demands on my memory are going to be close to zero.

### A higher order Rolle theorem

We know that we want a statement that will involve the first $n-1$ derivatives of $f$ at $x$, the $n$th derivative at some point in the interval $(x,y)$, and the value of $f$ at $y$. The idea with Rolle’s theorem is to make a whole lot of stuff zero, and then with the mean value theorem we take a more general function and subtract a linear part to obtain a function to which Rolle’s theorem applies. So let’s try a similar trick here: we’ll make as much as we can equal to zero. In fact, I’ll go even further and make the values of $f(x)$ and $f(y)$ zero.

So here’s what I’ll assume: that $f(x)=f'(x)=\dots=f^{(n-1)}(x)=0$ and also that $f(y)=0$. That’s as much as I can reasonably set to be zero. And what should be my conclusion? That there is some $z\in(x,y)$ such that $f^{(n)}(z)=0$. Note that if we set $n=1$ then we are assuming that $f(x)=f(y)=0$ and trying to find $z$ such that $f'(z)=0$, so this result really does generalize Rolle’s theorem. (I’m also assuming that $f$ is $n$ times differentiable on an open interval that contains $[x,y]$. This is a slightly stronger condition than necessary, but it will hold in the situations where we want to use Taylor’s theorem.)

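The proof of this generalization is almost trivial, given Rolle’s theorem itself. Since $f(x)=f(y)=0$, there exists $z_1\in(x,y)$ such that $f'(z_1)=0$. But $f'(x)=0$ as well, so by Rolle’s theorem, this time applied to $f'$ on the interval $[x,z_1]$, we find $z_2\in(x,z_1)$ such that $f''(z_2)=0$. Continuing like this, we eventually find $z_n\in(x,z_{n-1})$ such that $f^{(n)}(z_n)=0$. So we can set $z=z_n$ and we are done.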
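Here is a small sanity check of the argument, a sketch of my own rather than anything from the lecture. Take $f(u)=u^3(1-u)$ on $[x,y]=[0,1]$ with $n=3$, so that $f(0)=f'(0)=f''(0)=0$ and $f(1)=0$. The nested points $z_1>z_2>z_3$ produced by repeated applications of Rolle’s theorem can be found by hand ($3/4$, $1/2$ and $1/4$), and the Python below, using exact fractions and the hypothetical helper names `derivative` and `evaluate`, just confirms that $f^{(k)}(z_k)=0$ and $x<z_k<z_{k-1}$.

```python
from fractions import Fraction


def derivative(coeffs):
    """Differentiate a polynomial stored as [c0, c1, c2, ...], where c_k multiplies u**k."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, u):
    """Evaluate the polynomial at u by Horner's rule, in exact arithmetic."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * u + c
    return result


# f(u) = u^3 (1 - u) = u^3 - u^4 on [x, y] = [0, 1], with n = 3.
x, y, n = Fraction(0), Fraction(1), 3
f = [0, 0, 0, 1, -1]

# derivs[k] holds the coefficients of f^{(k)} for k = 0, ..., n.
derivs = [f]
for _ in range(n):
    derivs.append(derivative(derivs[-1]))

# Hypotheses of the higher-order Rolle theorem.
assert all(evaluate(derivs[k], x) == 0 for k in range(n))
assert evaluate(f, y) == 0

# Points found by hand: z_k is a zero of f^{(k)} in (x, z_{k-1}), with z_0 = y.
zs = [Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)]
previous = y
for k, z in enumerate(zs, start=1):
    assert x < z < previous
    assert evaluate(derivs[k], z) == 0
    previous = z

print("z =", zs[-1], "and f'''(z) =", evaluate(derivs[n], zs[-1]))
```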

For what it’s worth, I didn’t use the fact that $f(x)=0$, but just that $f(x)=f(y)$.

### A higher-order mean value theorem

Now let’s take an arbitrary function $f$ that is $n$-times differentiable on an open interval containing $[x,y]$. To prove the mean value theorem, we subtracted a linear function so as to obtain a function that satisfied the hypotheses of Rolle’s theorem. Here, the obvious thing to do is to subtract a polynomial $P$ of degree $n$ to obtain a function that satisfies the hypotheses of our higher-order Rolle theorem.

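The properties we need $P$ to have are that $P(x)=f(x)$, $P'(x)=f'(x)$, and so on all the way up to $P^{(n-1)}(x)=f^{(n-1)}(x)$, and finally $P(y)=f(y)$. It turns out that we can more or less write down such a polynomial, once we have observed that the polynomial $q_k(u)=(u-x)^k/k!$ has the convenient property that $q_k^{(j)}(x)=0$ except when $j=k$, when it is 1. This allows us to build a polynomial that has whatever derivatives we want at $x$. So let’s do that. Define a polynomial $Q$ by

$$Q(u)=\sum_{k=0}^{n-1}f^{(k)}(x)\,q_k(u).$$

Then $Q^{(k)}(x)=f^{(k)}(x)$ for $k=0,1,\dots,n-1$. A more explicit formula for $Q$ is

$$Q(u)=f(x)+(u-x)f'(x)+\frac{(u-x)^2}{2!}f''(x)+\dots+\frac{(u-x)^{n-1}}{(n-1)!}f^{(n-1)}(x).$$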
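Now $Q(y)$ doesn’t necessarily equal $f(y)$, so we need to add a multiple of $(u-x)^n$ to correct for this. (Doing that won’t affect the derivatives we’ve got at $x$.) So we want our polynomial to be of the form

$$P(u)=Q(u)+\lambda(u-x)^n,$$

and we want $P(y)=f(y)$. So we want $Q(y)+\lambda(y-x)^n$ to equal $f(y)$, which gives us $\lambda=\frac{f(y)-Q(y)}{(y-x)^n}$. That is,

$$P(u)=Q(u)+\frac{f(y)-Q(y)}{(y-x)^n}(u-x)^n.$$

A quick check: if we substitute in $y$ for $u$ we get $Q(y)+\bigl(f(y)-Q(y)\bigr)$, which does indeed equal $f(y)$.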
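To make the construction concrete, here is a minimal Python sketch, assuming we are handed the numbers $f(x),f'(x),\dots,f^{(n-1)}(x)$ and $f(y)$ as plain floats; the function name `make_P` is hypothetical. It builds $Q$ and $\lambda$ exactly as above. As a check on uniqueness, if $f$ is itself a polynomial of degree $n$, then $P$ should coincide with $f$ everywhere.

```python
import math


def make_P(derivs_at_x, f_y, x, y):
    """Build P(u) = Q(u) + lam * (u - x)**n from the data in the text.

    derivs_at_x = [f(x), f'(x), ..., f^{(n-1)}(x)], so n = len(derivs_at_x);
    f_y = f(y).  Returns the callables Q and P together with lam.
    """
    n = len(derivs_at_x)

    def Q(u):
        # Q(u) = sum_{k<n} f^{(k)}(x) * q_k(u), with q_k(u) = (u - x)^k / k!
        return sum(d * (u - x) ** k / math.factorial(k) for k, d in enumerate(derivs_at_x))

    # Choose lam so that P(y) = f(y).
    lam = (f_y - Q(y)) / (y - x) ** n

    def P(u):
        return Q(u) + lam * (u - x) ** n

    return Q, P, lam


# Example: f(u) = u^3 - 2u + 5 with x = 1, y = 3 and n = 3,
# so f(1) = 4, f'(1) = 1, f''(1) = 6.
def f(u):
    return u ** 3 - 2 * u + 5


Q, P, lam = make_P([4.0, 1.0, 6.0], f(3.0), 1.0, 3.0)
for u in [0.0, 0.5, 2.0, 4.0]:
    print(u, P(u), f(u))
```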

For the moment, we can forget the *formula* for $P$. All that matters is its *properties*, which, just to remind you, are these.

- $P$ is a polynomial of degree at most $n$.
- $P^{(k)}(x)=f^{(k)}(x)$ for $k=0,1,\dots,n-1$.
- $P(y)=f(y)$.

The second and third properties tell us that if we set $g(u)=f(u)-P(u)$, then $g^{(k)}(x)=0$ for $k=0,1,\dots,n-1$ and $g(y)=0$. Those are the conditions needed for our higher-order Rolle theorem. Therefore, there exists $z\in(x,y)$ such that $g^{(n)}(z)=0$, which implies that $f^{(n)}(z)=P^{(n)}(z)$.

Let us just highlight what we have proved here.

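**Theorem.** *Let $f$ be continuous on the interval $[x,y]$ and $n$-times differentiable on an open interval that contains $[x,y]$. Let $P$ be the unique polynomial of degree at most $n$ such that $P^{(k)}(x)=f^{(k)}(x)$ for $k=0,1,\dots,n-1$ and $P(y)=f(y)$. Then there exists $z\in(x,y)$ such that $f^{(n)}(z)=P^{(n)}(z)$.*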

Note that since $P$ is a polynomial of degree at most $n$, the function $P^{(n)}$ is constant. In the case $n=1$, the constant is $\frac{f(y)-f(x)}{y-x}$, the gradient of the line joining $(x,f(x))$ to $(y,f(y))$, and the theorem is just the mean value theorem.
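The theorem can also be checked numerically. In the sketch below (Python; the helper names `nth_derivative_of_P` and `bisect` are my own, hypothetical choices), $P^{(n)}$ is the constant $n!\lambda$, and we look for $z\in(x,y)$ with $f^{(n)}(z)=n!\lambda$ by bisection. Bisection assumes that $f^{(n)}-n!\lambda$ changes sign on $[x,y]$. That holds in the example used here ($f=\cos$ on $[0,2]$ with $n=2$), but the theorem itself does not guarantee it in general.

```python
import math


def nth_derivative_of_P(derivs_at_x, f_y, x, y):
    """Return the constant P^{(n)} = n! * lam, where n = len(derivs_at_x)."""
    n = len(derivs_at_x)
    Q_y = sum(d * (y - x) ** k / math.factorial(k) for k, d in enumerate(derivs_at_x))
    lam = (f_y - Q_y) / (y - x) ** n
    return math.factorial(n) * lam


def bisect(g, lo, hi, tol=1e-12):
    """Find a zero of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid                # the sign change is in [lo, mid]
        else:
            lo, g_lo = mid, g_mid   # the sign change is in [mid, hi]
    return (lo + hi) / 2


# f = cos on [x, y] = [0, 2] with n = 2, so f'' = -cos.
x, y = 0.0, 2.0
c = nth_derivative_of_P([math.cos(x), -math.sin(x)], math.cos(y), x, y)
z = bisect(lambda u: -math.cos(u) - c, x, y)
print(f"P'' = {c:.6f}, z = {z:.6f}, f''(z) = {-math.cos(z):.6f}")

# With n = 1 the constant is the slope of the chord: the mean value theorem.
slope = nth_derivative_of_P([math.exp(0.0)], math.exp(1.0), 0.0, 1.0)
print(f"chord slope for exp on [0, 1]: {slope:.6f}")
```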

### Taylor’s theorem

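Actually, the result we have just proved *is* Taylor’s theorem! To see that, all we have to do is use the explicit formula for $P$ and a tiny bit of rearrangement. To begin with, let us use the formula

$$P(u)=Q(u)+\frac{f(y)-Q(y)}{(y-x)^n}(u-x)^n.$$

Note that $Q^{(n)}(u)=0$ for every $u$, so the theorem tells us that there exists $z\in(x,y)$ such that

$$f^{(n)}(z)=P^{(n)}(z)=n!\,\frac{f(y)-Q(y)}{(y-x)^n}.$$

Rearranging, that gives us that

$$f(y)=Q(y)+\frac{(y-x)^n}{n!}f^{(n)}(z).$$

Finally, using the formula for $Q$, which was

$$Q(u)=f(x)+(u-x)f'(x)+\frac{(u-x)^2}{2!}f''(x)+\dots+\frac{(u-x)^{n-1}}{(n-1)!}f^{(n-1)}(x),$$

and setting $h=y-x$ and $z=x+\theta h$ (so that $\theta\in(0,1)$), we can rewrite our conclusion as

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2!}f''(x)+\dots+\frac{h^{n-1}}{(n-1)!}f^{(n-1)}(x)+\frac{h^n}{n!}f^{(n)}(x+\theta h),$$

which is Taylor’s theorem with the Lagrange form of the remainder.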
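For $f=\exp$ and $x=0$ every derivative is $\exp$ itself, which is strictly increasing and invertible, so the $\theta$ in the Lagrange form can be solved for explicitly: $e^{\theta h}=n!\,R_n(h)/h^n$, where $R_n(h)$ (a name introduced here) is what is left after subtracting the first $n$ terms from $e^h$. The following Python sketch, with the hypothetical function name `lagrange_theta`, computes $\theta$ this way so that one can confirm it really does lie in $(0,1)$, for positive and negative $h$ alike.

```python
import math


def lagrange_theta(h, n):
    """Solve e^h = sum_{k<n} h^k/k! + (h^n/n!) e^(theta h) for theta (h != 0)."""
    partial_sum = sum(h ** k / math.factorial(k) for k in range(n))
    remainder = math.exp(h) - partial_sum
    # Taylor's theorem predicts remainder = h^n / n! * exp(theta * h).
    return math.log(math.factorial(n) * remainder / h ** n) / h


for h in [1.0, 2.0, -1.0]:
    thetas = [lagrange_theta(h, n) for n in range(1, 8)]
    print(h, [round(t, 4) for t in thetas])
```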

### Using Taylor’s theorem

I think it is quite rare for a proof of Taylor’s theorem to be asked for in the exams. However, pretty well every year there is a question that requires you to understand the *statement* of Taylor’s theorem. (I am writing this post without any knowledge of what will be in this year’s exam, and the examiners will be entirely within their rights to ask for anything that’s on the syllabus. So I certainly don’t recommend not learning the proof of Taylor’s theorem.)

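You may at school have seen the following style of reasoning. Suppose we want to calculate the power series of $\sin x$. Then we write

$$\sin x=a_0+a_1x+a_2x^2+a_3x^3+\cdots.$$

Taking $x=0$ we deduce that $a_0=0$. Differentiating we get that

$$\cos x=a_1+2a_2x+3a_3x^2+\cdots,$$

and taking $x=0$ we deduce that $a_1=1$. In general, differentiating $n$ times and setting $x=0$ we deduce that $a_n=0$ if $n$ is even, $a_n=1/n!$ if $n\equiv 1$ mod 4, and $a_n=-1/n!$ if $n\equiv 3$ mod 4. Therefore,

$$\sin x=x-\frac{x^3}{3!}+\frac{x^5}{5!}-\frac{x^7}{7!}+\cdots.$$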
There are at least two reasons that this argument is not rigorous. (I’ll assume that we have defined trigonometric functions and proved rigorously that their derivatives are what we think they are. Actually, I plan to define them using power series later in the course, in which case they have their power series by definition, but it is possible to define them in other ways, for example using the differential equation $f''=-f$, so this discussion is not a complete waste of time.) One is that we assumed that $\sin x$ could be expanded as a power series. That is, at best what we have just shown is that *if* $\sin x$ can be expanded as a power series, then the power series must be that one.

A second reason is that we just assumed that the power series could be differentiated term by term. That holds under certain circumstances, as we shall see later in the course, and those circumstances hold for this particular power series, but until we’ve proved that $\sin x$ is given by this particular power series we don’t know that the conditions hold.

Taylor’s theorem helps us to clear up these difficulties. Applying it with $x$ replaced by 0 and $h$ replaced by $x$, we find that

$$\sin x=\sum_{k=0}^{n-1}\frac{\sin^{(k)}(0)}{k!}x^k+\frac{x^n}{n!}\sin^{(n)}(\theta x)$$

for some $\theta\in(0,1)$. All the terms apart from the last one are just the expected terms in the power series for $\sin x$, so we get that $\sin x$ is equal to the partial sum of the power series up to the term in $x^{n-1}$ plus a remainder term.

The remainder term is $\frac{x^n}{n!}\sin^{(n)}(\theta x)$, so its magnitude is at most $\frac{|x|^n}{n!}$. It is not hard to prove that $\frac{|x|^n}{n!}$ tends to zero as $n\to\infty$. (One way to do this is to observe that the ratio of successive terms, $\frac{|x|}{n+1}$, is at most 1/2 once $n$ is bigger than $2|x|$.) Therefore, the power series converges for every $x$, and converges to $\sin x$.
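A quick numerical illustration of the bound, again as a Python sketch with hypothetical helper names: the coefficients follow the period-4 pattern found earlier, and for each $x$ and $n$ the error $|\sin x-S_n(x)|$, where $S_n(x)$ (a name introduced here) is the partial sum up to the term in $x^{n-1}$, should never exceed $|x|^n/n!$. The tolerance in the check only absorbs floating-point rounding.

```python
import math


def sin_coefficient(k):
    """Coefficient of x**k in the power series of sin: 0, 1/k!, 0, -1/k!, repeating mod 4."""
    return [0, 1, 0, -1][k % 4] / math.factorial(k)


def sin_partial_sum(x, n):
    """S_n(x): the partial sum of the series up to the term in x**(n-1)."""
    return sum(sin_coefficient(k) * x ** k for k in range(n))


def remainder_bound(x, n):
    """|x|^n / n!, which bounds |x^n/n! * sin^{(n)}(theta x)| since |sin^{(n)}| <= 1."""
    return abs(x) ** n / math.factorial(n)


for x in [0.5, 3.0, 10.0]:
    for n in (5, 20, 40):
        error = abs(math.sin(x) - sin_partial_sum(x, n))
        print(f"x={x}, n={n}: error={error:.3e}, bound={remainder_bound(x, n):.3e}")
```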

The basic technique here is as follows.

(i) Write down what Taylor’s theorem gives you for your function.

(ii) Prove that for each $x$ (in the range where you want to prove that the power series converges) the remainder term tends to zero as $n$ tends to infinity.
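As an illustration of steps (i) and (ii) on a different function, here is a sketch for $\exp$ (my choice of example, not one taken from the course). Taylor’s theorem at 0 gives the remainder $\frac{x^n}{n!}e^{\theta x}$, and since $e^{\theta x}\le e^{|x|}$, its magnitude is at most $e^{|x|}\,|x|^n/n!$. Unlike the case of $\sin$, the bound on the derivative depends on $x$, but for fixed $x$ the bound still tends to zero for the same reason as before, so the exponential series converges to $e^x$ for every $x$. The Python below (hypothetical helper names) checks the bound numerically.

```python
import math


def exp_partial_sum(x, n):
    """Step (i): the Taylor polynomial of exp at 0, up to the term in x**(n-1)."""
    return sum(x ** k / math.factorial(k) for k in range(n))


def exp_remainder_bound(x, n):
    """Step (ii): |x^n/n! * e^(theta x)| <= e^|x| * |x|^n / n! for theta in (0, 1)."""
    return math.exp(abs(x)) * abs(x) ** n / math.factorial(n)


for x in [-5.0, 1.0, 5.0]:
    for n in (5, 15, 30):
        error = abs(math.exp(x) - exp_partial_sum(x, n))
        print(f"x={x}, n={n}: error={error:.3e}, bound={exp_remainder_bound(x, n):.3e}")
```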

### Taylor’s theorem with the Peano form of the remainder

The material in this section is not on the course, but is still worth thinking about. It begins with the definition of a derivative, which, as I said in lectures, can be expressed as follows. A function $f$ is differentiable at $x$ with derivative $\lambda$ if

$$f(x+h)=f(x)+\lambda h+o(h).$$

We can think of $f(x)+\lambda h$ as the best linear approximation to $f(x+h)$ for small $h$.

Once we’ve said that, it becomes natural to ask for the best quadratic approximation, and in general for the best approximation by a polynomial of degree $n$.

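Let’s think about the quadratic case. In the light of Taylor’s theorem it is natural to expect that

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2),$$

in which case $f(x)+hf'(x)+\frac{h^2}{2}f''(x)$ would indeed be the best quadratic approximation to $f(x+h)$ for small $h$.

What Taylor’s theorem as stated above gives us is

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x+\theta h)$$

for some $\theta\in(0,1)$. If we know that $f''$ is continuous at $x$, then $f''(x+\theta h)\to f''(x)$ as $h\to 0$, so we can write $f''(x+\theta h)=f''(x)+\epsilon(h)$, where $\epsilon(h)\to 0$. But then $\frac{h^2}{2}f''(x+\theta h)=\frac{h^2}{2}f''(x)+o(h^2)$, as we wanted, since $\frac{h^2}{2}\epsilon(h)=o(h^2)$.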

However, this result does not need the continuity assumption, so let me briefly prove it. To keep the expressions simple I will prove only the quadratic case, but the general case is pretty well exactly the same.

I’ll do the same trick as usual, by which I mean I’ll first prove it when various things are zero and then I’ll deduce the general case. So let’s suppose that $f(x)=f'(x)=f''(x)=0$. We want to prove now that $f(x+h)=o(h^2)$.

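Since $f'(x)=f''(x)=0$, we have that

$$\frac{f'(x+h)}{h}=\frac{f'(x+h)-f'(x)}{h}\to 0\quad\text{as } h\to 0.$$

Therefore, for every $\epsilon>0$ we can find $\delta>0$ such that $|f'(x+h)|\le\epsilon|h|$ for every $h$ with $|h|<\delta$.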

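This gives us several inequalities, one of which is that $f'(x+h)\le\epsilon h$ for every $h$ such that $0\le h<\delta$. If we now set $g(u)$ to be $f(u)-\frac{\epsilon}{2}(u-x)^2$, then we have that $g'(x+h)=f'(x+h)-\epsilon h\le 0$ for every $h\in[0,\delta)$. So by the mean value theorem, $g(x+h)\le g(x)=0$ for every such $h$, which implies that $f(x+h)\le\frac{\epsilon}{2}h^2$.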

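If we run a similar argument using the fact that $f'(x+h)\ge-\epsilon h$ we get that $f(x+h)\ge-\frac{\epsilon}{2}h^2$. And we can do similar arguments with $h<0$ as well, and the grand conclusion is that whenever $|h|<\delta$ we have $|f(x+h)|\le\frac{\epsilon}{2}h^2$.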

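What we have shown is that for every $\epsilon>0$ there exists $\delta>0$ such that $\left|\frac{f(x+h)}{h^2}\right|\le\frac{\epsilon}{2}$ whenever $0<|h|<\delta$, which is exactly the statement that $\frac{f(x+h)}{h^2}\to 0$ as $h\to 0$, which in turn is exactly the statement that $f(x+h)=o(h^2)$.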

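That does the proof when $f(x)=f'(x)=f''(x)=0$. Now let’s take a general $f$ and define a function $g$ by

$$g(u)=f(u)-f(x)-(u-x)f'(x)-\frac{(u-x)^2}{2}f''(x).$$

Then $g(x)=g'(x)=g''(x)=0$, so $g(x+h)=o(h^2)$, from which it follows that

$$f(x+h)-f(x)-hf'(x)-\frac{h^2}{2}f''(x)=o(h^2),$$

which after rearranging gives us the statement we wanted:

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2).$$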
As I said above, this argument generalizes straightforwardly and gives us Taylor’s theorem with what is known as *Peano’s form of the remainder*, which is the following statement.

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2!}f''(x)+\dots+\frac{h^n}{n!}f^{(n)}(x)+o(h^n).$$

For that we need $f^{(n)}(x)$ to exist but we do not need $f^{(n)}$ to exist anywhere else, so we certainly don’t need any continuity assumptions on $f^{(n)}$.

This version of Taylor’s theorem is not as useful as versions with an explicit formula for the remainder term, as you will see if you try to use it to prove that $\sin x$ can be expanded as a power series: the information that the remainder term is $o(x^n)$ is, for fixed $x$, of no use whatever. But the information that it is $\frac{x^n}{n!}\sin^{(n)}(\theta x)$ gives us an expression that we can prove tends to zero.

### An expression for $f''(x)$

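However, one amusing (but not, as far as I know, useful) thing it gives us is a direct formula for the second derivative. By direct I mean that we do not go via the first derivative. Let us take the quadratic result and apply it to both $h$ and $-h$. We get

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2)$$

and

$$f(x-h)=f(x)-hf'(x)+\frac{h^2}{2}f''(x)+o(h^2).$$

From this it follows that

$$f(x+h)+f(x-h)-2f(x)=h^2f''(x)+o(h^2).$$

Dividing through by $h^2$ we get that

$$\frac{f(x+h)+f(x-h)-2f(x)}{h^2}\to f''(x)$$

as $h\to 0$.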
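Whatever its theoretical interest, this symmetric second difference is easy to evaluate. The Python sketch below (the name `second_difference` is hypothetical) compares it with the known second derivative of $\sin$ at 1. Taking $h$ extremely small is counterproductive in floating point, because the numerator is a difference of nearly equal numbers, so moderate values of $h$ are used.

```python
import math


def second_difference(f, x, h):
    """(f(x+h) + f(x-h) - 2 f(x)) / h^2, which tends to f''(x) as h -> 0."""
    return (f(x + h) + f(x - h) - 2 * f(x)) / h ** 2


for h in [1e-1, 1e-2, 1e-3]:
    approx = second_difference(math.sin, 1.0, h)
    print(f"h={h:g}: {approx:.10f} vs sin''(1) = {-math.sin(1.0):.10f}")
```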

I’m not claiming the converse, which would say that if this limit exists, then $f$ is twice differentiable at $x$. In fact, $f$ doesn’t even have to be once differentiable at $x$. Consider, for example, the following function. For every integer $n$ (either positive or negative) and every $x$ in the interval $[2^n,2^{n+1})$ we set $f(x)$ equal to $2^n$. We also set $f(0)=0$, and we take $f(x)=-f(-x)$ when $x<0$. (That is, for negative $x$ we define $f(x)$ so as to make it an odd function.)

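Then $f(h)+f(-h)-2f(0)=0$ for every $h$, so $\frac{f(h)+f(-h)-2f(0)}{h^2}=0$ for every $h\ne 0$, and in particular it tends to 0 as $h\to 0$. However, $f$ is not differentiable at 0. To see this, note that when $h=2^n$ we have $\frac{f(h)}{h}=1$, whereas when $h$ is close to $2^{n+1}$ (and less than it) we have $\frac{f(h)}{h}$ close to $\frac12$. Therefore, the ratio $\frac{f(h)-f(0)}{h}$ does not converge as $h\to 0$, which tells us that $f$ is not differentiable at 0.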
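The counterexample is easy to play with numerically. In the Python sketch below (`step_f` is a hypothetical name), `math.frexp` writes $x=m\cdot 2^e$ with $\tfrac12\le m<1$, so that $x\in[2^{e-1},2^e)$ and $f(x)=2^{e-1}$; this avoids rounding trouble that a logarithm could cause near powers of 2. The symmetric quotient is identically zero, while $f(h)/h$ keeps jumping between 1 and roughly $\tfrac12$.

```python
import math


def step_f(x):
    """f(x) = 2^n on [2^n, 2^(n+1)) for every integer n, f(0) = 0, and f odd."""
    if x == 0:
        return 0.0
    if x < 0:
        return -step_f(-x)
    _, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    return math.ldexp(0.5, e)     # 2**(e - 1), the left endpoint of x's dyadic interval


for k in range(5, 9):
    h_left = 2.0 ** -k             # a power of 2: f(h)/h = 1
    h_right = 1.999 * 2.0 ** -k    # just below 2^(-k+1): f(h)/h is about 1/2
    sym = (step_f(h_left) + step_f(-h_left) - 2 * step_f(0.0)) / h_left ** 2
    print(k, sym, step_f(h_left) / h_left, step_f(h_right) / h_right)
```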

If you want an example that is continuous everywhere, then take $f(x)=x^{1/3}$ (the real cube root). This again has the property that $f(h)+f(-h)-2f(0)=0$ for every $h$, and it is not differentiable at 0.

Even if we assume that $f$ is differentiable, we can’t get a proper converse. For example, the condition

$$\frac{f(h)+f(-h)-2f(0)}{h^2}\to 0\quad\text{as } h\to 0$$

does not imply that $f''(0)$ exists and equals 0. For a counterexample, take a function such as $f(x)=x^3\sin(1/x^3)$ (and 0 at 0). Then $f(h)+f(-h)-2f(0)=2h^3\sin(1/h^3)$ must lie between $-2|h|^3$ and $2|h|^3$ and therefore certainly be $o(h^2)$. But the oscillations near zero are so fast that $f'$ is unbounded near zero, so $f''$ doesn’t exist at 0.
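Both halves of the claim can be seen numerically. In this Python sketch (the names `f` and `f_prime` are hypothetical), the symmetric quotient equals $2h\sin(1/h^3)$ and so has size at most $2|h|$, whereas at the points $h_k=(2\pi k)^{-1/3}$, where $\cos(1/h^3)=1$, the derivative $f'(h)=3h^2\sin(1/h^3)-3\cos(1/h^3)/h$ is roughly $-3/h_k$, which is unbounded as $k\to\infty$.

```python
import math


def f(x):
    """x^3 sin(1/x^3), with f(0) = 0."""
    return 0.0 if x == 0 else x ** 3 * math.sin(1 / x ** 3)


def f_prime(x):
    """f'(x) = 3x^2 sin(1/x^3) - 3 cos(1/x^3) / x for x != 0, and f'(0) = 0."""
    if x == 0:
        return 0.0
    return 3 * x ** 2 * math.sin(1 / x ** 3) - 3 * math.cos(1 / x ** 3) / x


for k in [10, 1_000, 100_000]:
    h = (2 * math.pi * k) ** (-1 / 3)   # so that 1/h^3 is 2*pi*k, up to rounding
    quotient = (f(h) + f(-h) - 2 * f(0.0)) / h ** 2
    print(f"h={h:.4f}: symmetric quotient={quotient:.2e}, f'(h)={f_prime(h):.1f}")
```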

February 11, 2014 at 2:38 pm

Re: “However, one amusing (but not, as far as I know, useful) thing it gives us is a direct formula for the second derivative.” Finite difference method for the Laplacian.

February 11, 2014 at 4:31 pm

This proof of Taylor’s theorem with the Lagrange remainder is virtually identical to the one in the book “Advanced Calculus of Several Variables” by CH Edwards, Jr (despite the title, this is one of the few one-variable topics that he covers explicitly). I really like it, though he doesn’t do the Peano form and, consequently, assumes excessive hypotheses on the differentiability of his functions in various places.


February 12, 2014 at 6:48 pm

Looks like there’s a typo in the first sentence of the second paragraph of “A higher order Rolle theorem”: n should be n – 1.

Thanks very much, I’ve corrected it now.

February 13, 2014 at 12:49 am

I also don’t like how most books use the Cauchy Mean Value theorem to prove Taylor’s Theorem. So I gave a slightly different proof; have a look on my blog. It’s pitched at a slightly lower level (for my calc I students), but the gist is there.

In short, I start with $f'(t)=f'(a)+f''(c)(t-a)$ (in the 2nd order case – they’re all the same) from the mean value theorem applied to $f'$. Integrating from $a$ to $x$, and a little rearrangement, gives $f(x)=f(a)+f'(a)(x-a)+\frac{f''(c)}{2}(x-a)^2$. Integrating again gives the third order Taylor polynomial, with Lagrange remainder. This clearly generalizes.

I like this one because it’s a reasonable extension of what my students had recently learned: the fundamental theorem of calculus, and I had a lot more success with this version than my previous attempts.

February 13, 2014 at 12:53 am

My link died for some reason, so if you’ll forgive the repeat, I’ll try again. It’s here, or (trying a different format), here.

December 11, 2014 at 3:38 pm

This is a good heuristic, but not a proof since the $c$ depends on $t$ and thus is not constant in the integration.

December 11, 2014 at 6:09 pm

Yes, you’re right. I didn’t notice that when I first wrote it down. But I later returned, discovered my lapse, and tried to correct it. It turns out that you can still prove it in the way I wrote down, though it becomes more technical and demands more attention.

Ultimately, I think it’s sufficiently annoying that I don’t advise teaching it at all. This last semester, I taught the way that Dr. Gowers mentions in this post.

December 11, 2014 at 7:22 pm

You’re in good company. A number of real analysis textbooks have faulty “proofs” of the convergence of certain Taylor series (e.g. for $e^x$) which make the same mistake.

February 13, 2014 at 3:37 am

I am pretty sure that I remember the expression you wrote for the second derivative is used in order to estimate second derivatives numerically, since you cannot take the derivative of the first derivative if you only have some sample values of your function at a few points.

February 13, 2014 at 4:13 pm

1) Yes, this is The Proof. It belongs in the Book.

2) Some fun might be had in generalizing the higher Rolle’s Theorem: for example, , which leads one to consider the Bernstein basis polynomials.

February 14, 2014 at 8:43 pm

> It turns out that we can more or less write down such a polynomial, once we have observed that the polynomial q_k(x)=(u-x)^k/k! has the convenient property that q_k{(j)}(x)=0 except when j=k when it is 1.

You probably meant $q_k(u)=(u-x)^k/k!$.

Thanks, I’ve corrected it now.

February 15, 2014 at 5:32 pm

Also, instead of .

The choice of the names of the variables is really unfortunate. Why didn’t you resort to the more standard $a$ for the fixed point and $x$ for the function argument?

Am I really unable to edit and even preview my own comments?

February 15, 2014 at 9:25 pm

Very nice! I have had exactly the same feelings in the past about the standard proofs of Taylor’s theorem.

I have one comment here on possible student confusion. I have had problems in the past with treating $x$ as a constant during proofs for my students. And a question I once asked involving the phrase

resulted in mass carnage!

The potential confusion (for some students) in the exposition above comes from the fact that the various derivatives of $f$ are taken with respect to $u$, even when they are evaluated at $x$. Because of this, I suppose that I would probably play it safe and use $a$ instead of $x$ here, simply because students are unlikely to think that you might be differentiating with respect to $a$. (That also frees up $x$ as a variable name, in case you want to use $x$ instead of $u$, for example.)

However it is probably very good for students to get used to treating $x$ as a constant during a proof and to differentiate with respect to another variable instead. So perhaps allowing some students to get temporarily confused is best, as long as the confusion is readily resolved!

February 15, 2014 at 9:32 pm

Ah, I see here that (a) I accidentally commented anonymously and (b) I have mistakenly used double dollars instead of single dollars (as well as making at least one basic typo).

February 16, 2014 at 12:01 pm

To be precise, I think my old exam question (from about 14 years ago) went as follows.

Suppose that is continuous. Prove that there exist and such that .

Very few students appeared to understand what this question was asking. A lot of students assumed or tried to prove that the function actually *was* the function . I think more students would have at least understood the question if I had used (or ) instead of . (Whether they would have then been able to solve the problem is another matter.)

February 15, 2014 at 9:52 pm

[…] for proving Taylor’s Theorem (with the Lagrange form of the remainder), have a look at Timothy Gowers’s blog post on this, where you will find what is, perhaps, a more natural proof than the usual […]

April 2, 2014 at 2:21 pm

R. P. Burn’s “Numbers and Functions” (recommended to students taking the Analysis I course at Cambridge) explicitly introduces the “second mean value theorem”, the “third mean value theorem” and the “n-th mean value theorem”, or Taylor’s Theorem (WLFOTR). Quite aside from this, it is an excellent introduction to the subject; its choice of completeness axiom (every real number has a decimal expansion) isn’t very clean, but at least it is natural.

April 24, 2014 at 12:21 am

On the hypotheses: if $f$ is differentiable in a neighbourhood of an interval, it is also continuous in that interval. That said, the usual hypotheses of Rolle’s Theorem are continuity in the closed interval, and differentiability in the open one. The generalized version should therefore assume that $f,f',\dots,f^{(n-1)}$ are continuous in $[x,y]$ (with one-sided derivatives at the endpoints), and that $f^{(n)}$ exists in $(x,y)$. The hypotheses for Taylor’s Theorem should then be the same.

July 3, 2014 at 1:10 pm

This proof is very much like the usual proof of the formula for the error in polynomial interpolation. (See, e.g., Atkinson’s Intro to Numerical Analysis.) Personally, I prefer proving the Taylor theorem with integral remainder as it remains true for vector-valued functions while the derivative form of the remainder is only valid in the scalar case.

November 5, 2014 at 10:03 pm

[…] is a little-known generalization of the Mean Value Theorem to higher derivatives (Gowers’s Blog is the only reference the author has seen). A reference for the rest of this talk is a currently […]

December 10, 2014 at 11:01 pm

This is a nice argument. A similar argument appears in Pugh’s book Real Mathematical Analysis. But Cauchy’s MVT is not so hard to understand intuitively: if the two functions are regarded as the coordinate functions of a parametrized curve in the plane, the result just says there is a point on the curve where the tangent line is parallel to the line through the endpoints. For a quick geometric “proof” (which can easily be made rigorous), just take a point on the curve of maximum distance from the line through the endpoints.