Taylor’s theorem with the Lagrange form of the remainder

There are countless situations in mathematics where it helps to expand a function as a power series. Therefore, Taylor’s theorem, which gives us circumstances under which this can be done, is an important result of the course. It is also the one result that I was dreading lecturing, at least with the Lagrange form of the remainder, because in the past I have always found that the proof is one that I have not been able to understand properly. I don’t mean by that that I couldn’t follow the arguments I read. What I mean is that I couldn’t reproduce the proof without committing a couple of things to memory, which I would then forget again once I had presented them. Briefly, an argument that appears in a lot of textbooks uses a result called the Cauchy mean value theorem, and applies it to a cleverly chosen function. Whereas I understand what the mean value theorem is for, I somehow don’t have the same feeling about the Cauchy mean value theorem: it just works in this situation and happens to give the answer one wants. And I don’t see an easy way of predicting in advance what function to plug in.

I have always found this situation annoying, because a part of me said that the result ought to be a straightforward generalization of the mean value theorem, in the following sense. The mean value theorem applied to the interval [x,x+h] tells us that there exists y\in (x,x+h) such that f'(y)=\frac{f(x+h)-f(x)}h, and therefore that f(x+h)=f(x)+hf'(y). Writing y=x+\theta h for some \theta\in(0,1) we obtain the statement f(x+h)=f(x)+hf'(x+\theta h). This is the case n=1 of Taylor’s theorem. So can’t we find some kind of “polynomial mean value theorem” that will do the same job for approximating f by polynomials of higher degree?

Now that I’ve been forced to lecture this result again (for the second time actually — the first was in Princeton about twelve years ago, when I just suffered and memorized the Cauchy mean value theorem approach), I have made a proper effort to explore this question, and have realized that the answer is yes. I’m sure there must be textbooks that do it this way, but the ones I’ve looked at all use the Cauchy mean value theorem. I don’t understand why, since it seems to me that the way of proving the result that I’m about to present makes the whole argument completely transparent. I’m actually looking forward to lecturing it (as I add this sentence to the post, the lecture is about half an hour in the future), since the demands on my memory are going to be close to zero.

A higher order Rolle theorem

We know that we want a statement that will involve the first n-1 derivatives of f at x, the nth derivative at some point in the interval (x,x+h), and the value of f at x+h. The idea with Rolle’s theorem is to make a whole lot of stuff zero, and then with the mean value theorem we take a more general function and subtract a linear part to obtain a function to which Rolle’s theorem applies. So let’s try a similar trick here: we’ll make as much as we can equal to zero. In fact, I’ll go even further and make the values of f(x) and f(x+h) zero.

So here’s what I’ll assume: that f(x)=f'(x)=\dots=f^{(n-1)}(x)=0 and also that f(x+h)=0. That’s as much as I can reasonably set to be zero. And what should be my conclusion? That there is some \theta\in(0,1) such that f^{(n)}(x+\theta h)=0. Note that if we set n=1 then we are assuming that f(x)=f(x+h)=0 and trying to find \theta such that f'(x+\theta h)=0, so this result really does generalize Rolle’s theorem. (I’m also assuming that f is n times differentiable on an open interval that contains [x,x+h]. This is a slightly stronger condition than necessary, but it will hold in the situations where we want to use Taylor’s theorem.)

The proof of this generalization is almost trivial, given Rolle’s theorem itself. Since f(x)=f(x+h), there exists \theta_1\in(0,1) such that f'(x+\theta_1h)=0. But f'(x)=0 as well, so by Rolle’s theorem, this time applied to f', we find \theta_2\in(0,1) such that f''(x+\theta_2\theta_1h)=0. Continuing like this, we eventually find \theta_1,\dots,\theta_n\in (0,1) such that f^{(n)}(x+\theta_n\dots\theta_1h)=0. So we can set \theta=\theta_1\dots\theta_n and we are done.

For what it’s worth, I didn’t use the fact that f(x)=f(x+h)=0, but just that f(x)=f(x+h).
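To see the chain of \theta_i in a concrete case, here is a small Python sketch. The function f(u)=u^3-u^2 and the choice x=0, h=1, n=2 are my own illustrative assumptions, not anything from the course; the derivatives are written out by hand.

```python
# Hypothetical example: f(u) = u^3 - u^2 on [0, 1], so x = 0, h = 1, n = 2.
def f(u):
    return u**3 - u**2

def f1(u):
    # First derivative of f, computed by hand.
    return 3 * u**2 - 2 * u

def f2(u):
    # Second derivative of f, computed by hand.
    return 6 * u - 2

x, h = 0.0, 1.0

# Hypotheses of the higher-order Rolle theorem: f(x) = f'(x) = 0 and f(x+h) = 0.
assert f(x) == 0 and f1(x) == 0 and f(x + h) == 0

theta1 = 2 / 3            # Rolle applied to f on [x, x+h]: f'(2/3) = 0
theta2 = 1 / 2            # Rolle applied to f' on [x, x+theta1*h]: f''(1/3) = 0
theta = theta1 * theta2   # the theta promised by the theorem

# Both values should vanish up to floating-point rounding.
print(abs(f1(x + theta1 * h)) < 1e-12)
print(abs(f2(x + theta * h)) < 1e-12)
```

In this example \theta=1/3 is the only root of f'' in (0,1), so the iterated argument has found exactly the point that the theorem asserts must exist.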

A higher-order mean value theorem

Now let’s take an arbitrary function f that is n-times differentiable on an open interval containing [x,x+h]. To prove the mean value theorem, we subtracted a linear function so as to obtain a function that satisfied the hypotheses of Rolle’s theorem. Here, the obvious thing to do is to subtract a polynomial p of degree n to obtain a function that satisfies the hypotheses of our higher-order Rolle theorem.

The properties we need p to have are that p(x)=f(x), p'(x)=f'(x), and so on all the way up to p^{(n-1)}(x)=f^{(n-1)}(x), and finally p(x+h)=f(x+h). It turns out that we can more or less write down such a polynomial, once we have observed that the polynomial q_k(u)=(u-x)^k/k! has the convenient property that q_k^{(j)}(x)=0 except when j=k, when it is 1. (Here the derivatives are taken with respect to u and then evaluated at u=x.) This allows us to build a polynomial that has whatever derivatives we want at x. So let’s do that. Define a polynomial q by

q(u)=f(x)+q_1(u)f'(x)+q_2(u)f''(x)+\dots+q_{n-1}(u)f^{(n-1)}(x)

Then q^{(k)}(x)=f^{(k)}(x) for k=0,1,\dots,n-1. A more explicit formula for q(u) is

\displaystyle f(x)+(u-x)f'(x)+\frac{(u-x)^2}{2!}f''(x)+\dots+\frac{(u-x)^{n-1}}{(n-1)!}f^{(n-1)}(x)

Now q(x+h) doesn’t necessarily equal f(x+h), so we need to add a multiple of (u-x)^n to correct for this. (Doing that won’t affect the derivatives we’ve got at x.) So we want our polynomial to be of the form

p(u)=q(u)+\lambda(u-x)^n

and we want p(x+h)=f(x+h). So we want q(x+h)+\lambda h^n to equal f(x+h), which gives us \lambda=h^{-n}(f(x+h)-q(x+h)). That is,

\displaystyle p(u)=q(u)+\frac{(u-x)^n}{h^n}(f(x+h)-q(x+h))

A quick check: if we substitute in x+h for u we get q(x+h)+(h^n/h^n)(f(x+h)-q(x+h)), which does indeed equal f(x+h).

For the moment, we can forget the formula for p. All that matters is its properties, which, just to remind you, are these.

  1. p is a polynomial of degree at most n.
  2. p^{(k)}(x)=f^{(k)}(x) for k=0,1,\dots,n-1.
  3. p(x+h)=f(x+h).

The second and third properties tell us that if we set g(u)=f(u)-p(u), then g^{(k)}(x)=0 for k=0,1,\dots,n-1 and g(x+h)=0. Those are the conditions needed for our higher-order Rolle theorem. Therefore, there exists \theta\in(0,1) such that g^{(n)}(x+\theta h)=0, which implies that f^{(n)}(x+\theta h)=p^{(n)}(x+\theta h).

Let us just highlight what we have proved here.

Theorem. Let f be continuous on the interval [x,x+h] and n-times differentiable on an open interval that contains [x,x+h]. Let p be the unique polynomial of degree at most n such that p^{(k)}(x)=f^{(k)}(x) for k=0,1,\dots,n-1 and p(x+h)=f(x+h). Then there exists \theta\in(0,1) such that f^{(n)}(x+\theta h)=p^{(n)}(x+\theta h).

Note that since p is a polynomial of degree n, the function p^{(n)} is constant. In the case n=1, the constant is (f(x+h)-f(x))/h, the gradient of the line joining (x,f(x)) to (x+h,f(x+h)), and the theorem is just the mean value theorem.

Taylor’s theorem

Actually, the result we have just proved is Taylor’s theorem! To see that, all we have to do is use the explicit formula for p and a tiny bit of rearrangement. To begin with, let us use the formula

\displaystyle p(u)=q(u)+\frac{(u-x)^n}{h^n}(f(x+h)-q(x+h))

Note that p^{(n)}(u)=\frac{n!}{h^n}(f(x+h)-q(x+h)) for every u, so the theorem tells us that there exists \theta\in(0,1) such that

\displaystyle f^{(n)}(x+\theta h)=\frac{n!}{h^n}(f(x+h)-q(x+h))

Rearranging, that gives us that

\displaystyle f(x+h)=q(x+h)+\frac{h^n}{n!}f^{(n)}(x+\theta h)

Finally, using the formula for q(u), which was

\displaystyle f(x)+(u-x)f'(x)+\frac{(u-x)^2}{2!}f''(x)+\dots+\frac{(u-x)^{n-1}}{(n-1)!}f^{(n-1)}(x)

and setting u=x+h, we can rewrite our conclusion as

\displaystyle f(x+h)=f(x)+hf'(x)+\frac{h^2}{2!}f''(x)+\dots

\displaystyle \dots + \frac{h^{n-1}}{(n-1)!}f^{(n-1)}(x)+\frac{h^n}{n!}f^{(n)}(x+\theta h)

which is Taylor’s theorem with the Lagrange form of the remainder.
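The theorem only asserts that some \theta exists, but for a simple function one can actually find it. The following Python sketch does this for f=\exp with x=0, h=1 and n=3; those choices, and the helper names `taylor_partial` and `pn_constant`, are my own illustrative assumptions. It computes the constant value of p^{(n)}, namely \frac{n!}{h^n}(f(x+h)-q(x+h)), and then solves f^{(n)}(x+\theta h) equal to that constant, which for the exponential function just means taking a logarithm.

```python
import math

def taylor_partial(derivs_at_x, h):
    """q(x+h): the sum of h^k / k! * f^(k)(x) for k = 0, ..., n-1."""
    return sum(d * h**k / math.factorial(k) for k, d in enumerate(derivs_at_x))

def pn_constant(f, derivs_at_x, x, h):
    """The constant value of p^(n): n! / h^n * (f(x+h) - q(x+h))."""
    n = len(derivs_at_x)
    return math.factorial(n) / h**n * (f(x + h) - taylor_partial(derivs_at_x, h))

# Illustrative choice: f = exp, x = 0, h = 1, n = 3.
x, h = 0.0, 1.0
derivs = [1.0, 1.0, 1.0]          # exp and its first two derivatives at 0
n = len(derivs)

c = pn_constant(math.exp, derivs, x, h)

# Every derivative of exp is exp, so f^(n)(x + theta*h) = c means
# exp(x + theta*h) = c, which we can solve explicitly for theta.
theta = (math.log(c) - x) / h

# Rebuild f(x+h) from the Lagrange form of the remainder.
rebuilt = taylor_partial(derivs, h) + h**n / math.factorial(n) * math.exp(x + theta * h)
print(theta, rebuilt, math.exp(x + h))
```

The value of \theta that comes out lies strictly between 0 and 1, as the theorem says it must, and plugging it back into the Lagrange form reproduces f(x+h) up to rounding.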

Using Taylor’s theorem

I think it is quite rare for a proof of Taylor’s theorem to be asked for in the exams. However, pretty well every year there is a question that requires you to understand the statement of Taylor’s theorem. (I am writing this post without any knowledge of what will be in this year’s exam, and the examiners will be entirely within their rights to ask for anything that’s on the syllabus. So I certainly don’t recommend not learning the proof of Taylor’s theorem.)

You may at school have seen the following style of reasoning. Suppose we want to calculate the power series of \sin(x). Then we write

\sin(x)=a_0+a_1x+a_2x^2+a_3x^3+\dots

Taking x=0 we deduce that 0=a_0. Differentiating we get that

\cos(x)=a_1+2a_2x+3a_3x^2+\dots

and taking x=0 we deduce that 1=a_1. In general, differentiating n times and setting x=0 we deduce that n!a_n is 0 if n is even, 1 if n\equiv 1 mod 4, and -1 if n\equiv -1 mod 4. Therefore,

\displaystyle \sin(x)=x-\frac{x^3}{3!}+\frac{x^5}{5!}-\dots

There are at least two reasons that this argument is not rigorous. (I’ll assume that we have defined trigonometric functions and proved rigorously that their derivatives are what we think they are. Actually, I plan to define them using power series later in the course, in which case they have their power series by definition, but it is possible to define them in other ways — e.g. using the differential equation f''(x)=-f(x) — so this discussion is not a complete waste of time.) One is that we assumed that \sin(x) could be expanded as a power series. That is, at best what we have just shown is that if \sin(x) can be expanded as a power series, then the power series must be that one.

A second reason is that we just assumed that the power series could be differentiated term by term. That holds under certain circumstances, as we shall see later in the course, and those circumstances hold for this particular power series, but until we’ve proved that \sin(x) is given by this particular power series we don’t know that the conditions hold.

Taylor’s theorem helps us to clear up these difficulties. Applying it with x replaced by 0 and h replaced by x, we find that

\sin(x)=\sin(0)+x\cos(0)+\dots+\frac{x^{n-1}}{(n-1)!}\sin^{(n-1)}(0)+\frac{x^n}{n!}\sin^{(n)}(\theta x)

for some \theta\in(0,1). All the terms apart from the last one are just the expected terms in the power series for \sin(x), so we get that \sin(x) is equal to the partial sum of the power series up to the term in x^{n-1} plus a remainder term.

The remainder term is \frac{x^n}{n!}\sin^{(n)}(\theta x), and since \sin^{(n)} is \pm\sin or \pm\cos, its magnitude is at most \frac{|x|^n}{n!}. It is not hard to prove that \frac{|x|^n}{n!} tends to zero as n\to\infty. (One way to do this is to observe that the ratio of successive terms is at most 1/2 once n is bigger than 2|x|.) Therefore, the power series converges for every x, and converges to \sin(x).

The basic technique here is as follows.

(i) Write down what Taylor’s theorem gives you for your function.

(ii) Prove that for each x (in the range where you want to prove that the power series converges) the remainder term tends to zero as n tends to infinity.
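As a sanity check on the two steps for \sin, here is a minimal Python sketch that compares the actual error of the partial sum up to the term in x^{n-1} with the bound |x|^n/n! coming from the remainder term. The function names and the choice x=3 are mine, purely for illustration.

```python
import math

def sin_partial_sum(x, n):
    """Partial sum of the sine series up to and including the x^(n-1) term."""
    total = 0.0
    for k in range(n):
        # The k-th derivative of sin at 0 cycles through 0, 1, 0, -1.
        d = (0.0, 1.0, 0.0, -1.0)[k % 4]
        total += d * x**k / math.factorial(k)
    return total

def remainder_bound(x, n):
    """Bound on the Lagrange remainder: |x|^n / n!, since |sin^(n)| <= 1."""
    return abs(x)**n / math.factorial(n)

x = 3.0
for n in (2, 5, 10, 20):
    err = abs(math.sin(x) - sin_partial_sum(x, n))
    print(n, err, remainder_bound(x, n))
```

For small n the bound is useless (it is bigger than 1), but once n passes 2|x| it shrinks by at least a factor of 2 at each step, which is the whole point of step (ii).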

Taylor’s theorem with the Peano form of the remainder

The material in this section is not on the course, but is still worth thinking about. It begins with the definition of a derivative, which, as I said in lectures, can be expressed as follows. A function f is differentiable at x with derivative \lambda if

f(x+h)=f(x)+\lambda h+o(h)

We can think of f(x)+\lambda h as the best linear approximation to f(x+h) for small h.

Once we’ve said that, it becomes natural to ask for the best quadratic approximation, and in general for the best approximation by a polynomial of degree n.

Let’s think about the quadratic case. In the light of Taylor’s theorem it is natural to expect that

\displaystyle f(x+h)=f(x)+hf'(x)+\frac{h^2}2f''(x)+o(h^2)

in which case f(x)+hf'(x)+\frac{h^2}2f''(x) would indeed be the best quadratic approximation to f(x+h) for small h.

What Taylor’s theorem as stated above gives us is

\displaystyle f(x+h)=f(x)+hf'(x)+\frac{h^2}2f''(x+\theta h)

for some \theta\in(0,1). If we know that f'' is continuous at x, then f''(x+\theta h)\to f''(x) as h\to 0, so we can write f''(x+\theta h)=f''(x)+\epsilon(h), where \epsilon(h)\to 0. But then \frac{h^2}2f''(x+\theta h)=\frac{h^2}2f''(x)+o(h^2), as we wanted, since \frac{h^2}2\epsilon(h)=o(h^2).

However, this result does not need the continuity assumption, so let me briefly prove it. To keep the expressions simple I will prove only the quadratic case, but the general case is pretty well exactly the same.

I’ll do the same trick as usual, by which I mean I’ll first prove it when various things are zero and then I’ll deduce the general case. So let’s suppose that f(x)=f'(x)=f''(x)=0. We want to prove now that f(x+h)=o(h^2).

Since f'(x)=f''(x)=0, we have that

f'(x+h)=f'(x)+hf''(x)+o(h)=o(h)

Therefore, for every \epsilon>0 we can find \delta>0 such that |f'(x+h)|<\epsilon|h| for every h with |h|<\delta.

This gives us several inequalities, one of which is that f'(x+h)<\epsilon h for every h such that 0<h<\delta. If we now set g(u) to be f(u)-\frac 12\epsilon (u-x)^2, then we have that g'(x+h)=f'(x+h)-\epsilon h<0 for every h\in(0,\delta). Since g(x)=f(x)=0, the mean value theorem gives g(x+h)<0 for every such h, which implies that f(x+h)<\frac 12\epsilon h^2.

If we run a similar argument using the fact that f'(x+h)>-\epsilon h we get that f(x+h)>-\frac 12\epsilon h^2. And we can do similar arguments with h<0 as well, and the grand conclusion is that whenever |h|<\delta we have |f(x+h)|<\frac 12\epsilon h^2.

What we have shown is that for every \epsilon>0 there exists \delta>0 such that |f(x+h)/h^2|<\epsilon whenever |h|<\delta, which is exactly the statement that f(x+h)/h^2\to 0 as h\to 0, which in turn is exactly the statement that f(x+h)=o(h^2).

That does the proof when f(x)=f'(x)=f''(x)=0. Now let’s take a general f and define a function g by

g(u)=f(u)-f(x)-(u-x)f'(x)-\frac 12(u-x)^2f''(x)

Then g(x)=g'(x)=g''(x)=0, so g(x+h)=o(h^2), from which it follows that

f(x+h)-f(x)-hf'(x)-\frac 12h^2f''(x)=o(h^2)

which after rearranging gives us the statement we wanted:

f(x+h)=f(x)+hf'(x)+\frac 12h^2f''(x)+o(h^2)

As I said above, this argument generalizes straightforwardly and gives us Taylor’s theorem with what is known as Peano’s form of the remainder, which is the following statement.

\displaystyle f(x+h)=f(x)+hf'(x)+\frac{h^2}{2!}f''(x)+\dots

\displaystyle \dots+\frac{h^n}{n!}f^{(n)}(x)+o(h^n)

For that we need f^{(n)}(x) to exist but we do not need f^{(n)} to exist anywhere else, so we certainly don’t need any continuity assumptions on f^{(n)}.

This version of Taylor’s theorem is not as useful as versions with an explicit formula for the remainder term, as you will see if you try to use it to prove that f(x)=\sin(x) can be expanded as a power series: the information that the remainder term is o(x^n) is, for fixed x, of no use whatever. But the information that it is \frac{x^{n+1}}{(n+1)!}f^{(n+1)}(\theta x) gives us an expression that we can prove tends to zero.

An expression for f''(x)

However, one amusing (but not, as far as I know, useful) thing it gives us is a direct formula for the second derivative. By direct I mean that we do not go via the first derivative. Let us take the quadratic result and apply it to both h and 2h. We get

\displaystyle f(x+h)=f(x)+hf'(x)+\frac 12h^2f''(x)+o(h^2)

and

\displaystyle f(x+2h)=f(x)+2hf'(x)+2h^2f''(x)+o(h^2)

From this it follows that

f(x+2h)-2f(x+h)+f(x)=h^2f''(x)+o(h^2)

Dividing through by h^2 we get that

\displaystyle \frac{f(x+2h)-2f(x+h)+f(x)}{h^2}\to f''(x)

as h\to 0.
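Here is a short Python sketch of this formula in action. The test function \sin and the point x=1 are my own illustrative choices, and `second_difference` is just a name I have made up for the quotient above.

```python
import math

def second_difference(f, x, h):
    """The quotient (f(x+2h) - 2 f(x+h) + f(x)) / h^2 from the text."""
    return (f(x + 2 * h) - 2 * f(x + h) + f(x)) / h**2

x = 1.0
exact = -math.sin(x)   # the second derivative of sin at x

for h in (1e-1, 1e-2, 1e-3):
    estimate = second_difference(math.sin, x, h)
    print(h, estimate, abs(estimate - exact))
```

The error shrinks roughly in proportion to h here. One should not take h too small in floating-point arithmetic, since the numerator is a difference of nearly equal numbers and gets divided by h^2.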

I’m not claiming the converse, which would say that if this limit exists, then f is twice differentiable at x. In fact, f doesn’t even have to be once differentiable at x. Consider, for example, the following function. For every integer n (either positive or negative) and every x in the interval (2^n,2^{n+1}] we set f(x) equal to 2^n. We also set f(0)=0, and we take f(x)=-f(-x) when x<0. (That is, for negative x we define f so as to make it an odd function.)

Then f(2x)=2f(x) for every x, so \frac{f(2h)-2f(h)+f(0)}{h^2}=0 for every h, and in particular it tends to 0 as h\to 0. However, f is not differentiable at 0. To see this, note that when h is slightly larger than 2^n we have f(h)/h close to 1, whereas when h is slightly smaller than 2^{n+1} we have f(h)/h close to 1/2. Since n can be an arbitrarily large negative integer, both happen for arbitrarily small h. Therefore, the ratio (f(h)-f(0))/h does not converge as h\to 0, which tells us that f is not differentiable at 0.

If you want an example that is continuous everywhere, then take f(x)=x\sin(2\pi\log_2|x|) for x\ne 0 and f(0)=0. This again has the property that f(2x)=2f(x) for every x, and it is not differentiable at 0.
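If you would like to see the continuous example misbehave, here is a minimal Python sketch. The sample points 2^{-n} and 2^{-n+1/4} are my own choice: at the first kind the quotient f(h)/h is 0 and at the second it is 1, so the quotient cannot converge as h\to 0, even though f(2h)=2f(h) throughout.

```python
import math

def f(x):
    """x * sin(2*pi*log2|x|), extended by f(0) = 0."""
    if x == 0:
        return 0.0
    return x * math.sin(2 * math.pi * math.log2(abs(x)))

for n in (5, 10, 20):
    h0 = 2.0 ** (-n)           # here log2(h0) is an integer, so f(h0)/h0 = 0
    h1 = 2.0 ** (-n + 0.25)    # here the sine equals 1, so f(h1)/h1 = 1
    # The scaling identity f(2h) = 2 f(h) holds up to rounding.
    scaling_gap = abs(f(2 * h1) - 2 * f(h1)) / h1
    print(n, f(h0) / h0, f(h1) / h1, scaling_gap)
```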

Even if we assume that f is differentiable, we can’t get a proper converse. For example, the condition

f(x+2h)-2f(x+h)+f(x)=o(h^2)

does not imply that f''(x) exists and equals 0. For a counterexample, take a function such as x^4\sin(1/x^{20}) (and 0 at 0). Then f(2h)-2f(h)+f(0) must lie between \pm((2h)^4+2h^4) and therefore certainly be o(h^2). But the oscillations near zero are so fast that f' is unbounded near zero, so f'' doesn’t exist at 0.

17 Responses to “Taylor’s theorem with the Lagrange form of the remainder”

  1. chorasimilarity Says:

    Re: “However, one amusing (but not, as far as I know, useful) thing it gives us is a direct formula for the second derivative.” finite difference method for the laplacian.

  2. Ryan Reich Says:

    This proof of Taylor’s theorem with the Lagrange remainder is virtually identical to the one in the book “Advanced Calculus of Several Variables” by CH Edwards, Jr (despite the title, this is one of the few one-variable topics that he covers explicitly). I really like it, though he doesn’t do the Peano form and, consequently, assumes excessive hypotheses on the differentiability of his functions in various places.

  3. mathtuition88 Says:

    Reblogged this on Singapore Maths Tuition.

  4. Tom Leinster Says:

    Looks like there’s a typo in the first sentence of the second paragraph of “A higher order Rolle theorem”: n should be n – 1.

    Thanks very much — I’ve corrected it now.

  5. mixedmath Says:

    I also don’t like how most books use the Cauchy Mean Value theorem to prove Taylor’s Theorem. So I gave a slightly different proof and look on my blog. It’s pitched at a slightly lower level (for my calc I students), but the gist is there.

    In short, I start with (in the 2nd order case – they’re all the same) f''(t) = f''(0) + f'''(c)t from the mean value theorem applied to f''(x). Integrating from 0 to x, and a little rearrangement, gives f'(x) = f'(0) + f''(0)x + f'''(c)x^2/2. Integrating again gives the third order Taylor polynomial, with Lagrange remainder. This clearly generalizes.

    I like this one because it’s a reasonable extension of what my students had recently learned: the fundamental theorem of calculus, and I had a lot more success with this version than my previous attempts.

  6. Jean-Philippe Burelle Says:

    I am pretty sure that I remember the expression you wrote for the second derivative is used in order to estimate second derivatives numerically, since you cannot take the derivative of the first derivative if you only have some sample values of your function at a few points.

  7. jessemckeown Says:

    1) Yes, this is The Proof. It belongs in the Book.

    2) Some fun might be had in generalizing the higher Rolle’s Theorem: for example, f^{(k)}(0) = f^{(l)}(1) = 0 ; k \leq K , l \leq L, K + L = n, which leads one to consider the Bernstein basis polynomials.

  8. pastafarianist Says:

    > It turns out that we can more or less write down such a polynomial, once we have observed that the polynomial q_k(x)=(u-x)^k/k! has the convenient property that q_k{(j)}(x)=0 except when j=k when it is 1.

    You probably meant q_k^{(j)}(x)=0

    Thanks — I’ve corrected it now.

    • pastafarianist Says:

      Also, q_k(u)=(u-x)^k/k! instead of q_k(x)=(u-x)^k/k!.
      The choice of the names of the variables is really unfortunate. Why didn’t you resort to the more standard x_0 for the fixed point and x for the function argument?
      Am I really unable to edit and even preview my own comments?

  9. Anonymous Says:

    Very nice! I have had exactly the same feelings in the past about the standard proofs of Taylor’s theorem.

    I have one comment here on possible student confusion. I have had problems in the past with treating $x$ as a constant during proofs for my students. And a question I once asked involving the phrase

    Prove that there exist $x \in [0,2]$ and $n \in \mathbb{N}$ with $f(x)=x^n$

    resulted in mass carnage!

    The potential confusion (for some students) in the exposition above comes from the fact that the various derivatives of $q$ are with taken with respect to $u$, even when they are evaluated at $x$. Because of this, I suppose that I would probably play it safe and use $a$ instead of $x$ here, simply because students are unlikely to think that you might be differentiating with respect to $a$. (That also frees up $x$ as a variable name, in case you want to use $x$ instead of $$ latex u$$, for example.)

    However it is probably very good for students to get used to treating $x$ as a constant during a proof and to differentiate with respect to another variable instead. So perhaps allowing some students to get temporarily confused is best, as long as the confusion is readily resolved!

    • Joel Says:

      Ah, I see here that (a) I accidentally commented anonymously and (b) I have mistakenly used double dollars instead of single dollars (as well as making at least one basic typo).

    • Joel Says:

      To be precise, I think my old exam question (from about 14 years ago) went as follows.

      Suppose that f:[-2,2] \to \mathbb{R} is continuous. Prove that there exist x \in [-2,2] and n \in \mathbb{N} such that f(x)=x^n.

      Very few students appeared to understand what this question was asking. A lot of students assumed or tried to prove that the function f actually was the function x \mapsto x^n.

      I think more students would have at least understood the question if I had used a (or x_0) instead of x. (Whether they would have then been able to solve the problem is another matter.)

  10. A more natural proof of Taylor’s Theorem? | Explaining mathematics Says:

    […] for proving Taylor’s Theorem (with the Lagrange form of the remainder), have a look at Timothy Gowers’s blog post on this, where you will find what is, perhaps, a more natural proof than the usual […]

  11. WJ Says:

    R. P. Burn’s “Numbers and Functions” (recommended to students taking the Analysis I course at Cambridge) explicitly introduces the “second mean value theorem”, the “third mean value theorem” and the “n-th mean value theorem”, or Taylor’s Theorem (WLFOTR). Quite aside from this, it is an excellent introduction to the subject; its choice of completeness axiom (every real number has a decimal expansion) isn’t very clean, but at least it is natural.

  12. Lior Silberman Says:

    On the hypotheses: if f is differentiable in a neighbourhood of an interval, it is also continuous in that interval. That said, the usual hypotheses of Rolle’s Theorem are continuity in the closed interval, and differentiability in the open one. The generalized version should therefore assume n-1 continuous derivatives in [x,x+h] (one-sided derivatives at the endpoints), and n derivatives in (x,x+h). The hypotheses for Taylor’s Theorem should then be the same.

  13. David Stewart Says:

    This proof is very much like the usual proof of the formula for the error in polynomial interpolation. (See, e.g., Atkinson’s Intro to Numerical Analysis.) Personally, I prefer proving the Taylor theorem with integral remainder as it remains true for vector-valued functions while the derivative form of the remainder is only valid in the scalar case.
