Recently I was giving a talk at a school in London and had cause to think about sums of kth powers, because I wanted to show people the difference between unenlightening proofs and enlightening ones. (My brief was to try to give them some idea of what proofs are and why they are worth bothering about.) On the train down, I realized that there is a very simple general argument that can be used to work out formulae for sums of kth powers. After a brief look online, I haven’t found precisely this method, though I’ve found plenty of methods in the vicinity of it. It’s bound to be in the literature somewhere, so perhaps somebody can point me towards a reference. But even so, it’s worth a short post.

I’ll start with the case $k=1$. I want to phrase a familiar argument in a way that will make it easy to generalize in the way I want to generalize it. Let $B$ be the rectangle consisting of all integer points $(x,y)$ such that $1\leq x\leq n$ and $1\leq y\leq n+1$. We can partition $B$ into those points for which $x\geq y$ and those points for which $y>x$. The number of points of the first kind is $1+2+\dots+n$, since for each $x=m$ we get $m$ possibilities for $y$. The number of points of the second kind is also $1+2+\dots+n$, since for each $y=m$ we get $m-1$ possibilities for $x$. Therefore, $n(n+1)=2(1+2+\dots+n)$ and we get the familiar formula.
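This counting is easy to confirm by brute force. The following sketch is my own (not part of the post); the function name is mine, and `n` is chosen arbitrarily:

```python
# Brute-force check of the k = 1 partition argument: the n-by-(n+1)
# rectangle of integer points splits into two pieces, each of size
# 1 + 2 + ... + n, so n(n+1) = 2 * S1(n).
def check_k1(n):
    first = sum(1 for x in range(1, n + 1)
                  for y in range(1, n + 2) if x >= y)   # x is largest
    second = sum(1 for x in range(1, n + 1)
                   for y in range(1, n + 2) if y > x)   # y is strictly largest
    s1 = n * (n + 1) // 2
    return first == s1 and second == s1 and first + second == n * (n + 1)

print(all(check_k1(n) for n in range(1, 30)))
```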

Now let’s move to sums of squares. This time I’ll let $B$ be the set of points $(x,y,z)$ with $1\leq x\leq n$, $1\leq y\leq n$ and $1\leq z\leq n+1$. We can partition $B$ into three sets, the first consisting of points for which $x$ is maximal, the second of points for which $y$ is maximal and $x$ is not maximal, and the third of points for which $z$ is strictly larger than both $x$ and $y$. The numbers of points in these sets are easily seen to be

$\sum_{m=1}^n m^2$, $\sum_{m=1}^n (m-1)m$

and

$\sum_{m=1}^{n+1} (m-1)^2$,

respectively. This gives us the formula

$n^2(n+1) = \sum_{m=1}^n m^2 + \sum_{m=1}^n (m-1)m + \sum_{m=1}^{n+1} (m-1)^2,$

from which we get the familiar formula for the sum of the first $n$ squares. Writing $S_k(n)$ for the sum of the first $n$ $k$th powers, we also get the relationship

$3S_2(n) = n^2(n+1) + S_1(n).$
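The three-way count, and the relationship it yields, can be confirmed numerically. This is my own sketch in Python, not part of the post:

```python
# Check the three-way partition for sums of squares: the box
# [1,n] x [1,n] x [1,n+1] splits into pieces of sizes S2, S2 - S1, S2,
# so n^2 (n+1) = 3*S2(n) - S1(n).
def S(k, n):
    return sum(m ** k for m in range(1, n + 1))

def check_squares(n):
    pts = [(x, y, z) for x in range(1, n + 1)
                     for y in range(1, n + 1)
                     for z in range(1, n + 2)]
    a = sum(1 for x, y, z in pts if x >= y and x >= z)   # x maximal
    b = sum(1 for x, y, z in pts if y > x and y >= z)    # y maximal, x not
    c = sum(1 for x, y, z in pts if z > x and z > y)     # z strictly largest
    return (a, b, c) == (S(2, n), S(2, n) - S(1, n), S(2, n)) \
        and 3 * S(2, n) == n * n * (n + 1) + S(1, n)

print(all(check_squares(n) for n in range(1, 15)))
```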

A striking fact about power sums is that $S_3(n)=S_1(n)^2$: that is, $1^3+2^3+\dots+n^3=(1+2+\dots+n)^2$. One way of explaining this can be found in a rather nice animation that I came across as a result of a Google Plus post by Richard Green. Another comes from continuing with the approach here.

This time I’ll let $B$ be the set of points $(x,y,z,w)$ such that $x$ and $y$ are between 1 and $n$ and $z$ and $w$ are between 1 and $n+1$. Again I’ll partition $B$ according to which of $x,y,z,w$ is the largest, taking the first one that’s largest when there is a tie. That gives me four sets. Here are their sizes.

$x$ first largest: $\sum_{m=1}^n m^3$.

$y$ first largest: $\sum_{m=1}^n (m-1)m^2$.

$z$ first largest: $\sum_{m=1}^{n+1} (m-1)^2 m$.

$w$ first largest: $\sum_{m=1}^{n+1} (m-1)^3$.

These sizes can be written as $S_3(n)$, $S_3(n)-S_2(n)$, $S_3(n)+S_2(n)$ and $S_3(n)$. So we get $4S_3(n)=n^2(n+1)^2$, which gives us that $S_3(n)=S_1(n)^2$. It also gives us a kind of explanation of that fact: for $S_1$ we decompose $n(n+1)$ into two equal pieces of size $S_1(n)$, while for $S_3$ we decompose $n^2(n+1)^2$ into four pieces that don’t quite all have size $S_3(n)$ but the two errors cancel out.
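The four-way partition is again easy to check by brute force. The Python sketch below is mine (not the post’s); it classifies each lattice point by its first largest coordinate:

```python
from itertools import product

# Check the four-way partition behind S3(n) = S1(n)^2: in the box
# [1,n]^2 x [1,n+1]^2, classify each point by its first largest coordinate.
def S(k, n):
    return sum(m ** k for m in range(1, n + 1))

def piece_sizes(n):
    counts = [0, 0, 0, 0]
    for p in product(range(1, n + 1), range(1, n + 1),
                     range(1, n + 2), range(1, n + 2)):
        counts[p.index(max(p))] += 1   # index of the first largest coordinate
    return counts

def check_cubes(n):
    s3, s2 = S(3, n), S(2, n)
    expected = [s3, s3 - s2, s3 + s2, s3]   # the errors -S2 and +S2 cancel
    return piece_sizes(n) == expected and 4 * s3 == n ** 2 * (n + 1) ** 2

print(all(check_cubes(n) for n in range(1, 10)))
```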

To see that this is a partial but not total coincidence, I’m going to jump to $k=5$ now. I’ll let $B$ be the set of points $(x,y,z,u,v,w)$ such that $x,y,z$ are between 1 and $n$ and $u,v,w$ are between 1 and $n+1$. This time the calculations are as follows.

$x$ first largest: $\sum_{m=1}^n m^5 = S_5(n)$.

$y$ first largest: $\sum_{m=1}^n (m-1)m^4 = S_5(n)-S_4(n)$.

$z$ first largest: $\sum_{m=1}^n (m-1)^2 m^3 = S_5(n)-2S_4(n)+S_3(n)$.

$u$ first largest: $\sum_{m=1}^{n+1} (m-1)^3 m^2 = S_5(n)+2S_4(n)+S_3(n)$.

$v$ first largest: $\sum_{m=1}^{n+1} (m-1)^4 m = S_5(n)+S_4(n)$.

$w$ first largest: $\sum_{m=1}^{n+1} (m-1)^5 = S_5(n)$.

Adding all these up we find that $n^3(n+1)^3 = 6S_5(n) + 2S_3(n)$. From that we get that

$S_5(n) = \frac{n^3(n+1)^3 - 2S_3(n)}{6} = \frac{n^2(n+1)^2(2n^2+2n-1)}{12}.$
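As a sanity check (my code, not from the post), exact arithmetic confirms both the identity and the closed form:

```python
from fractions import Fraction

# Check n^3 (n+1)^3 = 6*S5(n) + 2*S3(n), and the closed form
# S5(n) = n^2 (n+1)^2 (2n^2 + 2n - 1) / 12.
def S(k, n):
    return sum(m ** k for m in range(1, n + 1))

def check_fifth(n):
    lhs = n ** 3 * (n + 1) ** 3
    closed = Fraction(n ** 2 * (n + 1) ** 2 * (2 * n * n + 2 * n - 1), 12)
    return lhs == 6 * S(5, n) + 2 * S(3, n) and closed == S(5, n)

print(all(check_fifth(n) for n in range(1, 50)))
```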

In general, if we use this method to calculate $S_k(n)$ when $k$ is odd, then (as with other methods) we obtain a relationship between $S_k$ and earlier $S_j$. But what is nice about it is that there is a lot of cancellation: all the $S_j$ for even $j$ make a contribution of zero.

Indeed, if $k=2j-1$, then we have $2j$ sets, and their sizes are $S_k(n)$, $S_k(n)-S_{k-1}(n)$, $S_k(n)-2S_{k-1}(n)+S_{k-2}(n)$, and so on down to

$S_k(n) - \binom{j-1}{1}S_{k-1}(n) + \binom{j-1}{2}S_{k-2}(n) - \dots + (-1)^{j-1}\binom{j-1}{j-1}S_{k-j+1}(n),$

and then the same thing but with all minus signs in these linear combinations replaced by plus signs. Adding it all up we get a linear combination of $S_k(n), S_{k-2}(n), S_{k-4}(n), \dots$ equalling $n^j(n+1)^j$, where the coefficient of $S_{k-t}(n)$ is $2\binom{j}{t+1}$ when $t$ is even and $0$ when $t$ is odd. When $j$ is small, we don’t have to take into account too many $S_i$, so the formulae remain quite nice for a while before they become disgusting.
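The recurrence this method produces for odd $k$ can be implemented directly. Here is my own Python sketch (function names mine), using the coefficient $2\binom{j}{t+1}$ for even $t$ and solving for $S_k$:

```python
from fractions import Fraction
from math import comb

# The odd-k recurrence: with k = 2j - 1,
#   sum over even t of 2*C(j, t+1) * S_{k-t}(n) = n^j * (n+1)^j,
# so S_k can be computed from earlier odd-index power sums.
def S(k, n):
    return sum(m ** k for m in range(1, n + 1))

def S_odd(k, n):
    """Compute S_k(n) for odd k via the lattice-point recurrence."""
    if k == 1:
        return Fraction(n * (n + 1), 2)
    j = (k + 1) // 2
    total = Fraction(n ** j * (n + 1) ** j)
    for t in range(2, j, 2):          # even t >= 2 with nonzero coefficient
        total -= 2 * comb(j, t + 1) * S_odd(k - t, n)
    return total / (2 * comb(j, 1))   # coefficient of S_k itself is 2j

print(all(S_odd(k, n) == S(k, n)
          for k in (1, 3, 5, 7, 9) for n in range(1, 12)))
```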

Note that it is quite easy to work out the coefficients of the various $S_i$ in the above linear combination: they are just sums of binomial coefficients. Several other methods require one to solve simultaneous equations, though they are usually in triangular form, so not too bad.

A small remark is that the basic idea of this argument is to discretize a much easier continuous argument that shows that $\int_0^x t^k\,dt = \frac{x^{k+1}}{k+1}$. That argument is to take the $(k+1)$-dimensional cube consisting of all points $(t_1,\dots,t_{k+1})$ such that each $t_i$ belongs to the interval $[0,x]$ and partition it into pieces according to which coordinate is largest. (I call the argument geometrical because these pieces are pyramids with $k$-cube bases.) In the continuous case, we don’t have to worry about what happens if there is more than one largest coordinate, since that set has measure zero. Each piece has measure $\int_0^x t^k\,dt$, and there are $k+1$ pieces, while the cube has measure $x^{k+1}$, so we are done.

A second remark is that the method I previously knew of for calculating sums of $k$th powers is to exploit the fact that the falling factorial deserves to be called the natural discrete analogue of a power. Define $m^{(k)}$ to be $m(m-1)\dots(m-k+1)$. This we think of as the discrete analogue of $m^k$. Then we do a discrete analogue of differentiating, which is to look at $(m+1)^{(k)}-m^{(k)}$, which equals $km^{(k-1)}$. This is the discrete analogue of the fact that the derivative of $x^k$ is $kx^{k-1}$. Next, we use the discrete analogue of the fundamental theorem of calculus, which is the statement that $\sum_{m=1}^n\bigl(f(m+1)-f(m)\bigr)=f(n+1)-f(1)$, to deduce that $\sum_{m=1}^n m^{(k)}=\frac{(n+1)^{(k+1)}}{k+1}$. This gives us for each $k$ a polynomial of degree $k$ that we can sum easily, namely $m^{(k)}$, and then to work out the sum of the first $n$ $k$th powers, we write $m^k$ as a linear combination of the $m^{(j)}$, sum everything, and simplify the answer. That works fine, but the calculations are quite a bit more complicated than what I did above, and the proof is too algebraic to explain why the answers have the fairly nice forms they do. (For example, why is $n^2(n+1)^2$ a factor of the sum of the first $n$ $k$th powers whenever $k$ is odd and at least 3? From the argument I gave earlier in the post, this follows fairly easily by induction.)
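This falling-factorial method can also be sketched in a few lines. The code below is mine, not the post’s; the weights expressing $m^k$ as a combination of falling factorials are (a fact the post doesn’t name) Stirling numbers of the second kind:

```python
from fractions import Fraction

# Falling-factorial method: write m^k = sum_j S2(k,j) * m^(j), where
# m^(j) = m(m-1)...(m-j+1) and S2 is a Stirling number of the second
# kind, then sum each m^(j) exactly via sum_{m=1}^n m^(j) = (n+1)^(j+1)/(j+1).
def falling(m, j):
    out = 1
    for i in range(j):
        out *= m - i
    return out

def stirling2(k, j):
    """Stirling numbers of the second kind, by the standard recurrence."""
    if k == j == 0:
        return 1
    if k == 0 or j == 0:
        return 0
    return j * stirling2(k - 1, j) + stirling2(k - 1, j - 1)

def power_sum(k, n):
    return sum(Fraction(stirling2(k, j) * falling(n + 1, j + 1), j + 1)
               for j in range(1, k + 1))

print(all(power_sum(k, n) == sum(m ** k for m in range(1, n + 1))
          for k in range(1, 8) for n in range(1, 10)))
```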

November 5, 2014 at 12:08 am |

1) Once upon a time David Bernstein explained to me the combinatorial meaning of the formula for the sum of m(m-1)…(m-k+1)/k! where m runs from 1 to n. To pick k+1 numbers out of the first n+1, we pick the biggest one in our future sample first, and then pick k numbers from the ones that are smaller.

2) From the finite-difference considerations it is clear that the formula for the sum of the k-th powers is a polynomial of degree k+1 in n. Actually we can just guess that from examples. Then we can figure out the coefficients of this polynomial by fitting the small-n data, and then finish the proof by induction.

November 5, 2014 at 4:53 am |

An approach similar to yours is explained in sections 3.1-3.4 of “Mathematical Discovery” by George Polya, where the whole chapter 3 is called “Recursion.”

November 5, 2014 at 1:59 pm

Thanks — I’ll check that out.

November 5, 2014 at 5:58 am |

Reblogged this on Singapore Maths Tuition and commented:

Very nice proof for the sum of kth powers. Recommended for reading!

November 5, 2014 at 8:08 am |

I’ll look at this proof later, but I have a somewhat off-topic comment: I find the TeX’d formulas really hard to read, as the font is so much thinner than the surrounding text font. I am old enough to have developed some cataracts, and reading the mathematics here takes real effort. Could you consider either moving to MathJax, or else configuring your setup to use a fatter font? I imagine either STIX or Knuth’s Concrete fonts would be an improvement. But MathJax, or some clever solution using MathML, is best, for then the formulas will scale along when I scale up the text.

November 10, 2014 at 6:22 pm

I am using the latest Firefox on a Mac; for me, the math does scale properly along with the text, when I make the text larger using cmd-+.

November 5, 2014 at 9:01 am |

This is nice! Seems in the case $k=1$ that you want the subdivision to be $x\geq y$ and $y>x$, which the current version has reversed.

Thanks — I’ve corrected that now.

November 5, 2014 at 11:00 am |

Looks like inclusion-exclusion.

Not sure if that’s significant or obvious.

November 5, 2014 at 2:13 pm

It is significant, in that it can be used to give a slightly different method of proof. I’ll illustrate it with summing squares. This time, let’s take the discrete cube consisting of all points $(x,y,z)$ with every coordinate between 1 and $n$, and let’s simply count points with $x$ maximal, points with $y$ maximal and points with $z$ maximal. I’ll call these sets $A$, $B$ and $C$.

Then $A$, $B$ and $C$ each have size $S_2(n)$. The intersection of any two of them has size $S_1(n)$. And finally the intersection of all three of them has size $S_0(n)=n$. So by the inclusion-exclusion formula we get

$n^3 = 3S_2(n) - 3S_1(n) + S_0(n).$

That gives us that $S_2(n) = \frac{n^3 + 3S_1(n) - S_0(n)}{3}$, which equals $\frac{n(n+1)(2n+1)}{6}$.

In general, if we do things this way, we get that

$n^{k+1} = \sum_{i=1}^{k+1} (-1)^{i-1}\binom{k+1}{i}S_{k+1-i}(n).$

For example, $n^3 = 3S_2(n) - 3S_1(n) + S_0(n)$, so $S_2(n) = \frac{n^3 + 3S_1(n) - S_0(n)}{3}$.

This formula is perhaps simpler conceptually than the one in the post, but requires more calculation because you have to calculate all the earlier $S_j$.
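The inclusion-exclusion recurrence is straightforward to implement; here is a short sketch of my own (names mine), solving for the $i=1$ term:

```python
from fractions import Fraction
from math import comb

# Inclusion-exclusion recurrence:
#   n^(k+1) = sum_{i=1}^{k+1} (-1)^(i-1) * C(k+1, i) * S_{k+1-i}(n),
# rearranged to compute S_k(n) from all the earlier power sums.
def S_ie(k, n):
    if k == 0:
        return Fraction(n)
    total = Fraction(n ** (k + 1))
    for i in range(2, k + 2):
        total -= (-1) ** (i - 1) * comb(k + 1, i) * S_ie(k + 1 - i, n)
    return total / (k + 1)

print(all(S_ie(k, n) == sum(m ** k for m in range(1, n + 1))
          for k in range(0, 7) for n in range(1, 10)))
```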

November 6, 2014 at 2:32 pm |

A really quick approach to summing $k$th powers, which I think is different from both your method here and the method you mention at the end (which was previously the only one I knew too) was pointed out to me in 2008 by Gareth Taylor, which is to observe that (for example)

$\sum_{m=1}^n (m+1)^3 = \sum_{m=1}^n (m^3 + 3m^2 + 3m + 1)$

and since most of the LHS sum of cubes cancels against most of the RHS sum of cubes, we can rearrange trivially to get a formula for the next term down, namely $S_2(n)$:

$3S_2(n) = (n+1)^3 - 1 - 3S_1(n) - n.$

This approach generalises to give a formula for each $S_k$ in terms of all the previous ones:

$(k+1)S_k(n) = (n+1)^{k+1} - 1 - \binom{k+1}{2}S_{k-1}(n) - \binom{k+1}{3}S_{k-2}(n) - \dots - (k+1)S_1(n) - n,$

and so on in an obvious pattern.
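A minimal Python sketch of this telescoping recurrence (my code, using exact rational arithmetic):

```python
from fractions import Fraction
from math import comb

# Telescoping recurrence: expanding sum (m+1)^(k+1) and cancelling gives
#   (k+1)*S_k(n) = (n+1)^(k+1) - 1 - sum_{i=2}^{k+1} C(k+1, i) * S_{k+1-i}(n).
def S_tel(k, n):
    if k == 0:
        return Fraction(n)
    total = Fraction((n + 1) ** (k + 1) - 1)
    for i in range(2, k + 2):
        total -= comb(k + 1, i) * S_tel(k + 1 - i, n)
    return total / (k + 1)

print(all(S_tel(k, n) == sum(m ** k for m in range(1, n + 1))
          for k in range(0, 7) for n in range(1, 10)))
```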

November 6, 2014 at 6:27 pm

That is indeed very easy conceptually, though the trick in the approach in the post leads to cancellation that means that it requires less calculation. Also, with this approach it looks like a miracle that $S_3(n)$ ends up simplifying down to $S_1(n)^2$, rather than being a very natural consequence of the argument.

November 7, 2014 at 12:45 am |

[…] Last week, I put up a post on deriving a formula for $1^k+2^k+\dots+n^k$ for positive integer $k$, using the method of telescoping sums. Interestingly enough, Tim Gowers, a Fields Medallist, just put up a post giving a very elegant outline of how to derive these formulas. (See his post here.) […]

November 10, 2014 at 8:42 pm |

Very nice! Tangentially related: do you happen to know what (if any) is the next term in the sequence of equations:

$$3^2 + 4^2 = 5^2$$

$$3^3 + 3^4 + 3^5 = 3^6$$

?

November 10, 2014 at 8:43 pm

(Also: how do I make part of my replies into Latex if it is apparently not by typing ‘$$’?)

November 11, 2014 at 12:33 am

O I misspelled, I meant:

$$3^2 + 4^2 = 5^2$$

$$3^3 + 4^3 + 5^3 = 6^3$$

which is not only more related to the post than what I wrote, but also more true.

November 12, 2014 at 10:09 am

I didn’t know that $3^3+4^3+5^3=6^3$. (To get the latex to appear one writes @latex 3^3+4^3+5^3=6^3@ except that instead of the @ signs one puts $ signs.)

I presume the obvious pattern doesn’t continue. In other words, I presume that it is

not true that $3^4+4^4+5^4+6^4=7^4$. Indeed, it can’t hold because the left-hand side is even. One could also ask the much more general question of whether there are other examples of identities of the form

$m^k + (m+1)^k + \dots + (m+k-1)^k = (m+k)^k.$

For that to hold, we need the $k$ terms on the left-hand side to add up to a single $k$th power just beyond them, which means that $m$ must be small enough that the powers are still increasing pretty fast. So we need $m$ to be of a similar order of magnitude to $k$. The ratio between consecutive terms should be around $e^{k/m}$, and for the sum to collapse in this way that ratio needs to be around 2, so $k/m$ needs to be around $\log 2$. A back-of-envelope calculation then tells us that $m$ will have to be of similar magnitude to $k/\log 2$. But whether there are any more solutions I don’t know: I wouldn’t expect an infinite family of solutions.

In fact, I think there could be a heuristic argument that suggests that there are only finitely many solutions. Roughly speaking, for each $k$ there will be only a small number of $m$s that could possibly work. If you now test each one, it will have a probability of … this is the bit I’m not quite sure about, but I’m pretty confident that the answer is small … something like $e^{-k}$ of actually working. Summing over all $k$ gives a finite result.

November 16, 2014 at 6:33 am |

Very nice argument. btw there is a typo right below “and so on down to” where should be

Thanks — I’ve put that right now.

November 18, 2014 at 12:00 am |

Hey Tim how are you? How is the maths going? Tom Bancroft ex-Fineline drummer

June 3, 2015 at 9:21 am |

Very nice article.

September 15, 2015 at 10:05 pm |

My approach for summing kth powers is similar in spirit, but uses binomial coefficients from Pascal’s triangle. First we need to write n^k in terms of binomial coefficients. Let’s try k=3. I write the values for j^3 on a line and then I look at differences, writing those on a second line. Then I write the differences of the second line on the third line, etc.

0, 1, 8, 27, 64, 125, …

1, 7, 19, 37, 61, …

6, 12, 18, 24, …

6, 6, 6, 6, …

0, 0, 0, …

From this I read the first column to tell me that j^3 = 0 (j choose 0) + 1 (j choose 1) + 6 (j choose 2) + 6 (j choose 3). Then I observe on Pascal’s triangle that the sum of (j choose i) from j = 0 to n is simply (n+1 choose i+1). Thus S_3(n) = 0 (n+1 choose 1) + 1 (n+1 choose 2) + 6 (n+1 choose 3) + 6 (n+1 choose 4).
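This difference-table procedure is mechanical enough to automate. Here is my own short Python sketch of it (function name mine):

```python
from math import comb

# Difference-table method: the first column of the iterated difference
# table of j^k gives coefficients c_i with j^k = sum_i c_i * C(j, i),
# and hence S_k(n) = sum_i c_i * C(n+1, i+1) by the hockey-stick identity.
def pascal_sum(k, n):
    row = [j ** k for j in range(k + 2)]   # enough values: degree k polynomial
    coeffs = []
    while any(row):
        coeffs.append(row[0])              # top of the current difference row
        row = [b - a for a, b in zip(row, row[1:])]
    return sum(c * comb(n + 1, i + 1) for i, c in enumerate(coeffs))

print(all(pascal_sum(k, n) == sum(m ** k for m in range(1, n + 1))
          for k in range(1, 7) for n in range(1, 10)))
```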

April 10, 2017 at 9:44 pm |

There is a simple method to work out the result. Namely,

S'_k = k*S_(k-1) + constant.

Actually the constant = (-1)^k * B(k), where B(k) is the kth Bernoulli number:

B(0) = 1, B(1) = -1/2, B(2) = 1/6, …

Even if you do not know Bernoulli numbers, you may use simple substitution to work out the "constant":

using simple integration and substitution, you can then work out all kth power sums.

For example,

S0 = n, S1 = (n^2)/2 + cn (integration of S0)

put n = 1, S1(1) = 1 = (1^2)/2 + c, c= 1/2

hence, S1 = (n^2)/2 + n/2

2*S1 = (n^2) + n

integrating 2*S1 gives (n^3)/3 + (n^2)/2

hence S2 = (n^3)/3 + (n^2)/2 + cn

put n = 1, c = 1/6

S2 = (n^3)/3 + (n^2)/2 + n/6 = n(n+1)(2n+1)/6

S3, S4… can be worked out similarly.
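This integrate-and-fix-the-constant procedure is easy to automate with exact rationals. The sketch below is mine (names mine); the linear coefficient it produces at each step is the constant (-1)^k B(k) mentioned above:

```python
from fractions import Fraction

# Integrate-and-fix-the-constant method: starting from S_0(n) = n,
# integrate k * S_{k-1} as a polynomial in n, then choose the n^1
# coefficient so that S_k(1) = 1.
def next_power_sum(prev):
    """prev[i] is the n^(i+1) coefficient of S_{k-1}; returns the same for S_k."""
    k = len(prev)                               # S_{k-1} has degree k
    new = [Fraction(0)] * (k + 1)
    for i, a in enumerate(prev):
        new[i + 1] = Fraction(k) * a / (i + 2)  # integral of k * a * n^(i+1)
    new[0] = 1 - sum(new)                       # fix the constant: S_k(1) = 1
    return new

def power_sum_poly(k):
    coeffs = [Fraction(1)]                      # S_0(n) = n
    for _ in range(k):
        coeffs = next_power_sum(coeffs)
    return coeffs

def evaluate(coeffs, n):
    return sum(a * n ** (i + 1) for i, a in enumerate(coeffs))

print(all(evaluate(power_sum_poly(k), n) == sum(m ** k for m in range(1, n + 1))
          for k in range(0, 6) for n in range(1, 10)))
```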

December 6, 2018 at 2:25 pm |

“Application of derivative in finding sum of powers of natural numbers” SCHOOL SCIENCE 1994 is also worth reading.

March 16, 2022 at 2:52 pm |

[…] The formula follows very nicely from a combinatorial argument (due to Timothy Gowers, see his blog for all the […]
