Recently I was giving a talk at a school in London and had cause to think about sums of kth powers, because I wanted to show people the difference between unenlightening proofs and enlightening ones. (My brief was to try to give them some idea of what proofs are and why they are worth bothering about.) On the train down, I realized that there is a very simple general argument that can be used to work out formulae for sums of kth powers. After a brief look online, I haven’t found precisely this method, though I’ve found plenty of methods in the vicinity of it. It’s bound to be in the literature somewhere, so perhaps somebody can point me towards a reference. But even so, it’s worth a short post.
I’ll start with the case $k=1$. I want to phrase a familiar argument in a way that will make it easy to generalize in the way I want to generalize it. Let $R$ be the rectangle consisting of all integer points $(x,y)$ such that $1\leq x\leq n+1$ and $1\leq y\leq n$. We can partition $R$ into those points for which $x>y$ and those points for which $x\leq y$. The number of points of the first kind is $\sum_{m=1}^n m$, since for each $x$ we get $x-1$ possibilities for $y$. The number of points of the second kind is also $\sum_{m=1}^n m$, since for each $y$ we get $y$ possibilities for $x$. Therefore, $n(n+1)=2\sum_{m=1}^n m$, and we get the familiar formula.
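For anyone who wants to see the partition concretely, here is a three-line sanity check (a brute-force sketch, not part of the argument):

```python
n = 100
R = [(x, y) for x in range(1, n + 2) for y in range(1, n + 1)]  # the rectangle
first_kind = [(x, y) for (x, y) in R if x > y]
second_kind = [(x, y) for (x, y) in R if x <= y]
assert len(first_kind) == len(second_kind) == n * (n + 1) // 2
```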
Now let’s move to sums of squares. This time I’ll let $B$ be the set of points $(x,y,z)$ with $1\leq x\leq n+1$, $1\leq y\leq n$ and $1\leq z\leq n$. We can partition $B$ into three sets, the first consisting of points for which $z$ is maximal, the second of points for which $y$ is maximal and $z$ is not maximal, and the third of points for which $x$ is strictly larger than both $y$ and $z$. The numbers of points in these sets are easily seen to be $\sum_{m=1}^n m^2$, $\sum_{m=1}^n m(m-1)$ and $\sum_{m=1}^{n+1}(m-1)^2$, respectively. This gives us the formula
$$\sum_{m=1}^n m^2+\sum_{m=1}^n m(m-1)+\sum_{m=1}^{n+1}(m-1)^2=n^2(n+1),$$
from which we get the familiar formula for the sum of the first $n$ squares. Writing $S_k(n)$ for the sum of the first $n$ $k$th powers, we also get the relationship
$$3S_2(n)=n^2(n+1)+S_1(n).$$
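To spell out the last step: substituting $S_1(n)=n(n+1)/2$ gives
$$3S_2(n)=n^2(n+1)+\frac{n(n+1)}{2}=\frac{n(n+1)(2n+1)}{2},$$
so $S_2(n)=n(n+1)(2n+1)/6$.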
A striking fact about power sums is that $S_3(n)=S_1(n)^2$. One way of explaining this can be found in a rather nice animation that I came across as a result of a Google Plus post of Richard Green. Another comes from continuing with the approach here.
This time I’ll let $Q$ be the set of points $(x,y,z,w)$ such that $x$ and $y$ are between 1 and $n$ and $z$ and $w$ are between 1 and $n+1$. Again I’ll partition according to which of $x,y,z,w$ is the largest, taking the first one that’s largest when there is a tie. That gives me four sets. Here are their sizes.

$x$ first largest: $\sum_{m=1}^n m^3$.

$y$ first largest: $\sum_{m=1}^n (m-1)m^2$.

$z$ first largest: $\sum_{m=1}^{n+1} (m-1)^2m$.

$w$ first largest: $\sum_{m=1}^{n+1} (m-1)^3$.

These sizes can be written as $S_3(n)$, $S_3(n)-S_2(n)$, $S_3(n)+S_2(n)$ and $S_3(n)$. So we get $4S_3(n)=n^2(n+1)^2$, which gives us that $S_3(n)=\bigl(n(n+1)/2\bigr)^2=S_1(n)^2$. It also gives us a kind of explanation of that fact: for $S_1(n)^2$ we decompose $R$ into two equal pieces of size $S_1(n)$, so that $R\times R$ (which is a copy of $Q$) decomposes into four pieces of size $S_1(n)^2$, while for $S_3(n)$ we decompose $Q$ into four pieces that don’t quite all have size $S_3(n)$ but the two errors cancel out.
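For those who like to check such things by machine, here is a tiny brute-force verification of the four sizes (a sketch, not part of the argument):

```python
from itertools import product

def S(j, n):  # the sum of the first n jth powers
    return sum(m**j for m in range(1, n + 1))

n = 4
box = product(range(1, n + 1), range(1, n + 1),    # x, y up to n
              range(1, n + 2), range(1, n + 2))    # z, w up to n+1
sizes = [0, 0, 0, 0]
for p in box:
    sizes[p.index(max(p))] += 1    # index of the first largest coordinate
print(sizes)                                       # [100, 70, 130, 100] for n = 4
print([S(3, n), S(3, n) - S(2, n), S(3, n) + S(2, n), S(3, n)])
```

The two printed lists agree, and their common total is $n^2(n+1)^2=400$.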
To see that this is a partial but not total coincidence, I’m going to jump to $k=5$ now. I’ll let $P$ be the set of points $(x_1,x_2,x_3,x_4,x_5,x_6)$ such that $x_1,x_2,x_3$ are between 1 and $n$ and $x_4,x_5,x_6$ are between 1 and $n+1$. This time the calculations are as follows.

$x_1$ first largest: $\sum_{m=1}^n m^5=S_5(n)$.

$x_2$ first largest: $\sum_{m=1}^n (m-1)m^4=S_5(n)-S_4(n)$.

$x_3$ first largest: $\sum_{m=1}^n (m-1)^2m^3=S_5(n)-2S_4(n)+S_3(n)$.

$x_4$ first largest: $\sum_{m=1}^{n+1} (m-1)^3m^2=S_5(n)+2S_4(n)+S_3(n)$.

$x_5$ first largest: $\sum_{m=1}^{n+1} (m-1)^4m=S_5(n)+S_4(n)$.

$x_6$ first largest: $\sum_{m=1}^{n+1} (m-1)^5=S_5(n)$.

Adding all these up we find that $6S_5(n)+2S_3(n)=n^3(n+1)^3$. From that we get that
$$S_5(n)=\frac{n^3(n+1)^3-2S_3(n)}{6}=\frac{n^3(n+1)^3}{6}-\frac{n^2(n+1)^2}{12}.$$
In general, if we use this method to calculate $S_k(n)$ when $k$ is odd, then (as with other methods) we obtain a relationship between $S_k(n)$ and earlier $S_j(n)$. But what is nice about it is that there is a lot of cancellation: all the $S_j(n)$ for $j$ even make a contribution of zero.
Indeed, if $k+1=2r$, then we have $2r$ sets, and their sizes are $S_k(n)$, $S_k(n)-S_{k-1}(n)$, $S_k(n)-2S_{k-1}(n)+S_{k-2}(n)$, and so on down to
$$S_k(n)-\binom{r-1}{1}S_{k-1}(n)+\binom{r-1}{2}S_{k-2}(n)-\cdots+(-1)^{r-1}S_{k-r+1}(n),$$
and then the same thing but with all minus signs in these linear combinations replaced by plus signs. Adding it all up we get a linear combination of $S_k(n)$, $S_{k-1}(n)$, … , $S_{k-r+1}(n)$ equalling $n^r(n+1)^r$, in which the coefficient of $S_j(n)$ turns out to be zero if $j$ is even and positive if $j$ is odd. When $k$ is small, we don’t have to take into account too many $S_j(n)$, so the formulae remain quite nice for a while before they become disgusting.
Note that it is quite easy to work out the coefficients of the various $S_j(n)$ in the above linear combination: they are just sums of binomial coefficients. Several other methods require one to solve simultaneous equations, though they are usually in triangular form, so not too bad.
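For what it’s worth, the cancellation can be checked mechanically. In the sketch below I use the closed form $2\binom{r}{j+1}$ for the coefficient of $S_{k-j}(n)$ (for even $j$), which is my own summary of those sums of binomial coefficients via the hockey-stick identity, so treat it as a hedged inference:

```python
from math import comb

def S(j, n):  # the sum of the first n jth powers
    return sum(m**j for m in range(1, n + 1))

for k in (3, 5, 7, 9):        # odd k, with k + 1 = 2r coordinates
    r = (k + 1) // 2
    for n in range(1, 8):
        lhs = sum(2 * comb(r, j + 1) * S(k - j, n) for j in range(0, r, 2))
        assert lhs == n**r * (n + 1)**r
print("identity checked for k = 3, 5, 7, 9")
```

For $k=5$ this reproduces the relationship $6S_5(n)+2S_3(n)=n^3(n+1)^3$ above.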
A small remark is that the basic idea of this argument is to discretize a much easier continuous argument that shows that $\int_0^1x^k\,dx=\frac{1}{k+1}$. That argument is to take the $(k+1)$-dimensional cube consisting of all points $(x_1,\dots,x_{k+1})$ such that each $x_i$ belongs to the interval $[0,1]$, and partition it into $k+1$ pieces according to which coordinate is largest. (I call the argument geometrical because these pieces are pyramids with $k$-cube bases.) In the continuous case, we don’t have to worry about what happens if there is more than one largest coordinate, since that set has measure zero. Each piece has measure $\int_0^1x^k\,dx$, and there are $k+1$ pieces, while the cube has measure 1, so we are done.
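Explicitly, for the piece where the last coordinate is largest, slicing at $x_{k+1}=t$ leaves a $k$-cube $[0,t]^k$ of volume $t^k$, so
$$\operatorname{vol}\bigl\{x\in[0,1]^{k+1}:x_{k+1}\geq x_1,\dots,x_k\bigr\}=\int_0^1t^k\,dt,$$
and equating $(k+1)\int_0^1t^k\,dt$ with the cube’s measure 1 gives $\int_0^1t^k\,dt=\frac{1}{k+1}$.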
A second remark is that the method I previously knew of for calculating sums of $k$th powers is to exploit the fact that the polynomial $m(m-1)(m-2)\cdots(m-k+1)$ deserves to be called the natural discrete analogue of $m^k$. Define $m^{(k)}$ to be $m(m-1)(m-2)\cdots(m-k+1)$. This we think of as the discrete analogue of $m^k$. Then we do a discrete analogue of differentiating, which is to look at $m^{(k)}-(m-1)^{(k)}$, which equals $k(m-1)^{(k-1)}$. This is the discrete analogue of the fact that the derivative of $x^k$ is $kx^{k-1}$. Next, we use the discrete analogue of the fundamental theorem of calculus, which is the statement that
$$\sum_{m=1}^n\bigl(f(m)-f(m-1)\bigr)=f(n)-f(0),$$
to deduce that $\sum_{m=1}^n(m-1)^{(k-1)}=n^{(k)}/k$. This gives us for each $k$ a polynomial of degree $k$ that we can sum easily, namely $m^{(k)}$, and then to work out the sum of the first $n$ $k$th powers, we write $m^k$ as a linear combination of the $m^{(j)}$, sum everything, and simplify the answer. That works fine, but the calculations are quite a bit more complicated than what I did above, and the proof is too algebraic to explain why the answers have the fairly nice forms they do. (For example, why is $\bigl(n(n+1)/2\bigr)^2$ a factor of the sum of the first $n$ $k$th powers whenever $k$ is odd and at least 3? From the argument I gave earlier in the post, this follows fairly easily by induction.)
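As a worked instance of that recipe: one checks easily that $m^3=m^{(3)}+3m^{(2)}+m^{(1)}$, and telescoping gives $\sum_{m=1}^nm^{(j)}=(n+1)^{(j+1)}/(j+1)$, so
$$S_3(n)=\frac{(n+1)^{(4)}}{4}+(n+1)^{(3)}+\frac{(n+1)^{(2)}}{2},$$
which does simplify to $\bigl(n(n+1)/2\bigr)^2$, though the calculation gives no hint of why the answer should be a square.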
November 5, 2014 at 12:08 am |
1) Once upon a time David Bernstein explained to me the combinatorial meaning of the formula for the sum of m(m-1)…(m-k+1)/k! where m runs from 1 to n: the sum is (n+1 choose k+1). To pick k+1 numbers out of the first n+1, we pick the biggest one in our future sample first, and then pick k numbers from the ones that are smaller.
2) From the finite difference considerations it is clear that the formula for the sum of the k-th powers is a polynomial of degree k+1 in n. Actually we can just guess that from examples. Then we can figure out the coefficients of this polynomial by fitting the small n data, and then finish the proof by induction.
November 5, 2014 at 4:53 am |
An approach similar to yours is explained in sections 3.1-3.4 of “Mathematical Discovery” by George Polya, where the whole chapter 3 is called “Recursion.”
November 5, 2014 at 1:59 pm
Thanks — I’ll check that out.
November 5, 2014 at 5:58 am |
Reblogged this on Singapore Maths Tuition and commented:
Very nice proof for the sum of kth powers. Recommended for reading!
November 5, 2014 at 8:08 am |
I’ll look at this proof later, but I have a somewhat off topic comment: I find the TeX’d formulas really hard to read, as the font is so much thinner than the surrounding text font. I am old enough to have developed some cataracts, and reading the mathematics here takes real effort. Could you consider either moving to MathJax, or else configuring your setup to use a fatter font? I imagine either stix or Knuth’s concrete fonts would be an improvement. But MathJax, or some clever solution using MathML, is best, for then the formulas will scale along when I scale up the text.
November 10, 2014 at 6:22 pm
I am using the latest Firefox on a Mac; for me, the math does scale properly along with the text, when I make the text larger using cmd-+.
November 5, 2014 at 9:01 am |
This is nice! Seems in the $k=1$ case that you want the subdivision to be $x>y$ and $x\leq y$, which the current version has reversed.
Thanks — I’ve corrected that now.
November 5, 2014 at 11:00 am |
Looks like inclusion-exclusion.
Not sure if that’s significant or obvious.
November 5, 2014 at 2:13 pm
It is significant, in that it can be used to give a slightly different method of proof. I’ll illustrate it with summing squares. This time, let’s take the discrete cube consisting of all points $(x,y,z)$ with every coordinate between 1 and $n$, and let’s simply count points with $x$ maximal, points with $y$ maximal and points with $z$ maximal. I’ll call these sets $A$, $B$ and $C$.

Then $A$, $B$ and $C$ each have size $S_2(n)$. The intersection of any two of them has size $S_1(n)$. And finally the intersection of all three of them has size $n$. So by the inclusion-exclusion formula we get
$$n^3=3S_2(n)-3S_1(n)+n.$$
That gives us that $S_2(n)=\frac{n^3-n}{3}+S_1(n)$, which equals $\frac{n(n+1)(2n+1)}{6}$.

In general, if we do things this way, we get that
$$n^{k+1}=\binom{k+1}{1}S_k(n)-\binom{k+1}{2}S_{k-1}(n)+\cdots+(-1)^k\binom{k+1}{k+1}S_0(n).$$
For example, $n^4=4S_3(n)-6S_2(n)+4S_1(n)-n$, so $S_3(n)=\frac{n^4+6S_2(n)-4S_1(n)+n}{4}$.

This formula is perhaps simpler conceptually than the one in the post, but requires more calculation because you have to calculate all the earlier $S_j(n)$.
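That recursion is easy to run mechanically. Here is a sketch using sympy (my rendering of the displayed formula into code; the library choice is incidental):

```python
import sympy as sp

n = sp.symbols('n')
S = [n]                                  # S_0(n) = n
for k in range(1, 6):
    # n^(k+1) = C(k+1,1) S_k - C(k+1,2) S_{k-1} + ..., solved for S_k
    rhs = n**(k + 1) + sum((-1)**i * sp.binomial(k + 1, i) * S[k + 1 - i]
                           for i in range(2, k + 2))
    S.append(sp.factor(rhs / (k + 1)))
print(S[2])                              # n*(n + 1)*(2*n + 1)/6
print(S[3])                              # n**2*(n + 1)**2/4
```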
November 6, 2014 at 2:32 pm |
A really quick approach to summing $k$th powers, which I think is different from both your method here and the $m^{(k)}$ method you mention at the end (which was previously the only one I knew too) was pointed out to me in 2008 by Gareth Taylor, which is to observe that (for example)
$$\sum_{m=1}^n(m+1)^3=\sum_{m=1}^n(m^3+3m^2+3m+1),$$
and since most of the LHS sum of $(m+1)^3$ cancels against most of the RHS sum of $m^3$, we can rearrange trivially to get a formula for the next term down, namely $S_2(n)$:
$$3S_2(n)=(n+1)^3-1-3S_1(n)-n.$$
This approach generalises to give a formula for each $S_k(n)$ in terms of all the previous ones:
$$4S_3(n)=(n+1)^4-1-6S_2(n)-4S_1(n)-n,$$
$$5S_4(n)=(n+1)^5-1-10S_3(n)-10S_2(n)-5S_1(n)-n,$$
and so on in an obvious pattern.
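(As a quick numerical check of the first display: for $n=2$ it reads $3S_2(2)=27-1-9-2=15$, so $S_2(2)=5=1^2+2^2$, as it should be.)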
November 6, 2014 at 6:27 pm
That is indeed very easy conceptually, though the $n$-and-$(n+1)$ trick in the approach in the post leads to cancellation that means that it requires less calculation. Also, with this approach it looks like a miracle that $S_3(n)$ ends up simplifying down to $S_1(n)^2$, rather than being a very natural consequence of the argument.
November 7, 2014 at 12:45 am |
[…] Last week, I put up a post on deriving a formula for $1^k+2^k+\cdots+n^k$ for positive integer $k$, using the method of telescoping sums. Interestingly enough, Tim Gowers, a Fields Medallist, just put up a post giving a very elegant outline of how to derive these formulas. (See his post here.) […]
November 10, 2014 at 8:42 pm |
Very nice! Tangentially related: do you happen to know what (if any) is the next term in the sequence of equations:
$$3^2 + 4^2 = 5^2$$
$$3^3 + 3^4 + 3^5 = 3^6$$
?
November 10, 2014 at 8:43 pm
(Also: how do I make part of my replies into Latex if it is apparently not by typing ‘$$’?)
November 11, 2014 at 12:33 am
O I misspelled, I meant:
$$3^2 + 4^2 = 5^2$$
$$3^3 + 4^3 + 5^3 = 6^3$$
which is not only more related to the post than what I wrote, but also more true.
November 12, 2014 at 10:09 am
I didn’t know that $3^3+4^3+5^3=6^3$. (To get the latex to appear one writes @latex 3^3+4^3+5^3=6^3@ except that instead of the @ signs one puts $ signs.)

I presume the obvious pattern doesn’t continue. In other words, I presume that it is not true that $3^4+4^4+5^4+6^4=7^4$. Indeed, it can’t hold because the left-hand side is even.
One could also ask the much more general question of whether there are other examples of identities of the form
$$m^k+(m+1)^k+\cdots+(m+r)^k=(m+r+1)^k.$$
For that to hold, we need $\sum_{i\geq1}\bigl(\frac{m+r+1-i}{m+r+1}\bigr)^k$ to be about 1, which means that $m+r$ must be small enough that the powers are still increasing pretty fast. So we need $m+r$ to be of a similar order of magnitude to $k$. This order of magnitude should be around $k/\log 2$, so that the ratio $\bigl(\frac{m+r+1}{m+r}\bigr)^k$ of consecutive powers is around 2. A back-of-envelope calculation then tells us that $r$ will have to be of similar magnitude to $\log k$. But whether there are any more solutions I don’t know: I wouldn’t expect an infinite family of solutions.

In fact, I think there could be a heuristic argument that suggests that there are only finitely many solutions. Roughly speaking, for each $k$ there will be only a small number of $m$s that could possibly work. If you now test each one, it will have a probability of … this is the bit I’m not quite sure about, but I’m pretty confident that the answer is small … something like $k^{-k}$ of actually working. Summing over all $k$ gives a finite result.
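A brute-force search over a small window makes the rarity vivid (a sketch; the window sizes $10k$ are arbitrary assumptions, chosen with $m\approx k/\log 2$ in mind):

```python
solutions = []
for k in range(2, 9):
    for m in range(1, 10 * k):
        s = m ** k
        for r in range(1, 10 * k):
            s += (m + r) ** k        # s = m^k + (m+1)^k + ... + (m+r)^k
            if s == (m + r + 1) ** k:
                solutions.append((k, m, r))
print(solutions)
# 3^2 + 4^2 = 5^2 and 3^3 + 4^3 + 5^3 = 6^3 show up as (2, 3, 1) and (3, 3, 2)
```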
November 16, 2014 at 6:33 am |
Very nice argument. btw there is a typo right below “and so on down to”, where one of the displayed formulas isn’t what it should be.
Thanks — I’ve put that right now.
November 18, 2014 at 12:00 am |
Hey Tim how are you? How is the maths going? Tom Bancroft ex-Fineline drummer
June 3, 2015 at 9:21 am |
Very nice article.
September 15, 2015 at 10:05 pm |
My approach for summing kth powers is similar to the $m^{(k)}$ approach, but uses binomial coefficients in Pascal’s triangle. First we need to write n^k in terms of binomial coefficients. Let’s try k=3. I write the values of j^3 on a line and then I look at differences, writing those on a second line. Then I write the second line’s differences on the third line, etc.
0, 1, 8, 27, 64, 125, …
1, 7, 19, 37, 61, …
6, 12, 18, 24, …
6, 6, 6, 6, …
0, 0, 0, …
From this I read the first column to tell me that j^3 = 0 (j choose 0) + 1 (j choose 1) + 6 (j choose 2) + 6 (j choose 3). Then I observe on Pascal’s triangle that the sum of (j choose i) from j = 0 to n is simply (n+1 choose i+1). Thus S_3(n) = 0 (n+1 choose 1) + 1 (n+1 choose 2) + 6 (n+1 choose 3) + 6 (n+1 choose 4).
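This recipe mechanizes nicely; here is a sketch in Python (the function names are mine):

```python
from math import comb

def leading_column(k):
    """First column of the repeated-difference table of j^k, i.e. the
    coefficients c_i with j^k = sum_i c_i * C(j, i)."""
    row = [j ** k for j in range(k + 2)]
    coeffs = []
    for _ in range(k + 1):
        coeffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return coeffs                     # for k = 3 this is [0, 1, 6, 6]

def power_sum(k, n):
    """S_k(n) via the hockey-stick identity sum_j C(j, i) = C(n+1, i+1)."""
    return sum(c * comb(n + 1, i + 1) for i, c in enumerate(leading_column(k)))

assert power_sum(3, 10) == sum(m ** 3 for m in range(1, 11))   # 3025
```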
April 10, 2017 at 9:44 pm |
There is a simple method to work out the result. Namely,
S’_k = k*S_(k-1) + constant.
Actually the constant = (-1)^k * B(k), where B(k) is the kth Bernoulli number:
B(0) = 1, B(1) = -1/2, B(2) = 1/6, …
Even if you do not know Bernoulli numbers, you may use simple substitution to work out the constant.
Using simple integration and substitution, you can then work out all kth power sums.
For example,
S0 = n, S1 = (n^2)/2 + cn (integration of S0)
put n = 1, S1(1) = 1 = (1^2)/2 + c, c= 1/2
hence, S1 = (n^2)/2 + n/2
2*S1 = (n^2) + n
integrating 2*S1 gives (n^3)/3 + (n^2)/2
hence S2 = (n^3)/3 + (n^2)/2 + cn
put n = 1, c = 1/6
S2 = (n^3)/3 + (n^2)/2 + n/6 = n(n+1)(2n+1)/6
S3, S4… can be worked out similarly.
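A sketch of this integrate-and-substitute loop using sympy (my rendering of the comment’s recipe):

```python
import sympy as sp

n, c = sp.symbols('n c')
S = n                                          # S_0(n) = n
for k in range(1, 5):
    guess = sp.integrate(k * S, n) + c * n     # S_k' = k*S_{k-1} + constant
    const = sp.solve(sp.Eq(guess.subs(n, 1), 1), c)[0]   # pin c down via S_k(1) = 1
    S = sp.expand(guess.subs(c, const))
    print(k, sp.factor(S))     # k = 2 prints n*(n + 1)*(2*n + 1)/6, and so on
```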
December 6, 2018 at 2:25 pm |
“Application of derivative in finding sum of powers of natural numbers” SCHOOL SCIENCE 1994 is also worth reading.
March 16, 2022 at 2:52 pm |
[…] The formula follows very nicely from a combinatorial argument (due to Timothy Gowers, see his blog for all the […]