Another topic on the syllabus for the probability course I am giving is Stirling’s formula. This was lectured to me when I was an undergraduate but I had long since forgotten the proof completely. In fact, I’d even forgotten the precise statement, so I had some mugging up to do. It turned out to be hard to find a proof that didn’t look magic: the arguments had a “consider the following formula, do such-and-such, observe that this is less than that, bingo it drops out” sort of flavour. I find magic proofs impossible to remember, so I was forced to think a little. As a result, I came up with an argument that was mostly fully motivated, but there is one part that I still find mysterious. On the other hand, it looks so natural that I’m sure somebody can help me find a good explanation for why it works. When I say “I came up with an argument” what I mean is that I came up with a way of presenting an existing argument that doesn’t involve the pulling of rabbits out of hats, except in the place I’m about to discuss.
The first idea in proving Stirling’s formula is a very natural one: estimate $\log n!$ instead. Since that equals $\sum_{k=1}^n \log k$, it is natural to approximate the sum by the integral $\int_1^n \log x\,dx$. However, if you do that then the error will be around $\frac12\log n$. This gives you a good rough-and-ready approximation to $n!$ but nothing like as good as Stirling’s formula.
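To see how rough this first approximation is, here is a small numerical check of my own (not from the lecture): the gap between $\log n!$ and the integral really does grow like $\frac12\log n$.

```python
import math

# My own check (not part of the original argument): the error in
# approximating log(n!) by the integral of log x from 1 to n
# grows like (1/2) log n, so the approximation is only rough.
for n in [10, 100, 1000]:
    log_factorial = math.lgamma(n + 1)       # log(n!)
    integral = n * math.log(n) - n + 1       # integral of log x from 1 to n
    error = log_factorial - integral
    print(n, round(error, 4), round(0.5 * math.log(n), 4))
```

The two printed columns stay within a bounded distance of each other, which is exactly the point: the error is unbounded, but only logarithmically so.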
The next idea is to turn things round and try to approximate the integral by a sum rather than the other way round. The obvious next approximation to try is approximating $\log x$ by the function $g(x)$, where this is defined to equal $\log x$ when $x$ is an integer and to be linear in between consecutive integers. Then you can expect a much smaller error, and indeed it is not hard to prove that the difference between the integrals of the two functions is bounded. (The proof I gave for this dropped out rather nicely: if $n$ is a positive integer then by looking at derivatives it is easy to show that between $n$ and $n+1$ we have the inequalities $g(x)\le\log x\le g(x)+\frac1n-\frac1{n+1}$. Therefore, the difference between the integrals of $\log x$ and $g(x)$ between $n$ and $n+1$ is at most $\frac1n-\frac1{n+1}$. This gives a nice telescoping sum as an upper bound for the sum of all those areas.)
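Here is a quick numerical sketch of that bound (the function name is mine, and this is only a check, not part of the proof): the area between $\log x$ and its chord on $[n,n+1]$ can be computed exactly, and it obeys the claimed inequalities, so the total area telescopes to something bounded.

```python
import math

# My own verification of the chord bound: on [n, n+1] the piecewise-linear
# interpolant g of log x satisfies g(x) <= log x <= g(x) + (1/n - 1/(n+1)),
# so the area between the two curves telescopes to a bounded sum.
def area_between(n):
    # exact area of log x minus its chord on [n, n+1]
    integral_log = (n + 1) * math.log(n + 1) - n * math.log(n) - 1
    integral_chord = 0.5 * (math.log(n) + math.log(n + 1))
    return integral_log - integral_chord

total = 0.0
for n in range(1, 1001):
    a = area_between(n)
    assert 0 <= a <= 1 / n - 1 / (n + 1)   # the claimed inequalities
    total += a
print(total)  # bounded (well below the telescoped upper bound of 1)
```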
Another important observation is that the integral of $g$ between $1$ and $n$ works out to be $\log n! - \frac12\log n$ (because the integral between $k$ and $k+1$ is $\frac12(\log k + \log(k+1))$). Thus, we have a rather close relationship between $\log n!$ and an integral that we can calculate: $\int_1^n\log x\,dx = n\log n - n + 1$.
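This identity is easy to check numerically; the following is my own sanity check rather than anything from the lecture.

```python
import math

# My own sanity check: the trapezoidal sum gives
# integral of g on [1, n] = log(n!) - (1/2) log n exactly,
# and this stays within a bounded distance of n log n - n + 1.
for n in [5, 50, 500]:
    integral_g = sum(0.5 * (math.log(k) + math.log(k + 1)) for k in range(1, n))
    assert abs(integral_g - (math.lgamma(n + 1) - 0.5 * math.log(n))) < 1e-9
    print(n, integral_g - (n * math.log(n) - n + 1))  # bounded difference
```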
It is not hard to see that this argument shows that the ratio of $n!$ to $n^{n+1/2}e^{-n}$ tends to a limit. The one remaining magic part is the calculation of that limit. To do this, we define $a_n$ to be $n!/(n^{n+1/2}e^{-n})$. We have established that the sequence $(a_n)$ converges; our problem is equivalent to working out what the limit is. Here there is a very nice trick, which is to consider instead the sequence $a_n^2/a_{2n}$, which clearly converges to the same limit. But if you calculate $a_n^2/a_{2n}$ you find that you can get a very good estimate for it if and only if you can get a very good estimate for $\binom{2n}{n}$, essentially because inside what you write for $a_n^2/a_{2n}$ you get $(n!)^2/(2n)!$, which is $\binom{2n}{n}^{-1}$. This trick is I think reasonably non-magic: if you know in advance that you can calculate $\binom{2n}{n}$ very accurately, then it is natural to try to use that information, and there is essentially only one way of doing that.
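A hedged numerical illustration of the trick (the helper name `log_a` is mine): $a_n^2/a_{2n}$ simplifies to $\sqrt2\cdot 4^n/\bigl(\sqrt n\binom{2n}{n}\bigr)$, and both expressions visibly approach $\sqrt{2\pi}$.

```python
import math

# My own check of the a_n^2 / a_{2n} trick, with
# a_n = n! / (n^(n + 1/2) e^(-n)).
def log_a(n):
    # log of a_n, via lgamma to avoid overflowing floats
    return math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n

for n in [10, 100, 1000]:
    ratio = math.exp(2 * log_a(n) - log_a(2 * n))          # a_n^2 / a_{2n}
    # the same quantity written via the central binomial coefficient:
    # sqrt(2) * 4^n / (sqrt(n) * C(2n, n)), grouped to avoid overflow
    binom_form = math.sqrt(2 / n) * (4**n / math.comb(2 * n, n))
    print(n, ratio, binom_form, math.sqrt(2 * math.pi))    # all close
```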
Even the fact that one can get a very good handle on $\binom{2n}{n}$ is not as magic as all that, since it is equivalent to estimating the probability that a random subset of a set of size $2n$ has exactly $n$ elements. And that one can think about in all sorts of ways. For example, one can use the central limit theorem to estimate the probability that a random set has size between $n-\sqrt n$ and $n+\sqrt n$ and an elementary argument to show that the probabilities of the sizes within that range are all roughly equal.
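For what it is worth, here is a quick check (mine, and assuming a uniformly random subset, so each size-$n$ outcome has probability $\binom{2n}{n}/4^n$) that this probability behaves like $1/\sqrt{\pi n}$, which is what the central-limit heuristic predicts.

```python
import math

# My own check: C(2n, n) / 4^n is the probability that a uniformly random
# subset of a 2n-element set has exactly n elements, and the local
# central-limit heuristic suggests this is about 1 / sqrt(pi * n).
for n in [10, 100, 1000]:
    prob = math.comb(2 * n, n) / 4**n      # big-int division, no overflow
    print(n, prob, 1 / math.sqrt(math.pi * n))
```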
However, we have not yet got to the central limit theorem in the course, so at this point I copied another proof, which goes roughly like this. You look at the integrals $I_n=\int_0^{\pi/2}\sin^n x\,dx$. By integrating by parts in a standard way, you can get the recurrence $I_n=\frac{n-1}{n}I_{n-2}$. Since the $I_n$ clearly decrease, we get from this that the ratio of $I_n$ to $I_{n+1}$ tends to 1. But we can also give an exact formula for this ratio by using the recurrence and calculations of $I_0$ and $I_1$. If you do this and rearrange things a bit, then you find that you can show that $4^{-n}\binom{2n}{n}\sqrt{\pi n}$ tends to $1$.
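The computation can be sketched numerically as follows (the function name is mine; the recurrence and base cases $I_0=\pi/2$, $I_1=1$ are the standard Wallis ones):

```python
import math

# My own numerical sketch of the Wallis computation: with
# I_m the integral of sin^m x over [0, pi/2], the recurrence
# I_m = ((m - 1) / m) * I_{m-2} starting from I_0 = pi/2 and I_1 = 1
# gives exact values; I_{2n} / I_{2n+1} -> 1 then forces
# 4^{-n} C(2n, n) sqrt(pi n) -> 1.
def wallis(m):
    value = math.pi / 2 if m % 2 == 0 else 1.0   # base case I_0 or I_1
    for k in range(m % 2 + 2, m + 1, 2):
        value *= (k - 1) / k                     # apply the recurrence
    return value

for n in [10, 100, 1000]:
    ratio = wallis(2 * n) / wallis(2 * n + 1)
    # grouped as int/int first to avoid float overflow
    stirling_piece = math.comb(2 * n, n) / 4**n * math.sqrt(math.pi * n)
    print(n, ratio, stirling_piece)              # both tend to 1
```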
Finally, my question: is it just a coincidence that looking at the ratio of $I_{2n}$ to $I_{2n+1}$ gives you an expression that involves $\binom{2n}{n}$? Surely it isn’t, but for the moment this part of the argument still looks to me like a convenient piece of magic. It was clean enough to be easily memorable, so this did not present a problem for the lecture, but I’d still like to know why those integrals are so closely related to the binomial coefficient.
It would be remiss of me not to include a statement of Stirling’s formula in this account of the proof: it says that the ratio of $n!$ to $\sqrt{2\pi n}\,(n/e)^n$ tends to 1.
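And a direct numerical check of the statement itself (mine, working in logs to avoid overflowing floats):

```python
import math

# My own check of Stirling's formula: n! / (sqrt(2 pi n) (n/e)^n) -> 1.
for n in [10, 100, 1000]:
    # log of n! / (sqrt(2 pi n) (n/e)^n)
    log_ratio = (math.lgamma(n + 1)
                 - 0.5 * math.log(2 * math.pi * n)
                 - n * (math.log(n) - 1))
    print(n, math.exp(log_ratio))  # tends to 1 as n grows
```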