Note added 12/9/09: It seems that many people are looking at this post, because Derren Brown claims to have used “deep mathematics” combined with “the wisdom of crowds” to predict the lottery. All I can say is that this is obvious nonsense. Whatever method you use to predict the lottery, the drawing of the balls is a random process, so you will not improve your chances of being correct. Brown has done a clever trick — I won’t speculate about his methods, as I’m not interested enough in them — but his explanation of how he did it is not to be taken seriously.
In this post I shall discuss the proofs of two statements in real analysis, one of which is clearly deeper than the other. My aim is to shed some small light on what it is that we mean when we make that judgment. A related aim is to try to demonstrate that a computer is in principle capable of “having mathematical ideas”. To do these two things I shall attempt to explain how an automatic theorem prover might go about proving the two statements in real analysis: in one case this is quite easy and in the other quite hard but by no means impossible. In the hard case what interests me is the precise ways that it is hard, which I think say something about the notion of depth in mathematics.
The first statement is that if a function $f$ is continuous, then for every $x$ and every sequence $(x_n)$ that converges to $x$ we also have that $f(x_n)$ converges to $f(x)$. Most experienced mathematicians would not regard this as a deep statement because proving it is “just an exercise” rather than a result that “needs an idea”. (Spotting that it is a useful statement is a different matter, but it’s not what I’m talking about here.)
I shall now give a proof of the statement in such a way that every step of the proof is the obvious thing to do, where by “obvious” I mean in some sense “algorithmic”. I could go into more detail about how these obvious steps could be carried out by an actual program, but then what I wrote would be much less readable. So I’m compromising by giving a slightly higher-level discussion. But if anyone thinks I’m using that to sneak in a trick that only a human could spot then I’ll be happy to elaborate on parts of the proof later. So here goes.
We want to show that if $f$ is continuous, then it commutes with taking limits. Suppose then that $(x_n)$ converges to $x$. We would like to prove that $f(x_n)$ converges to $f(x)$. Therefore we must let $\epsilon>0$ and find some $N$ such that $|f(x_n)-f(x)|<\epsilon$ whenever $n\geq N$. (That is just translating the definition, which is clearly an automatic process.)
How can we find such an $N$? Well, what could possibly imply that $|f(x_n)-f(x)|<\epsilon$? To answer this question we write out the information we have and simply look at it to see if anything has a statement resembling $|f(x_n)-f(x)|<\epsilon$ as a conclusion. And we notice that the definition of continuity of $f$ ends with the assertion $|f(y)-f(u)|<\eta$. (For safety’s sake, we give all our dummy variables different names.) So we focus on the whole definition: $\forall u\,\forall\eta>0\,\exists\delta>0\,\forall y\,(|y-u|<\delta\Rightarrow|f(y)-f(u)|<\eta)$.
So the obvious thing to try is taking $u$ to be $x$, $\eta$ to be $\epsilon$ and $y$ to be $x_n$. We’re free to do the first two as the above assertion starts with “$\forall u\,\forall\eta>0$”, but what about the third? Well, we also have a “$\forall y$” but it comes with the condition that $|y-u|$, or rather $|x_n-x|$, should be less than $\delta$, which depends somehow on $x$ and $\epsilon$. So we can conclude the following: if $|x_n-x|<\delta$ then $|f(x_n)-f(x)|<\epsilon$.
Now let’s recall precisely what we want to prove. We would like to show that there exists $N$ such that if $n\geq N$ then $|f(x_n)-f(x)|<\epsilon$. From what we have just shown, it will be sufficient to prove that there exists $N$ such that $|x_n-x|<\delta$ whenever $n\geq N$. And now we see that just such a conclusion appears at the end of the definition of the convergence of $(x_n)$ to $x$, and it is easy to see that the premise is exactly what we want too, as it gives us our $N$.
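To make the word “algorithmic” slightly more concrete, here is a toy sketch in Python of the matching step above. The representation of statements as nested tuples and the function `match` are entirely my own invention for illustration, not a real theorem prover: the goal $|f(x_n)-f(x)|<\epsilon$ is matched against the conclusion $|f(y)-f(u)|<\eta$ of the continuity definition, and the substitution $u:=x$, $\eta:=\epsilon$, $y:=x_n$ falls out mechanically.

```python
# A minimal, illustrative sketch of the pattern-matching step: match a
# goal term against the conclusion of a hypothesis, binding the
# hypothesis's universally quantified variables (strings beginning '?').

def match(pattern, term, bindings=None):
    """Match `pattern` against `term`; strings starting with '?' are
    variables to be bound. Returns a binding dict or None on failure."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and pattern.startswith('?'):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        bindings[pattern] = term
        return bindings
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term) or pattern[0] != term[0]:
            return None
        for p, t in zip(pattern[1:], term[1:]):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None

# Goal: |f(x_n) - f(x)| < eps.  Conclusion of continuity: |f(y) - f(u)| < eta.
goal = ('<', ('abs', ('-', ('f', 'x_n'), ('f', 'x'))), 'eps')
continuity_conclusion = ('<', ('abs', ('-', ('f', '?y'), ('f', '?u'))), '?eta')

print(match(continuity_conclusion, goal))
# binds ?y -> 'x_n', ?u -> 'x', ?eta -> 'eps'
```

The point of the sketch is only that the “notice the resemblance” step in the proof is a single, mechanical unification.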
Incidentally, the converse of this statement can also be proved in a fully justified doable-in-principle-by-computer way. (In fact, I discussed both directions many years ago on my web page.) However, I only really need one sample of “non-deep” mathematics to illustrate my point, so I won’t discuss the converse here. Instead, let’s move to the second statement, which is a beautiful problem that is often set to Cambridge undergraduates. It can serve either as a hard problem for those who have done a first course in analysis (one that perhaps one or two people per year are capable of solving) or a hardish exercise in applying the Baire category theorem.
It’s quite interesting to discuss how a computer could solve the problem if it had the hint that the Baire category theorem should be applied, but then it becomes more like the first example: it can be done by “pattern matching”. But then one would feel that the computer had cheated and been told the idea. So it is even more interesting to see how a computer could solve the problem from scratch, succeeding where all but a handful of students fail.
The statement in question is this. Let $f:(0,\infty)\to\mathbb{R}$ be a continuous function and suppose that for every $x>0$ the sequence $f(x),f(2x),f(3x),\dots$ tends to 0. Prove that $f(x)$ tends to 0 as $x\to\infty$. If you haven’t seen this before and want to get the most out of this post then you should (of course) make a serious attempt to solve this beautiful problem before reading on.
On then to the proof. The aim, as with the previous result, is to present the proof in such a way that every step is “the obvious” thing to do, or at least sufficiently obvious that it would be one of the first things that a well-programmed computer would try.
Let us begin with steps that really are automatic, such as translating the problem into more formal language and converting “for all”s into “let”s (by which I mean that if you want to prove a statement that begins “for every x in A” you start by writing “let x be in A”, and so on). In this case we are trying to prove that $f(x)$ tends to 0 as $x\to\infty$, so we let $\epsilon$ be positive. We would now like to find $M$ such that $|f(x)|<\epsilon$ whenever $x\geq M$. But simple pattern-matching lets us down: the obvious (and only) statement available to us that resembles “$|f(x)|<\epsilon$” is the statement “$|f(nx)|<\epsilon$” that comes at the end of a formal definition of the hypothesis of the problem. And that is true only if $n$ is at least as big as some $N$ that depends on $x$ and $\epsilon$ in a way that we know nothing about.
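For what it’s worth, the “for all”-to-“let” translation really is trivial to mechanize. Here is a toy sketch, again with a made-up tuple representation (the function name `peel_foralls` and the encoding are mine, purely illustrative): it strips outer universal quantifiers from a goal, which is exactly the step that turns “for every $\epsilon>0$” into “let $\epsilon$ be positive”.

```python
# A minimal sketch of the "for all" -> "let" step: goals are nested
# tuples, and we peel off outer universal quantifiers, introducing each
# bound variable as a hypothesis ("let eps be positive ...").

def peel_foralls(goal):
    """Strip leading ('forall', var, body) layers from a goal.
    Returns the introduced variables and the remaining goal."""
    introduced = []
    while isinstance(goal, tuple) and goal[0] == 'forall':
        _, var, goal = goal
        introduced.append(var)
    return introduced, goal

# Goal: for all eps > 0 there exists M such that |f(x)| < eps whenever x >= M.
goal = ('forall', 'eps',
        ('exists', 'M',
         ('forall', 'x',
          ('implies', ('>=', 'x', 'M'),
                      ('<', ('abs', ('f', 'x')), 'eps')))))

introduced, remaining = peel_foralls(goal)
print(introduced)    # ['eps']
print(remaining[0])  # 'exists': the next step must *find* M, not introduce it
```

The routine stops at the existential quantifier, and that is precisely where the automatic steps run out and the interesting difficulty of the problem begins.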
This is a point where many human mathematicians will feel stuck. But one move that sometimes helps is to do something else that can be easily automated and aim for a proof by contradiction, so let’s try it.
If we cannot achieve our goal then for every $M$ there exists $x\geq M$ such that $|f(x)|\geq\epsilon$. And now, if we want a contradiction, we must prove that there exists $x$ such that $f(nx)$ does not tend to 0. Writing out this last statement in full gives us $\exists x>0\;\exists\theta>0\;\forall k\;\exists n\geq k\;\;|f(nx)|\geq\theta$.
Now we make three observations. First, it is fairly clear that we shall need to use the fact that $f$ is continuous. (I have thought quite hard about how a computer might come to this realization, or at least construct, when asked, an example of a discontinuous function $f$ that does not tend to zero despite the fact that $f(nx)$ always does. I’ll save my conclusions about that for another post. For now let us be satisfied with the idea that the continuity of $f$ was given as a hypothesis and the computer will naturally tend to see what comes of using the hypotheses.) Second, there is a promising resemblance between $|f(x)|\geq\epsilon$ and $|f(nx)|\geq\theta$. Third, we are trying to construct a real number that satisfies an infinite set of conditions, one for each $k$.
Let us think about the last observation first. How does one construct a real number with infinitely many properties? A standard answer, and one that is closely bound up in the very idea of a real number, is to construct it as the limit of a sequence. But what will make that limit satisfy all the properties? To make this question slightly more concrete, let us call the properties $P_1,P_2,P_3,\dots$ and let the sequence be $t_1,t_2,t_3,\dots$, converging to $t$. We don’t have much to play with here: all we can say about each $t_m$ and each $P_k$ is whether or not $t_m$ has property $P_k$. And our main information about $t$ is that the $t_m$ get close to it.
As we build our sequence $t_1,t_2,t_3,\dots$, how can we make sure that its limit at least has property $P_k$? A natural answer is to insist that every $t_m$ with $m\geq k$ belongs to some closed set $F_k$, all of whose elements satisfy property $P_k$. Here I am using “closed” in the sense of “closed under taking limits”. The most basic examples of closed sets are closed intervals, and this thought leads us to one of the basic theorems of real analysis: that a nested intersection of closed bounded intervals is non-empty. But I prefer to think of this statement as yet another version of the completeness axiom, and it leads us to the following basic real-number-constructing principle, which I would imagine a computer as having been taught rather than as having invented: if you want to construct a real number that has properties $P_1,P_2,P_3,\dots$, then see if you can construct a nested sequence $I_1\supseteq I_2\supseteq I_3\supseteq\dots$ of closed bounded intervals (of non-zero length) such that every element of $I_k$ has property $P_k$. If you can, then any $t$ in the intersection will have all the properties simultaneously.
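The interval-construction principle is easy to see in action numerically. The sketch below has nothing to do with the actual problem; it is a generic illustration in which property $P_n$ is simply “every point of the current interval is within $2^{-n}$ of a solution of $t^2=2$”, enforced by bisection (the function names are mine and purely illustrative).

```python
# A minimal sketch of the real-number-constructing principle: repeatedly
# shrink a closed interval [a, b] so that stage n guarantees property P_n,
# then read off a point of the intersection as the limit of the left
# endpoints.

def nested_interval_point(a, b, refine, steps):
    """Apply `refine` (which maps a closed interval to a closed
    subinterval) `steps` times and return the final left endpoint,
    an approximation to a point of the intersection."""
    for _ in range(steps):
        a, b = refine(a, b)
        assert a <= b  # the intervals must stay nested and non-empty
    return a

def bisect_toward_sqrt2(a, b):
    """Keep the half of [a, b] that still contains a solution of t^2 = 2."""
    m = (a + b) / 2
    return (a, m) if m * m >= 2 else (m, b)

x = nested_interval_point(1.0, 2.0, bisect_toward_sqrt2, 50)
print(x)  # close to 1.41421356...
```

After 50 steps the interval has length $2^{-50}$, so any point of it, and in particular the left endpoint, satisfies all fifty properties at once.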
Now let us return to the problem at hand. Property $P_k$ is the property that there exists $n\geq k$ such that $|f(nt)|\geq\theta$. So let us suppose that we have already constructed the closed bounded interval $I_{k-1}$ and see if we can find a closed subinterval $I_k$ all of whose elements satisfy $P_k$.
It would be nice to avoid the language of intervals, so let $I_{k-1}$ be the interval $[a,b]$. Our task is then to find real numbers $c$ and $d$ such that $a\leq c<d\leq b$ (that makes $[c,d]$ a subinterval) and such that for every $t$ with $c\leq t\leq d$ there exists $n\geq k$ such that $|f(nt)|\geq\theta$ (that makes every element of $[c,d]$ satisfy property $P_k$).
Before we attack this (by now not terribly hard) problem it feels as though we need to choose our $\theta$. From string-matching we more or less know that $\theta$ will depend on $\epsilon$ (since the only lower bounds we have for values of $|f|$ are that there are arbitrarily large $x$ with $|f(x)|\geq\epsilon$). And here there is a very useful method that human mathematicians use all the time: just guess the simplest possible dependence and make adjustments if it doesn’t work. The simplest non-trivial dependence would be $\theta=\epsilon$, which turns out to work later if we “adjust” it to $\theta=\epsilon/2$. Another method, again used by human mathematicians, is “let $\theta$ be a positive real number to be chosen later”. Here one sort of pretends to have chosen $\theta$ and as the proof proceeds one finds that $\theta$ must satisfy certain conditions for the argument to work. One then shows that these conditions can be satisfied. Let us adopt the latter approach here.
At this stage a computer may well not see its way to the end of the proof, but another thing it can certainly do is look about for places to apply the continuity hypothesis. And since there is only one thing we know about the values taken by $f$ (that they have modulus at least $\epsilon$ for arbitrarily large $x$), there is only one place we can apply this hypothesis.
To elaborate, we know that $\forall M\;\exists x\geq M\;\;|f(x)|\geq\epsilon$, and we also know that $\forall u\,\forall\eta>0\,\exists\delta>0\,\forall y\,(|y-u|<\delta\Rightarrow|f(y)-f(u)|<\eta)$. To apply the second statement we must choose values for $u$ and $\eta$. The only real number we have around is $x$ (which depends on $M$) so let us choose $u$ to be $x$. That gives us the statement $\forall M\;\exists x\geq M\;\bigl(|f(x)|\geq\epsilon,\ \forall\eta>0\,\exists\delta>0\,\forall y\,(|y-x|<\delta\Rightarrow|f(y)-f(x)|<\eta)\bigr)$.
Rearranging this to bring the quantifiers as far as possible to the left, we obtain the equivalent statement $\forall M\;\forall\eta>0\;\exists x\geq M\;\exists\delta>0\;\forall y\;\bigl(|y-x|<\delta\Rightarrow(|f(x)|\geq\epsilon,\ |f(y)-f(x)|<\eta)\bigr)$.
Now an applying-the-triangle-inequality module will leap into action and observe that from the final two inequalities it follows that $|f(y)|>\epsilon-\eta$. The end of the conclusion we are trying to obtain is $|f(nt)|\geq\theta$ for some $n\geq k$, so string-matching tells us that we would like $\epsilon-\eta$ to be positive, and an elementary-inequality module will tell us that the simplest $\eta$ that is both positive and less than $\epsilon$ is $\eta=\epsilon/2$. So let us make this choice of $\eta$, take $\theta$ to be $\epsilon/2$, and see what we have when we put all the quantifiers back. We have $\forall M\;\exists x\geq M\;\exists\delta>0\;\forall y\;(|y-x|<\delta\Rightarrow|f(y)|>\epsilon/2)$. (Incidentally, there should, strictly speaking, have been “$\wedge$”s in some of the earlier statements, but it is quite common, if sloppy, to regard those as sort of implied by the symbol “,”.)
Now we observe that we are in a stronger position than before. Although $\epsilon/2$ is smaller than $\epsilon$, all we knew about $\epsilon$ was that it was positive, and that’s all we know about $\epsilon/2$. So from the perspective of proving the result, there is absolutely no loss in passing from $\epsilon$ to $\epsilon/2$. (This is itself a useful principle that it would be quite good to make a bit more formal.) On the other hand, the range of values where $|f|$ is at least $\epsilon/2$ has increased from a single arbitrarily large $x$ to an interval $(x-\delta,x+\delta)$ of arbitrarily large $x$s.
Suppose we have some $x$ and $\delta>0$ such that $|y-x|<\delta$ implies that $|f(y)|>\epsilon/2$. String-matching tells us that we will be done if we can choose $c$ and $d$ and an integer $n\geq k$ such that $a\leq c<d\leq b$ and such that $|nt-x|<\delta$ whenever $c\leq t\leq d$. Elementary manipulation of inequalities tells us that for this we need $nc>x-\delta$ and $nd<x+\delta$. Let us pretend that we have chosen $\delta$. Then the Archimedean principle tells us that we can find $n$ with $nb>x-\delta$ and a standard reflex tells us that if there exists such an $n$ then there exists a minimal one. So let $n$ be minimal such that $nb>x-\delta$. Can we now find $c$ and $d$ with $a\leq c<d\leq b$ such that $nc>x-\delta$ and $nd<x+\delta$? Obviously we can if and only if $na<x+\delta$. Now we have not yet chosen $x$, so let us try to choose $x$ and $n$ in such a way that $na<x+\delta$ and $nb>x-\delta$.
Let us focus first on choosing $n$. If we have chosen $x$ and $\delta$ then we will be able to find our $c$ and $d$ if and only if $na<x+\delta$ and $nb>x-\delta$. Let us choose the minimal $n$ that satisfies the second inequality. Then $(n-1)b\leq x-\delta$, so $na\leq x-\delta+b-n(b-a)$, so we are done if $x-\delta+b-n(b-a)<x+\delta$, for which we need $n$ to be at least $(b-2\delta)/(b-a)$. We have the lower bound $n>(x-\delta)/b$, and $x$ can be made arbitrarily large (because we may as well impose the bound $x\geq M$ if it helps, which it does). Therefore, we can have our lower bound on $n$.
That argument can of course be very significantly tidied up. The basic idea is that every sufficiently large real number is a multiple of an element of $[a,b]$, since the intervals $[na,nb]$ start to overlap when $n$ gets large. So if $x$ is sufficiently large then we can choose $n$ and $t$ with $t\in[a,b]$ and $nt=x$. (But the computer needs an extra idea to try for the stronger statement $nt=x$ rather than the weaker statement $|nt-x|<\delta$.) Having chosen such a pair we can easily choose $c$ and $d$ in such a way that $a\leq c<t<d\leq b$ and $n[c,d]\subseteq(x-\delta,x+\delta)$.
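The tidied-up construction can be carried out numerically. The sketch below is illustrative only (the function name and the sample numbers are mine): given the current interval $[a,b]$, a point $x$ around which $|f|$ is large on the window $(x-\delta,x+\delta)$, and a lower bound $k$, it produces $n\geq k$ and a subinterval $[c,d]$ of $[a,b]$ with $n[c,d]\subseteq(x-\delta,x+\delta)$, on the assumption that $x$ is large enough relative to $a$, $b$, $\delta$ and $k$.

```python
# Carrying out the subinterval construction numerically: find n >= k
# and [c, d] inside [a, b] with n*[c, d] inside (x - delta, x + delta).

def subinterval(a, b, x, delta, k):
    n = k
    while n * b < x:  # minimal n >= k with n*b >= x
        n += 1
    # shrink the window by delta/2 on each side, so the final
    # inclusion in (x - delta, x + delta) is comfortably strict
    c = max(a, (x - delta / 2) / n)
    d = min(b, (x + delta / 2) / n)
    assert c < d, "x was not large enough for this a, b, delta, k"
    return n, c, d

n, c, d = subinterval(1.0, 1.1, 37.3, 0.5, 1)
print(n, c, d)
# check the defining property: n*[c, d] lies inside (x - delta, x + delta)
print(37.3 - 0.5 < n * c and n * d < 37.3 + 0.5)  # True
```

Note that the assertion fails for small $x$, which is the numerical counterpart of the requirement in the proof that $x$ (and hence $n$) be sufficiently large.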
If you agree that the above account does demonstrate that the proof can be generated in a fairly automatic way from a rather small and simple set of precise problem-solving techniques in real analysis, then there are two possible reactions. One would be to say that the proof is less deep than it looks, and only appears deep in its usual presentation because the Baire category theorem is used as a piece of magic (which we might refer to as the “insight” of certain mathematicians from around a century ago). Another would be to search for differences between the way the second proof is generated and the way the first one is. And a definite difference is that the second involves a process of construction: of the nested closed intervals $I_1\supseteq I_2\supseteq I_3\supseteq\dots$. The fact that we just went ahead and did the construction in a fairly automatic way does place some upper bound on the depth, and it is perhaps the automatic nature of this stage of the proof that encourages one to abstract out what has been done and formulate the Baire category theorem.
Let me be more precise about the difference. At times during both proofs we needed to find real numbers with certain properties. However, only in the second proof did we need to find a real number with an infinite sequence of properties, which resulted from the fact that the string of quantifiers in the statement we wished to prove had “$\forall$” after the “$\exists$”. In order to find the real number with infinitely many properties, we converted the problem into one about constructing a sequence, which then meant that we were back in the world of finding real numbers with just finitely many properties.
I’ll end with a brief remark that I hope to elaborate on in a future post. It seems that a significant challenge for an automatic theorem prover is to deal with statements that do not have uniquely obvious proofs. (This of course relates to my earlier post on when two proofs are the same.) And this non-uniqueness often arises when a construction is involved. If one is asked to construct a mathematical object, one can sometimes create an artificial uniqueness by imposing extra properties on what one is constructing, or restrictions on how one goes about constructing it (such as always trying the simplest thing first, whatever that may mean in the given context). But it is not always easy.
To illustrate this, here is a simple problem where the non-uniqueness seems to create difficulties for a computer: find an injection from $\mathbb{Z}^2$ to $\mathbb{Z}$. In a later post I shall discuss this example, but for now let me just try to explain why it is difficult. The reason is that if you want to approach it systematically, then you need to choose values of the function for each pair of integers in turn. But what does “in turn” mean? It seems to require you to put the pairs in order, and that is rather close to what you were asked to do in the first place. (However, “rather close” turns out not to mean “identical”.) I mention this problem here just to suggest in a tentative way that non-uniqueness of this kind may also be a major contributor to our perceptions of depth in mathematics.
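For concreteness, here is one standard solution, taking the problem as asking for an injection from $\mathbb{Z}^2$ to $\mathbb{Z}$ (this is just a sketch of one well-known construction, not the discussion promised for the later post): fold each integer into a natural number, then apply the Cantor pairing function, which enumerates pairs along diagonals, one concrete way of “putting the pairs in order”.

```python
# One standard injection from Z^2 to Z: fold each integer into a
# natural number, then apply the Cantor pairing function.

def fold(z):
    """Injection from Z to N: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * z if z >= 0 else -2 * z - 1

def inject(a, b):
    """Injection from Z^2 to Z via the Cantor pairing of fold(a), fold(b)."""
    m, n = fold(a), fold(b)
    return (m + n) * (m + n + 1) // 2 + n

# injectivity check on a finite grid
values = [inject(a, b) for a in range(-20, 21) for b in range(-20, 21)]
print(len(values) == len(set(values)))  # True: no collisions
```

Of course, finding this is exactly the kind of step that is easy for a human who has seen the pairing function before and genuinely puzzling to automate from scratch, which is the point of the example.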