In this post I shall continue some of the themes of the previous one by looking at fairly simple but not completely trivial mathematical statements and trying to imagine how a computer might systematically solve them. The main one comes later on: I shall look at how a computer might arrive at a construction of an injection from to . (If you don’t know what this means, it’s simple to explain: to each pair of positive integers one must assign a positive integer in such a way that no two pairs get assigned the same positive integer.)
Incidentally, I followed a suggestion in one of the comments on the previous post and looked up Coq. My reaction to it was mixed. On the positive side, it seems that Coq has a notion of a “tactic”, which does several of the basic steps of the kind that I claimed ought to be simple, such as taking a hypothesis and seeing where it can be directly applied. On the negative side, I found the tutorial hard to understand and felt that in order to use the system I’d have to make a big effort to learn it first. But this is a problem I have with most tutorials of that kind: they seem to be written for someone who is already familiar with four or five programming languages and therefore just needs to know the characteristic features of the language in question. They are not aimed at someone who knew how to write simple programs in basic about thirty years ago and hasn’t done any programming to speak of since. What I would very much like to see is a more mathematician-friendly version of Coq, where what you typed in was closer to what you would actually write as a mathematician and the computer translated it into the lower-level language for you. Back on the positive side, it seems that one thing Coq is good at, and this was one of the main aims of its developers, is fully certifying proofs. That is, one can prove formally that any proof written in Coq is correct (if the computer accepts it as a valid proof). Thus, if you can write a proof of a theorem in Coq, then it is guaranteed to be correct. What is more, this has been done for some notable theorems, such as the four-colour theorem.
The first thing I’ll do is respond to a couple of other comments on the last post. First, when I said that I discussed the first example and its converse “many years ago” on my web page that was a bit of an exaggeration. I started the page under ten years ago, but in the world of the internet that feels like ancient history. Nevertheless, I intended that to mean that the discussion could still be found. But I think now that I was muddling it up with another example: that is continuous if and only if the inverse image of every open set is open. So I am now going to think and write at the same time, and try to find a quasi-automatic proof that if converges to whenever converges to , then must be continuous. (The person who asked for this suggested that this would be hard for a computer because the most natural proof used the axiom of choice. We shall see.)
Suppose then that is not continuous. Then there must exist and such that for every there is some such that and . (I’ve decided to write this in prose, but of course a computer would just write out the definition of continuity in terms of quantifiers and apply the easy algorithm to turn it into its negation, ending up with precisely the above statement but with symbols for the quantifiers.)
Now we know that we want to apply a hypothesis about sequences, and there is a very standard and easy-to-automate trick in analysis for generating sequences from a statement of the form “” It is to observe that this is equivalent to the statement ““. (Actually, this is implicitly using the Archimedean property, but only in the direction we shall not need.) Fixing and and applying this trick, we end up with the statement that for every there exists such that and .
Here there is another standard move. We say, “Let us write for this .” Of course, this is a cheat, because the has not been specified, so we have to make an arbitrary choice. Furthermore, we have to do that infinitely many times so we do indeed have to use the axiom of (countable) choice. However, that is hardly a serious obstacle for the automatic theorem prover, which can just be told the syntactic principle that it’s OK to go from “” to “let be a sequence such that for every .” In other words, one form of the axiom of choice can be embedded into the computer as a simple replacement rule (that human mathematicians use the whole time, often not pausing to note that they are applying the axiom of choice).
In our case, we obtain a sequence such that for every we have the inequalities and . The Archimedean property implies that and a very simple verification (which I shan’t bother to give, as it is so clearly one that can be automated) shows that does not converge to , which completes the proof.
Now let me tackle the problem of finding an injection from to . It’s hard to think about from the perspective of automation for two reasons. The first is that almost nobody has a chance to solve this problem for themselves: they are presented with the result (or one that is obviously equivalent to it) as a basic lemma in the theory of countability. The second, and more significant, is that there are at least two different ways of doing it. One is to do a sort of zig-zagging count: , , , , , , , and so on. The other is to send to , or to any one of a multitude of functions of that in one way or another exploit unique factorization. The second class of examples may cause us to despair: does the computer have a significant background in integer arithmetic and elementary nmber theory? This is a point to which I shall return soon.
How might a computer come up with something like the zig-zag count? The most natural way of arriving at this construction (for a human) seems to me to be this. You first observe a very simple lemma: that if there is a function such that each has at most finitely many preimages, then there is also an injection from to . This lemma can be proved in a just-do-it way: we have to assign values to elements of in turn, and although we don’t have a total ordering on we can at least find images for the preimages of each integer in turn, and at no stage have we assigned more than finitely many values. Also, once we have decided to apply this lemma then the function will come very early on in any list of simplest possible functions of two variables, though many other simple functions would do just as well. (Note, however, that we have imported the “external” idea of addition: or to put it another way, we are using more about than just the fact that it is a countable set.) The problem here is noticing the lemma in the first place, and for this I do not see a plausible argument that a computer might find if it considers this problem in isolation. More likely, it would try several countability problems and find that it was using a principle of this kind. But how would it solve those problems if it needs the principle to do so? There is a circularity here and one needs to break it somehow.
Perhaps one way is to have a computer that knows a few things about orderings. It might experiment with some simple functions and end up trying . Having failed to stumble on an injection that it can easily prove is an injection, it would wonder about adjusting . It would observe that allowed it to give a partial ordering on . It would know that it was looking for a total ordering, so it might try extending the partial ordering it already had to a total ordering. That would be fairly easy to do and would also turn out to be order-isomorphic to . And then, if it tried to think about why the approach had worked, it would realize that partial orderings of the kind it started with would have to give total orderings isomorphic to that of , and the lemma would be formulated. But that account is assuming a significantly more sophisticated theorem prover than I have needed to assume in the other examples.
How about constructions that use the multiplicative structure of ? Well, one way that one might arrive at such an example is to attempt a just-do-it argument as follows. There is a very obvious well-ordering on , namely the lexicographic ordering, so we shall attempt to choose values for in order, using this ordering.
The natural way to do this is greedily, picking at each stage the minimal integer that has not yet been used. And if we do this then we define to be for every , at which point we find that we have run out of integers.
At this point there are at least two routes that one could imagine taking. The simpler one is to stand back (something it is challenging but not trivially impossible to get a computer to do) and think about what has gone wrong. The position we are in is that we have a lot of pairs of integers to which values need to be assigned, and we have run out of values. So we should change what we did and assign values in such a way that we do not run out. If we are super-clever, we might even organize the first step so that there are infinitely many values that have not been used. For example, we might map to instead. (Again, it is assuming a certain sophistication on the part of the computer to “see” that this leaves out all odd numbers, of which there are infinitely many.)
It is easy to imagine a badly-programmed computer getting into an endless loop here, as it fills up all of with values of , realizes its mistake, leaves out infinitely many values, and proceeds to , and so on. This need to jump out of loops is a very basic challenge in all of AI, and human mathematicians are trained, or train themselves, to spot loops at a very early stage. So again it won’t be straightforward, but if a computer can be made to spot that it is getting a bit repetitive, then one can hope that it will realize that after missing out infinitely many positive integers when it chooses the values , it is in essentially the position it was in when it started. And then a simple induction is enough to give the injection. Moreover, if it starts by missing out the even numbers instead of missing out the odd numbers, then it will end up with precisely the function . And it won’t have needed to use number theory to prove that it was an injection.
I said that there was another approach. That would be to take more of a real-numbers perspective, and say that we shall try to define not in one go but as the limit of a sequence of integers. Of course, that means that we will need to find a sequence that is eventually constant. Let’s see how it works in practice.
We begin, as before, by defining , but we regard these values just as “first attempts”. We also keep in mind that we are allowed only finitely many attempts for each before we are not allowed to make any further changes. And now we can start a just-do-it process as follows. We begin by choosing a value for . The natural choice is the greedy one: 1. But if we do that then we have to adjust the values of the , so we go back to those, greedily changing them to . Now we also know that for each there is going to come a point where we have to fix it for all time, so let’s fix . Then we go back to , which we greedily assign the number (since and is fixed to be 2). And so on.
I have to admit that there are details in that construction that don’t seem to be completely inevitable. For example, how do we know that by the time we’ve chosen the first attempts for the values of we should have fixed all the values of ? Maybe this would be a result of trying simpler things and discovering that they didn’t work.
So, to conclude, although the above problem is much easier for a human mathematician than the main problem discussed in the previous post, the issues it raises when one thinks about automatic theorem proving are more difficult.