## Preprint about theorem-proving program up on arXiv

Mohan Ganesalingam and I have just put a preprint up on arXiv entitled “A fully automatic problem solver with human-style output.” In it, we give quite a lot of detail about how the program discussed in this post works. We also talk about our general approach, and why we think it may allow us to solve problems fully automatically that are currently out of reach. (Some fairly easy — for humans — problems fall into this category.) We have also made the program’s output for our current set of test problems available. A quick look at that may be a more efficient way of getting an idea of what the program does than reading the explanation we give in the paper.

If you are interested enough to look at the preprint and find that you spot a typo or more serious error, we would of course be very grateful to be told.

### 37 Responses to “Preprint about theorem-proving program up on arXiv”

1. mohan ganesalingam Says:

It may be worth adding that the output file Tim links to is generated by the program and hasn’t been hand-modified in any way. (The program produces a tex file which is then run through PDFLatex.)

2. Oscar_Cunningham Says:

Nice!

Some typos:

At the bottom of page 4 the sentence ending “that is, it did not interfere with those proofs in which was not potentially useful” is missing a word, probably “it”.

I might be being stupid, but near the top of page 9 the sentence “And once one has made that deduction, finding x and y in S such that xy^−1 ∈ S is significantly easier than it was before.” should in fact be “And once one has made that deduction, finding x and y in S such that xy^−1 = b is significantly easier than it was before.”

Also, page 34 is nearly entirely blank, but presumably this is a formatting problem that LaTeX may well fix on it’s own as other changes are made to the document.

• gowers Says:

Thanks for that. I’ve worked out the reason for the nearly blank page, which was that to get something to format nicely, I used the rather crude technique at one point of inserting “\eject”. Subsequent changes to the paper led to this being a bad thing to have done — a beginner’s mistake. And you’re right about the other two things. I’ve changed the file, and at some point, when it seems to be stabilizing, I’ll put up a revised draft.

• Mohan Ganesalingam Says:

arXiv seems to have caused a few formatting peculiarities when it recompiled our TeX source. The ones relating to the URLs are particularly irksome, as we put some work into making those wrap properly.

3. porton Says:

Which programming language have you used?

4. Is the future close? | Glob of Thoughts Says:
5. Jean-Luc Delatre Says:

Beside the “not serious reasons” for not releasing the source code are there serious reasons? (university copyright, expected patents or book writing, etc…)

6. Shaunak Kishore Says:

Thanks for posting this!

I’ve a question about the backwardsLibraryReasoning rule. The example uses the hypothesis “A is open” and the library result “A is open and B is open => A int B is open” to replace the target “A int B is open” with the target “B is open”. But this transformation should only be applied if the library result is an if-and-only-if, right?

(I think the statement of the backwards reasoning rule is correct, since it has an = instead of an =>, so it’s just the example that’s confusing.)

• gowers Says:

This is an interesting question. Human mathematicians frequently apply something like backwardsLibraryReasoning in contexts where the “library statement” is not an if and only if. And it’s perfectly valid logically to do this. To take the example you cite, if you’re trying to prove that $A\cap B$ is open and you know that $A$ is open, then it genuinely is sufficient to prove that $B$ is open.

What is less clear is whether reducing the problem to showing that $B$ is open is a sensible approach, since it is aiming for a strengthening, and not just the original target. This varies from context to context, so one task we face is to work out ways of assessing how “safe” it is to do this move, given that sometimes it is sensible and sometimes it isn’t.

There is a rather simple and useful criterion, which is not present in the existing program but will be added when we start to make more use of the library. Suppose you know nothing about $A$ beyond the fact that it is open. Then effectively the problem becomes equivalent to showing that for every $A$, if $A$ is open and certain facts about $B$ hold, then $A\cap B$ is open. To put that differently, in the presence of certain facts about $B$, we are trying to show that for every $A$, $A$ is open implies that $A\cap B$ is open. But that last statement is equivalent to showing that $B$ is open (since, for example, $A$ could be the entire space).

In this way, we identify a class of entirely safe backwards library deductions (and there’s a similar criterion for forwards ones). We don’t ask for the implications to be if and only if, but we do ask for them to be if and only if in the absence of any other information (in the sense just discussed) so that in some readily identifiable contexts the deduction can be made without any throwing away of potentially useful information.

This won’t be the end of the story, however. When the problems get more difficult, there will be situations where one makes deductions that genuinely do throw away information and just hopes that that information isn’t needed. This is a speculative process (amounting to weakening one’s premises or strengthening one’s target) that humans do the whole time and that sometimes goes wrong — for example, if the modified statement becomes false — and forces the person trying to solve the problem to backtrack.

7. Pete Says:

One thing I don’t see in the program, but which does seem to be a very common feature of human proofs, is splitting a problem into cases where the `right’ case distinctions are not obvious (i.e. they are not written into the problem for the split operations to pick up)

I actually can’t think of many proofs of ‘interesting’ results which don’t have such a case distinction somewhere – there are some, but those tend to contain a trick of the form ‘it is enough to prove the following very different looking result’. You have major arcs versus minor arcs, structured case versus pseudorandom case, decompositions in structural graph theory, et cetera…

I am not so convinced as you (perhaps!) are that there is much hope in the near future for automated theorem proving to do things which are useful to mathematicians other than exploiting the computer’s ability to brute-force a search space. I believe one needs at least some kind of idea of how a computer might come up with this kind of argument.

• gowers Says:

Another very interesting question. The reason this is not in the current version of the program is not that we don’t think it is important, and also not that we think it is important but hopeless, but simply that we are deliberately starting with routine problems, which we define to be problems where at every stage there’s an obvious (to humans) thing to do and if you repeatedly do that obvious thing then you end up proving what you wanted to prove.

I’ll try to indicate why I don’t think it’s hopeless by giving a simple example. Suppose you are asked to prove that $x^2-x^4$ is never greater than 1. One way you might do it is to observe that if $|x|\leq 1$, then $x^2-x^4\leq x^2\leq 1$ and you are done, whereas if $|x|\geq 1$ then $x^4\geq x^2$, so $x^2-x^4\leq 0$.

Where might such a splitting into cases come from? One possibility is that we use the following general technique. We have something we’re trying to prove, and we make a first attempt by just doing some very obvious things. This attempt fails, but we spot that there is a “missing condition” that would make it succeed. Because the missing condition is missing, we can’t finish the problem at this point, but we can add an extra assumption, namely that the missing condition doesn’t hold. There are many ways that we might identify such a condition. For example, suppose we know that $f$ is defined on the closed interval $[0,1]$ and we want to show that $f$ is bounded. We know that continuous functions on closed bounded intervals are bounded (this belongs to our “library”) so we are now free to assume that $f$ is not continuous.

Going back to the example at hand, here is how we might apply the above method. First we use the fact that $x^4\geq 0$ to observe that $x^2-x^4\leq x^2$. So the problem can be reduced to showing that $x^2\leq 1$, which is equivalent to the statement that $|x|\leq 1$. Unfortunately, it is not always the case that $|x|\leq 1$, but we can add the assumption now that $|x|\geq 1$ (or $|x|>1$ if we prefer). At that point, another piece of general knowledge kicks in, that when $|x|\geq 1$, bigger (even) powers of $x$ are bigger, so we deduce that $x^2-x^4\leq 0$ and are done.

Obviously this isn’t the entire story, but I think it qualifies as “at least some idea of how a computer might come up with this kind of argument”.

This program looks very interesting! I can’t wait to see the source code! Two comments:

1) The current version is deterministic (correct me if I’m wrong – I just skimmed the paper). Could it produce different proofs if you make it try random steps?

2) Can it make conjectures, perhaps by starting with some statement that you input and declaring it to be a conjecture if it can find no contradictions from the original statement?

I have been developing algorithms that conjecture theorems about pattern avoiding permutations. In some cases we also have proofs supplied by algorithms. See http://arxiv.org/pdf/1211.7110v1

• gowers Says:

It is indeed true that the current version is deterministic. At some point we would like to introduce nondeterminism, but we are reluctant to do that until the need genuinely arises (as it surely eventually will) because it makes regression testing much harder. It looks as though an ideal future program would be largely deterministic but have a small number of “bifurcation points” where there are two (or possibly more) pretty sensible things to do that lead to different proofs. I have a few examples of this phenomenon.

I believe that making conjectures is something that will not require a big AI breakthrough for the program to be able to do. In fact, there is an amusing anecdote that goes some way towards illustrating this. When we first tested the program, it was on the problem of showing that a continuous inverse image of an open set is open. The program did pretty well, but got stuck right near the end. The point it got to could be described in human terms as follows: I can make $d(x,y)$ as small as I like, and I need $d(f(x),f(y))$ to be less than $\epsilon$. We then realized that we had forgotten to tell the program that $f$ was continuous! So in effect, the program had “invented” the definition of continuity, or at least identified that that was the property you needed to make inverse images of open sets open.

That isn’t quite the same as making a conjecture, but I think the mechanism is the same: a lot of the time, conjectures are generated when you can’t solve a problem but you can reduce it to a nice clean statement that you can show would be sufficient.

Note that this is a very different process from the way that the Graffiti program operates. If I understand correctly, that just takes a bunch of concepts from a field, chucks them together in various ways, weeds out statements that have obvious answers, and declares the rest to be conjectures. I think that generating conjectures as “missing steps” is much more like what humans do.

9. Sten-Ake Tarnlund Says:

In the paper, you write: Bledsoe expressed the opinion that “purely syntactic” methods such as resolution had reached a plateau, and that further progress would not be made without using more human techniques.

Bledsoe’s observation is probably true. At this plateau, however, noticing resolution theorem provers may be worthwhile, e.g Vampire, a world cup champion in theorem proving, of Andrei Voronkov’s group.

I asked Andrei for advice on choosing a theorem prover. He recommended Vampire without asking, what problem do you have.

Vampire solved the problem, eventually, proving that: P is not equal to NP. Here is a link to the paper: Verifying that: P is not equal to NP, using a theorem prover.

Vampire’s resolution proof, however, is the third version of the proof, and Vampire benefited from earlier versions. The first version is a human proof. The second version is a human proof in Hilbert’s proof theory, using the first proof idea. In addition, there is a human translation from Hilbert’s proof theory to Vampire formulas.

Shortly, the (human) proof idea is an axiomatization of Turing’s theory of computing in first-order logic. In this manner, the axiom connects concepts of computing with foundations of mathematics, e.g., formal proofs. So, the axiom opens up for a proof using textbook methods of proof theory.

Here is a link to the paper: P is not equal to NP, with the human proof only.

The first version of the proof seems to be out of reach for an automated theorem prover, at least for a while. The second version and the translation may still be hard, but they are likely to be within reach of an automated theorem prover, given the first (human) version.

A similar proof gives a proof of: The tautology problem is not in NP.
Here is a link to the paper.

Thus, theorem proving is not that simple after all. Yet, it was interesting talking to Woody Bledsoe. At the time, I was more into using resolution theorem proving for computing. Nonetheless, Woody turned on the thought that human-oriented theorem proving has great expectations.

10. Q Says:

Dear Prof. Gowers,

I’m a PhD student in mathematics who has largely due to your well written posts and the polymath initiative started to believe that the nicest way to do research doesn’t involve hiding for personal benefit, but a fully open-source collaboration. Thus it comes as a surprise that the source code of this most interesting program is not not open source.
I’m sure you have a good reason for that, but maybe you would care to explain your reasons, so that I, and maybe some others, wouldn’t lose faith in your statements of openness and sharing in research?

Thank you.

• Mohan Ganesalingam Says:

Short answer: there are legal reasons why it’s not possible to release the code in its current form. It might be possible to work around those issues, but that would take time, and we are short of that right now. (On that last count, that’s more my fault than Tim’s — before too long I’ll be in the position of having to apply for jobs, and hence I’m insisting that we focus on squeezing publications out of this project!)

• gowers Says:

I’d like to add that my commitment to the ideal of openness in research is as strong as it ever was. (I intend to demonstrate that soon.) I very much look forward to this project becoming more open and involving more people, not least because I think that that is the only realistic way that it can develop far enough, and in enough different directions, to become more than an amusing curiosity. So I’m not entirely comfortable with the situation as it is at the moment, but I hope that it won’t be too long before things change.

Another point I’d like to add is that in our paper we try to describe the program in enough detail that we are not hiding anything important about how it works. So we are being open about the ideas that go into the program. I realize that’s not the entire issue, but I hope it counts for at least something.

• Q Says:

Thank you very much for your responses and explanations, I’m reassured!

11. Jean-Luc Delatre Says:

@Mohan
LOL! I hope you NEVER EVER, EVER USED any Microsoft software as any kind of auxilliary support in your research because anything touched by Microsoft products gets “tainted” by their outrageous licensing terms, all your software now belongs to them.
Look at the license for the Z3 SAT solver for instance: https://z3.codeplex.com/SourceControl/latest#LICENSE.txt

12. artellacoding Says:

Does the book “The Language of Mathematics by Mohan Ganesalingam” (http://www.springer.com/computer/image+processing/book/978-3-642-37011-3) help in any way for understanding the concepts, or is the paper self contained? Thanks.

• gowers Says:

Mohan’s book is very interesting, but it is about a complementary project and is therefore not necessary for understanding our paper. “The Language of Mathematics” is about analysing the kind of language mathematicians use when they are writing formally, but not actually in a formal language, and doing so in enough detail for a computer program to be able to take what they write and convert it into logical form. Because of that work (carried out with Tom Barnet-Lamb) it should one day in the not too distant future be possible to input problems into our program, and other theorem-proving programs, without having to learn a special language to do so.

13. Bogdan Says:

Thank you! I have read the paper with great interest! Some random thoughts are below.

Although you claim to concentrate more on fully automatic theorem proving, your requirement 4 (easy extendibility) is (as far as I understand) almost all that needed to use your program as interactive theorem prover. Namely, if the program cannot prove my theorem fully automatically, I can divide it to several lemmas, prove them, add to the library in the form of facts, and then the program should succeed using these facts.

You are right that “to become more than an amusing curiosity, this project should involve more people”. Is it possible to set up your program to work in the internet, so that I could type my theorems and it would (try to) prove it? In case of success these theorems could be used to prove further results and so on. There are some online libraries of this kind, but I think the main problem is exactly that the language is unnatural. If your program would understand natural language, and accept theorems through the internet, it can become popular and its library may quickly become huge (like the success of Wikipedia). Ideally, I would imagine work of future mathematician like (1) here is the problem, lets type it in the program – ok, it cannot solve it, so the problem is new and non-trivial, let me work on it (2) ok, here is the idea, but for this to work I would need auxiliary (boring) lemmas 1, 2 and 3. Let me type them to the program. Oh! Here are proofs of my lemmas. Good. And now, when the program knows the lemmas, it can also prove my main theorem. (3) When submitting the paper for publication, one needs to provide evidence that (a) the program could not prove the theorem earlier, so that it is new and non-trivial, and (b) the program can solve it now, so that the theorem is true and the proof is correct. The referee therefore does not need to verify the correctness.

Ok, back to reality. I think in your list of properties you need property (5) that the program should be strong enough to prove at least the results human is able to prove easily. Without this requirement your aim is easy: program that proves nothing satisfy your requirements (1)-(4) 🙂

Concerning your statement “there are no examples of well-established mathematical results of this level of simplicity that have been found to be fundamentally wrong”, what about the naïve set theory and, in particular, statement that “for every property, there is a set of all things satisfying that property”?

14. vznvzn Says:

hi wtg fyi just endorsed your ATM ideas & research program & related paper (the old one with the AI expert, great stuff) on mathoverflow & the answer got deleted in a mere ~7 hrs. of course its due entirely to the way I said it … 😕
what is it about stackexchange where imagination/creativity/innovation/out-of-the-box/unorthodox thinking is quickly quashed/squished/crushed by groupthink?
“the pioneers are the ones with arrows in their backs” 💡

15. Anonymous Says:

skimmed the paper & am delighted it includes the blog-based poll experiment/results at end. a near sociological study in a kuhnian paradigm-shift-in-progress.

ah managed to get a plug for ATP in general & your directions in particular in this related question, it just went viral on se (54v): why are math proofs based on ATP controversial, with the answer/citation here

16. Bruce Smith Says:

Here are some typos (etc) not already mentioned above.

missing words in these phrases:

standard TPTP test library automated theorem provers

typos in these phrases:

rather then a lucky break. [then -> than — but it’s inside a quotation, so it might be an error by the quoted author]

tagged to indicate that it has already being used [being -> been]

type possible, terminating when the result is proved or when no no moves can be made. [no no -> no more]

which move is the best than in each situation:

which write-ups were the by the program.

The program has not had this problem, [The -> This]

higher-level writing issues:

However, Barnet-Lamb and Ganesalingam have recently written a program

a reference would be desirable.

17. Bruce Smith Says:

When you get far enough beyond this program, it would be interesting to move up to the kind of problems used in the IMO (International Math Olympiad) and/or USAMO (USA Math Olympiad), since those are selected for depending on only elementary math, not taking too long to solve, but (I think) needing exactly one “non-obvious idea” to solve. Furthermore, some of the contestants (and other people) might well be willing to supply you with detailed accounts of their reasoning while working on some of those problems. (Preferably shortly after working on them for the first time, for increased accuracy.) Then you could separate the projects of “making a formal model of a specific chain of reasoning and discovery” (reported by someone working on a problem), and “making a program capable of discovering a given chain of reasoning”. Independently of that, you could see if your program can “fill in the gaps” which were left out of the reported chains of reasoning due to being “too routine to spell out”.

18. Anonymous Says:

Let me share a reference which I think you might find entertaining and somewhat related.

Some simple definitions and arguments in point-set topology translate effortlessly into diagram chasing computations, and perhaps this observation may hint how the human mind think in that area, and lead to an automatic theorem prover similar in spirit to the one you have written. The prover may be reasonably easy to write because diagrams involved have lots of finite spaces (in simple cases), and diagram chasing is already quite computational.

The following paper details this observation.

http://mishap.sdf.org/point-set-diagram-chasing.pdf

19. interested non-expert Says:

This is an absolut amazing research! I hope you will continue writing about progress and ideas about it. And especially I am interested in what your optimistic prediction on a long-term?!

Typo at page 7, paragraph after “Problem 1: “feel that that statement has at least…”
Typo at page 36, 3rd paragraph: “writing down down a proof while…”

Thank you for this truely mind-boggling research 🙂

20. BMC/BAMC 2015 Monday: The Future of Proof | Wildon's Weblog Says:

[…] not the only panel member trying to do himself out of a job. Timothy Gowers mentioned his work on theorem proving by computer and predicted that by 2100 computers would be so good at proving theorems that there would be […]

21. Lockhart Says:

What is the current state of this project?

• gowers Says:

It’s been on hold for a little while and I’ve been rather busy working on more conventional maths. But I’m definitely coming back to this kind of thing and will report on it here when I do.

22. UX, experiments, and the future of mathematics, part 1 – Joshua Tan Says:

[…] formalization is hard, and tedious. Can we make something useful—whether a library of math or a proof assistant with human-style outputs—without requiring a full formalization of […]

23. nicodean Says:

This paper (together with another one dealing with some sort of automated science) has changed in some sence my professional working path – for which i’m very thankful. I wonder whether there is a plan to continue this truely great project? I’m waiting for any kind of update on this for more than 2 years. It would be fascinating to hear your thoughts on such ideas now, 2 years after finishing this first part? Thank you a lot!