Mohan Ganesalingam and I have just put a preprint up on arXiv entitled “A fully automatic problem solver with human-style output.” In it, we give quite a lot of detail about how the program discussed in this post works. We also talk about our general approach, and why we think it may allow us to solve problems fully automatically that are currently out of reach. (Some fairly easy — for humans — problems fall into this category.) We have also made the program’s output for our current set of test problems available. A quick look at that may be a more efficient way of getting an idea of what the program does than reading the explanation we give in the paper.

If you are interested enough to look at the preprint and find that you spot a typo or more serious error, we would of course be very grateful to be told.

### Like this:

Like Loading...

*Related*

This entry was posted on September 19, 2013 at 10:21 am and is filed under Computing. You can follow any responses to this entry through the RSS 2.0 feed.
You can leave a response, or trackback from your own site.

September 19, 2013 at 2:36 pm |

It may be worth adding that the output file Tim links to is generated by the program and hasn’t been hand-modified in any way. (The program produces a tex file which is then run through PDFLatex.)

September 19, 2013 at 3:37 pm |

Nice!

Some typos:

At the bottom of page 4 the sentence ending “that is, it did not interfere with those proofs in which was not potentially useful” is missing a word, probably “it”.

I might be being stupid, but near the top of page 9 the sentence “And once one has made that deduction, finding x and y in S such that xy^−1 ∈ S is significantly easier than it was before.” should in fact be “And once one has made that deduction, finding x and y in S such that xy^−1 = b is significantly easier than it was before.”

Also, page 34 is nearly entirely blank, but presumably this is a formatting problem that LaTeX may well fix on it’s own as other changes are made to the document.

September 19, 2013 at 3:47 pm

Thanks for that. I’ve worked out the reason for the nearly blank page, which was that to get something to format nicely, I used the rather crude technique at one point of inserting “\eject”. Subsequent changes to the paper led to this being a bad thing to have done — a beginner’s mistake. And you’re right about the other two things. I’ve changed the file, and at some point, when it seems to be stabilizing, I’ll put up a revised draft.

September 19, 2013 at 5:03 pm

arXiv seems to have caused a few formatting peculiarities when it recompiled our TeX source. The ones relating to the URLs are particularly irksome, as we put some work into making those wrap properly.

September 19, 2013 at 8:17 pm |

Which programming language have you used?

September 19, 2013 at 10:38 pm

It’s written in Haskell.

September 20, 2013 at 3:48 am |

[…] https://gowers.wordpress.com/2013/09/19/preprint-about-theorem-proving-program-up-on-arxiv/ […]

September 20, 2013 at 6:26 pm |

Beside the “not serious reasons” for not releasing the source code are there serious reasons? (university copyright, expected patents or book writing, etc…)

September 22, 2013 at 6:34 am |

Thanks for posting this!

I’ve a question about the backwardsLibraryReasoning rule. The example uses the hypothesis “A is open” and the library result “A is open and B is open => A int B is open” to replace the target “A int B is open” with the target “B is open”. But this transformation should only be applied if the library result is an if-and-only-if, right?

(I think the statement of the backwards reasoning rule is correct, since it has an = instead of an =>, so it’s just the example that’s confusing.)

September 22, 2013 at 8:36 am

This is an interesting question. Human mathematicians frequently apply something like backwardsLibraryReasoning in contexts where the “library statement” is not an if and only if. And it’s perfectly valid logically to do this. To take the example you cite, if you’re trying to prove that is open and you know that is open, then it genuinely is sufficient to prove that is open.

What is less clear is whether reducing the problem to showing that is open

is a sensible approach, since it is aiming for a strengthening, and not just the original target. This varies from context to context, so one task we face is to work out ways of assessing how “safe” it is to do this move, given that sometimes it is sensible and sometimes it isn’t.There is a rather simple and useful criterion, which is not present in the existing program but will be added when we start to make more use of the library. Suppose you know nothing about beyond the fact that it is open. Then effectively the problem becomes equivalent to showing that for

every, if is open and certain facts about hold, then is open. To put that differently, in the presence of certain facts about , we are trying to show that for every , is open implies that is open. But that last statementisequivalent to showing that is open (since, for example, could be the entire space).In this way, we identify a class of entirely safe backwards library deductions (and there’s a similar criterion for forwards ones). We don’t ask for the implications to be if and only if, but we do ask for them to be if and only if in the absence of any other information (in the sense just discussed) so that in some readily identifiable contexts the deduction can be made without any throwing away of potentially useful information.

This won’t be the end of the story, however. When the problems get more difficult, there will be situations where one makes deductions that genuinely do throw away information and just hopes that that information isn’t needed. This is a speculative process (amounting to weakening one’s premises or strengthening one’s target) that humans do the whole time and that sometimes goes wrong — for example, if the modified statement becomes false — and forces the person trying to solve the problem to backtrack.

September 23, 2013 at 12:29 pm |

One thing I don’t see in the program, but which does seem to be a very common feature of human proofs, is splitting a problem into cases where the `right’ case distinctions are not obvious (i.e. they are not written into the problem for the split operations to pick up)

I actually can’t think of many proofs of ‘interesting’ results which don’t have such a case distinction somewhere – there are some, but those tend to contain a trick of the form ‘it is enough to prove the following very different looking result’. You have major arcs versus minor arcs, structured case versus pseudorandom case, decompositions in structural graph theory, et cetera…

I am not so convinced as you (perhaps!) are that there is much hope in the near future for automated theorem proving to do things which are useful to mathematicians other than exploiting the computer’s ability to brute-force a search space. I believe one needs at least some kind of idea of how a computer might come up with this kind of argument.

September 23, 2013 at 5:26 pm

Another very interesting question. The reason this is not in the current version of the program is not that we don’t think it is important, and also not that we think it is important but hopeless, but simply that we are deliberately starting with routine problems, which we define to be problems where at every stage there’s an obvious (to humans) thing to do and if you repeatedly do that obvious thing then you end up proving what you wanted to prove.

I’ll try to indicate why I don’t think it’s hopeless by giving a simple example. Suppose you are asked to prove that is never greater than 1. One way you might do it is to observe that if , then and you are done, whereas if then , so .

Where might such a splitting into cases come from? One possibility is that we use the following general technique. We have something we’re trying to prove, and we make a first attempt by just doing some very obvious things. This attempt fails, but we spot that there is a “missing condition” that would make it succeed. Because the missing condition is missing, we can’t finish the problem at this point, but

we can add an extra assumption, namely that the missing condition doesn’t hold. There are many ways that we might identify such a condition. For example, suppose we know that is defined on the closed interval and we want to show that is bounded. We know that continuous functions on closed bounded intervals are bounded (this belongs to our “library”) so we are now free to assume that is not continuous.Going back to the example at hand, here is how we might apply the above method. First we use the fact that to observe that . So the problem can be reduced to showing that , which is equivalent to the statement that . Unfortunately, it is not always the case that , but we can add the assumption now that (or if we prefer). At that point, another piece of general knowledge kicks in, that when , bigger (even) powers of are bigger, so we deduce that and are done.

Obviously this isn’t the entire story, but I think it qualifies as “at least some idea of how a computer might come up with this kind of argument”.

October 1, 2013 at 3:09 am |

This program looks very interesting! I can’t wait to see the source code! Two comments:

1) The current version is deterministic (correct me if I’m wrong – I just skimmed the paper). Could it produce different proofs if you make it try random steps?

2) Can it make conjectures, perhaps by starting with some statement that you input and declaring it to be a conjecture if it can find no contradictions from the original statement?

I have been developing algorithms that conjecture theorems about pattern avoiding permutations. In some cases we also have proofs supplied by algorithms. See http://arxiv.org/pdf/1211.7110v1

October 1, 2013 at 9:41 am

It is indeed true that the current version is deterministic. At some point we would like to introduce nondeterminism, but we are reluctant to do that until the need genuinely arises (as it surely eventually will) because it makes regression testing much harder. It looks as though an ideal future program would be largely deterministic but have a small number of “bifurcation points” where there are two (or possibly more) pretty sensible things to do that lead to different proofs. I have a few examples of this phenomenon.

I believe that making conjectures is something that will not require a big AI breakthrough for the program to be able to do. In fact, there is an amusing anecdote that goes some way towards illustrating this. When we first tested the program, it was on the problem of showing that a continuous inverse image of an open set is open. The program did pretty well, but got stuck right near the end. The point it got to could be described in human terms as follows: I can make as small as I like, and I need to be less than . We then realized that we had forgotten to tell the program that was continuous! So in effect, the program had “invented” the definition of continuity, or at least identified that that was the property you needed to make inverse images of open sets open.

That isn’t quite the same as making a conjecture, but I think the mechanism is the same: a lot of the time, conjectures are generated when you can’t solve a problem but you can reduce it to a nice clean statement that you can show would be sufficient.

Note that this is a very different process from the way that the Graffiti program operates. If I understand correctly, that just takes a bunch of concepts from a field, chucks them together in various ways, weeds out statements that have obvious answers, and declares the rest to be conjectures. I think that generating conjectures as “missing steps” is much more like what humans do.

October 4, 2013 at 1:58 am |

In the paper, you write: Bledsoe expressed the opinion that “purely syntactic” methods such as resolution had reached a plateau, and that further progress would not be made without using more human techniques.

Bledsoe’s observation is probably true. At this plateau, however, noticing resolution theorem provers may be worthwhile, e.g Vampire, a world cup champion in theorem proving, of Andrei Voronkov’s group.

I asked Andrei for advice on choosing a theorem prover. He recommended Vampire without asking, what problem do you have.

Vampire solved the problem, eventually, proving that: P is not equal to NP. Here is a link to the paper: Verifying that: P is not equal to NP, using a theorem prover.

Vampire’s resolution proof, however, is the third version of the proof, and Vampire benefited from earlier versions. The first version is a human proof. The second version is a human proof in Hilbert’s proof theory, using the first proof idea. In addition, there is a human translation from Hilbert’s proof theory to Vampire formulas.

Shortly, the (human) proof idea is an axiomatization of Turing’s theory of computing in first-order logic. In this manner, the axiom connects concepts of computing with foundations of mathematics, e.g., formal proofs. So, the axiom opens up for a proof using textbook methods of proof theory.

Here is a link to the paper: P is not equal to NP, with the human proof only.

The first version of the proof seems to be out of reach for an automated theorem prover, at least for a while. The second version and the translation may still be hard, but they are likely to be within reach of an automated theorem prover, given the first (human) version.

A similar proof gives a proof of: The tautology problem is not in NP.

Here is a link to the paper.

Thus, theorem proving is not that simple after all. Yet, it was interesting talking to Woody Bledsoe. At the time, I was more into using resolution theorem proving for computing. Nonetheless, Woody turned on the thought that human-oriented theorem proving has great expectations.

October 4, 2013 at 9:34 am

Your second and third links appear to be broken.

October 4, 2013 at 4:17 pm

Sorry for the broken links. Here is a second try.

Here is a link to the paper: P is not equal to NP, with the human proof only.

The tautology problem is not in NP. Here is a link to the paper.

October 4, 2013 at 12:57 pm |

Dear Prof. Gowers,

I’m a PhD student in mathematics who has largely due to your well written posts and the polymath initiative started to believe that the nicest way to do research doesn’t involve hiding for personal benefit, but a fully open-source collaboration. Thus it comes as a surprise that the source code of this most interesting program is not not open source.

I’m sure you have a good reason for that, but maybe you would care to explain your reasons, so that I, and maybe some others, wouldn’t lose faith in your statements of openness and sharing in research?

Thank you.

October 5, 2013 at 11:35 am

Short answer: there are legal reasons why it’s not possible to release the code in its current form. It might be possible to work around those issues, but that would take time, and we are short of that right now. (On that last count, that’s more my fault than Tim’s — before too long I’ll be in the position of having to apply for jobs, and hence I’m insisting that we focus on squeezing publications out of this project!)

October 5, 2013 at 2:48 pm

I’d like to add that my commitment to the ideal of openness in research is as strong as it ever was. (I intend to demonstrate that soon.) I very much look forward to this project becoming more open and involving more people, not least because I think that that is the only realistic way that it can develop far enough, and in enough different directions, to become more than an amusing curiosity. So I’m not entirely comfortable with the situation as it is at the moment, but I hope that it won’t be too long before things change.

Another point I’d like to add is that in our paper we try to describe the program in enough detail that we are not hiding anything important about how it works. So we are being open about the ideas that go into the program. I realize that’s not the entire issue, but I hope it counts for at least something.

October 7, 2013 at 11:46 am

Thank you very much for your responses and explanations, I’m reassured!

October 5, 2013 at 12:04 pm |

@Mohan

LOL! I hope you NEVER EVER, EVER USED any Microsoft software as any kind of auxilliary support in your research because anything touched by Microsoft products gets “tainted” by their outrageous licensing terms, all your software now belongs to them.

Look at the license for the Z3 SAT solver for instance: https://z3.codeplex.com/SourceControl/latest#LICENSE.txt

October 8, 2013 at 8:00 am |

Does the book “The Language of Mathematics by Mohan Ganesalingam” (http://www.springer.com/computer/image+processing/book/978-3-642-37011-3) help in any way for understanding the concepts, or is the paper self contained? Thanks.

October 8, 2013 at 10:10 am

Mohan’s book is very interesting, but it is about a complementary project and is therefore not necessary for understanding our paper. “The Language of Mathematics” is about analysing the kind of language mathematicians use when they are writing formally, but not actually in a formal language, and doing so in enough detail for a computer program to be able to take what they write and convert it into logical form. Because of that work (carried out with Tom Barnet-Lamb) it should one day in the not too distant future be possible to input problems into our program, and other theorem-proving programs, without having to learn a special language to do so.

October 8, 2013 at 6:19 pm |

Thank you! I have read the paper with great interest! Some random thoughts are below.

Although you claim to concentrate more on fully automatic theorem proving, your requirement 4 (easy extendibility) is (as far as I understand) almost all that needed to use your program as interactive theorem prover. Namely, if the program cannot prove my theorem fully automatically, I can divide it to several lemmas, prove them, add to the library in the form of facts, and then the program should succeed using these facts.

You are right that “to become more than an amusing curiosity, this project should involve more people”. Is it possible to set up your program to work in the internet, so that I could type my theorems and it would (try to) prove it? In case of success these theorems could be used to prove further results and so on. There are some online libraries of this kind, but I think the main problem is exactly that the language is unnatural. If your program would understand natural language, and accept theorems through the internet, it can become popular and its library may quickly become huge (like the success of Wikipedia). Ideally, I would imagine work of future mathematician like (1) here is the problem, lets type it in the program – ok, it cannot solve it, so the problem is new and non-trivial, let me work on it (2) ok, here is the idea, but for this to work I would need auxiliary (boring) lemmas 1, 2 and 3. Let me type them to the program. Oh! Here are proofs of my lemmas. Good. And now, when the program knows the lemmas, it can also prove my main theorem. (3) When submitting the paper for publication, one needs to provide evidence that (a) the program could not prove the theorem earlier, so that it is new and non-trivial, and (b) the program can solve it now, so that the theorem is true and the proof is correct. The referee therefore does not need to verify the correctness.

Ok, back to reality. I think in your list of properties you need property (5) that the program should be strong enough to prove at least the results human is able to prove easily. Without this requirement your aim is easy: program that proves nothing satisfy your requirements (1)-(4) :)

Concerning your statement “there are no examples of well-established mathematical results of this level of simplicity that have been found to be fundamentally wrong”, what about the naïve set theory and, in particular, statement that “for every property, there is a set of all things satisfying that property”?

October 9, 2013 at 6:24 am

Yes, indeed, mathematics is rife with errors just like any other field:

http://www.gwern.net/The%20Existential%20Risk%20of%20Mathematical%20Error

December 21, 2013 at 2:21 am |

hi wtg fyi just endorsed your ATM ideas & research program & related paper (the old one with the AI expert, great stuff) on mathoverflow & the answer got deleted in a mere ~7 hrs. of course its due entirely to the way I

saidit … :???:what is it about stackexchange where imagination/creativity/innovation/out-of-the-box/unorthodox thinking is quickly quashed/squished/crushed by groupthink?

“the pioneers are the ones with arrows in their backs” :idea:

January 11, 2014 at 5:45 pm |

skimmed the paper & am delighted it includes the blog-based poll experiment/results at end. a near sociological study in a kuhnian paradigm-shift-in-progress.

ah managed to get a plug for ATP in general & your directions in particular in this related question, it just went viral on se (54v): why are math proofs based on ATP controversial, with the answer/citation here

February 10, 2014 at 12:19 am |

Here are some typos (etc) not already mentioned above.

missing words in these phrases:

standard TPTP test library automated theorem provers

typos in these phrases:

rather then a lucky break. [then -> than — but it’s inside a quotation, so it might be an error by the quoted author]

tagged to indicate that it has already being used [being -> been]

type possible, terminating when the result is proved or when no no moves can be made. [no no -> no more]

which move is the best than in each situation:

which write-ups were the by the program.

The program has not had this problem, [The -> This]

higher-level writing issues:

However, Barnet-Lamb and Ganesalingam have recently written a program

a reference would be desirable.

February 10, 2014 at 12:22 am |

When you get far enough beyond this program, it would be interesting to move up to the kind of problems used in the IMO (International Math Olympiad) and/or USAMO (USA Math Olympiad), since those are selected for depending on only elementary math, not taking too long to solve, but (I think) needing exactly one “non-obvious idea” to solve. Furthermore, some of the contestants (and other people) might well be willing to supply you with detailed accounts of their reasoning while working on some of those problems. (Preferably shortly after working on them for the first time, for increased accuracy.) Then you could separate the projects of “making a formal model of a specific chain of reasoning and discovery” (reported by someone working on a problem), and “making a program capable of discovering a given chain of reasoning”. Independently of that, you could see if your program can “fill in the gaps” which were left out of the reported chains of reasoning due to being “too routine to spell out”.

April 1, 2014 at 3:23 pm |

Let me share a reference which I think you might find entertaining and somewhat related.

Some simple definitions and arguments in point-set topology translate effortlessly into diagram chasing computations, and perhaps this observation may hint how the human mind think in that area, and lead to an automatic theorem prover similar in spirit to the one you have written. The prover may be reasonably easy to write because diagrams involved have lots of finite spaces (in simple cases), and diagram chasing is already quite computational.

The following paper details this observation.

http://mishap.sdf.org/point-set-diagram-chasing.pdf

May 28, 2014 at 5:30 pm |

This is an absolut amazing research! I hope you will continue writing about progress and ideas about it. And especially I am interested in what your optimistic prediction on a long-term?!

Typo at page 7, paragraph after “Problem 1: “feel that that statement has at least…”

Typo at page 36, 3rd paragraph: “writing down down a proof while…”

Thank you for this truely mind-boggling research :-)