## Archive for the ‘Computing’ Category

### Preprint about theorem-proving program up on arXiv

September 19, 2013

Mohan Ganesalingam and I have just put a preprint up on arXiv entitled “A fully automatic problem solver with human-style output.” In it, we give quite a lot of detail about how the program discussed in this post works. We also talk about our general approach, and why we think it may allow us to solve problems fully automatically that are currently out of reach. (Some fairly easy — for humans — problems fall into this category.) We have also made the program’s output for our current set of test problems available. A quick look at that may be a more efficient way of getting an idea of what the program does than reading the explanation we give in the paper.

If you are interested enough to look at the preprint and find that you spot a typo or more serious error, we would of course be very grateful to be told.

### Answers, results of polls, and a brief description of the program

April 14, 2013

I have now closed the polls in the second mathematical writing experiment. Here are the results. I have also published the comments on the first and second experiments, which shed further light on the results and on (some) people’s reasons for voting the way they did.

The results were complicated slightly by the fact that after a couple of days the post was linked to from the front page of Hacker News, and suddenly the number of people who voted more than tripled in a few hours. Furthermore, there was a bit of discussion about the polls, so it is not clear that the votes were completely independent. Also, the profile of an average Hacker News reader is probably somewhat different from the profile of an average reader of this blog.

Fortunately, I had recorded the votes shortly before this happened, so below we (“we” means Mohan Ganesalingam and I) present both sets of results. As it happens, the proportions didn’t change too much. We begin with some bar charts. U stands for “undergraduate”, G for “graduate”, C for “computer” and D for “don’t know”. The portion coloured in blue represents people who claimed to be sure that they were correct, and the portion coloured in red represents those who were unsure.

At the end of the post we give the exact numbers.

A brief remark before we present the results is that none of the three “contestants” was explicitly trying to write proofs that would appear as human as possible. The two humans were asked to provide proofs and not told why. And the program was designed to produce passable write-ups, but not to fool people into thinking that those write-ups were written by a human. (There are some easy improvements that could have been made if we had intended to do that, but we did not originally envisage carrying out this second experiment.)

The general picture can be summarized as follows.

1. The computer was typically identified by around 50% of all those who voted.

2. Typically around half of those were confident that they were correct, and half not so confident.

3. A non-negligible percentage of respondents claimed to be sure that a write-up that was not by the computer was by the computer.

### Mohan Ganesalingam

There is quite a lot to say about these results, and also about the program. Most of the rest of this post is written jointly with my collaborator Mohan Ganesalingam (as indeed were the posts with the two experiments — we chose the wording very carefully), but first let me say a little about who Mohan is.
(more…)

### A second experiment concerning mathematical writing.

April 2, 2013

The time has come to reveal what the experiment in the previous post was about. As with many experiments (the most famous probably being that of Stanley Milgram about obedience to authority), its real purpose was not its ostensive purpose.

Over the last three years, I have been collaborating with Mohan Ganesalingam, a computer scientist, linguist and mathematician (and amazingly good at all three) on a project to write programs that can solve mathematical problems. We have recently produced our first program. It is rather rudimentary: the only problems it can solve are ones that mathematicians would describe as very routine and not requiring any ideas, and even within that class of problems there are considerable restrictions on what it can do; we plan to write more sophisticated programs in the future. However, each of the problems in the previous post belongs to the class of problems that it can solve, and for each problem one write-up was by an undergraduate mathematician, one by a mathematics PhD student and one by our program. (To be clear, the program was given the problems and produced the proofs and the write-ups with no further help. I will have more to say about how it works in future posts.) We wanted to see whether anybody would suspect that not all the write-ups were human-generated. Nobody gave the slightest hint that they did.

Of course, there is a world of difference between not noticing a difference that you have not been told to look out for, and being unable to detect that difference at all. Our aim was merely to be able to back up a claim that our program produces passable human-style output, so we did not want to subject that output to full Turing-test-style scrutiny, but you may, if you were kind enough to participate in the experiment, feel slightly cheated. Indeed, in a certain sense you were cheated — that was the whole point. It seems only fair to give you the chance to judge the write-ups again now that you know how they were produced. For each problem I have created a poll, and each poll has seven possible answers. These are:

The computer-generated output is definitely (a).
I think the computer-generated output is (a) but am not certain.
The computer-generated output is definitely (b).
I think the computer-generated output is (b) but am not certain.
The computer-generated output is definitely (c).
I think the computer-generated output is (c) but am not certain.
I have no idea which write-up was computer generated.

I would also be interested in comments about how you came to your judgments. All comments on both experiments and all votes in the polls will be kept private until I decide that it is time to finish the second experiment. A small remark is that I transcribed by hand all the write-ups into a form suitable for WordPress, so the existence of a typo in a write-up is not a trivial proof that it was by a human.

If you did not participate in the first experiment but nevertheless want to try this one, that’s fine. [Update: I have now closed the polls. Very soon Mohan and I will post the results and a discussion of them. Further update: The results now appear below. They appear displayed in possibly a more convenient way in this post, which also contains a discussion of how the program works.]
(more…)

### “Hello World!”

March 20, 2010

As anyone who has been following the discussion of EDP will be aware, I am interested in programming, but have until very recently favoured an extremely high-level language called BLOG. The principle of this language is that you describe in English in a blog post or comment what you want an algorithm to do, after which a probabilistic process operates. At the end of this process, if you are lucky, your description gets translated into a language that a computer can understand and the algorithm is implemented. The amazing thing is that one often is lucky. I believe that BLOG is closely related to a language called GRADSTUDENT that various mathematicians have used in the past. (My attention was drawn to the latter language on some blog or other, possibly even this one, but I can no longer find the reference.) [It has now been supplied to me.]

However, BLOG and GRADSTUDENT have certain disadvantages. One is that one does not have complete control over what one is doing. Another is that one feels slightly guilty using it. I have recently decided that it is high time I learnt to program in a more traditional language, and last week I went to a short course put on by the Cambridge computer science faculty called “C for absolute beginners”. (more…)