What I wrote gives some kind of illustration of the twists and turns, many of them fruitless, that people typically take when solving a problem. If I were to draw a moral from it, it would be this: when trying to solve a problem, it is a mistake to expect to take a direct route to the solution. Instead, one formulates subquestions and gradually builds up a useful bank of observations until the direct route becomes clear. Given that we’ve just had the football world cup, I’ll draw an analogy that I find not too bad (though not perfect either): a team plays better if it patiently builds up to an attack on goal than if it hoofs the ball up the pitch or takes shots from a distance. Germany gave an extraordinary illustration of this in their 7-1 defeat of Brazil.

I imagine that the rest of this post will be much more interesting if you yourself solve the problem before reading what I did. I in turn would be interested in hearing about other people’s experiences with the problem: were they similar to mine, or quite different? I would very much like to get a feel for how varied people’s experiences are. If you’re a competitor who solved the problem, feel free to join the discussion!

If I find myself with some spare time, I might have a go at doing the same with some of the other questions.

What follows is exactly what I wrote (or rather typed), with no editing at all, apart from changing the LaTeX so that it compiles in WordPress and adding two comments that are clearly marked in red.

**Problem** *Let $a_0<a_1<a_2<\cdots$ be an infinite sequence of positive integers. Prove that there exists a unique integer $n\geq 1$ such that*

$$a_n<\frac{a_0+a_1+\cdots+a_n}{n}\leq a_{n+1}.$$

Slight bafflement.

The expression in the middle is not an average. If we were to replace it by an average we would have the second inequality automatically.

Try looking at simple cases. Here we could consider what happens when $n=1$, for example. Then the inequality says $a_1<a_0+a_1\leq a_2$.

Here we automatically have the first inequality, but there is no reason for the second inequality to be true.

Putting those observations together, we see that the first inequality is true when $n=1$, and the second inequality is “close to being true” as $n$ gets large, since it is true if we replace $n$ by $n+1$ in the denominator.

If the inequality holds for a unique $n$, then a plausible guess is that the first inequality fails at some $n$ and if $n$ is minimal such that it fails, then both inequalities are true for $n-1$. I shall investigate that in due course, but I have another idea.

It is clear that WLOG . Can we now choose in such a way that we always get equality for the second inequality? We can certainly solve the equations, so the question is whether the resulting sequence will be increasing.

We get , so I’d better set and then continue constructing a sequence.

So , , , and so on. Thus all the with are equal, which they are not supposed to be. This feels significant.

Out of interest, what happens to the inequalities when we (illegally) take the above sequence? We get , so we get equality on both sides except when when we get .

Try to disprove the result.

Try to find the simplest counterexample you can.

An obvious thing to do is to try to make the inequality true when and when . So let’s go. Without loss of generality , . We now need .

For we need . That can be rearranged to , exactly contradicting what we had before.

That doesn’t solve the problem but it looks interesting. In particular, it suggests rearranging the first inequality in the general case, to

That’s quite nice because the right hand side is a genuine average this time.

Actually, if getting an average is what we care about, we could also rearrange the first inequality by simply multiplying through by , which gives

I think it is time to revisit that guess, in order to try to prove at least that there *exists* a solution. So we know that the first inequality holds when , since all it says then is that . Can it always hold? If so, then again WLOG and , and after that we get , , etc.

Let’s write and for . Then we have , , , etc. We also require .

Let’s set . Now the first condition becomes but . Is that possible?

Is it possible with equality? WLOG . Then we have , , , etc.

I’m starting to wonder whether the integer must be something like 1 or 2. Let’s think about it. We know that . If then we have our . Suppose instead that . Then , so . Now if then we are again done, so suppose that .

But since , we can simply insert in between the two. Why can’t we continue doing that kind of thing? Let me try.

If , then , so we can insert in between the two.

I seem to have disproved the result, so I’d better see where I’m going wrong. I’ll try to construct a sequence explicitly. I’ll take , . I need , so I’ll take . Now I need , so I’ll take . Now I need , so I’ll take .

I don’t seem to be getting stuck, so let me try to prove that I can always continue. Suppose I’ve already chosen . Then the condition I need is that

By induction we already have that , from which it follows that and therefore that . We may therefore find between these two numbers, as desired.

You idiot Gowers, read the question: the have to be positive integers.

Fortunately, the work I’ve done so far is not a complete waste of time. [The half-conscious thought in the back of my mind here, which is clearer in retrospect, was that the successive differences in the example I had just constructed were getting smaller and smaller. So it seemed highly likely that using the same general circle of thoughts I would be able to prove that I couldn't take the to be integers.]

Here’s a trivial observation: if the second inequality fails, then . So if , then . How long can we keep that going with positive integers? Answer: for ever, since we can take .

Never mind about that. I want to go back to an earlier idea. [It isn't obvious what I mean by "earlier idea" here. Actually, I had earlier had the idea of defining the as below, but got distracted by something else and ended up not writing it down. So a small part of the record of my journey to the proof is missing.] It is simply to define and for . Then for if the first inequality holds we have

So each new is less than the average of the up to that , and hence less than the average of the before that . But that means that the average of the forms a decreasing sequence. That also means that the are bounded above by , something I could have observed ages ago. So they can’t be an increasing sequence of integers.

I’ve now shown that the first inequality must fail at some point. Suppose is the first point at which it fails. Then we have

and

The second inequality tells us that exceeds the average of , which implies that it exceeds the average of . That gives us the inequality

So now I’ve proved that there exists an integer such that the inequalities both hold. It remains to prove uniqueness. This formulation with the ought to help. We’ve picked the first point at which is at least as big as the average of . Does that imply that is at least as big as the average of ? Yes, because is at least as big as that average, and is bigger than . In other words, we can prove easily that if the first inequality fails for then it fails for , and hence by induction for all .
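As a sanity check on the statement (not part of the proof), here is a minimal Python sketch that searches a finite prefix of a sequence for every $n$ satisfying both inequalities. It assumes the chain of inequalities is $a_n < (a_0+\dots+a_n)/n \leq a_{n+1}$ with $n\geq 1$; the function name `solutions` is purely illustrative.

```python
from itertools import accumulate

def solutions(a):
    """Return every n >= 1 with a[n] < (a[0]+...+a[n])/n <= a[n+1].

    `a` is a finite prefix of a strictly increasing sequence of positive
    integers, so only those n with n + 1 < len(a) can be checked.
    """
    if any(x <= 0 for x in a) or any(x >= y for x, y in zip(a, a[1:])):
        raise ValueError("need a strictly increasing sequence of positive integers")
    partial = list(accumulate(a))  # partial[n] = a[0] + ... + a[n]
    # Multiply through by n so that everything stays in exact integer arithmetic.
    return [n for n in range(1, len(a) - 1)
            if n * a[n] < partial[n] <= n * a[n + 1]]

print(solutions(list(range(1, 20))))   # a_k = k + 1
print(solutions(list(range(10, 30))))  # a_k = k + 10
```

On the prefixes tried here the list always has exactly one element, as the result predicts, although a short prefix could of course end before the solution is reached.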


Just before I start this post, let me say that I do still intend to write a couple of follow-up posts to my previous one about journal prices. But I’ve been busy with a number of other things, so it may still take a little while.

This post is about the next European Congress of Mathematics, which takes place in Berlin in just over two years’ time. I have agreed to chair the scientific committee, which is responsible for choosing approximately 10 plenary speakers and approximately 30 invited lecturers, the latter to speak in four or five parallel sessions.

The ECM is less secretive than the ICM when it comes to drawing up its scientific programme. In particular, the names of the committee members were made public some time ago, and you can read them here.

I am all in favour of as much openness as possible, so I am very pleased that this is the way that the European Mathematical Society operates. But what is the maximum reasonable level of openness in this case? Clearly, public discussion of the merits of different candidates is completely out of order, but I think anything else goes. In particular, and this is the main point of the post, I would very much welcome suggestions for potential speakers. If you know of a mathematician who is European (and for these purposes Europe includes certain not obviously European countries such as Russia and Israel), has done exciting work (ideally recently), and will not already be speaking about that work at the International Congress of Mathematicians in Seoul, then we would like to hear about it. Our main aim is that the congress should be rewarding for its participants, so we will take some account of people’s ability to give a good talk. This applies in particular to plenary speakers.

~~I shall moderate all comments on this post. If you suggest a possible speaker, I will not publish your comment, but will note the suggestion.~~ More general comments are also welcome and will be published, assuming that they are the kinds of comments I would normally allow.

[In parentheses, let me say what my comment policy now is. The volume of spam I get on this blog has reached a level where I have decided to implement a feature that WordPress allows, where if you have never had a comment accepted, then your comment will automatically be moderated. I try to check the moderation queue quite frequently. If you have had a comment accepted in the past, then your comments will appear as normal.

I am very reluctant to delete comments, but I do delete obvious spam, and I also delete any comment that tries to use this blog as a form of self-promotion (such as using a comment to draw attention to the author's proof of the Riemann hypothesis, or to the author's fascinating blog, etc. etc.). I sometimes delete pingbacks as well -- it depends whether I think readers of my blog might conceivably be interested in the post from which the pingback originates.]

Going back to the European Congress, if you would prefer to make your suggestion by getting in contact directly with a committee member, then that is obviously fine too. The list of committee members includes email addresses.

However you make your suggestions, it would be very helpful if you could give not just a name but a brief reason for the suggestion: what the work is that you think should be recognised, and why it is important.

The main other thing I am happy to be open about is the stage that the committee has reached in its deliberations, and the plans for how it will carry out its work. Right now, we are at the stage of trying to put together a longlist of possible speakers. I have asked the other committee members to suggest to me at least six potential speakers each, of whom at least six should be broadly in their area. I hope that will give us enough candidates to make it possible to achieve a reasonable subject balance. We will of course also strive for other forms of balance, such as gender and geographical balance, to the extent that we can. Once we have a decent-sized longlist, we will cut it down to the right sort of size.

We are aiming to produce a near-complete list of speakers by around November. This is rather a long time in advance of the Congress itself, which worried me a bit, but I have permission from the EMS to leave open a few slots so that if somebody does something spectacular after November, then we will have the option of inviting them to speak.


**Further update: figures in from Nottingham too.**

**Further update: figures now in from Oxford.**

**Final update: figures in from LSE.**

A little over two years ago, the Cost of Knowledge boycott of Elsevier journals began. Initially, it seemed to be highly successful, with the number of signatories rapidly reaching 10,000 and including some very high-profile researchers, and Elsevier making a number of concessions, such as dropping support for the Research Works Act and making papers over four years old from several mathematics journals freely available online. It has also contributed to an increased awareness of the issues related to high journal prices and the locking up of articles behind paywalls.

However, it is possible to take a more pessimistic view. There were rumblings from the editorial boards of some Elsevier journals, but in the end, while a few individual members of those boards resigned, no board took the more radical step of resigning en masse and setting up with a different publisher under a new name (as some journals have done in the past), which would have forced Elsevier to sit up and take more serious notice. Instead, they waited for things to settle down, and now, two years later, the main problems, bundling and exorbitant prices, continue unabated: in 2013, Elsevier’s profit margin was up to 39%. (The profit is a little over £800 million on a little over £2 billion.) As for the boycott, the number of signatories appears to have reached a plateau of about 14,500.

Is there anything more that can be done? One answer that is often given is that the open access movement is now unstoppable, and that it is only a matter of time before the current system will have changed significantly. However, the pace of change is slow, and the alternative system that is most strongly promoted — open access articles paid for by article processing charges — is one that mathematicians tend to find unpalatable. (And not only mathematicians: they are extremely unpopular in the humanities.) I don’t want to rehearse the arguments for and against APCs in this post, except to say that there is no sign that they will help to bring down costs any time soon and no convincing market mechanism by which one might expect them to.

I have come to the conclusion that if it is not possible to bring about a rapid change to the current system, then the next best thing to do, which has the advantage of being a lot easier, is to obtain as much information as possible about it. Part of the problem with trying to explain what is wrong with the system is that there are many highly relevant factual questions to which we do not yet have reliable answers. Amongst them are the following.

1. How willing would researchers be to do without the services provided by Elsevier?

2. How easy is it on average to find on the web copies of Elsevier articles that can be read legally and free of charge?

3. To what extent are libraries actually suffering as a result of high journal prices?

4. What effect are Elsevier’s Gold Open Access articles having on their subscription prices?

5. How much are our universities paying for Elsevier journals?

The main purpose of this post is to report on efforts that I and others have made to start obtaining answers to these questions. I shall pay particular attention to the last one, since it is about that that I have most to say. I will try to keep the post as factual as possible and give my opinions about some of the facts in a separate post.

I have two small pieces of evidence. The first is an interesting comment that was made on a Google Plus post of mine by Benoît Kloeckner, who wrote the following.

In France, when the national consortium “Couperin” was dealing with Springer for the 2012-2014 contract, we issued a petition asserting that some terms (notably interdiction to unsubscribe from a number of journals) were unacceptable and that we, mathematicians, would agree not to get access to Springer journals. This was done to give negotiators more strength, but had little effect despite a significant number of signatures.

This points to a problem that I will discuss in more detail in my next post: that different subjects have different needs. Part of the reason mathematicians find the current system so objectionable is that we have already got to the stage where we don’t really need journals for anything other than the very crude measure of quality that they give us, since a fairly high, and ever increasing, proportion of the articles that interest us are freely available in preprint form. But in some subjects, such as biology or medicine, this is much less true, and as a result people rely far more on journal articles.

I tried to take the temperature in the mathematics faculty in Cambridge by asking my colleagues to complete a very brief questionnaire: there were two questions, with multiple-choice answers. The questions were as follows.

1. How easily could you do without access to Elsevier journals via ScienceDirect and print copies?

2. For those who negotiate on our behalf to be in a strong bargaining position, they have to be able to risk our losing access to Elsevier products (other than those that are freely available) for a significant length of time. How willing would you be for them to take that risk?

In case the results were interestingly different, I got people in DAMTP (the department of applied mathematics and theoretical physics) to answer one copy of the questionnaire and people in DPMMS (the department of pure mathematics and mathematical statistics) to answer another. The results were as follows. There were 96 responses from DAMTP and 80 from DPMMS. I give the DAMTP figure first and then the DPMMS figure, both as percentages.

1. How easily could you do without access to Elsevier journals via ScienceDirect and print copies?

(i) It would be no problem at all. [27.1, 23.8]

(ii) It would be OK, but a minor inconvenience. [26.0, 38.8]

(iii) It would be OK most of the time, but occasionally very inconvenient. [24.0, 32.5]

(iv) It would be a significant inconvenience. [14.6, 5.0]

(v) It would have a strongly negative impact on my research. [8.3, 0.0]

2. For those who negotiate on our behalf to be in a strong bargaining position, they have to be able to risk our losing access to Elsevier products (other than those that are freely available) for a significant length of time. How willing would you be for them to take that risk?

(i) Very willing [46.9, 55.7]

(ii) Willing [31.3, 39.2]

(iii) Unwilling [14.6, 3.8]

(iv) Very unwilling [7.3, 1.3]

Thus, if the responses were representative, then in both departments, most people would not suffer too much inconvenience if they had to do without Elsevier’s products and services, and a large majority were willing to risk doing without them if that would strengthen the bargaining position of those who negotiate with Elsevier.
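The summary above can be checked with a little arithmetic. The following Python sketch adds up the relevant percentages for each department. Treating options (i) to (iii) of the first question as "not too much inconvenience" is an assumption about how to group the answers, and the names `q1`, `q2` and `combined` are just illustrative.

```python
# Percentages copied from the survey results above: (DAMTP, DPMMS).
q1 = {
    "no problem": (27.1, 23.8),
    "minor inconvenience": (26.0, 38.8),
    "occasionally very inconvenient": (24.0, 32.5),
    "significant inconvenience": (14.6, 5.0),
    "strongly negative impact": (8.3, 0.0),
}
q2 = {
    "very willing": (46.9, 55.7),
    "willing": (31.3, 39.2),
    "unwilling": (14.6, 3.8),
    "very unwilling": (7.3, 1.3),
}

def combined(table, options):
    """Sum the percentages for the given options, separately per department."""
    damtp = sum(table[o][0] for o in options)
    dpmms = sum(table[o][1] for o in options)
    return round(damtp, 1), round(dpmms, 1)

# Assumed grouping: options (i)-(iii) count as manageable inconvenience.
manageable = combined(
    q1, ["no problem", "minor inconvenience", "occasionally very inconvenient"]
)
willing = combined(q2, ["very willing", "willing"])
print("manageable (DAMTP, DPMMS):", manageable)
print("willing    (DAMTP, DPMMS):", willing)
```

Under that grouping, roughly three quarters of DAMTP respondents and about 95% of DPMMS respondents fall on the "manageable" and "willing" sides.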

Another question I might have asked is how much the answers would have changed if the departments were to subscribe to just a few important journals. That is an important question, since it might be that the University of Cambridge should follow the examples of Harvard, MIT, Cornell and others (that link is from 2004 so the situation may have changed), stop paying for a Big Deal contract and switch to paying for individual journals at list prices instead.

It is very easy to find websites where surveys like the one I conducted can be set up for no charge. (But be a little careful: I accidentally chose one called Surveymonkey that allowed only 100 responses, as a result of which I had to ask people to do it again.) I would be extremely interested if other people could do similar surveys in their own departments, both in mathematics and in other subjects.

My impression has for some time been that in mathematics a significant proportion of articles are available on the arXiv or on authors’ home pages, to the point where I almost never need to look at the journal version. There also appears to be a distinct positive correlation between the quality of a journal and the proportion of its articles freely available. And there seem to be national differences in the extent to which people make their papers available. But until recently it was a rather long and tedious process to obtain any hard figures about this.

Recently, however, Scott Morrison has set up a website called The Mathematics Literature Project, to which you can contribute if you have the time. Although one still has to input the information manually, Scott has written software that automates the process to some extent and makes it much quicker. The project is still in its infancy, but it already demonstrates that a large proportion of articles in various different journals, not all of them Elsevier journals, are indeed freely available in preprint form. And there is some evidence for the correlation with quality: for example, Discrete Mathematics is a less good journal than the Journal of Combinatorial Theory A and B, and a lot fewer of its articles can be found. (For JCTA the proportion is over 80%, whereas for Discrete Mathematics it is more like 30%.)

Thus, there is plenty of evidence that mathematicians at least do not really need their universities to pay large sums of money to Elsevier. Unfortunately, because of bundling, that fact on its own has had almost no effect on prices.

I’m tempted just to suggest that you go and talk to a librarian. You won’t be left in much doubt about the answer, at least qualitatively speaking. In brief, libraries suffer because bundling means that they have very little control over their budgets. If Elsevier raises its prices, then libraries simply have to pay them or else lose the entire bundle, so effectively they are forced to make cuts elsewhere. And this happens. For example, Phil Sykes, former chair of Research Libraries UK, shared a document with me that includes many interesting figures, one of which is that between 2001 and 2009, mean expenditure on books went up by 0.17%, which is a substantial real-terms cut, while mean expenditure on journals went up by 82%. Apparently, the expenditure on books as a proportion of total expenditure went down from 11% to just over 7% between 1999 and 2009.

But this distortion is not confined to books. Journals that belong to a large bundle are artificially protected, at the expense of other, potentially more useful, journals that do not belong to the bundle. If you think that this is just a theoretical possibility, then take a look at the example of the Université de Paris Descartes. This is the top university in Paris for medicine, the university you try to get into if you are French and want to be a doctor.

It would seem a safe bet that a top medical university would subscribe to at least some journals from the Nature publishing group, such as Nature Medicine, which describes itself as the premier journal for medical research, or Nature, which likes to think of itself as the premier journal full stop. But no: subscriptions to all Nature journals as well as many others were cancelled this year. In the long list of cancelled subscriptions, you won’t find any mention of Elsevier journals, because they are bundled together.

From time to time, a library decides that enough is enough. A couple of years ago, the mathematics department of the Technische Universität München decided to cancel all its subscriptions to Elsevier journals. And very recently the entire Universität Konstanz, also in Germany, decided to cancel its license negotiations and replace its license by “alternative procurement channels”. Given the evidence that we are becoming less reliant on journal subscriptions, it would seem rational for other libraries to consider whether to take similar measures.

Recall that Gold Open Access refers to the practice where a publisher makes an article freely available online in return for an article processing charge (APC), which is typically paid by an author’s institution or by a grant-awarding body. Elsevier now has various journals that are funded that way, as well as “hybrid” journals — that is, journals to which libraries still subscribe but which allow authors to make their articles open access in return for an APC. The proportion of Elsevier articles for which APCs have been paid is currently very small, but it is likely to increase, since various funding bodies are starting to insist that the academics they fund should make their articles open access, and often (but not always) the assumption is that this should be done via an APC.

A few months ago, it occurred to me to wonder what would happen if the proportion of Gold Open Access articles did indeed increase. Would Elsevier continue to rake in its subscription revenue and receive the APCs on top? This would seem particularly unjust in the case of hybrid journals, since libraries with Big Deal contracts cannot cancel their subscriptions to them, and in any case if several of the articles are not open access they may well not want to. So there would seem to be a danger that Elsevier is receiving substantial article processing charges that are not needed to cover the cost of processing (the additional cost of making an article open access is at least an order of magnitude less than the APCs), or to compensate Elsevier for loss of subscription revenue.

I then discovered that, not surprisingly, many other people had been concerned about this point. There is even a technical term for the practice of effectively charging twice for the same article: it is called *double dipping*. I found a page on Elsevier’s website where they stated that they had a no-double-dipping policy. However, that mentioned only the list prices of journals, so it did not address my concern at all, given that most libraries have Big Deal contracts. I decided to write to Elsevier to ask about this, and the result was that they updated the relevant page.

I think one can summarize what they say on the page now as follows: they set their prices based on the number of non-open-access articles included in the Freedom Collection; this has gone up, so they feel no compunction about charging more for the Freedom Collection. So they are at least *implying* that if enough open-access articles were published that the total volume of non-open-access articles went down, they would lower their prices.

That leaves me with two concerns. The first is that if their Big Deal contracts are confidential, then we have no way of knowing whether they are sticking to their official policy. The second is that what matters should not be the number of open access articles as a proportion of the whole, but the proportion of open access articles *amongst the articles that people actually want to read*. If, for example, half the articles in journals such as Cell and The Lancet became open access but Elsevier launched a handful of joke journals that published a comparable volume of articles, then the value of the non-open-access component to libraries would have gone down substantially, but according to Elsevier’s stated policy their charges would not be decreased.

On top of all that is a remarkable scandal that has attracted a great deal of attention recently, which is that Elsevier has been double dipping in the most direct way possible: charging people to download articles for which APCs have been paid. Mike Taylor spotted this about two years ago. Elsevier’s response, coordinated by Alicia Wise, was less than swift, not surprisingly given their strong incentive to drag their feet about it. Peter Murray-Rust has been vigorously campaigning about this issue. If you’re interested, you can check out the March 2014 archive of his blog and work backwards.

Now we come to the big question. One of the most annoying aspects of the current situation in academic publishing is that the big publishers don’t want us to know what our universities are paying for their journals, so they insist on confidentiality clauses. As a result, we can’t tell whether we are getting good value for money, though there is plenty of indirect evidence, and even some direct evidence, that we are not.

There have been a few attempts in the past to use freedom-of-information legislation to get round these confidentiality clauses, some successful and others not. Also, some information has been made available by other means. Here are the cases I know about, but this list is very likely to be incomplete. (If I am notified of further useful information, I will be happy to add it to the list with appropriate acknowledgement.)

1. In 2009 public-record requests were made by Paul Courant, Ted Bergstrom and Preston McAfee to a large number of US universities asking for details of their Big Deal contracts with publishers. They had considerable success with this, obtaining information from 36 institutions. Elsevier made strenuous efforts to prevent the disclosures, contesting the request to Washington State University, but a judge ruled against them. See this page for further details. Together with Michael Williams they wrote an analysis of what they discovered, which ~~will soon become available in preprint form.~~ has now been published. It includes the following figures for what a number of universities spent on Elsevier contracts. The first figure in each row is the cost in dollars of the Elsevier Freedom Package and the second is the enrolment. (The latter is not by any means a perfect measure of the size of a university, but it gives at least some idea.)

| University | Cost in dollars | Enrolment |
| --- | --- | --- |
| Arizona Universities* | 2,724,888 | 123,473 |
| Auburn | 1,252,544 | 22,654 |
| Clemson | 1,296,044 | 16,582 |
| Colorado State | 1,319,633 | 24,409 |
| Cornell | 1,969,908 | 20,340 |
| Georgia State | 934,764 | 25,135 |
| Louisiana State | 1,198,237 | 28,467 |
| New York U. | 1,878,962 | 40,291 |
| U of Alabama | 1,018,614 | 22,971 |
| U of California** | 8,760,968 | 218,320 |
| U of Colorado | 1,725,023 | 28,333 |
| U of Denver | 467,406 | 10,036 |
| U of Georgia | 1,854,419 | 33,079 |
| U of Idaho | 750,808 | 10,008 |
| Illinois Universities*** | 2,319,383 | 72,751 |
| U of Iowa | 1,420,484 | 27,361 |
| U of Maryland | 1,760,173 | 31,573 |
| U of Michigan | 2,164,830 | 39,447 |
| U of Tennessee | 579,815 | 27,635 |
| U of Texas, Arlington | 620,042 | 20,136 |
| U of Texas, Austin | 1,539,380 | 46,537 |
| U of Wisconsin | 1,215,516 | 35,295 |
| U of Wyoming | 497,014 | 10,478 |

*A consortium of three universities in Arizona

**A joint license for ten University of California campuses

***A joint license for three University of Illinois campuses
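One way to compare rows of this table is to divide the cost by the enrolment. The Python sketch below does that for a handful of rows copied from the table. Cost per enrolled student is only a rough yardstick, for the reason already given about enrolment, and the names `contracts` and `cost_per_student` are illustrative.

```python
# A few rows copied from the table above: (cost in dollars, enrolment).
contracts = {
    "Cornell": (1_969_908, 20_340),
    "U of Idaho": (750_808, 10_008),
    "U of Michigan": (2_164_830, 39_447),
    "U of Tennessee": (579_815, 27_635),
    "U of Texas, Austin": (1_539_380, 46_537),
}

def cost_per_student(cost, enrolment):
    """Dollars paid to Elsevier per enrolled student (a crude size adjustment)."""
    return cost / enrolment

per_student = {name: cost_per_student(*row) for name, row in contracts.items()}

# Print the selected universities from most to least expensive per student.
for name, value in sorted(per_student.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} ${value:7.2f}")
```

Among these five rows, Cornell comes out at a little under $97 per enrolled student and Tennessee at roughly $21, so the spread is more than a factor of four.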

If you like this kind of thing, then take a look at the appendix to their paper, from which the above table comes, and which is not behind a paywall. In case you have access to PNAS, the article is here.

One related thing I have found, which interests me a lot because of its relevance to this post, is a judgment from Greg Abbott, the Attorney General of Texas, that the University of Texas should release details of its contracts with publishers. The part that interests me starts near the bottom of page 3, where there is a detailed discussion of what constitutes a trade secret. Roughly speaking, information is a trade secret of one company if disclosing it to other companies would cause substantial competitive harm to the first company. The Attorney General concludes in robust terms that the Big Deal contracts do not meet the definition of a trade secret, which I agree with because the different publishing companies are not competing to sell the same product.

2. There is a fascinating blog post by David Colquhoun written in December 2011, which I would certainly have referred to before if I had been aware of it, in which he discusses in detail the situation at his institution, which is University College London. In it, he says, “I’ve found some interesting numbers, with help from librarians, and through access to The Journal Usage Statistics Portal (JUSP).” The word “interesting” is an understatement. The first number is that UCL then paid Elsevier €1.25 million for electronic only access to Elsevier journals. But as interesting as that headline figure is his analysis of the usage of Elsevier (and other) journals. As one might expect, but it is very good to see this confirmed, there are a few journals that are used a lot, but the usage tails off extremely rapidly.

3. In this country, there have been Freedom of Information requests to De Montfort University in 2010 (successful), Swansea University in 2014 (unsuccessful), and the University of Edinburgh in 2014 (successful). I recommend at this point that you read the refusal letter by Swansea. For reasons that I’ll come to, it is fairly clear that the letter was basically written by Elsevier, so it gives us some insight into their official reasons for wanting to keep their contracts secret. As I’ll discuss later, their arguments are very weak.

There was also a successful request to Swansea in 2013, but this one asked for the amount spent on all journal subscriptions, rather than just Elsevier subscriptions. It reveals that the amount went up from £1,514,890.88 in 2007/8 to £1,861,823.92 in 2011/12. (From the wording, it seems that these figures include VAT, but I’m not quite sure.) That’s a whopping 23% increase in four years. Of course, that may be because Swansea University decided to increase significantly the number of journals it subscribed to, but that explanation seems a trifle unlikely in the current economic climate. Whatever the explanation, the amount of money is very high.
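For anyone who wants to check that percentage, here is a minimal Python sketch using the two Swansea totals quoted above. The function name `percent_increase` is just my own label for illustration; the only inputs are the two figures from the FOI response.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100


# Swansea's total journal spend, as quoted from the FOI response.
spend_2007_08 = 1_514_890.88
spend_2011_12 = 1_861_823.92

increase = percent_increase(spend_2007_08, spend_2011_12)
print(f"Increase from 2007/8 to 2011/12: {increase:.1f}%")
```

The result is a little under 23%, which is where the rounded figure above comes from.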

The successful request to Edinburgh was made on January 16th by Sean Williams. The response was delayed, but on April 8th they finally responded, giving full details for two years and the totals for three. This reveals that Edinburgh spends around £845,000 plus VAT per year.

4. Recently there was a long negotiation between Elsevier and Couperin, a large consortium representing French academic institutions. (Actually, I say long, but Elsevier apparently has an annoying habit of not beginning the process of negotiation in earnest until close to the end of the existing contract, so that the other side must either make decisions very quickly or risk large numbers of academics temporarily losing access to Elsevier journals.) The result was what one might call a Huge Deal, one that gave complete access to ScienceDirect to all academic institutions, from the very largest to the very smallest. Couperin professed to be pleased with the deal. I do not yet know whether that satisfaction is shared by the universities that are actually paying for it. If you want to know how much France is paying for access to ScienceDirect, then I recommend typing “Elsevier Couperin” into Google. After at most a couple of minutes of digging, you will find a document that tells you. Three important aspects of this deal are (i) that it lasts for five years, (ii) that the total amount paid to Elsevier is initially lower than before but goes up each year and ends up higher, and (iii) that the access is now spread to many more institutions. What I do not know is what the effect of this is on the large universities that were paying for Elsevier journals before. Does the fact that many more institutions are involved mean that prices have gone down substantially? Or are most of the institutions that have newly been granted access paying very little for it and therefore not saving much money for the others? It would be good to have some insight into these questions. The bottom line, though, is that Elsevier’s profits in France are protected by the deal.

5. Brazil too has a national agreement with Elsevier, and refuses to sign a confidentiality clause. Somewhere I did once find, or get referred to, a page with details about the deal, but have not managed to find it again. My memory of it was that it was rather hard to understand.

**Update 25/4/2014**: many thanks to Rafael Pezzi, whose comment below I reproduce here, for more information about the situation in Brazil.

From the Brazilian open science mailing list:

Brazil has an nation wide agreement providing journal access to 423 academic and research institutions. It is called Portal de Periódicos, provided by CAPES. According to its 2013 financial report [1], last year CAPES spent US$ 93,872,151.11 (with US$ 31,644,204.12 paid to Elsevier).

Some institutions that are not covered by the agreement, as they do not meet the eligibility criteria, had to pay in separate in order to get access to this portal, spending additional US$ 11,560,438.93.

Rafael

[1] http://www.capes.gov.br/images/stories/download/Contas_Publicas/Relatorio-de-Gestao-2013.pdf

6. A comment by Anonymous below points me to a blog post that says that at the end of 2011 Purdue agreed a $2.9 million deal with Elsevier and describes the general situation facing libraries when they negotiate these deals. It also links to a post about Pittsburgh (with less precise figures).

In early January, I decided to try to find out more about what UK universities are paying by making a request under the Freedom of Information Act. As in France, the negotiations are carried out by a consortium: the British one is called JISC Collections. (It’s surprisingly hard to find out what JISC stands for: the answer is Joint Information Systems Committee.) Initially (to be precise, on the 8th of January), I wrote to Lorraine Estelle, who is the head of JISC Collections. I made a FOI request asking how much JISC had agreed to pay Elsevier in the most recent round of negotiations, and how that payment was shared between the institutions represented by JISC.

She suggested that we should speak on the phone, which we did. I learned some important things from the phone call, which I will come to later, but I did not get the information I had actually asked for. She explained why on the phone, and some time later, when I found that I couldn’t quite remember her explanation, I asked for a clarification in writing. She provided me with the following.

Your question: As I understood it, you didn’t actually have the data that I was asking for. Is that correct? And do you mean that you negotiated a total — which, presumably, you would know — but do not know how it was split between the various universities?

Answer: We do have the data and we do know the split – but because we do not actually aggregate the subscriptions ourselves for the Elsevier deal, I have to get the total sum and the split from Elsevier.

I interpret that as meaning that for legal purposes she did not have the information in a form that might have obliged her to disclose it under the Freedom of Information Act.

And thus, I was passed on to Alicia Wise. As many people who have had dealings with Alicia Wise have found, including Peter Murray-Rust in his attempts to stop Elsevier charging for access to open access articles, this is not a good situation to be in.

Obviously she didn’t say, “Of course, I’d be happy to provide you with that information.” But I’d have been satisfied with a clear statement from her that she was not prepared to provide it, and I couldn’t get that either. Here is a sample of our correspondence. (Incidentally, owing first to some misunderstanding and then, apparently, to Alicia Wise wanting to check that Lorraine Estelle had not given me any confidential information, which she hadn’t, the correspondence didn’t even begin until about a fortnight after Lorraine Estelle had passed on my request.)

Her first email message, sent on February 5th, explained that Elsevier makes “an array of pricing information publicly available” and provided some links. These were to list prices of journals, which, because of bundling, give no indication of what universities actually pay. She also proposed that we should meet, or perhaps talk on the phone. I wrote back on the 7th suggesting that a phone conversation would be more convenient. I got no response for four days, so on the 11th I sent my reply again, which prompted a suggestion of several possible dates for a meeting. She said,

Sorry, should have sent you a receipt acknowledgment. We’ve worked out internally that Chris Greenwell and I should, together, be able to answer questions that arise (although I am also contemplating inviting someone from our pricing team along in case you have very very detailed questions!)

At this point I had a little worry, so I put it to her.

But before we actually arrange anything, and in particular before we decide whether it is better to meet physically or by phone, perhaps it is worth clarifying what could come out of such a meeting. The main question I asked in my FOI request was the following: “there is one particular thing I would like to know, and that is details of the most recent round of negotiations between JISC and Elsevier. I would like to know what annual payment was agreed, and how that payment was shared between the higher education institutions represented.”

If you are prepared to answer that question in full (I’m talking actual amounts of money rather than the general principles underlying the negotiations), and without binding me to any confidentiality agreement, then we have something serious to talk about. If not, then I’m not sure there is any point in having a discussion. However, in the second case, it would still be useful to know your reasons for not being prepared to divulge the information.

She responded as follows.

Thanks for this. I continue to think a call or meeting would be helpful as my immediate question is what hypothesis do you have, or are you testing, that require data at this level of granularity? The data you request are commercially sensitive. I am wondering if publicly available data – for example the attached which is from publications by the Society of College, University, and National Libraries (http://www.sconul.ac.uk/) – might serve your purpose? If we could understand better what you are after and why, we might be better able to come up with data that helps you. (And, yes, we would have even greater flexibility if you were prepared to consider treating some information in confidence but I appreciate you might be unwilling to do so.)

To which I said this.

Thanks for sending those slides, though of course you must have known perfectly well that they would not be of any help to me.

I can’t see what is unclear about what I am after. As I said, I would like to know what the UK universities represented by JISC are paying annually for Elsevier journals (a combination of Core Collections and access to Science Direct). My main reason for wanting to know that is that I think it is in the public interest for people to know how much universities are spending.

However, there are more specific reasons that I am interested in the data. One is that because the cost to universities of their Core Collections is based on historic spend on print journals, there is the potential for very similar universities to pay very different amounts for a similar service from Elsevier. I have been told that this is the case — for example, Cambridge suffers because historically college libraries have subscribed to journals — but would like to have the data so that I can confirm this.

If you won’t give me this information on the grounds of commercial sensitivity, then just let me know, and it will save us all time.

That was on February 12th. Her next reply came on March 7th, and said this.

Thanks for this. I did intend for the slides to be useful to you, but now that you have explained more clearly what you are after can see this was not the case. They have, however, helped to move our conversation on. We are focused on delivering value for money to all our customers, including Cambridge. The most direct way to find out the information you are looking for with respect to Cambridge might be a conversation with the library there?

So after all that, I still didn’t have a straight answer. However, by then I had long since lost patience: on February 19th, I submitted Freedom of Information requests to all 24 Russell Group universities, with the exceptions of Cardiff, where my email kept bouncing back, and Exeter, which I missed out accidentally. (Later I sent requests to them too.) My request was as follows.

Dear [Head of university library],

I would like to make a request under the Freedom of Information Act. I am interested to know what [name of university] currently spends annually for access to Elsevier journals. I understand that this is typically split into three parts, a subscription price for core content, which is based on historic spend, a content fee for accessing those journals via ScienceDirect, and a further fee for accessing unsubscribed titles from the Freedom Collection, also via ScienceDirect. I would like to know the total fee, and how it is split up into those three components.

Many thanks in advance for any help you can give me on this.

Yours sincerely,

Timothy Gowers

When I sent these requests, I had very little idea what my chances were of finding anything out at all. Lorraine Estelle had told me that JISC Collections are firmly against confidentiality clauses, but that Elsevier had insisted. But also, and crucially, there was a clause about FOI requests that made it not completely certain that they would fail. Unfortunately, this clause cannot be made public. (Yes, you read that correctly: the confidentiality clause is itself confidential.) However, as we shall see, the responses by some of the universities give some indication of what is probably in it.

In the end, the result was that, to my surprise and delight, a substantial majority of universities decided to give me the information I wanted, though many of them gave me just the total and not the breakdown into its three components. Here are the figures from the 18 universities that were brave and public spirited enough to give me them, together with Edinburgh, which, for reasons I don’t understand, refused to give any figures to me but provided them to Sean Williams. The figures *exclude* VAT, which adds a not exactly negligible 20% to the cost, but at least that goes back to the taxpayer rather than swelling even further the coffers of Elsevier. The price is rounded to the nearest pound. I obtained the enrolment figures from this page.

**Update 25/4/2014:** Richard van Noorden has kindly pointed me to a document from which I can obtain staff numbers. So I’ve now added a third column to the table, which gives the number of full-time academic staff followed by the number of part-time academic staff. (These figures are for the academic year 2012/3. Again, they may not be a perfect measure of how much people are using Elsevier journals, but they are probably better than student numbers.)

**Update 28/4/2014** Imperial College London has responded to my request for a review of their initial decision by providing me with their total figure (but not the breakdown).

**Update 30/4/2014** The University of Nottingham has done the same. The breakdown is not provided because they “consider the likelihood and scale of prejudice here [to both Elsevier's and the University's commercial interests] to be very high and therefore the test favours application of the exemption.” It is clear that there is some kind of game going on here, since everybody knows that the breakdown is basically that almost the entire amount is the subscription fee, with the content fee and Freedom Collection fee being a tiny proportion of the whole. (See below for an explanation of what I am talking about here.) So there is no imaginable effect that publishing the exact numbers could possibly have. However, equally, it is not all that important to know them.

**Update 16/5/2014** Queen Mary University of London has supplied their total figure to Edward Hughes, who is there.

**Update 23/5/2014** I now have the figures from Oxford.

**Update 31/5/2014** Figures from LSE added.

University | Cost | Enrolment | Academic Staff |

Birmingham | £764,553 | 31,070 | 2355 + 440 |

Bristol | £808,840 | 19,220 | 2090 + 525 |

Cambridge | £1,161,571 | 19,945 | 4205 + 710 |

Cardiff | £720,533 | 30,000 | 2130 + 825 |

*Durham | £461,020 | 16,570 | 1250 + 305 |

**Edinburgh | £845,000 | 31,323 | 2945 + 540 |

*Exeter | £234,126 | 18,720 | 1270 + 290 |

Glasgow | £686,104 | 26,395 | 2000 + 650 |

Imperial College London | £1,340,213 | 16,000 | 3295 + 535 |

King’s College London | £655,054 | 26,460 | 2920 + 1190 |

Leeds | £847,429 | 32,510 | 2470 + 655 |

Liverpool | £659,796 | 21,875 | 1835 + 530 |

§London School of Economics | £146,117 | 9,805 | 755 + 825 |

Manchester | £1,257,407 | 40,860 | 3810 + 745 |

Newcastle | £974,930 | 21,055 | 2010 + 495 |

Nottingham | £903,076 | 35,630 | 2805 + 585 |

Oxford | £990,775 | 25,595 | 5190 + 775 |

* ***Queen Mary U of London | £454,422 | 14,860 | 1495 + 565 |

Queen’s U Belfast | £584,020 | 22,990 | 1375 + 170 |

Sheffield | £562,277 | 25,965 | 2300 + 460 |

Southampton | £766,616 | 24,135 | 2065 + 655 |

University College London | £1,381,380 | 25,525 | 4315 + 1185 |

Warwick | £631,851 | 27,440 | 1535 + 305 |

*York | £400,445 | 17,405 | 1205 + 285 |

*Joined the Russell Group two years ago.

**Information obtained by Sean Williams.

***Information obtained by Edward Hughes.

§LSE subscribes to a package of subject collections rather than to the full Freedom Collection.

~~The universities for which I still do not have the information are~~ ~~Imperial College London~~, ~~London School of Economics and Political Science,~~ ~~Nottingham,~~ ~~and Oxford.~~ ~~, and Queen Mary University of London.~~ ~~I still have hopes of finding out the figures for~~ ~~Imperial~~, ~~Nottingham and~~ ~~Oxford, and will provide them if I do.~~

A striking aspect of these amounts is just how much they vary. How does it come about, for example, that University College London pays over twice as much as King’s College London, and almost six times as much as Exeter? In order to explain this, I need to say something about the system as it is at the moment. It is here that I am indebted to Lorraine Estelle.
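Before getting into that explanation, the ratios just quoted can be checked directly from the table. The following is only a rough sketch: the figures are copied from the table above, the helper names `cost_ratio` and `cost_per_academic` are my own, and dividing by full-time academic staff is a crude normalisation that I am assuming for illustration, not a measure of actual journal usage.

```python
# Figures copied from the table above:
# (cost in pounds, excluding VAT; number of full-time academic staff).
FIGURES = {
    "University College London": (1_381_380, 4315),
    "King's College London": (655_054, 2920),
    "Exeter": (234_126, 1270),
}


def cost_ratio(payer: str, other: str) -> float:
    """How many times as much `payer` pays as `other`."""
    return FIGURES[payer][0] / FIGURES[other][0]


def cost_per_academic(name: str) -> float:
    """Crude measure: annual cost divided by full-time academic staff."""
    cost, staff = FIGURES[name]
    return cost / staff


ucl = "University College London"
for other in ("King's College London", "Exeter"):
    print(f"UCL pays {cost_ratio(ucl, other):.2f} times as much as {other}")

for name in FIGURES:
    print(f"{name}: about £{cost_per_academic(name):.0f} per full-time academic")
```

The two ratios come out at roughly 2.1 and 5.9, which is what “over twice” and “almost six times” refer to. Dividing by staff numbers narrows the gap between these three universities but does not remove it.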

The present system (as it is in the UK, but my guess is that these remarks apply more generally) would be inexplicable were it not for the fact that it grew out of an older system that existed before the internet. Given that fact, though, it makes a lot more sense. (I don’t mean that it is fair — just that its existence is comprehensible.) If you were an Elsevier executive managing the transition from a world of print journals to a world where most people want to read articles online, what service would you offer and what would you do about prices? Since it costs almost nothing to make articles that are already online available to more people, and since it is convenient for a university to have access to everything, the obvious service to offer is complete access to all Elsevier journals. But what should you charge for this service?

Up to now, different universities have spent significantly different amounts on Elsevier journals, so if you start all over again and work out a price for the complete package, either some universities will have to pay much more than they did before, which they would probably be unwilling to do, or some universities will end up paying much less than they did before and profits will suffer quite badly. So you try to devise a system that will give universities the new service at prices that are based on the old service. That way, no university ends up paying significantly more or less than it did before. But because this is unfair — after all, now different universities will be paying very different amounts for the same service — you feel that you can’t let the universities know what other universities are paying.

The current system in the UK is very much as the above thought experiment would lead one to expect. So it is easy to see why Elsevier wants confidentiality clauses. It also explains the rather strange structure of the deals that universities have with Elsevier. Typically they have a certain “core content” (roughly, the journals they subscribed to before the transition), for which they pay something close to list prices and receive print copies. They then pay a small extra fee for permanent electronic access to that core content, and another small extra fee for electronic access to all other Elsevier journals, but this time only while the university continues to have a contract with Elsevier. Of course, in such a situation a university would like to cut down its core content to zero, but that is not allowed: there are strict controls on what they are allowed to cancel. The buzz phrase here is “historic spend”, which roughly means what universities spent on print subscriptions before the transition to electronic access. The system ensures that what universities pay now closely matches their historic spend.

Here is how Lorraine Estelle explains it.

Prior to the move to online journal, each institution subscribed to titles on a title by title basis.

When NESLI was set up, our negotiations were confined to the “e-fee” or “top-up fee”.

This was the fee that institutions needed to pay in order to have access to all a publisher’s content in electronic format. Their “subscribed titles” plus all other titles from that publisher. (This is the deal that has become known as “The Big Deal’ and adopted by all major publishers).

The “e-fee” or “top-up fee” was (and usually is still) contingent of the institutions maintaining the level of spend for the “subscribed titles”. This article provides the background to NESLI http://www.uksg.org/serials/nesli back in 1998

As institutions have moved to e-only – we negotiate with most publishers on the total cost across the consortium. However, in most (but not all) deals the division of spend across the UK library consortium is uneven – and still depends on the level of historic spend on subscribed titles. So an institution that used to subscribe to many titles, will still pay more than one that used to subscribe to fewer.

We negotiate the total increase – known as the price cap, the cancellation allowance (which means institutions can cancel a percentage of historically subscribed titles and still retain e-access), and the licence terms and conditions. This is not unique and it is the model employed by most academic library consortia across the world.

The deal is negotiated by Jisc Collections – but we do have support and input from the institutions. Oversight of our negotiations is provided by our Electronic Information Resources working group http://www.jisc-collections.ac.uk/About-JISC-Collections/Advisory-Groups/Electronic-Resources-Information-Group/ It is very rare for an institution to negotiate its own deal, because it would be difficult for them to get the same terms on an individual basis. The few exceptions are where an institution has a special relationship with a publisher – University of Oxford for OUP titles, for example.

All this is important, because it shows that a certain picture of how Elsevier operates, one that I used to believe in, is an oversimplification. In that picture, Elsevier insists on confidentiality clauses in order to be able to screw each university for whatever it can get. However, such a description is misleading on two counts. First, Elsevier negotiates with JISC rather than directly with universities, and secondly, the amount that universities pay is based on historic spend rather than on what Elsevier manages to wring out of them.

I say “an oversimplification” rather than “wrong” because if Elsevier *did* operate in the way I had previously imagined, the results would probably be rather similar. What is the maximum that Elsevier would be likely to persuade a university to pay? It would be very hard to persuade a university to agree to a huge leap in prices, so in each year one would expect the maximum to be whatever the university paid in the previous year plus a small real-terms increase. And all the evidence suggests that that is more or less exactly what Elsevier has managed to achieve.

Another factor that is perhaps worth briefly discussing is the fact that Durham, Exeter, Queen Mary University of London and York joined the Russell Group only two years ago. This probably helps to explain why (apart from QMUL, which refused to provide me with its figures) these universities are paying significantly less than most of the others. Whether Elsevier had an explicit policy of charging less to supposedly less prestigious universities (though the list of universities not in the Russell Group contains several that appear to me to be at least as prestigious as several that are in the Russell Group), or whether there is merely a strong correlation between membership of the Russell Group and historic spend on Elsevier journals, I don’t know. I think the former may be the case, since I have heard librarians talking about a “banding system” (I don’t know any details about how it works), and also because Bergstrom et al mention in their paper that in the US there is a classification of universities into different types according to how research intensive they are, with prices depending to a considerable extent on this classification.

A further factor that may possibly explain some of the data is that some institutions have recently merged with others. For example, The University of Manchester, one of the universities that pays most, merged in 2004 with UMIST (University of Manchester Institute of Science and Technology), and UCL merged in 2012 with The School of Pharmacy, University of London. The latter fact may help to explain why they are paying so much more now than what David Colquhoun said they were paying in 2011.

Although the differences between the amounts that different universities pay are eye-catching, it is important to be clear that they are a *symptom* of what is wrong with the system, and not the problem itself. The problem is quite simply that Elsevier has a monopoly over a product for which the demand is still very inelastic (the lack of elasticity being largely the fault of the academic community), with the result that the prices are unreasonably high for the service that Elsevier provides. (It bears repeating that the refereeing process and editorial selection are not paid for by Elsevier — those services are provided free of charge by academics.) If Elsevier were to equalize the prices (or equalize some suitable quantity such as price divided by size of university, or price per use) while keeping the aggregate the same, this would *not* solve the underlying problem.

As I have explained above, the price that a typical university pays to Elsevier in its Big Deal is divided into three components. One is a “subscription fee”, which is to pay for a certain collection of journals at something comparable to their list prices. Another is a “content fee”, which is to pay for electronic access in perpetuity to those titles (via ScienceDirect). The third is a “Freedom Collection fee”, which is to pay for electronic access to the rest of Elsevier’s journals, but this access, unlike the access covered by the content fee, is lost if you cancel the Big Deal.

I have got breakdowns from seven universities, but rather than give them here, I would rather simply make a few general points about them.

1. The content fee (that is, the fee for electronic access to the subscribed titles) is, in all the cases I know about, very close to 5.8824% of the subscription fee. Since 1/17=0.05882352941, I think that is saying that the content fee is exactly one seventeenth of the subscription fee, with the tiny differences coming from rounding errors. Of course, the precise details here are unimportant: what matters is that it is a very small amount compared with the subscription fee itself.

2. The Freedom Collection fees do not have an obvious relationship with the subscription fee, but, amusingly, with the seven examples I have, the more you pay for the latter, the less you pay for the former. That actually makes some kind of sense, since the more you are paying in content fees, the bigger the chunk of the Freedom Collection you are already subscribing to. I haven’t managed to reverse-engineer any kind of simple quantitative relationship between the two prices, however.

3. The inverse relationship in point 2 might seem to make things fairer, and to a very small extent it does, but we are talking about fees of between £10,000 and £25,000 here, so even for a university with a small subscription fee the Freedom Collection fee is well under a tenth of its subscription fee. In fact, the differences in Freedom Collection fees are too small even to make up for the discrepancy in the content fees. Of course, it is grotesquely misleading to say that the Freedom Collection costs so little, because the price you pay for it is conditional on not cancelling the subscriptions that keep the subscription fee extremely high. Indeed, the entire “breakdown” is misleading for that reason: the effective cost of the Freedom Collection is far higher than its nominal cost.
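The one-seventeenth observation in point 1 is easy to illustrate. The sketch below assumes, as I conjectured, that the content fee is exactly 1/17 of the subscription fee, rounded to the nearest pound. The subscription fees in the loop are hypothetical round numbers chosen purely for illustration; they are not any university’s real figures.

```python
from fractions import Fraction

# Conjectured ratio of content fee to subscription fee (an assumption).
CONTENT_SHARE = Fraction(1, 17)


def content_fee(subscription_fee: int) -> int:
    """Conjectured content fee, rounded to the nearest pound."""
    return round(subscription_fee * CONTENT_SHARE)


print(f"1/17 as a percentage: {float(CONTENT_SHARE) * 100:.4f}%")

# Hypothetical subscription fees, for illustration only (not real figures).
for fee in (500_000, 850_000, 1_200_000):
    observed = content_fee(fee) / fee * 100
    print(f"subscription £{fee:,}: content fee £{content_fee(fee):,} ({observed:.4f}%)")
```

Rounding to the nearest pound moves the observed percentage only in the fourth or fifth decimal place, which is consistent with the tiny differences I saw in the real breakdowns.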

The moral of all this is that the figures giving the total cost are what matter. What universities actually need is electronic access to Elsevier’s journals. In order to get that access, Elsevier insists that they nominally pay for something else, namely subscriptions that they are not allowed to cancel (even when they are duplicates, as has happened in Cambridge because of college libraries, and probably in Manchester and UCL as a result of mergers). But that is of no practical importance. It’s a bit like those advertisements that say “FREE OFFER!” and then in very small print they add “when you spend over £X,” which of course means that the so-called free offer is not free at all.

While I was still not at all sure that I would get any information about prices, I comforted myself with the thought that an institution that refuses a FOI request has to give reasons, and those reasons might well be informative. For example, they might reveal that the main reason for confidentiality is to protect Elsevier’s profits, which would conflict with Elsevier’s official reasons.

Or would it? If you’ve read this far, then your reward is the following rather wonderful video (which has done the rounds for a while, so you may have seen it) of David Tempest, from Elsevier, explaining why confidentiality clauses are necessary. Many thanks to Mike Taylor for obtaining it. A transcript can be found on his blog.

The person who asked the question is Stephen Curry, from Imperial College London. ~~I’m sorry to say that, as mentioned above, Imperial is one of the universities I have not managed to get figures from.~~ I’m glad to say that at last he can know what his university library is spending on his behalf.

David Tempest’s lapse aside, Elsevier usually does not admit that the confidentiality clauses are there to protect its profits. But the refusal letters I received tell a different story. A good example is the first response I had from any university (other than an acknowledgement), which was a refusal from Queen’s University Belfast. I will quote it in full.

Dear Mr Gowers

Freedom of Information Request – Elsevier Journals

My letter, dated 21 February 2014, in relation to the above refers. [sic]

Having reviewed your request and consulted with appropriate colleagues, I would respond as set out below:

I would like to make a request under the Freedom of Information Act. I am interested to know what Queen’s University Belfast currently spends annually for access to Elsevier journals. I understand that this is typically split into three parts, a subscription price for core content, which is based on historic spend, a content fee for accessing those journals via ScienceDirect, and a further fee for accessing unsubscribed titles from the Freedom Collection, also via ScienceDirect. I would like to know the total fee, and how it is split up into those three components.

I can confirm that whilst the University does hold this information, it is not being provided to you as it is considered exempt under Section 43(2) of the Act.

Section 43(2) of the Act provides that information is exempt if its disclosure under the Act would be likely to prejudice the commercial interests of any person, including the public authority itself.

Commercial interests relate to the ability to successfully participate in a commercial activity. This could be the ability to buy or sell goods or services or the disclosure of financial and planning information to market competitors. It is, therefore, necessary to decide whether release of this information will have an impact on the commercial activity of Elsevier or the University.

In making this determination, the University has consulted with Elsevier regarding the disclosure of the requested information and whether such disclosure would be likely to prejudice Elsevier’s commercial interests.

In written representations to the University, Elsevier has indicated that the disclosure of the amount of money spent annually on access to Elsevier journals would reveal pricing information, specifically the licensing fees that have been negotiated with the University in circumstances that may include a level of discount.

The disclosure of this information would be likely to have a detrimental effect on Elsevier’s future negotiating position with that of the University and, indeed, the wider HE sector – which represents a large percentage of their market.

The University accepts this argument and also considers that disclosure of information that would reveal pricing would also be likely to prejudice the commercial interests of the University itself, insofar as it could have a detrimental impact on the future negotiation of tailored solutions for licensing of Elsevier’s products and discounts from list prices.

Section 43(2) is a qualified exemption and the University must, therefore, consider where the balance of the public interest lies.

The University accepts the need for transparency and accountability for decision making. The requirement, however, for transparency and accountability needs to be weighed against the harm to the commercial interests of third parties or the University itself through disclosure. The University has, therefore, weighed the prejudice caused by disclosure of the requested information against the likely benefit to the wider public.

In considering arguments in favour of disclosing the information, the University has taken into account the wider interest of the general public in having access to information on how public funds are spent. In this instance, there is a public interest in demonstrating that the University has negotiated a competitive rate in relation to the procurement of Elsevier’s products and services.

The University considers, however, that this public interest is already met by the significant amount of pricing information that Elsevier currently makes publicly available – such information is available at:

http:\www.elsevier.com/librarians/journal-pricing and

http:\www.elsevier.com/librarians/physical-sciences/mathematics/journal-pricing.

In relation to those factors favouring non-disclosure, the University has a duty to protect commercially sensitive information that is held about any third party. In this instance, disclosure of the amount of money spent by the University on Elsevier products would reveal pricing information that was acknowledged by both the University and Elsevier at the time the contract was entered into as being commercially confidential. Disclosure of this information would be likely to prejudice not only the commercial interests of Elsevier but also the interests of the University itself, along with the relationship that the University has with its supplier.

It is reasonable, therefore, in all the circumstances of this case that the exemption should be maintained and the requested information not disclosed.

If you are dissatisfied with the response provided, please put your complaint in writing to me at the above address. If this fails to resolve the matter, you have the right to apply to the Information Commissioner.

Yours sincerely

Amanda Aicken

Information Compliance Unit

I responded as follows.

Dear Amanda Aicken,

Thank you for your response to my Freedom of Information Request (reference FOI/14/42). You invited me to write to you if I was dissatisfied with it. I have a number of reasons for dissatisfaction, so I am taking you up on your invitation.

My main objection is that I disagree with several of your reasons for declining my request. I will present them as a numbered list.

1. You say that the disclosure of the information I ask for would be likely to have a detrimental effect on Elsevier’s future negotiating position with that of the university. You also say that it would be likely to prejudice the commercial interests of the university itself. I do not find these two statements easy to reconcile. Could you please explain how it is possible for *both* parties to lose out?

2. You agree with me that there is a public interest in demonstrating that the university has negotiated a competitive rate in relation to the procurement of Elsevier’s products and services. You go on to say that this public interest is already met by the information that Elsevier has made publicly available online. However, this is manifestly untrue. The only figures provided by Elsevier are for the list prices of their journals. But since universities pay for Elsevier’s Freedom Collection with a Big Deal, the list prices do not give me any way of verifying that the university has negotiated a competitive rate. Indeed, they do not even allow me to work out the order of magnitude of how much Queen’s University is paying to Elsevier. Please would you either retract your statement that this public interest has already been met by Elsevier, or else explain to me how to use the list prices to estimate the total amount paid by Queen’s University?

3. Your letter implies that there are direct negotiations between Elsevier and Queen’s University of Belfast. However, this is also not true. The negotiations are mediated through JISC. Therefore, there is no obvious mechanism whereby disclosing the prices would cause any commercial harm to the university.

4. It has not escaped my notice that the letter you sent is remarkably similar to a letter sent by the University of Swansea to somebody else who made a similar request. It is clear that you used that letter as a template, or else that you and the University of Swansea used the same template, perhaps provided by Elsevier. This suggests to me that you have not considered the balance of arguments for and against disclosure with sufficient independence.

In summary, the main two points that I cannot accept are that the financial interests of Queen’s University are likely to be prejudiced by the disclosure of this information, and that there is sufficient information in the public domain to enable me to determine whether the university has negotiated a competitive rate. If you are going to refuse to disclose the information, then I would like it to be for reasons that are not obviously false.

Yours sincerely,

Timothy Gowers

The Swansea letter I referred to is this one, which I have already mentioned. It was the formulaic nature of the response, with ghastly Orwellian phrases such as “tailored solutions” and misleading references to “a level of discount” that appeared not just in these two letters but in many other refusal letters that I was to receive, that got me annoyed enough to express my dissatisfaction, which in the case of Queen’s University Belfast and a handful of other universities eventually resulted in success. The response I received to my letter above was as follows. It did not really address my arguments, but since it gave me the information, that was not a big concern.

Dear Mr Gowers,

Freedom of Information Request — Elsevier Journals — Internal Review

Your email to Mrs Amanda Aicken, dated 5 March 2014, requesting an internal review of the University’s response to your Freedom of Information request on the above, refers.

On 21 February 2014, you submitted a request for information in relation to the University’s annual expenditure on access to Elsevier Journals. You requested details of the total fee and how this is split up into three components: a subscription price for core content; a content fee for accessing those journals via ScienceDirect; and a further fee for accessing unsubscribed titles from the Freedom Collection.

On 4 March 2014, the University responded to your request, confirming that whilst this information was held, it was not being provided to you as it was considered commercially sensitive information and, therefore, was exempt under Section 43(2) of the Act. The University had made this determination following consultation with Elsevier, which had indicated that the disclosure of the requested information would prejudice its commercial interests by revealing pricing information. In particular, Elsevier argued that disclosure of the information would reveal the licensing fees that had been negotiated with the University in circumstances that may have included a level of discount.

I understand that you, subsequently, lodged a complaint in respect of the University’s response to your request and this complaint has been handled as an internal review of the decision not to provide the requested information.

You have expressed dissatisfaction with the response on the grounds that you ‘cannot accept (are) that the financial interests of Queen’s University are likely to be prejudiced by the disclosure of this information, and that there is sufficient information in the public domain to enable me to determine whether the University has negotiated a competitive rate’.

I have now completed my review and my findings are detailed below.

I have reconsidered the nature of the requested information and the application of the exemption to withhold this information. In doing so, I have taken into account written advice from relevant senior staff in the University’s McClay Library and advice received from JISC regarding the detail of the contract with Elsevier. I have also noted your comments regarding the need for transparency and the public interest in demonstrating that the University has negotiated a competitive rate in relation to the procurement of Elsevier’s products and services.

At the time of your request, the University was clearly of the view that disclosure of the requested information would be likely to have a detrimental effect on Elsevier’s future negotiating position with that of the University and, indeed, the wider HE sector. An additional, albeit secondary argument, was the possibility that disclosure would prejudice the interests of the University itself with respect to the relationship that the University has with Elsevier as a supplier. I am persuaded that that [sic] this was not, in the circumstances, an unreasonable view.

I do, however, believe that on balance, the public interest in disclosure was greater than that in maintaining the commercial interests exemption. I also understand that subsequent to your original request, several institutions have disclosed information, either in relation to the total annual expenditure on access to Elsevier Journals, or on the detailed breakdown of expenditure as requested.

In light of the above, it is my view that the information should now be disclosed. I am, therefore, providing the requested information in relation to 2014 — this is provided in the table below.

I have had several correspondences like this. I would like to pick out a couple of excerpts from other refusal letters that are not essentially contained in the Belfast letter. I had this rather chilling paragraph from Queen Mary University of London.

However, in addition to the reasons outlined above already, revealing this information to the world at large may damage the relationship that QML has with Elsevier including the prospect of legal action that may be taken against QML. This could result in QML being unable to offer Elsevier products which would have the knock-on effect of impacting our resources, our research and even student recruitment. Since these would imperil QML’s finances, in financially tough times and while receiving less and less from the public purse, this cannot be said to be in the public interest.

It would be interesting to know what Elsevier said to them to provoke that. Because of this paragraph, I felt sorry for QMUL and decided not to request a review of their decision (16/5/2014 — they have now provided the total figure to Edward Hughes, perhaps reasoning that there was safety in numbers).

However, the following paragraph from Oxford had the opposite effect on me.

Maintaining confidentiality with regard to the information requested enables the University and Elsevier to arrive at a fair and competitive negotiated and customised price. Full pricing transparency would mean that the best pricing model publishers could offer would be list price, which would be likely to result in increased costs to the University. Disclosure of pricing terms would inhibit publishers’ ability to develop flexible, tailored solutions suitable for a particular customer’s needs.

Part of my response to that was that the statement beginning “Full pricing transparency” was manifestly false: publishers could offer any model they like. Also, that “tailored solutions” phrase is a red rag to a bull: knowing about how the system works, and how little it is “tailored for a particular customer’s needs”, I cannot read it without getting annoyed. I have requested a review from Oxford ~~but not yet heard back (though they should, legally, have responded by now).~~ and they have now given me their total figure.

Incidentally, although I wrote initially to librarians, they were legally obliged to pass my requests on to their Freedom of Information offices, so the letters I got back were (mostly) from bureaucrats. So when I got refusals, this did not necessarily reflect the wishes of the librarians, who stand to gain from the prices being known.

When it comes to high prices and confidentiality contracts, Elsevier are not the only offenders, though there is some anecdotal evidence that they are the leaders, in the sense that other publishers use Elsevier as a benchmark to see what they can get away with. So why submit Freedom of Information requests for Elsevier contracts without doing the same for Springer, Wiley, Taylor-Francis, etc.?

There is no good reason. My answer to this inevitable question is that I do not regard the work of finding out about journal prices as finished. I will report on this blog if and when I or other people find out about other publishers and other universities.

There is a great deal more that could be said about journal prices and what should be done about them. However, this post has passed the 10,000-word mark, so I shall leave further discussion for a second post. Among the questions I intend to address are the following, many of which concern other big publishers just as much as they concern Elsevier.

1. Is it fair to say that Elsevier is a monopoly?

2. Does Elsevier’s pricing policy violate competition law?

3. What would be a fair system for charging for electronic access to a large collection of journals?

4. Are the current prices really all that unreasonable, given the importance to science of journal articles?

5. Is it better for university libraries to form consortia or should they negotiate individually?

6. What would be the implications for Cambridge (and perhaps other universities too) of a switch to paying list prices for individual journals?

7. Different subjects have very different publishing cultures and very different needs. Are they better off campaigning together in a single open access movement or would it be better to have a fragmented movement, with different subjects campaigning separately for their different interests?

8. What more can be done to accelerate a move towards a cheaper journal system?

]]>

A good way to test your basic knowledge of (some of) the course would be to do a short multiple-choice quiz devised by Vicky Neale. If you don’t get the right answer first time for every question, then it will give you an idea of the areas of the course that need attention.

Terence Tao has also created a number of multiple-choice quizzes, some of which are relevant to the course. They can be found on this page. The quiz on continuity expects you to know the definitions of adherent points and limit points, which I did not discuss in lectures.

The first five posts on this blog in the IA Analysis category are devoted to the questions on this course in the 2003 Tripos. The course has not changed much since then, so these questions are similar to the kind of thing that could be set now. I try to say not just what the answers are but how I thought of them, how I decided what to write out in detail and what just to assume, and so on. They may be of some use when you prepare for the exams.

A long time ago I wrote a number of informal discussions of undergraduate mathematical topics. My ideas about some of these are not always identical to what they were then, but again you may find some of them helpful, particularly the ones on analysis.

If I think of further resources, I’ll add them to the post.

Finally, I’ve very much enjoyed giving this course — thanks for being a great audience (if that’s the right word).

]]>

… $\sin$ and $\cos$ relate to things like the opposite, adjacent and hypotenuse. Using the power-series definitions, we proved several facts about trigonometric functions, such as the addition formulae, their derivatives, and the fact that they are periodic. But we didn’t quite get to the stage of proving that if $x^2+y^2=1$ and $\theta$ is the angle that the line from $(0,0)$ to $(x,y)$ makes with the line from $(0,0)$ to $(1,0)$, then $x=\cos\theta$ and $y=\sin\theta$. So how does one establish that? How does one even *define* the angle? In this post, I will give one possible answer to these questions.

A cheating and not wholly satisfactory method would be to define the angle $\theta$ to be $\cos^{-1}(x)$. Then it would be trivial that $x=\cos\theta$ and we could use facts we know to prove that $y=\sin\theta$. (Or could we? Wouldn’t we just get that it was $\pm\sin\theta$? The fact that many angles have the same $\cos$ and $\sin$ creates annoying difficulties for this approach, though ones that could in principle be circumvented.) But if we did this, how could we be confident that the notion of angle we had just defined coincided with what we think angle should be? The problem has not been fully solved.

Another approach might be to define trigonometric functions geometrically, prove that they have the basic properties that we established using the power series definitions, and prove that these properties characterize the trigonometric functions (meaning that any two functions $c$ and $s$ that have the properties must be $\cos$ and $\sin$). However, this still requires us to make sense of the notion of angle somehow, and we might also feel slightly worried about whether the geometric arguments we used to justify the addition formulae and the like were truly rigorous. (I’m not saying it can’t be done satisfactorily — just that I don’t immediately see a good way of doing it, and I have a different approach to present.)

How are radians defined? You take a line L starting at the origin, and it hits the unit circle at some point P. Then the angle that line makes with the horizontal (or rather, the horizontal heading out to the right) is defined to be the length of the circular arc that goes anticlockwise round the unit circle from $(1,0)$ to P. (This defines a number between 0 and $2\pi$, but we can worry about numbers outside this range later.)

There is nothing wrong with this definition, except that it requires us to make rigorous sense of the length of a circular arc. How are we to do this?

For simplicity, let’s assume that our point P is $(x,y)$ and that both $x$ and $y$ are positive. So P is in the top right quadrant of the unit circle. How can we define and then calculate the length of the arc from $(1,0)$ to $(x,y)$, or equivalently from $(x,y)$ to $(1,0)$?

One non-rigorous but informative way of thinking about this is that for each $t$ between $x$ and $1$, we should take an interval $[t,t+dt]$, work out the length of the bit of the circle vertically above this interval, and sum up all those lengths. The bit of the circle in question is a straight line (since $dt$ is infinitesimally small) and by similar triangles its length is $\frac{dt}{\sqrt{1-t^2}}$.

How did I write that down? Well, the big triangle I was thinking of was one with vertices $(0,0)$, $(t,0)$ and the point on the circle directly above $(t,0)$, which is $(t,\sqrt{1-t^2})$, by Pythagoras’s theorem. The little triangle has one side of length $dt$, which corresponds to the side in the big triangle of length $\sqrt{1-t^2}$. So the hypotenuse of the little triangle is $\frac{dt}{\sqrt{1-t^2}}$, as I claimed.

Adding all these little lengths up, we get $\int_x^1\frac{dt}{\sqrt{1-t^2}}$, so it remains to evaluate this integral.

This is of course a very standard integral, usually solved by substituting $\cos\theta$ or $\sin\theta$ for $t$. If you do that, you find that the length works out as $\cos^{-1}(x)$, which is just what we hoped. However, we haven’t discussed integration by substitution in this course, so let us see it in a more elementary way (not that proving an appropriate form of the integration-by-substitution rule is especially hard).

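Using the rules for differentiating inverses, we find that

$$\frac{d}{dt}\cos^{-1}(t)=-\frac{1}{\sin(\cos^{-1}(t))},$$

and since $\sin(\cos^{-1}(t))=\sqrt{1-t^2}$, this gives us $-\frac{1}{\sqrt{1-t^2}}$. So the integrand has $-\cos^{-1}(t)$ as an antiderivative, and therefore, by the fundamental theorem of calculus,

$$\int_x^1\frac{dt}{\sqrt{1-t^2}}=-\cos^{-1}(1)+\cos^{-1}(x)=\cos^{-1}(x).$$

So the angle $\theta$ between the horizontal and the line joining the origin to $(x,y)$ is (by definition) the length of the arc from $(1,0)$ to $(x,y)$, which we have calculated to be $\cos^{-1}(x)$. Therefore, $x=\cos\theta$.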

The process I just went through, of saying “Let’s add up a whole lot of infinitesimal lengths; that says we should write down the following integral; calculating the integral gives us L, so the length is L,” is a process that one often goes through when calculating similar quantities. Why are we so confident that it is OK?

I sometimes realize with mathematical questions like this that I have been a mathematician for many years and never bothered to worry about them. It’s just sort of obvious that if a function is reasonably nice, then writing something down that’s approximately true with $\delta x$ and turning $\delta x$ into $dx$ and writing a nice $\int$ sign in front gives you a correct expression for the quantity in question. But let’s try to think a bit about how we might define length rigorously.

First, we should say what a curve is. There are various definitions, according to how much niceness one wants to assume, but let me take a basic definition: a curve is a continuous function from an interval $[a,b]$ to $\mathbb{R}^2$. (I haven’t defined continuous functions to $\mathbb{R}^2$, but it simply means that if $f(t)=(g(t),h(t))$, then $g$ and $h$ are both continuous functions from $[a,b]$ to $\mathbb{R}$.)

This is an example of a curious habit of mathematicians of defining objects as things that they clearly aren’t. Surely a curve is not a function — it’s a special sort of subset of the plane. In fact, shouldn’t a curve be defined as the *image* of a continuous function from $[a,b]$ to $\mathbb{R}^2$? It’s true that that corresponds more closely to what we are thinking of when we use the word “curve”, but the definition I’ve just given turns out to be more convenient, though it’s important to add that two curves (as I’ve defined them) $f_1:[a,b]\to\mathbb{R}^2$ and $f_2:[c,d]\to\mathbb{R}^2$ are *equivalent* if there is a strictly increasing continuous bijection $\phi:[a,b]\to[c,d]$ such that $f_1(t)=f_2(\phi(t))$ for every $t\in[a,b]$. In this situation, we think of $f_1$ and $f_2$ as different ways of representing the same curve.

Incidentally, if you want a reason not to identify curves with their images, then one quite good reason is the existence of objects called *space-filling curves*. These are continuous functions from intervals of reals to $\mathbb{R}^2$ that fill up entire two-dimensional sets. Here’s a picture of one, lifted from Wikipedia.

It shows the first few iterations of a process that gives you a sequence of functions that converge to a continuous limit that fills up an entire square.

Going back to lengths, let’s think about how one might define them. The one thing we know how to define is the length of a line segment. (Strictly speaking, I’m not allowed to say that, since a line segment isn’t a function, but let’s understand it as a particularly simple function from an interval to a line segment in the plane.) Given that, a reasonable definition of length would seem to be to approximate a given curve by a whole lot of little line segments. That leads to the following idea for at least approximating the length of a curve $f:[a,b]\to\mathbb{R}^2$. We take a dissection $a=t_0<t_1<\dots<t_n=b$ and add up all the little distances $d(f(t_{i-1}),f(t_i))$. Here I am defining the distance $d$ between two points in $\mathbb{R}^2$ in the normal way by Pythagoras’s theorem. This gives us the expression

$$\sum_{i=1}^nd(f(t_{i-1}),f(t_i))$$

for the approximate length given by the dissection. We then hope that as the differences $t_i-t_{i-1}$ get smaller and smaller, these estimates will tend to a limit. It isn’t hard to see that if you refine a dissection, then the estimate increases (you are replacing the length of a line segment that joins two points by the length of a path that consists of line segments and joins the same two points).

Actually, that hope is not always fulfilled: sometimes the estimates tend to infinity. Indeed, for space-filling curves, or fractal-like curves such as the Koch snowflake, the estimates *do* tend to infinity. In this case, we say that they have infinite length. But if the estimates tend to a limit as the maximum of the differences $t_i-t_{i-1}$ tends to zero, we call that limit the length of the curve. A curve that has a finite length defined this way is called *rectifiable*.
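As a rough numerical illustration of this definition (a sketch, not part of the course), the following Python computes the polygonal estimates for the quarter of the unit circle in the top right quadrant, parametrized by its x-coordinate. The names `polygonal_length` and `upper_arc`, and the use of equally spaced dissections that are refined by repeated halving, are assumptions made purely for the example.

```python
import math

def polygonal_length(curve, a, b, n):
    """Sum of the straight-line distances between consecutive points of the
    curve, sampled at n equal steps of the parameter interval [a, b]."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    pts = [curve(t) for t in ts]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def upper_arc(t):
    # Point of the unit circle vertically above t on the x-axis.
    # max(...) guards against a tiny negative value caused by rounding.
    return (t, math.sqrt(max(0.0, 1.0 - t * t)))

# Each dissection refines the previous one, so the estimates should increase.
estimates = [polygonal_length(upper_arc, 0.0, 1.0, 2 ** k) for k in range(1, 12)]
for k, estimate in enumerate(estimates, start=1):
    print(2 ** k, estimate)
print("pi/2 =", math.pi / 2)
```

The printed estimates increase with each refinement and stay below $\pi/2$, which is what one would expect if the arc is rectifiable with length $\pi/2$.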

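Suppose now that we have a curve given by $f(t)=(g(t),h(t))$ and that the two functions $g$ and $h$ are continuously differentiable. Then both $g'$ and $h'$ are bounded on $[a,b]$, so let’s suppose that $M$ is an upper bound for $|g'|$ and $|h'|$. Then by the mean value theorem,

$$d(f(t_{i-1}),f(t_i))=\sqrt{(g(t_i)-g(t_{i-1}))^2+(h(t_i)-h(t_{i-1}))^2}\leq\sqrt{2}M(t_i-t_{i-1}).$$

Therefore, $\sum_{i=1}^nd(f(t_{i-1}),f(t_i))\leq\sqrt{2}M(b-a)$ for every dissection, which implies that the curve is rectifiable. (Remark: I didn’t really use the continuity of the derivatives there — just their boundedness.)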

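We can say slightly more than this, however. The differentiability of $g$ tells us that $g(t_i)-g(t_{i-1})=(t_i-t_{i-1})g'(c_i)$ for some $c_i\in(t_{i-1},t_i)$. And similarly for $h$ with some $d_i\in(t_{i-1},t_i)$. Therefore, the estimate for the length can be written

$$\sum_{i=1}^n(t_i-t_{i-1})\sqrt{g'(c_i)^2+h'(d_i)^2}.$$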

This looks very similar to the kind of thing we write down when doing Riemann integration, so let’s see whether we can find a precise connection. We are concerned with the function . If we now *do* use the continuity of and , then is continuous too, so it can be integrated. Now since and belong to the interval , and both lie between the lower and upper sums given by the dissection. That implies the same for

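Since $\sqrt{g'(t)^2+h'(t)^2}$ is integrable, the limit of $\sum_{i=1}^n(t_i-t_{i-1})\sqrt{g'(c_i)^2+h'(d_i)^2}$ as the largest $t_i-t_{i-1}$ (which is often called the *mesh* of the dissection) tends to zero is $\int_a^b\sqrt{g'(t)^2+h'(t)^2}\,dt$.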

We have shown that the length of the curve $t\mapsto(g(t),h(t))$ on $[a,b]$ is given by the formula

$$\int_a^b\sqrt{g'(t)^2+h'(t)^2}\,dt.$$

Now, finally, let’s see whether we can justify our calculation of the length of the arc of the unit circle between $(1,0)$ and $(x,y)$. It would be nice to parametrize the circle as $(\cos\theta,\sin\theta)$, but we can’t do that, since we are defining $\theta$ using length, so we would end up with a circular definition (in more than one sense). [Actually, we *can* do something very close to this. See the final section of the post for details.] So let’s parametrize it as follows. We’ll define $f$ on the interval $[x,1]$ and we’ll send $t$ to $(t,\sqrt{1-t^2})$. Then $g'(t)=1$ and $h'(t)=-\frac{t}{\sqrt{1-t^2}}$, so

$$\sqrt{g'(t)^2+h'(t)^2}=\sqrt{1+\frac{t^2}{1-t^2}}=\frac{1}{\sqrt{1-t^2}}.$$

So the length is $\int_x^1\frac{dt}{\sqrt{1-t^2}}$, which is exactly the expression we wrote down earlier.

Let me make two quick remarks about that. First, you might argue that although I have shown that the final *expression* is indeed correct, I haven’t shown that the informal *argument* is (essentially) correct. But I more or less have, since what I have effectively done is calculate the lengths of the hypotenuses of the little triangles in a slightly different way. Before, I used the fact that one side was $dt$ and used similar triangles. Here I’ve used the fact that one side is $dt$ and another side is $\frac{t\,dt}{\sqrt{1-t^2}}$ and used Pythagoras.

A slightly more serious objection is that for this calculation I used a general result that depended on the assumption that both $g$ and $h$ are continuously differentiable, but didn’t check that the appropriate conditions held, which they don’t. The problem is that $h(t)=\sqrt{1-t^2}$, so $h'(t)=-\frac{t}{\sqrt{1-t^2}}$, which tends to infinity as $t\to 1$ and is undefined at $t=1$.

However, it is easy to get round this problem. What we do is integrate from $x$ to $1-\epsilon$, in which case the argument is valid, and then let $\epsilon$ tend to zero. The integral between $x$ and $1-\epsilon$ is $\cos^{-1}(x)-\cos^{-1}(1-\epsilon)$, and that tends to $\cos^{-1}(x)$.
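A quick numerical sanity check of this limiting argument can be sketched in Python. It approximates the integral of $1/\sqrt{1-t^2}$ over $[x,1-\epsilon]$ by a midpoint Riemann sum and compares it with $\cos^{-1}(x)-\cos^{-1}(1-\epsilon)$. The function name `arc_length_integral`, the choice $x=0.5$, the step count and the values of $\epsilon$ are all assumptions made for the illustration, and `math.acos` is simply the library’s inverse cosine, not the power-series construction.

```python
import math

def arc_length_integral(x, eps, n=100_000):
    """Midpoint-rule approximation to the integral of 1/sqrt(1 - t^2)
    over the interval [x, 1 - eps], using n equal subintervals."""
    a, b = x, 1.0 - eps
    step = (b - a) / n
    return step * sum(
        1.0 / math.sqrt(1.0 - (a + (i + 0.5) * step) ** 2) for i in range(n)
    )

x = 0.5
for eps in (1e-1, 1e-2, 1e-3):
    numeric = arc_length_integral(x, eps)
    exact = math.acos(x) - math.acos(1.0 - eps)
    print(eps, numeric, exact)
print("acos(x) =", math.acos(x))
```

As $\epsilon$ shrinks, both columns approach `math.acos(0.5)`, in line with the claim that the truncated integrals tend to $\cos^{-1}(x)$.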

One final remark is that this length calculation explains why the usual substitution of $\cos\theta$ for $x$ in an integral of the form $\int\frac{dx}{\sqrt{1-x^2}}$ is not a piece of unmotivated magic. It is just a way of switching from one parametrization of a circular arc (using the x-coordinate) to another (using the angle, or equivalently the distance along the circular arc) that one expects to be simpler.

Thanks to a comment of Jason Fordham below, I now realize that we can after all parametrize the circle as $(\cos\theta,\sin\theta)$. However, this $\theta$ is not the $\theta$ I’m trying to calculate, so let’s call it $\phi$. I’m just taking $\phi$ to be an ordinary real number, and I’m defining $\cos\phi$ and $\sin\phi$ using the power-series definition. Then the arc of the unit circle that goes from $(1,0)$ to $(x,y)$ can be defined as the curve defined on the interval $[0,\cos^{-1}(x)]$ by the formula $\phi\mapsto(\cos\phi,\sin\phi)$. The general formula for the length of a curve then gives us

$$\int_0^{\cos^{-1}(x)}\sqrt{\sin^2\phi+\cos^2\phi}\,d\phi=\cos^{-1}(x).$$

So the length $\theta$ of the arc satisfies $x=\cos\theta$.

]]>

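A preliminary question about this is why it is not more or less obvious. After all, writing $f(z)=\sum_{n=0}^\infty a_nz^n$, we have the following facts.

- Writing $S_N(z)=\sum_{n=0}^Na_nz^n$, we have that $S_N(z)\to f(z)$.
- For each $N$, $S_N'(z)=\sum_{n=1}^Nna_nz^{n-1}$.

If we knew that $S_N'(z)\to f'(z)$, then we would be done.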

Ah, you might be thinking, how do we know that the sequence $(S_N'(z))$ converges? But it turns out that that is not the problem: it is reasonably straightforward to show that it converges. (Roughly speaking, inside the circle of convergence the series converges at least as fast as a GP, and multiplying the $n$th term by $n$ doesn’t stop a GP converging (as can easily be seen with the help of the ratio test).) So, writing $g(z)$ for $\sum_{n=1}^\infty na_nz^{n-1}$, we have the following facts at our disposal.

- $S_N(z)\to f(z)$
- $S_N'(z)\to g(z)$

Doesn’t it follow from that that $f'(z)=g(z)$?

We are appealing here to a general principle, which is that if some functions $f_n$ converge to $f$ and their derivatives $f_n'$ converge to $g$, then $f$ is differentiable with $f'=g$. Is this general principle correct?

Unfortunately, it isn’t. Suppose we take some continuous functions $g_n$ that converge to a step function. (Roughly speaking, you make $g_n$ be 0 up to 0, then linear with gradient $n$ until it hits 1, then 1 from that point onwards.) And suppose we then let $f_n$ be the function that differentiates to $g_n$ and is 0 up to 0. Then the $f_n$ converge to the function that is 0 up to 0 and $x$ for positive $x$. This function *almost* differentiates to the step function, but it isn’t differentiable at 0.
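To make this counterexample concrete, here is a small Python sketch. It writes `g(n, x)` for the $n$th ramp function (0 up to 0, gradient $n$ until it reaches 1, then 1) and `f(n, x)` for its antiderivative that vanishes for $x\leq 0$; these names, the explicit formula for `f`, and the grid used to measure the gap are assumptions made for the illustration.

```python
def g(n, x):
    """Continuous ramp: 0 for x <= 0, gradient n until it reaches 1, then 1."""
    return min(max(n * x, 0.0), 1.0)

def f(n, x):
    """The antiderivative of g(n, .) that is 0 for x <= 0."""
    if x <= 0:
        return 0.0
    if x <= 1.0 / n:
        return n * x * x / 2.0
    return x - 1.0 / (2.0 * n)

def limit(x):
    """Pointwise limit of f(n, x) as n tends to infinity."""
    return max(x, 0.0)

grid = [i / 1000.0 for i in range(-1000, 1001)]
for n in (1, 10, 100, 1000):
    gap = max(abs(f(n, x) - limit(x)) for x in grid)
    print(n, gap)  # the gap never exceeds 1/(2n)

h = 1e-6
right = (limit(h) - limit(0.0)) / h       # difference quotient from the right
left = (limit(-h) - limit(0.0)) / (-h)    # difference quotient from the left
print(right, left)
```

The functions `f(n, .)` get uniformly close to the limit, yet the two one-sided difference quotients of the limit at 0 disagree, so the limit has no derivative there.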

So we’ve somehow got to use particular facts about power series in order to prove our result — we can’t appeal to general considerations, because then we are appealing to a principle that isn’t true. (Actually, in principle some compromise might be possible, where we show that functions defined by power series have a certain property and then use nothing apart from that property from that point on. But as it happens, we shall not do this.)

We have a formula for $f(z)$. Why don’t we write out a formula for $\frac{f(z+h)-f(z)}{h}$ and see if we can tell what happens when $h\to 0$?

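That is certainly a sensible first thing to try, so let’s see what happens.

$$\frac{f(z+h)-f(z)}{h}=\sum_{n=1}^\infty a_n\frac{(z+h)^n-z^n}{h}.$$

What can we do with that? Perhaps we’d better apply the binomial theorem. Then we find that the right-hand side is equal to

$$\sum_{n=1}^\infty a_n\left(nz^{n-1}+\binom{n}{2}z^{n-2}h+\dots+h^{n-1}\right).$$

Part of the above expression gives us what we want, namely $\sum_{n=1}^\infty na_nz^{n-1}$. So we’re left wanting to prove that

$$\sum_{n=2}^\infty a_n\left(\binom{n}{2}z^{n-2}h+\binom{n}{3}z^{n-3}h^2+\dots+h^{n-1}\right)$$

tends to 0 as $h\to 0$.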

Unfortunately, as $n$ gets big, some of those binomial coefficients get pretty big too. Indeed, when $n$ is bigger than $|h|^{-1}$, the growth in the binomial coefficients seems to outstrip the shrinking of the powers of $h$. What can we do?

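Fortunately, there is a better (for our purposes at least) way of writing $(z+h)^n-z^n$. We just expanded out $(z+h)^n$ using the binomial theorem. But we could instead have used the expansion

$$x^n-y^n=(x-y)(x^{n-1}+x^{n-2}y+\dots+xy^{n-2}+y^{n-1}).$$

Applying that with $x=z+h$ and $y=z$, we get

$$(z+h)^n-z^n=h\left((z+h)^{n-1}+(z+h)^{n-2}z+\dots+z^{n-1}\right).$$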

Just before we continue, note that this gives us an alternative, and in my view nicer, way to see that the derivative of $z^n$ is $nz^{n-1}$, since if you divide the right-hand side by $h$ and let $h\to 0$ then each of the $n$ terms tends to $z^{n-1}$.

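Anyhow, if we use this trick, then $\frac{f(z+h)-f(z)}{h}$ works out to be

$$\sum_{n=1}^\infty a_n\left((z+h)^{n-1}+(z+h)^{n-2}z+\dots+z^{n-1}\right).$$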

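Now let’s subtract the thing we want this to tend to, which is $\sum_{n=1}^\infty na_nz^{n-1}$. (This is not valid unless we know that this series converges. So at some stage we will need to prove that.) If we think of $nz^{n-1}$ as a sum of $n$ copies of $z^{n-1}$, then we can write the difference as

$$\sum_{n=1}^\infty a_n\sum_{k=0}^{n-1}\left((z+h)^kz^{n-1-k}-z^{n-1}\right),$$

which equals

$$\sum_{n=1}^\infty a_n\sum_{k=0}^{n-1}z^{n-1-k}\left((z+h)^k-z^k\right).$$

Now $(z+h)^k-z^k$ is another example of the expansion we had above. That is, we can write it as

$$h\left((z+h)^{k-1}+(z+h)^{k-2}z+\dots+z^{k-1}\right).$$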

We haven’t yet mentioned the radius of convergence of the original power series, but let’s do so now. Suppose it is , that is such that , and that we have chosen small enough that . Then the modulus of the expression above is at most .

It follows that

Since , this is equal to .

So this will tend to zero as as long as we can prove that the sum converges.
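The chain of estimates can be collected in one place. This is only a sketch under assumed notation: $f(z)=\sum a_n z^n$ has radius of convergence $R$, and $r<R$ is chosen so that both $|z|\le r$ and $|z+h|\le r$.

```latex
% Assumed notation: f(z) = \sum a_n z^n, radius of convergence R,
% and |z| <= r, |z+h| <= r for some r < R.
\begin{align*}
\frac{(z+h)^n - z^n}{h} - n z^{n-1}
  &= \sum_{k=0}^{n-1} z^{k}\bigl((z+h)^{n-1-k} - z^{n-1-k}\bigr), \\
\bigl|(z+h)^{m} - z^{m}\bigr|
  &\le |h|\, m\, r^{m-1} \qquad (m \ge 1;\ \text{the difference is } 0 \text{ when } m = 0), \\
\left|\frac{(z+h)^n - z^n}{h} - n z^{n-1}\right|
  &\le |h|\, r^{n-2} \sum_{k=0}^{n-1} (n-1-k)
   = |h|\,\frac{n(n-1)}{2}\, r^{n-2}, \\
\left|\frac{f(z+h) - f(z)}{h} - \sum_{n=1}^{\infty} n a_n z^{n-1}\right|
  &\le |h| \sum_{n=2}^{\infty} \frac{n(n-1)}{2}\, |a_n|\, r^{n-2}.
\end{align*}
```

The last line is a fixed (finite) constant times $|h|$ provided the series on the right converges, which is exactly the point that still needs a lemma.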

Let’s prove a lemma to deal with that last point. It says that if is smaller than the radius of convergence of the power series , then the power series converges.

The proof is very similar to an argument we have seen already. Let be the radius of convergence, and pick with . Then the power series converges, so the terms are bounded above, by , say. Then .

But the series converges, by the ratio test. Therefore, by the comparison test, the series converges.
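Written out with explicit symbols, the comparison looks roughly like this. The names $R$, $r$, $s$ and $M$ are my assumptions for the radius of convergence, the two radii and the bound on the terms.

```latex
% Assumed notation: R is the radius of convergence of \sum a_n z^n,
% 0 < r < s < R, and M is an upper bound for the terms |a_n| s^n.
\begin{align*}
  |a_n|\, s^n &\le M \quad\text{for all } n
     && \text{(the terms of a convergent series are bounded)}, \\
  n\,|a_n|\, r^{n-1} &= \frac{n}{r}\,|a_n|\, s^n \Bigl(\frac{r}{s}\Bigr)^{n}
     \le \frac{M}{r}\, n \Bigl(\frac{r}{s}\Bigr)^{n}, \\
  \frac{(n+1)(r/s)^{n+1}}{n\,(r/s)^{n}} &= \frac{n+1}{n}\cdot\frac{r}{s}
     \longrightarrow \frac{r}{s} < 1
     && \text{(ratio test for } \textstyle\sum n (r/s)^n\text{)}.
\end{align*}
```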

This shows also that if then the power series converges (since we have just proved that it converges absolutely). So if we differentiate a power series term by term, we get a new power series that has the same radius of convergence, something we needed earlier.

If we apply this lemma a second time, we get that the power series converges, and dividing by 2 that gives us what we wanted above, namely that converges.

An obvious way of applying the result is to take some of your favourite power series and differentiate them term by term. This illustrates the very important general point that if you can obtain something in two different ways, then you usually end up proving something interesting.

So let’s take the function , which we have shown converges everywhere. Then we can obtain the derivative either by differentiating the function itself or by differentiating the power series term by term. That tells us that

, which simplifies to , which in turn simplifies to , which equals .
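Assuming the function in question is the exponential, defined by its power series, the chain of simplifications can be written as follows (the reindexing $m=n-1$ is my notation).

```latex
% Assumption: e^z is defined as \sum_{n \ge 0} z^n / n!.
\begin{align*}
  \frac{d}{dz}\sum_{n=0}^{\infty}\frac{z^n}{n!}
    &= \sum_{n=1}^{\infty}\frac{n z^{n-1}}{n!}
     = \sum_{n=1}^{\infty}\frac{z^{n-1}}{(n-1)!}
     = \sum_{m=0}^{\infty}\frac{z^{m}}{m!}
     = e^{z}.
\end{align*}
```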

Earlier we proved this result by writing as and proving that . I still prefer that proof, but you are at liberty to disagree.

As another example, let us consider the power series . When this equals , by the formula for summing a GP. We can now differentiate the power series term by term, and we can also differentiate the function . Doing so tells us the interesting fact that

We can see that in another way as well. By our result on multiplying power series, the product of with itself is the power series , where is the convolution of the constant sequence with itself. That is, with every and equal to 1, which gives us . (This agrees with the previous answer, since is the same as .)
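Both routes to the same identity can be laid side by side. This is a sketch assuming the series is the geometric series $\sum z^n$ on $|z|<1$, with $a_j$, $b_k$, $c_n$ as my names for the coefficients in the product.

```latex
% Assumption: |z| < 1 throughout.
\begin{align*}
  \sum_{n=0}^{\infty} z^n &= \frac{1}{1-z}, \\
  % Route 1: differentiate both sides term by term.
  \sum_{n=1}^{\infty} n z^{n-1} &= \frac{1}{(1-z)^2}, \\
  % Route 2: Cauchy product with a_j = b_k = 1 for all j, k.
  c_n = \sum_{j+k=n} a_j b_k &= n+1,
  \qquad
  \Bigl(\sum_{n=0}^{\infty} z^n\Bigr)^{2} = \sum_{n=0}^{\infty} (n+1) z^n .
\end{align*}
% The two answers agree because \sum_{n \ge 1} n z^{n-1} = \sum_{n \ge 0} (n+1) z^n.
```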

In the proof above, we used the identity

with and , and then we used it again to calculate what happened when we subtracted . Can we get those calculations out of the way in advance? That is, can we begin by finding a nice formula for ?

We obviously can, by subtracting from the right-hand side and simplifying, much as we did in the proof above (with and ). However, we can do things a bit more slickly as follows. Start with the identity

Differentiating both sides with respect to , we get

If we now take for and for , we deduce that is equal to

In particular, if and are both at most , then , which is the main fact we needed in the proof.
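Here is one way the slicker calculation can go. I am assuming the identity is written in letters $a$ and $b$ and that one differentiates with respect to $b$; if the roles are swapped the computation is the mirror image.

```latex
% Assumed notation: a, b complex, n >= 2.
\[
  a^n - b^n = (a-b)\sum_{k=0}^{n-1} a^{n-1-k} b^{k}.
\]
% Differentiate both sides with respect to b (product rule on the right):
\[
  -n b^{n-1} = -\sum_{k=0}^{n-1} a^{n-1-k} b^{k}
               + (a-b)\sum_{k=1}^{n-1} k\, a^{n-1-k} b^{k-1}.
\]
% Multiply through by (a-b) and use the first identity to recognise a^n - b^n:
\[
  a^n - b^n - n(a-b)\,b^{n-1} = (a-b)^2 \sum_{k=1}^{n-1} k\, a^{n-1-k} b^{k-1}.
\]
% Take a = z+h and b = z, with |z| <= r and |z+h| <= r:
\[
  \bigl|(z+h)^n - z^n - n h z^{n-1}\bigr|
    \le |h|^2\, r^{n-2} \sum_{k=1}^{n-1} k
    = |h|^2\, \frac{n(n-1)}{2}\, r^{n-2}.
\]
```

A quick sanity check: for $n=2$ the third line reads $a^2-b^2-2(a-b)b=(a-b)^2$, which is correct.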

Armed with this fact, we could argue as follows. We want to show that

is . By the inequality we have just proved, if and are at most , then the modulus of this expression is at most

and an earlier lemma told us that this converges within the circle of convergence. So the quantity we want to be is in fact bounded above by a multiple of . (Sometimes people use the notation for this. The means “bounded above in modulus by a constant multiple of the modulus of”.)

The proof in this post has relied heavily on the idea, which appeared to come from nowhere, of writing not in the obvious way, which is

but in a “clever” way, namely

Is this something one just has to remember, or can it be regarded as the natural thing to do?

I chose the words “can it be regarded as” quite carefully, since I want to argue that it is the natural thing to do, but when I was preparing this lecture, I didn’t find it the natural thing to do, as I shall now explain. I came to this result with the following background. Many years ago, I lectured an IB course called Further Analysis, which was a sort of combination of the current courses Metric and Topological Spaces and Complex Analysis, all packed into 16 lectures. (Amazingly, it worked quite well, though it was a challenge to get through all the material.) As a result of lecturing that, I learnt a proof that power series can be differentiated term by term inside their circle of convergence, but the proof uses a number of results from complex analysis. I then believed what some people say, which is that the complex analysis proof of this result is a very good advertisement for complex analysis, since a direct proof is horrible. And then at some point I was chatting to Imre Leader about the reorganization of various courses, and he told me that it was a myth that proving the result directly was hard. It wasn’t trivial, he said, but it was basically fine. In fact, it may even be thanks to him that the result is in the course.

Until a few days ago, I didn’t bother to check for myself that the proof wasn’t too bad — I just believed what he said. And then with the lecture coming up, I decided that the time had finally come to check it: something that I assumed would be a reasonably simple exercise. I duly did the obvious thing, including expanding using the binomial theorem, and got stuck.

I would like to be able to say that I then thought hard about why I was stuck, and after a while thought of the idea of expanding using the expansion of . But actually that is not what happened. What happened was that I thought, “Damn, I’m going to have to look up the proof.” I found a few proofs online that looked dauntingly complicated and I couldn’t face reading them properly, apart from one that was quite nice and that for a while I thought I would use. But one thing all the proofs had in common was the use of that expansion, so that was how the idea occurred to me.

So what follows is a rational reconstruction of what I *wish* had been my thought processes, rather than of what actually went on in my mind.

Let’s go back to the question of how to differentiate . I commented above that one could do it using the expansion, and said that I even preferred that approach. But how might one think of doing it that way? There is a very simple answer to that, which is to use one of the alternative definitions of differentiability, namely that is differentiable at with derivative if as . This is simply replacing by , but that is nice because it has the effect of making the expression more symmetrical. (One might argue that since we are talking about differentiability *at* , the variables and are playing different roles, so there is not much motivation for symmetry. And indeed, that is why calling one point and the other is often a good idea. But symmetry is … well … sort of good to have even when not terribly strongly motivated.)

If we use this definition, then the derivative of is the limit as of , and now there is no temptation to use the binomial expansion (we would first have to write as and the whole thing would be disgusting) and the absolutely obvious thing to do is to observe that we have a nice formula for the ratio in question, namely

which obviously tends to as .

In fact, the whole proof is arguably nicer if one uses and rather than and .

Thus, the “clever” expansion is the natural one to do with the symmetric definition of differentiation, whereas the binomial expansion is the natural one to do with the definition. So in the presentation above, I have slightly obscured the origins of the argument by applying the clever expansion to the definition.

Another way of seeing that it is natural is to think about how we prove the statement that a product of limits is the limit of the products. The essence of this is to show that if is close to and is close to , then is close to . This we do by arguing that is close to , and that is close to .

Suppose we apply a similar technique to try to show that is close to . How might we represent their difference? A natural way of doing it would be to convert all the s into s in a sequence of steps. That is, we would argue that is close to , which is close to , and so on.

But the difference between and is , so if we adopt this approach, then we will end up showing precisely that


The problem is to show that if $(\epsilon_n)$ is an infinite sequence of $\pm 1$s, then for every $C$ there exist $d$ and $m$ such that $\sum_{i=1}^m \epsilon_{id}$ has modulus at least $C$. This result is straightforward to prove by an exhaustive search when $C=2$. One thing that the Polymath project did was to discover several sequences of length 1124 such that no sum has modulus greater than 2, and despite some effort nobody managed to find a longer one. That was enough to convince me that 1124 was the correct bound.

However, the new result shows the danger of this kind of empirical evidence. The authors used state-of-the-art SAT solvers to find a sequence of length 1160 with no sum having modulus greater than 2, and also showed that this bound is best possible. Of this second statement, they write the following: “The negative witness, that is, the DRUP unsatisfiability certificate, is probably one of longest proofs of a non-trivial mathematical result ever produced. Its gigantic size is comparable, for example, with the size of the whole Wikipedia, so one may have doubts about to which degree this can be accepted as a proof of a mathematical statement.”

I personally am relaxed about huge computer proofs like this. It is conceivable that the authors made a mistake somewhere, but that is true of conventional proofs as well. The paper is by Boris Konev and Alexei Lisitsa and appears here.


I have always found this situation annoying, because a part of me said that the result ought to be a straightforward generalization of the mean value theorem, in the following sense. The mean value theorem applied to the interval $[x,x+h]$ tells us that there exists $y\in(x,x+h)$ such that $f'(y)=\frac{f(x+h)-f(x)}{h}$, and therefore that $f(x+h)=f(x)+hf'(y)$. Writing $y=x+\theta h$ for some $\theta\in(0,1)$ we obtain the statement $f(x+h)=f(x)+hf'(x+\theta h)$. This is the case $n=1$ of Taylor’s theorem. So can’t we find some kind of “polynomial mean value theorem” that will do the same job for approximating $f$ by polynomials of higher degree?

Now that I’ve been forced to lecture this result again (for the second time actually — the first was in Princeton about twelve years ago, when I just suffered and memorized the Cauchy mean value theorem approach), I have made a proper effort to explore this question, and have realized that the answer is yes. I’m sure there must be textbooks that do it this way, but the ones I’ve looked at all use the Cauchy mean value theorem. I don’t understand why, since it seems to me that the way of proving the result that I’m about to present makes the whole argument completely transparent. I’m actually looking forward to lecturing it (as I add this sentence to the post, the lecture is about half an hour in the future), since the demands on my memory are going to be close to zero.

We know that we want a statement that will involve the first derivatives of at , the th derivative at some point in the interval , and the value of at . The idea with Rolle’s theorem is to make a whole lot of stuff zero, and then with the mean value theorem we take a more general function and subtract a linear part to obtain a function to which Rolle’s theorem applies. So let’s try a similar trick here: we’ll make as much as we can equal to zero. In fact, I’ll go even further and make the values of and zero.

So here’s what I’ll assume: that and also that . That’s as much as I can reasonably set to be zero. And what should be my conclusion? That there is some such that . Note that if we set then we are assuming that and trying to find such that , so this result really does generalize Rolle’s theorem. (I’m also assuming that is times differentiable on an open interval that contains . This is a slightly stronger condition than necessary, but it will hold in the situations where we want to use Taylor’s theorem.)

The proof of this generalization is almost trivial, given Rolle’s theorem itself. Since , there exists such that . But as well, so by Rolle’s theorem, this time applied to , we find such that . Continuing like this, we eventually find such that . So we can set and we are done.

For what it’s worth, I didn’t use the fact that , but just that .
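The repeated use of Rolle’s theorem can be laid out as a chain. This is a sketch in assumed notation: $f$ is $n$ times differentiable on an open interval containing $[x,x+h]$ with $h>0$, and $x_1,\dots,x_n$ are my names for the intermediate points.

```latex
% Assumed hypotheses: f(x) = f'(x) = \dots = f^{(n-1)}(x) = 0 and f(x+h) = 0.
\begin{align*}
  f(x) = f(x+h)
    &\implies \exists\, x_1 \in (x, x+h):\ f'(x_1) = 0, \\
  f'(x) = f'(x_1) = 0
    &\implies \exists\, x_2 \in (x, x_1):\ f''(x_2) = 0, \\
    &\ \ \vdots \\
  f^{(n-1)}(x) = f^{(n-1)}(x_{n-1}) = 0
    &\implies \exists\, x_n \in (x, x_{n-1}):\ f^{(n)}(x_n) = 0.
\end{align*}
% Since x < x_n < x+h, we may write x_n = x + \theta h with \theta \in (0,1).
% Note that the first step only needs f(x) = f(x+h), not that both are zero.
```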

Now let’s take an arbitrary function that is -times differentiable on an open interval containing . To prove the mean value theorem, we subtracted a linear function so as to obtain a function that satisfied the hypotheses of Rolle’s theorem. Here, the obvious thing to do is to subtract a polynomial of degree to obtain a function that satisfies the hypotheses of our higher-order Rolle theorem.

The properties we need to have are that , , and so on all the way up to , and finally . It turns out that we can more or less write down such a polynomial, once we have observed that the polynomial has the convenient property that except when when it is 1. This allows us to build a polynomial that has whatever derivatives we want at . So let’s do that. Define a polynomial by

Then for . A more explicit formula for is

Now doesn’t necessarily equal , so we need to add a multiple of to correct for this. (Doing that won’t affect the derivatives we’ve got at .) So we want our polynomial to be of the form

and we want . So we want to equal , which gives us . That is,

A quick check: if we substitute in for we get , which does indeed equal .

For the moment, we can forget the *formula* for . All that matters is its *properties*, which, just to remind you, are these.

- is a polynomial of degree .
- for .
- .

The second and third properties tell us that if we set , then for and . Those are the conditions needed for our higher-order Rolle theorem. Therefore, there exists such that , which implies that .

Let us just highlight what we have proved here.

**Theorem.** *Let be continuous on the interval and -times differentiable on an open interval that contains . Let be the unique polynomial of degree such that for and . Then there exists such that .*

Note that since is a polynomial of degree , the function is constant. In the case , the constant is , the gradient of the line joining to , and the theorem is just the mean value theorem.

Actually, the result we have just proved *is* Taylor’s theorem! To see that, all we have to do is use the explicit formula for and a tiny bit of rearrangement. To begin with, let us use the formula

Note that for every , so the theorem tells us that there exists such that

Rearranging, that gives us that

Finally, using the formula for , which was

and setting , we can rewrite our conclusion as

which is Taylor’s theorem with the Lagrange form of the remainder.
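The whole derivation fits in a few lines. This is a sketch in assumed notation: $q_k$, $Q$, $P$ and $\lambda$ are my names for the building-block monomials, the degree-$(n-1)$ part, the corrected polynomial and the correcting coefficient.

```latex
% Assumed notation; requires amsmath for cases and align*.
\begin{align*}
  q_k(u) &= \frac{(u-x)^k}{k!}, \qquad
  q_k^{(j)}(x) = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k, \end{cases} \\
  Q(u) &= \sum_{k=0}^{n-1} f^{(k)}(x)\, q_k(u), \\
  P(u) &= Q(u) + \lambda\, q_n(u), \qquad
  \lambda = \frac{n!}{h^n}\bigl(f(x+h) - Q(x+h)\bigr), \\
  f^{(n)}(x+\theta h) &= P^{(n)}(x+\theta h) = \lambda
     \quad\text{for some } \theta \in (0,1), \\
  f(x+h) &= \sum_{k=0}^{n-1} \frac{h^k}{k!}\, f^{(k)}(x)
            + \frac{h^n}{n!}\, f^{(n)}(x+\theta h).
\end{align*}
```

Here $P^{(n)}$ is the constant $\lambda$ because $Q$ has degree at most $n-1$ and $q_n^{(n)}=1$; the last line is just the formula for $\lambda$ solved for $f(x+h)$.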

I think it is quite rare for a proof of Taylor’s theorem to be asked for in the exams. However, pretty well every year there is a question that requires you to understand the *statement* of Taylor’s theorem. (I am writing this post without any knowledge of what will be in this year’s exam, and the examiners will be entirely within their rights to ask for anything that’s on the syllabus. So I certainly don’t recommend not learning the proof of Taylor’s theorem.)

You may at school have seen the following style of reasoning. Suppose we want to calculate the power series of . Then we write

Taking we deduce that . Differentiating we get that

and taking we deduce that . In general, differentiating times and setting we deduce that if is even, if mod 4, and if mod 4. Therefore,

There are at least two reasons that this argument is not rigorous. (I’ll assume that we have defined trigonometric functions and proved rigorously that their derivatives are what we think they are. Actually, I plan to define them using power series later in the course, in which case they have their power series by definition, but it is possible to define them in other ways — e.g. using the differential equation — so this discussion is not a complete waste of time.) One is that we assumed that could be expanded as a power series. That is, at best what we have just shown is that *if* can be expanded as a power series, then the power series must be that one.

A second reason is that we just assumed that the power series could be differentiated term by term. That holds under certain circumstances, as we shall see later in the course, and those circumstances hold for this particular power series, but until we’ve proved that is given by this particular power series we don’t know that the conditions hold.

Taylor’s theorem helps us to clear up these difficulties. Applying it with replaced by 0 and replaced by , we find that

for some . All the terms apart from the last one are just the expected terms in the power series for , so we get that is equal to the partial sum of the power series up to the term in plus a remainder term.

The remainder term is , so its magnitude is at most . It is not hard to prove that tends to zero as . (One way to do this is to observe that the ratio of successive terms has magnitude at most 1/2 once is bigger than .) Therefore, the power series converges for every , and converges to .
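As a worked instance, here is the remainder estimate spelled out, assuming the function is $\sin$ expanded about 0 and writing $\theta$ for the intermediate parameter (my notation).

```latex
% Taylor's theorem at 0 with increment x, applied to sin:
\[
  \sin x = \sum_{k=0}^{n-1} \frac{\sin^{(k)}(0)}{k!}\, x^k
           + \frac{x^n}{n!}\,\sin^{(n)}(\theta x),
  \qquad \theta \in (0,1).
\]
% Every derivative of sin is one of \pm\sin, \pm\cos, so it is bounded by 1:
\[
  \left|\frac{x^n}{n!}\,\sin^{(n)}(\theta x)\right| \le \frac{|x|^n}{n!}.
\]
% Ratio of successive bounds (for x \ne 0):
\[
  \frac{|x|^{n+1}/(n+1)!}{|x|^{n}/n!} = \frac{|x|}{n+1} \le \frac12
  \quad\text{whenever } n \ge 2|x|,
\]
% so |x|^n / n! -> 0 as n -> infinity for each fixed x.
```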

The basic technique here is as follows.

(i) Write down what Taylor’s theorem gives you for your function.

(ii) Prove that for each (in the range where you want to prove that the power series converges) the remainder term tends to zero as tends to infinity.

The material in this section is not on the course, but is still worth thinking about. It begins with the definition of a derivative, which, as I said in lectures, can be expressed as follows. A function is differentiable at with derivative if

We can think of as the best linear approximation to for small .

Once we’ve said that, it becomes natural to ask for the best quadratic approximation, and in general for the best approximation by a polynomial of degree .

Let’s think about the quadratic case. In the light of Taylor’s theorem it is natural to expect that

in which case would indeed be the best quadratic approximation to for small .

What Taylor’s theorem as stated above gives us is

for some . If we know that is continuous at , then as , so we can write , where . But then , as we wanted, since .

However, this result does not need the continuity assumption, so let me briefly prove it. To keep the expressions simple I will prove only the quadratic case, but the general case is pretty well exactly the same.

I’ll do the same trick as usual, by which I mean I’ll first prove it when various things are zero and then I’ll deduce the general case. So let’s suppose that . We want to prove now that .

Since , we have that

Therefore, for every we can find such that for every with .

This gives us several inequalities, one of which is that for every such that . If we now set to be , then we have that for every . So by the mean value theorem, for every such , which implies that .

If we run a similar argument using the fact that we get that . And we can do similar arguments with as well, and the grand conclusion is that whenever we have .

What we have shown is that for every there exists such that whenever , which is exactly the statement that as , which in turn is exactly the statement that .

That does the proof when . Now let’s take a general and define a function by

Then , so , from which it follows that

which after rearranging gives us the statement we wanted:

As I said above, this argument generalizes straightforwardly and gives us Taylor’s theorem with what is known as *Peano’s form of the remainder*, which is the following statement.

For that we need to exist but we do not need to exist anywhere else, so we certainly don’t need any continuity assumptions on .
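In standard notation (my choice of symbols, so treat it as an assumption about how the statement was displayed), Peano’s form of the remainder reads as follows.

```latex
% Assumption: f^{(n)}(x) exists, which in particular means f is
% (n-1) times differentiable in a neighbourhood of x.
\[
  f(x+h) = \sum_{k=0}^{n} \frac{h^k}{k!}\, f^{(k)}(x) + o(h^n)
  \qquad (h \to 0).
\]
```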

This version of Taylor’s theorem is not as useful as versions with an explicit formula for the remainder term, as you will see if you try to use it to prove that can be expanded as a power series: the information that the remainder term is is, for fixed , of no use whatever. But the information that it is gives us an expression that we can prove tends to zero.

However, one amusing (but not, as far as I know, useful) thing it gives us is a direct formula for the second derivative. By direct I mean that we do not go via the first derivative. Let us take the quadratic result and apply it to both and . We get

and

From this it follows that

Dividing through by we get that

as .
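Spelled out, the calculation runs as follows, assuming $f$ is twice differentiable at $x$ and using the quadratic case of the Peano form at $h$ and at $-h$.

```latex
\begin{align*}
  f(x+h) &= f(x) + h f'(x) + \tfrac{h^2}{2} f''(x) + o(h^2), \\
  f(x-h) &= f(x) - h f'(x) + \tfrac{h^2}{2} f''(x) + o(h^2), \\
  f(x+h) + f(x-h) - 2f(x) &= h^2 f''(x) + o(h^2), \\
  \frac{f(x+h) + f(x-h) - 2f(x)}{h^2} &\longrightarrow f''(x)
  \qquad (h \to 0).
\end{align*}
```

The first-derivative terms cancel on adding, which is why the formula never goes via $f'$.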

I’m not claiming the converse, which would say that if this limit exists, then is twice differentiable at . In fact, doesn’t even have to be once differentiable at . Consider, for example, the following function. For every integer (either positive or negative) and every in the interval we set equal to . We also set , and we take when . (That is, for negative we define so as to make it an odd function.)

Then for every , so for every , and in particular it tends to 0 as . However, is not differentiable at 0. To see this, note that when we have , whereas when is close to we have close to . Therefore, the ratio does not converge as , which tells us that is not differentiable at 0.

If you want an example that is continuous everywhere, then take . This again has the property that for every , and it is not differentiable at 0.

Even if we assume that is differentiable, we can’t get a proper converse. For example, the condition

does not imply that exists and equals 0. For a counterexample, take a function such as (and 0 at 0). Then must lie between and therefore certainly be . But the oscillations near zero are so fast that is unbounded near zero, so doesn’t exist at 0.


Suppose I were to ask you to memorize the sequence 5432187654321. Would you have to learn a string of 13 symbols? No, because after studying the sequence you would see that it is just counting down from 5 and then counting down from 8. What you want is for your memory of a proof to be like that too: you just keep doing the obvious thing except that from time to time the next step isn’t obvious, so you need to remember it. Even then, the better you can understand why the non-obvious step was in fact sensible, the easier it will be to memorize it, and as you get more experienced you may find that steps that previously seemed clever and nonobvious start to seem like the natural thing to do.

For some reason, Analysis I contains a number of proofs that experienced mathematicians find easy but many beginners find very hard. I want to try in this post to explain why the experienced mathematicians are right: in a rather precise sense many of these proofs *really are easy*, in the sense that if you just repeatedly do the obvious thing you will solve them. Others are mostly like that, with perhaps one smallish idea needed when the obvious steps run out. And even the hardest ones have easy parts to them.

I feel so strongly about this that a few years ago I teamed up with a colleague of mine, Mohan Ganesalingam, to write a computer program to solve easy problems. And after a lot of effort, we produced one that can solve several (but not yet all — there are still difficulties to sort out) problems of the kind I am talking about: easy for the experienced mathematician, but hard for the novice. Now you have some huge advantages over a computer. For example, you understand the English language. Also, you can be presented with a vague instruction such as “Do any obvious simplifications to the expression and then see whether it reminds you of anything,” and you will be able to follow it. (In principle, so could the program, but only if we spent a long time agonizing about what exactly constitutes an “obvious” simplification, what kind of similarity should be sufficient for one mathematical expression to trigger the program to call up another, and so on.) So if a mere computer can solve these problems, you should definitely be able to solve them.

What I plan to do in this post is basically explain how the program would go about proving some of the theorems we’ve proved in the course. To explain *exactly* how it works would be complicated. However, because you are humans, there are lots of technical details that I don’t need to worry about, and what remains of the algorithm when you ignore those details is really pretty simple.

The rough idea is that you should equip yourself with a small set of “moves” and simply apply these moves when the opportunity arises. That is an oversimplification, since sometimes one can do the moves in “silly” ways, but merely being consciously aware of the moves is very useful. (Incidentally, the notion of “silliness” is hard to define formally but is something that humans find easy to recognise when they see examples of it. So that’s another example of the kind of advantage you have over the computer.)

I’m going to describe a way of keeping track of where you have got to in your discovery of a proof. It’s not something I suggest you do for the rest of your mathematical lives. Rather, it is something that you might like to consider doing if you find it hard to come up with typical Analysis I proofs. If you use this technique a few times, then it should get easier, and after a while you will find that you don’t need to use the technique any more.

The technique is simply to record what statements you are likely to want to use, and what statement you are trying to prove. Both of these can change during the course of your proof discovery, as we shall see.

I think the easiest way to explain this and the moves is to begin by giving an example of the whole process in action. Then I’ll talk about the moves in a more abstract way. Let’s take as an example the proof that if a Cauchy sequence has a convergent subsequence then the sequence itself is convergent.

To begin with, we have nothing we obviously need to use, and a statement that we want to prove. That statement is the following.

—————————————————-

Every Cauchy sequence with a convergent subsequence converges

Let us begin by writing that very slightly more formally, to bring out the fact that it starts with $\forall$.

—————————————————-

is Cauchy and has a convergent subsequence

converges

The next step is to apply the “let” move, which I’ve talked about several times in lectures. If you ever have a statement to prove of the form “For every $x$ such that $P(x)$ holds, $Q(x)$ also holds,” then you can just automatically write “Let $x$ be such that $P(x)$ holds,” and change your target to that of establishing that $Q(x)$ holds.

In our case, we write, “Let be a Cauchy sequence that has a convergent subsequence,” and modify our target to that of proving that converges. So now we represent where we’ve got to as follows.

is a Cauchy sequence

has a convergent subsequence

——————————————-

converges

Maybe the purpose of those strange horizontal lines is becoming clearer at this point. I am listing statements that we can *assume* above the line and ones that we are trying to *prove* below the line.

At this point it seems natural to give a name to the convergent subsequence that we are given. Let us call it . This again is just one instance of a very general move: if you are told you’ve got something, then give it a name. This sequence has two properties: it is a subsequence of and it converges. I’ll list those two properties separately.

is a Cauchy sequence

is a subsequence of

converges

——————————————-

converges

Having done that, I think I’ll remove the second hypothesis, since the fact that is a subsequence of is implicit in the notation.

is a Cauchy sequence

converges

——————————————-

converges

The second hypothesis here is again telling us we’ve got something: a limit of the subsequence. So let’s apply the naming move again, calling this limit .

is a Cauchy sequence

——————————————-

converges

That’s enough reformulation of our assumptions. It’s time to think about what we are trying to prove. To do that, we use a process called *expansion*. That means taking a definition and writing it out in more detail. It tends to be good to *avoid* expanding definitions unless you are genuinely stuck: that way you won’t miss opportunities to *use results from the course* rather than proving everything from first principles. However, here a proof from first principles is what is required. I’m going to do a partial expansion to start with: a sequence converges if there exists a real number that it converges to.

is a Cauchy sequence

——————————————-

converges to

Now our target has changed to an existential statement. How are we going to find an that the sequence converges to?

Sometimes proving existential statements is very hard, but here it is easy, since we have a candidate for the limit staring us in the face, and better still it is the only candidate around. So let us make a very reasonable guess that the sequence is going to converge to , and make proving that our new target.

is a Cauchy sequence

——————————————-

That’s nice because we’ve got rid of that existential quantifier. But what do we do next? We must continue to expand: this time the definition of . Note that if you want to be able to do this, it is absolutely vital that you *know your definitions*. Otherwise, you obviously can’t do this expansion move. And if you can’t do that, then you can kiss goodbye to any hopes you might have had of proving this kind of result.

is a Cauchy sequence

——————————————-

Now we have a target that begins with a universal quantifier, so it’s time for the “let” move again.

is a Cauchy sequence

——————————————-

Now things become slightly harder, because this time we do *not* have a candidate staring us in the face for the thing we are trying to find. (The thing we are trying to find is .) It’s not a bad idea in this situation to try to write out in vague terms what the key statements mean. One can do something like this.

Eventually all terms of are close to each other

Eventually all terms of are close to

————————————————

Eventually all terms of are close to

The rough idea of the proof should now be clear: if all terms in the subsequence are close to and all terms are close to each other, then eventually for each term we can say that it is close to a term in the subsequence, which is itself close to .

Since we are going to need to take two steps from a term in , one to the subsequence and one from the subsequence to , it seems a good idea to apply the two main hypotheses with . So let’s just go ahead and do that and see what we get.

——————————————-

Now we are once again in a position where we have been “given” something — in this case and . So let’s quietly drop the existential quantifiers and use the names and . (Purists might object to using the same names for the particular choices of and that we used when merely asserting that they exist. But this is very common practice amongst mathematicians and does not lead to confusion.)

——————————————-

How do we propose to “force” to be less than ? We are going to try to ensure, for suitable , that and . The first hypothesis tells us that we will be able to get the first condition if and are both at least , and the third hypothesis tells us that we will be able to get the second condition if .

So our plan is going to be to choose and . For the plan to work, we shall need , , and .

We are now in a position to choose . We want our conclusion to hold when , and the tool we use works when , so it makes sense to take . If we substitute that in, we lose the existential quantifier in the target and arrive at the following.

——————————————-

Now we can apply the “let” move again, to get rid of the universal quantifier in the target statement.

——————————————-

We know we’re going to take , and that we can, since , so let’s go ahead and choose that value for in the first hypothesis. That leaves us with the following.

——————————————-

Just to make clear what I did there, it was a move called *substitution*. If you have a hypothesis of the form and a hypothesis , then you can substitute in for and get out . (One can also call this *modus ponens*: I prefer to call it substitution in this case because the condition is somehow not a very serious hypothesis, but more like a “restriction” applied on .)

Since I’ve used the hypothesis and am unlikely to need it again, I have deleted it.

Now we have to decide how to choose and how to choose . Recall that we needed and . In a human proof one just writes, “Let be such that and .” It’s a bit trickier for a computer to find it obvious that such a exists, but again that doesn’t matter to us here. I’ll use to denote the I’m choosing, and write down the conditions I’ve made sure satisfies.

——————————————-

Now we can substitute into the first hypothesis.

——————————————-

We can also substitute into the second hypothesis.

——————————————-

And now we are done by the triangle inequality.
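
Written out in symbols, that last step might look as follows. The notation is an assumption made for illustration: $(a_n)$ for the Cauchy sequence, $(a_{n_k})$ for the subsequence, $a$ for its limit, $k$ for the index chosen above, and $\epsilon/2$ for the value with which the two hypotheses were applied.

```latex
\[
|a_n - a| \;\le\; |a_n - a_{n_k}| + |a_{n_k} - a|
          \;<\; \frac{\epsilon}{2} + \frac{\epsilon}{2} \;=\; \epsilon .
\]
```

The first term is small because the sequence is Cauchy, and the second is small because the subsequence converges, which is exactly the two-step route sketched in the rough plan.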

Now that we have gone through a proof, let me list the main proof-generating moves we used.

If you are trying to prove a statement of the form “For every such that holds, also holds,” then write, “Let be such that holds,” (or words to that effect) and adjust your target to proving that holds.

If you are told that something exists, then give it a name. For example, if you are given the hypothesis is convergent, then you are told that a limit exists. So give it a name such as and change the hypothesis to .

If you are trying to prove something and you can’t find a high-level argument (by which I mean one that uses results from the course that are relevant to the statement you are trying to prove), and if what you are trying to prove involves concepts such as convergence or continuity that can be written out in low-level language (often, but not always, involving quantifiers), then rephrase what you are trying to prove in this lower-level way. That is, expand out the definition.

If you are given a hypothesis of the form , then given any object of the same type as , you are free to substitute it in for and obtain the hypothesis .

For example, in the proof above, we had the hypothesis “ is Cauchy”. In expanded form, this reads

We decided to substitute in , which is of the same type of thing as (both are positive real numbers), and this yielded the statement

(We then applied the “naming” move to get rid of the .)
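
For reference, here is a sketch of how that substitution reads when written out in full. The names are assumptions made for illustration: $(a_n)$ for the sequence and $\epsilon/2$ for the value substituted in.

```latex
% The Cauchy hypothesis in expanded form:
\[
\forall \epsilon > 0 \quad \exists N \quad \forall m, n \ge N \qquad |a_m - a_n| < \epsilon .
\]
% After substituting \epsilon/2 for \epsilon, the universal quantifier over \epsilon disappears:
\[
\exists N \quad \forall m, n \ge N \qquad |a_m - a_n| < \frac{\epsilon}{2} .
\]
```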

Often a hypothesis takes a slightly more general form, where *conditions* are assumed. That is, it takes the form

or still more generally

There the symbol means “and”, so this is saying that whenever you can find a that satisfies the conditions , then you can give yourself the hypothesis .

Suppose that you are trying to prove a statement of the form , and suppose you have identified an object of the same type as that you believe is going to do the job. Then you can change your target statement from to . (In words, instead of trying to show that there exists something that satisfies , you are going to try to show that satisfies .)

We did this when we moved from trying to prove that converges to *something* to trying to prove that it converges to .

This is not a complete set of useful moves. However, it is a start, and I hope it will help to back up my assertion that a large fraction of the proof steps that I take when writing out proofs in lectures are fairly automatic, and steps that you too will find straightforward if you put in the practice. I’ll try to discuss more moves in future posts.


I cannot promise to follow the amazing example of Vicky Neale, my predecessor on this course, who posted after every single lecture. However, her posts are still available online, so in some ways you are better off than the people who took Analysis I last year, since you will have her posts as well as mine. (I am making the assumption here that my posts will not contribute negatively to your understanding — I hope that proves to be correct.) Having said that, I probably won’t cover exactly the same material in each lecture as she did, so the correspondence between my lectures and her posts won’t be as good as the correspondence between her lectures and her posts. Nevertheless, I strongly recommend you look at her posts and see whether you find them helpful.

You will find this course *much* easier to understand if you are comfortable with basic logic. In particular, you should be clear about what “implies” means and should not be afraid of the quantifiers and . You may find a series of posts I wrote a couple of years ago helpful, and in particular the ones where I wrote about logic (NB, as with Vicky Neale’s posts above, they appear in reverse order). I also have a few old posts that are directly relevant to the Analysis I course (since they are old posts you may have to click on “older entries” a couple of times to reach them), but they are detailed discussions of Tripos questions rather than accompaniments to lectures. You may find them useful in the summer, and you may even be curious to have a quick look at them straight away, but for now your job is to learn mathematics rather than trying to get good at one particular style of exam, so I would not recommend devoting much time to them yet.

For the rest of this post, I want to describe briefly the prerequisites for this course. One of the messages I want to get across is that in a sense the entire course is built on one axiom, namely the least upper bound axiom for the real numbers. I don’t really mean that, but it would be correct to say that it is built on one *new* axiom, together with other properties of the real numbers that you are so familiar with that you hardly give them a second’s thought.

If I want to say that more precisely, then I will say that the course is built on the following assumption: there is, up to isomorphism, exactly one complete ordered field. If the phrase “complete ordered field” is unfamiliar to you, it doesn’t matter, though I will try to explain what it means in a moment. Roughly speaking, this assumption is saying that there is exactly one mathematical structure that has all the arithmetical and order properties that you would expect of the real numbers, and also satisfies the least upper bound axiom. And that structure is the one we call the real numbers.

And now let me make *that* more precise.

A field is a set with two binary operations and that behave in the same nice ways that addition and multiplication behave in the real numbers. That is, they have the following properties.

(i) is commutative and associative and has an identity element. Every element of has an inverse under .

(ii) is commutative and associative and has an identity element. Every element of other than the identity of has an inverse under .

(iii) is distributive over . That is, for any three elements of we have .

If we define an algebraic structure with some notions of addition and multiplication, then to say that it is a field is to say that all the usual rules we use to do algebraic manipulations are valid. It can be amusing and instructive to prove facts such as that assuming nothing more than the field axioms, but in this course I shall take these slightly less elementary facts as read as well. But I assure you that they *do* follow from the field axioms.
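
As one illustration of the kind of deduction meant here, this is a short derivation of the fact that $0 \cdot x = 0$ for every $x$ in a field. The particular fact is just an illustrative choice, and the symbols $+$, $\cdot$ and $0$ (for the additive identity) are assumed notation.

```latex
\begin{align*}
0 \cdot x &= (0 + 0) \cdot x        && \text{($0$ is the additive identity)} \\
          &= 0 \cdot x + 0 \cdot x  && \text{(distributivity)} \\
\Longrightarrow \quad 0 &= 0 \cdot x && \text{(add the additive inverse of $0 \cdot x$ to both sides)}
\end{align*}
```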

Some examples of fields that you have already met are , , and . (That last one is the field that consists of integers mod for a prime , with addition and multiplication mod . The only axiom that is not easy to verify is the existence of multiplicative inverses for non-zero elements of the field, which follows from the fact that if and are coprime then there are integers and such that .)
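
The existence of multiplicative inverses mod a prime can be made concrete with the extended Euclidean algorithm. The following is a minimal sketch, not part of the course material. The function names `extended_gcd` and `inverse_mod` are hypothetical, and the letters `h` and `k` for the two integer coefficients are assumed.

```python
def extended_gcd(a, b):
    """Return (g, h, k) with g = gcd(a, b) and h*a + k*b == g."""
    old_r, r = a, b
    old_h, h = 1, 0
    old_k, k = 0, 1
    while r != 0:
        q = old_r // r
        # Each remainder is an integer combination of a and b; track the coefficients.
        old_r, r = r, old_r - q * r
        old_h, h = h, old_h - q * h
        old_k, k = k, old_k - q * k
    return old_r, old_h, old_k


def inverse_mod(a, p):
    """Return the multiplicative inverse of a modulo p, assuming gcd(a, p) == 1."""
    g, h, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a and p are not coprime, so no inverse exists")
    # h*a + k*p == 1 means h*a is congruent to 1 mod p.
    return h % p
```

For example, `inverse_mod(3, 7)` gives an integer `h` with `3 * h % 7 == 1`.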

This question splits into two. First we need to know what an ordering is, and then we need to know how the ordering relates to the algebraic operations. Let me take these two in turn.

A *totally ordered set* is a set together with a relation that has the following properties.

- is *transitive*: that is, if and , then .
- satisfies the *law of trichotomy*: that is, for any exactly one of the statements , , holds.

Note that the trichotomy law implies that is *antisymmetric*: that is, if then it cannot also be the case that .

In the above situation, we say that is a *total ordering* on . Given a total ordering we can make some obvious further definitions. For instance, we can define by saying that if and only if . (Note that is also a total ordering on .) Also, we can define by saying that if and only if either or , and similarly we can define .

Here’s an example of a totally ordered set that is not just a subset of the real numbers. We take to be the set of all polynomials with real coefficients, and if and are two polynomials, we say that if there exists a real number such that for every . (That is, if is “eventually bigger than “.) It is easy to check that this relation is transitive, and an instructive exercise to prove that the trichotomy law holds. (It is also not too hard, so I think it is better not to give the proof here.)
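
To experiment with this ordering, note that one polynomial is eventually smaller than another exactly when the leading coefficient of their difference is positive. Here is a small sketch under that observation. The function name `eventually_less` and the representation of a polynomial as a list of coefficients (lowest degree first) are assumptions made for illustration.

```python
def eventually_less(p, q):
    """Return True if p(x) < q(x) for all sufficiently large x.

    p and q are lists of real coefficients, lowest degree first,
    so [0, 100] is the polynomial 100x and [0, 0, 1] is x^2.
    """
    n = max(len(p), len(q))
    # Coefficients of q - p, padding the shorter list with zeros.
    diff = [
        (q[i] if i < len(q) else 0) - (p[i] if i < len(p) else 0)
        for i in range(n)
    ]
    # The sign of q - p for large x is the sign of its leading non-zero coefficient.
    for c in reversed(diff):
        if c != 0:
            return c > 0
    return False  # p and q are the same polynomial
```

For any two polynomials, exactly one of "`p` is eventually less than `q`", "`q` is eventually less than `p`" and "`p` equals `q`" holds, which is the trichotomy law in this example.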

How should we define an ordered field? A first guess might be to say that it is a field with a total ordering on it. But a moment’s thought shows that that is a ridiculous definition, since we could define a “stupid” total ordering that had nothing to do with any natural ordering we might want to put on the field. For example, we could define an ordering on the rationals as follows: given two rational numbers and , written in their lowest terms with and positive, say that if either or and . That is certainly a total ordering on the rationals, but it is a rather strange one. For example, with this ordering we have and also .

What has gone wrong? The answer is that it is not interesting to have two structures on a set (in this case, the algebraic structure and the order structure) unless those structures *interact*. In fact, we have already seen this in the field axioms themselves: we have addition and multiplication, and it is absolutely crucial to have some kind of relationship between them. The relation we have is the distributivity law. Without that, we would allow “stupid” examples of pairs of binary operations that had nothing to do with each other.

An *ordered field* is a field together with a total ordering that satisfies the following properties.

- For every , if , then .
- For every , if and , then .

Basically what these properties are saying is that the usual rules we use when manipulating inequalities, such as adding the same thing to both sides, apply.

In practice, we tend to use a rather larger set of rules. For example, if we know that , we will feel free to deduce that . And nobody will bat an eyelid if you have a real number and state without proof that . Both these facts can be deduced fairly easily from the properties of ordered fields, and again it is quite a good exercise to do this if you haven’t already. However, in this course we shall take the following attitude. There are the axioms for an ordered field. There are also some simple deductions from these axioms that provide us with some further rules for manipulating equations and inequalities. All of these we will treat in the same way: we just use them without comment.

Before I get on to the most important axiom, and the one that very definitely will *not* be used without comment, I want to discuss a distinction that it is important to understand: the distinction between the abstract and the concrete approaches to mathematics. The abstract approach is to concentrate on the *properties* that mathematical structures have. We are given a bunch of properties and we see what we can deduce from them, and we do that quite independently of whether any object with those properties exists. Of course, we do like to check that the properties are consistent, which we do by finding an object that satisfies them, but once we have carried out that check we go back to concentrating on the properties themselves.

The concrete approach to mathematics is much more focused on the objects themselves. We take an object, such as the set of all prime numbers, and try to describe it, prove results about it, and so on.

The boundary between the two approaches is extremely fuzzy, because we often like to convert the concrete approach into a more abstract one. For example, consider the function . This can be defined concretely as the function given by the formula . (That’s just a concise way of writing .) And a similar definition can be given for . But somewhere along the line we will want to prove basic facts such as that , or that , or that . And once we’ve proved a few of those facts, we find that we no longer want to use the formula, because everything we need to know follows from those basic facts. And that is because with just a couple more facts of the above kind, we find that we have *characterized* the trigonometric functions: that is, we have written down properties that are satisfied by and and *by no other pair of functions*. When this kind of thing happens, our approach has shifted from the concrete (we are given the formulae and want to prove things about the resulting functions) to the abstract (we are given some properties and want to use them to deduce other properties).

Something very similar happens with the real numbers. Up to now (at least until taking Numbers and Sets), you will have been used to thinking of the real numbers as infinite decimals. In other words, the real number system is just out there, an object that you look at and prove things about. But at university level one takes the abstract approach. We start with a set of properties (the properties of ordered fields, together with the least upper bound axiom) and use those to deduce everything else. It’s important to understand that this is what is going on, or else you will be confused when your lecturers spend time proving things that appear to be completely obvious, such as that the sequence converges to 0. Isn’t that obvious? Well, yes it is if you think of a real number as one of those things with a decimal expansion. But it takes quite a lot of work to prove, using just the properties of a complete ordered field, that every real number has a decimal expansion, and rather than rely on all that work it is much easier to prove directly that converges to 0.

Let be a set of real numbers. A real number is an *upper bound* for if for every . For example, if is the open interval , then is an upper bound for .

A real number is *the least upper bound* of if it has the following two properties.

- is an upper bound for .
- If , then is not an upper bound for .

Another way of writing these two properties is as follows. I’ll use quantifiers.

- .
- .

In words, everything in is less than or equal to , and for any there is some that is bigger than .
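
In symbols, the two properties can be sketched as follows. The letters are assumptions made for illustration: $A$ for the set, $x$ for the candidate least upper bound, $a$ for an element of $A$, and $y$ for a smaller real number.

```latex
\begin{itemize}
  \item $\forall a \in A \qquad a \le x$
  \item $\forall y < x \quad \exists a \in A \qquad a > y$
\end{itemize}
```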

As an example, is the least upper bound of the open interval . Why? Because if then , and if then we can find such that . (How do we do this? Well, if then take and if then take .)

The least upper bound property is the following statement: every non-empty subset of the reals that has an upper bound has a least upper bound.

But since we are thinking abstractly, we will not think of this as a *property* (of the previously given real numbers) but more as an *axiom*. To do so we can state it as follows.

Let be an ordered field. We say that has the *least upper bound property* if every non-empty subset of that has an upper bound has a least upper bound.

For reasons that will become clear only after the course has started, we say that an ordered field with the least upper bound property is *complete*. There are then two very important theorems that we shall assume.

**Theorem 1.** *There exists a complete ordered field.*

**Theorem 2.** *There is only one complete ordered field, in the sense that any two complete ordered fields are isomorphic.*

I don’t propose to give proofs of either of these results, but let me at least give some indication, for those who are interested, of how they can be proved. The proofs are not required knowledge for the course, but it’s not a bad idea to have some inkling of how they go.

One answer to this is that *the reals are a complete ordered field*! That is, if you take the good old infinite decimals that you are used to, and you say very carefully what it means to add or multiply two of them together, and you order them in the obvious way, then you can actually prove rigorously that you have a complete ordered field. It’s not very pretty (partly because of the fact that point nine recurring equals 1) but it can be done.

Here’s how one can prove the least upper bound property. For convenience let us take a non-empty set that consists of positive numbers only. Assuming that is bounded above, we would like to find a least upper bound. We can do this as follows. First, find the smallest integer that is an upper bound for . (We know that there must be an integer — just take any integer that is bigger than the upper bound we are given for . If we are defining the reals as infinite decimals, then it is genuinely obvious that such an integer exists — you just chop off everything beyond the decimal point and add 1.) Call this integer . Next, we find the smallest multiple of that is an upper bound for . This will be one of the numbers . Then you take the smallest multiple of that is an upper bound for , and so on. This gives you a sequence that might be something like . If you look at an individual digit of the numbers in this sequence, such as the fifth after the decimal point, it will eventually stabilize, and if you take these stabilized digits as the digits of a certain number, then that number will be an upper bound for and no smaller number will be. (Both these statements need to be checked, but both are reasonably straightforward.)
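
The digit-by-digit procedure just described can be tried out in a few lines. This is only a sketch under stated assumptions: the set is given indirectly through a hypothetical predicate `is_upper_bound` that says whether a rational number is an upper bound for it, and `start` is some integer already known to be an upper bound. The function name `decimal_upper_bounds` is also hypothetical.

```python
from fractions import Fraction


def decimal_upper_bounds(is_upper_bound, start, digits):
    """Return the smallest multiples of 1, 1/10, ..., 10**-digits that are upper bounds.

    is_upper_bound(u) must return True exactly when the rational u is an
    upper bound for the (non-empty) set; start is an integer upper bound.
    """
    u = Fraction(start)
    bounds = []
    for k in range(digits + 1):
        step = Fraction(1, 10 ** k)
        # Walk down in steps of 10**-k for as long as we stay an upper bound.
        while is_upper_bound(u - step):
            u -= step
        bounds.append(u)
    return bounds


# Example: the set of positive rationals whose square is less than 2.
def bounds_sqrt2_set(u):
    return u > 0 and u * u >= 2


approximations = decimal_upper_bounds(bounds_sqrt2_set, start=5, digits=4)
```

Here `approximations` is a decreasing sequence of rationals whose decimal digits stabilize one place at a time, which is the behaviour the argument relies on.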

A more elegant way to prove the existence of a complete ordered field is to use objects called *Dedekind cuts*. A Dedekind cut is a partition of the rational numbers into two non-empty subsets and such that every element of is less than every element of , and such that does not have a minimal element.

To see why this might be a reasonably sensible definition, consider the sets and , where consists of all rationals such that either or , and consists of all positive rationals such that . This is the Dedekind cut that corresponds to our ordinary conception of the number .

The condition that should not have a minimal element is to make sure that we don’t have two different Dedekind cuts representing each rational number. (If the rational number is , the partition we are ruling out is and . We just allow the partition and .)

If and are two Dedekind cuts, we can define their sum to be , where is defined to be the set of all numbers such that and , and similarly for . It’s a bit harder to define products — you may like to try it. It’s not so hard to define a sensible total ordering on the set of all Dedekind cuts. And then there’s a lot of checking needed to prove that what results is a complete ordered field. (I may as well admit at this point that I’ve never bothered to check this for myself, or to read a proof in a book. I’m happy to know that it can be done, just as I’m happy to fly in an aeroplane without checking that the lift will be enough to keep me in the sky.)

Here’s one answer. You just go back to your notes in Numbers and Sets and look at the proof that every real number has a decimal expansion. Obviously if you define real numbers to be things with decimal expansions, then this is saying nothing at all, but that’s not what Professor Leader did. He deduced the existence of decimal expansions from the properties of complete ordered fields. So effectively he proved the following result: *every element of a complete ordered field has a decimal expansion*. We can say slightly more: it has a decimal expansion that does not end with an infinite sequence of 9s. Oh, and two different elements have different decimal expansions. So now if you want an isomorphism between two complete ordered fields, you just match up an element of one with the element of the other that has the same decimal expansion.

Let me very briefly sketch a neater approach. You first match up 1 with 1. (That is, you match up the multiplicative identity with the multiplicative identity.) Then you match up 1+1 with 1+1, and so on, until you have “the positive integers” inside your two complete ordered fields matched together. Then you match up 0 with 0 and the additive inverses of the positive integers with the additive inverses of the positive integers. Then you match up the reciprocals of the positive integers (or rather, their multiplicative inverses) with the reciprocals of the positive integers, and finally all the rationals with all the rationals. What I’m saying here is that in any complete ordered field you can make sense in only one reasonable way of the fraction when and are integers with , and you send each in one complete ordered field to its counterpart in the other.

Now let’s take *any* element of a complete ordered field. We can associate with the set of all “rationals” less than and map that set over to the other complete ordered field, using our correspondence between rationals. That gives us a set in the other complete ordered field. The least upper bound of is then the element that corresponds to .
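
As a formula, the correspondence can be sketched like this. The notation is assumed for illustration: $F$ and $G$ are the two complete ordered fields, $\mathbb{Q}_F$ is the copy of the rationals inside $F$, and $\phi$ is the map, already defined on $\mathbb{Q}_F$ by the matching described above.

```latex
\[
\phi(x) \;=\; \sup \{\, \phi(q) \;:\; q \in \mathbb{Q}_F,\ q < x \,\}
\qquad \text{for every } x \in F,
\]
```

where the supremum is taken in $G$.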

As ever, there is work needed if you want to turn the above idea into a complete proof: if the map you’ve defined is , then you need to check things like that or that if a set has least upper bound , then has least upper bound . But all that can be done.

If you found what I’ve just written a bit intimidating, let me remind you that all you need to take away from it is that everything in this course will be deduced from the familiar algebraic and order properties of the reals, together with the least upper bound property. Since the algebraic and order properties should be very familiar to you, that means that the main things you need to learn are the definition of a least upper bound and the statement of the least upper bound property. The details matter, so a vague idea is not enough, but even so it’s not very much to learn.


When I got to my office, those other things I’ve been thinking about (the project with Mohan Ganesalingam on theorem proving) commanded my attention and the post didn’t get written. And then in the evening, with impeccable timing, Pavel Pudlak sent me an email with an observation that shows that one of the statements that I was hoping was false is in fact true: every subset of can be Ramsey lifted to a very simple subset of a not much larger set. (If you have forgotten these definitions, or never read them in the first place, I’ll recap them in a moment.)

How much of a disaster is this? Well, it’s *never* a disaster to learn that a statement you wanted to go one way in fact goes the other way. It may be disappointing, but it’s much better to know the truth than to waste time chasing a fantasy. Also, there can be far more to it than that. The effect of discovering that your hopes are dashed is often that you readjust your hopes. If you had a subgoal that you now realize is unachievable, but you still believe that the main goal might be achievable, then your options have been narrowed down in a potentially useful way.

Is that the case here? I’ll offer a few preliminary thoughts on that question and see whether they lead to an interesting discussion. If they don’t, that’s fine — my general attitude is that I’m happy to think about all this on my own, but that I’d be even happier to discuss it with other people. The subtitle of this post is supposed to reflect the fact that I have gained something from making my ideas public, in that Pavel’s observation, though simple enough to understand, is one that I might have taken a long, or even infinite, time to make if I had worked entirely privately. So he has potentially saved me a lot of time, and that is one of the main points of mathematics done in the open.

The basic idea I was pursuing was that perhaps we can find a property that distinguishes between subsets of (or Boolean functions) of low Boolean complexity and general subsets/functions of the following kind: a low-complexity set/function can be “lifted” from to a larger, but not too much larger, structure inside which it sits more simply. This basic idea was inspired by Martin’s proof that Borel sets are determined. After considering various possible ways of making the above ideas precise, and rejecting some of them when I realized that they couldn’t work, I arrived at the following set of definitions.

An *-dimensional complexity structure with alphabet* is a subset . (Sometimes it is convenient to define it as a subset of , in which case the definitions have to be modified slightly.) If and are two -dimensional complexity structures, then I call a function a *map* if for every , depends only on . Equivalently, is of the form .

A *basic -set* in a complexity structure is a subset of the form for some . A *basic set* is any set that is a basic -set for some . Note that if is a map and is a basic set in , then is a basic set in .

The *circuit complexity* or *straight-line complexity* of a subset of a complexity structure is the minimal for which there exists a sequence of subsets of such that every is a basic set or a union or intersection of two sets earlier in the sequence, and . If is a map and , then the circuit complexity of is at most the circuit complexity of , since preserves basic sets and Boolean operations.

I often, and perhaps slightly confusingly, describe a map as a *lift* of . That’s because it’s really and its effect on subsets of that I am interested in.

Let be a complexity structure. A *coordinate specification* is a statement of the form for some and .

Let us assume that is even and let . Then the *shrinking-neighbourhoods game* is a two-player game played according to the following rules.

- Player I starts, and the players alternately make coordinate specifications.
- Player I’s specifications must be of coordinates with , and Player II’s must be of coordinates with .
- No coordinate may be specified more than once.
- At every stage of the game, there must exist a sequence that obeys all the specifications made so far.

A subset of is *I-winning* if Player I has a winning strategy for ensuring that after all coordinates have been specified, the sequence that satisfies those specifications (which is obviously unique) belongs to . It is *II-winning* if Player II has a winning strategy for ensuring that the final sequence belongs to .

Since finite games are determined, if is any subset of , then either is I-winning or is II-winning.
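
For very small examples one can decide by brute force which player wins. The sketch below is only an illustration under stated assumptions: coordinates are numbered from 0, Player I owns the first half of the coordinates and Player II the second half, and the names `is_I_winning`, `X` (the complexity structure, as a set of tuples) and `A` (the payoff set) are hypothetical.

```python
from functools import lru_cache
from itertools import product


def is_I_winning(X, A, n):
    """Return True if Player I can force the final sequence to lie in A.

    X is a non-empty set of length-n tuples (no entry may be None),
    A is a subset of X, and n is even.
    """
    X = frozenset(X)
    A = frozenset(A)
    m = n // 2

    @lru_cache(maxsize=None)
    def player_I_can_force(spec, player):
        # spec is a tuple with None marking coordinates not yet specified.
        if None not in spec:
            return spec in A
        # Sequences in X that obey all specifications made so far.
        consistent = [
            x for x in X
            if all(s is None or s == v for s, v in zip(spec, x))
        ]
        own = range(m) if player == 1 else range(m, n)
        results = []
        for i in own:
            if spec[i] is not None:
                continue  # no coordinate may be specified twice
            for v in {x[i] for x in consistent}:
                nxt = spec[:i] + (v,) + spec[i + 1:]
                results.append(player_I_can_force(nxt, 3 - player))
        # Player I needs one good move; Player II must fail with every move.
        return any(results) if player == 1 else all(results)

    return player_I_can_force((None,) * n, 1)


cube2 = set(product((0, 1), repeat=2))
cube4 = set(product((0, 1), repeat=4))
```

By construction, whenever `is_I_winning(X, A, n)` is false, Player II can force the final sequence into the complement of `A`, which is the determinacy statement in this finite setting. The search is exponential, so it is only usable for tiny `n`.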

This can be thought of as a kind of Ramsey property, something that I mention only to explain what would otherwise be a rather strange piece of terminology. I say that a map between complexity structures is *Ramsey* if for every I-winning subset of , is a I-winning subset of , and for every II-winning subset of , is a II-winning subset of . In other words, Ramsey maps preserve winning sets and the player that wins.

It is an easy exercise to show that is Ramsey if and only if for every subset , if is I-winning then is I-winning and if is II-winning then is II-winning. (This isn’t quite a triviality, however: it uses finite determinacy.) This formulation is often more convenient.

I don’t want to be too precise about this, because part of what I hoped was that the correct statement would to some extent emerge from the proof. But roughly what I wanted was the following.

1. If is a set with low circuit complexity, then there is a complexity structure that is not too large, and a Ramsey map , such that is simple.
2. If is a random set then no such pair exists.
3. There is an NP set for which no such pair exists.

Achieving 1 and 2 together would give a non-trivial example of a property that distinguishes between sets of low circuit complexity and random sets, which is a highly desirable thing to do, given the difficulties associated with the natural-proofs barrier, even if it doesn’t immediately solve the P versus NP problem. And achieving 1 and 3 together would show that P doesn’t equal NP.

However, it was far from clear whether these statements were true under any reasonable interpretation. Perhaps even sets of low circuit complexity require enormous sets , or perhaps there is some simple way of lifting arbitrary sets with only a small . Either of these possibilities would show that the existence of efficient Ramsey lifts does not distinguish between sets of low circuit complexity and arbitrary sets. What Pavel sent me yesterday was an observation that basically shows that the second difficulty occurs. That is, he showed that one can lift an arbitrary set quite simply.

Before I present his example, I’ll just briefly mention that I had a philosophical reason for thinking that such an example was unlikely to exist, which was that any truly simple example ought to have an infinite counterpart, but in the infinite case it is not true that arbitrary sets can be efficiently lifted. I’ll try to give some sort of indication later of why this argument does not apply to Pavel’s example.

I’ll begin by describing the example in an informal way and then I’ll make it more formal. (Pavel provided both descriptions in his message to me, so I’m not adding anything here.)

Let be any set and define an auxiliary game played on as follows. It’s just like the shrinking-neighbourhoods game, except that at some point each player must declare a bit, and the parity of the two bits they declare must be odd if the final sequence belongs to and even if it doesn’t. (So they must both play consistently with this restriction.)

Suppose that Player I has a winning strategy for the original game for some set . Then she can win the auxiliary game with payoff set as follows. Let as usual. For her first moves, she simply plays her winning strategy for the original game (ignoring the extra bit that Player II declares if he declares it). Then for her last move, she continues to play the winning strategy, but she also declares her extra bit. If Player II has declared his bit, then she looks at the two possible sequences that can result after Player II’s final move. If they are both in or both in , then she makes sure that the parity of the two bits is odd in the first case and even in the second. If one sequence is in and the other in then it does not matter what she chooses for her extra bit. If Player II has not declared his bit, then she can play her extra bit arbitrarily, which will oblige Player II to ensure that the parity of the two bits is equal to 1 if the final sequence is in and 0 otherwise.

Now suppose that Player II has a winning strategy for in the original game. In this case the proof is even simpler. He just plays this strategy, ignoring Player I’s extra bit when she plays it, and declares his extra bit right at the end, making sure that the final parity of the two bits correctly reflects whether the sequence is in .

Finally, note that to tell whether the eventual sequence in the auxiliary game belongs to , it is only necessary to look at the two extra bits. So whether or not a point belongs to can be determined by just two coordinates of that point (though which coordinates they are can vary from point to point). That makes a very simple set (I call it 2-open, since it is a union of “2-basic open” sets), even though the “board” on which the auxiliary game is played is not very large.

Now let me give a precise definition of the complexity structure . It consists of all sequences with the following properties.

- For exactly one , . In this case we will write .
- For exactly one with , . In this case we will write .
- For all other we have and write .
- if and 0 otherwise.

So there are six possibilities for each coordinate (since it can be an arbitrary element of ). Thus, we can regard as a subset of , which is not that much bigger than .

The map does the obvious thing and takes to . It is then easy to see that the shrinking-neighbourhoods game in with payoff set is basically the same as the auxiliary game I described earlier.

It may be, but I think it would be a mistake to abandon the project immediately without thinking fairly hard about what has gone wrong so far. Is it a sign that nothing even remotely like this idea could work, or is it a sign that the problems are more “local” and that certain definitions should be adjusted? In the latter case, what might a new set of definitions look like?

I’ll try to explain in a future post why I think that it is worth exploring the general strategy of attempting to show that sets of low circuit complexity can be lifted (in some sense yet to be determined) to simple sets (also in some sense yet to be determined). For now, I’d just like to make the general point that there are many aspects of the definitions above that could be changed. For the moment, I still like the definition of a complexity structure, because when I came up with it I felt myself “forced” to it. (It would take a bit of time to remember why this was, however.) I also quite like the idea that the maps we want to consider are ones that preserve some class of sets, since that gives quite a bit of flexibility. We need the class of sets to be fairly complicated, since otherwise there is a danger that verifying that the sets are preserved becomes too easy, which could then mean that the property “can be efficiently lifted” becomes too simple and is ruled out by known complexity barriers. (I’m thinking here not just of the natural-proofs barrier but also of an interesting extension of it due to Rudich.)

Looking for a class of sets might seem a hopelessly complicated task, but there are several constraints on what the class of sets can be like for the proof to work. One important one is that it should be definable in any complexity structure. So it needs to be defined in a way that isn’t too specific to . It might be worth making precise what this restriction actually means.

The rough idea here is that in Pavel’s example it is possible to provide the extra information (that is, the extra bits in the auxiliary game) right at the end of the game. In an infinite game there is no such thing as “right at the end of the game”: whenever you play, you’re still very near the beginning. This difference has caused me difficulties in the past, and I think it is worth focusing on again. Is there some natural way of ruling out this postponing of the extra information?

One crude idea is to rule it out by … ruling it out. For example, we could define a set to be -winning for Player I/II if there is a winning strategy for Player I/II such that after her/his first moves the outcome of the game is already decided. There is probably some serious drawback with such a simple-minded approach, but it is worth finding that drawback. I have given very little thought to it, so there may be something very obviously bad about it. One small point is that if, as I think is likely to be necessary, a proof that low-complexity sets can be lifted is inductive in nature, then we will want a composition of simplifying lifts to be a simplifying lift. So we would want our lifts to be such that -winning sets lift to -winning sets for the same player (and not just that winning sets lift to winning sets). So we would preserve the -winning sets we’ve already created, and attempt to create some new ones.

I think that one of the reasons Polymath9 hasn’t taken off is that I presented too much material all at once. (I did try to make it clear that it wasn’t necessary to wade through it all, but even so I can see that it might have been off-putting.) In an effort to avoid that mistake this time, I’m going to resist the temptation to think further about how to respond to Pavel’s lift and go ahead and put up this post. If I do have further ideas, I’ll post them as comments.

If you are reasonably comfortable with the kind of basic logic needed in an undergraduate course, then you may enjoy trying to find the flaw in the following argument, which must have a flaw, since I’m going to prove a general statement and then give a counterexample to it. If you find the exercise extremely easy, then you may prefer to hold back so that others who find it harder will have a chance to think about it. Or perhaps I should just say that if you don’t find it easy, then I think it would be a good exercise to think about it for a while before looking at other people’s suggested solutions.

First up is the general statement. In fact, it’s a very general statement. Suppose you are trying to prove a statement and you have a hypothesis to work with. In other words, you are trying to prove the statement

Now if and are two statements, then is true if and only if either is false or is true. Hence what we are trying to prove can be rewritten as follows.

Now we can bring the inside the as long as we convert the into , so let’s do that. What we want to prove becomes this.

I’ll assume here that we haven’t done something foolish and given the name to one of the variables involved in the statement . So now I’m going to use the general rule that is equivalent to to rewrite what we want to prove as the following.

Finally, let’s rewrite what’s inside the brackets using the sign.

Every single step I took there was a logical equivalence, so the conclusion is that if you want to show that implies , your task is the same as that of finding a single such that .
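For readers of a copy of this post in which the displayed formulas have not survived, here is a reconstruction of the chain of rewritings, pieced together from the prose above. The names $P$, $Q$ and $x$ are assumptions on my part (the text only tells us that the bound variable does not occur in the statement being proved), so treat this as a sketch of the argument rather than the original displays.

```latex
\begin{align*}
  &(\forall x\ P(x)) \implies Q              && \text{what we are trying to prove}\\
  &\neg(\forall x\ P(x)) \vee Q              && \text{rewriting the implication}\\
  &(\exists x\ \neg P(x)) \vee Q             && \text{bringing } \neg \text{ inside the quantifier}\\
  &\exists x\ (\neg P(x) \vee Q)             && \text{since } x \text{ does not occur in } Q\\
  &\exists x\ (P(x) \implies Q)              && \text{rewriting with } \implies \text{ again}
\end{align*}
```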

Now let me give a counterexample to that useful logical principle. Let be a set of real numbers. Define the *diameter* of to be . I’ll write it .

Consider the following implication.

That is clearly correct: if every element of has modulus at most 1, then is contained in the interval , so clearly can’t have diameter greater than 2.

But then, by the logical principle just derived, there must be a single element of such that if *that* element has modulus at most 1, then the diameter of is at most 2. In other words,

But that is clearly nonsense. If all we know is that one particular element of has modulus at most 1, it can’t possibly imply that has diameter at most 2.

What has gone wrong here? If you can give a satisfactory answer, then you will have a good grasp of what mathematicians mean by “implies”.

I’ve thought a little about what phrase to attach to the project (the equivalent of “density Hales-Jewett” or “Erdős discrepancy problem”). I don’t want to call it “P versus NP” because that is misleading: the project I have in mind is much more specific than that. It is to assess whether there is any possibility of proving complexity lower bounds by drawing inspiration from Martin’s proof of Borel determinacy. Only if the answer turned out to be yes, which for various reasons seems unlikely at the moment, would it be reasonable to think of this as a genuine attack on the P versus NP problem. So the phrase I’ve gone for is “discretized Borel determinacy”. That’s what DBD stands for above. It’s not a perfect description, but it will do.

For the rest of this post, I want to set out once again what the approach is, and then I want to explain where I am running into difficulties. I’m doing that to try to expose the soft underbelly of my proof attempt, in order to make it as easy as possible for somebody else to stick the knife in. (One could think of this as a kind of Popperian method of assessing the plausibility of the approach.) Another thing I’ll try to do is ask a number of precise questions that ought not to be impossible to solve and that can be thought about in isolation. Answers to any of these questions would, I think, be very helpful, either in demolishing the approach or in advancing it.

This section is copied from my previous post.

I define a *complexity structure* to be a subset of a set . I call the union of the the *alphabet* associated with the structure. Often I consider the case where . The maps between complexity structures that I consider (if you like, you can call them the morphisms in my category) are maps such that for each , the coordinate depends only on . To put that another way, if is another complexity structure, the maps I consider are ones of the form . I have found it inconvenient not having a name for these, but I can’t think of a good one. So I hereby declare that when I use the word “map” to talk about a function between complexity structures, I shall *always* mean a map with this property.

I call a subset of a complexity structure *basic* if it is of the form for some and some . The motivation for the restriction on the maps is that I want the inverse image of a basic set to be basic.

The non-trivial basic sets in the complexity structure are the coordinate hyperplanes and . The circuit complexity of a subset of measures how easily it can be built up from basic sets using intersections and unions. The definition carries over almost unchanged to an arbitrary complexity structure, and the property of maps ensures that the inverse image of a set of circuit complexity has circuit complexity at most .

Given a complexity structure , we can define a game that I call the *shrinking-neighbourhoods game*. For convenience let us take to be for some positive integer . Then the players take turns specifying coordinates: that is, they make declarations of the form . The only rules governing these specifications are the following.

- Player I must specify coordinates from to .
- Player II must specify coordinates from to .
- At every stage of the game, there must be at least one that satisfies all the specifications so far (so that the game can continue until all coordinates are specified).

Note that I do not insist that the coordinates are specified in any particular order: just that Player I’s specifications concern the first half and Player II’s the second.

To determine who wins the game, we need a *payoff set*, which is simply a subset . Player I wins if the sequence that the two players have specified belongs to , and otherwise Player II wins. I call a set *I-winning* if Player I has a winning strategy for getting into and *II-winning* if Player II has a winning strategy for getting into . (Just in case there is any confusion here, I really do mean that is II-winning if Player II has a winning strategy for getting into . I didn’t mean to write .)

Because the game is finite, it is determined. Therefore, we have the following Ramseyish statement: given any 2-colouring of a complexity structure , either the red set is I-winning or the blue set is II-winning. (Normally with a Ramsey statement one talks about *containing* a structure of a certain kind. If we wanted to, we could do that here by looking at minimal I-winning and minimal II-winning sets.)
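For very small examples the game can simply be solved by brute force, which may help in getting a feel for the definitions. The following is a minimal sketch, not anything I have used in earnest. It assumes that the players alternate strictly with Player I moving first, that coordinates are indexed from 0 (so Player I owns coordinates 0 to n-1 and Player II owns n to 2n-1), and that the structure is nonempty. The function names `i_wins` and `ii_wins` are made up for this illustration. II-winning is computed via determinacy: Player II can get into a set exactly when Player I cannot get into its complement.

```python
from functools import lru_cache
from itertools import product


def i_wins(X, A, n):
    """Return True if Player I can force the final sequence into A.

    X is a nonempty set of tuples of length 2n (the complexity structure)
    and A is a subset of X (the payoff set). Assumption: Player I moves
    first and the players then alternate.
    """
    X, A = frozenset(X), frozenset(A)
    alphabet = sorted({v for x in X for v in x})

    def consistent(spec):
        # Some point of X must agree with every coordinate specified so far.
        return any(all(s is None or s == v for s, v in zip(spec, x)) for x in X)

    @lru_cache(maxsize=None)
    def win(spec, player_one_to_move):
        if None not in spec:
            # All coordinates specified: the sequence is a point of X.
            return spec in A
        coords = range(n) if player_one_to_move else range(n, 2 * n)
        outcomes = []
        for i in coords:
            if spec[i] is not None:
                continue
            for a in alphabet:
                new_spec = spec[:i] + (a,) + spec[i + 1:]
                if consistent(new_spec):
                    outcomes.append(win(new_spec, not player_one_to_move))
        # Player I needs one good move; Player II must be unable to escape.
        return any(outcomes) if player_one_to_move else all(outcomes)

    return win((None,) * (2 * n), True)


def ii_wins(X, A, n):
    """Player II can get into A iff Player I cannot get into X minus A."""
    return not i_wins(X, set(X) - set(A), n)


cube = set(product((0, 1), repeat=4))  # the case n = 2
first_is_one = {x for x in cube if x[0] == 1}
odd_parity = {x for x in cube if sum(x) % 2 == 1}

print(i_wins(cube, first_is_one, 2))  # Player I just specifies coordinate 0
print(i_wins(cube, odd_parity, 2))    # Player II moves last and fixes the parity
print(ii_wins(cube, odd_parity, 2))
```

The search is exponential in the number of coordinates, so this is only useful for toy cases, but it does make it easy to experiment with structures that are proper subsets of the cube.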

Given a complexity structure , I define a *lift* of to be a complexity structure together with a map that satisfies the condition set out earlier. I define a lift to be *Ramsey* if is a winning subset of whenever is a winning subset of , and moreover it is winning for the same player. A more accurate name would be “winning-set preserving”, but I think of “Ramsey” as an abbreviation for that.

This gives us a potential method for showing that a subset is I-winning: we can find a Ramsey lift such that is simple enough for it to be easy to show that it is a I-winning subset of . Then the Ramsey property guarantees that , and hence , is I-winning in .

The definition of a Ramsey lift is closely modelled on Martin’s definition of a lift from one game to another.

Suppose that we have a suitable definition of “simple”. Then I would like to prove the following.

- If a set has polynomial circuit complexity, then there exists a Ramsey lift of with such that is simple and the cardinality of is much less than doubly exponential.
- If is a random subset of , then with high probability the smallest Ramsey lift that makes simple has an alphabet of doubly exponential size.
- There exists an NP set such that the smallest Ramsey lift that makes simple has an alphabet of doubly exponential size.

Obviously, the first and third statements combined would show that P ≠ NP. For the time being, I would be delighted even with just the first of these three statements, since that would give an example of a property of functions that follows non-trivially from low circuit complexity. (That’s not guaranteed, since there might conceivably be a very simple way of constructing lifts from circuits. However, I think that is unlikely.)

Having the first and second statements would be a whole lot better than just having the first, since then we would have not just a property that follows non-trivially from low circuit complexity, but a property that distinguishes between functions of low circuit complexity and random functions. Even if we could not then go on to show that it distinguished between functions of low circuit complexity and some function in NP, we would at least have got round the natural-proofs barrier, which, given how hard that seems to be to do, would be worth doing for its own sake. (Again this is not quite guaranteed, since again one needs to be confident that the distinguishing property is interestingly different from the property of having low circuit complexity.)

As I said in my previous post, I think there are three reasons that, when combined, justify thinking about this potential distinguishing property, despite the small probability that it will work. The first is of course that the P versus NP problem is important and difficult enough that it is worth pursuing any approach that you don’t yet know to be hopeless. The second is that the property didn’t just come out of nowhere: it came from thinking about a possible analogy with an infinitary result (that in some rather strange sense it is harder to prove determinacy of analytic sets than it is to prove determinacy of Borel sets). And finally, the property appears not to be even close to a natural property in the Razborov-Rudich sense: for one thing it quantifies over all possible complexity structures that are not too much bigger than , and then it demands that the maps should preserve the I-winning and II-winning properties.

It is conceivable that the property might turn out to be natural after all. For instance, maybe the property of preserving I-winning and II-winning sets is so hard to achieve (I have certainly found it hard to come up with examples) that all possible Ramsey lifts are of some very special type, and perhaps that makes checking whether there is a Ramsey lift that simplifies a given set possible with a polynomial-time algorithm (as always, polynomial in ). But I think I can at least say that if the above property is natural, then that is an interesting and surprising theorem rather than just a simple observation.

Let be a straight-line computation of a set . That is, each is either a *coordinate hyperplane* (a set of the form for some and some ), or the intersection or union of two earlier sets in the sequence, and . We would like to find a complexity structure with not too large, together with a map that has the properties required of a Ramsey lift, such that is simple. Since a composition of Ramsey lifts is a Ramsey lift, and since taking inverse images (under the kinds of maps we are talking about) preserves simple sets, whatever definition of “simple” we are likely to take, as well as preserving all Boolean operations, a natural approach is an inductive one. The inductive hypothesis is that we have found a Ramsey lift such that the sets are simple for every . We now look at . By the inductive hypothesis, this is a union or intersection of two simple sets, so we now look for a Ramsey lift such that is simple. Setting , we then have a Ramsey lift such that is simple for every .

Thus, if we can find a very efficient Ramsey lift that turns a given intersection or union of two simple sets into a simple set, then we will be done. “Very efficient” means efficient enough that repeating the process times (where is polynomial in — though even superlinear in would be interesting) does not result in an alphabet of doubly exponential size. Note that if our definition of “simple” is such that the complement of a simple set is simple, then it is enough to prove this just for intersections or just for unions.

What might we take as our definition of “simple”? The idea I had that ran into trouble was the following. I defined “simple” to be “basic”. I then tried to find a very efficient lift — I was hoping to multiply the size of the alphabet by a constant — that would take the intersection of two basic sets to a basic set.

Let us very temporarily define a basic set to be -*basic* if it is defined by means of a restriction of the th coordinate. That is, it is of the form . (I want this definition to be temporary because most of the time I prefer to use “-basic” to refer to an intersection of at most basic sets.) If is -basic and is -basic, then it is natural to expect that if we can lift to a basic set, that basic set should be either -basic or -basic. Furthermore, by symmetry we ought to be able to choose whether we want it to be -basic or -basic. But then if we let be the 1-basic set and let be any other basic set, that tells us that we can lift so that it becomes a 1-basic set.

Now let us apply that to the coordinate hyperplanes in . If we can lift these very efficiently one by one until they all become 1-basic sets, then we have a complexity structure with a small alphabet and a map such that is 1-basic for every coordinate hyperplane . But applying Boolean operations to 1-basic sets yields 1-basic sets, and every subset of is a Boolean combination of coordinate hyperplanes. Therefore, *every* subset of has become a 1-basic set!

This is highly undesirable, because it means that we have shown that the property “Can be made simple by means of an efficient Ramsey lift” does not distinguish functions of low circuit complexity from arbitrary functions.

Because of that undesirability, I have not tried as hard as I might have to find such a lift. An initial attempt can be found in this tiddler. Note that the argument I have just given does not show that there cannot be a Ramsey lift that turns an -basic set into a 1-basic set at the cost of multiplying the size of the alphabet by a constant. What I have shown is that *if* this could be done, then there would be a Ramsey lift that converted all sets simultaneously into 1-basic sets, with an alphabet of size at most . If that were the case, then I think the approach would be completely dead. (Correction: the approach if the sets to be preserved are I-winning and II-winning sets would almost certainly be dead, and I don’t have any reason to think that if one tried to preserve other classes of sets, then the situation would be any different.) So that is one possible way to kill it off.

**Problem 1.** Let be a complexity structure and let be a basic subset of . Must there exist a complexity structure and a Ramsey lift such that is 1-basic and ?

In fact, if all one wants to do is disprove the statement that for a random set there is a doubly exponential lower bound, it is enough to obtain a bound here of the form .

The above observation tells us that we are in trouble if we have a definition of “simple” such that simple sets are closed under unions and intersections. More generally, we have a problem if we can modify our existing definition so that it becomes closed under unions and intersections. (What I have in mind when I write this is the example of basic sets. Those are not closed under intersections and unions, but if one could prove that every intersection of two basic sets can be lifted to a basic set, then, as I argued above, one could probably strengthen that result and show that every intersection of two basic sets can be lifted to a 1-basic set. And the 1-basic sets *are* closed under intersections and unions.)

Before I go on to discuss what other definitions of “simple” one might try, I want to discuss a second difficulty, because it gives rise to another statement that, if true, would deal a serious blow to this approach.

In the previous post, I gave an example of a lift that provides us with what I think of as the “trivial upper bound”: a Ramsey lift that turns every single subset of into an -basic set, with an alphabet of doubly exponential size. So if we want an inductive argument of the kind I have discussed above, we will need to show that an intersection or union of two simple sets can be lifted to a simple set with the size of the alphabet increasing in such a way that if one iterates that increase polynomially many times, the resulting size will be less than doubly exponential. (Actually, that isn’t quite necessary: maybe we could establish a lower bound of for a function in NP and an upper bound of for functions of circuit complexity , where .) This makes it highly problematic if we want to do anything that *squares* the size of the alphabet after only polynomially many steps. If we do that, then the size of the alphabet after times that polynomial number of steps, which is of course still a polynomial number of steps, will be at least and we will have proved nothing.

The reason this is troubling is that even if I forget all about simplifying any set , I find it very hard to come up with examples of Ramsey lifts. (All I mean by a Ramsey lift of is a complexity structure and a map that takes I-winning sets to I-winning sets and II-winning sets to II-winning sets.) The only ones I know about can be found on this tiddler here. And they all have the property that the players have to provide “extra information” of a kind that at the very least squares the size of the alphabet. In fact, it is usually quite a lot worse than that.

Maybe I can try to be slightly more precise about what I mean there. All the lifts I have considered (and I don’t think this is much of a restriction) take the form of sets where a typical sequence in is of the form and the map takes that sequence to . If , then What makes it interesting is that we do not take *all* sequences of the above form (that is, for arbitrary and arbitrary ). Rather, we take only *some* of those sequences. (It is that that makes it possible to simplify sets. Otherwise, there would be nothing interesting about lifts.) So if Player I makes an opening move , we can think of this as a move in the original game together with a binding obligation on the two players that the eventual sequence will have at least one preimage such that . The set of all such sequences is a set that may well be a proper subset of .

Suppose now that this extra information is enough to determine some other coordinate . Then unless there are already very few options for how to choose , the number of possibilities for will be comparable in size to the size of the alphabet, and therefore the size of the alphabet is in serious danger of squaring, and certainly of raising itself to the power 3/2, say. And that is, as I have just pointed out, much too big an increase to iterate superlinearly many times.

So it looks as though any “extra information” we declare has to be rather crude, in the sense that it does not cut down too drastically the set in which the game is played. But I have no example of a Ramsey lift with this property. What’s more, the kind of difficulty I run into makes me worry that such a lift may not exist. If it doesn’t, then that too will be a serious blow to the approach.

Let me ask a concrete problem, the answer to which would I think be very useful. It is a considerable weakening of Problem 1.

**Problem 2.** Let be a complexity structure. Does there necessarily exist a non-trivial Ramsey lift with and bounded above by a function of ?

The main concern is that should *not* depend on .

I have not sorted out completely what “non-trivial” means here, but let me give a class of examples that I consider trivial. Let be a large enough set and let be a surjection. Define a map by . Finally, let . Then we can think of as a map from to . Note that is in some sense just like : it’s just that the coordinates of may have been repeated.

I claim that this is a Ramsey lift. Indeed, suppose that is a I-winning subset of . Then a winning strategy for Player I for is simply to project the game so far to , play a winning strategy in , and choose arbitrarily how to lift each specification of a coordinate of to a specification of the corresponding coordinate of .

To put that more formally, if the specifications so far are for and it is Player I’s turn, then she works out the specification she would make in in response to the specifications for . If this specification is , then she picks an arbitrary preimage of and makes the specification .

A similar argument works for winning sets for Player II.

It is the fact that this can always be done that makes the lift in some sense “trivial”. Another way of thinking about it is that there is an equivalence relation on such that replacing a point by an equivalent point makes no difference.
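To see the claim in action on a toy case, here is a brute-force check. It is a sketch under several assumptions of mine: I read the trivial lift as being induced by a surjection of alphabets applied coordinate by coordinate (here `b % 2` from a four-letter alphabet onto two letters), I take the smallest case of one coordinate per player, and I test the Ramsey property in its inverse-image form, pulling each winning subset of the base structure back along the map. The helper names (`i_wins`, `ii_wins`, `is_ramsey`) are hypothetical, and Player I is assumed to move first. For contrast, the second example is a map I made up that fails the test.

```python
from functools import lru_cache
from itertools import chain, combinations, product


def i_wins(X, A, n):
    """True if Player I (moving first, owning coordinates 0..n-1) can get into A."""
    X, A = frozenset(X), frozenset(A)
    alphabet = sorted({v for x in X for v in x})

    def consistent(spec):
        return any(all(s is None or s == v for s, v in zip(spec, x)) for x in X)

    @lru_cache(maxsize=None)
    def win(spec, player_one_to_move):
        if None not in spec:
            return spec in A
        coords = range(n) if player_one_to_move else range(n, 2 * n)
        outcomes = [
            win(spec[:i] + (a,) + spec[i + 1:], not player_one_to_move)
            for i in coords if spec[i] is None
            for a in alphabet
            if consistent(spec[:i] + (a,) + spec[i + 1:])
        ]
        return any(outcomes) if player_one_to_move else all(outcomes)

    return win((None,) * (2 * n), True)


def ii_wins(X, A, n):
    """By determinacy: Player II gets into A iff Player I cannot get into X minus A."""
    return not i_wins(X, set(X) - set(A), n)


def subsets(S):
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))


def is_ramsey(Y, pi, X, n):
    """Check that inverse images of winning subsets of X are winning in Y
    for the same player (exhaustive, so only feasible for tiny X)."""
    for A in map(set, subsets(X)):
        B = {y for y in Y if pi(y) in A}  # inverse image of A
        if i_wins(X, A, n) and not i_wins(Y, B, n):
            return False
        if ii_wins(X, A, n) and not ii_wins(Y, B, n):
            return False
    return True


X = set(product((0, 1), repeat=2))

# Assumed form of the trivial lift: each letter of {0,1} gets two copies.
Y_trivial = set(product(range(4), repeat=2))
print(is_ramsey(Y_trivial, lambda y: tuple(b % 2 for b in y), X, 1))

# A made-up non-example: restrict to the diagonal and include it in X.
Y_diagonal = {(0, 0), (1, 1)}
print(is_ramsey(Y_diagonal, lambda y: y, X, 1))
```

The first check succeeds for exactly the reason given above: any move in the base game can be lifted by choosing an arbitrary preimage of the specified letter. The second fails because Player II can get into the odd-parity set in the base game, but its inverse image on the diagonal is empty.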

As far as I can tell at this stage, the problem is interesting if one takes “non-trivial” to mean not of the form I have just described. However, I reserve the right to respond to other examples by enlarging this definition of triviality. The real test of non-triviality is that an interesting Ramsey lift is one that has the potential to simplify sets.

A positive answer to the problem above will not help us if is an enormously large function of . However, for now my main concern is to decide whether it is possible to obtain a bound independent of . If it is, then a major source of worry is removed. If it is not, then the approach will be in serious trouble.

I stopped writing for a few hours after that last paragraph, and during those few hours I realized that my definition of non-triviality was not wide enough. Before I explain why not, I want to discuss a worry I have had for a while, and a very simple observation that explains why I don’t have it any more.

Because the worry was unfounded, it is rather hard to explain it, but let me try. Let’s suppose that we are trying to find an interesting Ramsey lift . Suppose also that we choose a random subset of with the critical probability . That is, we choose elements with the probability that makes the probability that is a I-winning set equal to . Then it seems highly likely that will be “only just” a I-winning set if it is one. And we’ll need to make sure that every time just happens to be I-winning, then is I-winning, and every time it just fails to be I-winning, is II-winning. This seems extraordinarily delicate, unless somehow the winning strategies in are derived rather directly from the winning strategies in (as seems to be the case for the examples we have so far).

The observation I have now made is almost embarrassingly simple: if is only just a I-winning set, we do not mind if is a II-winning set. That is because is not usually the complement of . In fact, if is a random set and every element of has many preimages in , then both and will be pretty well all of .

It is worth drawing attention to the way that it seems to be most convenient to prove that a lift is Ramsey. Instead of taking a winning subset of and trying to prove that its image is winning (for the same player) in , I have been taking a winning subset of and trying to prove that its inverse image is winning (for the same player) in . Let me prove a very easy lemma that shows that this is OK.

**Lemma.** Suppose that is a lift. Then the following two statements are equivalent.

(i) The image of every winning subset of is winning in for the same player.

(ii) The inverse image of every winning subset of is winning in for the same player.

**Proof.** Suppose that the second condition holds and let be a winning subset of . If is not a winning subset of for the same player, then is a winning subset of for the other player, which implies that is a winning subset of for the other player. But , so this contradicts being a winning set for the original player.

Conversely, suppose that the first condition holds and let be a winning subset of . Then if is not a winning subset of for the same player, then is a winning subset of for the other player, which implies that is a winning subset of for the other player. But , so this contradicts being a winning set for the original player. QED

Another way of saying all this is that if we want to prove that a map is a Ramsey lift, then the only winning sets for which we need to prove that is also a winning set are inverse images of sets . And the reason for that is that one can replace by the superset without affecting the image.

The quick description of these is as follows: take a trivial Ramsey lift of the kind I described earlier (one that duplicates each coordinate several times) and pass to a random subset of it.

Let me sketch an argument for why that, or something similar to it, works. The reason is basically the same as the reason that the trivial lift works. For the sake of clarity let me introduce a little notation. I’ll start with a complexity structure . I’ll then take to be a random subset of , where is some set and I write a typical element of as a sequence . The map takes this sequence to . I’m thinking of as a fairly large set, and the elements of are chosen independently from with some suitable probability .

Now let be a winning subset of . I want to show that is a winning subset of for the same player. So let be a winning strategy for for Player I (the case of Player II is very similar, so I won’t discuss it). Then in she can play as follows. If it is her turn and the specifications so far are of for , then she looks at what the strategy dictates in in response to the specifications of the , ignoring the . This will involve specifying some . Now she must find some such that there exists a sequence in that satisfies the specifications so far as well as the specification .

Typically, the proportion of that will serve as a suitable is approximately , so what we need, roughly speaking, is that should be bigger than . It’s not quite as simple as that, since if the alphabet is very very large, then there may be occasional pieces of extraordinary bad luck. However, I’m pretty sure it will be possible to modify the above idea to make it watertight.

Let and be complexity structures and a Ramsey lift. Let us say that is trivial if for any set of specifications () that can arise during the game in , for any set of specifications () with (this is a slight abuse of notation) and for any further specification , there exists a further specification , consistent with all the previous ones, such that .

This is an attempt to describe the property that makes it very easy to lift strategies in to strategies in : you just see what you would do at each stage in and lift that to — a policy that does not work in general but works in some simple cases.

One thing that is probably true but that it would be good to confirm is that a Ramsey lift of this simple kind cannot be used to simplify sets. I’ll state this as a problem, but I’m expecting it to be an easy exercise.

**Problem 3.** Let be a lift that is trivial in the above sense. Is it the case that for every the straight-line complexity of is equal to the straight-line complexity of ?

(A quick reminder: in a general complexity structure, I define the straight-line complexity of a set to be the length of the smallest sequence of sets that ends with , where all earlier sets in the sequence are either basic sets or unions or intersections of two earlier sets.)

Assuming that the answer to Problem 3 is yes, then the next obvious question is this. It’s the same as Problem 2 except that now we have a candidate definition of “non-trivial”.

**Problem 4.** Let be a complexity structure. Does there necessarily exist a non-trivial Ramsey lift where the size of the alphabet goes up by at most a factor that depends on only?

I very much hope that the answer is yes. I was beginning to worry that it was no, but after the simple observation above, my perception of how difficult it is to create Ramsey lifts has altered. In that direction, let me ask a slightly more specific problem.

**Problem 5.** Is there a “just-do-it” approach to creating Ramsey lifts?

What I mean there is a procedure for enumerating all the winning sets in and then building up and in stages, ensuring for each winning set in turn that its inverse image is a winning set for the same player. I would be surprised if this could be done efficiently, but I think that it would make it much clearer what a typical Ramsey lift looked like.

Let me also recall a problem from the previous post.

**Problem 6.** Let be the set of all sequences in of odd parity. Does there exist a Ramsey lift such that is a basic set and the alphabet of is not too large?

I would also be interested in a Ramsey lift that made simple in some other sense. Indeed, I suspect that the best hope for this approach is that the answer to Problem 6 is no, but that for some less restrictive definition of “simple” it is yes.

Maybe that’s enough mathematics for one post. I’d like to finish by trying to clarify what I mean by “micro-publication” on the TiddlySpace document. I can’t do that completely, because I’m expecting to learn on the job to some extent.

I’ll begin by saying that Jason Dyer answered a question I asked in the previous post, and thereby became the first person to be micro-published. I don’t know whether it was his intention, but anyway I was pleased to have a contribution suitable for this purpose. He provided an example that showed that a certain lift that turns the parity function into a basic function was (as expected) not a Ramsey lift. It can be found here. There are several related lifts for which examples have not yet been found. See this tiddler for details.

Jason’s micro-publication should not be thought of as typical, however, since it just takes a question and answers it. Obviously it’s great if that can be done, but what I think of as the norm is not answering questions but more like this: you take a question, decide that it cannot be answered straight away, and instead generate new questions that should ideally have the following two properties.

- They are probably easier than the original question.
- If they can be answered, then the original question will probably become easier.

One could call questions of that kind “splitting questions”, because in a sense they split up the original task into smaller and simpler tasks — or at least there is some chance that they do so.

What I have not quite decided is what constitutes a micro-publication. Suppose, for example, somebody has a useful comment about a question, but does not generate any new questions. Does that count? And what if somebody else, motivated by the useful comment, comes up with a good question? I think what I’ll probably want to do in a case like that is write a tiddler with the useful comment and the splitting question or questions, carefully attributing each part to the person who contributed it, with links to the relevant blog comments.

Also, I think that when someone asks a good question, I will automatically create an empty tiddler for it. So one way of working out quickly where there are loose ends that need tying up is to look for empty tiddlers. (TiddlySpace makes this easy — their titles are in italics.)

Some people may be tempted to think hard about a question and then present a fairly highly developed answer to it. If you feel this temptation, then I’d be very grateful if you could do one of the following two things.

- Resist it.
- Keep a careful record of all the questions you ask in the process of answering the original question, so that your thought processes can be properly represented on the proof-discovery tree.

By “resist it”, what I mean is not that you should avoid thinking hard about a question, but merely that each time you generate new questions, you should write up your thoughts so far in the form of blog comments, so that we get the thought process and not just the answer. The main point is that if we end up proving something interesting, then I would like it to be as clear as possible how we did it. With this project, I am at least as interested in trying to improve my understanding of the research process as I am in trying to make progress on the P versus NP problem.

As long-term readers of this blog will be aware, the P versus NP problem is one of my personal mathematical diseases (in Richard Lipton’s sense). I had been in remission for a few years, but last academic year I set a Cambridge Part III essay on barriers in complexity theory, and after marking the essays in June I thought I would just spend an hour or two thinking about the problem again, and that hour or two accidentally turned into about three months (and counting).

The trouble was that I had an idea that has refused to die, despite my best efforts to kill it. Like a particularly awkward virus, it has accomplished this by mutating rapidly, so that what it looks like now is very different from what it looked like at the beginning of the summer. (For example, at that stage I hadn’t thought of trying to model a proof on the proof of Borel determinacy.) So what am I to do?

An obvious answer is this: expose my ideas to public scrutiny. Then if there is a good reason to think that they can’t be made to work, it is likely that that reason will come to light more quickly than if I, as one individual with judgment possibly skewed by my emotional attachment to the approach, think about it on my own.

But what if they *can* be made to work? Do I want to make them public in their current only partially developed state? I’ve thought about this, and my view is that (i) it is very unlikely that the ideas will work, not just because it is *always* unlikely that any given attack on a notoriously hard problem will work, but also because there are certain worrying analogies that suggest, without actually conclusively demonstrating, that the approach has a good chance of running into a certain kind of well-known difficulty (roughly, that I’ll end up not managing to show that the parity function is simpler than an arbitrary function) and (ii) if, by some miracle, the approach *does* work, I’ll have put enough into it to be able to claim a reasonable share of the credit, and I’ll probably get to that stage far more quickly and enjoyably than if I work secretly. So in the first case, I gain something precious — time — and in the second case I also gain time and end up with an amount of credit that any sensible person ought to be satisfied with. [Confession: I wrote that some time ago and then had some further ideas that made me feel more optimistic, so I worked on them on my own for another two or three weeks. I'm going public only after starting to feel a bit bogged down again.]

And of course, if I go down the public route, it gives me another chance to try to promote the Polymathematical way of doing research, which on general grounds I think ought to be far more efficient. This is a strong additional motivation.

There is one question that I will not be able to suppress so I might as well get it out of the way: if the problem did get solved this way, then what would happen to the million dollars? The answer is that I don’t know, but I am not too bothered, since the situation is very unlikely to arise, and if it does, then it’s the Clay Mathematics Institute’s problem — they have a committee for making that kind of decision — and not mine. And I think it would be very wrong indeed if the existence of a prize like that had the effect of making research on a major mathematical question more secret, and therefore more inefficient, than it needed to be.

I have two remaining anxieties about going public. One is that it looks a bit attention-grabbing to say that I’m working on the P versus NP problem. It’s probably a hopeless thing to ask, but I’d like it if this project could be thought of in a suitably low-key way. What I have at the moment has not yet been sufficiently tested to count as a serious approach to the problem: as I’ve already said, complexity experts may be able to see quickly why it can’t work. Of course I dream that it might turn into a serious approach, but I’m not claiming that status for it unless it survives other people’s attempts to kill it off. To begin with, one should probably think of Polymath9 as devoted to the question, “Why could nothing like this work?” which is rather less exciting than “Please help me finish off my proof that $\mathrm{P}\ne\mathrm{NP}$.” (However, I think the approach is different enough from other approaches that a sufficiently general explanation of why nothing like it can work would be of some interest in itself.)

The other is that I may have missed some simple argument that immediately demolishes the approach in its entirety. If someone points out such an argument, it will sting a bit, and it makes me feel quite apprehensive about clicking on the “Publish” button I see in front of me. But let me feel the fear and do it anyway, since it’s probably better to feel embarrassed when that happens than it is to spend another two or three months working on an approach that is doomed to fail uninterestingly.

Although multiple online collaboration has not been widely adopted as a way of doing research, there have been enough different quite serious projects, each one with its own distinctive characteristics, to provide some evidence of what works and what doesn’t. Let me mention three examples that in different ways *have* worked. I think that Polymath1 (the density Hales-Jewett theorem) worked well partly because we started with not just a problem, but also the beginning of an approach to that problem. (The approach later changed and eventually had little in common with how it had been when it started, but having a clear starting point was still helpful.) Polymath5 (the Erdős discrepancy problem) started with just a statement of the problem to be tackled, and did not end up solving it, but I still count the project as a success of a kind, in that we rapidly reached a much better understanding of the problem, found plenty of revealing experimental data, and generated a number of interesting subquestions and variants of the initial question. More recently, Polymath8 (improving Zhang’s bound for prime gaps) worked well because the problem was not a yes/no question. Rather, it was a how-far-can-we-push-this-proof question, very well suited to a group of people looking together at a paper and reaching an understanding of the arguments that allowed them to improve the bound significantly. It was also good to undertake a project that was guaranteed to produce at least something, though I think the current bound is probably better than most people would have predicted at the beginning of the process.

Having said all that, there are some aspects of the Polymath projects so far that have left me not fully satisfied. I don’t mean that I have been positively dissatisfied, but I have been left with the feeling that more could be achieved. For one thing, it is slightly disappointing that there have not been more projects. (I bear some responsibility for this, since I have not been involved in any Polymath projects for quite a while, apart from a brief attempt to revive Polymath5.) I think there are various reasons for this, some of which it may be possible to do something about.

My original fantasy was that it would be possible for lots of people to make small contributions to a project and for those small contributions to add up almost magically to something greater than the sum of its parts. I think that to a small extent that happened, in the sense that reading other people’s comments was quite unexpectedly stimulating. However, I came to think that there was significant room for improvement in the way that a Polymathematical discussion takes place. At the moment it principally takes place in two ways: as a sequence of comments on blog posts, and as a wiki that is gradually built up by the participants. But neither of these conveys in a truly transparent way the logical structure of the discussion, or makes it easy for new people to join the discussion once it has got going. What I would like to see is the gradual building up of what I call a *proof discovery tree*. I don’t have a precise definition of this concept, but the rough idea is this. You start with the initial question you are trying to answer. You can’t just write down the answer, so you have to find new questions to ask. (At the very beginning they will be extremely hazy questions like, “What could a proof of this statement conceivably look like?”.) Those questions will probably generate further questions, and in that way one builds up a tree of questions. When one has gone deep enough into the tree, one starts to reach questions that one can answer. Sometimes the answer will have the effect of telling you that you have reached a dead end. Occasionally it will transform your approach to the entire problem.

I think something like that is a reasonable description of the research process, though of course it leaves a lot out. I also think that with modern technology it is possible to record one’s attempt to prove a result in a tree-like format rather than in the linear format that is encouraged by paper and pen, or even by TeX files, blog posts and the like.

What would be the advantage of writing notes on proof attempts in a tree-like form? I think there is one huge advantage: if at some point you feel stuck, or for some other reason want others to join in, then setting out your thoughts in a more structured way could make it much easier for others to take up where you left off rather than having to start from scratch. For example, when you stop, there will probably be a number of leaves of your proof-discovery tree that are not dead ends, but rather are questions that you just haven’t got round to answering. If you ask the questions in isolation (on Mathoverflow, say), then they will seem fairly unmotivated. But if they live on a proof-discovery tree, you can follow a path back to the root, seeing that the question asked was motivated by an earlier question, which itself was motivated by an earlier question, and so on. Or, if you just want to add a new leaf to the proof-discovery tree, you can ignore all that motivation (or perhaps skim it briefly) and simply try to answer the question.

Would it be practical for people to keep adding leaves to a tree like this? Wouldn’t it all get a bit out of hand, with people disagreeing about what constitutes an appropriate link from an existing vertex? I think it might. One way round that problem is the following. There is an informal discussion that takes place in the usual way — with comments on blog posts. But the participants also keep an eye on a somewhat more formal proof-discovery tree that develops as the discussion progresses, and if somebody makes a comment that looks as though it could be developed into a useful new node of the tree, it is proposed for a sort of “micro-publication”. If it is accepted, then whoever is moderating the discussion adds it to the proof-discovery tree, possibly rewriting it in the process. The node of the tree comes with links to the comment that inspired it, and the name of the author of that comment. So this process of micro-publication provides a similar kind of motivation to the one that traditional publication provides, but on a much smaller scale.
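To make the idea slightly more concrete, here is a minimal sketch of how such a tree might be represented in code. It is an illustration under assumptions, not a description of how TiddlySpace stores anything: the class name, field names, status values and link-type labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # nodes are compared by identity, not by content
class Node:
    question: str
    author: str                           # who gets credit for the micro-publication
    status: str = "open"                  # hypothetical values: "open", "answered", "dead end"
    link_type: Optional[str] = None       # why one passes from the parent to this question
    source_comment: Optional[str] = None  # link to the blog comment that inspired the node
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        """Attach a new leaf below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def path_to_root(self) -> List[str]:
        """The chain of questions that motivates this node, starting with the node itself."""
        node, path = self, []
        while node is not None:
            path.append(node.question)
            node = node.parent
        return path

    def open_leaves(self) -> List["Node"]:
        """Leaves that are neither answered nor dead ends: the loose ends."""
        if not self.children:
            return [self] if self.status == "open" else []
        leaves = []
        for child in self.children:
            leaves.extend(child.open_leaves())
        return leaves


root = Node("What could a proof of this statement conceivably look like?", author="moderator")
first = root.add_child(
    Node("What happens in the first non-trivial case?", author="A", link_type="special case")
)
root.add_child(
    Node("Is a generalization true?", author="B", link_type="generalization", status="dead end")
)
print([leaf.question for leaf in root.open_leaves()])
print(first.path_to_root())
```

Someone who wants only to add a leaf needs `open_leaves`; someone who wants the motivation for a question follows `path_to_root`.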

Is there any good software out there for creating a proof-discovery tree of this kind? I asked this question on Google Plus and got a variety of helpful answers. I opted to go with the third answer, given by Robert Schöftner, who suggested TiddlySpace with a MathJax plugin. If you’re the kind of person to whom that sounds complicated, you should take the fact that I managed to get it to work fairly easily as strong evidence that it is not. And I’ve fallen in love with TiddlySpace. It feels very unhealthy to have written that, but also, given the amount of time I’ve spent with it, not too wide of the mark.

Its main advantage, as I see it, over a traditional wiki is that a “tiddler”, which roughly corresponds to a wiki page or short blog post, is not on a separate web page. Rather, it forms part of a “tiddlyspace”, roughly speaking a collection of interlinked tiddlers, that all live on the same page and can be opened and closed as you like. Amazingly (to me at any rate), you can open, close, create and edit tiddlers even when you are offline, without losing anything. When you’re next online, everything gets saved. (I imagine if you close the page then you *will* lose everything, but it’s not exactly challenging not to do that.) One can also add nice gadgets such as what they call “sliders” — boxes you click on to make some text appear and click on again to make it disappear. I’ve used that in a few places to make it convenient for people to be reminded of definitions that they may have forgotten or not seen.

Now I’m not trying to say that everyone should use TiddlySpace. I’m sure people have very strong views about different kinds of wiki software being better than others. But I *would* like to encourage people to try writing their research attempts in a tree-like format, so that if they don’t succeed in solving a problem but do have interesting ideas about it, then they can present their incomplete proof attempt in a nice way. If you prefer some other software to TiddlySpace, then by all means use that instead.

As a matter of fact, TiddlySpace, while having a lot in common with what I have often thought would be great to have, also lacks a few features that I’d really like. I have included a tiddler with a site map that indicates the tree structure of all the pages by suitably indenting the titles. But what I’d prefer is a much more graphical representation, with actual nodes and links. The nodes could be circles (or perhaps different shapes for different kinds of pages) with text in them, and could increase in size as you hovered over them (like the icons on the dock of a Mac if you have too many of them) and open up if you clicked on them. Similarly, the edges would have text associated with them. So it might look more like the stacks project visualizations.

If writing up proof attempts became standard practice, and if somewhere there was an index of links to incomplete proof-discovery trees, then people who wanted something to think about could search through the leaves of the proof-discovery trees for problems that look interesting and well-motivated. (Maybe the Selected Papers Network could be used for this indexing purpose, though these would not be papers in any traditional sense.) In that way, collaborations could start up. Some of these might be very open and public. Others might be much quieter (e.g. someone emails the author of the proof-discovery tree with an answer to a question, and that leads to a private collaboration with the author). Also, even if every path of a proof-discovery tree led to a dead end, that would *still* be a useful document: it would give a detailed and thorough record of a proof attempt that doesn’t work. That’s something else that I’ve long thought would be a nice thing to have, partly because it may save time for other people, and partly because even a failed proof attempt may contain ideas that are useful for other problems. Also, as I found with Polymath1, there is the surprising phenomenon that other people’s ideas can be immensely stimulating *even if you don’t use them*. So even if my ideas about the P versus NP problem turn out to be fruitless, there is a chance, as long as they are not completely ridiculous (a possibility I cannot rule out at this stage), that they could provoke someone else into having better ideas that lead to interesting progress on the problem. If they do that for you, maybe you could buy me a drink some time.

For the above reasons, I see the publication (in the sense of making public on the internet) of partial proof-discovery trees as one possible way of getting Polymathematical research to become more accepted. Each such “publication” would be a kind of proposal for a project, as well as a record of progress so far. I also think that “micro-publishing” contributions to a proof-discovery tree has the potential to provide a motivation that is similar to the motivation that drives people to answer questions on Mathoverflow: it offers slightly more of a reward than you get from people responding to a comment you have made on a blog.

Yet another potential advantage of writing partial proof-discovery trees is that if you present your ideas so far in a structured format, it can result in a much more systematic approach to *your own* research. You may go from, “Oh dear, all this is getting complicated — I think I’ll try another problem,” to “Ah, now I see how all those various ideas link up (or don’t link up) — I think there is more to say about that leaf there.” I have found that when writing down my ideas about games and the P versus NP problem. So there is something to be said for doing it, even if you have no intention of making your thoughts public. (But what I would like to see in that case is people eventually deciding that they are unlikely to add to their private proof-discovery trees and making them public.)

Because I’m keen to see whether something like this could work, I have spent a couple of weeks taking my thoughts from over the summer (which I had written into a LaTeX file that stretched to 80 pages — in the interests of full disclosure I might make that file public too, though it is disorganized and I wouldn’t particularly recommend reading it), throwing out some of them that seemed to go nowhere, and putting the rest into a tree-structured Tiddlyspace wiki. I have tried to classify the tiddlers themselves and (more importantly) the links between the tiddlers. Most tiddlers are devoted to discussions of questions, so the link classifications are saying for what kind of reason I pass from trying to answer one question to trying to answer another. (Some simple examples might be that I want to try the first non-trivial case, or that I want to see whether a generalization of the original statement has a chance of being true.) I haven’t put as much thought into this link classification as I might have, so I am very open to suggestions for how to improve it, especially if these would make the connections clearer. (I can foresee two sorts of improvements: reclassifications of links within the scheme as it now is, and revisions to the scheme itself.) The result of doing that was to stimulate a lot more thought about the approach, so I’ve added that to the tree as well. A link to the whole thing can be found at the end of this post.

To summarize, this is what I suggest.

1. The aim of the project is **either** to dispose quickly of the approach I am putting forward, by finding a compelling reason to believe that it won’t work (I have tried to highlight the most vulnerable parts, to make this as easy as possible, if it is possible) **or** to build on the existing partial proof-discovery tree until it yields an interesting theorem. While the big prize in the second (and much less likely) case would of course be to prove that $\mathrm{P}\ne\mathrm{NP}$, there are more realistic weaker targets such as finding *any* property that follows “interestingly” from a function’s having low circuit complexity. In the first case, there is the prospect of finding a new barrier to proving lower bounds. For that one would need the approach to fail for an interesting reason. I explain below why I don’t think the approach obviously naturalizes, so there seems at least some chance of this.

2. I will give anyone who might be interested a week or two to browse in my partial proof-discovery tree. There is quite a lot to read (though I hope that its tree structure makes it possible to understand the approach without reading anything like everything), so I won’t open a mathematical comment thread for a while. (I’ve got other things I need to do in the meantime, so this works quite well for me.) However, before that starts I would very much welcome comments about the use I have made of TiddlySpace. I wanted to create a document that set out a proof attempt in a more transparent way than is possible if you are forced into a linear structure by a TeX document, but what I’ve actually produced was not planned all that carefully and I think there is room for improvement.

3. Once the comment thread is open (on this blog), I’ll act as a moderator in the way described above: if someone (or more than one person) provides input that would make a good page to add as a new leaf to the tree, then I will “micro-publish” that page. I don’t want to be too dictatorial about this, so I will welcome proposals for inclusion — either of your own questions and observations or of somebody else’s. I will make clear who the author is of each of these “micro-publications”. If I do not give credit to somebody who deserves it (e.g. if I base a page on a blog comment that builds on another blog comment that I had forgotten about) then I will welcome having that pointed out.

4. A typical “page” will consist of a question that is motivated by an existing page, together with a discussion of that question. If other questions arise naturally in that discussion, can’t be answered instantly, and seem worth answering, then they will be designated as “open tasks”. An open task can become a page if somebody makes enough observations about it to reduce the task to subtasks that look easier. (This does not have to be a logical reduction — it can simply be replacing the initial task by something that deserves to be attempted first.)

5. As a rule of thumb, if a question arises during the writing of a page that is sufficiently different from the question that the page is about that it is most naturally regarded as a new question, then it gets a new page. But this is a matter of judgment. For example, if the question is very minor and easy to answer, then it probably counts as more of a remark and doesn’t deserve a page to itself.

6. The underlying principle behind a link is this. You have a page discussing a question. If you can argue convincingly that the right approach (or at least a good approach) to the question is to think about another question or questions, then the argument forms the main content of the page, and the subquestion or questions form the headings for potential new pages. Links are classified into various types: if your link cannot be classified easily but is of a clearly recognisable type that does not belong to the current classification system, then I will consider adding that link type.

7. The main criterion for micro-publication is *not* the mathematical quality of the proposed page, but the suitability of that page as a new leaf of the tree. This principle reflects my conviction/prejudice that a good piece of mathematics can always be broken up into smaller units that are fairly natural things to try. I want to use the proof-discovery tree as a way of encouraging the process of exploring reasonably obvious avenues to be as systematic as possible. So I will normally insist that any proposed leaf is joined to an existing node by means of a link of one of a small number of types I have listed. (The list can be found over at the TiddlySpace space.) I will consider proposals for new link types, but will accept them only if there are compelling reasons to do so — which there may well be to start with.

8. If maintaining the partial proof-discovery tree becomes too much work to do on my own, then I will consider giving editing rights to one or more “core” participants. But to start with I will be the sole moderator.

There is a lot to read on my TiddlySpace. If you’d rather have some idea of what’s there before investing any time in looking at it, then this section is for you. I’ll try to give the main idea, though not fully precisely and without much of the motivation. If that gets you interested, you can try to use the proof-discovery tree to understand the motivation and a lot more detail about the approach.

The main idea, as I’ve already said, is to try to find a proof that relates to sets of low circuit complexity in the same way that Martin’s proof of determinacy relates to Borel sets. There are two instant reactions one might have to this proposal, one pessimistic and one optimistic. The pessimistic reaction is that the analogy between Borel sets and sets of low circuit complexity has already been explored, and it seems that a better analogue for the Borel sets is sets that can be computed by polynomial-sized circuits *of constant depth*. This fits with the fact that there is no natural Borel analogue of the parity function, and the parity function cannot be computed by polynomial-sized circuits of constant depth.

The optimistic reaction is that Martin’s proof is different enough from the proof, say, that the set of graphs containing an infinite clique is not Borel, that there is a chance that the objection just given does not apply. In particular, to prove that Borel sets of level $\alpha$ are determined, one needs to apply the power set operation to the natural numbers roughly $\alpha$ times, and the statement that all analytic sets are determined (analytic sets corresponding to NP functions) needs large cardinal axioms. Could this be peculiar enough to enable us to find some non-natural analogue in the finite set-up?

An important thing to stress is that the property that I hope will distinguish between sets of low circuit complexity and random sets (or, even better, some set in NP, but for now I am not really thinking about that) is *not* an analogue of determinacy. That’s because it is a very easy exercise to show that the intersection of two determined sets does not have to be determined. (Roughly speaking, each set may have a nice part and a nasty part, with the nice parts disjoint and the nasty parts intersecting.) For this reason, Martin can’t prove determinacy by showing that the class of determined sets is closed under complements and countable unions and intersections. Instead what he does is prove inductively that Borel sets can be lifted to much simpler sets in such a way that (i) it is easy to show that the simpler sets are determined and (ii) it follows from that that the original sets are determined.

I won’t give all the definitions here, but the condition that is needed to get (ii) to work is basically this: given any set in the lifted game for which one of the players has a winning strategy, the same player has a winning strategy for the image of that set in the original game.

For various reasons, I’m convinced that certain features of the analysis of the infinite game have to be modified somewhat. The pages of my TiddlySpace set out my reasons in gory detail, but here let me simply jump to the set-up that I have been led to consider.

I define a *complexity structure* to be a subset $X$ of a set $A_1\times\dots\times A_n$. I call the union of the $A_i$ the *alphabet* associated with the structure. Often I consider the case where $X=\{0,1\}^n$. The maps between complexity structures that I consider (if you like, you can call them the morphisms in my category) are maps $\phi:X\to Y$ such that for each $i$, the coordinate $\phi(x)_i$ depends only on $x_i$. To put that another way, if $Y\subset B_1\times\dots\times B_n$ is another complexity structure, the maps I consider are ones of the form $\phi(x_1,\dots,x_n)=(\phi_1(x_1),\dots,\phi_n(x_n))$. I call a subset of a complexity structure *basic* if it is of the form $\{x\in X:x_i\in E\}$ for some $i$ and some $E\subset A_i$. The motivation for the restriction on the maps is that I want the inverse image of a basic set to be basic.

The non-trivial basic sets in the complexity structure are the coordinate hyperplanes and . The circuit complexity of a subset of measures how easily it can be built up from basic sets using intersections and unions. The definition carries over almost unchanged to an arbitrary complexity structure, and the property of maps ensures that the inverse image of a set of circuit complexity has circuit complexity at most .

Given a complexity structure , we can define a game that I call the *shrinking-neighbourhoods game*. For convenience let us take to be for some positive integer . Then the players take turns specifying coordinates: that is, they make declarations of the form . The only rules governing these specifications are the following.

- Player I must specify coordinates from to .
- Player II must specify coordinates from to .
- At every stage of the game, there must be at least one that satisfies all the specifications so far (so that the game can continue until all coordinates are specified).

Note that I do not insist that the coordinates are specified in any particular order: just that Player I’s specifications concern the first half and Player II’s the second.

To determine who wins the game, we need a *payoff set*, which is simply a subset . Player I wins if the sequence that the two players have specified belongs to , and otherwise Player II wins. I call a set *I-winning* if Player I has a winning strategy for getting into and *II-winning* if Player II has a winning strategy for getting into . (Just in case there is any confusion here, I really do mean that is II-winning if Player II has a winning strategy for getting into . I didn’t mean to write .)

Because the game is finite, it is determined. Therefore, we have the following Ramseyish statement: given any 2-colouring of a complexity structure , either the red set is I-winning or the blue set is II-winning. (Normally with a Ramsey statement one talks about *containing* a structure of a certain kind. If we wanted to, we could do that here by looking at minimal I-winning and minimal II-winning sets.)
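To make the determinacy statement concrete, here is a minimal brute-force sketch of the shrinking-neighbourhoods game for very small structures. The conventions are assumptions made for illustration only: points are tuples of length `2*m`, Player I moves first and owns coordinates `0..m-1`, Player II owns coordinates `m..2*m-1`, and the names `can_force`, `structure` and `target` are hypothetical.

```python
from functools import lru_cache
from itertools import product


def can_force(structure, target, m, player):
    """Return True if `player` (1 or 2) can force the final point into `target`.

    Assumed conventions (not taken from the post): points are tuples of
    length 2*m, Player I moves first and owns coordinates 0..m-1, Player II
    owns coordinates m..2*m-1, and the players alternate single moves.
    """
    structure = frozenset(structure)
    target = frozenset(target)

    def consistent(spec):
        # Points of the structure that agree with every specification so far.
        return [x for x in structure
                if all(s is None or s == xi for s, xi in zip(spec, x))]

    @lru_cache(maxsize=None)
    def wins(spec, to_move):
        if None not in spec:
            return spec in target
        own = range(0, m) if to_move == 1 else range(m, 2 * m)
        live = consistent(spec)
        outcomes = []
        for i in own:
            if spec[i] is not None:
                continue
            # Only values realised by some consistent point keep the game alive.
            for a in {x[i] for x in live}:
                nxt = spec[:i] + (a,) + spec[i + 1:]
                outcomes.append(wins(nxt, 3 - to_move))
        # The chosen player needs one good move; the opponent may pick any move.
        return any(outcomes) if to_move == player else all(outcomes)

    return wins((None,) * (2 * m), 1)


cube = set(product((0, 1), repeat=2))
# Player II can always copy Player I's coordinate, so this set is II-winning.
print(can_force(cube, {(0, 0), (1, 1)}, 1, 2))  # True
```

For any 2-colouring of a small structure one can check with this function that exactly one of "the red set is I-winning" and "the blue set is II-winning" holds. The search is exponential, so it is only a sanity check for tiny cases.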

Given a complexity structure , I define a *lift* of to be a complexity structure together with a map that satisfies the condition set out earlier. I define a lift to be *Ramsey* if is a winning subset of whenever is a winning subset of , and moreover it is winning for the same player. A more accurate name would be “winning-set preserving”, but I think of “Ramsey” as an abbreviation for that.

This gives us a potential method for showing that a subset is I-winning: we can find a Ramsey lift such that is simple enough for it to be easy to show that it is a I-winning subset of . Then the Ramsey property guarantees that , and hence , is I-winning in .

The definition of a Ramsey lift is closely modelled on Martin’s definition of a lift from one game to another, though there are also some important differences that I will not discuss here.

Now let me say what the property is that I hope will distinguish sets of low circuit complexity from some set in NP. I stress once again that this is a rather weak kind of hope: I think it probably won’t work, and the main reason I have not yet established for certain that it doesn’t work is that the definition of a Ramsey lift is complicated enough to make it fairly hard to prove even rather simple facts about it. However, I think the difficulties are reasonable ones rather than unreasonable ones. That is, I think that there are a number of questions that are tricky to answer, but that should yield reasonably quickly. I do *not* think that the difficulties are a disguised form of the usual difficulties connected with circuit complexity. So the most likely outcome of opening up the approach to public scrutiny is that the answers to these smaller questions will be found and they will not be what I want them to be.

To explain the property, let me first give an example of a Ramsey lift that converts every subset of into a basic set. I will take to be the set of sequences with the following properties.

- There exists such that is an ordered pair of the form , where and is a I-winning subset of with a winning strategy that begins with the move .
- For every other , is an element of .
- For every , is an ordered pair of the form , where , , and .
- .

The map is the obvious one that takes the sequence above to .

Given a set , its inverse image is equal to the set of all such that for some . This is a basic subset of , as claimed earlier.

It remains to show that is a Ramsey lift of . Let be a I-winning subset of and let be a winning strategy for Player I for getting into .

Suppose that Player I’s first move is of the form for some and some I-winning subset for which is the first move of a winning strategy. Player II can now play an arbitrary move of the form , where , , and . Since is a winning strategy for getting into , the result will always be a win for Player I. Therefore, for every with there exists with . Let be the set of sequences such that . Then . So a winning strategy for Player I for is to begin with the move and then to play the rest of the strategy that gets into , which will get her into .

Now suppose that Player I’s first move is of the form . This time, Player II is free to choose an arbitrary such that and play the move . After that, Player I is guaranteed to produce a sequence in , which implies that contains all sequences with . Therefore, Player I has a winning strategy for , since she can simply start with the move .

Now let be a II-winning subset of and let be a winning strategy for Player II for getting into . Then for every opening move that Player I might choose to make, Player II can defeat that move. It follows that there exists with such that the sequence

belongs to . Therefore, for every Player I winning set there exists such that . It follows that Player I does not have a winning strategy for , so Player II does have a winning strategy for .

I called that an important example because it gives us a “trivial upper bound” on the size we need to have if we want to find a Ramsey lift from to that makes a set simple. The lift above makes every single subset of into a basic set. Note that this bound is quite large: there are doubly exponentially many winning sets . (Slightly less obviously, there are doubly exponentially many *minimal* winning sets. I haven’t written out a full proof of this, but here is why I believe it. If you take a random set with a certain critical probability, then it should be a I-winning set, but it should not be possible to remove lots of elements from it and still have a I-winning set. Therefore, we need to have a collection of sets of density almost as big as the critical probability such that almost every set with the critical probability has a subset in the collection. That should make the collection doubly exponential in size. It would be good to make this argument rigorous.)

What I would like to prove is something like this. There is one part of what I want that is unfortunately a little vague, which is the definition of “simple”. I’ll discuss that in a moment.

1. If a set has polynomial circuit complexity, then there exists a Ramsey lift of with such that is simple and the cardinality of is much less than doubly exponential.
2. If is a random subset of , then with high probability the smallest Ramsey lift that makes simple is doubly exponential.
3. There exists an NP set such that the smallest Ramsey lift that makes simple is doubly exponential.

If one could prove 1 and 3, then one would have shown that P ≠ NP. If one could prove 1 and 2, then one would have exhibited a non-trivial property that distinguishes between functions of polynomial circuit complexity and random functions. That in itself would not prove that P ≠ NP, but it might point the way towards other methods of defining “unnatural” properties, which is a necessary first step towards proving that P ≠ NP.

I’ll say once again that I don’t yet consider this to be a serious approach, even if I ignore the problem that I don’t yet know what a “simple” set is. Given a precise definition of “simple” (and I have some candidates for this), I have just exhibited a pair of statements the conjunction of which would imply that P ≠ NP. However, for an observation that $A \Rightarrow B$ to count as a serious approach to proving $B$, there are two other properties one wants. The first is good evidence that $A$ is actually true, which I do not have: I do not count my failure to disprove it so far as good evidence, and the analogy with Martin’s theorem has certain drawbacks that make me think that it is likely that either 1 will be false, or else 1 will be true but only because *all* sets can be efficiently lifted, so that both 2 and 3 will be false. The second requirement is some reason to believe that $A$ might be easier to prove than $B$. Here I think the implication above fares slightly better: while I have no idea how to prove lower bounds on the “Ramsey-lift complexity” of a set, the fact that proving upper bounds doesn’t seem to be easy for sets of low circuit complexity suggests that if one *did* manage to prove such upper bounds, one would have a reduction of the problem that didn’t feel trivially equivalent in difficulty to the original problem, though it might in practice turn out to be very hard as well. If further thought about 1-3 led people to believe that they were likely to be true after all, then and only then would I want to say that this was a serious approach. But as I’ve said several times now, I think that is fairly unlikely.

A key question I’d like to know the answer to is whether there is an efficient (that is, much smaller than doubly exponential) Ramsey lift for the parity function, or rather the set of points with an odd number of 1s. The reason is that it looks to me at the moment as though the most likely thing to go wrong will be that the most efficient lift blows up rapidly as the circuit complexity of a function increases — so rapidly that it becomes doubly exponential for circuits of linear size. (All we would need for this is for the size of to square at each increase by 1 in the length of a straight-line computation. Obviously, slightly weaker statements would also suffice.) If that is indeed the case, it may well be that what determines the size of the smallest lift is closely related to the noise sensitivity of the set , in which case the parity function is a good one to use as a test.

Another possibility for a cheap demolition of the whole approach is if you can spot a simple Ramsey lift that converts an arbitrary set into a basic set and needs an alphabet of only exponential size. I haven’t really tried to find such a lift, so I could easily have missed something obvious.

For Martin a simple set was one that is open and closed. The best analogue I can think of for the notion of an open set is the following. Let be a complexity structure. Call a subset of -*basic* if it is an intersection of basic sets, and call it -*open* if it is a union of -basic sets. Call it -*closed* if its complement is -open. Then we could look at sets that are -open and -closed for some suitably small .

But how small? The only natural candidates seem to be 1, 2 or something around , but there appear to be difficulties with all these choices.

Why not define a set to be simple if it is basic? I have a problem with that, which is that if it is too easy to lift a set to a basic set, then we will probably be able to lift all the coordinate hyperplanes in (which are basic already) to sets of the form — that is, to sets that are not just basic but defined by restrictions of the *first* coordinate. But if we can do that, then *all* sets lift to basic sets in .

If you want an even easier question to think about than the one above about the parity function, I have not yet even managed to determine whether one particular lift works. Here’s how it is defined. I take the set of all sequences of the form , where and is the parity of . The map takes this sequence to . Thus, the game in is the same as the game in except that when Player II specifies the th coordinate, he must also commit himself to a particular parity and to playing his last move to ensure that has this parity.

This extra condition on Player II should disadvantage him, so there should be payoff sets that Player I can get into if Player II has the extra restriction but cannot get into otherwise. I’m pretty sure it will be easy to find such a payoff set, but my first few attempts have failed, so I have not managed to do it yet. I do have a heuristic argument that suggests that a suitably chosen random set ought to be an example, so one approach would be to make that argument rigorous.

One can also consider small variants of the above lift, for some of which it is not at all clear that a random set should work, so what I’d really like to see is either a proof that some simple variant is in fact, contrary to expectations, a Ramsey lift, or an argument that is sufficiently general to rule out a large class of similar constructions.

For anyone wondering whether to invest any time in helping me think about my approach, this is of course the key question. It’s hard to say for *sure* that the approach wouldn’t naturalize. Perhaps one could come up with a clever criterion that would say which sets can be efficiently lifted. But I’m fairly confident that the property “there exists a complexity structure with an alphabet of significantly less than doubly exponential size and a map that takes I-winning sets to I-winning sets and II-winning sets to II-winning sets such that is simple” is not easy to reformulate as a property with polynomial (or quasipolynomial, or anything at all small) circuit complexity (in the truth table of ). In fact, I think it is not even in NP, since we existentially quantify over and but then require to have a property that holds for a very large class of rather complicated subsets of . So even if we allow to be “only exponential” in size (which is not required by the approach), the natural formulation of the property appears to be .

Of course, it’s one thing to write down a strange property, and quite another to expect it to hold for functions of low circuit complexity. But the fact that I have been strongly motivated by the proof of Borel determinacy gives me some small reason to hope that a miracle might occur. It is in the nature of miracles that it probably won’t occur, but the subjective probability I associate with it is far enough from zero that I don’t want to give up on it until I am absolutely sure that it won’t.

I should add that even if the property I have given above does not work, it may be that some related property does, as there are various details that could be changed. For example, we could replace the classes of I-winning and II-winning sets by other classes of sets and ask for our maps to preserve those.

Finally, here is the proof-discovery tree as I have developed it so far. To get started, I recommend clicking on “PvsNP Sitemap” in the toolbar at the top of the page. Even if the approach collapses almost immediately, I hope you may enjoy looking at it and getting an idea of what can be done on TiddlySpace.


The system I have in mind works as follows. It’s a multilevel representative democracy. Suppose for convenience that $n=3^k$ for some positive integer $k$. (It is easy, but slightly tedious, to modify what I am about to write to take care of more general $n$.) Suppose that the country is divided into three “super-constituencies”, each of which gets a vote in the top-level decision-making body (known as the triumvirate). Suppose that decisions in that body are passed by a majority vote. A group of people that wants to control the country can do so as long as it can control at least two votes in the triumvirate.

How are the members of the triumvirate chosen? They are elected by another triumvirate one level down. The representative in the top-level triumvirate is representing the views of the three people in the triumvirate one level down, and is worried about stepping out of line, since then he/she risks being deselected by the three people in the level-2 body.

So if a merry band of fanatics wants to control a representative in the top-level triumvirate, it is enough to control at least two of the representatives in the second-level triumvirate that selects the top-level representative.

Of course, we can iterate this argument. So how many people do we need to control the country? We need two at the top level, and therefore four at the second level, and so on. Therefore, we need $2^k$ at the bottom level. (Note that the representatives do not have to be fanatics themselves: if they don’t vote in the way that the fanatics want, then they get deselected by the people one level down, losing all those lovely perks that go with a high-level job in politics.) If $n=3^k$, then $2^k=n^{\log_3 2}$, so we’re done.
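The counting can be checked with a small simulation. This is only a sketch under simplifying assumptions: the population is exactly `3**k`, each level decides by a majority of three, votes simply propagate upwards (which stands in for the deselection mechanism), and the function names are hypothetical.

```python
def majority_of_three(votes):
    """Return True if at least two of the three votes are True."""
    return sum(votes) >= 2


def outcome(voters):
    """Evaluate the hierarchy on a list of 3**k bottom-level votes (booleans)."""
    if len(voters) == 1:
        return voters[0]
    third = len(voters) // 3
    return majority_of_three(
        [outcome(voters[i * third:(i + 1) * third]) for i in range(3)]
    )


def fanatic_positions(k):
    """Indices of a Cantor-like set of 2**k voters inside a population of 3**k."""
    if k == 0:
        return [0]
    third = 3 ** (k - 1)
    below = fanatic_positions(k - 1)
    # Control the first two of the three sub-hierarchies and ignore the third.
    return below + [third + i for i in below]


k = 4
n = 3 ** k
fanatics = set(fanatic_positions(k))
votes = [i in fanatics for i in range(n)]  # everybody else votes the other way
print(len(fanatics), n, outcome(votes))  # prints: 16 81 True
```

Removing a single fanatic from this arrangement flips the outcome, which illustrates that the placement is doing the work and not just the head count.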

One might want to make small adjustments to the bound to allow all the different levels of influence to be disjoint. So then . But this is within a constant of . Similarly, if we start with some that is not of that precise form, that again affects the estimate by just a constant factor.

So the conclusion is that in principle $n^{\log_3 2}$ people can mess up a country with population $n$. If you have more people than that, then the main thing you want is a system with a few levels of groups within groups, not necessarily formal at every level, and a distribution that is not too concentrated and not too diffuse. (If it is too concentrated, then you’ll end up wasting a lot of votes on controlling representatives who are already controlled, but if it is too diffuse, then you won’t control anybody except at very low levels. In the extreme case, what you want is to be arranged in what can be viewed as a discrete approximation to the Cantor set: in less extreme cases you still want to be somewhat “fractal” and “Cantor-like”.)


The purpose of this post is to add some rigour to what I wrote in the previous post, and in particular to the subsection entitled “Why should we believe that the set of easily computable functions is a ‘random-like’ set?” There I proved that *if* the Rubik’s-cube-like problem is as hard as it looks, then there can be no polynomial-time-computable property that distinguishes between a random composition of 3-bit scramblers and a purely random Boolean function. This implies that there can be no polynomial-time-computable “simplicity” property that is satisfied by all Boolean functions of circuit complexity at most that is not satisfied by almost all Boolean functions.

I personally find the assumption that the Rubik’s-cube-like problem is hard very plausible. However, if you disagree with me, then I don’t have much more I can say (though see Boaz Barak’s first comment on the previous post). What Razborov and Rudich did was to use a different set of random polynomial-time-computable functions that has a better theoretical backing. They build them out of a pseudorandom function generator, which in turn is built out of a pseudorandom generator, which is known to exist if the discrete logarithm problem is hard. And the discrete logarithm problem is hard if factorizing large integers is hard. Since many people have tried hard to find an algorithm for factorizing large integers, there is some quite strong empirical evidence for this problem’s being hard. It’s true that there are also people who think that it is not hard, but the existence of a pseudorandom generator does not depend on the hardness of factorizing. Perhaps a more significant advantage of the Razborov-Rudich argument is that *any* pseudorandom generator will do. So the correctness of their conclusion is based on a weaker hypothesis than the one I used earlier.

It’s time I said in more detail what a pseudorandom generator is. Suppose you have a Boolean function , with . Then you have two obvious probability distributions on . The first is just the uniform distribution, which we can think of as choosing a random 01-string of length . The second is obtained by choosing an element uniformly at random from and applying the function . This we can think of as a *pseudo*random 01-string of length . The idea is that if mixes things up sufficiently, then there is no efficient algorithm that will give significantly different results when fed a purely random 01-string and a pseudorandom 01-string.

We can be slightly more formal about this as follows. Suppose is a Boolean function. Define to be the probability that when is chosen randomly from . Define to be the probability that when is chosen randomly from . We say that is an -*hard pseudorandom generator* if whenever can be computed in time .
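For tiny parameters the two probabilities in this definition can be computed exactly by enumeration. The sketch below does that for a deliberately bad generator, so that the gap is easy to see. Everything named here (`advantage`, `repeat_seed`, `halves_match`, `first_bit`) is a hypothetical illustration, not a real pseudorandom generator or a standard API.

```python
from itertools import product


def advantage(generator, distinguisher, n, out_len):
    """Exact |P[f(G(x)) = 1] - P[f(y) = 1]| by brute-force enumeration.

    x ranges over all 0/1 tuples of length n, y over all of length out_len.
    """
    seeds = list(product((0, 1), repeat=n))
    strings = list(product((0, 1), repeat=out_len))
    p_pseudo = sum(distinguisher(generator(x)) for x in seeds) / len(seeds)
    p_random = sum(distinguisher(y) for y in strings) / len(strings)
    return abs(p_pseudo - p_random)


def repeat_seed(x):
    """A toy length-doubling map that is trivially distinguishable."""
    return x + x


def halves_match(y):
    """Output 1 exactly when the two halves of y agree."""
    half = len(y) // 2
    return int(y[:half] == y[half:])


def first_bit(y):
    """A distinguisher that only looks at the first digit."""
    return y[0]


print(advantage(repeat_seed, halves_match, 3, 6))  # 0.875
print(advantage(repeat_seed, first_bit, 3, 6))     # 0.0
```

The first distinguisher separates the two distributions almost perfectly, while the second sees no difference at all. A genuine pseudorandom generator is one for which every efficiently computable distinguisher behaves like the second.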

It may look a little strange that appears twice there. Shouldn’t one talk about a -hard pseudorandom generator, where the number of steps is and the difference in the probabilities is at most ? The reason for setting equal to is that, up to a polynomial, it is the only interesting value, for the following reason. Suppose that the difference in the probabilities is . Then if we run the algorithm times, the difference in the expected number of times we get 1 is . If that is significantly bigger than , then the probability that the difference in the actual number of times we get a 1 is not at least will be small, so we can detect the difference between the two with high probability by counting how many 1s each one gives. This happens when is proportional to . Speaking a little roughly, if the probabilities differ by , then you need at least runs of the experiment and at most runs to tell the difference between random and pseudorandom, where and are fixed polynomial functions. Since running the experiment times doesn’t affect the complexity of the detection process by more than a polynomial amount when depends polynomially on , we might as well set : if you prefer a bigger you can get it by repeating the experiment, and there is nothing to be gained from a smaller since the difference between random and pseudorandom is already hard to detect.

Intuitively, a pseudorandom generator is a function from a small Boolean cube to a big one whose output “looks random”. The formal definition is making precise what “looks random” means. I took it to mean “looks random to a computer program that runs in polynomial time” but one can of course use a similar definition for *any* model of computation, or indeed any class whatever of potential distinguishing functions. If no function in that class can distinguish between random functions and images of with reasonable probability, then is pseudorandom for that class.

A pseudorandom generator produces a small subset of (technically it’s a multiset, but this isn’t too important) with the property that it is hard to distinguish between a random string in and a purely random string. However, sometimes we want more than this. For example, sometimes we would like to find a function from to that “looks random”. We could of course think of such a function as a 01 string of length (that is, as a list of the values taken by the function at each point of ) and use a pseudorandom generator to generate it, but that will typically be very inefficient.

Here is a much better method, which was devised by Goldreich, Goldwasser and Micali. (Their paper is “How to Construct Random Functions”.) Let be a pseudorandom generator. (Later I’ll be more precise about how hard it needs to be.) We can and will think of this as a pair of functions , each one from to . If we are now given a string , we can use it and the functions and to define a function . We simply take the composition of s and s that correspond to . That is, . To put that another way, you use the digits of to decide which of and to apply. For instance, if , then you apply , then , then , then , then .

What we actually want is a function to , but if we take the first digit then we’ve got one. We can think of it as a function of two variables: given and , then equals the first digit of . But a function of two variables can also be thought of as a collection of functions of one variable in two different ways. We’ve thought so far of as indexing functions and as being the argument, but now let’s switch round: we’ll take as the index and as the argument. That is, let’s write for , which is itself just the first digit of .
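The construction just described can be sketched in a few lines. Note the assumptions: SHA-256 is used purely as a stand-in for the length-doubling generator (it is not a proven pseudorandom generator), the seed length of 16 bytes is arbitrary, and the names `double`, `ggm_bit` and `truth_table` are hypothetical.

```python
import hashlib

SEED_BYTES = 16  # arbitrary seed length chosen for this sketch


def double(x):
    """Stand-in length-doubling map: 16 bytes in, two 16-byte halves out."""
    digest = hashlib.sha256(x).digest()  # 32 bytes
    return digest[:SEED_BYTES], digest[SEED_BYTES:]


def ggm_bit(seed, u):
    """First bit after walking down the tree from `seed` along the digits of u.

    u is a string of '0' and '1'. Digit 0 selects the left half of the
    generator's output and digit 1 the right half, applied in the order in
    which the digits appear.
    """
    state = seed
    for digit in u:
        left, right = double(state)
        state = left if digit == "0" else right
    return state[0] >> 7  # most significant bit of the first byte


def truth_table(seed, n):
    """All 2**n values of the function indexed by `seed`."""
    return [ggm_bit(seed, format(i, f"0{n}b")) for i in range(2 ** n)]


print(truth_table(bytes(SEED_BYTES), 3))
```

Switching the roles of the two variables corresponds to fixing `seed` and letting `u` vary, which is what `truth_table` does. Each evaluation costs one generator call per digit of `u`, so a single value is cheap even though the whole truth table is exponentially long.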

We now have two probability distributions on the set of all functions from to . One is just the uniform distribution — this is what we mean by a random function. The other is obtained by choosing a random string of length , applying the composition to it and taking the first digit — this is what we mean by a pseudorandom function (associated with this particular construction).

Note that we are now in a very similar situation to the one we were in with 3-bit scramblers earlier. We have a small bunch of efficiently computable functions — the functions — and it is hard to distinguish between those and entirely random functions. But now we shall be able to prove that it is hard, subject to widely believed hardness hypotheses. Also, even if you don’t believe those hypotheses, the reduction to them is interesting.

How easily can be computed? Well, given , we have to calculate the result of applying to a composition of functions, each of which is a polynomial-time-computable function from to . So the number of steps we need is for some polynomial . This is polynomial in provided that is polynomial in . Accordingly, we take for some large constant .

The idea now is to show that if we can distinguish between a random function of the form and a genuinely random function, then the pseudorandom generator is not after all very hard: in fact, it will have hardness at most , which is substantially less than exponential in . Since the pseudorandom generator was arbitrary, this will show that no pseudorandom generator of that hardness exists.

By the way, let me draw attention to the parts of this proof that have always caused me difficulty (though I should say again that it’s the kind of difficulty that can be overcome if one is sufficiently motivated to do so — I’ve just been lazy about it up to now). The first is the point about the roles of the two variables and above and the way those roles switch round. Another is a wrong argument that has somehow made me feel that what is going on must be subtler than it actually is. That argument is that a pseudorandom generator is a function defined on , so its hardness is a reasonable function of , while the kind of pseudorandomness we’re interested in takes place at the level of Boolean functions defined on , which have domains of size , so breaking those in polynomial time in will surely have no bearing on the far smaller function that makes the pseudorandom generator.

I didn’t express that wrong argument very well — necessarily, since it’s wrong — but the thing I’ve been missing is that is quite large compared with , and we are making really quite a strong assumption about the hardness of the pseudorandom generator. Specifically, we’re not just assuming that the generator has superpolynomial (in ) hardness: we’re assuming that its hardness is at least for some small positive constant . That way the hardness can easily be comparable to . So there isn’t some clever way of “dropping down a level” from subsets of to subsets of or anything like that.

The third thing that got in the way of my understanding the proof *is* connected with levels. It’s surprising how often something easy can feel hard because it is talking about, say, sets of sets of sets. Here it is important to get clear about what we’re about to discuss, which is a sequence of probability distributions of Boolean functions from to . This shouldn’t be *too* frightening, since we’ve already discussed two such probability distributions: the uniform distribution and the distribution where you pick a random and take the function . What we’re going to do now is create, in a natural way, a sequence of probability distributions that get gradually more and more random. That is, we’ll start with the distribution that’s uniform over all the , and step by step we’ll introduce more randomness into the picture until after steps we’ll have the uniform distribution over all functions from to .

I’m going to describe the sequence of distributions in a slightly different way from the way that Razborov and Rudich describe it. (In particular, their distributions start with the uniform distribution and get less and less random.) I don’t claim any particular advantage for my tiny reformulation — I just find it slightly easier. I also feel that if I’ve reworked something in even a minor way then it’s a sign that I understand it, so to a large extent I’m doing it for my own benefit.

First of all, let us take the set of binary sequences of length less than or equal to . These sequences form a binary tree if we join each sequence to the two sequences you get by appending a 0 and a 1. Let us take a sequence of trees , where consists just of the root of (that is, the empty sequence) and is the full tree , and let us do so in such a way that each is obtained from by finding a leaf and adding its children: that is, picking a sequence in that is not a sequence of length and is not contained in any other sequence in and adding the sequences and . It is not hard to see that this can be done, and since we add two sequences at a time and get from the tree that consists just of the empty sequence to the tree of all binary sequences of length at most , there are trees in this sequence of trees.

Given a tree , we create a probability distribution on the set of functions from as follows. For every we let be maximal such that the subsequence belongs to . We then take a random , apply the composition , and take the first digit of the result. If then we interpret this to mean that we simply pick a random . An important point to be clear about here is that the random points *do not depend on* . So what I should really have said is that for each vertex of we pick a random . Then for each we find the maximal subsequence that belongs to , apply to the composition , and pass to the first digit. If we interpret the composition as the identity function, so we simply take .

Note that if , then this is just applying the composition to and taking the first digit, which is exactly what it means to take a random function of the form . Note also that if , then for every all we do is take , which is another way of saying that whatever is, we choose a random and take its first digit, which of course gives us a function from to chosen uniformly at random. So it really is the case that the first distribution in our sequence is uniform over all and the last distribution is uniform over all functions from to .
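Here is a small sketch of the sequence of trees and of sampling a function from the distribution attached to one of them. As assumptions for illustration, SHA-256 again stands in for the generator, leaves are expanded in breadth-first order (any order that expands one leaf at a time would do), and the names `tree_sequence` and `sample_hybrid` are hypothetical.

```python
import hashlib
import random

SEED_BYTES = 16  # arbitrary seed length chosen for this sketch


def double(x):
    """Stand-in length-doubling map: 16 bytes in, two 16-byte halves out."""
    digest = hashlib.sha256(x).digest()
    return digest[:SEED_BYTES], digest[SEED_BYTES:]


def tree_sequence(n):
    """Trees as sets of bit-strings; each step gives one leaf its two children."""
    tree = {""}
    trees = [frozenset(tree)]
    frontier = [""]
    while frontier:
        s = frontier.pop(0)
        if len(s) == n:
            continue  # sequences of full length are never expanded
        tree |= {s + "0", s + "1"}
        frontier += [s + "0", s + "1"]
        trees.append(frozenset(tree))
    return trees


def sample_hybrid(tree, n, rng):
    """Sample a function on n-digit strings from the distribution for `tree`."""
    # One independent random string per vertex, chosen before any input is seen.
    labels = {s: bytes(rng.getrandbits(8) for _ in range(SEED_BYTES))
              for s in sorted(tree)}

    def f(u):
        # Longest initial segment of u that is a vertex of the tree.
        k = max(j for j in range(n + 1) if u[:j] in tree)
        state = labels[u[:k]]
        for digit in u[k:]:  # remaining digits use the generator
            left, right = double(state)
            state = left if digit == "0" else right
        return state[0] >> 7

    return f


trees = tree_sequence(3)
print(len(trees))  # 8 trees for n = 3
f = sample_hybrid(trees[4], 3, random.Random(0))
print([f(format(i, "03b")) for i in range(8)])
```

With the tree that contains only the empty sequence, every value is computed from the single root string, and with the full tree every input reads its own independent random string. Those are the two extreme distributions discussed above.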

Now let’s think about the difference between the distribution that comes from and the distribution that comes from . Let and be the binary sequences that belong to but not to . Let us also write for the random element of associated with any given 01-sequence . Let be the length of . Then the one thing we do differently when evaluating the random image of is this. If has as an initial segment, then instead of evaluating the first digit of we evaluate the first digit of . If does not have as an initial segment, then nothing changes.

Note that the first of these evaluations can be described as the first digit of . The basic idea now is that if we can distinguish between that and , then we can distinguish between and . But is a purely random sequence in whereas is a random output from the pseudorandom generator.

Let us now remind ourselves what we are trying to prove. Suppose that is a simplicity property that can be computed in time (which is another way of saying that it can be computed in time that is polynomial in ). By “simplicity property” I mean that holds whenever has circuit complexity at most , where is the polynomial function described earlier, and does not hold for almost all functions. Actually, we can be quite generous in our interpretation of the latter statement: we shall assume that if is a purely random function, then holds with probability at most .

If has those two properties, then holds for every , and therefore

I wrote there to mean the probability when is chosen uniformly at random, and to mean the probability when is chosen uniformly at random.

Let me now write for the probability distribution associated with . From the above inequality and the fact that there are of these distributions it follows that there exists such that

That is, the probability that holds if you choose a function randomly using the st distribution is greater by than the probability if you use the th distribution.

What we would like to show now is that this implies that the hardness of the pseudorandom generator is at most . To do that, we condition on the values of for all sequences other than , and . (Recall that was defined to be the unique sequence such that and belong to but not to .) By averaging, there must be some choice of all those sequences such that, conditioned on that choice, we still have

We now have a way of breaking the pseudorandom generator . Suppose we are given a sequence and want to guess whether it is a random sequence of the form (with chosen uniformly from ) or a purely random element of . We create a function from to as follows. For each , let be the maximal initial segment of that belongs to . If is not equal to , then take the first digit of , where is the length of and is the fixed sequence from the set of sequences on which we have conditioned. If and has length , then apply to the left half of if the next digit of is 0 and to the right half of if it is 1. Then take the first digit of the result.

If is a random sequence of the form , then what we are doing is choosing a random and taking the first digit of . Therefore, we are choosing a random function according to the distribution , conditioned on the choices of . If on the other hand is a purely random sequence in , then we are choosing a random function according to the distribution under the same conditioning. Since the probabilities that holds differ by at least and can (by hypothesis) be computed in time , it follows that the hardness of is at most .

Since for an arbitrarily small constant , it follows that if there is a polynomial-time-computable property that distinguishes between random and pseudorandom functions from to , then no pseudorandom generator from to can have hardness greater than .

A small remark to make at this point is that the hardness of the generator needs to be defined in terms of circuit complexity for this argument to work. Basically, this is because it is not itself that we are using to distinguish between random and pseudorandom sequences but a function that is created out of (using in particular lots of restrictions of the random ) in a not necessarily uniform way. So even if can be computed in polynomial time, it does not follow that there is an *algorithm* (as opposed to circuit) that will break the generator in time .

Recall that earlier I proposed a way of getting round the natural-proofs barrier and proceeded to argue that it almost certainly failed, for reasons very similar to the reasons for the natural-proofs barrier itself. The question I would like to consider here is whether that argument can be made to rely on the hardness of factorizing rather than on the hardness of a problem based on 3-bit scramblers that does not, as far as I know, connect to the main body of hardness assumptions that are traditionally made in theoretical computer science.

Here is an informal proposal for doing so. Let , let , identify the points of with graphs on (labelled) vertices in some obvious way and let take the value 1 if the corresponding graph contains a clique of size and 0 otherwise. Also, let be the function that is 1 if the th edge belongs to the graph and 0 otherwise. Those functions are chosen so that the function is (trivially) an injection.

Now there is a concept in theoretical computer science called a *pseudorandom permutation* from to . I’ll define it properly in a moment, but for now it’s enough to say that if you try to invent the definition for yourself, you’ll probably get it right. Roughly, however, it’s a permutation of that depends on a randomly chosen string in for some suitable and is hard to distinguish from a purely random permutation of . Importantly, pseudorandom permutations exist if pseudorandom functions exist (I’ll discuss this too), as was shown by Luby and Rackoff.

So let’s compose the function with a pseudorandom permutation , obtaining a function .

Actually, one thing you might not guess if you try to define a pseudorandom permutation is that the permutation *and its inverse* should both be efficiently computable functions. Because of that, if we are provided the values of for a graph , we can easily tell whether or not contains a clique of size : we just compute , which gives us , and look at the first digit.

Now let’s suppose that we have a progress-towards-cliques property that takes the value 1 for a sequence of functions if, given it is easy to determine whether contains a clique of size , and suppose that does not apply to almost all sequences of functions. That is, let us suppose that if is a purely random sequence of functions (subject to the condition that the resulting function to is an injection) then the probability that satisfies is at most .

Next, suppose that we have a permutation and we want to guess whether it has been chosen randomly or pseudorandomly. Composing it with the function and applying to the resulting function, we get 1 if has been chosen pseudorandomly. Note that to do this calculation we need at most steps to calculate and, if is polynomial-time computable, at most steps to determine whether holds. So if has hardness at least , it follows that the probability that a random injection yields functions that satisfy is at least . For sufficiently large , this is a contradiction.

I’m fairly confident that I can make the above argument precise and rigorous, but it may be a known result, or folklore, or uninteresting for some reason I haven’t noticed. If anyone who knows what they are talking about thinks it’s worth my turning it into a note and at least putting it on the arXiv, then I’m ready to do that, but perhaps it is better left in its current slightly informal state.


I have a secondary motivation for the posts, which is to discuss a way in which one might try to get round the natural-proofs barrier. Or rather, it’s to discuss a way in which one might initially think of trying to get round it, since what I shall actually do is explain why a rather similar barrier seems to apply to this proof attempt. It might be interesting to convert this part of the discussion into a rigorous argument similar to that of Razborov and Rudich, which is what prompts me to try to understand their paper properly.

But first let me take a little time to talk about what the result says. It concerns a very natural (hence the name of the paper) way that one might attempt to prove that P does not equal NP. Let be the set of all Boolean functions . Then the strategy they discuss is to show on the one hand that all functions in that can be computed in fewer than steps have some property of “simplicity”, and on the other hand that some particular function in NP does not have that simplicity property.

Now if one wants to design a proof along those lines, it is important that the simplicity property shouldn’t be *trivial*. By that I mean that it shouldn’t be a property such as “can be computed in fewer than steps”. The good news about that kind of property is that it is probably true that it distinguishes between your favourite NP-complete function and functions that can be computed in fewer than steps. But the obvious bad news is that proving that this is the case is trivially equivalent to the problem we are trying to solve.

The moral of that silly example is that we are looking for a property that is in some sense *genuinely different* from the property of being computable in at most steps. Without that, we’ve done nothing.

There are plenty of other silly examples, like “can be computed in fewer than steps”, which are slightly less trivial but unsatisfactory in exactly the same way. So what we really want is some kind of simplicity property that isn’t obviously to do with how easy a function is to compute — it is that air of tautology that makes certain properties useless for our purposes.

Now one way that we might hope to make the simplicity property non-trivial in this sense is if it is somehow simpler to deal with than the property of being computable in a certain number of steps.

Let me pause here to stress that there are two levels at which I am talking about simplicity here. One is the level of Boolean functions: we want to show that some Boolean functions are simple and some are not, according to some as yet unformulated definition of simplicity. The other is the level of *properties* of Boolean functions: we want our simplicity property to be in some sense simple itself, so that it doesn’t have the drawback of the tautologous examples. So one form of simplicity concerns subsets of (which are equivalent to Boolean functions in ) and the other concerns subsets of , which can be identified with subsets of , a -dimensional discrete cube.

Why do we want the simplicity property to be itself simple? There are two potential advantages. One is that it will be easier to prove that some Boolean functions are simple and other Boolean functions are not simple if the simplicity property is not too strange and complicated. The other is that if the simplicity property is simple, then it will give us some confidence that it is not one of the semi-tautologous properties that get us nowhere.

This second point isn’t obvious — how do we know that the property of being computable in at most steps corresponds to a very complicated subset of the -dimensional Boolean cube? The result of Razborov and Rudich gives a surprisingly precise answer to this question, but if you just want to convince yourself that it is probably true, then a short and easy nonrigorous argument is enough, and provides a good introduction to the slightly longer rigorous argument of Razborov and Rudich.

The basic philosophy behind the argument is this: a random efficiently computable function is almost impossible to distinguish from a random function. So if we let be the subset of that consists of all Boolean functions computable in at most steps, then looks very like a random subset of . (Recall that is the set of *all* Boolean functions on , so is a set of size .)

Let me briefly argue *very* nonrigorously (this is not the nonrigorous argument I was talking about two paragraphs ago, but an even vaguer one). A property of Boolean functions can be identified with a subset of . A *simple* property of Boolean functions can therefore be thought of as a simple subset of . A very general heuristic tells us that if is a set, is a simple subset of of density and is a “random-like” subset of of density , then has density roughly . That is, there is almost no correlation between a simple set and a random-like set. If we were to say “random” instead of “random-like”, then this kind of statement can often be proved using an easy counting argument: for each , the probability that has density significantly different from is very small. (I’m taking to be a random set of density .) Since there aren’t very many simple sets, most sets have the property that they do not correlate with *any* simple sets.
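
The heuristic about simple sets and random sets can be checked numerically in a toy setting. The sketch below is only an illustration under assumptions of its own: the "simple" sets are taken to be subcubes defined by fixing two coordinates (density 1/4), the ambient set is a 12-dimensional cube with points encoded as integers, and the random set has density about 0.3. None of these choices comes from the text.

```python
import random
from itertools import combinations

d = 12                      # dimension of the ambient cube (illustrative)
N = 2 ** d
beta = 0.3                  # target density of the random set
rng = random.Random(0)

# R: a random subset of {0, ..., N-1}, each point kept with probability beta.
R = {x for x in range(N) if rng.random() < beta}
density_R = len(R) / N


def subcube(i, a, j, b):
    """Points whose i-th bit is a and whose j-th bit is b: density 1/4."""
    return {x for x in range(N) if (x >> i) & 1 == a and (x >> j) & 1 == b}


# Largest gap, over all these subcubes A, between the density of the
# intersection of A and R and the product of the two densities.
worst = 0.0
for i, j in combinations(range(d), 2):
    for a in (0, 1):
        for b in (0, 1):
            A = subcube(i, a, j, b)
            alpha = len(A) / N
            gap = abs(len(A & R) / N - alpha * density_R)
            worst = max(worst, gap)

print(f"density of R: {density_R:.3f}, worst deviation: {worst:.4f}")
```

Because there are only a few hundred such subcubes and each individual deviation is very unlikely to be large, the worst deviation stays small, which is the counting argument in miniature.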

Suppose that that argument transferred from random sets to “random-like” sets and that the set of functions computable in at most steps is a “random-like” subset of . That will tell us that if is any simple simplicity property, then the probability that a random function in has property is almost the same as the probability that a random function in has property . It follows that if all functions in are simple (as we want for the proof strategy to get off the ground), then almost all functions in must be simple. But that’s saying that a *random* function should be simple (with high probability), which hardly sounds like the sort of simplicity property we know and love.

The statement of Razborov and Rudich’s main theorem is starting to take shape. What the above argument suggests is that if we want to use a simplicity property to show that $P\ne NP$ then we have an unwelcome choice: either has to be a strange and complicated property or almost all Boolean functions must have property . Razborov and Rudich formulate a precise version of this statement and prove it subject to the assumption that pseudorandom generators exist, an assumption that is widely believed to be true.

The previous section leaves a number of unjustified statements. Before I attempt to justify them, let me make two remarks. The first is that I haven’t said what I mean by the set of all functions computable in at most steps. Am I putting some bound on the size of the Turing machine that does the computation? If so, how?

The second remark is that for the general idea to be valid (that simple simplicity properties won’t distinguish between efficiently computable functions and arbitrary functions), it is enough if we can find *some* set of efficiently computable functions and convince ourselves that it looks like a random set. It doesn’t matter whether is the set of “all” efficiently computable functions, so we don’t have to decide what “all efficiently computable functions” even means. So that deals with the problem just mentioned.

I now want to describe a set of functions of low circuit complexity. This is not the same as low computational complexity, so the remarks I am about to make concern the question of how one might distinguish between NP and the class of functions of polynomial circuit complexity. Since functions computable in polynomial time can be computed with polynomial-sized circuits, this would be enough to show that $P\ne NP$; indeed, it is one of the main strategies for showing it.

Let us define a *3-bit scrambler* to be a function of the following form. Let be a subset of of size 3, and assume for convenience that . Let be a permutation of . (That is, it takes the eight points in and permutes them in some way — it doesn’t matter how.) Then takes an -bit Boolean sequence and “does to “. I hope that that informal definition will be enough for most people, but if you want a formal definition then here goes. Let’s define to be the projection that takes an -bit sequence to the sequence , and let’s define to be the “insertion” that takes a pair of sequences and and replaces the bits and by and , respectively. Finally, if is an -bit sequence, define to be . In other words, we isolate the bits in , apply the permutation , and then stick the resulting three bits back into the slots where the original three bits came from.

A simple example of a 3-bit scrambler is the map that takes an -bit sequence and performs the following operation. If the first three bits are , then it replaces them by ; if the first three bits are , then it replaces them by ; otherwise it does nothing.
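
For readers who prefer code to formal definitions, here is a minimal Python sketch of a 3-bit scrambler: project onto three chosen positions, permute the resulting triple, and insert it back. The function name `make_scrambler` is hypothetical, and the particular permutation used in the example (swapping 110 and 111) is chosen purely for illustration.

```python
from itertools import product


def make_scrambler(positions, perm):
    """Return the 3-bit scrambler that applies `perm` at `positions`.

    positions: three distinct indices into the bit sequence.
    perm: dict sending each 3-bit tuple to a 3-bit tuple (a bijection).
    """
    positions = tuple(positions)
    assert len(set(positions)) == 3
    assert sorted(perm.values()) == sorted(product((0, 1), repeat=3))

    def scramble(x):
        x = tuple(x)
        # Projection: isolate the three bits in the chosen positions.
        old = tuple(x[i] for i in positions)
        new = perm[old]
        # Insertion: put the permuted bits back into the same slots.
        y = list(x)
        for i, b in zip(positions, new):
            y[i] = b
        return tuple(y)

    return scramble


# Illustrative permutation of {0,1}^3: swap 110 and 111, fix everything else.
swap = {t: t for t in product((0, 1), repeat=3)}
swap[(1, 1, 0)], swap[(1, 1, 1)] = (1, 1, 1), (1, 1, 0)

scrambler = make_scrambler((0, 1, 2), swap)
print(scrambler((1, 1, 0, 1, 0)))  # only the first three bits can change
```

The bits outside the three chosen positions are never touched, which is what makes the circuit for a single scrambler of bounded size.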

It is easy to see that any 3-bit scrambler can be created using a circuit of bounded size. Therefore, a composition of 3-bit scramblers has circuit complexity at most for some absolute constant .

What’s nice about 3-bit scramblers is that they give us a big supply of pretty random looking functions of low circuit complexity: you just pick a random sequence of 3-bit scramblers and compose them. That gives you a function from to , but if you want a function from to you can simply take the first digit.
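
The recipe just described (pick random 3-bit scramblers, compose them, take the first digit) can be sketched as follows. The helper names, the seed, and the sizes (6 bits, 40 scramblers) are illustrative assumptions rather than anything specified in the text.

```python
import random
from itertools import product


def random_scrambler(n, rng):
    """A random 3-bit scrambler on n-bit sequences."""
    positions = tuple(sorted(rng.sample(range(n), 3)))
    triples = list(product((0, 1), repeat=3))
    images = triples[:]
    rng.shuffle(images)
    table = dict(zip(triples, images))

    def scramble(x):
        y = list(x)
        new = table[tuple(x[i] for i in positions)]
        for i, b in zip(positions, new):
            y[i] = b
        return tuple(y)

    return scramble


def random_composition(n, m, rng):
    """Composition of m independent random 3-bit scramblers."""
    steps = [random_scrambler(n, rng) for _ in range(m)]

    def apply(x):
        x = tuple(x)
        for step in steps:
            x = step(x)
        return x

    return apply


def first_digit_of(permutation):
    """Turn a map into {0,1}^n into a Boolean function via the first digit."""
    return lambda x: permutation(x)[0]


rng = random.Random(0)
n, m = 6, 40
sigma = random_composition(n, m, rng)
f = first_digit_of(sigma)
cube = list(product((0, 1), repeat=n))
```

Since the composition is a permutation of the cube, the resulting Boolean function takes the value 1 on exactly half of all inputs, whatever scramblers were chosen.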

Now I would like to convince you, with a complete absence of anything so vulgar as an actual proof, that a random function created in this way is hard to distinguish from a genuinely random function. Let’s think about what a 3-bit scrambler looks like geometrically. If we have the function , then there is a sense in which what it does depends only on the bits in . But what is that sense, since the image depends on all the bits of ? A nice way to look at it is this. The Boolean cube can be partitioned into eight parts according to the values at the three bits in . Each of these parts is a subcube of codimension 3. The effect of is to apply a permutation to those eight parts, which it carries out in the simplest way possible. For example, if part X is to move to part Y, then it is simply translated there: the bits inside are changed but the bits outside are not changed. So you chop up the big cube into eight bits and swap those bits around without rotating them or altering their internal structure in any way.

I like to think of this as a sort of gigantic Rubik’s cube operation. The analogy is not perfect, since rotation does take place in a Rubik’s cube. However, what the two situations have in common is a set of fairly simple permutations that can combine to create much more complicated ones. In fact, the 3-bit scramblers generate every even permutation of the set . This isn’t obvious, but isn’t a massively hard result either. It is false for 2-bit scramblers, because those are all affine over .
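
Two of the claims in this paragraph are small enough to check by machine. The Python sketch below verifies that every permutation of the four 2-bit sequences is affine over the field with two elements, that a particular 3-bit permutation is not affine, and that this permutation becomes an even permutation once it acts on four bits. The swap of 110 and 111 is an illustrative choice and all function names are hypothetical; the sketch does not attempt to verify that the scramblers generate every even permutation.

```python
from itertools import permutations, product


def xor(a, b):
    return tuple(p ^ q for p, q in zip(a, b))


def is_affine(table):
    """True if the map (a dict on k-bit tuples) is affine over F_2.

    A map f is affine exactly when f(x+y+z) = f(x)+f(y)+f(z) for all x, y, z.
    """
    pts = list(table)
    return all(
        table[xor(xor(x, y), z)] == xor(xor(table[x], table[y]), table[z])
        for x in pts for y in pts for z in pts
    )


def is_even(table):
    """True if the permutation given by `table` is even (by counting cycles)."""
    seen, cycles = set(), 0
    for start in table:
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = table[x]
    return (len(table) - cycles) % 2 == 0


two_bits = list(product((0, 1), repeat=2))
all_two_bit_affine = all(
    is_affine(dict(zip(two_bits, image))) for image in permutations(two_bits)
)

# Illustrative 3-bit permutation: swap 110 and 111, fix the other six points.
three_bits = list(product((0, 1), repeat=3))
swap = {t: t for t in three_bits}
swap[(1, 1, 0)], swap[(1, 1, 1)] = (1, 1, 1), (1, 1, 0)

# The same swap acting on the first three of four bits.
swap_on_four = {x: swap[x[:3]] + x[3:] for x in product((0, 1), repeat=4)}

print(all_two_bit_affine, is_affine(swap), is_even(swap), is_even(swap_on_four))
```

The parity check reflects a simple fact: a single transposition on three bits turns into a product of two transpositions as soon as one further bit is carried along unchanged.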

Consider now the following problem: you are given a scrambled Rubik’s cube and asked to unscramble it in at most 15 moves. The worst positions are known to need 20 moves. Of course, I’m assuming that at most 15 moves have been used for the scrambling — in fact, let’s assume that those 15 moves were selected randomly. As far as I know, finding an economical unscrambling is a hard problem, one that in general you shouldn’t expect to be able to solve except by brute force. A good reason for expecting it to be hard is that it’s very much in the territory of problems that are known to be not just hard but impossible, such as solving the word problem in groups.

And now consider a closely related problem: you are given a Rubik’s cube to which a random 15 moves have been applied, and another Rubik’s cube that is scrambled uniformly at random (that is, it is in a random position chosen uniformly from all positions reachable from the starting configuration), and are asked to guess which is which. Is there some quick way of making a guess that is significantly better than chance?

If you agree that the answer is probably no, then you should be even readier to agree that the answer is no for the corresponding problem concerning 3-bit scramblers, since those are all the more complicated. But I suppose I shouldn’t say that without providing a little bit of evidence that they really are complicated. For that I’ll refer to a paper of mine that was published in 1996, where I showed that if you compose a random sequence of 3-bit scramblers, then the resulting permutation of the Boolean cube is *almost -wise independent* for some that depends in a power-type way on and , meaning that if you choose any distinct sequences, then their images are approximately uniformly and independently distributed. This gives a reasonably strong sense in which a random composition of 3-bit scramblers looks like a random permutation of . Of course, it’s a long way from a proof that a random composition of 3-bit scramblers cannot be efficiently distinguished from a random permutation, but that’s not something we’re going to be able to prove any time soon, since it would imply that $P\ne NP$. However, it is a reassuring piece of evidence: although the idea that these random scramblings are hard to distinguish from genuinely random functions is quite plausible, it is good to have some reason to believe that this plausibility is not a mirage.

It is important to be clear here what “hard to distinguish” means, so let’s pause for a moment and think how we could distinguish between random compositions of 3-bit scramblers and genuinely random even permutations of . (Again, if you want to talk about functions to instead, then take first digits. It doesn’t affect the discussion much.) To be precise about what the problem is asking, you are given two even permutations of , one a random composition of 3-bit scramblers and the other an even permutation chosen uniformly at random. Your task is to guess which is which with a probability significantly better than 1/2 of being correct. The question is how much computer power you need to do that.

The only obvious strategy is brute force: you look at every composition of 3-bit scramblers and see whether any of the resulting permutations is equal to one of the two permutations you’ve been given. If it is, then with very high probability that’s the one that was not chosen purely randomly. (It’s possible, but extraordinarily unlikely, that a purely random even permutation just happens to be a composition of 3-bit scramblers.)

The number of compositions of 3-bit scramblers is , which is bigger than exponential, so this strategy is very expensive indeed. In fact, it’s superpolynomial not just in but also in , which is a more appropriate measure, since to specify the problem we need to specify in the region of bits of information: the values taken by the two permutations. (It’s actually more like , though that’s a slight overestimate since we know that both functions are even permutations.)

What is in terms of ? Well, let’s write . Then (here is a constant that can vary from expression to expression), so . A polynomial function of takes the form , so this is distinctly bigger.
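
The size comparison can be made concrete with a little arithmetic. The sketch below counts sequences of 3-bit scramblers (each scrambler is a choice of three positions and one of the 8! permutations of the eight triples) and compares the logarithm of that count with the logarithm of a polynomial in the size of the cube. The specific values (100 bits, a number of scramblers equal to the square of the number of bits, and degree 10) are illustrative assumptions, as are the function names.

```python
from math import comb, factorial, log2


def log2_num_sequences(n, m):
    """log2 of the number of length-m sequences of 3-bit scramblers on n bits."""
    return m * log2(comb(n, 3) * factorial(8))


def log2_poly_in_cube_size(n, degree):
    """log2 of N**degree, where N = 2**n is the number of points of the cube."""
    return degree * n


n = 100
m = n ** 2  # hypothetical: the number of scramblers grows like a power of n
print(log2_num_sequences(n, m))          # grows like m * log(n)
print(log2_poly_in_cube_size(n, 10))     # grows linearly in n
```

The count of sequences is only an upper bound on the number of distinct compositions, but it is the quantity that governs the cost of the brute-force search.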

I said that this part of the post would not be rigorous, but that is slightly misleading, since I *have* just proved something rigorous: that *if* being able to detect the output of a 3-bit scrambler with probability better than chance is a hard problem, in the sense that the best algorithm is not much better than brute force, then the ugly choice described earlier really is necessary: if you want a property that distinguishes between functions computable by polynomial-sized circuits and arbitrary functions, then either that property will have to be one that cannot be computed in polynomial time (as a function of ) or it will have to apply to almost all functions.

The drawback with this argument is that its interest depends on the unsupported assertion that the 3-bit-scrambler problem is hard. What Razborov and Rudich did was similar, but they used a different assertion — also unproved, but more convincingly supported — namely that factorizing is hard.

Before I get on to how Razborov and Rudich did that, I want to discuss an approach to showing that $P\ne NP$ that initially appears to get round the difficulty I’ve just described. Recall that the difficulty is this. If is a property of Boolean functions that applies to all functions of circuit complexity at most , then if certain problems that look very hard really are very hard, it follows that either is not computable in polynomial time (as a function of ) or applies to almost all functions.

In the latter case, it seems unreasonable to think of as a “simplicity” property. But so what? Do we need a simplicity property? Another idea is to have what one might think of as a “making-progress” property. Suppose, for example, that we are trying to prove that the problem of detecting whether a graph has a clique of size is of superpolynomial circuit complexity. Perhaps we could define some kind of measure that we could apply to Boolean functions, such that the higher that measure was, the more information the Boolean functions would, in some sense, contain about which graphs contained cliques of size and which did not.

There is a well-known argument that instantly kills this idea. Let’s suppose that our measure of progress towards detecting cliques is not completely stupid. In that case, a random function will, with very high probability, have made absolutely no progress towards detecting cliques. But now let be the function that’s 1 if your graph contains a clique of size and 0 otherwise, and let be a Boolean function chosen uniformly at random. Then the function is also a Boolean function chosen uniformly at random. So and have, individually, made no progress whatever towards detecting cliques. However, , so in one very simple operation, the exclusive OR, we get from no progress at all to the clique function itself.
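
Here is a small Python illustration of the exclusive-OR argument, using graphs on four vertices and cliques of size three so that everything can be enumerated. The names `f`, `g` and `h` (for the clique function, the random function, and their exclusive OR) are labels chosen for this sketch, as are the graph size and the random seed.

```python
import random
from itertools import combinations, product


def clique_function(num_vertices, k):
    """Return (f, n): f maps an n-bit edge indicator to 1 iff there is a k-clique."""
    edges = list(combinations(range(num_vertices), 2))
    index = {e: i for i, e in enumerate(edges)}

    def f(x):
        return int(any(
            all(x[index[e]] for e in combinations(vertices, 2))
            for vertices in combinations(range(num_vertices), k)
        ))

    return f, len(edges)


rng = random.Random(1)
f, n = clique_function(4, 3)
cube = list(product((0, 1), repeat=n))

g = {x: rng.randrange(2) for x in cube}      # uniformly random Boolean function
h = {x: f(x) ^ g[x] for x in cube}           # also uniformly random on its own
recovered = {x: g[x] ^ h[x] for x in cube}   # one XOR gives back the clique function
```

Each of `g` and `h` is, taken alone, a uniformly random function, yet a single exclusive OR of the two recovers the clique function exactly.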

But does that really kill the idea? A natural response to this example is to think not about individual functions but about *ensembles* of functions. Is there a useful sense in which, while neither nor on its own carries any information about whether graphs contain cliques, the pair does?

There is obviously *some* sense in which the pair contains this information, since if you are given the functions and you can easily determine whether a graph contains a clique of size . However, we would like to generalize this very simple example. Here is a strategy one might try to adopt to prove that $P\ne NP$.

1. Choose your favourite NP-complete function, such as the clique function.

2. Define a “clique usefulness” property on ensembles of functions: roughly speaking this would tell you, given a set of Boolean functions, whether it had any chance of helping you determine in a short time whether a graph contains a clique of size .

3. Prove that the set of coordinate functions (that is, the functions defined by ) does not have the clique usefulness property.

Note that if we do things backwards like this, focusing very much on the target (to detect cliques) rather than the initial information (whether or not each edge belongs to the graph), then the property of “getting close to the target” is naturally small. So could we use this kind of idea to get round the difficulty that any reasonably simple simplicity property has to apply to almost all functions?

I think the answer is no, for reasons that are fairly similar to the reasons discussed in the previous section. Again I’ll use 3-bit scramblers to make my point. Let’s suppose that we have a property that applies to ensembles of functions, and that measures, in some sense, “how much information they contain about cliques”. Now let me define a collection of ensembles of functions using 3-bit scramblers. I’ll start with the clique function itself, which I’ll call , and I’ll also take some random Boolean functions . (It isn’t actually important that there are functions, but there should be around .) Putting those functions together gives me a function . Now I’ll simply compose with a random composition of 3-bit scramblers. That is, I’ll let be random 3-bit scramblers (with ) and I’ll define to be the Boolean function .

Suppose I know and the functions . Then it is easy to reconstruct , since I can just take the composition . Thus, if I am given the Boolean functions , then with the help of a polynomial-sized circuit (to calculate the composition of the inverses of the 3-bit scramblers) I can reconstruct . Taking the first digit, I find out whether or not my graph contains a clique of size .
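
The reconstruction step can be sketched in Python on a toy instance: graphs on four vertices (so six edge bits), the triangle-detection function as the first coordinate, five random Boolean functions as the remaining coordinates, and thirty random 3-bit scramblers. All of these sizes, the seed, and the names (`F`, `u`, `reconstruct`, `random_scrambler_pair`) are illustrative assumptions.

```python
import random
from itertools import combinations, product


def random_scrambler_pair(n, rng):
    """Return (scramble, unscramble) for a random 3-bit scrambler on n bits."""
    positions = tuple(sorted(rng.sample(range(n), 3)))
    triples = list(product((0, 1), repeat=3))
    images = triples[:]
    rng.shuffle(images)
    forward = dict(zip(triples, images))
    backward = dict(zip(images, triples))

    def act(table):
        def apply(x):
            y = list(x)
            new = table[tuple(x[i] for i in positions)]
            for i, b in zip(positions, new):
                y[i] = b
            return tuple(y)
        return apply

    return act(forward), act(backward)


rng = random.Random(2)
edges = list(combinations(range(4), 2))
n, m = len(edges), 30
cube = list(product((0, 1), repeat=n))


def f(x):
    """1 if the graph with edge indicator x contains a triangle, else 0."""
    present = {e for e, bit in zip(edges, x) if bit}
    return int(any(set(combinations(t, 2)) <= present
                   for t in combinations(range(4), 3)))


g = [{x: rng.randrange(2) for x in cube} for _ in range(n - 1)]


def F(x):
    """The clique function followed by the random functions."""
    return (f(x),) + tuple(gi[x] for gi in g)


pairs = [random_scrambler_pair(n, rng) for _ in range(m)]


def u(x):
    """The scrambled ensemble: F followed by all the scramblers in order."""
    y = F(x)
    for scramble, _ in pairs:
        y = scramble(y)
    return y


def reconstruct(x):
    """Undo the scramblers in reverse order to get F(x) back from u(x)."""
    y = u(x)
    for _, unscramble in reversed(pairs):
        y = unscramble(y)
    return y
```

The undoing costs one bounded-size step per scrambler, which is why a polynomial-sized circuit suffices once the scramblers are known.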

Therefore, any “clique usefulness” property is going to have to do something that looks rather hard: it must distinguish between ensembles produced in the manner just described, and genuinely random ensembles of Boolean functions. Note that what is not random about the functions is not the functions themselves but the very subtle dependencies between them.

There is a slightly unsatisfactory feature of this problem, which is that it depends on a very specific function, namely the clique function. Also, when we create the function , we don’t create a bijection, since it is not the case that exactly half of all graphs contain a clique of size . To deal with the latter criticism, let’s increase the number of random functions, so now we start with for some that’s large enough that is an injection. (It won’t have to be very large for this — linear in will be fine.) Now compose with random 3-bit scramblers, where the sets are subsets of . The result of this is some functions .

The problem we would now like to solve is this. Given the functions , find a sequence of 3-bit scramblers defined on the Boolean cube such that, writing for the composition and for the function (so and ), we have if and only if contains a clique of size .

This is a special case of the following problem. Suppose you are given sequences of points and of . Does there exist a composition of 3-bit scramblers such that for every and for every ?

Actually, that isn’t quite the problem that’s of interest, but it is very closely related. The real problem is more like this. Suppose you are given points and in and told that one of the following two situations holds. Either they have been chosen randomly or we have chosen randomly with first coordinate 1 and randomly with first coordinate 0, and taken a random composition of 3-bit scramblers, setting and for each . Can you efficiently guess which is the case with a chance of being correct that is significantly different from 1/2 without using vast amounts of computer power?

This doesn’t look at all easy, so it looks very much as though something rather similar to the natural-proofs statement holds in this reverse direction as well. It would say something like this. Suppose that you have some polynomially computable property (for “informativeness”) of sets of functions , such that has property whenever the clique function (or any other NP function of your choice) can be efficiently computed given the values of . Then almost every sequence of functions has property . The “proof” is similar to the earlier argument: a polynomially computable property can’t distinguish, even statistically, between genuinely random sets of functions and random sets of functions that have been cooked up to have just enough dependence to be informative about cliques. Since all the latter must have property , almost all the former must have property as well.

In the next post I’ll turn to the actual argument of Razborov and Rudich.


If you are interested enough to look at the preprint and find that you spot a typo or more serious error, we would of course be very grateful to be told.


One way to get some further insight into the proof of Borel determinacy is to look at some auxiliary games that don’t work as lifts. A particularly natural candidate is this. Let be an infinite (pruned) tree, let and let be the game . Now define a game as follows. Player I plays a move of the form , where is a strategy for with first move . Player II plays a move of the form , where is a strategy for and . Thereafter, the two players must play the strategies and .

Clearly the outcome of this game is decided after the first two moves: if beats then Player I wins and otherwise Player II wins.

Now let’s try to map strategies to strategies. Given a strategy for Player I for , let the first move be . Then it makes sense to let be the image of . Does this give us the lifting property? Well, if is a run of with Player I playing the strategy , then there must be some Player II strategy for (in fact there are several) such that (that is, if Player I plays and Player II plays then is the sequence produced). So if Player II plays (where is the second term of ) as his first move in the result of the run of when projected to will be the sequence . So far so good.

The problem comes when we try to map Player II strategies. Let be a strategy for Player II in . This can be thought of as a map that takes Player I strategies for to Player II strategies for . (Player I’s first move is of the form , but actually is determined by . Likewise, is determined by and .) Given such a map, how are we supposed to create a strategy out of it?

Let’s think about the set of all possible runs of the game that can result if Player II uses the strategy . They are the sequences of the form . In order to be able to find a suitable image for , we need to be able to find a strategy such that every possible run of if Player II uses gives us a sequence of the form . In other words, we need Player II to have a winning strategy for this set of sequences. Note that we need to be able to do this for *every* strategy — that is, every function from Player I strategies to Player II strategies.

Now the one thing that matters about is the sequence and the one thing we know about that is that it is a sequence that can result if Player I plays the strategy . Since is arbitrary, we need to show that if is any set of sequences with the property that for every Player I strategy there is at least one that can result if Player I plays , then Player II has a winning strategy for . But the assumption we are making about is precisely that Player I does *not* have a winning strategy for . (Proof: if is any strategy for Player I, there is, by assumption, a sequence not in that can result if Player II plays appropriately.)

So we find that what we are asking for is precisely this: that if is any set for which Player I does not have a winning strategy, then Player II has a winning strategy for . (Here is playing the role of above.) But this is the statement that all sets are determined, otherwise known as the axiom of determinacy. And of course, if we have that, then we don’t need to go to any trouble to prove that all Borel sets are determined.

A small additional remark here is that, as I’ve sort of said but not fully spelt out, we could change the rules of the game so that Player II plays not a strategy but just some sequence that is consistent with . In other words, we don’t actually care about : we just care about .

Yet another remark is that in the argument above, the payoff set played no role. That’s because the proof that the game is decided after the first move of each player is particularly simple: the entire sequence is decided after these moves, and therefore whatever set we take, we know after those moves which player is destined to win. In other words, the map above from strategies to strategies simultaneously lifts all games to clopen games. So it gives us a neat way of proving the axiom of determinacy, using … er … the axiom of determinacy.

In the previous example, we showed that if all games are determined, then all games can be lifted to clopen games, and indeed to games that are decided after the first move of each player.

What if we try to prove the much stronger result that every determined game can be lifted? This is stronger because we are not allowed to use the determinacy of *all* sets in the proof, as we did above in an absolutely critical way, but instead just the determinacy of the payoff set in the game we are trying to lift.

Let the game be as above. Then the initial data we have consists of , , and either a winning strategy for Player I or a winning strategy for Player II. We now have to devise an auxiliary game. I’m just going to write down the obvious thing, with no thought of whether it works — and then show that it doesn’t work.

Let’s define as follows. If Player I has a winning strategy for , then we pick one and let be an arbitrary run of . That is, is played on the tree of all paths in that are consistent with and all paths are considered wins for Player I. So this game is in fact decided before it even starts.

Does the lifting property hold? Well, there is only one possible strategy for Player I, since every point at even distance from the root has just one successor. The obvious strategy to map that to is , and then trivially every run of with Player I playing comes from a run of .

How about the other way round? A strategy for Player II in the game can be thought of as a sequence that is consistent with . What should we map that to? We need to map it to a strategy that guarantees that the resulting sequence will be . But that’s obviously out of the question: Player I can easily play in a way that doesn’t produce . (Note that in there is no reason for Player I to apply the strategy . Indeed, she doesn’t even need to apply a winning strategy.)

What went wrong here, when it was all looking so good for Player I? The rough answer is that the game restricts far too much what Player I can do and places no restrictions on Player II. That’s all very well when we are trying to show that Player I has a winning strategy for the set of sequences that can result from a run of , but when we try to show the same for Player II, the fact that Player I’s moves are so circumscribed in is a disaster: it means that it is easy for Player I to make moves in that couldn’t have come from a run of .

Suppose we try to imitate Martin’s proof that closed sets can be lifted, but show instead that *all* sets can be lifted. We define the auxiliary game as follows. Player I declares a quasistrategy . Player II then declares either an infinite sequence that’s compatible with and belongs to or a quasistrategy for the tree defined by such that all possible outcomes lie in . The difference between this and what we did before is that the sequence is infinite, rather than being a finite sequence with all its continuations in .

Actually, this is rather similar to the first attempt. The differences are that Player I declares a quasistrategy rather than a strategy, and instead of sequences that end up in , Player II plays quasistrategies that end up entirely in . But I think these will turn out to be minor variations, so I won’t discuss this after all — I’m almost certain that this is another example that works, but only if you assume the axiom of determinacy.

There is a way of thinking of determinacy that I find quite helpful. Let us think of the payoff set as a 2-colouring of , the set of infinite paths in . To make this more vivid, I’ll reword it in colouring terms, so let’s say that we have coloured the paths in with two colours, red and blue, the red paths being precisely the paths in .

A typical Ramsey theorem tells us that we can find a monochromatic substructure of some kind — that is, a substructure that is nice in some way and has all its elements of the same colour. Sometimes we consider *off-diagonal* Ramsey theorems, where we want either a red substructure of one kind or a blue substructure of another kind.

Let’s define a I-tree to be a subtree of that contains the root and has the property that each vertex at even distance from the root has precisely one successor in , whereas for each vertex of at odd distance from the root, the successors in are the same as the successors in . Every strategy for Player I defines a I-tree — the tree of all finite paths that can be obtained if Player I plays with the strategy . The strategy is a winning strategy if and only if all infinite paths in the corresponding I-tree are red.
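Here is a small sketch of the I-tree of a strategy, truncated to a finite depth so that it can actually be computed. The truncation, the binary move set and the names `i_tree`, `is_winning`, `copycat` are all assumptions made for illustration; a "red" leaf stands in for a red infinite path.

```python
MOVES = (0, 1)


def i_tree(sigma, depth):
    """All finite paths of length <= depth consistent with Player I's strategy sigma."""
    paths = {()}
    frontier = [()]
    for level in range(depth):
        next_frontier = []
        for path in frontier:
            if level % 2 == 0:
                # Even distance from the root: exactly one successor, chosen by sigma.
                children = [path + (sigma(path),)]
            else:
                # Odd distance from the root: every successor in the full tree.
                children = [path + (move,) for move in MOVES]
            next_frontier.extend(children)
        paths.update(next_frontier)
        frontier = next_frontier
    return paths


def is_winning(sigma, depth, is_red):
    """In the truncated game, sigma wins iff every leaf of its I-tree is red."""
    return all(is_red(path) for path in i_tree(sigma, depth) if len(path) == depth)


def copycat(path):
    # Hypothetical strategy: open with 0, then repeat Player II's last move.
    return path[-1] if path else 0


tree = i_tree(copycat, 4)
print(len(tree))
print(is_winning(copycat, 4, lambda path: path[2] == path[1]))
```

With the colouring "the third move equals the second move", the copycat strategy is winning; with a colouring that depends on Player II's last move it is not.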

Similarly, we can define II-trees, and once we have done so, the statement “ is determined” is equivalent to the statement, “Either there is a I-tree such that is red or there is a II-tree such that is blue.”
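For games of finite length this off-diagonal statement is easy to check by backward induction, and the following sketch does so by brute force. It is only a finite toy: the depth, the binary move set and the function names are assumptions, and it says nothing about the infinite case, which is where all the difficulty lies. The first function tests for a I-tree whose leaves are all red, the second for a II-tree whose leaves are all blue.

```python
from itertools import product

MOVES = (0, 1)
DEPTH = 3


def one_forces_red(position, red):
    """True if some I-tree below `position` has only red leaves."""
    if len(position) == DEPTH:
        return position in red
    outcomes = [one_forces_red(position + (move,), red) for move in MOVES]
    # Player I picks one successor at even depth; Player II's moves are all kept.
    return any(outcomes) if len(position) % 2 == 0 else all(outcomes)


def two_forces_blue(position, red):
    """True if some II-tree below `position` has only blue leaves."""
    if len(position) == DEPTH:
        return position not in red
    outcomes = [two_forces_blue(position + (move,), red) for move in MOVES]
    return all(outcomes) if len(position) % 2 == 0 else any(outcomes)


leaves = list(product(MOVES, repeat=DEPTH))
determined = True
for colours in product((True, False), repeat=len(leaves)):
    red = {leaf for leaf, is_red in zip(leaves, colours) if is_red}
    # Determinacy: exactly one of the two structures exists.
    if one_forces_red((), red) == two_forces_blue((), red):
        determined = False
print(determined)
```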

The monochromatic structures we are looking at for this off-diagonal Ramsey theorem are particularly nice sets of paths.

There are quite strong parallels here with the infinite Ramsey theorem of Galvin and Prikry. Let us write for the set of all infinite subsets of . Then the Galvin-Prikry theorem says (or at least implies) that for every Borel subset of (in the product topology) there is an infinite subset such that either or . As with determinacy, to find any set that fails this theorem one needs to use the axiom of choice (or, to be more accurate, something like a non-principal ultrafilter, which is a strictly weaker assumption than the axiom of choice, but most naturally proved using the axiom of choice).

Going back to Martin’s proof itself, let’s try to understand what would go wrong if the players declared strategies in the places that they actually declare quasistrategies. We have sort of seen this — there are places in the proof where we need to deem a player to have played a certain quasistrategy. But was that just a convenience, or was it essential?

I’m considering the following auxiliary game. Player I’s first move is , where now is a *strategy*, and not just a quasistrategy. Player II either plays , where is a finite sequence consistent with , all of whose continuations lie in , or , where is a strategy inside the tree corresponding to .

The first thing to note here is that is effectively an infinite sequence . So let’s reword this to say that Player II plays , where is either a finite sequence with all its continuations in or an infinite sequence that lives in . In both cases, the sequence must be consistent with the strategy declared by Player I.

In the first case, the players must play along and after that are released from all obligations. In the second case, they must play along .

Now let be a strategy for Player I in this auxiliary game and let be its first move. What infinite sequences can arise? They are all sequences with one of the following two properties:

1. has an initial segment consistent with such that all continuations of lie in , and the rest of is consistent with the strategy , given the initial moves and ;

2. is consistent with and lies in .

If Player I has a winning strategy for the first class of sequence, then she plays it, and after reaching such that all continuations lie in , she does whatever dictates.

If Player I does not have a winning strategy for the first class of sequence, then in the Martin proof, she would deem Player II to be playing the canonical quasistrategy for the complement of that class. However, that option is not open to us if we aren’t allowed quasistrategies. So she has a problem.

But maybe it isn’t a huge problem, since in she will be required to play consistently with , whatever Player II does. Since is a strategy rather than a quasistrategy, we don’t have to make any choices, and therefore don’t have to “deem” Player II to have declared any particular sequence .

What I’m saying here is that in the second case Player I should simply play the strategy , but if Player II ever departs from the canonical quasistrategy and gets to a position where Player I can force a finite sequence of the first type, then Player I switches to doing that and continuing with . Otherwise, play continues for ever, consistently with and producing an element of . So Player I can deem Player II to have declared that element.

So it seems as though quasistrategies are not yet essential. But now let’s think about Player II. If is a strategy for Player II, then it defines a map from Player I strategies to finite sequences with continuations in and infinite sequences that live in , always consistent with . Let be the set of finite sequences that can arise. In Martin’s proof, we then consider the canonical quasistrategy for avoiding , if Player I has such a thing. Call this . Then Player II’s strategy for is roughly this. Assume that Player I is playing and do whatever dictates. If Player I ever departs from , then force a sequence in .

If we are forced to consider *strategies*, then the above doesn’t make sense: doesn’t dictate a response to a quasistrategy. So let’s just think about what sequences could possibly arise. For each strategy declared by Player I that doesn’t produce some , will give rise to an infinite sequence that’s consistent with .

I think that yet again we’re back in the situation where Player II does have a winning strategy for the appropriate set of sequences if we assume the axiom of determinacy, but otherwise doesn’t. To see that, just consider the case where . In this case, we never get the finite sequences — we just get for each strategy an infinite sequence consistent with , and we want to show that Player II has a winning strategy for the resulting set of sequences.

So I *think* the situation is that we need Player I to be able to play, at the very least, canonical quasistrategies for closed sets. And once that is permitted, I think it forces us to allow Player II to use quasistrategies as well. (I haven’t checked this, but at any rate the analysis for Player I above used in an important way the fact that was a strategy and not just a quasistrategy.)

This one’s a bit more of a stretch, but I find it quite useful. It takes a while to explain.

To begin with I want to consider a simple way of reformulating Ramsey statements that is sometimes useful. Let be a collection of subsets of a set . We call an *up-set* if whenever and we have . Sometimes up-sets are called *monotone* set systems. Given an up-set we can form an up-set by taking the set of all sets such that for every . We call such a set a *transversal* of . Note that , so this is a form of duality. It is trivial that . In the other direction, if , then no subset of can belong to (since is an up-set) and therefore is a transversal of that does not intersect .
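The duality is easy to check by hand on a small ground set, and here is a sketch that does it by brute force. The ground set, the generators and the function names are hypothetical choices made for illustration.

```python
from itertools import combinations


def powerset(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def up_closure(generators, ground):
    """The up-set of all subsets of `ground` containing at least one generator."""
    return {s for s in powerset(ground) if any(g <= s for g in generators)}


def transversals(family, ground):
    """All subsets of `ground` that meet every member of `family`."""
    return {b for b in powerset(ground) if all(b & a for a in family)}


ground = {0, 1, 2, 3}
family = up_closure([frozenset({0, 1}), frozenset({2})], ground)
dual = transversals(family, ground)
double_dual = transversals(dual, ground)

print(double_dual == family)
```

In this example a transversal has to contain 2 and at least one of 0 and 1, so the dual is the up-set generated by {0, 2} and {1, 2}, and taking transversals again gives back the original family.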

Now let us take a typical Ramsey statement, such as van der Waerden’s theorem. The statement that for every 2-colouring of there is a monochromatic arithmetic progression of length can be reformulated as follows. Let be the set of all subsets of that contain an arithmetic progression of length . If no set in consists entirely of blue elements, then the set of red elements is a transversal of , and conversely. Therefore, we need to prove that every transversal of contains a progression of length — that is, belongs to .
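As a sanity check on the reformulation, here is a brute-force sketch for progressions of length 3. It goes through every red set, treats it as a transversal exactly when the blue complement contains no progression, and then asks whether the red set itself contains one. The function names are hypothetical, and the search is exponential in the size of the ground set, so it is only meant for tiny cases.

```python
from itertools import combinations


def contains_ap(points, k):
    """True if `points` contains an arithmetic progression of length k."""
    points = set(points)
    if not points:
        return False
    top = max(points)
    return any(
        all(a + i * d in points for i in range(k))
        for a in points
        for d in range(1, top)
    )


def every_transversal_contains_ap(n, k):
    """Check the transversal form of van der Waerden's theorem on {1, ..., n}."""
    ground = range(1, n + 1)
    for size in range(n + 1):
        for red in combinations(ground, size):
            blue = set(ground) - set(red)
            # Red is a transversal exactly when blue contains no progression.
            if not contains_ap(blue, k) and not contains_ap(red, k):
                return False
    return True


print(every_transversal_contains_ap(9, 3))
print(every_transversal_contains_ap(8, 3))
```

The first check succeeds and the second fails, which matches the known fact that 9 is the smallest ground set for which every 2-colouring has a monochromatic progression of length 3.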

This reformulation can be applied to all (sufficiently conventional) Ramsey statements. They tell us that every transversal of a certain up-set must contain a set of a certain kind.

Now let’s consider what it means to lift a game not just to a clopen game but to a game whose outcome is decided after the first move of each player. (Recall that we showed that every closed game can be lifted to a clopen game of this kind.) I want to concentrate particularly on the condition that there should be a map from strategies to strategies such that every run of consistent with should be the image of some run of consistent with .

I want to go further still and consider the case where the tree of consists of certain paths of the form , where is a path in (and and can be thought of as “extra information” of some kind). Here belongs to some set and belongs to some set .

I haven’t thought whether this is a genuine further restriction or whether it’s more or less a WLOG. But it’s the situation I want to think about.

After the first two moves have been played, the possible further moves can be thought of as a subtree of (which itself is the set of all descendants of ). Let us call this subtree and let . Then we can think of the extra information as the tree and each as a subtree of . So the way that works is this. Player I plays a subtree of , Player II plays a subtree of , and after that the game is played in the subtree .

I stress that I certainly don’t mean that Player I plays an *arbitrary* subtree of or that Player II plays an *arbitrary* subtree of . Rather, there will be some class of subtrees that Player I is allowed to choose from, and for each there will be some class of subtrees that Player II is allowed to choose from. Let us call a subtree that belongs to the class it needs to belong to *valid*.

A small remark here is that we don’t actually need to think about the tree . We could regard Player I as playing the set and Player II choosing a tree from that set. Indeed, sometimes it is clearer to think of things this way.

It will be convenient to adopt the following definition. If is an infinite rooted tree, then I’ll define a *I-subtree* of to be a subtree that contains the root such that every vertex at even distance from the root has exactly one successor in and every vertex at odd distance has the same successors in as it has in . (Equivalently, it is the tree of finite sequences that could result from Player I playing a certain strategy.) We’ll also define a II-subtree in a similar way.

Now let us look at the lifting property. Let be a strategy for Player I for and let be its first move. Then is the subtree chosen by Player I (from a set of trees ) in which the rest of the game will take place.

What possible sequences can arise? Well, at the next move, Player II plays some , and will give Player I a strategy for playing in the tree . The set of sequences consistent with this strategy is an arbitrary I-subtree of . Or rather, it is a set of the form , where is a I-subtree of . And we need that however Player I chooses these I-subtrees , their union will contain a I-subtree of .

That is the same as saying the following. Let be a pruned subtree of such that for every , contains for some I-subtree of . Then contains a I-subtree of .

We can think of this as a weak Ramsey statement about the set of trees . Instead of saying that every transversal of this set contains a I-subtree of , we are assuming something stronger than the transversality property: we require that is not just non-empty, but contains a I-subtree of .

I won’t do it here, but if one translates this into a statement about red-blue colourings, it says something of the following flavour: either the blue set contains a structure of one kind or the red set contains *a large subset of* a structure of another kind. It is in that sense that it is a “weak” Ramsey theorem.

This may seem a bit artificial, but actually weak Ramsey statements can be useful, as I once found out myself, when to solve a problem in Banach spaces I formulated and proved a weak Ramsey statement that said something along the following lines: if you 2-colour the sequences (of a certain kind) in a Banach space, then there is a subspace such that all the sequences in that subspace are of one colour, or Player I has a winning strategy in a certain game for producing sequences of the other colour.

So far I’ve analysed the lifting property for Player I strategies. The analysis for Player II strategies is somewhat similar, but different enough that I should give the details.

A strategy for Player II in the game consists in selecting a way of choosing, for every valid subtree , a valid subtree and after that a strategy for playing the game defined by . (Again, I don’t really care what the payoff set is — strategies, quasistrategies, subtrees etc. are all defined without reference to the payoff set.) We can think of that strategy as a II-subtree of .

The property we need is this. However Player II chooses the subtrees , their union will contain a II-subtree of .

In a moment I’ll state as concisely as I can the two weak Ramsey properties that we require of our sets of valid subtrees in order to have a lift of anything at all. But before I do that, let me now comment about where payoff sets fit in. They add a huge extra constraint: if is the payoff set, then for every valid subtree we must have either or . So the question is whether we can combine this constraint with the weak Ramsey constraints. In the next section, I’ll give a couple of simple examples of classes of valid subtrees that satisfy the Ramsey constraint.

Let be a rooted pruned tree. Suppose we define an auxiliary tree as follows. Its first layer of vertices consists of labels that stand for subtrees of (where by subtree I mean pruned subtree that contains the root). Then each successor of is a label that stands for a subtree of . The descendants of each are given by a copy of .

We are interested in the following question. Under what conditions is it possible to map every strategy for to a strategy for such that every run of is the image (under the obvious map) of a run of ? The conclusion I reached was this.

1. For every Player I strategy to be mappable in this sense we need that for every the set system has the following weak Ramsey property: every tree such that contains a I-subtree of for every contains a I-subtree of .

2. For every Player II strategy to be mappable in this sense we need that if for each we pick some and a II-subtree of , then the union of the contains a II-subtree of .

Here are a couple of extreme examples to illustrate this. First, let’s take the case where the trees are all possible I-subtrees of and the trees are all possible infinite paths in . We have looked at this case already, but in a different language.

Trivially if is a subtree of and all the infinite paths in belong to , then contains a I-subtree of , since it actually equals .

As for the second property, if for each I-subtree we can find an infinite path that’s a subset of a tree , then Player I does not have a winning strategy for , which, if we allow ourselves the axiom of determinacy, tells us that Player II has a winning strategy for , which implies that contains a II-subtree of and gives us the required property.

So as we saw earlier, to show that this system of trees works requires the axiom of determinacy.

As a second example, let’s take the case where there is only one subtree and it’s the whole of , and where the subtrees are all II-subtrees of .

Now the premise of the first weak Ramsey property is that every II-subtree of contains a I-subtree that is a subset of a certain tree . But a I-subtree of a II-subtree is just an infinite path, so this says that for every . Determinacy gives us that contains a I-subtree, as required. Without determinacy we can’t conclude that it does.

The second property says in this case that if we choose a II-subtree of a II-subtree of , then it must contain a II-subtree of , which is trivial, since the only II-subtree of a given II-subtree is that II-subtree itself.

I have to admit that I had rather forgotten about that. I now want to think about how much difference it makes if we add it in. In particular, I’d like to think about the example where Player I chooses an arbitrary strategy and Player II chooses an arbitrary sequence consistent with that strategy — the game where we used the axiom of determinacy to prove the lifting property.

Given a Player II strategy for , here’s how we “defined” . We noted that gave us a map from Player I strategies (for ) to sequences compatible with those strategies, and that the axiom of determinacy implies that the resulting sequences must give us a set for which Player II has a winning strategy (since Player I trivially hasn’t got a winning strategy for the complement). The strategy depends only on the map , which depends only on what does for Player II’s first move. So it seems as though we are safe here.

That’s a relief. The situation for Player I’s strategies is even simpler. Recall that Player I just plays in the strategy that she declares in her first move. So once again, what does for the first moves depends only on what does for the first move.

In this section I want to do a rather simple exercise: I want to consider various potential monotonicity statements. Here once again are the properties I am interested in of the sets of subtrees .

1. For every the set system has the following property: every tree such that contains a I-subtree of for every contains a I-subtree of .

2. If for each we pick some and a II-subtree of , then the union of the contains a II-subtree of .

I want to answer questions like, “Does it help to have more sets ?” and “Does it make things easier if the trees are bigger?”

For property 1, since we require it to hold for every set , it is easier if there are fewer such sets. Also, if we add more trees to one of those sets, it makes the premise harder to satisfy, and therefore makes the conclusion more likely to hold. This again is helpful for property 1.

The monotonicity with respect to the trees themselves is a subtler matter. (In fact, I wrote an earlier draft of this post where the subtlety escaped me. I’ve decided in this instance that it is too much to expect someone to read through a whole lot of mistaken text before I come clean and admit that it is mistaken. So what you are getting here is not my actual thought processes but a later tidying up. But had I been more careful in the first place, then these might have been my thought processes I suppose.) The question we are asking is this. For a fixed set , what makes it easier for to contain a I-subtree of and what makes it harder? Basically, it is easier if Player I has more moves available but *harder* if Player II has more moves available.

That observation motivates the definition of two orderings on trees. Let’s write if is a subtree of with the following property: given any path in , if it ever leaves , then the first point at which it does so is at an odd distance from the root — that is, it offers Player I a move in that she does not have in . Similarly, if the same is true but with “odd” replaced by “even” (or Player I replaced by Player II). I’ll call *I/II-larger than* if .

For property 1 to be more likely to hold, we want the premise to be less likely to hold, so we want to be less likely to contain a I-subtree of . Therefore, we prefer to be I-smaller and II-bigger.

Now let us turn attention to property 2. This time it is helpful if there are more sets . It is also helpful if the sets are smaller, since that makes it harder to create a counterexample to property 2. And it is also helpful if II-subtrees are harder to come by, for which we prefer the trees to be II-smaller and I-bigger.

In every case, the monotonicity goes in opposite directions for properties 1 and 2. This is reassuring, since otherwise there would probably be some “collapse” where we would say something like, “Without loss of generality all the are equal to ,” and end up with a trivial definition.

Since putting up this post earlier today I have had the following thought. Every open set is a countable union of basic open sets, and every basic open set is also closed. Since basic open sets are clopen, they can trivially be lifted — to themselves. So why not start the “induction” at the basic open sets, and thereby avoid the need for the fairly complicated lemma that tells us that closed sets can be lifted to clopen sets?

It took me a while to realize the answer to this. It comes in the innocuous-looking final step of the argument for dealing with countable intersections. Recall that the technique was to find a *simultaneous* lifting (or rather -lifting) of all the sets to be intersected, so that they all become clopen. Then we argue that their intersection is at least closed, so it too can be lifted, by the complicated lemma. And that gives us our lift for the original intersection.

So the complicated lemma about lifting closed sets is essential at the point where we argue that an intersection of the clopen sets coming from the simultaneous lifting can be lifted.

I find this a useful point to highlight, since it shows that the lemma (lifting closed sets) is not just there to deal with the base case — it is driving the inductive step as well. So one can view the proof as a multiple (indeed, transfinitely multiple) application of that lemma.

We are in the middle of the proof that closed games can be lifted to clopen games. We have defined the game to which we will lift, and shown how to map Player I strategies for the auxiliary game to strategies for the original game . So the first thing to do in this post is to show how to map Player II strategies for to Player II strategies for .

Recall that I defined the game as follows. Let be the game (that is, the game is played on the infinite tree and is the set of infinite paths for which Player I wins). Then Player I begins by playing a pair , where is a quasistrategy for . Then Player II either plays , where is a finite sequence that is consistent with , all of whose continuations belong to , or he plays , where is a quasistrategy for the game restricted to the tree that corresponds to (which is the tree of all possible finite sequences that can result if Player I plays consistently with ), such that all runs consistent with and result in wins for Player I. In the first case, both players play along the sequence and then Player I continues playing consistently with . In the second case, Player I continues playing consistently with and Player II continues playing consistently with . Player I wins if and only if the eventual sequence belongs to .

Let be a Player II strategy for . We need to find a strategy , defined in such a way that what it does to sequences of length at most depends only on what does to sequences of length at most , such that every possible run of the game, if Player II plays using the strategy , is a sequence that can arise from .

In the previous post we ended up with a useful way of thinking about this. Let be the set of all sequences that could possibly arise if Player II uses the strategy . Then we need to prove that Player I does not have a winning strategy for obtaining a sequence outside .

At first this looks odd, since it doesn’t mention , but the point is that isn’t necessarily a winning strategy. The definition of lifting is set up in such a way that if we have a winning strategy in , then it must map to a winning strategy in . Therefore, we don’t have to think about this when verifying the lifting property.

Why is this necessary for what we need to prove? Well, if Player I plays a strategy that guarantees to produce a sequence outside , then, by definition, whatever strategy Player II plays, the resulting run of the game will not be one that results from a play of the game with Player II using the strategy .

An obvious way to go about proving that Player I does not have such a strategy is to prove that Player II has a winning strategy for producing a sequence *in* , and that is what we shall do.

What is the set ? Well, a Player II strategy for works as follows. If Player I plays a quasistrategy , then Player II must decide, based on that (and the initial point ) whether to play a pair of the form or a pair of the form , and having done that he must decide how to play consistently with the choice he has made. So is the set of all sequences such that there exists a quasistrategy (for ) such that is a valid move for Player I and one of the following two conditions holds. Either (i) dictates that Player II’s next move is , where is some initial segment of , and the rest of the sequence is played consistently with and what dictates, or (ii) dictates that Player II’s next move is , and that the rest of the sequence should be consistent with , and what dictates.

We run into a serious difficulty here. A natural approach is to define to be the set of all finite sequences that can result from case (i). That is, the set of all such that there exists an opening move in that would cause to play the move . (Here, is the second term of .) Then we can say what happens if Player II has a winning strategy for producing a sequence in , and what happens if he doesn’t have one.

But suppose he does have one. The idea would be to play the game in a way that guarantees that eventually such a sequence will be reached, after which he is guaranteed to win the game , and then do pretty much anything. He can then deem Player I to have played the move , where is a strategy that causes to respond with . But the “doing pretty much anything” part is not an option: Player II needs the rest of the sequence to be consistent with , and there is no need for Player I to continue to play consistently with this quasistrategy.

On spotting this difficulty, I realized that I had not properly understood the definition of the game . I have deliberately not gone back and changed the definition to the correct one, since these mistakes are useful for my purposes: they help to explain why the definition is as it is. The right definition of the game is in fact this.

As I’ve already said, Player I’s first move is of the form , where is a quasistrategy, and Player II’s first move is either of the form or of the form , where all continuations of belong to and all sequences consistent with and are in . However, if Player II plays a move of the first kind, then after helping to create the sequence , *Player I is not obliged to play consistently with* . A move of type releases Player I from that obligation. However, if Player II plays a move of type , then both players must play consistently with the quasistrategies that they have declared.

This adjustment to the definition of deals with the difficulty above: if Player II has a winning strategy for obtaining a sequence , then he can deem Player I to have played a sequence that would have provoked the response (given that the strategy is being used for ) and once the sequence has finished, as long as Player II does what tells him to do in the game (given the moves so far, and assuming that the first two moves were and ), the resulting sequence is indeed one that comes from a run of with the strategy being employed. The run is precisely the one that continues after with Player I playing in whatever she actually chooses to play in .

What if Player II does not have a winning strategy for obtaining a sequence in ? In that case, he must again imagine a game of going on on the side. In this game, he will want to deem Player I to have played a move of the form that will be followed (if he plays the strategy in ) by a move of the form . Then he will want Player I’s subsequent moves to be consistent with . Of course, they may not be, so if they aren’t, then he will want to get maximum advantage from this somehow.

There is a natural candidate for . If Player II has not got a winning strategy for sequences in , then (by the Gale-Stewart theorem, the result that open games are determined) Player I has a winning strategy for sequences not in . Better still, she has a *canonical quasistrategy* for this game, namely the quasistrategy “don’t move to a winning position for Player II”. This quasistrategy has a very useful maximality property that we shall exploit, which is that if Player I ever departs from it, then Player II immediately has a winning strategy. Roughly speaking, this means that Player II can deem Player I to be playing according to the canonical quasistrategy, and if she ever makes a move that is inconsistent with that quasistrategy, then Player II can change his mind and deem Player I to have played a quasistrategy that provokes a sequence in .
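To make the canonical quasistrategy concrete, here is a minimal Python sketch on a finite toy game. It is only an illustration under loud assumptions: the real games are infinite, whereas this one stops after `DEPTH` moves, and the tree, the payoff rule and the names `successors`, `player_one_wins`, `winning_for_two` and `canonical_moves` are all invented for the example.

```python
from functools import lru_cache

DEPTH = 3  # hypothetical finite horizon; the games in the text are infinite


def successors(v):
    """Hypothetical tree: every position has exactly two successors."""
    return [v + (0,), v + (1,)]


def player_one_wins(leaf):
    """Hypothetical payoff: Player I wins iff the moves sum to an even number."""
    return sum(leaf) % 2 == 0


@lru_cache(maxsize=None)
def winning_for_two(v):
    """True if Player II can force a win from position v (backward induction)."""
    if len(v) == DEPTH:
        return not player_one_wins(v)
    results = [winning_for_two(w) for w in successors(v)]
    if len(v) % 2 == 0:
        # Player I is to move: II wins only if every move leads to a II-win.
        return all(results)
    # Player II is to move: one good move is enough.
    return any(results)


def canonical_moves(v):
    """Moves allowed by "don't move to a winning position for Player II"."""
    return [w for w in successors(v) if not winning_for_two(w)]


# Maximality in the toy game: any move outside canonical_moves(v)
# lands in a position from which Player II has a winning strategy.
for v in [(), (0, 0), (0, 1), (1, 0), (1, 1)]:
    for w in successors(v):
        if w not in canonical_moves(v):
            assert winning_for_two(w)
```

In this toy game Player I moves last and can fix the parity, so the quasistrategy allows both opening moves but exactly one final move.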

Let me say that more precisely. If Player II does not have a winning strategy for the set , then let be the canonical quasistrategy for Player I in the game defined by (by which I mean the closed game in which Player I aims to create a sequence with no initial segment in ). For as long as Player I plays consistently with the quasistrategy , Player II will play as follows: play the game as if Player I has begun with the move (where is the move that Player I begins with in ) and play the corresponding points in . Note that the move will not permit Player II to play a pair of the form in , since has to be consistent with , which is a quasistrategy that explicitly *avoids* all possible sequences .

If at any stage, Player I plays a move that is *not* consistent with (and this could even be the very first move ), then Player II is in a winning position for creating a sequence in . But that’s great! Player II can revert to the original plan, get to a sequence , deem Player I to have played a move that would have provoked the move , and so on.

The point here is this. At some point, Player II has to settle on an initial move for Player I in the game . There is nothing to stop him making a decision about this and later changing his mind. However, he mustn’t do this infinitely many times, since then it is no longer clear that there is a quasistrategy that can be deemed to be part of Player I’s first move. But in the argument above, Player II changed his mind at most once, and was therefore OK.

I’ve more or less done so already, but let me spell out what happens if Player I *does* play consistently with . In that case, Player II plays a game of on the side, responding to the move and all subsequent moves that Player I makes in (which can be interpreted as moves in ). Then he plays the corresponding moves in (where “corresponding” means “the same” except in the case of the first move , which involves forgetting the quasistrategy and just playing the point ).

At the end of the previous post, I mentioned that we would need a generalization of the result that we have just finished proving, one that is amenable to diagonalizations. We will find that we want to build a single lifting out of an infinite sequence of liftings, and to do that we need those liftings to “settle down”, just as if we want a sequence of 01-sequences to converge in the product topology, we need to be able to say, for every , that all the sequences from some point on agree in their first terms.

A -*lifting* of a game to a game is a lifting with the following additional properties, which basically say that is the same as for the first moves.

1. The first levels of the tree on which is played are identical to the first levels of the tree on which is played.

2. For every at distance at most from the root, .

3. If is a strategy for , then must do the same in as does in .

We have shown that every closed game has a -lifting to a clopen game. We actually need it to have a -lifting for every . This is a straightforward adaptation of what we have already proved, so I won’t say very much about it.

But I will say a little. The game is defined as follows. As before, the tree is almost the same as , but contains a little more information. However, this time the extra information is provided by the two players not on their first moves, but at moves and . (The case discussed earlier was the case .) Thus, after the two players have played , Player I plays a move of the form , where is a quasistrategy, and Player II plays a move either of the form , where is a finite sequence that begins with and has all its continuations in , or of the form , where is a quasistrategy that will guarantee that Player II loses, assuming that Player I plays consistently with . In the first case, the two players must play along and thereafter can play how they like, and in the second case they must play consistently with their quasistrategies.

The construction of the map taking strategies to strategies is basically the same as before, except that we play the first moves of the game before getting started.

The rough idea of how to prove that a countable union of “liftable” games is a liftable game is this. Let be a tree and let be the sets corresponding to a whole lot of games on that belong to Borel classes for which we have already proved determinacy. We then lift to a tree that makes clopen. The inverse images of the remaining sets under the continuous map that takes to belong to the same Borel classes as before (since these are preserved by inverse images under continuous maps). We now repeat the process for , , and so on.

The idea is to create a lifting for which all the games are simultaneously clopen. If we just had a finite union, that would be straightforward: we would just repeat the process times. The resulting lifting would give us a union of clopen sets, which would be clopen. (There are little details to check, such as that a composition of liftings is a lifting.) However, we have a countable union, so we have an infinite sequence , where I’m using to mean “can be lifted to”. Therefore, we need to construct some kind of inverse limit of all these lifted games. That’s where -liftings come in.

In this section I’ll show how this inverse limit works. After that, the proof will be almost finished.

Suppose then that we have a sequence , where for each is an -lifting of . (Later we’ll need to add to all these s, but again I prefer to prove the initial case to keep the notation simpler and wave my hands when it comes to the general case.) We now want to build a game that is simultaneously a lift for all the .

Incidentally, the small fact I mentioned earlier — that compositions of lifts are lifts — is indeed easy to prove. Suppose, for example, that and is a strategy for . Then maps first to a strategy and then to a strategy . Given a run of where is played, there exists a run of where is played that maps to it, and then a run of where is played that maps to that.

By the definition of -liftings, we have a sequence of trees , where for each the first levels of are the same as the first levels of . We therefore have an obvious definition of a limiting tree : its first levels are given by the first levels of . Then the first levels of every tree from onwards will agree with the first levels of .
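As a toy illustration of this limiting tree (not part of the proof), the Python sketch below builds a hypothetical family of finite trees in which tree k+1 agrees with tree k on the first k levels, and reads off the first n levels of the limit from tree n. The indexing convention, the alphabets and the names `build_tree`, `restrict` and `limit_levels` are all assumptions made for the example, and the indexing in the text may differ.

```python
from itertools import product


def build_tree(k, depth=3):
    """Hypothetical k-th tree: vertices are tuples of length <= depth,
    and position p may use the symbols 0, ..., 1 + min(p, k)."""
    tree = {()}
    for length in range(1, depth + 1):
        alphabets = [range(2 + min(p, k)) for p in range(length)]
        tree.update(product(*alphabets))
    return tree


def restrict(tree, n):
    """The first n levels of a tree: all vertices at distance <= n from the root."""
    return {v for v in tree if len(v) <= n}


def limit_levels(trees, n):
    """First n levels of the limiting tree, read off from the n-th tree."""
    return restrict(trees[n], n)


trees = [build_tree(k) for k in range(4)]

# The family "settles down": consecutive trees agree on more and more levels.
assert all(restrict(trees[k + 1], k) == restrict(trees[k], k) for k in range(3))
```

Because later trees never change the first n levels, `limit_levels(trees, n)` would be unaffected by appending further trees to the list.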

How do we map to one of the trees ? Well, let us write for the restriction of a tree to the first levels. Then we need a way of mapping to . This we do by mapping to using the “identity” mapping (i.e., using the fact that the first levels of are the same as the first levels of ) and then composing that with the maps that take to to etc. all the way down to . (I could summarize what I’ve just said by saying “take the inverse limit in the obvious way”. I probably don’t even need the words “in the obvious way” but I’m trying to get away with a slightly imprecise understanding of the notion of inverse limits.)

Now suppose we are given a strategy for . (Note that I don’t need to say anything about payoff sets — a strategy is defined just in terms of the tree that the game is played on, regardless of which outcomes are judged to be wins for which player.) We need to map it to a strategy for .

Let us write for the part of the strategy that says what does for the first moves of the game (and thus the first moves of the player whose strategy it is). Then it is straightforward to map to a strategy for the first moves of the game , whenever : it just maps to the “same” strategy. And what about when ? In that case we have a map that takes strategies for to strategies for . Moreover, what does for the first moves depends only on what does for the first moves (at last we see this property coming in in an absolutely critical way, though it is a pretty natural property anyway), so it makes sense to talk about .

So the natural definition for a strategy for built out of the strategy for is as follows. If you want to know what does for the first moves of , you take the strategy , consider it as a strategy for the first moves of , and then map it to a strategy for the first moves of using the map that lifts to (which is a composition of maps that lift to ).

In a similar way we can define a strategy for each game (just by regarding as the beginning of the sequence and doing exactly what we did above — a -lifting is in particular a -lifting).

Of course, there is something to check if we define it this way: we need to know that what we claim the strategy does for the first moves is consistent with what we claim it does for the first moves when . In other words, we need to check that is the strategy for the first moves of . But this is indeed the case: to work out we take the strategy , consider it as a strategy for the first moves of , and map that to a strategy for the first moves of . Instead of the first stage, we could take the strategy , consider it as a strategy for the first moves of , map that to a strategy for the first moves of , restrict it to a strategy for the first moves, and then map it to a strategy for the first moves of . The result would be the same as both the partial strategies that we want to show are equal.

The same argument shows that all the strategies are well-defined.

How about the lifting property? Let’s suppose we run the game according to this strategy , producing a path in . We would now like to find a path in consistent with that maps to .

Recall that the first levels of are the same as the first levels of , and that for each the lifting from to is a -lifting. Therefore, to find the path we seek, we simply lift to a path in that is consistent with the strategy , lift that to a path in that is consistent with the strategy , and so on. The first points in the path do not change after we have reached , so we end up with a well-defined limiting sequence . For every , the first terms of this sequence are consistent with the strategy , which tells us precisely that the sequence as a whole is consistent with .

The above argument may look slightly complicated, but it’s really the definitions that are slightly complicated: there isn’t any point where the proof takes a surprising turn.

An additional remark we need to make is that because we need -liftings in our inductive hypothesis, we need them in the conclusion as well. This can be achieved as follows. What we have shown is that if each is -lifted to , then every is lifted to . A simple modification to the argument shows that if each is -lifted to , then every is -lifted to .

The rest of the argument is now quite easy. Suppose we have a tree and sets all of which give rise to games that can be lifted to clopen games. Suppose also that we know that any inverse image under a continuous map of one of these sets has the same property. (This is true in particular if belongs to some Borel class for which we have proved that all sets in the class can be lifted appropriately.) We then construct a sequence of games as follows. First we 1-lift to in such a way that becomes a clopen set. All the other are mapped to continuous inverse images of themselves in the new tree . We then 2-lift to in a way that makes (or rather, its inverse image) clopen. Then we 3-lift to to make clopen, and so on. Finally, we let be the inverse limit discussed in the previous section.

What we have now is a lifting of the original game to a game such that all the inverse images of the sets are clopen. It follows that the union of these inverse images is at least open. We can therefore do one final lift to a game that makes the union clopen. We have therefore lifted the original union in to a clopen set, which is what we were trying to do.

I didn’t say that last sentence very well, and in particular was mixing up my types: do we lift games, or trees, or sets, or what? But I hope the meaning is clear: as I remarked earlier, liftings don’t depend on payoff sets, so we found a lifting of the original tree to a new tree — where lifting includes the fact that strategies for map to strategies for — such that the inverse image of in is a clopen set.

Again, we need something a bit stronger: for any we need to be able to -lift to some . But that is straightforward, given the remarks above.

I haven’t actually given the final tiny step of the argument, which is to show that the lifting property is preserved when you take complements. But that follows trivially from the fact that inverse images preserve complements and the fact that complements of clopen sets are clopen.

I was planning to end this series of posts here, but have subsequently had some further thoughts about liftings that shed a little light — for me at least — on the proof. I have collected those into a fifth post, which I will put up in a few days.

Recall the definition of a *pruned tree*. This is an infinite rooted tree such that from every vertex there is at least one directed infinite path. (Less formally, if you are walking away from the root, you never get stuck.) Given such a tree , we write for the set of infinite directed paths in . If we are working in , then the tree we will work with has finite sequences as its vertices, with each sequence joined to its extensions . Then .

If and are finite paths in (I’ll stop using the word “directed”: all paths are supposed to be directed away from the root), then we write to mean that is an initial segment of and if is a proper initial segment. Also, if is a path that starts where another path ends, I’ll write for the concatenation of and . In paths are represented by integer sequences. In that case, if and , then denotes the sequence . That is, we view the path as starting at the sequence . This isn’t exactly consistent with the previous definition, but it is the only definition that makes sense, so no confusion should arise.
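For paths represented by integer sequences, these two operations can be sketched in a few lines of Python. The function names are hypothetical and paths are modelled as tuples, which is an assumption of the example rather than anything in the text.

```python
def is_initial_segment(s, t):
    """True if the finite path s is an initial segment of the path t."""
    return len(s) <= len(t) and tuple(t[:len(s)]) == tuple(s)


def is_proper_initial_segment(s, t):
    """True if s is an initial segment of t and strictly shorter than t."""
    return len(s) < len(t) and is_initial_segment(s, t)


def concat(s, t):
    """Concatenation of integer sequences: t is viewed as starting where s ends."""
    return tuple(s) + tuple(t)
```

For instance, `concat((1, 2), (3,))` gives `(1, 2, 3)`, of which `(1, 2)` is a proper initial segment.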

If is a tree and is a path that starts at the root, then denotes the tree of all such that .

We topologize by taking sets of the form as the basic open sets. Thus, a subset is open if for every there exists a finite path such that and for every infinite path such that .

Given a pruned tree and a “payoff set” , we can define a two-player game as follows. The players build up a path that starts at the root, taking turns to decide which the next vertex will be. Thus, if is the root, then the players take turns choosing subject to the condition that should be a path in for every . At the end of the game, the players have defined an infinite path . Player I wins if and otherwise Player II wins.

A game is called open/closed/Borel/etc. if and only if the payoff set is open/closed/Borel/etc. in .

A *strategy* for Player I is a function from even-length paths in that takes each such path to a path of length one greater such that . A strategy for Player II is similar but for odd-length paths.

It can be helpful to associate strategies with subtrees of . A strategy for Player I can be used to create a subtree of that consists of all vertices in that can be reached by a path consistent with , together with the edges induced by those vertices. The strategy is a winning strategy if and only if — that is, every infinite path consistent with the strategy is in the payoff set . (Everything I say about Player I can be easily adapted to corresponding statements about Player II — I won’t keep pointing this out.)

Which subtrees can be obtained in this way? A subtree is derived from some strategy for Player I if and only if it has the following property. Let us say that a vertex is a *successor* of a vertex if it is a neighbour of and is further away from the root. Then for each vertex of at an even distance from the root, there should be exactly one successor in , and for each vertex at an odd distance from the root, every successor should be in . To define a strategy from which is derived is easy. Given a vertex in at even distance from the root, let be the unique path that ends at and let be the unique path in of length one greater than . If is a path of even length not in , then can be defined arbitrarily, but this is not very important, since cannot be reached if Player I uses the strategy . Thus, there isn’t exactly a one-to-one correspondence between strategies and certain subtrees, but there is a correspondence between “the parts of strategies that actually matter” and those subtrees.
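The characterization above can be checked mechanically on finite truncations. The following Python sketch tests whether a finite set of vertices is a subtree of the required kind for Player I up to a given depth. The ambient binary tree and the names `successors` and `is_strategic` are assumptions made for the illustration.

```python
def successors(v, branching=2):
    """Hypothetical ambient tree: v has successors v+(0,), ..., v+(branching-1,)."""
    return [v + (i,) for i in range(branching)]


def is_strategic(S, depth):
    """Check that the vertex set S (tuples of length <= depth) satisfies:
    exactly one successor at even distance from the root,
    every successor at odd distance from the root."""
    if () not in S:
        return False
    for v in S:
        if len(v) > depth:
            return False
        # Every non-root vertex must be a successor of a vertex of S.
        if v and (v[:-1] not in S or v not in successors(v[:-1])):
            return False
        if len(v) == depth:
            continue  # leaves carry no condition
        chosen = [w for w in successors(v) if w in S]
        if len(v) % 2 == 0 and len(chosen) != 1:
            return False  # Player I must pick exactly one move
        if len(v) % 2 == 1 and len(chosen) != len(successors(v)):
            return False  # all of Player II's replies must be present
    return True
```

For example, the set `{(), (0,), (0, 0), (0, 1)}` passes at depth 2: Player I opens with 0 and both replies of Player II are kept.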

Note that if and are subtrees corresponding to strategies and for Player I and II, respectively, then is an infinite path. Indeed, it is precisely the path that results if Player I plays and Player II plays . This path we denote by .
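Here is a minimal Python sketch of how two strategies determine a single path, truncated to finitely many moves. The two example strategies and the name `play` are hypothetical; a strategy is modelled as a function that takes a finite path (a tuple) and returns that path extended by one vertex.

```python
def play(sigma, tau, rounds):
    """Alternate moves: sigma extends even-length paths (Player I),
    tau extends odd-length paths (Player II)."""
    path = ()
    for _ in range(rounds):
        strategy = sigma if len(path) % 2 == 0 else tau
        path = strategy(path)
    return path


def sigma(path):
    """Hypothetical strategy for Player I: play the current length of the path."""
    return path + (len(path),)


def tau(path):
    """Hypothetical strategy for Player II: play one more than the previous move."""
    return path + (path[-1] + 1,)
```

Every initial segment of the output is consistent with both strategies, which is the finite shadow of the statement that the two subtrees meet in a single path.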

A *quasistrategy* on is a bit like a strategy except that it doesn’t determine moves uniquely. For example, a quasistrategy for Player I in a closed game is, “If you can move to a position that is not a winning position for Player II, then do so, and otherwise move arbitrarily.” (This is a winning quasistrategy if Player II does not have a winning strategy.) Subtrees are a convenient way of formalizing the notion of a quasistrategy: we can define a quasistrategy for Player I on to be any subtree such that for each vertex of at even distance from the root at least one successor belongs to , and for each vertex at odd distance from the root, every successor belongs to . (Again, it is not too important to know what happens at vertices that cannot be reached if the quasistrategy is applied, but it is easy to invent a definition that does specify this.)

Given a tree and a payoff set , let us write for the associated game. A *lifting* of is another game together with a map and a map that takes strategies for to strategies for with the following properties.

(i) For every , takes paths of length to paths of length . (In other words, if is a path in , then the images form a path in , which we will denote by . We will also use for the resulting map on *infinite* paths: that is, we will sometimes regard as a map from to .)

(ii) . (Here we are already regarding as a map from to .)

(iii) If is a strategy for , then is a strategy for the same player, and what does to sequences of length at most depends only on what does to sequences of length at most .

(iv) For every strategy for and every infinite path in that is consistent with there exists an infinite path in that is consistent with such that . (Informally, every run of the game in with being applied is the image of some run of the game in with being applied.)

We make a couple of remarks about this definition. Note first of all that the map is continuous, since if , then if is any path that agrees with for the first steps, will agree with for the first steps.

Note too that if is a winning strategy in the game , then is a winning strategy in the game . Indeed, if is a run of that is consistent with , then there must be a run of consistent with such that . Since is a winning strategy, will belong to or as appropriate (i.e., according to whether is a strategy for Player I or Player II), and therefore will belong to or as appropriate.

Our ultimate aim is going to be to lift an arbitrary Borel game to a clopen game . How easy is this likely to be? Are liftings difficult things to construct?

At first one might think that even if we completely ignore the map that takes strategies to strategies, if is a rather complicated Borel set it would be hard to find a continuous function such that is clopen. However, this is not as hard as it sounds: it is a reasonably straightforward exercise to show that every Borel set is a continuous image of a closed set. (The converse is not true. For example, the non-Borel set described in an earlier post, of all graphs on that contain an infinite clique, is, more or less by definition, a continuous image of a closed set.)

Once we have chosen the function , it is tempting to think that there are too few constraints on the function . However, the constraint that what does to sequences of length at most depends only on what does to sequences of length at most restricts the possibilities for quite considerably.

Let’s write for the set of all infinite subtrees that satisfy the conditions described earlier: that each vertex at an even distance from the root is joined to exactly one successor, while each vertex at an odd distance is joined to all its successors. Let us call such trees *strategic*. Let us also call a finite tree strategic if all its leaves are at the same distance from the root and all vertices apart from the leaves satisfy the conditions of an infinite strategic tree. Thus, finite strategic trees are restrictions of infinite strategic trees to some depth . Let us write for the set of all such trees. If is a function that satisfies condition (iii) above, then for each it induces a function . The definition of is obvious: given a strategic tree in of depth , extend it to an arbitrary tree and define to be the restriction to the first levels of . This is well-defined, by condition (iii).

Suppose now that we have a collection of maps with . If they are all derived from some map that satisfies condition (iii), then they have to satisfy a simple compatibility condition. Let us write for the restriction of a tree to its first levels. Then we require that, for any and any strategic tree , .
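The compatibility condition says that restricting after applying the deeper map gives the same result as applying the shallower map after restricting. A small Python sketch of that check follows. The names `restrict`, `flip`, `identity` and `is_compatible` are hypothetical, and the map `flip` is just a toy level-preserving map, not the one constructed in the proof.

```python
def restrict(tree, m):
    """Restriction of a tree (a set of tuples) to its first m levels."""
    return frozenset(v for v in tree if len(v) <= m)


def flip(tree):
    """Toy level-preserving map on binary trees: swap the moves 0 and 1 everywhere."""
    return frozenset(tuple(1 - x for x in v) for v in tree)


def identity(tree):
    """Toy map that leaves the tree unchanged."""
    return frozenset(tree)


def is_compatible(phi_n, phi_m, tree, m):
    """Check that phi_n(tree), restricted to m levels, equals phi_m(tree restricted to m levels)."""
    return restrict(phi_n(tree), m) == phi_m(restrict(tree, m))


# A strategic tree of depth 3 for Player I in the full binary tree.
S = frozenset({(), (0,), (0, 0), (0, 1), (0, 0, 1), (0, 1, 0)})
```

Using `flip` at both depths passes the check, while pairing `flip` at depth 3 with `identity` at depth 2 fails it, since the two maps then disagree about the first two levels.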

Conversely, if we are given such a sequence of maps we can define a map that gives rise to all of them in a natural way. (It is clearly some kind of inverse limit, but I haven’t thought carefully about exactly what the category is.) Given a strategic tree , we define to be the union of the trees . The compatibility conditions ensure that the trees are nested, and since they are strategic it follows that their union is strategic.

The fact that is an inverse limit shows that it is very far from an arbitrary function from strategies to strategies (or strategic trees to strategic trees). Let us think about how we might build up a sequence satisfying the compatibility conditions. That is, if we have defined already, what more do we need to specify in order to define ?

To begin at the beginning, the first move of a strategy is just a choice of a successor of the root. Let us write and for the “level-” vertices in the trees and , respectively. Then can be thought of as a map from to .

Once we have decided on , the map is also decided, since Player I has no decisions to make in the second move. So the next choice to make is of . Informally, is a map from “how Player I makes her first two moves in ” to “how Player I makes her first two moves in ”. More formally, it is a map from strategic trees of depth 3 to strategic trees of depth 3. To specify a strategic tree of depth 3 in we have to specify a successor of the root, and for each of its successors we have to specify a further successor. So we can split up the domain of according to the choice of . For each fixed , the specification of a tree is a function defined on the set of successors of , which takes each successor to one of *its* successors. Meanwhile, the codomain of this restriction of can be thought of as a function defined on the set of successors of , which again takes each successor to one of its successors. So this restriction of is an arbitrary function from a huge set of functions to a huge set of functions. And we then need to choose such functions for every .

Already we see that we are dealing with a vast and unstructured set. Should we worry about this? (The kind of worry I am talking about is that we have so much flexibility that it is hard to see why the theorem we are trying to prove isn’t trivial.) I think not, for the following reason. If we visualize the tree laid out “horizontally” with the root at the left and the levels living in vertical lines, then all the lack of structure is “vertical”, which is somehow appropriate because the set of successors of any given vertex is just a set and has no structure. On the other hand, trees have a lot of “horizontal structure” (for example, the topology is “defined horizontally”), and this is where the restrictions on make themselves felt.

Of course, condition (iii) is not the only condition that a lifting has to satisfy: there is also the all-important condition (iv). However, I will save a similar discussion of that condition for later.

Now at long last comes the definition of the lifting we shall use. Let be a pruned tree, let be a closed set and let be the game . We define a tree as follows. Its vertices are certain sequences of the form

where is a path in (I should put the root before everything, but I take it as implicit that every path starts at the root and only the subsequent vertices need to be specified) and and are extra pieces of information that I will describe in a moment. The reason I said “certain sequences” above is that the information and will restrict what is allowed for the .

The map is the map that takes to (or to if you prefer to think of it like that — there is a one-to-one correspondence between paths in and their end vertices). In other words, simply forgets the extra information.

What is this “extra information”? In the case of , it is easy: Player I has to provide not just a successor of the root in but also a quasistrategy for the game that results in playing . We can think of it as follows: Player I plays but in addition says, “This is my quasistrategy.” If Player I plays the move as her first move, then all subsequent choices of must result in a sequence compatible with . In terms of trees, the subtree of with as its root projects to the subtree of that corresponds to the quasistrategy and starting move .

The definition of is less obvious, since Player II has two options. He can either provide a sequence of length that is consistent with , such that all extensions of live in , or a quasistrategy for the game played in the tree that is derived from (that is, the tree given by all possible runs of the game that are consistent with ). The quasistrategy must have the property that every infinite sequence that can result from it belongs to . In other words, it must be a *losing* quasistrategy (in the strong sense that it guarantees a loss) for Player II.

If Player II plays a move of the form , then both players must play along the sequence until they reach the end of it, and then they must continue consistently with . If Player II plays a move of the form , then both players must play consistently with the quasistrategies and . (Again, the force of this “must” is that the game is taking place inside a certain subtree. In other words, it’s not that they lose if they don’t play consistently with their quasistrategies: rather, they *cannot but* play consistently with their quasistrategies.)

Since is such a simple map, it is easy to say who wins the game : Player I wins if the resulting infinite sequence, ignoring the extra information, belongs to , and otherwise Player II wins.

Another easy observation is that this game is clopen. To see that it is closed, observe that is obviously continuous and is closed. To see that it is open, note that if Player II plays a move of the form , then since the two players continue along and all continuations of belong to , Player II wins the game in this case, whereas if Player II plays a move of the form , where is a guaranteed-losing quasistrategy, then by definition the sequence that results at the end of the game belongs to and Player I wins. So the result of the game is entirely decided after Player II’s first move.

The fact that the game is decided after Player II’s first move suggests that we have not actually used the assumption that is closed, since we could ignore the first proof that is a closed game and just use the fact that it is decided after Player II’s first move. However, the assumption that is closed has sneaked in under the radar via the unspoken assertion that Player II *can* play a move of one of the two types above. Suppose Player I were to play a guaranteed-losing quasistrategy . Then Player II would not be able to counter with a move of the second type, so would be forced to find a finite sequence all of whose extensions live in . This may not be possible if isn’t closed.

This subsection isn’t strictly necessary, but it seems a pretty sensible thing to think about.

If Player I has a winning strategy for , then an obvious first move in is the move , where is the first move in determined by .

This move automatically results in a win for Player I in , since there is no sequence consistent with , all of whose extensions lie in . (If there were, then would by definition not be a winning strategy.) So Player II is forced to play a move of the second type and therefore loses.

If Player I plays a move , where is not a winning quasistrategy (which in particular happens automatically if Player I does not *have* a winning quasistrategy), then there must be some infinite sequence consistent with that lives in . Since is open, there is some initial segment of , all of whose continuations lie in . If Player II plays , then the two players continue to the end of , at which point the resulting sequence is guaranteed to belong to , and Player II wins.

What we have observed is that if Player I has a winning strategy for , then she has a winning strategy for , and if she does *not* have a winning strategy for , then Player II has a winning strategy for . The interesting thing to note here is that I did not deduce the latter statement from the determinacy of . That is, I didn’t say, “If Player I doesn’t have a winning strategy for , then, since is closed, Player II must have a winning strategy for . We use that to create a winning strategy for as follows.” Rather, I *directly* defined a winning strategy for Player II in using the fact that is closed and the quasistrategy is not winning.

What we are trying to do, however, is the reverse. We want to show that if Player I has a winning strategy for , then at least somebody has a winning strategy for , and likewise for Player II. (As I write this, I haven’t yet digested the proof enough to know whether the same player has a winning strategy for , but my impression is that that isn’t necessarily the case. We shall soon see. [Added later: I now see that that was a silly remark, since I've already observed that the lifting property ensures that if is a winning strategy for then is a winning strategy for for the same player.])

What is a strategy for Player I like in the game ? It consists of an initial move , where is a quasistrategy for , and then a strategy for playing the remainder of the game consistently with not just but also with either the sequence or the quasistrategy given by Player II in his first move.

How can we use this for creating a strategy for ? An obvious first thought is that the strategy should be some strategy that is consistent with the quasistrategy . (The meaning of “consistent with” is I hope obvious.) Does that work? Certainly, what does to sequences of length at most depends only on what does to sequences of length at most , since it depends only on what does in the very first move.

What about the lifting property? Suppose that is a sequence consistent with . Is there some play of consistent with that yields the same sequence?

There are two cases to consider here: the case and the case .

If , then pick a strategy for Player II such that . If Player II plays the move in , then the result will be that for the remainder of the game the players will produce the sequence , just as required.

If , then choose an initial segment of of even length such that all continuations of belong to and let Player II play the move in . Then the sequence is consistent with and , so again it results from a play of .

So it seems that creating a strategy from a strategy and obtaining the lifting property is almost trivial. The problem is that in the actual proof something more complicated is done, so the above argument has to be wrong somehow. But how? It took me about a day (not of continuous thought, I hasten to add, and not writing anything down, but a day nevertheless) to see my stupid mistake. I’m leaving the above wrong argument in, since I think it helps to understand why the right argument is as it is. You may prefer to test your understanding by working out for yourself what is wrong with the argument rather than just reading my explanation below.

The problem is that to define the strategy I ignored most of the strategy, so it wasn’t actually true that I obtained the lifting property. Recall that a strategy for Player I consists of two things: a first move , where is some quasistrategy, and then a strategy for playing consistently not just with but also with Player II’s extra information — either a sequence that must be followed or a quasistrategy . So if in the game Player I just plays an arbitrary strategy that is consistent with , there is no reason to think that this will result from some play of with the strategy “Start with and continue with the strategy or .” We need to depend on somehow.

Let’s have another try. (I admit that for this I am just copying from an existing account of the proof.) Suppose that Player I’s first move in the game (according to a given strategy ) is , where is a quasistrategy for . Now consider the game restricted to . That is, we associate with a tree in the manner described earlier, and the payoff set is . This game can be thought of as with the restriction that Player I is required to play consistently with the quasistrategy .

Since is a closed set, so is in this restricted game. Therefore the game is determined. However, there is a twist. We shall ask for Player I to play the *open* game with payoff set , which is also determined. Or rather, we shall insist on the following: if Player I has a winning strategy for getting into the open set , then she must play according to such a strategy.

This may seem a rather bizarre thing for Player I to do: the reason for it is to force the game to reach a sequence , all of whose continuations belong to , which is good for us because we can then deem Player II to have played the sequence in his first move in the game .

We could perhaps summarize this by saying that in the case where Player I has a winning strategy for the “reverse game”, she is not in a very good position in the game , so instead she plays in a way that will ensure that Player II wins, but will also ensure that the run of the game lifts properly.

The next thing I don’t understand is this. To prove that is determined, we use the fact that is determined. The way we use that is to show that a winning strategy for maps to a winning strategy for for the same player. So it would seem that we don’t need a map from *arbitrary* strategies to strategies, but just a map from *winning* strategies to winning strategies.

But a winning strategy for Player I for must start with a move such that is a winning quasistrategy for , or else Player II can play a move of the type , where is a finite sequence, consistent with , all of whose continuations lie in . And if Player I starts with a winning quasistrategy, and if is the tree defined above, we have that , so Player I cannot possibly have a winning strategy for producing a finite sequence with all its continuations in . So that complicated case would appear not to arise, or at least not to arise in any relevant way.

I can’t see anything wrong with this simplification, but I have a guess about why we would be ill advised to make it. The reason is that all we are considering at the moment is the base case of the induction. Later on, we are going to have to look at countable intersections and countable unions. It seems at least possible (and in fact very likely) that we will have to consider images of non-winning strategies at that stage, for example if some of the sets are wins for Player I and some are wins for Player II. So let us persist with the not yet obviously necessary task of constructing the map that is defined on *arbitrary* strategies.

So far we have defined what does in the case that Player I has a winning strategy for arriving in , where is the tree that corresponds to the quasistrategy in Player I’s first move . Or rather, we have said what the important part is. Let me briefly describe this case in full, repeating a few things I’ve already said.

If is Player I’s strategy for , then the strategy is defined as follows. Let be the first move required by and let be the tree corresponding to the quasistrategy . If Player I has a winning strategy for the game , then she must choose such a strategy and play it. After a point has been reached where all continuations of the current sequence lie in , Player I must continue with whatever strategy dictates in the case that Player II plays the sequence as his first move.

If Player I does *not* have a winning strategy for the game , then Player II must have a winning strategy for this game, since it is open (in ) and therefore determined. In fact, we know that the quasistrategy “always move to a position that is not a winning position for Player I” is a winning quasistrategy for Player II. Let be this quasistrategy. Player I’s strategy for is as follows. For as long as Player II plays consistently with the quasistrategy , she plays whatever dictates if Player II’s first move is . If Player II ever departs from , then he moves to a losing position for the game , that is, a winning position for Player I in this game. In that case, Player I goes back to the previous approach: that is, choosing a winning strategy for that game (given the current position), playing it until a finite sequence is reached, all of whose continuations are in , and continuing as dictates.
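To make the quasistrategy "always move to a position that is not a winning position for Player I" more tangible, here is a minimal sketch on a finite game tree. It is only a finite analogue of the situation in the proof, and all the names (`tree`, `targets`, `player_one_winning_positions`, `canonical_quasistrategy`) are my own assumptions. Positions are tuples of moves, Player I moves at positions of even length, and Player I wins the "open" game as soon as the play reaches a position in `targets`.

```python
from typing import Dict, List, Set, Tuple

Position = Tuple[int, ...]
Tree = Dict[Position, List[int]]  # position -> moves available there


def player_one_winning_positions(tree: Tree, targets: Set[Position]) -> Set[Position]:
    """Positions from which Player I can force the play into `targets`."""
    winning: Set[Position] = set()

    def visit(pos: Position) -> bool:
        if pos in targets:
            # Player I has already won; positions below are not examined.
            winning.add(pos)
            return True
        moves = tree.get(pos, [])
        # Evaluate every child so that all reachable positions get classified.
        results = [visit(pos + (m,)) for m in moves]
        if not moves:
            won = False  # the play ended without reaching a target
        elif len(pos) % 2 == 0:
            won = any(results)  # Player I to move: one good move suffices
        else:
            won = all(results)  # Player II to move: every move must lose
        if won:
            winning.add(pos)
        return won

    visit(())
    return winning


def canonical_quasistrategy(tree: Tree, winning: Set[Position]) -> Dict[Position, List[int]]:
    """At each Player II position that is not winning for Player I, allow
    exactly the moves that lead to another non-winning position."""
    return {
        pos: [m for m in moves if pos + (m,) not in winning]
        for pos, moves in tree.items()
        if len(pos) % 2 == 1 and pos not in winning
    }


# Hypothetical two-move game: Player I picks 0 or 1, then Player II picks 0 or 1.
tree: Tree = {
    (): [0, 1],
    (0,): [0, 1],
    (1,): [0, 1],
    (0, 0): [], (0, 1): [], (1, 0): [], (1, 1): [],
}
targets = {(0, 0), (1, 0)}  # Player I wins only if Player II answers 0

winning = player_one_winning_positions(tree, targets)
quasistrategy = canonical_quasistrategy(tree, winning)
```

In this toy example Player I has no winning strategy from the starting position, and the quasistrategy tells Player II to answer 1 whatever Player I does. It returns a list of permitted moves rather than a single move, which is the sense in which it is a quasistrategy rather than a strategy. If Player II leaves it, the play lands in a position from `winning`, which corresponds to the point at which Player I "goes back to the previous approach".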

This post has passed the 5000-word limit, but I think I should finish it with a quick verification that the all-important lifting property holds for this strategy. But before I do that, let me try to describe in a less formal way what does to Player I strategies for .

The best way to think about it (or rather, the way I find helpful at the time of writing) is to forget all about winning and losing and just concentrate on the lifting property. That is, if we have a strategy for Player I, we don’t try to map it to a strategy that will maximize Player I’s chances of winning. *We don’t care* about that, since all we need is that if is a *winning* strategy for Player I, then the lifting property will *guarantee* that is a winning strategy for Player I in the game . So instead, we focus entirely on the lifting property.

Once we think about things this way, the definition writes itself to some extent. All we have to do is work out what sequences in can possibly arise during runs of the strategy and devise a strategy for Player I for creating one of those sequences. In other words, we end up playing a different game — the one where Player I’s aim is to create what one might call a -compatible sequence.

What are these sequences? There are two kinds, according to the kind of move Player II makes. Again let be Player I’s first move as dictated by . One kind is a sequence consistent with that has an initial segment , all of whose continuations belong to and that continues according to the strategy . The other kind is a sequence that could result if Player II plays some quasistrategy against that always yields a sequence in and Player I does what dictates in response to that quasistrategy.

Can Player I force the game to yield a sequence of such a kind? Clearly yes if she has a winning strategy for producing the requisite initial segments . If she doesn’t, then we would like to find a suitable quasistrategy for Player II, and the most natural one (sometimes called the canonical quasistrategy) is the “avoid losing if you can” quasistrategy that we chose.

As a matter of fact, I think I’ll end the post here, since with the above explanation it becomes clear that (not too surprisingly) the strategy has been defined in a way that makes absolutely sure that the lifting property holds. So the lifting property holds. I don’t feel the need for a more formal proof.

It remains to say what does to strategies for Player II and then to discuss countable unions and intersections. It turns out that the countable unions and intersections are easier to deal with than the base step of the induction (that is, the statement that closed games can be lifted to clopen games). However, there is one aspect of the proof that I have not yet mentioned, since I didn’t want it to be a distraction. It’s that we actually need a very slight further strengthening of the inductive hypothesis. We need not just that the game lifts to a clopen game but that for every the game lifts to a clopen game in such a way that the first levels of the tree are the same as the first levels of the tree . The modifications needed are completely straightforward (roughly speaking you don’t introduce the extra information until the first moves are complete) but complicate the notation. So I decided to present the case , which contains the main idea.

The reason for needing the result for general is that it enables us to do a diagonal construction when we come to look at countable unions and intersections.


I also commented that an intersection of two determined sets does not have to be determined, which suggests that in order to prove that all Borel sets are determined, we will need to find a clever inductive hypothesis. This hypothesis should be of the form, “All Borel sets of index less than have property P,” where having property P implies that you are determined, and it is also preserved when you take countable intersections and unions.

Since the property of being determined is quite a strange property, it seems rather unlikely that we will be able to find a much simpler property that does the job. So it is natural to look for a property that is itself related to games and determinacy. But what might such a property be?

I have not come anywhere near the level of understanding where I can make the next remark not seem to be a slight jump — that is, in some sense the “obvious” thing to think. However, I think that once one has seen it, it is a fairly natural jump. The idea is to prove that Borel games are determined by showing that they can be “lifted” to simpler games where we know how to prove determinacy. With that vague plan in the back of our minds, let us think about how determinacy of one game could imply determinacy of another.

I should warn anyone who is reading this that most of the post is going to be a bit of a mess, and I will end up covering very little ground. If this bothers you, then you might like to skim it and wait for the third post, where the presentation is more conventional. What I could do at this point is look at an account of the proof on the internet, copy out the correct, rather complicated definition of “lifting”, and go through the somewhat complicated verification that it works. However, I also want to give some idea of why the definition is as it is. More precisely, I want to show how certain aspects of it can be thought of as arising from the application of a very general technique for constructing mathematical objects with certain properties. (In this case, the object I’m referring to is the method of “lifting” one game to another, and the properties are basically that the method should allow us to deduce the determinacy of the original game from the determinacy of the lifted game.)

One technique for finding mathematical objects with certain properties is pure guesswork: you just keep trying things until you chance on one that does the job.

Sometimes — and particularly if you know that the object must belong to some class of highly structured objects that you know to be small — this approach is a good one. But often it isn’t at all good: there are just too many things to try and the conditions are too complicated for it to be realistic to hope to chance on a construction that works.

A more systematic approach is to think of the task as something a bit like solving an equation. If you need to find a number such that , you don’t guess, but instead you systematically simplify the conditions. You say to yourself, “Well, will have to be 25, but in that case we know that