When the announcement was made a few hours earlier, my knowledge of Subhash Khot could be summarized as follows.

- He’s the person who formulated the unique games conjecture.
- I’ve been to a few talks on that in the past, including at least one by him, and there have been times in my life when I have briefly understood what it says.
- It’s a hardness conjecture that is a lot stronger than the assertion that P≠NP, and therefore a lot less obviously true.

What I hoped to get out of the laudatio was a return to the position of understanding what it says, and also some appreciation of what was so good about Khot’s work. Anybody can make a conjecture, but one doesn’t usually win a major prize for it. But sometimes a conjecture is so far from obvious, or requires such insight to formulate, or has such an impact on a field, that it is at least as big an achievement as proving a major theorem: the Birch–Swinnerton-Dyer conjecture and the various conjectures of Langlands are two obvious examples.

The unique games conjecture starts with a problem at the intersection of combinatorics and linear algebra.

Suppose you are given a collection of linear equations over the field $\mathbb{F}_2$. Then you can use Gaussian elimination to determine whether or not they have a solution. Now suppose that you find out that they do *not* have a solution. Then something you might consider doing is looking for an assignment to the variables that solves as many of the equations as possible. If there are $m$ equations, then a random assignment will solve on average $m/2$ of them, so it must be possible to solve at least half the equations. So the interesting thing is to do better than 50%. A famous result of Johan Håstad states that this cannot be done, even when each equation involves just three variables. (Actually, that restriction to three variables is not the surprising aspect — there are many situations where doing something for 2 is easy and the difficulty kicks in at 3. For example, it is easy to determine whether a graph is 2-colourable — you just start at a vertex, colour all its neighbours differently, etc. etc., and since all moves are forced apart from when you start again at a new connected component, if the process doesn’t yield a colouring then you know there isn’t one — but NP-hard to determine whether it is 3-colourable.)
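The forced-colouring procedure for 2-colourability can be written out in a few lines. Here is a small Python sketch (my own illustration, nothing from the talk; the function name is made up):

```python
from collections import deque

def two_colourable(n, edges):
    """Decide 2-colourability by the forced process described above:
    colour a vertex, colour its neighbours the other colour, and so on,
    restarting in each new connected component."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = [None] * n
    for start in range(n):
        if colour[start] is not None:
            continue
        colour[start] = 0                      # the only free choice
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if colour[v] is None:
                    colour[v] = 1 - colour[u]  # forced move
                    queue.append(v)
                elif colour[v] == colour[u]:   # clash: no 2-colouring exists
                    return False
    return True
```

An even cycle is 2-colourable and a triangle is not; for three colours no such forced process exists, which is one way of feeling why the problem suddenly becomes hard.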

More precisely, Håstad’s result says that for any fixed $\epsilon>0$, if there were a polynomial-time algorithm that could tell you whether it was possible to satisfy at least a proportion $1/2+\epsilon$ of a collection of linear equations over $\mathbb{F}_2$ (each equation involving three variables), then P would equal NP. His proof relies on one of the big highlights of theoretical computer science: the PCP theorem.

The unique games conjecture also concerns maximizing the number of linear equations you can solve, but this time we work mod $p$ and the equations are very special: they take the form $x_i - x_j = c_{ij}$.

To get a little intuition about this, I suppose one should do something I haven’t done until this very moment, and think about how one might go about finding a good algorithm for solving as many equations of this type from some collection as possible. An obvious observation is that once we’ve chosen $x_i$, the value of $x_j$ is determined if we want to solve the equation $x_i - x_j = c_{ij}$. And that may well determine another variable, and so on. It feels natural to think of these equations as a labelled directed graph with the variables as vertices and with an edge from $x_i$ to $x_j$ labelled $c_{ij}$ if the above equation is present in the system. Then following the implications of a choice of variables is closely related to exploring the component of that vertex in the graph. However, since our aim is to solve as many equations as possible, rather than all of them, we have the option of removing edges to make our task easier, though we want to remove as few edges as possible.
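That propagation idea is easy to make concrete. Below is a small Python sketch (my own illustration; equations are triples $(i,j,c)$ meaning $x_i-x_j=c$ mod $p$): fix a value for one variable in each component, follow the forced values along the edges, and count how many equations end up satisfied.

```python
from collections import deque

def propagate(n, p, equations, root_value=0):
    """Equations are triples (i, j, c) meaning x_i - x_j = c (mod p).
    Fix root_value for one variable per component, follow the forced
    values, and count how many equations the result satisfies."""
    adj = [[] for _ in range(n)]
    for i, j, c in equations:
        adj[i].append((j, c))    # knowing x_i forces x_j = x_i - c
        adj[j].append((i, -c))   # knowing x_j forces x_i = x_j + c
    x = [None] * n
    for start in range(n):
        if x[start] is not None:
            continue
        x[start] = root_value % p
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, c in adj[u]:
                if x[v] is None:
                    x[v] = (x[u] - c) % p
                    queue.append(v)
    satisfied = sum((x[i] - x[j]) % p == c % p for i, j, c in equations)
    return x, satisfied
```

On a consistent component every equation gets satisfied; an inconsistent cycle (say $x_0-x_1=x_1-x_2=x_2-x_0=1$ mod 5) forces at least one failure, which is where the option of deleting a few edges comes in.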

Maybe those few remarks will make it seem reasonably natural that the unique games conjecture can be connected with something called the *max cut problem*. This is the problem of finding a partition of the vertices of a graph into two sets such that the number of edges from one side to the other is as big as possible.

Actually, while browsing some slides of Håstad, I’ve just seen the following connection, which seems worth mentioning. If $p=2$ and all the $c_{ij}$ equal 1, then the equation $x_i - x_j = 1$ is satisfied if and only if the variables $x_i$ and $x_j$ get different assignments. So in this case, solving as many equations as possible is precisely the same as the max cut problem.
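For tiny instances one can check this equivalence directly by brute force. A Python sketch (mine, purely illustrative):

```python
from itertools import product

def max_cut(n, edges):
    """Largest number of crossing edges over all bipartitions."""
    return max(sum(1 for u, v in edges if s[u] != s[v])
               for s in product([0, 1], repeat=n))

def max_sat_mod2(n, edges):
    """Largest number of satisfiable equations x_u - x_v = 1 (mod 2),
    one equation per edge, over all 0/1 assignments."""
    return max(sum(1 for u, v in edges if (s[u] - s[v]) % 2 == 1)
               for s in product([0, 1], repeat=n))
```

The two functions agree on every graph, since an assignment is exactly a bipartition and a satisfied equation is exactly a crossing edge.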

However, before we get too carried away with this, let me say what the unique games conjecture actually says. Apparently it has been reformulated a few times, and this version comes from 2004, whereas the original version was 2002. It says that even if 99% of the equations (of the form $x_i - x_j = c_{ij}$ over $\mathbb{Z}_p$) can be simultaneously satisfied, then it is still NP hard to determine whether 1% of them can be simultaneously satisfied. Note that it is important to allow $p$ to be large here, since the random approach gives you a proportion $1/p$ straight away. Also, I think 99% and 1% are a friendly way of saying $1-\epsilon$ and $\epsilon$ for an arbitrary fixed $\epsilon>0$.

In case the statement isn’t clear, let me put it slightly more formally. The unique games conjecture says the following. Suppose that for some $\epsilon>0$ there exists a polynomial-time algorithm that outputs YES if a proportion $1-\epsilon$ of the equations can be solved simultaneously and NO if it is impossible to solve more than a proportion $\epsilon$ of them, with no requirements on what the algorithm should output if the maximum proportion lies between $\epsilon$ and $1-\epsilon$. Then P=NP.

At this point I should explain why the conjecture is called the unique games conjecture. But I’m not going to because I don’t know. I’ve been told a couple of times, but it never stays in my head, and when I do get told, I am also told that the name is something of a historical accident, since the later reformulations have nothing to do with games. So I think the name is best thought of as a strange type of label whose role is simply to identify the conjecture and not to describe it.

To give an idea of why the UGC is important, Arora took us back to an important paper of Goemans and Williamson from 1993 concerning the max cut problem. The simple random approach tells us that we can find a partition such that the size of the resulting cut is at least half the number of edges in the graph, since each edge has a 50% chance of joining a vertex in one half to a vertex in the other half. (Incidentally, there are standard “derandomization” techniques for converting observations like this into algorithms for finding the cuts. This is another beautiful idea from theoretical computer science, but it’s been around for long enough that people have got used to it.)
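The simplest of those derandomizations, the method of conditional expectations, amounts to a greedy rule: place each vertex on the side opposite the majority of its already-placed neighbours. A Python sketch (my illustration, not anything from the laudatio):

```python
def greedy_cut(n, edges):
    """Derandomized random cut: each placement cuts at least half of
    the edges it settles, so the final cut has at least half the edges."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = [None] * n
    for u in range(n):
        on0 = sum(1 for v in adj[u] if side[v] == 0)
        on1 = sum(1 for v in adj[u] if side[v] == 1)
        side[u] = 1 if on0 >= on1 else 0   # join the side opposite the majority
    cut = sum(1 for u, v in edges if side[u] != side[v])
    return cut, side
```

Each vertex, when placed, turns at least half of its edges to already-placed neighbours into crossing edges, and every edge is settled exactly once, so the guarantee follows by induction.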

Goemans and Williamson were the first people to go beyond 50%. They used semidefinite programming to devise an algorithm that could find a cut for which the number of edges was at least 0.878 times the size of the max cut. I don’t know what that 0.878 really is — presumably some irrational number that came out of the proof — but it was sufficiently unnatural looking that there was a widespread belief that the bound would in due course be improved further. However, a check on that belief was given in 2004 by Khot, Kindler, Mossel and O’Donnell and in 2005 by Mossel, O’Donnell and Oleszkiewicz (how they all contributed to the result I don’t know), who showed the very surprising result that if UGC is true, then the Goemans-Williamson bound is optimal. From what I understand, the proof is a lot more than just a clever observation that max cut can be reduced to unique games. If you don’t believe me, then try to explain to yourself how the constant 0.878 can arise in a simple way from a conjecture that involves only the constants “nearly 0” and “nearly 1”.
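For what it’s worth, the standard account of where 0.878 comes from is the worst case of the random-hyperplane rounding analysis: an edge whose unit vectors meet at angle $\theta$ contributes $(1-\cos\theta)/2$ to the semidefinite objective but is cut with probability $\theta/\pi$, and the guarantee is the minimum of the ratio. A quick numerical sketch in Python (my addition, a standard fact rather than something from the talk):

```python
import math

def gw_ratio(theta):
    """Cut probability of an edge at angle theta under random-hyperplane
    rounding, divided by that edge's contribution to the SDP objective."""
    return (theta / math.pi) / ((1 - math.cos(theta)) / 2)

# minimise over a fine grid of angles in (0, pi]
alpha = min(gw_ratio(k * math.pi / 200000) for k in range(1, 200001))
print(f"{alpha:.4f}")  # approximately 0.8786
```

So the constant is irrational all right: it is the minimum of $\frac{2\theta}{\pi(1-\cos\theta)}$, attained at an angle a little short of $3\pi/4$.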

In general, it turns out that UGC implies sharp thresholds for approximability for many problems. What this means is that there is some threshold, below which you can do what you want with a polynomial-time algorithm and above which doing what you want is NP hard. (So in the max cut example the threshold is 0.878: finding a cut within any smaller proportion of the maximum can be done in polynomial time, and getting above that proportion is NP hard — at least if you believe UGC.)

Almost as interesting is that the thresholds predicted by UGC all come from rather standard techniques such as semidefinite programming and linear programming. So in some sense it is telling us not just that a certain *bound* is best possible but that a certain *technique* is best possible. To put it a bit crudely and inaccurately, it’s saying that for one of these problems, the best you can do with semidefinite programming is the best you can do full stop.

Arora said something even stronger that I haven’t properly understood, but I reproduce it for completeness. Apparently UGC even tells us that the failure of a standard algorithm to beat the threshold *on a single instance* implies that no algorithm can do better. I suppose that must mean that one can choose a clever instance in such a way that if the standard algorithm succeeds with that instance, then that fact can be converted into a machine for solving arbitrary instances of the unique games problem. How you get from one instance of one problem to lots of instances of another is mysterious to me, but Arora did say that this result came as a big surprise.

There were a couple of other things that Arora said at the end of his talk to explain why Khot’s work was important. Apparently while the UGC is just a conjecture, and not even a conjecture that is confidently believed to be true (indeed, if you want to become famous, then it may be worth trying your hand at finding an efficient algorithm for it, since there seems to be a non-negligible chance that such an algorithm exists), it has led to a number of non-obvious predictions that have then been proved unconditionally.

Soon after Arora’s laudatio, Khot himself gave a talk. This was an odd piece of scheduling, since there was necessarily a considerable overlap between the two talks (in their content, that is). I’ll end by mentioning a reformulation of UGC that Khot talked about and Arora didn’t.

A very important concept in graph theory is that of *expansion*. Loosely speaking, a graph is called an expander if for any (not too large) set of vertices, there are many edges from that set to its complement. More precisely, if $G$ is a $d$-regular graph and $A$ is a set of vertices, then we define the expansion of $A$ to be the number of edges leaving $A$ divided by $d|A|$ (the latter being the most such edges there could possibly be). Another way of looking at this is that you pick a random point $x\in A$ and a random neighbour $y$ of $x$, and define the expansion of $A$ to be the probability that $y$ is not in $A$.

The expansion of the graph $G$ as a whole is the minimum expansion over all subsets of size at most $n/2$ (where $n$ is the number of vertices of $G$). If this quantity is high, it is saying that $G$ is “highly interconnected”.

Khot is interested in *small-set* expansion. That is, he picks a small $\delta>0$ and takes the minimum over sets of size at most $\delta n$ rather than at most $n/2$.
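On tiny graphs one can compute these quantities by brute force, which at least pins down the definitions. A Python sketch (mine; `adj` is an adjacency list of a $d$-regular graph):

```python
from itertools import combinations

def expansion(adj, d, S):
    """Edges leaving S divided by d|S|, as in the definition above."""
    inside = set(S)
    leaving = sum(1 for u in S for v in adj[u] if v not in inside)
    return leaving / (d * len(S))

def small_set_expansion(adj, d, max_size):
    """Minimum expansion over nonempty vertex sets of size <= max_size
    (exponential-time brute force, so only for toy examples)."""
    n = len(adj)
    return min(expansion(adj, d, S)
               for k in range(1, max_size + 1)
               for S in combinations(range(n), k))
```

On the 6-cycle, for instance, a single vertex has expansion 1, while an arc of two or three consecutive vertices has expansion 1/2 or 1/3: long arcs in sparse graphs are exactly the sets with poor expansion.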

The precise reformulation I’m about to give is not in fact the one that Khot gave but rather a small modification that Boaz Barak, another well-known theoretical computer scientist, gave in his invited lecture a day later. The unique games conjecture is equivalent to the assertion that it is NP hard to distinguish between the following two classes of graphs.

- Graphs where there exists a set of size at most $\delta n$ with small expansion.
- Graphs where every set of size at most $\delta n$ has very big expansion.

I think for the latter one can take the expansion to be at least 1/2 for each such set, whereas for the former it is at most $\epsilon$ for some small $\epsilon>0$ that you can probably choose.

What is interesting here is that for ordinary expansion there is a simple characterization in terms of the size of the second largest eigenvalue of the adjacency matrix. Since eigenvalues can be approximated efficiently, there is an efficient method for determining whether a graph is an expander. UGC is equivalent to saying that when the sets get small, their expansion properties can “hide” in the graph in a rather strong way: you can’t tell the difference between a graph that has very good small-set expansion and a graph where there’s a set that fails very badly.
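The eigenvalue in question can be approximated with nothing more than power iteration, provided one shifts the adjacency matrix so that its spectrum is nonnegative and projects away the all-ones top eigenvector. A pure-Python sketch (my illustration, for a connected $d$-regular graph):

```python
import math
import random

def second_eigenvalue(adj, d, iters=3000, seed=1):
    """Estimate the second-largest adjacency eigenvalue of a connected
    d-regular graph by power iteration on A + dI (nonnegative spectrum),
    projecting off the all-ones top eigenvector at every step."""
    n = len(adj)
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]

    def step(x):
        mean = sum(x) / n
        x = [xi - mean for xi in x]          # kill the all-ones component
        y = [sum(x[v] for v in adj[u]) + d * x[u] for u in range(n)]
        return x, y                          # y = (A + dI) x

    for _ in range(iters):
        _, y = step(x)
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    xp, y = step(x)
    rayleigh = sum(a * b for a, b in zip(xp, y)) / sum(a * a for a in xp)
    return rayleigh - d                      # undo the shift
```

On the 9-cycle ($d=2$) this returns $2\cos(2\pi/9)\approx 1.53$, matching the known spectrum of the cycle, and the point of the characterization is that a large gap below $d$ certifies good (ordinary) expansion.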

I had lunch with Boaz Barak on one of the days of the congress, so I asked him whether he believed UGC. He gave me a very interesting answer (a special case of the more general proposition that Boaz Barak has a lot of very interesting things to say about complexity), which I have unfortunately mostly forgotten. However, my rapidly fading memory is that he would like it to be true, because it would be a beautiful description of the boundary of what algorithms can do, but thinks it may very well be false. He thought that one possibility was that solving the problems that UGC says are NP hard is not in fact NP hard, but not possible in polynomial time either. It is perfectly possible for a problem to be of intermediate difficulty.

Although it wouldn’t directly contradict NP hardness, it would be very interesting to find an algorithm that solved the small-set expansion problem in a time that was only modestly superpolynomial: something like $n^{\log n}$, say. That would probably get you an invitation to speak at an ICM.


The most concrete thing I remember (without being 100% sure I’ve got it right) is that one of Mirzakhani’s major results concerns counting closed geodesics in Riemann surfaces. A geodesic is roughly speaking a curve that feels like a straight line to an inhabitant of the surface. Another way of putting it is that if you take two points that are close together on a geodesic, then the part of the geodesic between those points is the shortest curve that joins those two points. (Hmm, on writing that I feel that I’ve made an elementary mistake of exposition, in that I have assumed that you know what a Riemann surface is, and then gone to a little trouble to say what a geodesic is, when not many people will know the former without also knowing the latter. To atone for that, let me add a link to the Wikipedia article on Riemann surfaces, though I’m afraid that article is not much good for the beginner. A beginner’s definition, not precise at all but perhaps adequate for the purposes of reading this post, is that a Riemann surface is a surface like a sphere or a torus, but with some very important extra structure that comes from the fact that each little patch of surface looks like a little patch of the complex plane.)

If you follow your nose inside a Riemann surface, then sometimes you get back to where you started and are pointing in the same direction. In that case, you follow your original path all over again and the geodesic is called *closed*. But sometimes that doesn’t happen.

We can further classify closed geodesics into two types: those that cross themselves and those that don’t. The ones that don’t are called *simple*. An example of a simple closed geodesic is a great circle on the surface of a sphere. Apparently, the problem of counting closed geodesics was pretty much solved, but the problem of counting *simple* closed geodesics was significantly harder. It is this problem that Mirzakhani solved. (I’m not quite sure what “solved” meant here — perhaps her work means that if someone gives you a Riemann surface, you can tell them how many simple closed geodesics it contains.)

The more I write, the more I realize that the counting must be up to some kind of equivalence, since otherwise it seems to me that there will almost certainly either be no simple closed geodesics or uncountably many. But I’ll have to wait to look at my notes to get more precise about that.

The other main thing I remember from the talk is that moduli spaces were a very important part of Mirzakhani’s work, which provided another nice thematic connection between the work of different medallists. Just as Avila studied whole families of dynamical systems, a moduli space is a whole family of Riemann surfaces. And in both cases the family is far more than merely a *set* of objects: it is a set *with geometrical structure*. For example, if you take all interval exchange maps that chop $[0,1]$ into five parts and permute them in a certain specified way, then each one is uniquely determined by the end points of the intervals other than $0$ and $1$. So we can naturally associate with each one an element of the set

$\{(a_1,a_2,a_3,a_4): 0\le a_1\le a_2\le a_3\le a_4\le 1\}.$

(Those include some degenerate examples.) This is a polyhedral subset of $\mathbb{R}^4$, so it has nice geometrical, topological and measure-theoretic structure, which allows one to talk about almost all interval exchange maps, or nowhere dense sets of interval exchange maps, and so on.

An example that people often give to demonstrate what a moduli space is (and I should say that my entire knowledge of this concept comes from my memory of editing a very nice article by David Ben-Zvi on the subject for the Princeton Companion to Mathematics — though obviously anything I say about them that is false is not his fault) is the space of all tori. If you are not used to Riemann surfaces, then you may think that there is just one torus up to isomorphism, but there you would be wrong. Topologically it is true, but we want an isomorphism *of Riemann surfaces*, and the maps that you are allowed to use are much more rigid. So for example if you take the complex plane and quotient out by $\mathbb{Z}^2$, you get a torus that is not isomorphic to the torus you get if instead you quotient out by the triangular lattice. (Roughly speaking, the obvious attempt to define an isomorphism would involve shearing the plane, but shears are not holomorphic.)

If we quotient by two lattices, when will the results give isomorphic tori? If one is an expansion of the other, then they will, and if one is a rotation of the other, then they will again. From that we get that if two complex numbers generate a lattice, then the isomorphism type of the torus depends only on their ratio. So we have already reduced the family of tori to a single complex parameter. However, that isn’t the whole story as different complex parameters do not necessarily give rise to different tori. But it gives some idea that the tori form a “space” that itself has an interesting geometrical structure. For reasons I don’t fully understand, moduli spaces are very helpful in the study of Riemann surfaces, and are also extremely interesting objects in their own right.
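The standard way to make this precise is to reduce the parameter $\tau$ (the ratio of the two lattice generators, taken with positive imaginary part) to the usual fundamental domain using the moves $\tau\mapsto\tau+1$ and $\tau\mapsto-1/\tau$. A small Python sketch (my illustration, not from the talk):

```python
def reduce_tau(tau, tol=1e-12):
    """Move a lattice parameter tau (with Im tau > 0) into the standard
    fundamental domain |Re tau| <= 1/2, |tau| >= 1, via the moves
    tau -> tau + 1 and tau -> -1/tau.  Two lattices give isomorphic
    tori exactly when their parameters reduce to the same point
    (up to identifications on the boundary of the domain)."""
    while True:
        tau = complex(tau.real - round(tau.real), tau.imag)  # translate
        if abs(tau) < 1 - tol:
            tau = -1 / tau                                   # invert
        else:
            return tau
```

The square lattice reduces to $\tau=i$ and the triangular lattice to $\tau=e^{i\pi/3}$, two different points of the domain, which is the precise sense in which those two tori are non-isomorphic.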

OK that’s about it for what I remember. But before I look at my notes, I’d like to mention briefly one other connection with Avila, which is that Mirzakhani is also very interested in billiards in polygons, though this wasn’t mentioned in the laudatio.

Actually, that reminds me of one other thing, which is that one of Mirzakhani’s results is strongly reminiscent of famous results of Marina Ratner. Maybe I’ll be able to say more about that after looking at my notes.

OK, now I’ve looked at my notes I find that, as I thought, I had forgotten quite a bit.

One important detail is that Mirzakhani looked at surfaces of genus at least two (that is, surfaces with at least two “holes”, so not tori). This is important because it means that the metrics on them are hyperbolic. It turns out that the moduli space of Riemann surfaces of genus $g$ is a complex variety of complex dimension $3g-3$, and is also a symplectic orbifold. (An orbifold is a bit like a manifold but is allowed to have a few singularities. In the torus example, one of these singularities arises as a result of the fact that the triangular lattice has a symmetry — rotation by 60 degrees — that most lattices do not have.)

The moduli spaces are totally inhomogeneous. That is very important, but I don’t know what it means. (I can’t remember whether McMullen told us — probably he did.)

McMullen concentrated on three aspects of Mirzakhani’s work. The first was what I’ve already mentioned, namely counting simple closed geodesics. My feeling that there would be uncountably many of these unless one looked at equivalence classes somehow was based on the sphere and the torus, so maybe that intuition simply fails when the geometry becomes hyperbolic.

He told us that if $X$ is a Riemann surface of genus $g$, then the number of simple loops grows like $cL^{6g-6}$. I can’t remember what the parameter $L$ means. I’ve written the formula down just to indicate what is being counted.

It seems a bit silly not to try to find out what is going on here, so let me have a quick look at the citation.

Ah, that makes much more sense! $L$ stands for length. So the formula is an estimate for the number of simple loops of length at most $L$. If you look at all closed geodesics (i.e., allowing self-crossing ones too) then the growth rate is $e^L/L$.

This apparently led to a new proof of a famous conjecture of Witten — a formula for intersection numbers on the moduli space — which was originally proved by Kontsevich in 1992.

Another consequence is the result that the probability that a random simple loop in genus 2 cuts the surface into two pieces is 1/7.

The second major topic was complex geodesics in the moduli space $\mathcal{M}_g$. I don’t know the precise definition, but I presume that the idea is that if you take a point in $\mathcal{M}_g$ that is surrounded by a copy of an infinitesimally small part of the complex plane, then there is a unique way of continuing that “in the same direction” and getting what I presume is a Riemann surface that lives inside $\mathcal{M}_g$. So it would be a little bit like a 2D generalization of a geodesic but would also involve the complex structure. Ah, I see that I have written that a complex geodesic is a holomorphic isometry from the hyperbolic plane to $\mathcal{M}_g$, though I wonder whether that should be a local isometry — that is, that for each point in the hyperbolic plane there is a neighbourhood such that the restriction of the map to that neighbourhood is an isometry.

I’ve written that there are complex geodesics through every point in $\mathcal{M}_g$ in every direction, and that they are called Teichmüller discs.

Apparently real geodesics are usually dense in $\mathcal{M}_g$. Sometimes they can be exotic shapes such as fractal cobwebs (whatever those are), defying classification. What about in two dimensions? Can we get some 2D analogue of fractal cobwebs? No we can’t. Mirzakhani and her coworkers showed that you always get an algebraic subvariety. This is strongly reminiscent of work of Margulis and Ratner.

What is remarkable about this result is that it is an analogue of the Margulis/Ratner results in a totally inhomogeneous situation, which was completely unexpected.

I’ve just cheated and looked at the citation again, because it seemed to be particularly important to get some idea of what “totally inhomogeneous” means. The answer is fairly simple. A homogeneous space is one where the geometry at every point is the same. To say that $\mathcal{M}_g$ is totally inhomogeneous is to say that at *no* two points is the geometry the same. While looking for that, I also saw that Mirzakhani solved the simple-loop-counting problem by connecting it to a certain volume computation in the moduli space $\mathcal{M}_g$. So it was a definite case where looking at the entire family helps you to prove things about the individual members of the family.

The third aspect of Mirzakhani’s work that McMullen talked about concerned something called earthquake flow that was defined by Thurston. I thought I had some understanding of what this was when I was watching the talk, but can’t really remember now. On watching the explanation again, I find that I can understand part of what McMullen says (about deforming Riemann surfaces by cutting along closed geodesics and giving them a twist, and then doing something similar but with an entire “lamination” of closed geodesics), but I still don’t quite get how that leads to a flow. (If you want to try, then the video is here and the explanation starts at 25:24.)

The result is that the earthquake flow is ergodic and mixing, and this means something like that if you randomly apply earthquakes then you get all shapes of genus $g$. Apparently, Mirzakhani established a measurable isomorphism between earthquake flow and horocycle flow, and this was a big surprise. Those are just words to me, but when I hear someone like Curt McMullen say that a result is very surprising, then I am impressed.


I was rescued by an extraordinary piece of luck. When I got to the gate with my boarding card, the woman who took it from me tore it up and gave me another one, curtly informing me that I had been upgraded. I have no idea why. I wonder whether it had anything to do with the fact that in order to avoid standing any longer than necessary I waited until almost the end before boarding. But perhaps the decision had been made well before that: I have no idea how these things work. Anyhow, it meant that I could make my seat pretty well horizontal and I slept for quite a lot of the journey. Unfortunately, I wasn’t feeling well enough to make full use of all the perks, one of which was a bar where one could ask for single malt whisky. I didn’t have any alcohol or coffee and only picked at my food. I also didn’t watch a single film or do any work. If I’d been feeling OK, the day would have been very different. However, perhaps the fact that I wasn’t feeling OK meant that the difference it made to me to be in business class was actually greater than it would have been otherwise. I rather like that way of looking at it.

An amusing thing happened when we landed in Paris. We landed out on the tarmac and were met by buses. They let the classy people off first (even we business-class people had to wait for the first-class people, just in case we got above ourselves), so that they wouldn’t have to share a bus with the riff raff. One reason I had been pleased to be travelling business class was that it meant that I had after all got to experience the top floor of an Airbus 380. But when I turned round to look, there was only one row of windows, and then I saw that it had been a Boeing 777. Oh well. It was operated by Air France. I’ve forgotten the right phrase: something like “code sharing”. A number of little anomalies resolved themselves, such as that the take-off didn’t feel like the one in Paris, that the slope of the walls didn’t seem quite correct if we were on the top floor, etc.

I thought that as an experiment I would see what I could remember about the laudatio for Martin Hairer without the notes I took, and then after that I would see how much more there was to say *with* the notes. So here goes. The laudatio was given by Ofer Zeitouni, one of the people on the Fields Medal committee. Early on, he made a link with what Ghys had said about Avila, by saying that Hairer too studied situations where physicists don’t know what the equation is. However, these situations were somewhat different: instead of studying typical dynamical systems, Hairer studied stochastic PDEs. As I understand it, an important class of stochastic PDEs is conventional PDEs with a noise term added, which is often some kind of Brownian motion term.

Unfortunately, Brownian motion can’t be differentiated, but that isn’t by itself a huge problem because it can be differentiated if you allow yourself to work with distributions. However, while distributions are great for many purposes, there are certain things you can’t do with them — notably multiply them together.

Hairer looked at a stochastic PDE that modelled a physical situation that gives rise to a complicated fractal boundary between two regions. I think the phrase “interface dynamics” may have been one of the buzz phrases here. The naive approach to this stochastic PDE led quickly to the need to multiply two distributions together, so it didn’t work. So Hairer added a “mollifier” — that is, he smoothed the noise slightly. Associated with this mollifier was a parameter $\epsilon$: the smaller $\epsilon$ was, the less smoothing took place. So he then solved the smoothed system, let $\epsilon$ tend to zero, showed that the smoothed solutions tended to a limit, and defined that limit to be the solution of the original equation.

The way I’ve described it, that sounds like a fairly obvious thing to do, so what was so good about it?

A first answer is that in this particular case it was far from obvious that the smoothed solutions really did tend to a limit. In order to show this, it was necessary to do a renormalization (another thematic link with Avila), which involved subtracting a constant $C_\epsilon$. The only other thing I remember was that the proof also involved something a bit like a Taylor expansion, but that a key insight of Hairer was that instead of expanding with respect to a fixed basis of functions, one should instead let the basis of functions depend on the function one was expanding — or something like that anyway.

I was left with the feeling that a lot of people are very excited about what Hairer has done, because with his new theoretical framework he has managed to go a long way beyond what people thought was possible.

OK, now let me look at the notes and see whether I want to add anything.

My memory seems to have served me quite well. Here are a couple of extra details. An important one is that Zeitouni opened with a brief summary of Hairer’s major contributions, which makes them sound like much more than a clever trick to deal with one particular troublesome stochastic PDE. These were

1. a theory of regularity structures, and

2. a theory of ergodicity for infinite-dimensional systems.

I don’t know how those two relate to the solution of the differential equation, which, by the way, is called the KPZ equation, and is the following:

$\displaystyle \partial_t h = \partial_x^2 h + (\partial_x h)^2 + \xi.$

It models the evolution of interfaces. (So maybe “interface dynamics” was not after all the buzz phrase.)

When I said that the noise was Brownian, I should have said that the noise was completely uncorrelated in time, and therefore makes no sense pointwise, but it integrates to Brownian motion.

The mollifiers are functions $\xi_\epsilon$ that replace the noise term $\xi$. The constants $C_\epsilon$ I mentioned earlier depend on your choice of mollifier, but the limit doesn’t (which is obviously very important).

What Zeitouni actually said about Taylor expansion was that one should measure smoothness by expansions that are tailored (his word not mine) to the equation, rather than with respect to a universal basis. This was a key insight of Hairer.

One of the major tools introduced by Hairer is a generalization of something called rough-path theory, due to Terry Lyons. Another is his renormalization procedure.

Zeitouni summarized by saying that Hairer had invented new methods for defining solutions to PDEs driven by rough noise, and that these methods were robust with respect to mollification. He also said something about quantitative behaviour of solutions.

If you find that account a little vague and unsatisfactory, bear in mind that my aim here is not to give the clearest possible presentation of Hairer’s work, but rather to discuss what it was like to be at the ICM, and in particular to attend this laudatio. One doesn’t usually expect to come out of a maths talk understanding it so well that one could give the same talk oneself. As I’ve mentioned in another post, there are some very good accounts of the work of all the prizewinners here. (To see them, follow the link and then follow further links to press releases.)


Dick Gross also gave an excellent talk. He began with some of the basic theory of binary quadratic forms over the integers, that is, expressions of the form $ax^2+bxy+cy^2$. One assumes that they are *primitive* (meaning that $a$, $b$ and $c$ don’t have some common factor). The *discriminant* of a binary quadratic form is the quantity $b^2-4ac$. The group $\mathrm{SL}_2(\mathbb{Z})$ then acts on these by a change of basis. For example, if we take the matrix $\begin{pmatrix}1&1\\0&1\end{pmatrix}$, we’ll replace $x$ by $x+y$ and end up with the form $a(x+y)^2+b(x+y)y+cy^2$, which can be rearranged to

$\displaystyle ax^2+(2a+b)xy+(a+b+c)y^2$
(modulo any mistakes I may have made). Because the matrix is invertible over the integers, the new form can be transformed back to the old one by another change of basis, and hence takes the same set of values. Two such forms are called *equivalent*.

For some purposes it is more transparent to write a binary quadratic form as

$\begin{pmatrix}x&y\end{pmatrix}\begin{pmatrix}a&b/2\\b/2&c\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.$

If we do that, then it is easy to see that replacing a form by an equivalent form does not change its discriminant since it is just -4 times the determinant of the matrix of coefficients, which gets multiplied by a couple of matrices of determinant 1 (the base-change matrix and its transpose).
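Since the change of basis is completely explicit, the invariance is easy to check by machine. Here is a quick sketch in Python (my own illustration, not from the talk), where the coefficient formulas come from expanding $a(px+qy)^2+b(px+qy)(rx+sy)+c(rx+sy)^2$:

```python
def transform(form, m):
    """Apply the change of basis (x, y) -> (p*x + q*y, r*x + s*y)
    to the form a*x^2 + b*x*y + c*y^2 and return the new coefficients."""
    a, b, c = form
    p, q, r, s = m
    a2 = a * p * p + b * p * r + c * r * r
    b2 = 2 * a * p * q + b * (p * s + q * r) + 2 * c * r * s
    c2 = a * q * q + b * q * s + c * s * s
    return (a2, b2, c2)

def disc(form):
    a, b, c = form
    return b * b - 4 * a * c

# Check invariance for a few forms and a few determinant-1 matrices.
forms = [(1, 0, 1), (2, 1, 3), (1, 1, 6), (3, -2, 5)]
matrices = [(1, 1, 0, 1), (1, 0, 1, 1), (2, 1, 1, 1), (3, 2, 1, 1)]
for f in forms:
    for m in matrices:
        p, q, r, s = m
        assert p * s - q * r == 1          # base change is in SL_2(Z)
        assert disc(transform(f, m)) == disc(f)
```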

Given any equivalence relation it is good if one can find nice representatives of each equivalence class. In the case of binary quadratic forms (say positive definite ones, with negative discriminant), there is a unique representative such that $-a<b\le a<c$ or $0\le b\le a=c$. From this it follows that up to equivalence there are finitely many forms with any given discriminant. The question of how many there are with discriminant $-D$ is a very interesting one.
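Because the reduced representatives form a finite set, that number can be found by direct search. Here is a sketch (mine, using the standard reduction conventions for positive definite forms); for instance, discriminant $-23$ gives 3 classes, and the famous discriminant $-163$ gives just 1:

```python
from math import gcd, isqrt

def class_number(D):
    """Count primitive reduced forms a*x^2 + b*x*y + c*y^2 of
    discriminant D < 0 (positive definite, so a > 0)."""
    assert D < 0
    count = 0
    # Reduction forces 3*a^2 <= 4*a*c - b^2 = -D, bounding the search.
    for a in range(1, isqrt(-D // 3) + 1):
        for b in range(-a, a + 1):
            if (b * b - D) % (4 * a):
                continue
            c = (b * b - D) // (4 * a)
            if c < a:
                continue
            if b < 0 and (b == -a or a == c):
                continue          # exclude the non-reduced sign choice
            if gcd(gcd(a, abs(b)), c) == 1:
                count += 1
    return count
```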

Even more interesting is that the equivalence classes form an Abelian group under a certain composition law that was defined by Gauss. Apparently it occupied about 30 pages of the *Disquisitiones*, which are possibly the most difficult part of the book.

Going back to the number of forms of discriminant $-D$, Gauss did some calculations and stated (without proof) the formula

$\sum_{D\le X}h(-D)\approx\frac{\pi}{18\zeta(3)}X^{3/2},$

where $h(-D)$ denotes the number of equivalence classes of discriminant $-D$.

There was, however, a heuristic justification for the formula. (I can’t remember whether Dick Gross said that Gauss had explicitly stated this justification or whether it was simply a reconstruction of what he must have been thinking.) It turns out that the sum on the left-hand side works out as the number of integer points in a certain region of $\mathbb{R}^3$ (or at least I assume it is $\mathbb{R}^3$, since a binary form has three coefficients), and this region has volume $\frac{\pi}{18\zeta(3)}X^{3/2}$. Unfortunately, however, the region is not convex, or even bounded, so this does not by itself prove anything. What one has to do is show that certain cusps don’t accidentally contain lots of integer points, and that is quite delicate.

One rather amazing thing that Bhargava did (with Jonathan Hanke), though it isn’t his main result, was show that if a positive definite quadratic form taking integer values represents all the positive integers up to 290 then it represents all positive integers, and that this bound is best possible. (I may have misremembered the numbers. Also, one doesn’t have to know that it represents every single number up to 290 in order to prove the result: there is some proper subset of $\{1,2,\dots,290\}$ that does the job.)

But the first of his Fields-medal-earning results was quite extraordinary. As a PhD student, he decided to do what few people do, and actually read the *Disquisitiones*. He then did what even fewer people do: he decided that he could improve on Gauss. More precisely, he felt that Gauss’s definition of the composition law was hard to understand and that it should be possible to replace it by something better and more transparent.

I should say that there are more modern ways of understanding the composition law, but they are also more abstract. Bhargava was interested in a definition that would be computational but better than Gauss’s. I suppose it isn’t completely surprising that Gauss might have produced something suboptimal, but what is surprising is that it was suboptimal *and* nobody had improved it in 200 years.

The key insight came to Bhargava, if we are to believe the story he tells us, when he was playing with a Rubik’s cube. He realized that if he put the letters $a$ to $h$ at the vertices of the cube, then there were three ways of slicing the cube to produce two matrices. One could then do something with their determinants, the details of which I have forgotten, and end up producing three binary quadratic forms that are related, and this relationship leads to a natural way of defining Gauss’s composition law. Unfortunately, I couldn’t keep the precise definitions in my head.
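The construction is easy to state in code, though. The conventions below are one standard choice (my reconstruction, not from the laudatio): slice the cube of integers $(a,\dots,h)$ three ways into pairs of $2\times 2$ matrices $(M_i,N_i)$ and set $Q_i(x,y)=-\det(M_ix+N_iy)$; the three resulting binary quadratic forms all turn out to have the same discriminant.

```python
def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def form_from_slicing(M, N):
    """Coefficients (A, B, C) of Q(x, y) = -det(M*x + N*y)."""
    A = -det(M)
    B = -(M[0][0] * N[1][1] + N[0][0] * M[1][1]
          - M[0][1] * N[1][0] - N[0][1] * M[1][0])
    C = -det(N)
    return (A, B, C)

def three_forms(cube):
    """cube = (a, b, c, d, e, f, g, h): front face [[a, b], [c, d]],
    back face [[e, f], [g, h]]. Return the three associated forms."""
    a, b, c, d, e, f, g, h = cube
    slicings = [
        ([[a, b], [c, d]], [[e, f], [g, h]]),  # front/back
        ([[a, c], [e, g]], [[b, d], [f, h]]),  # left/right
        ([[a, e], [b, f]], [[c, g], [d, h]]),  # top/bottom
    ]
    return [form_from_slicing(M, N) for M, N in slicings]

def disc(form):
    A, B, C = form
    return B * B - 4 * A * C

# The three forms of a cube share a discriminant.
for cube in [(1, 2, 0, 1, 0, 1, 1, 3), (1, 0, 0, 1, 0, 1, 1, 0),
             (1, 1, 1, 0, 0, 1, 2, 1)]:
    d1, d2, d3 = (disc(q) for q in three_forms(cube))
    assert d1 == d2 == d3
```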

Here’s a fancier way that Dick Gross put it. Bhargava reinvented the composition law by studying the action of $\mathrm{SL}_2(\mathbb{Z})^3$ on $\mathbb{Z}^2\otimes\mathbb{Z}^2\otimes\mathbb{Z}^2$. The orbits are in bijection with triples $(I_1,I_2,I_3)$ of ideal classes for the corresponding quadratic ring that satisfy $I_1I_2I_3=1$. That’s basically the abstract way of thinking about what Bhargava did computationally.

In this way, Bhargava found a symmetric reformulation of Gauss composition. And having found the right way of thinking about it, he was able to do what Gauss couldn’t, namely generalize it. He found 14 more integral representations on objects like $\mathbb{Z}^2\otimes\mathbb{Z}^2\otimes\mathbb{Z}^2$ above, which gave composition laws for higher degree forms.

He was also able to enumerate number fields of small degree, showing for each degree $d$ up to 5 that the number of fields of degree $d$ and discriminant at most $X$ in absolute value grows like $c_dX$ for a constant $c_d$. This Gross described as a fantastic generalization of Gauss’s work.

I spent the academic years 2000-2002 at Princeton and as a result had the privilege of attending Bhargava’s thesis defence, at which he presented these results. It must have been one of the best PhD theses ever written. Are there any reasonable candidates for better ones? Perhaps Simon Donaldson’s would offer decent competition.

It’s not clear whether those results would have warranted a Fields medal on their own, but the matter was put beyond the slightest doubt when Bhargava and Shankar proved a spectacular result about elliptic curves. Famously, an elliptic curve comes with a group law: given two points, you take the line through them, see where it cuts the elliptic curve again, and define that to be the inverse of the product. This gives an Abelian group. (Associativity is not obvious: it can be proved by direct computation, but I don’t know what the most conceptual argument is.) The group law takes rational points to rational points, and a famous theorem of Mordell states that the rational points form a finitely generated subgroup. The structure theorem for finitely generated Abelian groups tells us that for some $r$ it must be a product of $\mathbb{Z}^r$ with a finite group. The integer $r$ is called the *rank* of the curve.
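For concreteness, here is a sketch (my own illustration) of the standard chord-and-tangent formulas over $\mathbb{Q}$ — the usual way of writing the group law just described, with the reflection built in — on the convenient example curve $y^2=x^3+1$:

```python
from fractions import Fraction

# Curve y^2 = x^3 + a*x + b; points are (x, y) tuples, None is the identity.
a, b = Fraction(0), Fraction(1)

def add(P, Q):
    """Chord-and-tangent addition on the curve."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y1 == -y2:
        return None                          # vertical line: P + (-P) = identity
    if P == Q:
        lam = (3 * x1 * x1 + a) / (2 * y1)   # tangent slope
    else:
        lam = (y2 - y1) / (x2 - x1)          # chord slope
    x3 = lam * lam - x1 - x2
    y3 = lam * (x1 - x3) - y1                # reflect the third intersection
    return (x3, y3)

P = (Fraction(2), Fraction(3))               # lies on y^2 = x^3 + 1
P2 = add(P, P)
P3 = add(P2, P)
assert P2 == (0, 1) and P3 == (-1, 0)
assert add(P3, P3) is None                   # (-1, 0) has order 2
```

(On this particular curve the rational points happen to form a finite group, so its rank is 0.)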

It is conjectured that the rank can be arbitrarily large, but not everyone agrees with that conjecture. The record so far is held by the curve

discovered by Noam Elkies (who else?) and shown to have rank 19. According to Wikipedia, from which I stole that formula, there are curves of unknown rank that are known to have rank at least 28, so in another sense the record is 28: that is the largest integer $r$ for which some elliptic curve is proved to have rank at least $r$.

Bhargava and Shankar proved that the *average* rank is less than 1. Previously this was not even known to be finite. They also showed that at least 80% of elliptic curves have rank 0 or 1.

The Birch–Swinnerton-Dyer conjecture concerns ranks of elliptic curves, and one consequence of their results (or perhaps it is a further result — I’m not quite sure) is that the conjecture is true for at least 66% of elliptic curves. Gross said that there was some hope of improving 66% to 100%, but cautioned that that would not prove the conjecture, since a statement that holds for 100% of elliptic curves (in this density sense) need not hold for all of them. But it is still a stunning advance. As far as I know, nobody had even thought of trying to prove average statements like these.

I think I also picked up that there were connections between the delicate methods that Bhargava used to enumerate number fields (which again involved counting lattice points in unbounded sets) and his more recent work with Shankar.

Finally, Gross reminded us that Faltings showed that for hyperelliptic curves of genus at least 2 (a curve of the form $y^2=f(x)$ for a polynomial $f$ — when $f$ is a cubic you get an elliptic curve) the number of rational points is finite. Another result of Bhargava is that for almost all hyperelliptic curves there are in fact no rational points.

While it is clear from what people have said about the work of the four medallists that they have all proved amazing results and changed their fields, I think that in Bhargava’s case it is easiest for the non-expert to understand just *why* his work is so amazing. I can’t wait to see what he does next.

]]>

The first one was an excellent talk by Etienne Ghys on the work of Artur Avila. (The only other talk I’ve heard by Ghys was his plenary lecture at the ICM in Madrid in 2006, which was also excellent.) It began particularly well, with a brief sketch of the important stages in the history of dynamics. These were as follows.

1. Associated with Newton is the idea that you are given a differential equation, and you try to find solutions. This has of course had a number of amazing successes.

2. However, after a while it became clear that the differential equations for which one could hope to find a solution were not typical. The next stage, initiated by Poincaré, was to aim for something less. One could summarize it by saying that now, given a differential equation, one tries merely to say something interesting about its solutions.

3. In the 1960s, Smale and Thom went a stage further, trying to take on board the realization that often physicists don’t actually know the equation that models the phenomenon they are looking at. As Ghys put it, the endeavour now can be summed up as follows: you are not given a differential equation and you want to say something interesting about its solutions.

Of course, once the well-deserved laugh had died down, he explained a bit further what he meant. One way he put it was to ask what a typical dynamical system looks like.

He then talked about four important results of Avila that fit into this broad framework. One concerns iterates of unimodal maps, which are maps of $[0,1]$ that look like upside-down parabolas (they are zero at 0 and 1 and have a single local maximum in between, which lies above the line $y=x$). Avila showed that given an analytic family of such maps, almost every function in the family gives rise either to a very structured dynamical system or a rather random-like one. More precisely, for almost every map $f$ in the family, either almost every orbit converges to an attracting cycle (such systems are called *regular*) or there is an absolutely continuous measure $\mu$ such that almost every orbit in $[0,1]$ is distributed according to $\mu$.
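The dichotomy is easy to see experimentally in the most familiar family, the logistic maps $f_r(x)=rx(1-x)$ (my toy illustration, not from the talk): $r=3.2$ is regular, with orbits falling onto an attracting 2-cycle, while at $r=4$ orbits are distributed according to the density $\frac{1}{\pi\sqrt{x(1-x)}}$.

```python
from math import asin, pi, sqrt

def logistic(r, x, n):
    """Iterate x -> r*x*(1-x) n times and return the orbit."""
    orbit = []
    for _ in range(n):
        x = r * x * (1 - x)
        orbit.append(x)
    return orbit

# Regular case: at r = 3.2 the orbit settles on an attracting 2-cycle.
tail = logistic(3.2, 0.3, 2000)[-2:]
assert abs(3.2 * tail[1] * (1 - tail[1]) - tail[0]) < 1e-9

# Random-like case: at r = 4 the orbit statistics match the density above,
# which gives the set [0, 0.1] measure (2/pi)*arcsin(sqrt(0.1)) ~ 0.205.
orbit = logistic(4.0, 0.123, 100000)
frac = sum(1 for x in orbit if x < 0.1) / len(orbit)
predicted = (2 / pi) * asin(sqrt(0.1))
assert abs(frac - predicted) < 0.05
```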

The main tool in the proof is something called the renormalization operator. I didn’t fully understand what this was, but I got a partial understanding. A discrete dynamical system is a set $X$ together with a map $f:X\to X$ (usually assumed to have extra properties such as continuity or preservation of measure, which of course requires $X$ to have some structure so that those properties make sense) that one iterates. We are interested in orbits, which are simply sequences of the form $x, f(x), f(f(x)), f(f(f(x))),\dots$.

Now suppose you have a subset $Y$ of $X$. Often you can define a dynamical system on $Y$ by simply setting $g(y)$ to be $f^n(y)$ for the smallest positive integer $n$ for which $f^n(y)\in Y$. And often this dynamical system is closely related to the big dynamical system on $X$. In a way I didn’t pick up from the lecture, the renormalization operator exploits this close relationship to turn maps from $X$ to $X$ into maps from $Y$ to $Y$. We can use this basic idea to define a renormalization operator on the space of all unimodal maps.
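Here is the first-return construction in code, for the toy example of a circle rotation (my choice of example, not from the lecture):

```python
def rotation(alpha):
    """The circle rotation x -> x + alpha (mod 1)."""
    return lambda x: (x + alpha) % 1.0

def first_return_map(f, in_Y, max_steps=10**6):
    """Induce a dynamical system on Y: send y to f^n(y) for the
    smallest n >= 1 with f^n(y) back in Y."""
    def g(y):
        x = f(y)
        for _ in range(max_steps):
            if in_Y(x):
                return x
            x = f(x)
        raise RuntimeError("no return within max_steps")
    return g

f = rotation(2 ** 0.5 - 1)           # an irrational rotation angle
in_Y = lambda x: x < 0.1             # Y = [0, 0.1)
g = first_return_map(f, in_Y)

y = 0.05
for _ in range(50):                  # the induced orbit stays in Y
    y = g(y)
    assert 0.0 <= y < 0.1
```

(For an irrational rotation, every point of $Y$ really does return, so `g` is defined everywhere on $Y$; the induced map is itself an interval exchange of $Y$.)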

It is not obvious to me why this is a good thing to do, except that it fits into the general philosophy, that applies in many many contexts, that considering a lot of objects of a certain type at once is often a great way to learn about individual objects of that type. (This theme was to reappear in a big way in the talk about Mirzakhani’s work.) Avila did not invent the renormalization map, but according to Ghys he is an absolute master at using it, and has in that way made it his own.

The second result was about interval exchange maps. These are maps that take a unit interval, chop it up into finitely many pieces (of varying lengths if you want the map to be interesting) and reassemble them in a different order. In 2007, Avila and Giovanni Forni proved that almost all interval exchange maps $T$ are weak mixing. This means that if you take any two sets $A$ and $B$, then for almost every $n$ the measure of $T^nA\cap B$ is approximately what you would expect if $T^nA$ were a “random set” — that is, the product of the measures of $A$ and $B$.
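An interval exchange map is easy to implement; here’s a minimal sketch (mine):

```python
from itertools import accumulate

def interval_exchange(lengths, perm):
    """Return the map that cuts [0, 1) into pieces of the given lengths
    (in order) and lays them back down in the order given by perm."""
    assert abs(sum(lengths) - 1.0) < 1e-12
    assert sorted(perm) == list(range(len(lengths)))
    starts = [0.0] + list(accumulate(lengths))[:-1]    # old left endpoints
    new_starts = {}
    pos = 0.0
    for i in perm:                                     # lay pieces down in new order
        new_starts[i] = pos
        pos += lengths[i]
    def T(x):
        for i in reversed(range(len(lengths))):        # find the piece containing x
            if x >= starts[i]:
                return x - starts[i] + new_starts[i]
        raise ValueError("x not in [0, 1)")
    return T

# A 3-interval exchange reversing the order of the pieces.
T = interval_exchange([0.2, 0.3, 0.5], [2, 1, 0])
assert abs(T(0.0) - 0.8) < 1e-9       # piece 0 moves to the right end
assert abs(T(0.25) - 0.55) < 1e-9     # piece 1 lands after piece 2
assert abs(T(0.6) - 0.1) < 1e-9       # piece 2 moves to the front
```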

Renormalization was the tool here too. Apparently the key to proving this result was to show that the renormalization map on the space of interval exchange maps is chaotic. I don’t know exactly what this means.

I have always had a soft spot for interval exchange maps, because I once heard a fascinating open problem and thought about it very hard with no success. Suppose you are given a polygonal but not necessarily convex room lined with mirrors and you switch a light on. Must it illuminate the whole room? (Assume that the light comes from a point source.) There is a very nice construction called Kafka’s study, which shows that the answer can be no in a room with a smooth boundary. To draw it, you begin by drawing an ellipse, cutting it in half along the line joining its two foci, which I’ll take to be horizontal, keeping only one half, and then creating a sort of mushroom shape with the half ellipse at the top and a curve that goes horizontally through the two foci but also dips down between the foci (to make the “stalk” of the mushroom). If a beam of light comes out of one focus and hits the boundary of the ellipse, then it bounces back to the other focus. From this it is easy to see that if you switch on a light in the stalk part of the room, then the two other bits that do not lie in the top half of the ellipse will remain dark. I think the idea behind the name was that Kafka could work in the side parts without being disturbed by noise from the stalk part.

Another way of thinking about this is as a billiards problem. If you fire off a billiards ball (infinitesimally small of course) from the stalk part of the room, then however much it bounces, it will never reach the side parts.

What about the polygonal case? If a room is polygonal and all the sides make an angle with the horizontal that’s a rational multiple of $\pi$, then a billiard ball will only ever travel in one of a finite number of directions, so we can define a map from the set of pairs of the form (boundary point, possible direction from that boundary point) to itself, which, if you think about it for a bit, can be seen to be an interval exchange map.

Years ago I managed to prove to my own satisfaction the known (I’m pretty sure, though I don’t know enough about the area to know where to find it) result that for almost every direction in which you send out a billiard ball, the resulting orbit will be dense. However, once the angles stop being nice rational multiples of $\pi$, the dynamical system becomes a rather unpleasant map that moves bits of the plane about while also applying affine transformations to them.

As a means of simplifying the problem, I decided to consider a natural 2D analogue of interval exchange maps. This time you take a square, chop it up into finitely many rectangles, and reassemble the rectangles in some other way into the square. That led to a question I spent a long time on and couldn’t answer. (This was probably in about 1989 or so.) Take a rectangle exchange map of the kind I’ve just described, and take a point in the square. Is it recurrent? That is, will its iterates necessarily come back arbitrarily close to the original point? In the 1D case the answer is yes, and I seem to remember that was a key lemma in the proof about dense orbits.

Note that I’m not asking whether *almost* all points are recurrent: that is an easy exercise (and a result of Poincaré). I really want them all.
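The simplest rectangle exchange maps are translations of the torus, which cut the square into (at most) four rectangles; for those, recurrence of every point is classical, and a quick experiment (mine) shows the kind of return one sees:

```python
def torus_translation(alpha, beta):
    """A rectangle exchange: translate by (alpha, beta) mod 1, which cuts
    the unit square into four rectangles and reassembles them."""
    return lambda p: ((p[0] + alpha) % 1.0, (p[1] + beta) % 1.0)

def torus_dist(p, q):
    """Sup distance on the torus."""
    return max(min(abs(a - b), 1 - abs(a - b)) for a, b in zip(p, q))

T = torus_translation(2 ** 0.5 - 1, 3 ** 0.5 - 1)
start = (0.2, 0.5)
p = start
best = 1.0
for _ in range(20000):
    p = T(p)
    best = min(best, torus_dist(p, start))

# Pigeonhole on a 141-by-141 grid already guarantees a return within
# 1/141 of the start among the first 20000 iterates.
assert best < 0.01
```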

Incidentally, a few years after I was obsessed with the billiards-in-polygons problem, a paper came out that purported to solve it. Imagine my surprise when the polygon in question had rational angles. It turned out that the paper did something like assuming that corners absorbed light, or something like that. Anyhow, as far as I know the following two questions are still open, but if not, then I’d be interested to be pointed to the appropriate literature.

1. If you have a light source that’s more like a real light in that light comes in all directions from everywhere in a non-empty open set, then must an arbitrary polygonal room be illuminated?

2. If you take a point in a polygonal room and send off a billiard ball, is it true that for almost every direction you might choose the trajectory of the ball will be dense? (As far as I know “almost every” could mean for every direction not belonging to some countable set.)

Moving on to the other two of Avila’s results, I’m going to say much less. The first one was a solution of the ten-martini problem, so called because Mark Kac offered ten martinis to whoever solved it. Unfortunately, he had died by the time Avila was in a position to claim them. I didn’t really understand the problem, but it was to do with the Schrödinger equation and boiled down to a problem in spectral theory, which Avila, remarkably, solved using dynamical systems.

The last problem was one that Etienne Ghys told us most people assume must be easy when they hear it for the first time, and often offer incorrect proofs. Maybe because he had said that I didn’t have any particular feeling that it should be easy, but perhaps you, dear reader, will.

It is known that a $C^1$ diffeomorphism on a manifold can be approximated (in the $C^1$ topology) by a $C^\infty$ diffeomorphism. Avila showed that if the $C^1$ diffeomorphism is volume preserving, then the $C^\infty$ one can be taken to be volume preserving as well. The proof was apparently very hard.

The main other thing I remember from the talk was that Ghys prepared a sequence of photos that flashed up in front of us in a seemingly endless sequence, of all Avila’s collaborators. The fact that he has so many is one of the remarkable things about him: he is apparently very generous with his ideas, a great illustration of how that kind of generosity can be hugely beneficial not just to the people who are on the receiving end but also to those who exhibit it.

]]>

I didn’t manage to maintain my ignorance of the fourth Fields medallist, because I was sitting only a few rows behind the medallists, and when Martin Hairer turned up wearing a suit, there was no longer any room for doubt. However, there was a small element of surprise in the way that the medals were announced. Ingrid Daubechies (president of the IMU) told us that they had made short videos about each medallist, and also about the Nevanlinna Prize winner, who was Subhash Khot. So for each winner in turn, she told us that a video was about to start. An animation of a Fields medal then rotated on the large screens at the front of the hall, and when it settled down one could see the name of the next winner. The beginning of each video was drowned out by the resulting applause (and also a cheer for Bhargava and an even louder one for Mirzakhani), but they were pretty good. At the end of each video, the winner went up on stage, to more applause, and sat down. Then when the five videos were over, the medals were presented, to each winner in turn, by the president of Korea.

Here they are, getting their medals/prize. It wasn’t easy to get good photos with a cheap camera on maximum zoom, but they give some idea.

After those prizes were announced, we had the announcements of the Gauss prize and the Chern medal. The former is for mathematical work that has had a strong impact outside mathematics, and the latter is for lifetime achievement. The Gauss prize went to Stanley Osher and the Chern medal to Phillip Griffiths.

If you haven’t already seen it, the IMU page about the winners has links to very good short (but not too short) summaries of their work. I’m quite glad about that because I think it means I can get away with writing less about them myself. I also recommend this Google Plus post by John Baez about the work of Mirzakhani.

I have one remark to make about the Fields medals, which is that I think that this time round there were an unusually large number of people who could easily have got medals, including other women. (This last point is important — one should think of Mirzakhani’s medal as the new normal rather than as some freak event.) I have two words to say about them: Mikhail Gromov. To spell it out, he is an extreme, but by no means unique, example of a mathematician who did not get a Fields medal but whose reputation would be pretty much unaltered if he had. In the end it’s the theorems that count, and there have been some wonderful theorems proved by people who just missed out this year.

Other aspects of the ceremony were much as one would expect, but there was rather less time devoted to long and repetitive speeches about the host country than I have been used to at other ICMs, which was welcome.

That is not to say that interesting facts about the host country were entirely ignored. The final speech of the ceremony was given by Martin Groetschel, who told us several interesting things, one of which was the number of mathematics papers published in international journals by Koreans in 1981. He asked us to guess, so I’m giving you the opportunity to guess before reading on.

Now Korea is 11th in the world for the number of mathematical publications. Of course, one can question what this really means, but it certainly means something when you hear that the answer to the question above is 3. So in just one generation a serious mathematical tradition has been created from almost nothing.

He also told us the names of the people on various committees. Here they are, except that I couldn’t quite copy all of them down fast enough.

The Fields Medal committee consisted of Daubechies, Ambrosio, Eisenbud, Fukaya, Ghys, Dick Gross, Kirwan, Kollár, Kontsevich, Struwe, Zeitouni and Günter Ziegler.

The program committee consisted of Carlos Kenig (chair), Bolthausen, Alice Chang, de Melo, Esnault, me, Kannan, Jong Hae Keum, Le Bris, Lubotsky, Nesetril and Okounkov.

The ICM executive committee (if that’s the right phrase) for the next four years will be Shigefumi Mori (president), Helge Holden (secretary), Alicia Dickenstein (VP), Vaughan Jones (VP), Dick Gross, Hyungju Park, Christiane Rousseau, Vasudevan Srinivas, John Toland and Wendelin Werner.

He also told us about various initiatives of the IMU, one of which sounded interesting (by which I don’t mean that the others didn’t). It’s called the adopt-a-graduate-student initiative. The idea is that the IMU will support researchers in developed countries who want to provide some kind of mentorship for graduate students in less developed countries working in a similar area who might otherwise not find it easy to receive appropriate guidance. Or something like that.

Ingrid Daubechies also told us about two other initiatives connected with the developing world. One was that the winner of the Chern Medal gets to nominate a good cause to receive a large amount of money. Stupidly I seem not to have written it down, but it may have been $250,000. Anyhow, that order of magnitude. Phillip Griffiths chose the African Mathematics Millennium Science Initiative, or AMMSI. The other was that the five winners of the Breakthrough Prizes in mathematics, Donaldson, Kontsevich, Lurie, Tao and Taylor, have each given $100,000 towards a $500,000 fund for helping graduate students from the developing world. I don’t know exactly what form the help will take, but the phrase “breakout graduate fellowships” was involved.

When I get time, I’ll try to write something about the Laudationes, but right now I need to sleep. I have to confess that during Jim Simons’s talk, my jet lag caught up with me in a major way and I simply couldn’t keep awake. So I don’t really have much to say about it, except that there was an amusing Q&A session where several people asked long rambling “questions” that left Jim Simons himself amusingly nonplussed. His repeated requests for short pithy questions were ignored.

Just before I finish, I’ve remembered an amusing thing that happened during the early part of the ceremony, when some traditional dancing was taking place (or at least I assume it was traditional). At one point some men in masks appeared, who looked like this.

Just while we’re at it, here are some more dancers.

Anyhow, when the men in masks came on stage, there were screams of terror from Mirzakhani’s daughter, who looked about two and a half, and delightful, and she (the daughter) took a long time to be calmed down. I think my six-year-old son might have felt the same way — he had to leave a pantomime version of Hansel and Gretel, to which he had been taken as a birthday treat when he was five, almost the instant it started, and still has those tendencies.

]]>

The flight over was not exactly fun — a night flight never is — but I watched two passable films, got a little bit of work done, missed out on the hot towels (which was good news because it meant I must have been properly asleep), and had possibly the best inflight meal of my life. The last was probably a well-known dish but it happened not to be known to me. I had a choice between beef, chicken, and bibimbap, with the first two being western and the third Korean. That was a no-brainer, but when I asked for the bibimbap I was given not just the bibimbap itself but a leaflet explaining how to assemble it. The steps were as follows.

1. Please put the steamed rice into the “Bibimbap” bowl.

2. Add gochujang (Korean hot pepper paste).

Spicy level 1. (Mild): 1/2 of tube.

Spicy level 2. (Hot): Full tube.

3. Add sesame oil.

4. Mix the “Bibimbap” together.

5. Enjoy the “Bibimbap” with side dish and soup.

I squeezed out almost all the tube of hot pepper sauce and the result was pleasantly hot without threatening to be painful. It was also delicious and substantial. The soup, which I think may have been seaweed soup, was also very good.

I now regret choosing omelette for breakfast when I could have had something called rice porridge, which also looked interesting. (The omelette wasn’t.)

The one other notable thing about the flight was that the plane was so vast that it took off before it felt as though it had picked up enough speed to do so. It also satisfied the “law of turbulence”: that no matter how big a plane is, it gets buffeted about just as much as any other plane. I wonder if there is some scaling law there: for instance, the faster you go, the more dramatic the changes in pressure and wind direction, or something like that.

Seoul was fairly similar to what I expected, though a bit more spread out perhaps. My impression of the place is gleaned from just one bus journey (over an hour) from airport to hotel. Maybe I’ll have more to say about it later.

When I arrived, I immediately went to register. That was quick and efficient, and I picked up my unusually tasteful conference bag, which resembles a large handbag. I had a choice between black and brown, and went daringly for the latter. It had the usual kinds of things in it, with one exception: no notepad. (For the younger generation out there, that means a number of sheets of paper conveniently joined together, rather than some kind of tablet computer.) That will make my note-taking work slightly harder, but I’ll think of something.

The first event of the ICM was an opening reception, which took place in a huge room in the conference centre. There was an extraordinary amount of food there, and also beer, which was very welcome. The food was good, and some of it interestingly Korean, but it didn’t quite reach the heights of the bibimbap (or should that be “Bibimbap”?).

Although I’m not strictly forced to leave the hotel, I’m not sure I’m ready to pay $40 for breakfast, so I’m going to nip out quickly and try to find some coffee and a bun or something like that. I noticed from the bus that there were lots of quite promising looking coffee places: it will certainly be a bonus if, as looks likely, Korea is a country where one can get a good cup of coffee. And then it’s off to the opening ceremony. More later.

Actually, more sooner, because I’ve just remembered that I was going to mention an amusing story that I was told at the reception yesterday. Apparently the Pope is visiting Korea, and asked for an audience with the president today. And the president told him that he would have to wait till tomorrow, because today she was otherwise occupied. It’s heartening to know that mathematics takes precedence over the Catholic church.

And slightly more again: I have a bit of battery left on my laptop, which I was allowed to bring into the opening ceremony. As was advised, I got here very much earlier than the start time, which makes an already long ceremony a significant chunk longer. We’ve been treated to Beatles songs arranged for some Korean instrument that I don’t know the name of — it looks a bit like a lyre but sits horizontally on the lap. Meanwhile, it seems that the names of the Fields Medallists have, disappointingly, been leaked. Despite that, I’ve managed to maintain my ignorance. (To be more accurate, I am now certain about three of the names but still don’t know who the fourth person is. We’ll see whether I can avoid learning that before it is announced.)

]]>

Just as the last ICM was the first (and still only) time I had been to India, this one will be my first visit to Korea. I’m looking forward to that aspect too, though my hotel is right next to where the congress is taking place and the programme looks pretty packed, so I’m not sure I’ll see much of the country. Talking of the packedness of the programme, I can already see that there are going to be some agonising decisions. For example, Tom Sanders is giving an invited lecture at the same time as Ryan Williams, two speakers I very much want to listen to. I suppose I’ll just have to read the proceedings article of the one I don’t go to. Equally unfortunate is that Ben Green’s plenary lecture is not until next week, when I’ll have gone. But I hope that I’ll still be able to get some kind of feel for where mathematics is now, what people outside my area consider important, and so on, and that I’ll be able to convey some of that in the next few posts.

I’d better stop this now, since I’ll soon be getting on to an Airbus 380 — a monstrously large double-decker plane. One of my children is something of a transport enthusiast and told me in advance that this would be the case (he had looked it up on the internet). I had hoped to end up on the top floor, but that turns out to be for business class only. The flight is about 11 hours: it leaves at 9pm French time and arrives at around 2:30pm Korean time. The challenge will be not to be utterly exhausted by the time of the opening ceremony on Wednesday morning. My memory of Hyderabad is that by the end of the four days I was so tired that I was almost getting anxious about my health. I plan to look after myself a bit better this time, but it may be difficult.

]]>

What I wrote gives some kind of illustration of the twists and turns, many of them fruitless, that people typically take when solving a problem. If I were to draw a moral from it, it would be this: when trying to solve a problem, it is a mistake to expect to take a direct route to the solution. Instead, one formulates subquestions and gradually builds up a useful bank of observations until the direct route becomes clear. Given that we’ve just had the football world cup, I’ll draw an analogy that I find not too bad (though not perfect either): a team plays better if it patiently builds up to an attack on goal than if it hoofs the ball up the pitch or takes shots from a distance. Germany gave an extraordinary illustration of this in their 7-1 defeat of Brazil.

I imagine that the rest of this post will be much more interesting if you yourself solve the problem before reading what I did. I in turn would be interested in hearing about other people’s experiences with the problem: were they similar to mine, or quite different? I would very much like to get a feel for how varied people’s experiences are. If you’re a competitor who solved the problem, feel free to join the discussion!

If I find myself with some spare time, I might have a go at doing the same with some of the other questions.

What follows is exactly what I wrote (or rather typed), with no editing at all, apart from changing the LaTeX so that it compiles in WordPress and adding two comments that are clearly marked in red.

**Problem** *Let $a_0 < a_1 < a_2 < \cdots$ be an infinite sequence of positive integers. Prove that there exists a unique integer $n \geq 1$ such that*

$$a_n < \frac{a_0 + a_1 + \dots + a_n}{n} \leq a_{n+1}.$$

Slight bafflement.

The expression in the middle is not an average: the sum $a_0 + a_1 + \dots + a_n$ has $n+1$ terms, but we divide by $n$. If we were to replace it by an average we would have the second inequality automatically, since the average of $a_0, \dots, a_n$ is less than $a_n$, which is less than $a_{n+1}$.

Try looking at simple cases. Here we could consider what happens when $n = 1$, for example. Then the inequality says $a_1 < a_0 + a_1 \leq a_2$.

Here we automatically have the first inequality, since $a_0 > 0$, but there is no reason for the second inequality to be true.

Putting those observations together, we see that the first inequality is true when $n = 1$, and the second inequality is “close to being true” as $n$ gets large, since it is true if we replace $n$ by $n+1$ in the denominator.
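Before going on, here is a quick computational sanity check of my own (the sequence below is an arbitrary choice, not one from the post): for each $n$, compute the middle quantity $\frac{a_0 + a_1 + \dots + a_n}{n}$ exactly and see which of the two inequalities hold.

```python
from fractions import Fraction

# For a sample increasing sequence of positive integers (an arbitrary
# illustrative choice), check at each n which of the two inequalities
#     a_n < (a_0 + ... + a_n)/n <= a_{n+1}
# hold, using exact rational arithmetic.
a = [1, 2, 5, 6, 20, 30]

for n in range(1, len(a) - 1):
    middle = Fraction(sum(a[: n + 1]), n)
    print(n, middle, a[n] < middle, middle <= a[n + 1])
```

For this particular sequence, the two inequalities hold simultaneously only at $n = 1$.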

If the inequality holds for a unique $n$, then a plausible guess is that the first inequality fails at some $n$, and if $n$ is minimal such that it fails, then both inequalities are true for $n-1$. I shall investigate that in due course, but I have another idea.

It is clear that WLOG $a_0 = 1$. Can we now choose the $a_n$ in such a way that we always get equality for the second inequality? We can certainly solve the equations, so the question is whether the resulting sequence will be increasing.

We get $a_2 = a_0 + a_1$, so I’d better set $a_1 = 2$ and then continue constructing a sequence.

So $a_2 = 3$, $a_3 = 3$, $a_4 = 3$, and so on. Thus all the $a_n$ with $n \geq 2$ are equal, which they are not supposed to be. This feels significant.

Out of interest, what happens to the inequalities when we (illegally) take the above sequence? We get $1, 2, 3, 3, 3, \dots$, so we get equality on both sides except when $n = 1$, when we get $2 < 3 \leq 3$.
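The stalling is easy to reproduce numerically. This is a sketch of my own, forcing equality in the second inequality, i.e. taking $a_{n+1} = \frac{a_0 + \dots + a_n}{n}$, starting from $a_0 = 1$ and $a_1 = 2$:

```python
from fractions import Fraction

# Force equality in the second inequality: a_{n+1} = (a_0 + ... + a_n)/n.
a = [Fraction(1), Fraction(2)]
for n in range(1, 10):
    # At this point a holds a_0, ..., a_n, so sum(a) is a_0 + ... + a_n.
    a.append(Fraction(sum(a), n))

print([int(x) for x in a])  # the sequence stops increasing at 3
```

Exact rational arithmetic (rather than floats) makes the stalling unmistakable.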

Try to disprove the result.

Try to find the simplest counterexample you can.

An obvious thing to do is to try to make the inequality true when $n = 1$ and when $n = 2$. So let’s go. Without loss of generality $a_0 = 1$, $a_1 = 2$. We now need $a_2 \geq a_0 + a_1 = 3$.

For $n = 2$ we need $a_2 < \frac{a_0 + a_1 + a_2}{2}$. That can be rearranged to $a_2 < a_0 + a_1$, exactly contradicting what we had before.

That doesn’t solve the problem but it looks interesting. In particular, it suggests rearranging the first inequality in the general case, to $a_n < \frac{(a_0 + a_1) + a_2 + \dots + a_{n-1}}{n-1}$.

That’s quite nice because the right hand side is a genuine average this time: it is the average of the $n-1$ numbers $a_0 + a_1, a_2, \dots, a_{n-1}$.

Actually, if getting an average is what we care about, we could also rearrange the first inequality by simply multiplying through by $\frac{n}{n+1}$, which gives $\frac{n}{n+1}a_n < \frac{a_0 + a_1 + \dots + a_n}{n+1}$.

I think it is time to revisit that guess, in order to try to prove at least that there *exists* a solution. So we know that the first inequality holds when $n = 1$, since all it says then is that $a_1 < a_0 + a_1$. Can it always hold? If so, then again WLOG $a_0 = 1$ and $a_1 = 2$, and after that we get $a_2 < 3$, $a_3 < \frac{3 + a_2}{2}$, etc.

Let’s write $c_n = \frac{a_0 + a_1 + \dots + a_n}{n}$ for $n \geq 1$. Then we have $c_1 = 3$, $c_2 = \frac{3 + a_2}{2}$, $c_3 = \frac{3 + a_2 + a_3}{3}$, etc. We also require the sequence to be strictly increasing.

Let’s set about making the first inequality hold for every $n$. Now the first condition becomes $a_{n+1} < c_n$ (the first inequality at $n+1$ rearranges to this), but we also need $a_n < a_{n+1}$. Is that possible?

Is it possible with equality? WLOG $a_0 = 1$ and $a_1 = 2$. Then we have $a_2 = c_1 = 3$, $a_3 = c_2 = 3$, $a_4 = c_3 = 3$, etc.

I’m starting to wonder whether the integer $n$ must be something like 1 or 2. Let’s think about it. We know that $a_1 < a_0 + a_1$. If $a_0 + a_1 \leq a_2$ then we have our $n$. Suppose instead that $a_2 < a_0 + a_1$. Then $2a_2 < a_0 + a_1 + a_2$, so $a_2 < \frac{a_0 + a_1 + a_2}{2}$. Now if $\frac{a_0 + a_1 + a_2}{2} \leq a_3$ then we are again done, so suppose that $a_3 < \frac{a_0 + a_1 + a_2}{2}$.

But since $a_2 < \frac{a_0 + a_1 + a_2}{2}$, we can simply insert $a_3$ in between the two. Why can’t we continue doing that kind of thing? Let me try.

If $a_3 < \frac{a_0 + a_1 + a_2}{2}$, then $a_3 < \frac{a_0 + a_1 + a_2 + a_3}{3}$, so we can insert $a_4$ in between the two.

I seem to have disproved the result, so I’d better see where I’m going wrong. I’ll try to construct a sequence explicitly. I’ll take $a_0 = 1$, $a_1 = 2$. I need $a_2 < 3$, so I’ll take $a_2 = 2.5$. Now I need $a_3 < \frac{1 + 2 + 2.5}{2} = 2.75$, so I’ll take $a_3 = 2.6$. Now I need $a_4 < \frac{1 + 2 + 2.5 + 2.6}{3} = 2.7$, so I’ll take $a_4 = 2.65$.

I don’t seem to be getting stuck, so let me try to prove that I can always continue. Suppose I’ve already chosen $a_0 < a_1 < \dots < a_n$. Then the condition I need is that

$$a_n < a_{n+1} < \frac{a_0 + a_1 + \dots + a_n}{n}.$$

By induction we already have that $a_n < \frac{a_0 + a_1 + \dots + a_{n-1}}{n-1}$, from which it follows that $na_n < a_0 + a_1 + \dots + a_n$ and therefore that $a_n < \frac{a_0 + a_1 + \dots + a_n}{n}$. We may therefore find $a_{n+1}$ between these two numbers, as desired.
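The construction in the last two paragraphs can be simulated directly. In this sketch of mine, I always pick the midpoint of the allowed interval:

```python
from fractions import Fraction

# Having chosen a_0 < ... < a_n with the first inequality holding, any
# a_{n+1} strictly between a_n and (a_0 + ... + a_n)/n will do; take the
# midpoint each time.
a = [Fraction(1), Fraction(2)]
for n in range(1, 12):
    upper = Fraction(sum(a), n)   # (a_0 + ... + a_n)/n
    assert a[-1] < upper          # the first inequality still holds
    a.append((a[-1] + upper) / 2)

print([float(x) for x in a])
```

The terms increase, but the increments shrink rapidly, and the sequence never reaches $a_0 + a_1 = 3$.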

You idiot Gowers, read the question: the $a_i$ have to be positive integers.

Fortunately, the work I’ve done so far is not a complete waste of time. [The half-conscious thought in the back of my mind here, which is clearer in retrospect, was that the successive differences in the example I had just constructed were getting smaller and smaller. So it seemed highly likely that using the same general circle of thoughts I would be able to prove that I couldn't take the $a_i$ to be integers.]

Here’s a trivial observation: if the second inequality fails, then $na_{n+1} < a_0 + a_1 + \dots + a_n \leq (n+1)a_n$. So if the second inequality keeps failing, then $a_{n+1} - a_n < \frac{a_n}{n}$ for every $n$. How long can we keep that going with positive integers? Answer: for ever, since we can take $a_n = n + 1$.

Never mind about that. I want to go back to an earlier idea. [It isn't obvious what I mean by "earlier idea" here. Actually, I had earlier had the idea of defining the $b_n$ as below, but got distracted by something else and ended up not writing it down. So a small part of the record of my journey to the proof is missing.] It is simply to define $b_1 = a_0 + a_1$ and $b_n = a_n$ for $n \geq 2$. Then for $n \geq 2$, if the first inequality holds we have

$$b_n = a_n < \frac{a_0 + a_1 + \dots + a_n}{n} = \frac{b_1 + b_2 + \dots + b_n}{n}.$$

So each new $b_n$ is less than the average of the $b_i$ up to that point, and hence less than the average of the $b_i$ before that point. But that means that the averages of the $b_i$ form a decreasing sequence. That also means that the $b_n$ are bounded above by $b_1 = a_0 + a_1$, something I could have observed ages ago. So they can’t be an increasing sequence of integers.

I’ve now shown that the first inequality must fail at some point. Suppose $n$ is the first point at which it fails. Then we have

$$a_{n-1} < \frac{a_0 + a_1 + \dots + a_{n-1}}{n-1}$$

and

$$a_n \geq \frac{a_0 + a_1 + \dots + a_n}{n}.$$

The second inequality tells us that $a_n$ is at least as big as the average of the $n$ numbers $a_0 + a_1, a_2, \dots, a_n$, which implies that it is at least as big as the average of $a_0 + a_1, a_2, \dots, a_{n-1}$. That gives us the inequality

$$\frac{a_0 + a_1 + \dots + a_{n-1}}{n-1} \leq a_n.$$

So now I’ve proved that there exists an integer such that the inequalities both hold. It remains to prove uniqueness. This formulation with averages ought to help. We’ve picked the first point $n$ at which $a_n$ is at least as big as the average of $a_0 + a_1, a_2, \dots, a_n$. Does that imply that $a_{n+1}$ is at least as big as the average of $a_0 + a_1, a_2, \dots, a_{n+1}$? Yes, because $a_n$ is at least as big as the first of those averages, and $a_{n+1}$ is bigger than $a_n$. In other words, we can prove easily that if the first inequality fails for $n$ then it fails for $n+1$, and hence by induction for all larger $n$.
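The finished argument can also be checked by brute force on random examples; here is a quick test of my own:

```python
import random
from fractions import Fraction

# For random increasing sequences of positive integers, verify that exactly
# one n in the examined range satisfies a_n < (a_0 + ... + a_n)/n <= a_{n+1}.
# A window of 30 terms is ample here: the terms stay below a_0 + a_1 until
# the first inequality fails for good, so the unique n occurs early.
random.seed(0)
for _ in range(1000):
    a = [random.randint(1, 5)]
    for _ in range(30):
        a.append(a[-1] + random.randint(1, 5))
    good = [n for n in range(1, len(a) - 1)
            if a[n] < Fraction(sum(a[: n + 1]), n) <= a[n + 1]]
    assert len(good) == 1
```

Each random sequence has exactly one admissible $n$, as the argument predicts.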


Just before I start this post, let me say that I do still intend to write a couple of follow-up posts to my previous one about journal prices. But I’ve been busy with a number of other things, so it may still take a little while.

This post is about the next European Congress of Mathematics, which takes place in Berlin in just over two years’ time. I have agreed to chair the scientific committee, which is responsible for choosing approximately 10 plenary speakers and approximately 30 invited lecturers, the latter to speak in four or five parallel sessions.

The ECM is less secretive than the ICM when it comes to drawing up its scientific programme. In particular, the names of the committee members were made public some time ago, and you can read them here.

I am all in favour of as much openness as possible, so I am very pleased that this is the way that the European Mathematical Society operates. But what is the maximum reasonable level of openness in this case? Clearly, public discussion of the merits of different candidates is completely out of order, but I think anything else goes. In particular, and this is the main point of the post, I would very much welcome suggestions for potential speakers. If you know of a mathematician who is European (and for these purposes Europe includes certain not obviously European countries such as Russia and Israel), has done exciting work (ideally recently), and will not already be speaking about that work at the International Congress of Mathematicians in Seoul, then we would like to hear about it. Our main aim is that the congress should be rewarding for its participants, so we will take some account of people’s ability to give a good talk. This applies in particular to plenary speakers.

~~I shall moderate all comments on this post. If you suggest a possible speaker, I will not publish your comment, but will note the suggestion.~~ More general comments are also welcome and will be published, assuming that they are the kinds of comments I would normally allow.

[In parentheses, let me say what my comment policy now is. The volume of spam I get on this blog has reached a level where I have decided to implement a feature that WordPress allows, where if you have never had a comment accepted, then your comment will automatically be moderated. I try to check the moderation queue quite frequently. If you have had a comment accepted in the past, then your comments will appear as normal.

I am very reluctant to delete comments, but I do delete obvious spam, and I also delete any comment that tries to use this blog as a form of self-promotion (such as using a comment to draw attention to the author's proof of the Riemann hypothesis, or to the author's fascinating blog, etc. etc.). I sometimes delete pingbacks as well -- it depends whether I think readers of my blog might conceivably be interested in the post from which the pingback originates.]

Going back to the European Congress, if you would prefer to make your suggestion by getting in contact directly with a committee member, then that is obviously fine too. The list of committee members includes email addresses.

However you make your suggestions, it would be very helpful if you could give not just a name but a brief reason for the suggestion: what the work is that you think should be recognised, and why it is important.

The main other thing I am happy to be open about is the stage that the committee has reached in its deliberations, and the plans for how it will carry out its work. Right now, we are at the stage of trying to put together a longlist of possible speakers. I have asked the other committee members to suggest to me at least six potential speakers each, of whom at least six should be broadly in their area. I hope that will give us enough candidates to make it possible to achieve a reasonable subject balance. We will of course also strive for other forms of balance, such as gender and geographical balance, to the extent that we can. Once we have a decent-sized longlist, we will cut it down to the right sort of size.

We are aiming to produce a near-complete list of speakers by around November. This is rather a long time in advance of the Congress itself, which worried me a bit, but I have permission from the EMS to leave open a few slots so that if somebody does something spectacular after November, then we will have the option of inviting them to speak.


**Further update: figures in from Nottingham too.**

**Further update: figures now in from Oxford.**

**Final update: figures in from LSE.**

A little over two years ago, the Cost of Knowledge boycott of Elsevier journals began. Initially, it seemed to be highly successful, with the number of signatories rapidly reaching 10,000 and including some very high-profile researchers, and Elsevier making a number of concessions, such as dropping support for the Research Works Act and making papers over four years old from several mathematics journals freely available online. It has also contributed to an increased awareness of the issues related to high journal prices and the locking up of articles behind paywalls.

However, it is possible to take a more pessimistic view. There were rumblings from the editorial boards of some Elsevier journals, but in the end, while a few individual members of those boards resigned, no board took the more radical step of resigning en masse and setting up with a different publisher under a new name (as some journals have done in the past), which would have forced Elsevier to sit up and take more serious notice. Instead, they waited for things to settle down, and now, two years later, the main problems, bundling and exorbitant prices, continue unabated: in 2013, Elsevier’s profit margin was up to 39%. (The profit is a little over £800 million on a little over £2 billion.) As for the boycott, the number of signatories appears to have reached a plateau of about 14,500.

Is there anything more that can be done? One answer that is often given is that the open access movement is now unstoppable, and that it is only a matter of time before the current system will have changed significantly. However, the pace of change is slow, and the alternative system that is most strongly promoted — open access articles paid for by article processing charges — is one that mathematicians tend to find unpalatable. (And not only mathematicians: they are extremely unpopular in the humanities.) I don’t want to rehearse the arguments for and against APCs in this post, except to say that there is no sign that they will help to bring down costs any time soon and no convincing market mechanism by which one might expect them to.

I have come to the conclusion that if it is not possible to bring about a rapid change to the current system, then the next best thing to do, which has the advantage of being a lot easier, is to obtain as much information as possible about it. Part of the problem with trying to explain what is wrong with the system is that there are many highly relevant factual questions to which we do not yet have reliable answers. Amongst them are the following.

1. How willing would researchers be to do without the services provided by Elsevier?

2. How easy is it on average to find on the web copies of Elsevier articles that can be read legally and free of charge?

3. To what extent are libraries actually suffering as a result of high journal prices?

4. What effect are Elsevier’s Gold Open Access articles having on their subscription prices?

5. How much are our universities paying for Elsevier journals?

The main purpose of this post is to report on efforts that I and others have made to start obtaining answers to these questions. I shall pay particular attention to the last one, since it is about that that I have most to say. I will try to keep the post as factual as possible and give my opinions about some of the facts in a separate post.

I have two small pieces of evidence. The first is an interesting comment that was made on a Google Plus post of mine by Benoît Kloeckner, who wrote the following.

In France, when the national consortium “Couperin” was dealing with Springer for the 2012-2014 contract, we issued a petition asserting that some terms (notably interdiction to unsubscribe from a number of journals) were unacceptable and that we, mathematicians, would agree not to get access to Springer journals. This was done to give negotiators more strength, but had little effect despite a significant number of signatures.

This points to a problem that I will discuss in more detail in my next post: that different subjects have different needs. Part of the reason mathematicians find the current system so objectionable is that we have already got to the stage where we don’t really need journals for anything other than the very crude measure of quality that it gives us, since a fairly high, and ever increasing, proportion of the articles that interest us are freely available in preprint form. But in some subjects, such as biology or medicine, this is much less true, and as a result people rely far more on journal articles.

I tried to take the temperature in the mathematics faculty in Cambridge by asking my colleagues to complete a very brief questionnaire: there were two questions, with multiple-choice answers. The questions were as follows.

1. How easily could you do without access to Elsevier journals via ScienceDirect and print copies?

2. For those who negotiate on our behalf to be in a strong bargaining position, they have to be able to risk our losing access to Elsevier products (other than those that are freely available) for a significant length of time. How willing would you be for them to take that risk?

In case the results were interestingly different, I got people in DAMTP (the department of applied mathematics and theoretical physics) to answer one copy of the questionnaire and people in DPMMS (the department of pure mathematics and mathematical statistics) to answer another. The results were as follows. There were 96 responses from DAMTP and 80 from DPMMS. I give the DAMTP figure first and then the DPMMS figure, both as percentages.

1. How easily could you do without access to Elsevier journals via ScienceDirect and print copies?

(i) It would be no problem at all. [27.1, 23.8]

(ii) It would be OK, but a minor inconvenience. [26.0, 38.8]

(iii) It would be OK most of the time, but occasionally very inconvenient. [24.0, 32.5]

(iv) It would be a significant inconvenience. [14.6, 5.0]

(v) It would have a strongly negative impact on my research. [8.3, 0.0]

2. For those who negotiate on our behalf to be in a strong bargaining position, they have to be able to risk our losing access to Elsevier products (other than those that are freely available) for a significant length of time. How willing would you be for them to take that risk?

(i) Very willing [46.9, 55.7]

(ii) Willing [31.3, 39.2]

(iii) Unwilling [14.6, 3.8]

(iv) Very unwilling [7.3, 1.3]

Thus, if the responses were representative, then in both departments, most people would not suffer too much inconvenience if they had to do without Elsevier’s products and services, and a large majority were willing to risk doing without them if that would strengthen the bargaining position of those who negotiate with Elsevier.
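For what it’s worth, the two departments’ answers to question 2 can be pooled using the response counts quoted above (an illustrative calculation of mine, assuming the quoted percentages are exact):

```python
# Combine the two departments' answers to question 2, weighting by the
# number of responses quoted above (96 from DAMTP, 80 from DPMMS).
damtp_n, dpmms_n = 96, 80
damtp_willing = 46.9 + 31.3   # "very willing" + "willing", per cent
dpmms_willing = 55.7 + 39.2

overall = (damtp_willing * damtp_n + dpmms_willing * dpmms_n) / (damtp_n + dpmms_n)
print(f"willing or very willing overall: {overall:.1f}%")
```

So overall roughly 86% of respondents were willing or very willing.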

Another question I might have asked is how much the answers would have changed if the departments were to subscribe to just a few important journals. That is an important question, since it might be that the University of Cambridge should follow the examples of Harvard, MIT, Cornell and others (that link is from 2004 so the situation may have changed), stop paying for a Big Deal contract and switch to paying for individual journals at list prices instead.

It is very easy to find websites where surveys like the one I conducted can be set up for no charge. (But be a little careful: I accidentally chose one called Surveymonkey that allowed only 100 responses, as a result of which I had to ask people to do it again.) I would be extremely interested if other people could do similar surveys in their own departments, both in mathematics and in other subjects.

My impression has for some time been that in mathematics a significant proportion of articles are available on the arXiv or on authors’ home pages, to the point where I almost never need to look at the journal version. There also appears to be a distinct positive correlation between the quality of a journal and the proportion of its articles freely available. And there seem to be national differences in the extent to which people make their papers available. But until recently it was a rather long and tedious process to obtain any hard figures about this.

Recently, however, Scott Morrison has set up a website called The Mathematics Literature Project, to which you can contribute if you have the time. Although one still has to input the information manually, Scott has written software that automates the process to some extent and makes it much quicker. The project is still in its infancy, but it already demonstrates that a large proportion of articles in various different journals, not all of them Elsevier journals, are indeed freely available in preprint form. And there is some evidence for the correlation with quality: for example, Discrete Mathematics is a less good journal than the Journal of Combinatorial Theory A and B, and a lot fewer of its articles can be found. (For JCTA the proportion is over 80%, whereas for Discrete Mathematics it is more like 30%.)

Thus, there is plenty of evidence that mathematicians at least do not really need their universities to pay large sums of money to Elsevier. Unfortunately, because of bundling, that fact on its own has had almost no effect on prices.

I’m tempted just to suggest that you go and talk to a librarian. You won’t be left in much doubt about the answer, at least qualitatively speaking. In brief, libraries suffer because bundling means that they have very little control over their budgets. If Elsevier raises its prices, then libraries simply have to pay them or else lose the entire bundle, so effectively they are forced to make cuts elsewhere. And this happens. For example, Phil Sykes, former chair of Research Libraries UK, shared a document with me that includes many interesting figures, one of which is that between 2001 and 2009, mean expenditure on books went up by 0.17%, which is a substantial real-terms cut, while mean expenditure on journals went up by 82%. Apparently, the expenditure on books as a proportion of total expenditure went down from 11% to just over 7% between 1999 and 2009.

But this distortion is not confined to books. Journals that belong to a large bundle are artificially protected, at the expense of other, potentially more useful, journals that do not belong to the bundle. If you think that this is just a theoretical possibility, then take a look at the example of the Université de Paris Descartes. This is the top university in Paris for medicine, the university you try to get into if you are French and want to be a doctor.

It would seem a safe bet that a top medical university would subscribe to at least some journals from the Nature publishing group, such as Nature Medicine, which describes itself as the premier journal for medical research, or Nature, which likes to think of itself as the premier journal full stop. But no: subscriptions to all Nature journals as well as many others were cancelled this year. In the long list of cancelled subscriptions, you won’t find any mention of Elsevier journals, because they are bundled together.

From time to time, a library decides that enough is enough. A couple of years ago, the mathematics department of the Technische Universität München decided to cancel all its subscriptions to Elsevier journals. And very recently the entire Universität Konstanz, also in Germany, decided to cancel its license negotiations and replace its license by “alternative procurement channels”. Given the evidence that we are becoming less reliant on journal subscriptions, it would seem rational for other libraries to consider whether to take similar measures.

Recall that Gold Open Access refers to the practice where a publisher makes an article freely available online in return for an article processing charge (APC), which is typically paid by an author’s institution or by a grant-awarding body. Elsevier now has various journals that are funded that way, as well as “hybrid” journals — that is, journals to which libraries still subscribe but which allow authors to make their articles open access in return for an APC. The proportion of Elsevier articles for which APCs have been paid is currently very small, but it is likely to increase, since various funding bodies are starting to insist that the academics they fund should make their articles open access, and often (but not always) the assumption is that this should be done via an APC.

A few months ago, it occurred to me to wonder what would happen if the proportion of Gold Open Access articles did indeed increase. Would Elsevier continue to rake in its subscription revenue and receive the APCs on top? This would seem particularly unjust in the case of hybrid journals, since libraries with Big Deal contracts cannot cancel their subscriptions to them, and in any case if several of the articles are not open access they may well not want to. So there would seem to be a danger that Elsevier is receiving substantial article processing charges that are not needed to cover the cost of processing (the additional cost of making an article open access is at least an order of magnitude less than the APCs), or to compensate Elsevier for loss of subscription revenue.

I then discovered that, not surprisingly, many other people had been concerned about this point. There is even a technical term for the practice of effectively charging twice for the same article: it is called *double dipping*. I found a page on Elsevier’s website where they stated that they had a no-double-dipping policy. However, that mentioned only the list prices of journals, so it did not address my concern at all, given that most libraries have Big Deal contracts. I decided to write to Elsevier to ask about this, and the result was that they updated the relevant page.

I think one can summarize what they say on the page now as follows: they set their prices based on the number of non-open-access articles included in the Freedom Collection; this has gone up, so they feel no compunction about charging more for the Freedom Collection. So they are at least *implying* that if enough open-access articles were published that the total volume of non-open-access articles went down, they would lower their prices.

That leaves me with two concerns. The first is that if their Big Deal contracts are confidential, then we have no way of knowing whether they are sticking to their official policy. The second is that what matters should not be the number of open access articles as a proportion of the whole, but the proportion of open access articles *amongst the articles that people actually want to read*. If, for example, half the articles in journals such as Cell and The Lancet became open access but Elsevier launched a handful of joke journals that published a comparable volume of articles, then the value of the non-open-access component to libraries would have gone down substantially, but according to Elsevier’s stated policy their charges would not be decreased.

On top of all that is a remarkable scandal that has attracted a great deal of attention recently, which is that Elsevier has been double dipping in the most direct way possible: charging people to download articles for which APCs have been paid. Mike Taylor spotted this about two years ago. Elsevier’s response, coordinated by Alicia Wise, was less than swift, not surprisingly given their strong incentive to drag their feet about it. Peter Murray-Rust has been vigorously campaigning about this issue. If you’re interested, you can check out the March 2014 archive of his blog and work backwards.

Now we come to the big question. One of the most annoying aspects of the current situation in academic publishing is that the big publishers don’t want us to know what our universities are paying for their journals, so they insist on confidentiality clauses. As a result, we can’t tell whether we are getting good value for money, though there is plenty of indirect evidence, and even some direct evidence, that we are not.

There have been a few attempts in the past to use freedom-of-information legislation to get round these confidentiality clauses, some successful and others not. Also, some information has been made available by other means. Here are the cases I know about, but this list is very likely to be incomplete. (If I am notified of further useful information, I will be happy to add it to the list with appropriate acknowledgement.)

1. In 2009 public-record requests were made by Paul Courant, Ted Bergstrom and Preston McAfee to a large number of US universities asking for details of their Big Deal contracts with publishers. They had considerable success with this, obtaining information from 36 institutions. Elsevier made strenuous efforts to prevent the disclosures, contesting the request to Washington State University, but a judge ruled against them. See this page for further details. Together with Michael Williams they wrote an analysis of what they discovered, which ~~will soon become available in preprint form.~~ has now been published. It includes the following figures for what a number of universities spent on Elsevier contracts. The first figure in each row is the cost in dollars of the Elsevier Freedom Package and the figure in brackets is the enrolment. (The latter is not by any means a perfect measure of the size of a university, but it gives at least some idea.)

| University | Cost in dollars | Enrolment |
| --- | --- | --- |
| Arizona Universities* | 2,724,888 | 123,473 |
| Auburn | 1,252,544 | 22,654 |
| Clemson | 1,296,044 | 16,582 |
| Colorado State | 1,319,633 | 24,409 |
| Cornell | 1,969,908 | 20,340 |
| Georgia State | 934,764 | 25,135 |
| Louisiana State | 1,198,237 | 28,467 |
| New York U. | 1,878,962 | 40,291 |
| U of Alabama | 1,018,614 | 22,971 |
| U of California** | 8,760,968 | 218,320 |
| U of Colorado | 1,725,023 | 28,333 |
| U of Denver | 467,406 | 10,036 |
| U of Georgia | 1,854,419 | 33,079 |
| U of Idaho | 750,808 | 10,008 |
| Illinois Universities*** | 2,319,383 | 72,751 |
| U of Iowa | 1,420,484 | 27,361 |
| U of Maryland | 1,760,173 | 31,573 |
| U of Michigan | 2,164,830 | 39,447 |
| U of Tennessee | 579,815 | 27,635 |
| U of Texas, Arlington | 620,042 | 20,136 |
| U of Texas, Austin | 1,539,380 | 46,537 |
| U of Wisconsin | 1,215,516 | 35,295 |
| U of Wyoming | 497,014 | 10,478 |

*A consortium of three universities in Arizona

**A joint license for ten University of California campuses

***A joint license for three University of Illinois campuses
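One way to read the table is to normalise by enrolment. Here is an illustrative calculation of mine for two of the rows:

```python
# Cost per enrolled student for two rows of the table above
# (an illustrative calculation, not part of the original analysis).
deals = {
    "Cornell": (1_969_908, 20_340),
    "U of Wisconsin": (1_215_516, 35_295),
}
for name, (cost, enrolment) in deals.items():
    print(f"{name}: ${cost / enrolment:.0f} per enrolled student")
```

Cornell comes out at nearly $100 per enrolled student per year, and Wisconsin at about a third of that, which gives some idea of how much the per-head price can vary.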

If you like this kind of thing, then take a look at the appendix to their paper, from which the above table comes, and which is not behind a paywall. In case you have access to PNAS, the article is here.

One related thing I have found, which interests me a lot because of its relevance to this post, is a judgment from Greg Abbott, the Attorney General of Texas, that the University of Texas should release details of its contracts with publishers. The part that interests me starts near the bottom of page 3, where there is a detailed discussion of what constitutes a trade secret. Roughly speaking, information is a trade secret of one company if disclosing it to other companies would cause substantial competitive harm to the first company. The Attorney General concludes in robust terms that the Big Deal contracts do not meet the definition of a trade secret, which I agree with because the different publishing companies are not competing to sell the same product.

2. There is a fascinating blog post by David Colquhoun written in December 2011, which I would certainly have referred to before if I had been aware of it, in which he discusses in detail the situation at his institution, which is University College London. In it, he says, “I’ve found some interesting numbers, with help from librarians, and through access to The Journal Usage Statistics Portal (JUSP).” The word “interesting” is an understatement. The first number is that UCL then paid Elsevier €1.25 million for electronic only access to Elsevier journals. But as interesting as that headline figure is his analysis of the usage of Elsevier (and other) journals. As one might expect, but it is very good to see this confirmed, there are a few journals that are used a lot, but the usage tails off extremely rapidly.

3. In this country, there have been Freedom of Information requests to De Montfort University in 2010 (successful), Swansea University in 2014 (unsuccessful), and the University of Edinburgh in 2014 (successful). I recommend at this point that you read the refusal letter by Swansea. For reasons that I’ll come to, it is fairly clear that the letter was basically written by Elsevier, so it gives us some insight into their official reasons for wanting to keep their contracts secret. As I’ll discuss later, their arguments are very weak.

There was also a successful request to Swansea in 2013, but this one asked for the amount spent on all journal subscriptions, rather than just Elsevier subscriptions. It reveals that the amount went up from £1,514,890.88 in 2007/8 to £1,861,823.92 in 2011/12. (From the wording, it seems that these figures include VAT, but I’m not quite sure.) That’s a whopping 23% increase in four years. Of course, that may be because Swansea University decided to increase significantly the number of journals it subscribed to, but that explanation seems a trifle unlikely in the current economic climate. Whatever the explanation, the amount of money is very high.

The successful request to Edinburgh was made on January 16th by Sean Williams. The response was delayed, but on April 8th they finally responded, giving full details for two years and the totals for three. This reveals that Edinburgh spends around £845,000 plus VAT per year.

4. Recently there was a long negotiation between Elsevier and Couperin, a large consortium representing French academic institutions. (Actually, I say long, but Elsevier apparently has an annoying habit of not beginning the process of negotiation in earnest until close to the end of the existing contract, so that the other side must either make decisions very quickly or risk large numbers of academics temporarily losing access to Elsevier journals.) The result was what one might call a Huge Deal, one that gave complete access to ScienceDirect to all academic institutions, from the very largest to the very smallest. Couperin professed to be pleased with the deal. I do not yet know whether that satisfaction is shared by the universities that are actually paying for it. If you want to know how much France is paying for access to ScienceDirect, then I recommend typing “Elsevier Couperin” into Google. After at most a couple of minutes of digging, you will find a document that tells you. Three important aspects of this deal are (i) that it lasts for five years, (ii) that the total amount paid to Elsevier is initially lower than before but goes up each year and ends up higher and (iii) that the access is now spread to many more institutions. What I do not know is what the effect of this is on the large universities that were paying for Elsevier journals before. Does the fact that many more institutions are involved mean that prices have gone down substantially? Or are most of the institutions that have newly been granted access paying very little for it and therefore not saving much money for the others? It would be good to have some insight into these questions. The bottom line, though, is that Elsevier’s profits in France are protected by the deal.

5. Brazil too has a national agreement with Elsevier, and refuses to sign a confidentiality clause. Somewhere I did once find, or get referred to, a page with details about the deal, but have not managed to find it again. My memory of it was that it was rather hard to understand.

**Update 25/4/2014**: many thanks to Rafael Pezzi, whose comment below I reproduce here, for more information about the situation in Brazil.

From the Brazilian open science mailing list:

Brazil has an nation wide agreement providing journal access to 423 academic and research institutions. It is called Portal de Periódicos, provided by CAPES. According to its 2013 financial report [1], last year CAPES spent US$ 93,872,151.11 (with US$ 31,644,204.12 paid to Elsevier).

Some institutions that are not covered by the agreement, as they do not meet the eligibility criteria, had to pay in separate in order to get access to this portal, spending additional US$ 11,560,438.93.

Rafael

[1] http://www.capes.gov.br/images/stories/download/Contas_Publicas/Relatorio-de-Gestao-2013.pdf

6. A comment by Anonymous below points me to a blog post that says that at the end of 2011 Purdue agreed a $2.9 million deal with Elsevier and describes the general situation facing libraries when they negotiate these deals. It also links to a post about Pittsburgh (with less precise figures).

In early January, I decided to try to find out more about what UK universities are paying by making a request under the Freedom of Information Act. As in France, the negotiations are carried out by a consortium: the British one is called JISC collections. (It’s surprisingly hard to find out what JISC stands for: the answer is Joint Information Systems Committee.) Initially (to be precise, on the 8th of January), I wrote to Lorraine Estelle, who is the head of JISC collections. I made a FOI request, and the information I asked to be told was how much JISC had agreed to pay Elsevier in the most recent round of negotiations, and how that payment was shared between the institutions represented by JISC.

She suggested that we should speak on the phone, which we did. I learned some important things from the phone call, which I will come to later, but I did not get the information I had actually asked for. She explained why on the phone, and some time later, when I found that I couldn’t quite remember her explanation, I asked for a clarification in writing. She provided me with the following.

Your question: As I understood it, you didn’t actually have the data that I was asking for. Is that correct? And do you mean that you negotiated a total — which, presumably, you would know — but do not know how it was split between the various universities?

Answer: We do have the data and we do know the split – but because we do not actually aggregate the subscriptions ourselves for the Elsevier deal, I have to get the total sum and the split from Elsevier.

I interpret that as meaning that for legal purposes she did not have the information in a form that might have obliged her to disclose it under the Freedom of Information Act.

And thus, I was passed on to Alicia Wise. As many people who have had dealings with Alicia Wise have found, including Peter Murray-Rust in his attempts to stop Elsevier charging for access to open access articles, this is not a good situation to be in.

Obviously she didn’t say, “Of course, I’d be happy to provide you with that information.” But I’d have been satisfied with a clear statement from her that she was not prepared to provide it, and I couldn’t get that either. Here is a sample of our correspondence. (Incidentally, owing first to some misunderstanding and then, apparently, to Alicia Wise wanting to check that Lorraine Estelle had not given me any confidential information, which she hadn’t, the correspondence didn’t even begin until about a fortnight after Lorraine Estelle had passed on my request.)

Her first email message, sent on February 5th, explained that Elsevier makes “an array of pricing information publicly available” and provided some links. These were to list prices of journals, which, because of bundling, give no indication of what universities actually pay. She also proposed that we should meet, or perhaps talk on the phone. I wrote back on the 7th suggesting that a phone conversation would be more convenient. I got no response for four days, so on the 11th I sent my reply again, which prompted a suggestion of several possible dates for a meeting. She said,

Sorry, should have sent you a receipt acknowledgment. We’ve worked out internally that Chris Greenwell and I should, together, be able to answer questions that arise (although I am also contemplating inviting someone from our pricing team along in case you have very very detailed questions!)

At this point I had a little worry, so I put it to her.

But before we actually arrange anything, and in particular before we decide whether it is better to meet physically or by phone, perhaps it is worth clarifying what could come out of such a meeting. The main question I asked in my FOI request was the following: “there is one particular thing I would like to know, and that is details of the most recent round of negotiations between JISC and Elsevier. I would like to know what annual payment was agreed, and how that payment was shared between the higher education institutions represented.”

If you are prepared to answer that question in full (I’m talking actual amounts of money rather than the general principles underlying the negotiations), and without binding me to any confidentiality agreement, then we have something serious to talk about. If not, then I’m not sure there is any point in having a discussion. However, in the second case, it would still be useful to know your reasons for not being prepared to divulge the information.

She responded as follows.

Thanks for this. I continue to think a call or meeting would be helpful as my immediate question is what hypothesis do you have, or are you testing, that require data at this level of granularity? The data you request are commercially sensitive. I am wondering if publicly available data – for example the attached which is from publications by the Society of College, University, and National Libraries (http://www.sconul.ac.uk/) – might serve your purpose? If we could understand better what you are after and why, we might be better able to come up with data that helps you. (And, yes, we would have even greater flexibility if you were prepared to consider treating some information in confidence but I appreciate you might be unwilling to do so.)

To which I said this.

Thanks for sending those slides, though of course you must have known perfectly well that they would not be of any help to me.

I can’t see what is unclear about what I am after. As I said, I would like to know what the UK universities represented by JISC are paying annually for Elsevier journals (a combination of Core Collections and access to Science Direct). My main reason for wanting to know that is that I think it is in the public interest for people to know how much universities are spending.

However, there are more specific reasons that I am interested in the data. One is that because the cost to universities of their Core Collections is based on historic spend on print journals, there is the potential for very similar universities to pay very different amounts for a similar service from Elsevier. I have been told that this is the case — for example, Cambridge suffers because historically college libraries have subscribed to journals — but would like to have the data so that I can confirm this.

If you won’t give me this information on the grounds of commercial sensitivity, then just let me know, and it will save us all time.

That was on February 12th. Her next reply came on March 7th, and said this.

Thanks for this. I did intend for the slides to be useful to you, but now that you have explained more clearly what you are after can see this was not the case. They have, however, helped to move our conversation on. We are focused on delivering value for money to all our customers, including Cambridge. The most direct way to find out the information you are looking for with respect to Cambridge might be a conversation with the library there?

So after all that, I still didn’t have a straight answer. However, by then I had long since lost patience: on February 19th, I submitted Freedom of Information requests to all 24 Russell Group universities, with the exceptions of Cardiff, where my email kept bouncing back, and Exeter, which I missed out accidentally. (Later I sent requests to them too.) My request was as follows.

Dear [Head of university library],

I would like to make a request under the Freedom of Information Act. I am interested to know what [name of university] currently spends annually for access to Elsevier journals. I understand that this is typically split into three parts, a subscription price for core content, which is based on historic spend, a content fee for accessing those journals via ScienceDirect, and a further fee for accessing unsubscribed titles from the Freedom Collection, also via ScienceDirect. I would like to know the total fee, and how it is split up into those three components.

Many thanks in advance for any help you can give me on this.

Yours sincerely,

Timothy Gowers

When I sent these requests, I had very little idea what my chances were of finding anything out at all. Lorraine Estelle had told me that JISC Collections are firmly against confidentiality clauses, but that Elsevier had insisted. But also, and crucially, there was a clause about FOI requests that made it not completely certain that they would fail. Unfortunately, this clause cannot be made public. (Yes, you read that correctly: the confidentiality clause is itself confidential.) However, as we shall see, the responses by some of the universities give some indication of what is probably in it.

In the end, the result was that, to my surprise and delight, a substantial majority of universities decided to give me the information I wanted, though many of them gave me just the total and not the breakdown into its three components. Here are the figures from the 18 universities that were brave and public spirited enough to give me them, together with Edinburgh, which, for reasons I don’t understand, refused to give any figures to me but provided them to Sean Williams. The figures *exclude* VAT, which adds a not exactly negligible 20% to the cost, but at least that goes back to the taxpayer rather than swelling even further the coffers of Elsevier. The price is rounded to the nearest pound. I obtained the enrolment figures from this page.

**Update 25/4/2014:** Richard van Noorden has kindly pointed me to a document from which I can obtain staff numbers. So I’ve now added a third column to the table, which gives the number of full-time academic staff followed by the number of part-time academic staff. (These figures are for the academic year 2012/3. Again, they may not be a perfect measure of how much people are using Elsevier journals, but they are probably better than student numbers.)

**Update 28/4/2014** Imperial College London has responded to my request for a review of their initial decision by providing me with their total figure (but not the breakdown).

**Update 30/4/2014** The University of Nottingham has done the same. The breakdown is not provided because they “consider the likelihood and scale of prejudice here [to both Elsevier's and the University's commercial interests] to be very high and therefore the test favours application of the exemption.” It is clear that there is some kind of game going on here, since everybody knows that the breakdown is basically that almost the entire amount is the subscription fee, with the content fee and Freedom Collection fee being a tiny proportion of the whole. (See below for an explanation of what I am talking about here.) So there is no imaginable effect that publishing the exact numbers could possibly have. However, equally, it is not all that important to know them.

**Update 16/5/2014** Queen Mary University of London has supplied their total figure to Edward Hughes, who is there.

**Update 23/5/2014** I now have the figures from Oxford.

**Update 31/5/2014** Figures from LSE added.

| University | Cost | Enrolment | Academic staff (FT + PT) |
| --- | --- | --- | --- |
| Birmingham | £764,553 | 31,070 | 2355 + 440 |
| Bristol | £808,840 | 19,220 | 2090 + 525 |
| Cambridge | £1,161,571 | 19,945 | 4205 + 710 |
| Cardiff | £720,533 | 30,000 | 2130 + 825 |
| *Durham | £461,020 | 16,570 | 1250 + 305 |
| **Edinburgh | £845,000 | 31,323 | 2945 + 540 |
| *Exeter | £234,126 | 18,720 | 1270 + 290 |
| Glasgow | £686,104 | 26,395 | 2000 + 650 |
| Imperial College London | £1,340,213 | 16,000 | 3295 + 535 |
| King’s College London | £655,054 | 26,460 | 2920 + 1190 |
| Leeds | £847,429 | 32,510 | 2470 + 655 |
| Liverpool | £659,796 | 21,875 | 1835 + 530 |
| §London School of Economics | £146,117 | 9,805 | 755 + 825 |
| Manchester | £1,257,407 | 40,860 | 3810 + 745 |
| Newcastle | £974,930 | 21,055 | 2010 + 495 |
| Nottingham | £903,076 | 35,630 | 2805 + 585 |
| Oxford | £990,775 | 25,595 | 5190 + 775 |
| * ***Queen Mary U of London | £454,422 | 14,860 | 1495 + 565 |
| Queen’s U Belfast | £584,020 | 22,990 | 1375 + 170 |
| Sheffield | £562,277 | 25,965 | 2300 + 460 |
| Southampton | £766,616 | 24,135 | 2065 + 655 |
| University College London | £1,381,380 | 25,525 | 4315 + 1185 |
| Warwick | £631,851 | 27,440 | 1535 + 305 |
| *York | £400,445 | 17,405 | 1205 + 285 |

*Joined the Russell Group two years ago.

**Information obtained by Sean Williams.

***Information obtained by Edward Hughes.

§LSE subscribes to a package of subject collections rather than to the full Freedom Collection.

~~The universities for which I still do not have the information are~~ ~~Imperial College London~~, ~~London School of Economics and Political Science,~~ ~~Nottingham,~~ ~~and Oxford.~~ ~~, and Queen Mary University of London.~~ ~~I still have hopes of finding out the figures for~~ ~~Imperial~~, ~~Nottingham and~~ ~~Oxford, and will provide them if I do.~~

A striking aspect of these amounts is just how much they vary. How does it come about, for example, that University College London pays over twice as much as King’s College London, and almost six times as much as Exeter? In order to explain this, I need to say something about the system as it is at the moment. It is here that I am indebted to Lorraine Estelle.

The present system (as it is in the UK, but my guess is that these remarks apply more generally) would be inexplicable were it not for the fact that it grew out of an older system that existed before the internet. Given that fact, though, it makes a lot more sense. (I don’t mean that it is fair — just that its existence is comprehensible.) If you were an Elsevier executive managing the transition from a world of print journals to a world where most people want to read articles online, what service would you offer and what would you do about prices? Since it costs almost nothing to make articles that are already online available to more people, and since it is convenient for a university to have access to everything, the obvious service to offer is complete access to all Elsevier journals. But what should you charge for this service?

Up to now, different universities have spent significantly different amounts on Elsevier journals, so if you start all over again and work out a price for the complete package, either some universities will have to pay much more than they did before, which they would probably be unwilling to do, or some universities will end up paying much less than they did before and profits will suffer quite badly. So you try to devise a system that will give universities the new service at prices that are based on the old service. That way, no university ends up paying significantly more or less than it did before. But because this is unfair — after all, now different universities will be paying very different amounts for the same service — you feel that you can’t let the universities know what other universities are paying.

The current system in the UK is very much as the above thought experiment would lead one to expect. So it is easy to see why Elsevier wants confidentiality clauses. It also explains the rather strange structure of the deals that universities have with Elsevier. Typically they have a certain “core content” (roughly, the journals they subscribed to before the transition), for which they pay something close to list prices and receive print copies. They then pay a small extra fee for permanent electronic access to that core content, and another small extra fee for electronic access to all other Elsevier journals, but this time only while the university continues to have a contract with Elsevier. Of course, in such a situation a university would like to cut down its core content to zero, but that is not allowed: there are strict controls on what they are allowed to cancel. The buzz phrase here is “historic spend”, which roughly means what universities spent on print subscriptions before the transition to electronic access. The system ensures that what universities pay now closely matches their historic spend.

Here is how Lorraine Estelle explains it.

Prior to the move to online journal, each institution subscribed to titles on a title by title basis.

When NESLI was set up, our negotiations were confined to the “e-fee” or “top-up fee”.

This was the fee that institutions needed to pay in order to have access to all a publisher’s content in electronic format. Their “subscribed titles” plus all other titles from that publisher. (This is the deal that has become known as “The Big Deal’ and adopted by all major publishers).

The “e-fee” or “top-up fee” was (and usually is still) contingent of the institutions maintaining the level of spend for the “subscribed titles”. This article provides the background to NESLI http://www.uksg.org/serials/nesli back in 1998

As institutions have moved to e-only – we negotiate with most publishers on the total cost across the consortium. However, in most (but not all) deals the division of spend across the UK library consortium is uneven – and still depends on the level of historic spend on subscribed titles. So an institution that used to subscribe to many titles, will still pay more than one that used to subscribe to fewer.

We negotiate the total increase – known as the price cap, the cancellation allowance (which means institutions can cancel a percentage of historically subscribed titles and still retain e-access), and the licence terms and conditions. This is not unique and it is the model employed by most academic library consortia across the world.

The deal is negotiated by Jisc Collections – but we do have support and input from the institutions. Oversight of our negotiations is provided by our Electronic Information Resources working group http://www.jisc-collections.ac.uk/About-JISC-Collections/Advisory-Groups/Electronic-Resources-Information-Group/ It is very rare for an institution to negotiate its own deal, because it would be difficult for them to get the same terms on an individual basis. The few exceptions are where an institution has a special relationship with a publisher – University of Oxford for OUP titles, for example.

All this is important, because it shows that a certain picture of how Elsevier operates, one that I used to believe in, is an oversimplification. In that picture, Elsevier insists on confidentiality clauses in order to be able to screw each university for whatever it can get. However, such a description is misleading on two counts. First, Elsevier negotiates with JISC rather than directly with universities, and secondly, the amount that universities pay is based on historic spend rather than on what Elsevier manages to wring out of them.

I say “an oversimplification” rather than “wrong” because if Elsevier *did* operate in the way I had previously imagined, the results would probably be rather similar. What is the maximum that Elsevier would be likely to persuade a university to pay? It would be very hard to persuade a university to agree to a huge leap in prices, so in each year one would expect the maximum to be whatever the university paid in the previous year plus a small real-terms increase. And all the evidence suggests that that is more or less exactly what Elsevier has managed to achieve.

Another factor that is perhaps worth briefly discussing is the fact that Durham, Exeter, Queen Mary University of London and York joined the Russell Group only two years ago. This probably helps to explain why (apart from QMUL, which refused to provide me with its figures) these universities are paying significantly less than most of the others. Whether Elsevier had an explicit policy of charging less to supposedly less prestigious universities (though the list of universities not in the Russell Group contains several that appear to me to be at least as prestigious as several that are in the Russell Group), or whether there is merely a strong correlation between membership of the Russell Group and historic spend on Elsevier journals, I don’t know. I think the former may be the case, since I have heard librarians talking about a “banding system” (I don’t know any details about how it works), and also because Bergstrom et al mention in their paper that in the US there is a classification of universities into different types according to how research intensive they are, with prices depending to a considerable extent on this classification.

A further factor that may possibly explain some of the data is that some institutions have recently merged with others. For example, The University of Manchester, one of the universities that pays most, merged in 2004 with UMIST (University of Manchester Institute of Science and Technology), and UCL merged in 2012 with The School of Pharmacy, University of London. The latter fact may help to explain why they are paying so much more now than what David Colquhoun said they were paying in 2011.

Although the differences between the amounts that different universities pay are eye-catching, it is important to be clear that they are a *symptom* of what is wrong with the system, and not the problem itself. The problem is quite simply that Elsevier has a monopoly over a product for which the demand is still very inelastic (the lack of elasticity being largely the fault of the academic community), with the result that the prices are unreasonably high for the service that Elsevier provides. (It bears repeating that the refereeing process and editorial selection are not paid for by Elsevier — those services are provided free of charge by academics.) If Elsevier were to equalize the prices (or equalize some suitable quantity such as price divided by size of university, or price per use) while keeping the aggregate the same, this would *not* solve the underlying problem.

As I have explained above, the price that a typical university pays to Elsevier in its Big Deal is divided into three components. One is a “subscription fee”, which is to pay for a certain collection of journals at something comparable to their list prices. Another is a “content fee”, which is to pay for electronic access in perpetuity to those titles (via ScienceDirect). The third is a “Freedom Collection fee”, which is to pay for electronic access to the rest of Elsevier’s journals, but this access, unlike the access covered by the content fee, is lost if you cancel the Big Deal.

I have got breakdowns from seven universities, but rather than give them here, I would rather simply make a few general points about them.

1. The content fee (that is, the fee for electronic access to the subscribed titles) is, in all the cases I know about, very close to 5.8824% of the subscription fee. Since 1/17=0.05882352941, I think that is saying that the content fee is exactly one seventeenth of the subscription fee, with the tiny differences coming from rounding errors. Of course, the precise details here are unimportant: what matters is that it is a very small amount compared with the subscription fee itself.
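The one-seventeenth observation is simple to check numerically; a small illustrative sketch:

```python
# The breakdowns suggest the content fee is exactly 1/17 of the
# subscription fee: 1/17 expressed as a percentage matches the
# 5.8824% observed, to four decimal places.
ratio = 1 / 17
print(f"{ratio:.11f}")        # 0.05882352941, the decimal quoted above
print(f"{ratio * 100:.4f}%")  # 5.8824%
```

So any tiny deviations from 5.8824% in the reported fees are consistent with rounding to the nearest pound rather than with a different underlying ratio.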

2. The Freedom Collection fees do not have an obvious relationship with the subscription fee, but, amusingly, with the seven examples I have, the more you pay for the latter, the less you pay for the former. That actually makes some kind of sense, since the more you are paying the content fee, the bigger the chunk of the Freedom Collection you are already subscribing to. I haven’t managed to reverse-engineer any kind of simple quantitative relationship between the two prices, however.

3. The inverse relationship in point 2 might seem to make things fairer, and to a very small extent it does, but we are talking about fees of between £10,000 and £25,000 here, so even for a university with a small subscription fee the price of the Freedom Collection fee is well under a tenth of its subscription fee. In fact, it doesn’t even make up for the discrepancy in the content fees, because the price is not high enough to do so. Of course, it is grotesquely misleading to say that the Freedom Collection costs so little, because the price you pay for it is conditional on not cancelling the subscriptions that keep the subscription fee extremely high. Indeed, the entire “breakdown” is misleading for that reason: the effective cost of the Freedom Collection is far higher than its nominal cost.

The moral of all this is that the figures giving the total cost are what matter. What universities actually need is electronic access to Elsevier’s journals. In order to get that access, Elsevier insists that they nominally pay for something else, namely subscriptions that they are not allowed to cancel (even when they are duplicates, as has happened in Cambridge because of college libraries, and probably in Manchester and UCL as a result of mergers). But that is of no practical importance. It’s a bit like those advertisements that say “FREE OFFER!” and then in very small print they add “when you spend over £X,” which of course means that the so-called free offer is not free at all.

While I was still not at all sure that I would get any information about prices, I comforted myself with the thought that an institution that refuses a FOI request has to give reasons, and those reasons might well be informative. For example, they might reveal that the main reason for confidentiality is to protect Elsevier’s profits, which would conflict with Elsevier’s official reasons.

Or would it? If you’ve read this far, then your reward is the following rather wonderful video (which has done the rounds for a while, so you may have seen it) of David Tempest, from Elsevier, explaining why confidentiality clauses are necessary. Many thanks to Mike Taylor for obtaining it. A transcript can be found on his blog.

The person who asked the question is Stephen Curry, from Imperial College London. ~~I’m sorry to say that, as mentioned above, Imperial is one of the universities I have not managed to get figures from.~~ I’m glad to say that at last he can know what his university library is spending on his behalf.

David Tempest’s lapse aside, Elsevier usually does not admit that the confidentiality clauses are there to protect its profits. But the refusal letters I received tell a different story. A good example is the first response I had from any university (other than an acknowledgement), which was a refusal from Queen’s University Belfast. I will quote it in full.

Dear Mr Gowers

Freedom of Information Request – Elsevier Journals

My letter, dated 21 February 2014, in relation to the above refers. [sic]

Having reviewed your request and consulted with appropriate colleagues, I would respond as set out below:

I would like to make a request under the Freedom of Information Act. I am interested to know what Queen’s University Belfast currently spends annually for access to Elsevier journals. I understand that this is typically split into three parts, a subscription price for core content, which is based on historic spend, a content fee for accessing those journals via ScienceDirect, and a further fee for accessing unsubscribed titles from the Freedom Collection, also via ScienceDirect. I would like to know the total fee, and how it is split up into those three components.

I can confirm that whilst the University does hold this information, it is not being provided to you as it is considered exempt under Section 43(2) of the Act.

Section 43(2) of the Act provides that information is exempt if its disclosure under the Act would be likely to prejudice the commercial interests of any person, including the public authority itself.

Commercial interests relate to the ability to successfully participate in a commercial activity. This could be the ability to buy or sell goods or services or the disclosure of financial and planning information to market competitors. It is, therefore, necessary to decide whether release of this information will have an impact on the commercial activity of Elsevier or the University.

In making this determination, the University has consulted with Elsevier regarding the disclosure of the requested information and whether such disclosure would be likely to prejudice Elsevier’s commercial interests.

In written representations to the University, Elsevier has indicated that the disclosure of the amount of money spent annually on access to Elsevier journals would reveal pricing information, specifically the licensing fees that have been negotiated with the University in circumstances that may include a level of discount.

The disclosure of this information would be likely to have a detrimental effect on Elsevier’s future negotiating position with that of the University and, indeed, the wider HE sector – which represents a large percentage of their market.

The University accepts this argument and also considers that disclosure of information that would reveal pricing would also be likely to prejudice the commercial interests of the University itself, insofar as it could have a detrimental impact on the future negotiation of tailored solutions for licensing of Elsevier’s products and discounts from list prices.

Section 43(2) is a qualified exemption and the University must, therefore, consider where the balance of the public interest lies. The University accepts the need for transparency and accountability for decision making. The requirement, however, for transparency and accountability needs to be weighed against the harm to the commercial interests of third parties or the University itself through disclosure. The University has, therefore, weighed the prejudice caused by disclosure of the requested information against the likely benefit to the wider public.

In considering arguments in favour of disclosing the information, the University has taken into account the wider interest of the general public in having access to information on how public funds are spent. In this instance, there is a public interest in demonstrating that the University has negotiated a competitive rate in relation to the procurement of Elsevier’s products and services.

The University considers, however, that this public interest is already met by the significant amount of pricing information that Elsevier currently makes publicly available – such information is available at:

http://www.elsevier.com/librarians/journal-pricing and

http://www.elsevier.com/librarians/physical-sciences/mathematics/journal-pricing.

In relation to those factors favouring non-disclosure, the University has a duty to protect commercially sensitive information that is held about any third party. In this instance, disclosure of the amount of money spent by the University on Elsevier products would reveal pricing information that was acknowledged by both the University and Elsevier at the time the contract was entered into as being commercially confidential. Disclosure of this information would be likely to prejudice not only the commercial interests of Elsevier but also the interests of the University itself, along with the relationship that the University has with its supplier.

It is reasonable, therefore, in all the circumstances of this case that the exemption should be maintained and the requested information not disclosed.

If you are dissatisfied with the response provided, please put your complaint in writing to me at the above address. If this fails to resolve the matter, you have the right to apply to the Information Commissioner.

Yours sincerely

Amanda Aicken

Information Compliance Unit

I responded as follows.

Dear Amanda Aicken,

Thank you for your response to my Freedom of Information Request (reference FOI/14/42). You invited me to write to you if I was dissatisfied with it. I have a number of reasons for dissatisfaction, so I am taking you up on your invitation.

My main objection is that I disagree with several of your reasons for declining my request. I will present them as a numbered list.

1. You say that the disclosure of the information I ask for would be likely to have a detrimental effect on Elsevier’s future negotiating position with that of the university. You also say that it would be likely to prejudice the commercial interests of the university itself. I do not find these two statements easy to reconcile. Could you please explain how it is possible for both parties to lose out?

2. You agree with me that there is a public interest in demonstrating that the university has negotiated a competitive rate in relation to the procurement of Elsevier’s products and services. You go on to say that this public interest is already met by the information that Elsevier has made publicly available online. However, this is manifestly untrue. The only figures provided by Elsevier are for the list prices of their journals. But since universities pay for Elsevier’s Freedom Collection with a Big Deal, the list prices do not give me any way of verifying that the university has negotiated a competitive rate. Indeed, they do not even allow me to work out the order of magnitude of how much Queen’s University is paying to Elsevier. Please would you either retract your statement that this public interest has already been met by Elsevier, or else explain to me how to use the list prices to estimate the total amount paid by Queen’s University?

3. Your letter implies that there are direct negotiations between Elsevier and Queen’s University of Belfast. However, this is also not true. The negotiations are mediated through JISC. Therefore, there is no obvious mechanism whereby disclosing the prices would cause any commercial harm to the university.

4. It has not escaped my notice that the letter you sent is remarkably similar to a letter sent by the University of Swansea to somebody else who made a similar request. It is clear that you used that letter as a template, or else that you and the University of Swansea used the same template, perhaps provided by Elsevier. This suggests to me that you have not considered the balance of arguments for and against disclosure with sufficient independence.

In summary, the main two points that I cannot accept are that the financial interests of Queen’s University are likely to be prejudiced by the disclosure of this information, and that there is sufficient information in the public domain to enable me to determine whether the university has negotiated a competitive rate. If you are going to refuse to disclose the information, then I would like it to be for reasons that are not obviously false.

Yours sincerely,

Timothy Gowers

The Swansea letter I referred to is this one, which I have already mentioned. It was the formulaic nature of the response, with ghastly Orwellian phrases such as “tailored solutions” and misleading references to “a level of discount”, which appeared not just in these two letters but in many other refusal letters that I was to receive, that got me annoyed enough to express my dissatisfaction, which in the case of Queen’s University Belfast and a handful of other universities eventually resulted in success. The response I received to my letter above was as follows. It did not really address my arguments, but since it gave me the information, that was not a big concern.

Dear Mr Gowers,

Freedom of Information Request — Elsevier Journals — Internal Review

Your email to Mrs Amanda Aicken, dated 5 March 2014, requesting an internal review of the University’s response to your Freedom of Information request on the above, refers.

On 21 February 2014, you submitted a request for information in relation to the University’s annual expenditure on access to Elsevier Journals. You requested details of the total fee and how this is split up into three components: a subscription price for core content; a content fee for accessing those journals via ScienceDirect; and a further fee for accessing unsubscribed titles from the Freedom Collection.

On 4 March 2014, the University responded to your request, confirming that whilst this information was held, it was not being provided to you as it was considered commercially sensitive information and, therefore, was exempt under Section 43(2) of the Act. The University had made this determination following consultation with Elsevier, which had indicated that the disclosure of the requested information would prejudice its commercial interests by revealing pricing information. In particular, Elsevier argued that disclosure of the information would reveal the licensing fees that had been negotiated with the University in circumstances that may have included a level of discount.

I understand that you, subsequently, lodged a complaint in respect of the University’s response to your request and this complaint has been handled as an internal review of the decision not to provide the requested information.

You have expressed dissatisfaction with the response on the grounds that you ‘cannot accept (are) that the financial interests of Queen’s University are likely to be prejudiced by the disclosure of this information, and that there is sufficient information in the public domain to enable me to determine whether the University has negotiated a competitive rate’.

I have now completed my review and my findings are detailed below.

I have reconsidered the nature of the requested information and the application of the exemption to withhold this information. In doing so, I have taken into account written advice from relevant senior staff in the University’s McClay Library and advice received from JISC regarding the detail of the contract with Elsevier. I have also noted your comments regarding the need for transparency and the public interest in demonstrating that the University has negotiated a competitive rate in relation to the procurement of Elsevier’s products and services.

At the time of your request, the University was clearly of the view that disclosure of the requested information would be likely to have a detrimental effect on Elsevier’s future negotiating position with that of the University and, indeed, the wider HE sector. An additional, albeit secondary argument, was the possibility that disclosure would prejudice the interests of the University itself with respect to the relationship that the University has with Elsevier as a supplier. I am persuaded that that [sic] this was not, in the circumstances, an unreasonable view.

I do, however, believe that on balance, the public interest in disclosure was greater than that in maintaining the commercial interests exemption. I also understand that subsequent to your original request, several institutions have disclosed information, either in relation to the total annual expenditure on access to Elsevier Journals, or on the detailed breakdown of expenditure as requested.

In light of the above, it is my view that the information should now be disclosed. I am, therefore, providing the requested information in relation to 2014 — this is provided in the table below.

I have had several correspondences like this. I would like to pick out a couple of excerpts from other refusal letters that are not essentially contained in the Belfast letter. I had this rather chilling paragraph from Queen Mary University of London.

However, in addition to the reasons outlined above already, revealing this information to the world at large may damage the relationship that QML has with Elsevier including the prospect of legal action that may be taken against QML. This could result in QML being unable to offer Elsevier products which would have the knock-on effect of impacting our resources, our research and even student recruitment. Since these would imperil QML’s finances, in financially tough times and while receiving less and less from the public purse, this cannot be said to be in the public interest.

It would be interesting to know what Elsevier said to them to provoke that. Because of this paragraph, I felt sorry for QMUL and decided not to request a review of their decision (16/5/2014 — they have now provided the total figure to Edward Hughes, perhaps reasoning that there was safety in numbers).

However, the following paragraph from Oxford had the opposite effect on me.

Maintaining confidentiality with regard to the information requested enables the University and Elsevier to arrive at a fair and competitive negotiated and customised price. Full pricing transparency would mean that the best pricing model publishers could offer would be list price, which would be likely to result in increased costs to the University. Disclosure of pricing terms would inhibit publishers’ ability to develop flexible, tailored solutions suitable for a particular customer’s needs.

Part of my response to that was that the statement beginning “Full pricing transparency” was manifestly false: publishers could offer any model they like. Also, that “tailored solutions” phrase is a red rag to a bull: knowing about how the system works, and how little it is “tailored for a particular customer’s needs”, I cannot read it without getting annoyed. I have requested a review from Oxford ~~but not yet heard back (though they should, legally, have responded by now).~~ and they have now given me their total figure.

Incidentally, although I wrote initially to librarians, they were legally obliged to pass my requests on to their Freedom of Information offices, so the letters I got back were (mostly) from bureaucrats. So when I got refusals, this did not necessarily reflect the wishes of the librarians, who stand to gain from the prices being known.

When it comes to high prices and confidentiality contracts, Elsevier are not the only offenders, though there is some anecdotal evidence that they are the leaders, in the sense that other publishers use Elsevier as a benchmark to see what they can get away with. So why submit Freedom of Information requests for Elsevier contracts without doing the same for Springer, Wiley, Taylor & Francis, etc.?

There is no good reason not to. My answer to this inevitable question is that I do not regard the work of finding out about journal prices as finished. I will report on this blog if and when I or other people find out about other publishers and other universities.

There is a great deal more that could be said about journal prices and what should be done about them. However, this post has passed the 10,000-word mark, so I shall leave further discussion for a second post. Among the questions I intend to address are the following, many of which concern other big publishers just as much as they concern Elsevier.

1. Is it fair to say that Elsevier is a monopoly?

2. Does Elsevier’s pricing policy violate competition law?

3. What would be a fair system for charging for electronic access to a large collection of journals?

4. Are the current prices really all that unreasonable, given the importance to science of journal articles?

5. Is it better for university libraries to form consortia or should they negotiate individually?

6. What would be the implications for Cambridge (and perhaps other universities too) of a switch to paying list prices for individual journals?

7. Different subjects have very different publishing cultures and very different needs. Are they better off campaigning together in a single open access movement or would it be better to have a fragmented movement, with different subjects campaigning separately for their different interests?

8. What more can be done to accelerate a move towards a cheaper journal system?


A good way to test your basic knowledge of (some of) the course would be to do a short multiple-choice quiz devised by Vicky Neale. If you don’t get the right answer first time for every question, then it will give you an idea of the areas of the course that need attention.

Terence Tao has also created a number of multiple-choice quizzes, some of which are relevant to the course. They can be found on this page. The quiz on continuity expects you to know the definitions of adherent points and limit points, which I did not discuss in lectures.

The first five posts on this blog in the IA Analysis category are devoted to the questions on this course in the 2003 Tripos. The course has not changed much since then, so these questions are similar to the kind of thing that could be set now. I try to say not just what the answers are but how I thought of them, how I decided what to write out in detail and what just to assume, and so on. They may be of some use when you prepare for the exams.

A long time ago I wrote a number of informal discussions of undergraduate mathematical topics. My ideas about some of these are not always identical to what they were then, but again you may find some of them helpful, particularly the ones on analysis.

If I think of further resources, I’ll add them to the post.

Finally, I’ve very much enjoyed giving this course — thanks for being a great audience (if that’s the right word).


How do $\sin$ and $\cos$ relate to things like the opposite, adjacent and hypotenuse? Using the power-series definitions, we proved several facts about trigonometric functions, such as the addition formulae, their derivatives, and the fact that they are periodic. But we didn’t quite get to the stage of proving that if $x^2+y^2=1$ and $\theta$ is the angle that the line from $(0,0)$ to $(x,y)$ makes with the line from $(0,0)$ to $(1,0)$, then $x=\cos\theta$ and $y=\sin\theta$. So how does one establish that? How does one even *define* the angle $\theta$? In this post, I will give one possible answer to these questions.

A cheating and not wholly satisfactory method would be to define the angle to be $\cos^{-1}x$. Then it would be trivial that $x=\cos\theta$ and we could use facts we know to prove that $y=\sin\theta$. (Or could we? Wouldn’t we just get that it was $\pm\sqrt{1-x^2}$? The fact that many angles have the same $\cos$ and $\sin$ creates annoying difficulties for this approach, though ones that could in principle be circumvented.) But if we did this, how could we be confident that the notion of angle we had just defined coincided with what we think angle should be? The problem of saying what an angle is would not have been fully solved.

Another approach might be to define trigonometric functions geometrically, prove that they have the basic properties that we established using the power series definitions, and prove that these properties characterize the trigonometric functions (meaning that any two functions $s$ and $c$ that have the properties must be $\sin$ and $\cos$). However, this still requires us to make sense of the notion of angle somehow, and we might also feel slightly worried about whether the geometric arguments we used to justify the addition formulae and the like were truly rigorous. (I’m not saying it can’t be done satisfactorily — just that I don’t immediately see a good way of doing it, and I have a different approach to present.)

How are radians defined? You take a line L starting at the origin, and it hits the unit circle at some point P. Then the angle that line makes with the horizontal (or rather, the horizontal heading out to the right) is defined to be the length of the circular arc that goes anticlockwise round the unit circle from $(1,0)$ to P. (This defines a number between $0$ and $2\pi$, but we can worry about numbers outside this range later.)

There is nothing wrong with this definition, except that it requires us to make rigorous sense of the length of a circular arc. How are we to do this?

For simplicity, let’s assume that our point P is $(x,y)$ and that both $x$ and $y$ are positive. So P is in the top right quadrant of the unit circle. How can we define and then calculate the length of the arc from $(1,0)$ to $(x,y)$, or equivalently from $(x,y)$ to $(1,0)$?

One non-rigorous but informative way of thinking about this is that for each $t$ between $x$ and $1$, we should take an interval $[t,t+dt]$, work out the length of the bit of the circle vertically above this interval, and sum up all those lengths. The bit of the circle in question is a straight line (since $dt$ is infinitesimally small) and by similar triangles its length is $dt/\sqrt{1-t^2}$.

How did I write that down? Well, the big triangle I was thinking of was one with vertices $(0,0)$, $(t,0)$ and the point on the circle directly above $(t,0)$, which is $(t,\sqrt{1-t^2})$, by Pythagoras’s theorem. The little triangle has one side of length $dt$, which corresponds to the side in the big triangle of length $\sqrt{1-t^2}$. So the hypotenuse of the little triangle is $dt/\sqrt{1-t^2}$, as I claimed.

Adding all these little lengths up, we get $\int_x^1\frac{dt}{\sqrt{1-t^2}}$, so it remains to evaluate this integral.

This is of course a very standard integral, usually solved by substituting $\sin\theta$ or $\cos\theta$ for $t$. If you do that, you find that the length works out as $\cos^{-1}x$, which is just what we hoped. However, we haven’t discussed integration by substitution in this course, so let us see it in a more elementary way (not that proving an appropriate form of the integration-by-substitution rule is especially hard).

Using the rules for differentiating inverses, we find that

$(\cos^{-1})'(t)=-\frac1{\sin(\cos^{-1}t)}$

and since $\sin^2\theta+\cos^2\theta=1$, this gives us $-\frac1{\sqrt{1-t^2}}$. So the integrand has $-\cos^{-1}$ as an antiderivative, and therefore, by the fundamental theorem of calculus,

$\int_x^1\frac{dt}{\sqrt{1-t^2}}=\cos^{-1}(x)-\cos^{-1}(1)=\cos^{-1}x.$

So the angle $\theta$ between the horizontal and the line joining the origin to $(x,y)$ is (by definition) the length of the arc from $(1,0)$ to $(x,y)$, which we have calculated to be $\cos^{-1}x$. Therefore, $x=\cos\theta$ and $y=\sin\theta$.
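As a quick numerical sanity check (not part of the argument), one can approximate the integral $\int_x^1\frac{dt}{\sqrt{1-t^2}}$ with a plain midpoint sum and compare it with the antiderivative. The snippet below is an illustrative sketch of mine, not something from the course; it stops just short of the singularity at $t=1$:

```python
import math

def arc_length_integral(x, eps, n=100_000):
    """Midpoint-rule approximation to the integral of dt/sqrt(1 - t^2)
    from t = x to t = 1 - eps (we stop short of the singularity at t = 1)."""
    a, b = x, 1 - eps
    dt = (b - a) / n
    return sum(dt / math.sqrt(1 - (a + (i + 0.5) * dt) ** 2) for i in range(n))

x = 0.5
for eps in (1e-2, 1e-4):
    # the antiderivative is -arccos, so the truncated integral is
    # arccos(x) - arccos(1 - eps)
    print(eps, arc_length_integral(x, eps), math.acos(x) - math.acos(1 - eps))
```

The two columns agree to many decimal places, and as eps shrinks both tend to $\cos^{-1}x$.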

The process I just went through, of saying “Let’s add up a whole lot of infinitesimal lengths; that says we should write down the following integral; calculating the integral gives us L, so the length is L,” is a process that one often goes through when calculating similar quantities. Why are we so confident that it is OK?

I sometimes realize with mathematical questions like this that I have been a mathematician for many years and never bothered to worry about them. It’s just sort of obvious that if a function is reasonably nice, then writing something down that’s approximately true with $\delta t$, turning $\delta t$ into $dt$, and writing a nice $\int$ sign in front gives you a correct expression for the quantity in question. But let’s try to think a bit about how we might define length rigorously.

First, we should say what a curve is. There are various definitions, according to how much niceness one wants to assume, but let me take a basic definition: a curve is a continuous function from an interval $[a,b]$ to $\mathbb{R}^2$. (I haven’t defined continuous functions to $\mathbb{R}^2$, but it simply means that if $f(t)=(x(t),y(t))$, then $x$ and $y$ are both continuous functions from $[a,b]$ to $\mathbb{R}$.)

This is an example of a curious habit of mathematicians of defining objects as things that they clearly aren’t. Surely a curve is not a function — it’s a special sort of subset of the plane. In fact, shouldn’t a curve be defined as the *image* of a continuous function from $[a,b]$ to $\mathbb{R}^2$? It’s true that that corresponds more closely to what we are thinking of when we use the word “curve”, but the definition I’ve just given turns out to be more convenient, though it’s important to add that two curves (as I’ve defined them) $f:[a,b]\to\mathbb{R}^2$ and $g:[c,d]\to\mathbb{R}^2$ are *equivalent* if there is a strictly increasing continuous bijection $\phi:[a,b]\to[c,d]$ such that $f(t)=g(\phi(t))$ for every $t$. In this situation, we think of $f$ and $g$ as different ways of representing the same curve.

Incidentally, if you want a reason not to identify curves with their images, then one quite good reason is the existence of objects called *space-filling curves*. These are continuous functions from intervals of reals to $\mathbb{R}^2$ that fill up entire two-dimensional sets. Here’s a picture of one, lifted from Wikipedia.

It shows the first few iterations of a process that gives you a sequence of functions that converge to a continuous limit that fills up an entire square.

Going back to lengths, let’s think about how one might define them. The one thing we know how to define is the length of a line segment. (Strictly speaking, I’m not allowed to say that, since a line segment isn’t a function, but let’s understand it as a particularly simple function from an interval to a line segment in the plane.) Given that, a reasonable definition of length would seem to be to approximate a given curve by a whole lot of little line segments. That leads to the following idea for at least approximating the length of a curve $f:[a,b]\to\mathbb{R}^2$. We take a dissection $a=t_0<t_1<\dots<t_n=b$ and add up all the little distances $d(f(t_{i-1}),f(t_i))$. Here I am defining the distance between two points in $\mathbb{R}^2$ in the normal way by Pythagoras’s theorem. This gives us the expression

$\sum_{i=1}^n d(f(t_{i-1}),f(t_i))$

for the approximate length given by the dissection. We then hope that as the differences $t_i-t_{i-1}$ get smaller and smaller, these estimates will tend to a limit. It isn’t hard to see that if you refine a dissection, then the estimate increases (you are replacing the length of a line segment that joins two points by the length of a path that consists of line segments and joins the same two points).

Actually, that hope is not always fulfilled: sometimes the estimates tend to infinity. Indeed, for space-filling curves, or fractal-like curves such as the Koch snowflake, the estimates *do* tend to infinity. In this case, we say that they have infinite length. But if the estimates tend to a limit as the maximum of the differences tends to zero, we call that limit the length of the curve. A curve that has a finite length defined this way is called *rectifiable*.
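This definition is easy to experiment with. Here is a small illustrative script of my own (the curve chosen, the parabola $y=x^2$ on $[0,1]$, is arbitrary but has a closed-form length) showing the chord-sum estimates increasing under refinement and approaching a limit:

```python
import math

def polyline_length(f, a, b, n):
    """Sum of chord lengths over the dissection a = t_0 < t_1 < ... < t_n = b."""
    pts = [f(a + (b - a) * i / n) for i in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

f = lambda t: (t, t * t)  # the parabola y = x^2, parametrized on [0, 1]
# closed form of the arc-length integral for this particular curve
exact = math.sqrt(5) / 2 + math.asinh(2) / 4
for n in (10, 100, 1000):
    print(n, polyline_length(f, 0, 1, n))
```

Since each dissection here refines the previous one, the three estimates are nondecreasing, and all of them sit below the true length.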

Suppose now that we have a curve given by $f(t)=(x(t),y(t))$ and that the two functions $x$ and $y$ are continuously differentiable. Then both $x'$ and $y'$ are bounded on $[a,b]$, so let’s suppose that $M$ is an upper bound for $|x'|$ and $|y'|$. Then by the mean value theorem,

$d(f(t_{i-1}),f(t_i))=\sqrt{(x(t_i)-x(t_{i-1}))^2+(y(t_i)-y(t_{i-1}))^2}\leq\sqrt2 M(t_i-t_{i-1}).$

Therefore, $\sum_{i=1}^n d(f(t_{i-1}),f(t_i))\leq\sqrt2 M(b-a)$ for every dissection, which implies that the curve is rectifiable. (Remark: I didn’t really use the continuity of the derivatives there — just their boundedness.)

We can say slightly more than this, however. The differentiability of $x$ tells us that $x(t_i)-x(t_{i-1})=x'(u_i)(t_i-t_{i-1})$ for some $u_i\in(t_{i-1},t_i)$. And similarly for $y$ with some $v_i$. Therefore, the estimate for the length can be written

$\sum_{i=1}^n\sqrt{x'(u_i)^2+y'(v_i)^2}\,(t_i-t_{i-1}).$

This looks very similar to the kind of thing we write down when doing Riemann integration, so let’s see whether we can find a precise connection. We are concerned with the function $g(t)=\sqrt{x'(t)^2+y'(t)^2}$. If we now *do* use the continuity of $x'$ and $y'$, then $g$ is continuous too, so it can be integrated. Now since $u_i$ and $v_i$ belong to the interval $[t_{i-1},t_i]$, the Riemann sums $\sum_i g(u_i)(t_i-t_{i-1})$ and $\sum_i g(v_i)(t_i-t_{i-1})$ both lie between the lower and upper sums given by the dissection. That implies the same for

$\sum_{i=1}^n\sqrt{x'(u_i)^2+y'(v_i)^2}\,(t_i-t_{i-1}).$

Since $g$ is integrable, the limit of these estimates as the largest $t_i-t_{i-1}$ (which is often called the *mesh* of the dissection) tends to zero is $\int_a^b g(t)\,dt$.

We have shown that the length of the curve is given by the formula

$\int_a^b\sqrt{x'(t)^2+y'(t)^2}\,dt.$

Now, finally, let’s see whether we can justify our calculation of the length of the arc of the unit circle between $(x,y)$ and $(1,0)$. It would be nice to parametrize the circle as $\theta\mapsto(\cos\theta,\sin\theta)$, but we can’t do that, since we are defining $\theta$ using length, so we would end up with a circular definition (in more than one sense). [Actually, we *can* do something very close to this. See the final section of the post for details.] So let’s parametrize it as follows. We’ll define $f$ on the interval $[x,1]$ and we’ll send $t$ to $(t,\sqrt{1-t^2})$. Then $x'(t)=1$ and $y'(t)=-t/\sqrt{1-t^2}$, so

$\sqrt{x'(t)^2+y'(t)^2}=\sqrt{1+\frac{t^2}{1-t^2}}=\frac1{\sqrt{1-t^2}}.$

So the length is $\int_x^1\frac{dt}{\sqrt{1-t^2}}$, which is exactly the expression we wrote down earlier.

Let me make two quick remarks about that. First, you might argue that although I have shown that the final *expression* is indeed correct, I haven’t shown that the informal *argument* is (essentially) correct. But I more or less have, since what I have effectively done is calculate the lengths of the hypotenuses of the little triangles in a slightly different way. Before, I used the fact that one side was $dt$ and used similar triangles. Here I’ve used the fact that one side is $dt$ and another side is $t\,dt/\sqrt{1-t^2}$ and used Pythagoras.

A slightly more serious objection is that for this calculation I used a general result that depended on the assumption that both $x$ and $y$ are continuously differentiable, but didn’t check that the appropriate conditions held, which they don’t. The problem is that $y(t)=\sqrt{1-t^2}$, so $y'(t)=-t/\sqrt{1-t^2}$, which tends to infinity in modulus as $t\to1$ and is undefined at $t=1$.

However, it is easy to get round this problem. What we do is integrate from $x$ to $1-\epsilon$, in which case the argument is valid, and then let $\epsilon$ tend to zero. The integral between $x$ and $1-\epsilon$ is $\cos^{-1}x-\cos^{-1}(1-\epsilon)$, and that tends to $\cos^{-1}x$.

One final remark is that this length calculation explains why the usual substitution of $\sin\theta$ for $t$ in an integral of the form $\int\frac{dt}{\sqrt{1-t^2}}$ is not a piece of unmotivated magic. It is just a way of switching from one parametrization of a circular arc (using the x-coordinate) to another (using the angle, or equivalently the distance along the circular arc) that one expects to be simpler.

Thanks to a comment of Jason Fordham below, I now realize that we can after all parametrize the circle as $\theta\mapsto(\cos\theta,\sin\theta)$. However, this is not the $\theta$ I’m trying to calculate, so let’s call it $s$ instead. I’m just taking $s$ to be an ordinary real number, and I’m defining $\cos s$ and $\sin s$ using the power-series definition. Then the arc of the unit circle that goes from $(1,0)$ to $(\cos u,\sin u)$ can be defined as the curve $f$ defined on the interval $[0,u]$ by the formula $f(s)=(\cos s,\sin s)$. The general formula for the length of a curve then gives us

$\int_0^u\sqrt{(-\sin s)^2+(\cos s)^2}\,ds=\int_0^u 1\,ds=u.$

So the length of the arc from $(1,0)$ to $(\cos u,\sin u)$ is $u$.
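This too can be checked numerically: chord-sum estimates for the curve $s\mapsto(\cos s,\sin s)$ on $[0,u]$ converge to $u$. An illustrative sketch (the helper name is mine):

```python
import math

def circle_arc_polyline(u, n):
    """Chord-sum estimate for the curve s -> (cos s, sin s) on [0, u]."""
    pts = [(math.cos(u * i / n), math.sin(u * i / n)) for i in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

for u in (0.5, 1.0, math.pi / 2):
    print(u, circle_arc_polyline(u, 100_000))  # each estimate is very close to u
```

In fact each chord has length $2\sin(u/2n)$, so the estimate is exactly $2n\sin(u/2n)$, which tends to $u$ as $n\to\infty$.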


A preliminary question about this is why it is not more or less obvious. After all, writing $f(x)=\sum_{n=0}^\infty a_nx^n$, we have the following facts.

- Writing $f_N(x)=\sum_{n=0}^N a_nx^n$, we have that $f_N(x)\to f(x)$ as $N\to\infty$.
- For each $N$, $f_N'(x)=\sum_{n=1}^N na_nx^{n-1}$.

If we knew that $f'(x)=\lim_{N\to\infty}f_N'(x)$, then we would be done.

Ah, you might be thinking, how do we know that the sequence $\sum_{n=1}^N na_nx^{n-1}$ converges? But it turns out that that is not the problem: it is reasonably straightforward to show that it converges. (Roughly speaking, inside the circle of convergence the series converges at least as fast as a GP, and multiplying the $n$th term by $n$ doesn’t stop a GP converging, as can easily be seen with the help of the ratio test.) So, writing $g(x)$ for $\lim_{N\to\infty}f_N'(x)=\sum_{n=1}^\infty na_nx^{n-1}$, we have the following facts at our disposal: $f_N(x)\to f(x)$ and $f_N'(x)\to g(x)$.

Doesn’t it follow from that that $f'(x)=g(x)$?

We are appealing here to a general principle, which is that if some functions $f_N$ converge to $f$ and their derivatives $f_N'$ converge to $g$, then $f$ is differentiable with $f'=g$. Is this general principle correct?

Unfortunately, it isn’t. Suppose we take some continuous functions $g_n$ that converge to a step function. (Roughly speaking, you make $g_n$ be 0 up to 0, then linear with gradient $n$ until it hits 1, then 1 from that point onwards.) And suppose we then let $f_n$ be the function that differentiates to $g_n$ and is 0 up to 0. Then the $f_n$ converge to the function that is 0 up to 0 and $x$ for positive $x$. This function *almost* differentiates to the step function, but it isn’t differentiable at 0.
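For concreteness, here is that counterexample written out numerically (an illustrative sketch; the function name is mine). Each $f_n$ is differentiable everywhere, yet the pointwise limit is $\max(x,0)$, which is not differentiable at $0$:

```python
def f(n, x):
    """Antiderivative of the ramp g_n (0 up to 0, gradient n on [0, 1/n], then 1),
    normalized so that f(n, 0) = 0."""
    if x <= 0:
        return 0.0
    if x <= 1 / n:
        return n * x * x / 2
    return x - 1 / (2 * n)

# the pointwise limit of f(n, x) as n grows is max(x, 0)
for n in (10, 100, 1000):
    print(n, f(n, 0.5), f(n, -0.5))
```

Note that $f_n'(0)=g_n(0)=0$ for every $n$, while the limit function has right derivative $1$ at $0$, so no general principle of the kind described can hold.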

So we’ve somehow got to use particular facts about power series in order to prove our result — we can’t appeal to general considerations, because then we are appealing to a principle that isn’t true. (Actually, in principle some compromise might be possible, where we show that functions defined by power series have a certain property and then use nothing apart from that property from that point on. But as it happens, we shall not do this.)

We have a formula for $f(x)$. Why don’t we write out a formula for $\frac{f(x+h)-f(x)}h$ and see if we can tell what happens when $h\to0$?

That is certainly a sensible first thing to try, so let’s see what happens. We get

$\frac{f(x+h)-f(x)}h=\sum_{n=1}^\infty a_n\frac{(x+h)^n-x^n}h.$

What can we do with that? Perhaps we’d better apply the binomial theorem. Then we find that the right-hand side is equal to

$\sum_{n=1}^\infty a_n\Bigl(nx^{n-1}+\binom n2x^{n-2}h+\binom n3x^{n-3}h^2+\dots+h^{n-1}\Bigr).$

Part of the above expression gives us what we want, namely $\sum_{n=1}^\infty na_nx^{n-1}$. So we’re left wanting to prove that

$\sum_{n=2}^\infty a_n\Bigl(\binom n2x^{n-2}h+\binom n3x^{n-3}h^2+\dots+h^{n-1}\Bigr)$

tends to 0 as $h\to0$.

Unfortunately, as $n$ gets big, some of those binomial coefficients get pretty big too. Indeed, when $n$ is bigger than $|x|/|h|$, the growth in the binomial coefficients seems to outstrip the shrinking of the powers of $h$. What can we do?

Fortunately, there is a better (for our purposes at least) way of writing $\frac{(x+h)^n-x^n}h$. We just expanded out $(x+h)^n$ using the binomial theorem. But we could instead have used the expansion

$a^n-b^n=(a-b)(a^{n-1}+a^{n-2}b+\dots+ab^{n-2}+b^{n-1}).$

Applying that with $a=x+h$ and $b=x$, we get

$\frac{(x+h)^n-x^n}h=(x+h)^{n-1}+(x+h)^{n-2}x+\dots+(x+h)x^{n-2}+x^{n-1}.$

Just before we continue, note that this gives us an alternative, and in my view nicer, way to see that the derivative of $x^n$ is $nx^{n-1}$, since if you let $h\to0$ then each of the $n$ terms on the right-hand side tends to $x^{n-1}$.
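The expansion $a^n-b^n=(a-b)(a^{n-1}+a^{n-2}b+\dots+b^{n-1})$ is easy to check exactly using rational arithmetic; a quick illustrative script (the helper name and test values are mine):

```python
from fractions import Fraction

def telescoped(x, h, n):
    """The sum (x+h)^(n-1) + (x+h)^(n-2) x + ... + x^(n-1)."""
    return sum((x + h) ** (n - 1 - k) * x ** k for k in range(n))

x, h = Fraction(3, 7), Fraction(1, 100)  # arbitrary rational test values
for n in range(1, 8):
    # ((x+h)^n - x^n) / h should equal the telescoped sum exactly
    assert ((x + h) ** n - x ** n) / h == telescoped(x, h, n)
print("identity holds exactly for n = 1, ..., 7")
```

Using `Fraction` rather than floats means the identity is verified exactly, with no rounding error to explain away.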

Anyhow, if we use this trick, then $\frac{f(x+h)-f(x)}h$ works out to be

$\sum_{n=1}^\infty a_n\bigl((x+h)^{n-1}+(x+h)^{n-2}x+\dots+x^{n-1}\bigr).$

Now let’s subtract the thing we want this to tend to, which is $\sum_{n=1}^\infty na_nx^{n-1}$. (This is not valid unless we know that this series converges. So at some stage we will need to prove that.) If we think of $nx^{n-1}$ as a sum of $n$ copies of $x^{n-1}$, then we can write the difference as

$\sum_{n=1}^\infty a_n\Bigl(\bigl((x+h)^{n-1}-x^{n-1}\bigr)+\bigl((x+h)^{n-2}x-x^{n-1}\bigr)+\dots+\bigl(x^{n-1}-x^{n-1}\bigr)\Bigr)$

which equals

$\sum_{n=1}^\infty a_n\sum_{k=1}^{n-1}\bigl((x+h)^k-x^k\bigr)x^{n-1-k}.$

Now $(x+h)^k-x^k$ is another example of the expansion we had above. That is, we can write it as

$h\bigl((x+h)^{k-1}+(x+h)^{k-2}x+\dots+x^{k-1}\bigr).$

We haven’t yet mentioned the radius of convergence of the original power series, but let’s do so now. Suppose it is $R$, that $\rho$ is such that $|x|<\rho<R$, and that we have chosen $h$ small enough that $|x+h|<\rho$. Then the modulus of the expression above is at most $|h|k\rho^{k-1}$.

It follows that

$\Bigl|\frac{f(x+h)-f(x)}h-\sum_{n=1}^\infty na_nx^{n-1}\Bigr|\leq|h|\sum_{n=2}^\infty|a_n|\rho^{n-2}\sum_{k=1}^{n-1}k.$

Since $\sum_{k=1}^{n-1}k=n(n-1)/2$, this is equal to $\frac{|h|}2\sum_{n=2}^\infty n(n-1)|a_n|\rho^{n-2}$.

So this will tend to zero as $h\to0$ as long as we can prove that the sum $\sum_n n(n-1)|a_n|\rho^{n-2}$ converges.

Let’s prove a lemma to deal with that last point. It says that if $\rho$ is smaller than the radius of convergence of the power series $\sum_n a_nx^n$, then the series $\sum_n n|a_n|\rho^{n-1}$ converges.

The proof is very similar to an argument we have seen already. Let $R$ be the radius of convergence, and pick $r$ with $\rho<r<R$. Then the power series $\sum_n a_nr^n$ converges, so the terms $|a_n|r^n$ are bounded above, by $C$, say. Then $n|a_n|\rho^{n-1}\leq\frac{Cn}r\bigl(\frac\rho r\bigr)^{n-1}$.

But the series $\sum_n\frac{Cn}r\bigl(\frac\rho r\bigr)^{n-1}$ converges, by the ratio test. Therefore, by the comparison test, the series $\sum_n n|a_n|\rho^{n-1}$ converges.
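The lemma is easy to illustrate numerically. Take, say, $a_n=2^n$ (so the radius of convergence is $1/2$), $\rho=0.3$ and $r=0.4$: each term $n|a_n|\rho^{n-1}$ is dominated by the GP-like comparison series from the proof, and the sum converges (here to $2/(1-0.6)^2=12.5$). An illustrative check, with all names my own:

```python
rho, r = 0.3, 0.4  # rho < r < R = 1/2 for the series with a_n = 2^n
a = lambda n: 2.0 ** n
C = max(a(n) * r ** n for n in range(200))  # terms a_n * r^n are bounded; C is their sup

total = 0.0
for n in range(1, 200):
    term = n * a(n) * rho ** (n - 1)
    bound = (C / r) * n * (rho / r) ** (n - 1)  # the comparison used in the proof
    assert term <= bound  # every term is dominated by the GP-like bound
    total += term
print(total)  # the partial sums converge; the limit is 2/(1 - 0.6)^2 = 12.5
```

Truncating at 200 terms loses less than $10^{-40}$, since the tail decays like a GP with ratio $0.6$.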

This shows also that if $|x|<R$ then the power series $\sum_n na_nx^{n-1}$ converges (since we have just proved that it converges absolutely). So if we differentiate a power series term by term, we get a new power series that has the same radius of convergence, something we needed earlier.

If we apply this lemma a second time, we get that the power series $\sum_n n(n-1)|a_n|\rho^{n-2}$ converges, and dividing by 2 that gives us what we wanted above, namely that $\frac12\sum_n n(n-1)|a_n|\rho^{n-2}$ converges.

An obvious way of applying the result is to take some of your favourite power series and differentiate them term by term. This illustrates the very important general point that if you can obtain something in two different ways, then you usually end up proving something interesting.

So let’s take the function $e^x=\sum_{n=0}^\infty\frac{x^n}{n!}$, which we have shown converges everywhere. Then we can obtain the derivative either by differentiating the function itself or by differentiating the power series term by term. That tells us that

$\frac{d}{dx}e^x=\sum_{n=1}^\infty\frac{nx^{n-1}}{n!}$, which simplifies to $\sum_{n=1}^\infty\frac{x^{n-1}}{(n-1)!}$, which in turn simplifies to $\sum_{m=0}^\infty\frac{x^m}{m!}$, which equals $e^x$.

Earlier we proved this result by writing $e^{x+h}$ as $e^xe^h$ and proving that $\frac{e^h-1}{h}\to 1$ as $h\to 0$. I still prefer that proof, but you are at liberty to disagree.
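For what it’s worth, the term-by-term calculation can be checked numerically. Here is a minimal sketch in Python (the function names are my own): both the partial sums of the series for $e^x$ and the partial sums of its term-by-term derivative agree with `math.exp`.

```python
import math

def exp_series(x, terms=30):
    # Partial sum of sum_{n>=0} x^n / n!
    return sum(x**n / math.factorial(n) for n in range(terms))

def exp_series_derivative(x, terms=30):
    # Term-by-term derivative: sum_{n>=1} n x^(n-1) / n! = sum_{n>=1} x^(n-1) / (n-1)!
    return sum(n * x**(n - 1) / math.factorial(n) for n in range(1, terms))

x = 1.3
assert abs(exp_series(x) - math.exp(x)) < 1e-9
assert abs(exp_series_derivative(x) - math.exp(x)) < 1e-9
```

Of course this proves nothing, but it is a useful way of catching algebraic slips in the reindexing.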

As another example, let us consider the power series $\sum_{n=0}^\infty x^n$. When $|x|<1$ this equals $\frac1{1-x}$, by the formula for summing a GP. We can now differentiate the power series term by term, and we can also differentiate the function $\frac1{1-x}$. Doing so tells us the interesting fact that

$$\sum_{n=1}^\infty nx^{n-1}=\frac1{(1-x)^2}.$$

We can see that in another way as well. By our result on multiplying power series, the product of $\sum_{n\ge0}x^n$ with itself is the power series $\sum_{n\ge0}c_nx^n$, where $(c_n)$ is the convolution of the constant sequence $(1,1,1,\dots)$ with itself. That is, $c_n=\sum_{k=0}^na_kb_{n-k}$ with every $a_k$ and $b_k$ equal to 1, which gives us $c_n=n+1$. (This agrees with the previous answer, since $\sum_{n\ge0}(n+1)x^n$ is the same as $\sum_{n\ge1}nx^{n-1}$.)
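As a quick numerical check of both descriptions (a sketch in Python, assumptions my own): the convolution of the all-ones sequence with itself has $n$th coefficient $n+1$, and the partial sums of $\sum_n(n+1)x^n$ approach $1/(1-x)^2$ inside the radius of convergence.

```python
# Coefficients of (sum_n x^n)^2, obtained by convolving the all-ones sequence with itself.
conv = [sum(1 * 1 for k in range(n + 1)) for n in range(10)]
assert conv == [n + 1 for n in range(10)]

# Partial sums of sum_n (n+1) x^n approach 1/(1-x)^2 for |x| < 1.
x = 0.5
partial = sum((n + 1) * x**n for n in range(200))
assert abs(partial - 1 / (1 - x)**2) < 1e-9
```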

In the proof above, we used the identity

$$x^n-y^n=(x-y)(x^{n-1}+x^{n-2}y+\dots+y^{n-1})$$

with and , and then we used it again to calculate what happened when we subtracted . Can we get those calculations out of the way in advance? That is, can we begin by finding a nice formula for ?

We obviously can, by subtracting from the right-hand side and simplifying, much as we did in the proof above (with and ). However, we can do things a bit more slickly as follows. Start with the identity

Differentiating both sides with respect to , we get

If we now take for and for , we deduce that is equal to

In particular, if and are both at most , then , which is the main fact we needed in the proof.

Armed with this fact, we could argue as follows. We want to show that

is . By the inequality we have just proved, if and are at most , then the modulus of this expression is at most

and an earlier lemma told us that this converges within the circle of convergence. So the quantity we want to be is in fact bounded above by a multiple of . (Sometimes people use the notation for this. The means “bounded above in modulus by a constant multiple of the modulus of”.)

The proof in this post has relied heavily on the idea, which appeared to come from nowhere, of writing $(x+h)^n-x^n$ not in the obvious way, which is

$$\binom n1x^{n-1}h+\binom n2x^{n-2}h^2+\dots+h^n,$$

but in a “clever” way, namely

$$h\big((x+h)^{n-1}+(x+h)^{n-2}x+\dots+x^{n-1}\big).$$

Is this something one just has to remember, or can it be regarded as the natural thing to do?

I chose the words “can it be regarded as” quite carefully, since I want to argue that it is the natural thing to do, but when I was preparing this lecture, I didn’t find it the natural thing to do, as I shall now explain. I came to this result with the following background. Many years ago, I lectured an IB course called Further Analysis, which was a sort of combination of the current courses Metric and Topological Spaces and Complex Analysis, all packed into 16 lectures. (Amazingly, it worked quite well, though it was a challenge to get through all the material.) As a result of lecturing that, I learnt a proof that power series can be differentiated term by term inside their circle of convergence, but the proof uses a number of results from complex analysis. I then believed what some people say, which is that the complex analysis proof of this result is a very good advertisement for complex analysis, since a direct proof is horrible. And then at some point I was chatting to Imre Leader about the reorganization of various courses, and he told me that it was a myth that proving the result directly was hard. It wasn’t trivial, he said, but it was basically fine. In fact, it may even be thanks to him that the result is in the course.

Until a few days ago, I didn’t bother to check for myself that the proof wasn’t too bad — I just believed what he said. And then with the lecture coming up, I decided that the time had finally come to check it: something that I assumed would be a reasonably simple exercise. I duly did the obvious thing, including expanding using the binomial theorem, and got stuck.

I would like to be able to say that I then thought hard about why I was stuck, and after a while thought of the idea of expanding using the expansion of . But actually that is not what happened. What happened was that I thought, “Damn, I’m going to have to look up the proof.” I found a few proofs online that looked dauntingly complicated and I couldn’t face reading them properly, apart from one that was quite nice and that for a while I thought I would use. But one thing all the proofs had in common was the use of that expansion, so that was how the idea occurred to me.

So what follows is a rational reconstruction of what I *wish* had been my thought processes, rather than of what actually went on in my mind.

Let’s go back to the question of how to differentiate $x^n$. I commented above that one could do it using the expansion of $x^n-y^n$, and said that I even preferred that approach. But how might one think of doing it that way? There is a very simple answer to that, which is to use one of the alternative definitions of differentiability, namely that $f$ is differentiable at $x$ with derivative $f'(x)$ if $\frac{f(y)-f(x)}{y-x}\to f'(x)$ as $y\to x$. This is simply replacing $x+h$ by $y$, but that is nice because it has the effect of making the expression more symmetrical. (One might argue that since we are talking about differentiability *at* $x$, the variables $x$ and $y$ are playing different roles, so there is not much motivation for symmetry. And indeed, that is why calling one point $x$ and the other $x+h$ is often a good idea. But symmetry is … well … sort of good to have even when not terribly strongly motivated.)

If we use this definition, then the derivative of $x^n$ is the limit as $y\to x$ of $\frac{x^n-y^n}{x-y}$, and now there is no temptation to use the binomial expansion (we would first have to write $y$ as $x+(y-x)$ and the whole thing would be disgusting) and the absolutely obvious thing to do is to observe that we have a nice formula for the ratio in question, namely

$$\frac{x^n-y^n}{x-y}=x^{n-1}+x^{n-2}y+\dots+y^{n-1},$$

which obviously tends to $nx^{n-1}$ as $y\to x$.
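The formula for the ratio is easy to test numerically. In the sketch below (Python, names mine), `diff_quotient(x, y, n)` computes the sum $x^{n-1}+x^{n-2}y+\dots+y^{n-1}$; multiplying it by $x-y$ recovers $x^n-y^n$, and setting $y=x$ gives exactly $nx^{n-1}$.

```python
def diff_quotient(x, y, n):
    # (x^n - y^n) / (x - y), written without dividing:
    # sum_{i=0}^{n-1} x^i * y^(n-1-i)
    return sum(x**i * y**(n - 1 - i) for i in range(n))

n, x, y = 5, 2.0, 1.7
# The identity (x^n - y^n) = (x - y) * diff_quotient(x, y, n):
assert abs(diff_quotient(x, y, n) * (x - y) - (x**n - y**n)) < 1e-9
# At y = x, every one of the n terms equals x^(n-1):
assert diff_quotient(x, x, n) == n * x**(n - 1)
```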

In fact, the whole proof is arguably nicer if one uses $x$ and $y$ rather than $x$ and $x+h$.

Thus, the “clever” expansion is the natural one to use with the symmetric definition of differentiation, whereas the binomial expansion is the natural one to use with the $x$-and-$x+h$ definition. So in the presentation above, I have slightly obscured the origins of the argument by applying the clever expansion to the $x$-and-$x+h$ definition.

Another way of seeing that it is natural is to think about how we prove the statement that a product of limits is the limit of the products. The essence of this is to show that if $x_1$ is close to $y_1$ and $x_2$ is close to $y_2$, then $x_1x_2$ is close to $y_1y_2$. This we do by arguing that $x_1x_2$ is close to $x_1y_2$, and that $x_1y_2$ is close to $y_1y_2$.

Suppose we apply a similar technique to try to show that $x^n$ is close to $y^n$. How might we represent their difference? A natural way of doing it would be to convert all the $x$s into $y$s in a sequence of steps. That is, we would argue that $x^n$ is close to $x^{n-1}y$, which is close to $x^{n-2}y^2$, and so on.

But the difference between $x^{n-k}y^k$ and $x^{n-k-1}y^{k+1}$ is $x^{n-k-1}y^k(x-y)$, so if we adopt this approach, then we will end up showing precisely that

$$x^n-y^n=(x-y)(x^{n-1}+x^{n-2}y+\dots+y^{n-1}).$$


The problem is to show that if $x_1,x_2,x_3,\dots$ is an infinite sequence of $\pm1$s, then for every $C$ there exist $d$ and $m$ such that $\sum_{i=1}^mx_{id}$ has modulus at least $C$. This result is straightforward to prove by an exhaustive search when $C=2$. One thing that the Polymath project did was to discover several sequences of length 1124 such that no sum of this kind has modulus greater than 2, and despite some effort nobody managed to find a longer one. That was enough to convince me that 1124 was the correct bound.

However, the new result shows the danger of this kind of empirical evidence. The authors used state-of-the-art SAT solvers to find a sequence of length 1160 with no sum having modulus greater than 2, and also showed that this bound is best possible. Of this second statement, they write the following: “The negative witness, that is, the DRUP unsatisfiability certificate, is probably one of longest proofs of a non-trivial mathematical result ever produced. Its gigantic size is comparable, for example, with the size of the whole Wikipedia, so one may have doubts about to which degree this can be accepted as a proof of a mathematical statement.”

I personally am relaxed about huge computer proofs like this. It is conceivable that the authors made a mistake somewhere, but that is true of conventional proofs as well. The paper is by Boris Konev and Alexei Lisitsa and appears here.
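For the small case, the exhaustive search mentioned above really is straightforward on a modern machine. The sketch below (Python; my own code, not the authors’) checks that some ±1 sequence of length 11 keeps every sum of the form $x_d+x_{2d}+\dots+x_{md}$ within $[-1,1]$, while brute force over all $2^{12}$ sequences shows that every sequence of length 12 has such a sum of modulus at least 2.

```python
from itertools import product

def max_hap_sum(seq):
    # Largest modulus of any sum x_d + x_{2d} + ... + x_{md}
    # (a sum along a "homogeneous arithmetic progression").
    best = 0
    n = len(seq)
    for d in range(1, n + 1):
        s = 0
        for i in range(d, n + 1, d):
            s += seq[i - 1]
            best = max(best, abs(s))
    return best

# A length-11 sequence whose HAP sums all stay within [-1, 1]:
witness = (1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1)
assert max_hap_sum(witness) <= 1

# ... but exhaustive search shows that no length-12 sequence manages it:
assert all(max_hap_sum(s) >= 2 for s in product((1, -1), repeat=12))
```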


I have always found this situation annoying, because a part of me said that the result ought to be a straightforward generalization of the mean value theorem, in the following sense. The mean value theorem applied to the interval $[a,b]$ tells us that there exists $c\in(a,b)$ such that $f'(c)=\frac{f(b)-f(a)}{b-a}$, and therefore that $f(b)=f(a)+(b-a)f'(c)$. Writing $b=a+h$ for some $h>0$ we obtain the statement $f(a+h)=f(a)+hf'(c)$. This is the case $n=1$ of Taylor’s theorem. So can’t we find some kind of “polynomial mean value theorem” that will do the same job for approximating $f$ by polynomials of higher degree?

Now that I’ve been forced to lecture this result again (for the second time actually — the first was in Princeton about twelve years ago, when I just suffered and memorized the Cauchy mean value theorem approach), I have made a proper effort to explore this question, and have realized that the answer is yes. I’m sure there must be textbooks that do it this way, but the ones I’ve looked at all use the Cauchy mean value theorem. I don’t understand why, since it seems to me that the way of proving the result that I’m about to present makes the whole argument completely transparent. I’m actually looking forward to lecturing it (as I add this sentence to the post, the lecture is about half an hour in the future), since the demands on my memory are going to be close to zero.

We know that we want a statement that will involve the first $n-1$ derivatives of $f$ at $a$, the $n$th derivative at some point in the interval $(a,b)$, and the value of $f$ at $b$. The idea with Rolle’s theorem is to make a whole lot of stuff zero, and then with the mean value theorem we take a more general function and subtract a linear part to obtain a function to which Rolle’s theorem applies. So let’s try a similar trick here: we’ll make as much as we can equal to zero. In fact, I’ll go even further and make the values of $f$ at $a$ and $b$ zero.

So here’s what I’ll assume: that $f(a)=f'(a)=\dots=f^{(n-1)}(a)=0$ and also that $f(b)=0$. That’s as much as I can reasonably set to be zero. And what should be my conclusion? That there is some $c\in(a,b)$ such that $f^{(n)}(c)=0$. Note that if we set $n=1$ then we are assuming that $f(a)=f(b)=0$ and trying to find $c$ such that $f'(c)=0$, so this result really does generalize Rolle’s theorem. (I’m also assuming that $f$ is $n$ times differentiable on an open interval that contains $[a,b]$. This is a slightly stronger condition than necessary, but it will hold in the situations where we want to use Taylor’s theorem.)

The proof of this generalization is almost trivial, given Rolle’s theorem itself. Since $f(a)=f(b)=0$, there exists $c_1\in(a,b)$ such that $f'(c_1)=0$. But $f'(a)=0$ as well, so by Rolle’s theorem, this time applied to $f'$, we find $c_2\in(a,c_1)$ such that $f''(c_2)=0$. Continuing like this, we eventually find $c_n\in(a,c_{n-1})$ such that $f^{(n)}(c_n)=0$. So we can set $c=c_n$ and we are done.
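As a concrete instance of this generalized Rolle statement (the example is my own): $f(x)=x^3(x-1)$ vanishes together with its first two derivatives at 0 and vanishes at 1, so with $n=3$ there should be some $c\in(0,1)$ with $f'''(c)=0$, and indeed $f'''(x)=24x-6$ vanishes at $x=1/4$.

```python
# f(x) = x^3 (x - 1) = x^4 - x^3 satisfies f(0) = f'(0) = f''(0) = 0 and f(1) = 0.
# The higher-order Rolle theorem (n = 3) then promises some c in (0, 1) with f'''(c) = 0.

def f(x):
    return x**4 - x**3

def f3(x):
    # Third derivative of x^4 - x^3, computed by hand: 24x - 6.
    return 24 * x - 6

assert f(0.0) == 0.0 and f(1.0) == 0.0
assert f3(0.0) < 0 < f3(1.0)   # f''' changes sign, so it has a root in (0, 1) ...
assert f3(0.25) == 0.0         # ... namely c = 1/4
```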

For what it’s worth, I didn’t use the fact that $f(a)=0$, but just that $f(a)=f(b)$.

Now let’s take an arbitrary function $f$ that is $n$ times differentiable on an open interval containing $[a,b]$. To prove the mean value theorem, we subtracted a linear function so as to obtain a function that satisfied the hypotheses of Rolle’s theorem. Here, the obvious thing to do is to subtract a polynomial of degree $n$ to obtain a function that satisfies the hypotheses of our higher-order Rolle theorem.

The properties we need our polynomial $Q$ to have are that $Q(a)=f(a)$, $Q'(a)=f'(a)$, and so on all the way up to $Q^{(n-1)}(a)=f^{(n-1)}(a)$, and finally $Q(b)=f(b)$. It turns out that we can more or less write down such a polynomial, once we have observed that the polynomial $\frac{(x-a)^k}{k!}$ has the convenient property that its $j$th derivative at $a$ is 0 except when $j=k$, when it is 1. This allows us to build a polynomial that has whatever derivatives we want at $a$. So let’s do that. Define a polynomial $P$ by

$$P(x)=\sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(x-a)^k.$$

Then $P^{(k)}(a)=f^{(k)}(a)$ for $k=0,1,\dots,n-1$. A more explicit formula for $P$ is

$$P(x)=f(a)+f'(a)(x-a)+\frac{f''(a)}{2!}(x-a)^2+\dots+\frac{f^{(n-1)}(a)}{(n-1)!}(x-a)^{n-1}.$$

Now $P(b)$ doesn’t necessarily equal $f(b)$, so we need to add a multiple of $(x-a)^n$ to correct for this. (Doing that won’t affect the derivatives we’ve got at $a$.) So we want our polynomial $Q$ to be of the form

$$Q(x)=P(x)+\lambda(x-a)^n$$

and we want $Q(b)=f(b)$. So we want $\lambda(b-a)^n$ to equal $f(b)-P(b)$, which gives us $\lambda=\frac{f(b)-P(b)}{(b-a)^n}$. That is,

$$Q(x)=P(x)+\frac{f(b)-P(b)}{(b-a)^n}(x-a)^n.$$

A quick check: if we substitute in $b$ for $x$ we get $P(b)+f(b)-P(b)$, which does indeed equal $f(b)$.
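The construction is easy to carry out numerically. Below is a sketch (Python; the choice of $f=\exp$, $a=0$, $b=1$, $n=3$ is my own): $P$ is built from the derivatives of $f$ at $a$, and adding the correcting multiple of $(x-a)^n$ makes the resulting polynomial agree with $f$ at $b$ while leaving its value at $a$ alone.

```python
import math

# Carry out the construction for f = exp, a = 0, b = 1, n = 3.
a, b, n = 0.0, 1.0, 3

def P(x):
    # P(x) = sum_{k<n} f^(k)(a) (x-a)^k / k!; every derivative of exp at 0 is 1.
    return sum(math.exp(a) * (x - a)**k / math.factorial(k) for k in range(n))

lam = (math.exp(b) - P(b)) / (b - a)**n   # the correcting coefficient lambda

def Q(x):
    return P(x) + lam * (x - a)**n

assert abs(Q(b) - math.exp(b)) < 1e-12    # Q(b) = f(b), the "quick check" above
assert abs(Q(a) - math.exp(a)) < 1e-12    # the correction does not disturb Q(a)
```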

For the moment, we can forget the *formula* for $Q$. All that matters is its *properties*, which, just to remind you, are these.

- $Q$ is a polynomial of degree at most $n$.
- $Q^{(k)}(a)=f^{(k)}(a)$ for $k=0,1,\dots,n-1$.
- $Q(b)=f(b)$.

The second and third properties tell us that if we set $g=f-Q$, then $g^{(k)}(a)=0$ for $k=0,1,\dots,n-1$ and $g(b)=0$. Those are the conditions needed for our higher-order Rolle theorem. Therefore, there exists $c\in(a,b)$ such that $g^{(n)}(c)=0$, which implies that $f^{(n)}(c)=Q^{(n)}(c)$.

Let us just highlight what we have proved here.

**Theorem.** *Let $f$ be continuous on the interval $[a,b]$ and $n$ times differentiable on an open interval that contains $[a,b]$. Let $Q$ be the unique polynomial of degree at most $n$ such that $Q^{(k)}(a)=f^{(k)}(a)$ for $k=0,1,\dots,n-1$ and $Q(b)=f(b)$. Then there exists $c\in(a,b)$ such that $f^{(n)}(c)=Q^{(n)}(c)$.*

Note that since $Q$ is a polynomial of degree at most $n$, the function $Q^{(n)}$ is constant. In the case $n=1$, the constant is $\frac{f(b)-f(a)}{b-a}$, the gradient of the line joining $(a,f(a))$ to $(b,f(b))$, and the theorem is just the mean value theorem.

Actually, the result we have just proved *is* Taylor’s theorem! To see that, all we have to do is use the explicit formula for $Q$ and a tiny bit of rearrangement. To begin with, let us use the formula

$$Q(x)=P(x)+\frac{f(b)-P(b)}{(b-a)^n}(x-a)^n.$$

Note that $Q^{(n)}(x)=\frac{n!\,(f(b)-P(b))}{(b-a)^n}$ for every $x$, so the theorem tells us that there exists $c\in(a,b)$ such that

$$f^{(n)}(c)=\frac{n!\,(f(b)-P(b))}{(b-a)^n}.$$

Rearranging, that gives us that

$$f(b)=P(b)+\frac{f^{(n)}(c)}{n!}(b-a)^n.$$

Finally, using the formula for $P$, which was

$$P(x)=\sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(x-a)^k,$$

and setting $b=a+h$, we can rewrite our conclusion as

$$f(a+h)=\sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}h^k+\frac{f^{(n)}(c)}{n!}h^n,$$

which is Taylor’s theorem with the Lagrange form of the remainder.
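One can even locate the intermediate point numerically in a simple case. In the sketch below (Python; the choice of $f=\exp$ is mine), solving the Lagrange remainder equation $f(b)=P(b)+\frac{f^{(n)}(c)}{n!}(b-a)^n$ for $c$ gives a value strictly between $a$ and $b$, as the theorem promises.

```python
import math

def lagrange_remainder_point(a, b, n):
    # For f = exp, Taylor's theorem gives
    #   f(b) = sum_{k<n} f^(k)(a) (b-a)^k / k!  +  exp(c) (b-a)^n / n!
    # for some c in (a, b).  Solve for exp(c), then recover c with a logarithm.
    taylor = sum(math.exp(a) * (b - a)**k / math.factorial(k) for k in range(n))
    remainder = math.exp(b) - taylor
    return math.log(remainder * math.factorial(n) / (b - a)**n)

a, b, n = 0.0, 1.0, 4
c = lagrange_remainder_point(a, b, n)
assert a < c < b   # the intermediate point really lies in (a, b)
```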

I think it is quite rare for a proof of Taylor’s theorem to be asked for in the exams. However, pretty well every year there is a question that requires you to understand the *statement* of Taylor’s theorem. (I am writing this post without any knowledge of what will be in this year’s exam, and the examiners will be entirely within their rights to ask for anything that’s on the syllabus. So I certainly don’t recommend not learning the proof of Taylor’s theorem.)

You may at school have seen the following style of reasoning. Suppose we want to calculate the power series of $\sin x$. Then we write

$$\sin x=a_0+a_1x+a_2x^2+a_3x^3+\dotsb.$$

Taking $x=0$ we deduce that $a_0=0$. Differentiating we get that

$$\cos x=a_1+2a_2x+3a_3x^2+\dotsb,$$

and taking $x=0$ we deduce that $a_1=1$. In general, differentiating $n$ times and setting $x=0$ we deduce that $a_n=0$ if $n$ is even, $a_n=1/n!$ if $n\equiv1$ mod 4, and $a_n=-1/n!$ if $n\equiv3$ mod 4. Therefore,

$$\sin x=x-\frac{x^3}{3!}+\frac{x^5}{5!}-\dotsb.$$

There are at least two reasons that this argument is not rigorous. (I’ll assume that we have defined trigonometric functions and proved rigorously that their derivatives are what we think they are. Actually, I plan to define them using power series later in the course, in which case they have their power series by definition, but it is possible to define them in other ways — e.g. using the differential equation $f''=-f$ — so this discussion is not a complete waste of time.) One is that we assumed that $\sin x$ could be expanded as a power series. That is, at best what we have just shown is that *if* $\sin x$ can be expanded as a power series, then the power series must be that one.

A second reason is that we just assumed that the power series could be differentiated term by term. That holds under certain circumstances, as we shall see later in the course, and those circumstances hold for this particular power series, but until we’ve proved that $\sin x$ is given by this particular power series we don’t know that the conditions hold.

Taylor’s theorem helps us to clear up these difficulties. Applying it with $a$ replaced by $0$ and $h$ replaced by $x$ (and $f=\sin$), we find that

$$\sin x=\sum_{k=0}^{n-1}\frac{\sin^{(k)}(0)}{k!}x^k+\frac{\sin^{(n)}(c)}{n!}x^n$$

for some $c$. All the terms apart from the last one are just the expected terms in the power series for $\sin x$, so we get that $\sin x$ is equal to the partial sum of the power series up to the term in $x^{n-1}$ plus a remainder term.

The remainder term is $\frac{\sin^{(n)}(c)}{n!}x^n$, so its magnitude is at most $\frac{|x|^n}{n!}$. It is not hard to prove that $\frac{|x|^n}{n!}$ tends to zero as $n\to\infty$. (One way to do this is to observe that the ratio of successive terms has magnitude at most $1/2$ once $n$ is bigger than $2|x|$.) Therefore, the power series converges for every $x$, and converges to $\sin x$.
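This remainder bound is easy to test numerically. The sketch below (Python, names mine) checks that the partial sums of the sine series stay within $|x|^n/n!$ of `math.sin`, even for a largish $x$ where the early partial sums are wildly wrong.

```python
import math

def sin_partial(x, n):
    # Partial sum of the sine series using only terms of degree below n.
    return sum((-1)**k * x**(2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n) if 2 * k + 1 < n)

for x in (0.5, 2.0, 10.0):
    for n in (5, 10, 20):
        bound = abs(x)**n / math.factorial(n)   # |x|^n / n!
        assert abs(math.sin(x) - sin_partial(x, n)) <= bound
```

Note that for $x=10$ and $n=5$ the partial sum is off by more than 150, but the bound $10^5/5!\approx833$ still comfortably holds, and both shrink rapidly as $n$ grows.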

The basic technique here is as follows.

(i) Write down what Taylor’s theorem gives you for your function.

(ii) Prove that for each $x$ (in the range where you want to prove that the power series converges) the remainder term tends to zero as $n$ tends to infinity.

The material in this section is not on the course, but is still worth thinking about. It begins with the definition of a derivative, which, as I said in lectures, can be expressed as follows. A function is differentiable at with derivative if

We can think of as the best linear approximation to for small .

Once we’ve said that, it becomes natural to ask for the best quadratic approximation, and in general for the best approximation by a polynomial of degree .

Let’s think about the quadratic case. In the light of Taylor’s theorem it is natural to expect that

in which case would indeed be the best quadratic approximation to for small .

What Taylor’s theorem as stated above gives us is

for some . If we know that is continuous at , then as , so we can write , where . But then , as we wanted, since .

However, this result does not need the continuity assumption, so let me briefly prove it. To keep the expressions simple I will prove only the quadratic case, but the general case is pretty well exactly the same.

I’ll do the same trick as usual, by which I mean I’ll first prove it when various things are zero and then I’ll deduce the general case. So let’s suppose that . We want to prove now that .

Since , we have that

Therefore, for every we can find such that for every with .

This gives us several inequalities, one of which is that for every such that . If we now set to be , then we have that for every . So by the mean value theorem, for every such , which implies that .

If we run a similar argument using the fact that we get that . And we can do similar arguments with as well, and the grand conclusion is that whenever we have .

What we have shown is that for every there exists such that whenever , which is exactly the statement that as , which in turn is exactly the statement that .

That does the proof when . Now let’s take a general and define a function by

Then , so , from which it follows that

which after rearranging gives us the statement we wanted:

As I said above, this argument generalizes straightforwardly and gives us Taylor’s theorem with what is known as *Peano’s form of the remainder*, which is the following statement.

For that we need to exist but we do not need to exist anywhere else, so we certainly don’t need any continuity assumptions on .

This version of Taylor’s theorem is not as useful as versions with an explicit formula for the remainder term, as you will see if you try to use it to prove that can be expanded as a power series: the information that the remainder term is is, for fixed , of no use whatever. But the information that it is gives us an expression that we can prove tends to zero.

However, one amusing (but not, as far as I know, useful) thing it gives us is a direct formula for the second derivative. By direct I mean that we do not go via the first derivative. Let us take the quadratic result and apply it to both $h$ and $-h$. We get

$$f(a+h)=f(a)+hf'(a)+\frac{h^2}2f''(a)+o(h^2)$$

and

$$f(a-h)=f(a)-hf'(a)+\frac{h^2}2f''(a)+o(h^2).$$

From this it follows that

$$f(a+h)+f(a-h)-2f(a)=h^2f''(a)+o(h^2).$$

Dividing through by $h^2$ we get that

$$\frac{f(a+h)+f(a-h)-2f(a)}{h^2}\to f''(a)$$

as $h\to0$.
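This direct formula is easy to try out numerically. In the sketch below (Python; the choice of $f=\exp$ at $x=1$, where the second derivative is $e$, is my own), the symmetric quotient approaches the second derivative as $h$ shrinks.

```python
import math

def second_derivative_est(f, x, h):
    # Direct estimate of f''(x): (f(x+h) + f(x-h) - 2 f(x)) / h^2
    return (f(x + h) + f(x - h) - 2 * f(x)) / h**2

x = 1.0
for h in (1e-2, 1e-3):
    est = second_derivative_est(math.exp, x, h)
    # In exact arithmetic the error is O(h^2), so it is certainly below h here.
    assert abs(est - math.exp(x)) < h
```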

I’m not claiming the converse, which would say that if this limit exists, then is twice differentiable at . In fact, doesn’t even have to be once differentiable at . Consider, for example, the following function. For every integer (either positive or negative) and every in the interval we set equal to . We also set , and we take when . (That is, for negative we define so as to make it an odd function.)

Then for every , so for every , and in particular it tends to 0 as . However, is not differentiable at 0. To see this, note that when we have , whereas when is close to we have close to . Therefore, the ratio does not converge as , which tells us that is not differentiable at 0.

If you want an example that is continuous everywhere, then take . This again has the property that for every , and it is not differentiable at 0.

Even if we assume that is differentiable, we can’t get a proper converse. For example, the condition

does not imply that exists and equals 0. For a counterexample, take a function such as (and 0 at 0). Then must lie between and therefore certainly be . But the oscillations near zero are so fast that is unbounded near zero, so doesn’t exist at 0.


Suppose I were to ask you to memorize the sequence 5432187654321. Would you have to learn a string of 13 symbols? No, because after studying the sequence you would see that it is just counting down from 5 and then counting down from 8. What you want is for your memory of a proof to be like that too: you just keep doing the obvious thing except that from time to time the next step isn’t obvious, so you need to remember it. Even then, the better you can understand why the non-obvious step was in fact sensible, the easier it will be to memorize it, and as you get more experienced you may find that steps that previously seemed clever and nonobvious start to seem like the natural thing to do.

For some reason, Analysis I contains a number of proofs that experienced mathematicians find easy but many beginners find very hard. I want to try in this post to explain why the experienced mathematicians are right: in a rather precise sense many of these proofs *really are easy*, in the sense that if you just repeatedly do the obvious thing you will solve them. Others are mostly like that, with perhaps one smallish idea needed when the obvious steps run out. And even the hardest ones have easy parts to them.

I feel so strongly about this that a few years ago I teamed up with a colleague of mine, Mohan Ganesalingam, to write a computer program to solve easy problems. And after a lot of effort, we produced one that can solve several (but not yet all — there are still difficulties to sort out) problems of the kind I am talking about: easy for the experienced mathematician, but hard for the novice. Now you have some huge advantages over a computer. For example, you understand the English language. Also, you can be presented with a vague instruction such as “Do any obvious simplifications to the expression and then see whether it reminds you of anything,” and you will be able to follow it. (In principle, so could the program, but only if we spent a long time agonizing about what exactly constitutes an “obvious” simplification, what kind of similarity should be sufficient for one mathematical expression to trigger the program to call up another, and so on.) So if a mere computer can solve these problems, you should definitely be able to solve them.

What I plan to do in this post is basically explain how the program would go about proving some of the theorems we’ve proved in the course. To explain *exactly* how it works would be complicated. However, because you are humans, there are lots of technical details that I don’t need to worry about, and what remains of the algorithm when you ignore those details is really pretty simple.

The rough idea is that you should equip yourself with a small set of “moves” and simply apply these moves when the opportunity arises. That is an oversimplification, since sometimes one can do the moves in “silly” ways, but merely being consciously aware of the moves is very useful. (Incidentally, the notion of “silliness” is hard to define formally but is something that humans find easy to recognise when they see examples of it. So that’s another example of the kind of advantage you have over the computer.)

I’m going to describe a way of keeping track of where you have got to in your discovery of a proof. It’s not something I suggest you do for the rest of your mathematical lives. Rather, it is something that you might like to consider doing if you find it hard to come up with typical Analysis I proofs. If you use this technique a few times, then it should get easier, and after a while you will find that you don’t need to use the technique any more.

The technique is simply to record what statements you are likely to want to use, and what statement you are trying to prove. Both of these can change during the course of your proof discovery, as we shall see.

I think the easiest way to explain this and the moves is to begin by giving an example of the whole process in action. Then I’ll talk about the moves in a more abstract way. Let’s take as an example the proof that if a Cauchy sequence has a convergent subsequence then the sequence itself is convergent.

To begin with, we have nothing we obviously need to use, and a statement that we want to prove. That statement is the following.

—————————————————-

Every Cauchy sequence with a convergent subsequence converges

Let us begin by writing that very slightly more formally, to bring out the fact that it starts with .

—————————————————-

is Cauchy and has a convergent subsequence

converges

The next step is to apply the “let” move, which I’ve talked about several times in lectures. If you ever have a statement to prove of the form “For every such that holds, also holds,” then you can just automatically write “Let be such that holds,” and change your target to that of establishing that holds.

In our case, we write, “Let be a Cauchy sequence that has a convergent subsequence,” and modify our target to that of proving that converges. So now we represent where we’ve got to as follows.

is a Cauchy sequence

has a convergent subsequence

——————————————-

converges

Maybe the purpose of those strange horizontal lines is becoming clearer at this point. I am listing statements that we can *assume* above the line and ones that we are trying to *prove* below the line.

At this point it seems natural to give a name to the convergent subsequence that we are given. Let us call it . This again is just one instance of a very general move: if you are told you’ve got something, then give it a name. This sequence has two properties: it is a subsequence of and it converges. I’ll list those two properties separately.

is a Cauchy sequence

is a subsequence of

converges

——————————————-

converges

Having done that, I think I’ll remove the second hypothesis, since the fact that is a subsequence of is implicit in the notation.

is a Cauchy sequence

converges

——————————————-

converges

The second hypothesis here is again telling us we’ve got something: a limit of the subsequence. So let’s apply the naming move again, calling this limit .

is a Cauchy sequence

——————————————-

converges

That’s enough reformulation of our assumptions. It’s time to think about what we are trying to prove. To do that, we use a process called *expansion*. That means taking a definition and writing it out in more detail. It tends to be good to *avoid* expanding definitions unless you are genuinely stuck: that way you won’t miss opportunities to *use results from the course* rather than proving everything from first principles. However, here a proof from first principles is what is required. I’m going to do a partial expansion to start with: a sequence converges if there exists a real number that it converges to.

is a Cauchy sequence

——————————————-

converges to

Now our target has changed to an existential statement. How are we going to find an that the sequence converges to?

Sometimes proving existential statements is very hard, but here it is easy, since we have a candidate for the limit staring us in the face, and better still it is the only candidate around. So let us make a very reasonable guess that the sequence is going to converge to , and make proving that our new target.

is a Cauchy sequence

——————————————-

That’s nice because we’ve got rid of that existential quantifier. But what do we do next? We must continue to expand: this time the definition of . Note that if you want to be able to do this, it is absolutely vital that you *know your definitions*. Otherwise, you obviously can’t do this expansion move. And if you can’t do that, then you can kiss goodbye to any hopes you might have had of proving this kind of result.

is a Cauchy sequence

——————————————-

Now we have a target that begins with a universal quantifier, so it’s time for the “let” move again.

is a Cauchy sequence

——————————————-

Now things become slightly harder, because this time we do *not* have a candidate staring us in the face for the thing we are trying to find. (The thing we are trying to find is .) It’s not a bad idea in this situation to try to write out in vague terms what the key statements mean. One can do something like this.

Eventually all terms of are close to each other

Eventually all terms of are close to

————————————————

Eventually all terms of are close to

The rough idea of the proof should now be clear: if all terms in the subsequence are close to and all terms are close to each other, then eventually for each term we can say that it is close to a term in the subsequence, which is itself close to .

Since we are going to need to take two steps from a term in the sequence, one to the subsequence and one from the subsequence to the limit, it seems a good idea to apply the two main hypotheses with $\epsilon/2$. So let’s just go ahead and do that and see what we get.

——————————————-

Now we are once again in a position where we have been “given” something — in this case and . So let’s quietly drop the existential quantifiers and use the names and . (Purists might object to using the same names for the particular choices of and that we used when merely asserting that they exist. But this is very common practice amongst mathematicians and does not lead to confusion.)

——————————————-

How do we propose to “force” to be less than ? We are going to try to ensure, for suitable , that and . The first hypothesis tells us that we will be able to get the first condition if and are both at least , and the third hypothesis tells us that we will be able to get the second condition if .

So our plan is going to be to choose and . For the plan to work, we shall need , , and .

We are now in a position to choose . We want our conclusion to hold when , and the tool we use works when , so it makes sense to take . If we substitute that in, we lose the existential quantifier in the target and arrive at the following.

——————————————-

Now we can apply the “let” move again, to get rid of the universal quantifier in the target statement.

——————————————-

We know we’re going to take , and that we can, since , so let’s go ahead and choose that value for in the first hypothesis. That leaves us with the following.

——————————————-

Just to make clear what I did there, it was a move called *substitution*. If you have a hypothesis of the form and a hypothesis , then you can substitute in for and get out . (One can also call this *modus ponens*: I prefer to call it substitution in this case because the condition is somehow not a very serious hypothesis, but more like a “restriction” applied on .)

Since I’ve used the hypothesis and am unlikely to need it again, I have deleted it.

Now we have to decide how to choose and how to choose . Recall that we needed and . In a human proof one just writes, “Let be such that and .” It’s a bit trickier for a computer to find it obvious that such a exists, but again that doesn’t matter to us here. I’ll use to denote the I’m choosing, and write down the conditions I’ve made sure satisfies.

——————————————-

Now we can substitute into the first hypothesis.

——————————————-

We can also substitute into the second hypothesis.

——————————————-

And now we are done by the triangle inequality.

Now that we have gone through a proof, let me list the main proof-generating moves we used.

If you are trying to prove a statement of the form “For every such that holds, also holds,” then write, “Let be such that holds,” (or words to that effect) and adjust your target to proving that holds.

If you are told that something exists, then give it a name. For example, if you are given the hypothesis is convergent, then you are told that a limit exists. So give it a name such as and change the hypothesis to .

If you are trying to prove something and you can’t find a high-level argument (by which I mean one that uses results from the course that are relevant to the statement you are trying to prove), and if what you are trying to prove involves concepts such as convergence or continuity that can be written out in low-level language (often, but not always, involving quantifiers), then rephrase what you are trying to prove in this lower-level way. That is, expand out the definition.

If you are given a hypothesis of the form ∀x P(x), then given any object a of the same type as x, you are free to substitute it in for x and obtain the hypothesis P(a).

For example, in the proof above, we had the hypothesis “(a_n) is Cauchy”. In expanded form, this reads

∀ε > 0 ∃N ∀m,n ≥ N |a_m − a_n| < ε.

We decided to substitute in ε/2, which is of the same type of thing as ε (both are positive real numbers), and yielded for us the statement

∃N ∀m,n ≥ N |a_m − a_n| < ε/2.

(We then applied the “naming” move to get rid of the existential quantifier.)

Often a hypothesis takes a slightly more general form, where *conditions* are assumed. That is, it takes the form

∀x (P(x) ⟹ Q(x))

or still more generally

∀x (P_1(x) ∧ … ∧ P_k(x) ⟹ Q(x))

There the symbol ∧ means “and”, so this is saying that whenever you can find an object a that satisfies the conditions P_1(a), …, P_k(a), then you can give yourself the hypothesis Q(a).

Suppose that you are trying to prove a statement of the form ∃x P(x), and suppose you have identified an object a of the same type as x that you believe is going to do the job. Then you can change your target statement from ∃x P(x) to P(a). (In words, instead of trying to show that there exists something that satisfies P, you are going to try to show that a satisfies P.)

We did this when we moved from trying to prove that (a_n) converges to *something* to trying to prove that it converges to a.
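To illustrate how mechanical these moves are, here is how some of them look in a proof assistant. This Lean 4 sketch is my own illustration, not from the lecture: it assumes Mathlib, and the predicate P is a stand-in for a condition such as the Cauchy one. The “let ε be given” move is `intro`, substitution into a ∀-hypothesis is function application, and the naming move is `obtain`.

```lean
-- Illustration only: P is an arbitrary predicate standing in for a
-- condition such as "|a_m - a_n| < ε for all m, n ≥ N".
example (P : ℝ → ℝ → Prop)
    (h : ∀ ε : ℝ, ε > 0 → ∃ N, P ε N) :
    ∀ ε : ℝ, ε > 0 → ∃ N, P (ε / 2) N := by
  intro ε hε                         -- move 1: "let ε > 0 be given"
  obtain ⟨N, hN⟩ := h (ε / 2) (by linarith)
  -- substitution (move 4) fed ε/2 into h; `obtain` (move 2) named the N
  exact ⟨N, hN⟩                      -- move 5: exhibit the witness N
```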

This is not a complete set of useful moves. However, it is a start, and I hope it will help to back up my assertion that a large fraction of the proof steps that I take when writing out proofs in lectures are fairly automatic, and steps that you too will find straightforward if you put in the practice. I’ll try to discuss more moves in future posts.

]]>

I cannot promise to follow the amazing example of Vicky Neale, my predecessor on this course, who posted after every single lecture. However, her posts are still available online, so in some ways you are better off than the people who took Analysis I last year, since you will have her posts as well as mine. (I am making the assumption here that my posts will not contribute negatively to your understanding — I hope that proves to be correct.) Having said that, I probably won’t cover exactly the same material in each lecture as she did, so the correspondence between my lectures and her posts won’t be as good as the correspondence between her lectures and her posts. Nevertheless, I strongly recommend you look at her posts and see whether you find them helpful.

You will find this course *much* easier to understand if you are comfortable with basic logic. In particular, you should be clear about what “implies” means and should not be afraid of the quantifiers ∀ and ∃. You may find a series of posts I wrote a couple of years ago helpful, and in particular the ones where I wrote about logic (NB, as with Vicky Neale’s posts above, they appear in reverse order). I also have a few old posts that are directly relevant to the Analysis I course (since they are old posts you may have to click on “older entries” a couple of times to reach them), but they are detailed discussions of Tripos questions rather than accompaniments to lectures. You may find them useful in the summer, and you may even be curious to have a quick look at them straight away, but for now your job is to learn mathematics rather than trying to get good at one particular style of exam, so I would not recommend devoting much time to them yet.

For the rest of this post, I want to describe briefly the prerequisites for this course. One of the messages I want to get across is that in a sense the entire course is built on one axiom, namely the least upper bound axiom for the real numbers. I don’t really mean that, but it would be correct to say that it is built on one *new* axiom, together with other properties of the real numbers that you are so familiar with that you hardly give them a second’s thought.

If I want to say that more precisely, then I will say that the course is built on the following assumption: there is, up to isomorphism, exactly one complete ordered field. If the phrase “complete ordered field” is unfamiliar to you, it doesn’t matter, though I will try to explain what it means in a moment. Roughly speaking, this assumption is saying that there is exactly one mathematical structure that has all the arithmetical and order properties that you would expect of the real numbers, and also satisfies the least upper bound axiom. And that structure is the one we call the real numbers.

And now let me make *that* more precise.

A field is a set F with two binary operations + and × that behave in the same nice ways that addition and multiplication behave in the real numbers. That is, they have the following properties.

(i) + is commutative and associative and has an identity element. Every element of F has an inverse under +.

(ii) × is commutative and associative and has an identity element. Every element of F other than the identity of + has an inverse under ×.

(iii) × is distributive over +. That is, for any three elements x, y and z of F we have x × (y + z) = (x × y) + (x × z).

If we define an algebraic structure with some notions of addition and multiplication, then to say that it is a field is to say that all the usual rules we use to do algebraic manipulations are valid. It can be amusing and instructive to prove facts such as that (−1) × (−1) = 1 assuming nothing more than the field axioms, but in this course I shall take these slightly less elementary facts as read as well. But I assure you that they *do* follow from the field axioms.

Some examples of fields that you have already met are ℚ, ℝ, ℂ and 𝔽_p. (That last one is the field that consists of integers mod p for a prime p, with addition and multiplication mod p. The only axiom that is not easy to verify is the existence of multiplicative inverses for non-zero elements of the field, which follows from the fact that if a and p are coprime then there are integers u and v such that ua + vp = 1.)
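That coprimality fact is easy to make computational. Here is a short sketch of my own (the function names are made up for illustration) that finds the integers in question with the extended Euclidean algorithm and uses them to compute inverses mod p:

```python
# Sketch (mine, not from the post): inverses in the field of integers mod p.
# If a and p are coprime, extended Euclid finds u, v with u*a + v*p = 1,
# so u is a multiplicative inverse of a mod p.

def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and u*a + v*b == g."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

def inverse_mod(a, p):
    g, u, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a and p are not coprime")
    return u % p

print(inverse_mod(3, 7))  # 5, since 3*5 = 15 = 1 mod 7
```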

This question splits into two. First we need to know what an ordering is, and then we need to know how the ordering relates to the algebraic operations. Let me take these two in turn.

A *totally ordered set* is a set X together with a relation < that has the following properties.

- < is *transitive*: that is, if x < y and y < z, then x < z.
- < satisfies the *law of trichotomy*: that is, for any x and y exactly one of the statements x < y, x = y, y < x holds.

Note that the trichotomy law implies that < is *antisymmetric*: that is, if x < y then it cannot also be the case that y < x.

In the above situation, we say that < is a *total ordering* on X. Given a total ordering we can make some obvious further definitions. For instance, we can define > by saying that x > y if and only if y < x. (Note that > is also a total ordering on X.) Also, we can define ≥ by saying that x ≥ y if and only if either x > y or x = y, and similarly we can define ≤.

Here’s an example of a totally ordered set that is not just a subset of the real numbers. We take X to be the set of all polynomials with real coefficients, and if P and Q are two polynomials, we say that P > Q if there exists a real number M such that P(x) > Q(x) for every x > M. (That is, P > Q if P is “eventually bigger than Q“.) It is easy to check that this relation is transitive, and an instructive exercise to prove that the trichotomy law holds. (It is also not too hard, so I think it is better not to give the proof here.)
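For polynomials the trichotomy exercise comes down to a concrete observation: P is eventually bigger than Q exactly when the leading coefficient of P − Q is positive, and P − Q is either zero or has a positive or a negative leading coefficient. A quick sketch of my own, with polynomials as coefficient lists:

```python
# Sketch (mine, not from the post) of the "eventually bigger" ordering.
# Coefficient lists are lowest degree first, so [0, 0, 1] is x^2.

def eventually_greater(p, q):
    """True iff p(x) > q(x) for all sufficiently large x."""
    n = max(len(p), len(q))
    diff = [(p[i] if i < len(p) else 0) - (q[i] if i < len(q) else 0)
            for i in range(n)]
    while diff and diff[-1] == 0:      # strip trailing zero coefficients
        diff.pop()
    return bool(diff) and diff[-1] > 0  # positive leading coefficient of p - q

# x^2 is eventually bigger than 1000x, even though 1000x is bigger at first:
print(eventually_greater([0, 0, 1], [0, 1000]))  # True
```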

How should we define an ordered field? A first guess might be to say that it is a field with a total ordering on it. But a moment’s thought shows that that is a ridiculous definition, since we could define a “stupid” total ordering that had nothing to do with any natural ordering we might want to put on the field. For example, we could define an ordering on the rationals as follows: given two rational numbers p/q and r/s, written in their lowest terms with q and s positive, say that p/q < r/s if either q < s, or q = s and p < r. That is certainly a total ordering on the rationals, but it is a rather strange one. For example, with this ordering we have 1/2 < 1/3 and also 1000000 < 1/2.

What has gone wrong? The answer is that it is not interesting to have two structures on a set (in this case, the algebraic structure and the order structure) unless those structures *interact*. In fact, we have already seen this in the field axioms themselves: we have addition and multiplication, and it is absolutely crucial to have some kind of relationship between them. The relation we have is the distributivity law. Without that, we would allow “stupid” examples of pairs of binary operations that had nothing to do with each other.

An *ordered field* is a field F together with a total ordering < that satisfies the following properties.

- For every x, y and z in F, if x < y, then x + z < y + z.
- For every x, y and z in F, if x < y and z > 0, then xz < yz.

Basically what these properties are saying is that the usual rules we use when manipulating inequalities, such as adding the same thing to both sides, apply.

In practice, we tend to use a rather larger set of rules. For example, if we know that 0 < x < y, we will feel free to deduce that x^2 < y^2. And nobody will bat an eyelid if you have a real number x and state without proof that x^2 ≥ 0. Both these facts can be deduced fairly easily from the properties of ordered fields, and again it is quite a good exercise to do this if you haven’t already. However, in this course we shall take the following attitude. There are the axioms for an ordered field. There are also some simple deductions from these axioms that provide us with some further rules for manipulating equations and inequalities. All of these we will treat in the same way: we just use them without comment.

Before I get on to the most important axiom, and the one that very definitely will *not* be used without comment, I want to discuss a distinction that it is important to understand: the distinction between the abstract and the concrete approaches to mathematics. The abstract approach is to concentrate on the *properties* that mathematical structures have. We are given a bunch of properties and we see what we can deduce from them, and we do that quite independently of whether any object with those properties exists. Of course, we do like to check that the properties are consistent, which we do by finding an object that satisfies them, but once we have carried out that check we go back to concentrating on the properties themselves.

The concrete approach to mathematics is much more focused on the objects themselves. We take an object, such as the set of all prime numbers, and try to describe it, prove results about it, and so on.

The boundary between the two approaches is extremely fuzzy, because we often like to convert the concrete approach into a more abstract one. For example, consider the function sin. This can be defined concretely as the function given by the formula sin x = x − x^3/3! + x^5/5! − ⋯. (That’s just a concise way of writing the sum of the series whose nth term is (−1)^n x^{2n+1}/(2n+1)!.) And a similar definition can be given for cos. But somewhere along the line we will want to prove basic facts such as that sin(x+y) = sin x cos y + cos x sin y, or that cos(x+y) = cos x cos y − sin x sin y, or that sin^2 x + cos^2 x = 1. And once we’ve proved a few of those facts, we find that we no longer want to use the formula, because everything we need to know follows from those basic facts. And that is because with just a couple more facts of the above kind, we find that we have *characterized* the trigonometric functions: that is, we have written down properties that are satisfied by sin and cos and *by no other pair of functions*. When this kind of thing happens, our approach has shifted from the concrete (we are given the formulae and want to prove things about the resulting functions) to the abstract (we are given some properties and want to use them to deduce other properties).

Something very similar happens with the real numbers. Up to now (at least until taking Numbers and Sets), you will have been used to thinking of the real numbers as infinite decimals. In other words, the real number system is just out there, an object that you look at and prove things about. But at university level one takes the abstract approach. We start with a set of properties (the properties of ordered fields, together with the least upper bound axiom) and use those to deduce everything else. It’s important to understand that this is what is going on, or else you will be confused when your lecturers spend time proving things that appear to be completely obvious, such as that the sequence (1/n) converges to 0. Isn’t that obvious? Well, yes it is if you think of a real number as one of those things with a decimal expansion. But it takes quite a lot of work to prove, using just the properties of a complete ordered field, that every real number has a decimal expansion, and rather than rely on all that work it is much easier to prove directly that (1/n) converges to 0.

Let A be a set of real numbers. A real number s is an *upper bound* for A if s ≥ x for every x ∈ A. For example, if A is the open interval (0,1), then 2 is an upper bound for A.

A real number s is *the least upper bound* of A if it has the following two properties.

- s is an upper bound for A.
- If t < s, then t is not an upper bound for A.

Another way of writing these two properties is as follows. I’ll use quantifiers.

- ∀x ∈ A, x ≤ s.
- ∀ε > 0 ∃x ∈ A, x > s − ε.

In words, everything in A is less than or equal to s, and for any ε > 0 there is some x ∈ A that is bigger than s − ε.

As an example, 1 is the least upper bound of the open interval (0,1). Why? Because if x ∈ (0,1) then x ≤ 1, and if ε > 0 then we can find x ∈ (0,1) such that x > 1 − ε. (How do we do this? Well, if ε ≥ 1 then take x = 1/2 and if ε < 1 then take x = 1 − ε/2.)
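That two-case choice of witness can be checked mechanically. A tiny sketch of my own, verifying that the witnesses really do lie in (0,1) and exceed 1 − ε:

```python
# Sketch (mine): the witness from the case split in the text.
def witness(eps):
    """Some x in (0, 1) with x > 1 - eps."""
    return 0.5 if eps >= 1 else 1 - eps / 2

for eps in [2.0, 1.0, 0.5, 0.01]:
    x = witness(eps)
    assert 0 < x < 1 and x > 1 - eps
print("all witnesses lie in (0,1) and exceed 1 - eps")
```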

The least upper bound property is the following statement: every non-empty subset of the reals that has an upper bound has a least upper bound.

But since we are thinking abstractly, we will not think of this as a *property* (of the previously given real numbers) but more as an *axiom*. To do so we can state it as follows.

Let F be an ordered field. We say that F has the *least upper bound property* if every non-empty subset of F that has an upper bound has a least upper bound.

For reasons that will become clear only after the course has started, we say that an ordered field with the least upper bound property is *complete*. There are then two very important theorems that we shall assume.

**Theorem 1.** *There exists a complete ordered field.*

**Theorem 2.** *There is only one complete ordered field, in the sense that any two complete ordered fields are isomorphic.*

I don’t propose to give proofs of either of these results, but let me at least give some indication, for those who are interested, of how they can be proved. The proofs are not required knowledge for the course, but it’s not a bad idea to have some inkling of how they go.

One answer to this is that *the reals are a complete ordered field*! That is, if you take the good old infinite decimals that you are used to, and you say very carefully what it means to add or multiply two of them together, and you order them in the obvious way, then you can actually prove rigorously that you have a complete ordered field. It’s not very pretty (partly because of the fact that point nine recurring equals 1) but it can be done.

Here’s how one can prove the least upper bound property. For convenience let us take a non-empty set A that consists of positive numbers only. Assuming that A is bounded above, we would like to find a least upper bound. We can do this as follows. First, find the smallest integer that is an upper bound for A. (We know that there must be an integer — just take any integer that is bigger than the upper bound we are given for A. If we are defining the reals as infinite decimals, then it is genuinely obvious that such an integer exists — you just chop off everything beyond the decimal point and add 1.) Call this integer m. Next, we find the smallest multiple of 1/10 that is an upper bound for A. This will be one of the numbers m − 9/10, m − 8/10, …, m. Then you take the smallest multiple of 1/100 that is an upper bound for A, and so on. This gives you a sequence that might be something like 3, 2.7, 2.65, 2.648, 2.6479, …. If you look at an individual digit of the numbers in this sequence, such as the fifth after the decimal point, it will eventually stabilize, and if you take these stabilized digits as the digits of a certain number, then that number will be an upper bound for A and no smaller number will be. (Both these statements need to be checked, but both are reasonably straightforward.)
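The successive-multiples process is easy to simulate when A is finite, so that “upper bound” can be checked directly (for a general bounded set one would need a membership test instead). A sketch of my own:

```python
# Sketch (mine): at stage k, the smallest multiple of 10**-k that is an
# upper bound for a finite set A of positive rationals.
from fractions import Fraction
import math

def sup_approximations(A, steps):
    approx = []
    for k in range(steps + 1):
        scale = Fraction(1, 10 ** k)
        m = max(math.ceil(a / scale) for a in A)   # smallest valid multiple
        approx.append(m * scale)
    return approx

A = [Fraction(1, 3), Fraction(2, 7), Fraction(13, 40)]
print([float(x) for x in sup_approximations(A, 3)])  # [1.0, 0.4, 0.34, 0.334]
```

The successive values descend towards sup A = 1/3, and reading off the stabilized digits recovers its decimal expansion 0.333….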

A more elegant way to prove the existence of a complete ordered field is to use objects called *Dedekind cuts*. A Dedekind cut is a partition of the rational numbers into two non-empty subsets L and R such that every element of L is less than every element of R, and such that R does not have a minimal element.

To see why this might be a reasonably sensible definition, consider the sets L and R, where L consists of all rationals x such that either x ≤ 0 or x^2 < 2, and R consists of all positive rationals x such that x^2 > 2. This is the Dedekind cut that corresponds to our ordinary conception of the number √2.

The condition that R should not have a minimal element is to make sure that we don’t have two different Dedekind cuts representing each rational number. (If the rational number is q, the partition we are ruling out is L = {x : x < q} and R = {x : x ≥ q}. We just allow the partition L = {x : x ≤ q} and R = {x : x > q}.)

If (L_1, R_1) and (L_2, R_2) are two Dedekind cuts, we can define their sum to be (L_1 + L_2, R_1 + R_2), where L_1 + L_2 is defined to be the set of all numbers x + y such that x ∈ L_1 and y ∈ L_2, and similarly for R_1 + R_2. It’s a bit harder to define products — you may like to try it. It’s not so hard to define a sensible total ordering on the set of all Dedekind cuts. And then there’s a lot of checking needed to prove that what results is a complete ordered field. (I may as well admit at this point that I’ve never bothered to check this for myself, or to read a proof in a book. I’m happy to know that it can be done, just as I’m happy to fly in an aeroplane without checking that the lift will be enough to keep me in the sky.)
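To make the sum definition concrete, here is a rough sketch of my own, with a cut represented by the membership predicate of its left set (the grid search is an approximation for illustration only: a real proof quantifies over all rationals):

```python
# Sketch (mine): Dedekind cuts as left-set predicates, and their sum.
from fractions import Fraction

def cut_of_rational(q):
    return lambda x: x <= q            # the allowed partition: L = {x : x <= q}

def cut_sqrt2():
    return lambda x: x <= 0 or x * x < 2

def add(L1, L2):
    # x lies in L1 + L2 iff x = a + b with a in L1 and b in L2,
    # i.e. iff some a in L1 has x - a in L2; we search a finite grid.
    grid = [Fraction(k, 100) for k in range(-500, 501)]
    return lambda x: any(L1(a) and L2(x - a) for a in grid)

one_plus_sqrt2 = add(cut_of_rational(Fraction(1)), cut_sqrt2())
print(one_plus_sqrt2(Fraction(2)))      # True:  2   < 1 + sqrt(2)
print(one_plus_sqrt2(Fraction(5, 2)))   # False: 5/2 > 1 + sqrt(2)
```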

Here’s one answer. You just go back to your notes in Numbers and Sets and look at the proof that every real number has a decimal expansion. Obviously if you define real numbers to be things with decimal expansions, then this is saying nothing at all, but that’s not what Professor Leader did. He deduced the existence of decimal expansions from the properties of complete ordered fields. So effectively he proved the following result: *every element of a complete ordered field has a decimal expansion*. We can say slightly more: it has a decimal expansion that does not end with an infinite sequence of 9s. Oh, and two different elements have different decimal expansions. So now if you want an isomorphism between two complete ordered fields, you just match up an element of one with the element of the other that has the same decimal expansion.

Let me very briefly sketch a neater approach. You first match up 1 with 1. (That is, you match up the multiplicative identity with the multiplicative identity.) Then you match up 1+1 with 1+1, and so on, until you have “the positive integers” inside your two complete ordered fields matched together. Then you match up 0 with 0 and the additive inverses of the positive integers with the additive inverses of the positive integers. Then you match up the reciprocals of the positive integers (or rather, their multiplicative inverses) with the reciprocals of the positive integers, and finally all the rationals with all the rationals. What I’m saying here is that in any complete ordered field you can make sense in only one reasonable way of the fraction p/q when p and q are integers with q ≠ 0, and you send each p/q in one complete ordered field to its counterpart in the other.

Now let’s take *any* element x of a complete ordered field. We can associate with x the set A of all “rationals” less than x and map that set over to the other complete ordered field, using our correspondence between rationals. That gives us a set A′ in the other complete ordered field. The least upper bound of A′ is then the element x′ that corresponds to x.

As ever, there is work needed if you want to turn the above idea into a complete proof: if the map you’ve defined is φ, then you need to check things like that φ(x + y) = φ(x) + φ(y), or that if a set A has least upper bound x, then φ(A) has least upper bound φ(x). But all that can be done.

If you found what I’ve just written a bit intimidating, let me remind you that all you need to take away from it is that everything in this course will be deduced from the familiar algebraic and order properties of the reals, together with the least upper bound property. Since the algebraic and order properties should be very familiar to you, that means that the main things you need to learn are the definition of a least upper bound and the statement of the least upper bound property. The details matter, so a vague idea is not enough, but even so it’s not very much to learn.

]]>

When I got to my office, those other things I’ve been thinking about (the project with Mohan Ganesalingam on theorem proving) commanded my attention and the post didn’t get written. And then in the evening, with impeccable timing, Pavel Pudlak sent me an email with an observation that shows that one of the statements that I was hoping was false is in fact true: every subset of {0,1}^n can be Ramsey lifted to a very simple subset of a not much larger set. (If you have forgotten these definitions, or never read them in the first place, I’ll recap them in a moment.)

How much of a disaster is this? Well, it’s *never* a disaster to learn that a statement you wanted to go one way in fact goes the other way. It may be disappointing, but it’s much better to know the truth than to waste time chasing a fantasy. Also, there can be far more to it than that. The effect of discovering that your hopes are dashed is often that you readjust your hopes. If you had a subgoal that you now realize is unachievable, but you still believe that the main goal might be achievable, then your options have been narrowed down in a potentially useful way.

Is that the case here? I’ll offer a few preliminary thoughts on that question and see whether they lead to an interesting discussion. If they don’t, that’s fine — my general attitude is that I’m happy to think about all this on my own, but that I’d be even happier to discuss it with other people. The subtitle of this post is supposed to reflect the fact that I have gained something from making my ideas public, in that Pavel’s observation, though simple enough to understand, is one that I might have taken a long, or even infinite, time to make if I had worked entirely privately. So he has potentially saved me a lot of time, and that is one of the main points of mathematics done in the open.

The basic idea I was pursuing was that perhaps we can find a property that distinguishes between subsets of {0,1}^n (or Boolean functions) of low Boolean complexity and general subsets/functions of the following kind: a low-complexity set/function can be “lifted” from {0,1}^n to a larger, but not too much larger, structure inside which it sits more simply. This basic idea was inspired by Martin’s proof that Borel sets are determined. After considering various possible ways of making the above ideas precise, and rejecting some of them when I realized that they couldn’t work, I arrived at the following set of definitions.

An *n-dimensional complexity structure with alphabet Σ* is a subset X ⊆ Σ^n. (Sometimes it is convenient to define it as a subset of Σ_1 × ⋯ × Σ_n, in which case the definitions have to be modified slightly.) If X ⊆ Σ^n and Y ⊆ Γ^n are two n-dimensional complexity structures, then I call a function φ: X → Y a *map* if for every i, the ith coordinate of φ(x) depends only on x_i. Equivalently, φ is of the form (x_1, …, x_n) ↦ (φ_1(x_1), …, φ_n(x_n)).

A *basic i-set* in a complexity structure X ⊆ Σ^n is a subset of the form {x ∈ X : x_i ∈ B} for some B ⊆ Σ. A *basic set* is any set that is a basic i-set for some i. Note that if φ: X → Y is a map and E is a basic set in Y, then φ⁻¹(E) is a basic set in X.

The *circuit complexity* or *straight-line complexity* of a subset A of a complexity structure X is the minimal m for which there exists a sequence A_1, …, A_m of subsets of X such that every A_j is a basic set or a union or intersection of two sets earlier in the sequence, and A_m = A. If φ: X → Y is a map and A ⊆ Y, then the circuit complexity of φ⁻¹(A) is at most the circuit complexity of A, since φ⁻¹ preserves basic sets and Boolean operations.
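Such a sequence of sets is just a straight-line program, and evaluating one is mechanical. A sketch of my own (the instruction format is invented for illustration), showing a set of straight-line complexity at most 3:

```python
# Sketch (mine): evaluating a straight-line program over subsets of a
# complexity structure X. Each instruction is ("basic", i, B), or
# ("union"/"inter", j, k) referring to two earlier lines of the program.
import itertools

def run_program(X, program):
    sets = []
    for op in program:
        if op[0] == "basic":
            _, i, B = op
            sets.append({x for x in X if x[i] in B})
        else:
            kind, j, k = op
            sets.append(sets[j] | sets[k] if kind == "union" else sets[j] & sets[k])
    return sets[-1]

X = set(itertools.product((0, 1), repeat=3))       # the full cube as the structure
# (x_0 = 1) intersected with (x_2 = 0): complexity at most 3
prog = [("basic", 0, {1}), ("basic", 2, {0}), ("inter", 0, 1)]
print(sorted(run_program(X, prog)))  # [(1, 0, 0), (1, 1, 0)]
```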

I often, and perhaps slightly confusingly, describe a map φ: X → Y as a *lift*. That’s because it’s really φ⁻¹ and its effect on subsets of Y that I am interested in.

Let X ⊆ Σ^n be a complexity structure. A *coordinate specification* is a statement of the form x_i = σ for some i ∈ {1, …, n} and σ ∈ Σ.

Let us assume that n is even and let m = n/2. Then the *shrinking-neighbourhoods game* is a two-player game played according to the following rules.

- Player I starts, and the players alternately make coordinate specifications.
- Player I’s specifications must be of coordinates i with i ≤ m, and Player II’s must be of coordinates i with i > m.
- No coordinate may be specified more than once.
- At every stage of the game, there must exist a sequence x ∈ X that obeys all the specifications made so far.

A subset A of X is *I-winning* if Player I has a winning strategy for ensuring that after all n coordinates have been specified, the sequence that satisfies those specifications (which is obviously unique) belongs to A. It is *II-winning* if Player II has a winning strategy for ensuring that the final sequence belongs to the complement of A.

Since finite games are determined, if A is any subset of X, then either A is I-winning or A is II-winning.
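For tiny cases one can even decide the winner by brute force. Here is a minimax sketch of my own for the game on the full cube {0,1}^n (so every specification is automatically legal); the parity example is II-winning simply because Player II moves last:

```python
# Sketch (mine): backward induction for the shrinking-neighbourhoods game
# on the full cube, with n even. Player I owns coordinates 0..n/2-1 and
# Player II owns n/2..n-1; they alternate, I first.
import itertools

def first_player_wins(n, A):
    """True iff A (a set of 0-1 tuples of length n) is I-winning."""
    half = n // 2

    def play(assignment, turn):                    # assignment: coordinate -> bit
        if len(assignment) == n:
            return tuple(assignment[i] for i in range(n)) in A
        side = range(half) if turn == 0 else range(half, n)
        outcomes = [play({**assignment, i: b}, 1 - turn)
                    for i in side if i not in assignment for b in (0, 1)]
        # Player I needs some move that works; against II, all moves must work.
        return any(outcomes) if turn == 0 else all(outcomes)

    return play({}, 0)

even_parity = {p for p in itertools.product((0, 1), repeat=4) if sum(p) % 2 == 0}
print(first_player_wins(4, even_parity))   # False: II's last bit controls parity
first_coord_one = {p for p in itertools.product((0, 1), repeat=4) if p[0] == 1}
print(first_player_wins(4, first_coord_one))  # True: I just sets coordinate 0 to 1
```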

This can be thought of as a kind of Ramsey property, something that I mention only to explain what would otherwise be a rather strange piece of terminology. I say that a map φ: X → Y between complexity structures is *Ramsey* if for every I-winning subset A of Y, φ⁻¹(A) is an I-winning subset of X, and for every II-winning subset A of Y, φ⁻¹(A) is a II-winning subset of X. In other words, Ramsey maps preserve winning sets and the player that wins.

It is an easy exercise to show that φ is Ramsey if and only if for every subset A ⊆ Y, if φ⁻¹(A) is I-winning then A is I-winning, and if φ⁻¹(A) is II-winning then A is II-winning. (This isn’t quite a triviality, however: it uses finite determinacy.) This formulation is often more convenient.

I don’t want to be too precise about this, because part of what I hoped was that the correct statement would to some extent emerge from the proof. But roughly what I wanted was the following.

- If A ⊆ {0,1}^n is a set with low circuit complexity, then there is a complexity structure X that is not too large, and a Ramsey map φ: X → {0,1}^n, such that φ⁻¹(A) is simple.
- If A is a random set then no such pair (X, φ) exists.
- There is an NP set A for which no such pair exists.

Achieving 1 and 2 together would give a non-trivial example of a property that distinguishes between sets of low circuit complexity and random sets, which is a highly desirable thing to do, given the difficulties associated with the natural-proofs barrier, even if it doesn’t immediately solve the P versus NP problem. And achieving 1 and 3 together would show that P doesn’t equal NP.

However, it was far from clear whether these statements were true under any reasonable interpretation. Perhaps even sets of low circuit complexity require enormous structures X, or perhaps there is some simple way of lifting arbitrary sets with only a small X. Either of these possibilities would show that the existence of efficient Ramsey lifts does not distinguish between sets of low circuit complexity and arbitrary sets. What Pavel sent me yesterday was an observation that basically shows that the second difficulty occurs. That is, he showed that one can lift an arbitrary set quite simply.

Before I present his example, I’ll just briefly mention that I had a philosophical reason for thinking that such an example was unlikely to exist, which was that any truly simple example ought to have an infinite counterpart, but in the infinite case it is not true that arbitrary sets can be efficiently lifted. I’ll try to give some sort of indication later of why this argument does not apply to Pavel’s example.

I’ll begin by describing the example in an informal way and then I’ll make it more formal. (Pavel provided both descriptions in his message to me, so I’m not adding anything here.)

Let A ⊆ {0,1}^n be any set and define an auxiliary game played on {0,1}^n as follows. It’s just like the shrinking-neighbourhoods game, except that at some point each player must declare a bit, and the parity of the two bits they declare must be odd if the final sequence belongs to A and even if it doesn’t. (So they must both play consistently with this restriction.)

Suppose that Player I has a winning strategy for the original game for some set A. Then she can win the auxiliary game with payoff set A as follows. Let m = n/2 as usual. For her first m − 1 moves, she simply plays her winning strategy for the original game (ignoring the extra bit that Player II declares if he declares it). Then for her last move, she continues to play the winning strategy, but she also declares her extra bit. If Player II has declared his bit, then she looks at the two possible sequences that can result after Player II’s final move. If they are both in A or both in the complement of A, then she makes sure that the parity of the two bits is odd in the first case and even in the second. If one sequence is in A and the other in the complement then it does not matter what she chooses for her extra bit. If Player II has not declared his bit, then she can play her extra bit arbitrarily, which will oblige Player II to ensure that the parity of the two bits is equal to 1 if the final sequence is in A and 0 otherwise.

Now suppose that Player II has a winning strategy for A in the original game. In this case the proof is even simpler. He just plays this strategy, ignoring Player I’s extra bit when she plays it, and declares his extra bit right at the end, making sure that the final parity of the two bits correctly reflects whether the sequence is in A.

Finally, note that to tell whether the eventual sequence in the auxiliary game belongs to the lifted copy of A, it is only necessary to look at the two extra bits. So whether or not a point belongs to that set can be determined by just two coordinates of the point (though which coordinates they are can vary from point to point). That makes it a very simple set (I call it 2-open, since it is a union of “2-basic open” sets), even though the “board” on which the auxiliary game is played is not very large.

Now let me give a precise definition of the complexity structure X. Each coordinate of a point of X is a pair (ε_i, b_i), where ε_i ∈ {0,1} is an ordinary bit and b_i ∈ {∗, 0, 1} records a declared extra bit, with ∗ meaning “no declaration”. Then X consists of all sequences ((ε_1, b_1), …, (ε_n, b_n)) with the following properties.

- For exactly one i ≤ n/2, b_i ≠ ∗. In this case we will write b_I = b_i.
- For exactly one j with j > n/2, b_j ≠ ∗. In this case we will write b_II = b_j.
- For all other k we have b_k = ∗.
- b_I + b_II is congruent mod 2 to 1 if (ε_1, …, ε_n) ∈ A and to 0 otherwise.

So there are six possibilities for each coordinate (since it can be an arbitrary element of {0,1} × {∗, 0, 1}). Thus, we can regard X as a subset of Σ^n for an alphabet Σ of size 6, which is not that much bigger than {0,1}^n.

The map φ does the obvious thing and takes ((ε_1, b_1), …, (ε_n, b_n)) to (ε_1, …, ε_n). It is then easy to see that the shrinking-neighbourhoods game in X with payoff set φ⁻¹(A) is basically the same as the auxiliary game I described earlier.
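The structure can be checked mechanically for a tiny n. In this sketch of my own, a coordinate is a pair (bit, extra) with extra in {None, 0, 1} playing the role of {∗, 0, 1}; by construction, membership of a point of X in the lift of A is read off from the two extra bits alone, whatever A is:

```python
# Sketch (mine): Pavel's structure for small n, built by enumeration.
import itertools

def build_X(n, A):
    """All marked 0-1 sequences whose two extra bits have the parity A demands."""
    X = []
    for bits in itertools.product((0, 1), repeat=n):
        target = 1 if bits in A else 0
        for i in range(n // 2):                    # Player I's marked coordinate
            for j in range(n // 2, n):             # Player II's marked coordinate
                for bi, bj in itertools.product((0, 1), repeat=2):
                    if (bi + bj) % 2 == target:
                        point = [(b, None) for b in bits]
                        point[i] = (bits[i], bi)
                        point[j] = (bits[j], bj)
                        X.append(tuple(point))
    return X

A = {(0, 1, 1, 0), (1, 1, 1, 1), (0, 0, 0, 1)}     # an arbitrary payoff set
X = build_X(4, A)
for p in X:
    extras = [e for (_, e) in p if e is not None]
    underlying = tuple(b for (b, _) in p)          # the map phi forgets the marks
    assert (underlying in A) == (sum(extras) % 2 == 1)
print(len(X))  # 128: 8 marked variants of each of the 16 underlying sequences
```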

It may be, but I think it would be a mistake to abandon the project immediately without thinking fairly hard about what has gone wrong so far. Is it a sign that nothing even remotely like this idea could work, or is it a sign that the problems are more “local” and that certain definitions should be adjusted? In the latter case, what might a new set of definitions look like?

I’ll try to explain in a future post why I think that it is worth exploring the general strategy of attempting to show that sets of low circuit complexity can be lifted (in some sense yet to be determined) to simple sets (also in some sense yet to be determined). For now, I’d just like to make the general point that there are many aspects of the definitions above that could be changed. For the moment, I still like the definition of a complexity structure, because when I came up with it I felt myself “forced” to it. (It would take a bit of time to remember why this was, however.) I also quite like the idea that the maps we want to consider are ones that preserve some class of sets, since that gives quite a bit of flexibility. We need the class of sets to be fairly complicated, since otherwise there is a danger that verifying that the sets are preserved becomes too easy, which could then mean that the property “can be efficiently lifted” becomes too simple and is ruled out by known complexity barriers. (I’m thinking here not just of the natural-proofs barrier but also of an interesting extension of it due to Rudich.)

Looking for a class of sets might seem a hopelessly complicated task, but there are several constraints on what the class of sets can be like for the proof to work. One important one is that it should be definable in any complexity structure. So it needs to be defined in a way that isn’t too specific to {0,1}^n. It might be worth making precise what this restriction actually means.

The rough idea here is that in Pavel’s example it is possible to provide the extra information (that is, the extra bits in the auxiliary game) right at the end of the game. In an infinite game there is no such thing as “right at the end of the game”: whenever you play, you’re still very near the beginning. This difference has caused me difficulties in the past, and I think it is worth focusing on again. Is there some natural way of ruling out this postponing of the extra information?

One crude idea is to rule it out by … ruling it out. For example, we could define a set to be $k$-*winning* for Player I/II if there is a winning strategy for Player I/II such that after her/his first $k$ moves the outcome of the game is already decided. There is probably some serious drawback with such a simple-minded approach, but it is worth finding that drawback. I have given very little thought to it, so there may be something very obviously bad about it. One small point is that if, as I think is likely to be necessary, a proof that low-complexity sets can be lifted is inductive in nature, then we will want a composition of simplifying lifts to be a simplifying lift. So we would want our lifts to be such that $k$-winning sets lift to $k$-winning sets for the same player (and not just that winning sets lift to winning sets). So we would preserve the $k$-winning sets we’ve already created, and attempt to create some new ones.

I think that one of the reasons Polymath9 hasn’t taken off is that I presented too much material all at once. (I did try to make it clear that it wasn’t necessary to wade through it all, but even so I can see that it might have been off-putting.) In an effort to avoid that mistake this time, I’m going to resist the temptation to think further about how to respond to Pavel’s lift and go ahead and put up this post. If I do have further ideas, I’ll post them as comments.


If you are reasonably comfortable with the kind of basic logic needed in an undergraduate course, then you may enjoy trying to find the flaw in the following argument, which must have a flaw, since I’m going to prove a general statement and then give a counterexample to it. If you find the exercise extremely easy, then you may prefer to hold back so that others who find it harder will have a chance to think about it. Or perhaps I should just say that if you don’t find it easy, then I think it would be a good exercise to think about it for a while before looking at other people’s suggested solutions.

First up is the general statement. In fact, it’s a very general statement. Suppose you are trying to prove a statement $Q$ and you have a hypothesis of the form $\forall x\, P(x)$ to work with. In other words, you are trying to prove the statement

$$(\forall x\, P(x)) \Rightarrow Q.$$

Now if $R$ and $S$ are two statements, then $R \Rightarrow S$ is true if and only if either $R$ is false or $S$ is true. Hence what we are trying to prove can be rewritten as follows.

$$\neg(\forall x\, P(x)) \vee Q.$$

Now we can bring the $\neg$ inside the $\forall$ as long as we convert the $\forall$ into an $\exists$, so let’s do that. What we want to prove becomes this.

$$(\exists x\, \neg P(x)) \vee Q.$$

I’ll assume here that we haven’t done something foolish and given the name $x$ to one of the variables involved in the statement $Q$. So now I’m going to use the general rule that $(\exists x\, R(x)) \vee Q$ is equivalent to $\exists x\, (R(x) \vee Q)$ to rewrite what we want to prove as the following.

$$\exists x\, (\neg P(x) \vee Q).$$

Finally, let’s rewrite what’s inside the brackets using the $\Rightarrow$ sign.

$$\exists x\, (P(x) \Rightarrow Q).$$

Every single step I took there was a logical equivalence, so the conclusion is that if you want to show that $\forall x\, P(x)$ implies $Q$, your task is the same as that of finding a single $x$ such that $P(x) \Rightarrow Q$.
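Since every step is a classical equivalence, it can be checked by brute force over finite domains. A quick Python sketch (the function names are mine, not standard; here `P` is the tuple of truth values of $P(x)$ over the domain and `Q` a truth value):

```python
from itertools import product

def forall_implies(P, Q):
    """(for all x, P(x)) => Q"""
    return (not all(P)) or Q

def exists_pointwise(P, Q):
    """there exists x such that (P(x) => Q)"""
    return any((not p) or Q for p in P)

# The two statements agree on every nonempty domain of size up to 3.
for size in (1, 2, 3):
    for P in product((False, True), repeat=size):
        for Q in (False, True):
            assert forall_implies(P, Q) == exists_pointwise(P, Q)

# Over the EMPTY domain they come apart: the left-hand side reduces to Q,
# while "there exists x ..." is simply false.
assert forall_implies((), True) != exists_pointwise((), True)
print("equivalence verified on all nonempty domains of size <= 3")
```

The empty-domain discrepancy is not the flaw in the puzzle below (there the set is nonempty), but it is a useful reminder of how sensitive such equivalences are to quantifier bookkeeping.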

Now let me give a counterexample to that useful logical principle. Let $A$ be a set of real numbers. Define the *diameter* of $A$ to be $\sup\{|x-y| : x, y \in A\}$. I’ll write it $\operatorname{diam}(A)$.

Consider the following implication.

$$(\forall x \in A\ \ |x| \le 1) \Rightarrow \operatorname{diam}(A) \le 2.$$

That is clearly correct: if every element of $A$ has modulus at most 1, then $A$ is contained in the interval $[-1,1]$, so clearly $A$ can’t have diameter greater than 2.

But then, by the logical principle just derived, there must be a single element $x$ of $A$ such that if *that* element has modulus at most 1, then the diameter of $A$ is at most 2. In other words,

$$\exists x \in A\ \ (|x| \le 1 \Rightarrow \operatorname{diam}(A) \le 2).$$

But that is clearly nonsense. If all we know is that one particular element of $A$ has modulus at most 1, it can’t possibly imply that $A$ has diameter at most 2.

What has gone wrong here? If you can give a satisfactory answer, then you will have a good grasp of what mathematicians mean by “implies”.


I’ve thought a little about what phrase to attach to the project (the equivalent of “density Hales-Jewett” or “Erdős discrepancy problem”). I don’t want to call it “P versus NP” because that is misleading: the project I have in mind is much more specific than that. It is to assess whether there is any possibility of proving complexity lower bounds by drawing inspiration from Martin’s proof of Borel determinacy. Only if the answer turned out to be yes, which for various reasons seems unlikely at the moment, would it be reasonable to think of this as a genuine attack on the P versus NP problem. So the phrase I’ve gone for is “discretized Borel determinacy”. That’s what DBD stands for above. It’s not a perfect description, but it will do.

For the rest of this post, I want to set out once again what the approach is, and then I want to explain where I am running into difficulties. I’m doing that to try to expose the soft underbelly of my proof attempt, in order to make it as easy as possible for somebody else to stick the knife in. (One could think of this as a kind of Popperian method of assessing the plausibility of the approach.) Another thing I’ll try to do is ask a number of precise questions that ought not to be impossible to solve and that can be thought about in isolation. Answers to any of these questions would, I think, be very helpful, either in demolishing the approach or in advancing it.

This section is copied from my previous post.

I define a *complexity structure* to be a subset $\Sigma$ of a set $A_1 \times \dots \times A_n$. I call the union of the $A_i$ the *alphabet* associated with the structure. Often I consider the case where $\Sigma = \{0,1\}^n$. The maps between complexity structures that I consider (if you like, you can call them the morphisms in my category) are maps $\phi$ such that for each $i$, the $i$th coordinate of $\phi(x)$ depends only on $x_i$. To put that another way, if $\Sigma'$ is another complexity structure, the maps I consider are ones of the form $(x_1,\dots,x_n) \mapsto (\phi_1(x_1),\dots,\phi_n(x_n))$. I have found it inconvenient not having a name for these, but I can’t think of a good one. So I hereby declare that when I use the word “map” to talk about a function between complexity structures, I shall *always* mean a map with this property.

I call a subset of a complexity structure $\Sigma$ *basic* if it is of the form $\{x \in \Sigma : x_i \in B\}$ for some $i$ and some $B \subseteq A_i$. The motivation for the restriction on the maps is that I want the inverse image of a basic set to be basic.

The non-trivial basic sets in the complexity structure $\{0,1\}^n$ are the coordinate hyperplanes $\{x : x_i = 0\}$ and $\{x : x_i = 1\}$. The circuit complexity of a subset of $\{0,1\}^n$ measures how easily it can be built up from basic sets using intersections and unions. The definition carries over almost unchanged to an arbitrary complexity structure, and the property of maps ensures that the inverse image of a set of circuit complexity $m$ has circuit complexity at most $m$.
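This notion of complexity can be made concrete in a tiny case. Here is a Python sketch of my own (the cost measure counts each basic set and each union/intersection as one step, which is a formula-style upper bound for the straight-line complexity defined later, since it ignores gate reuse):

```python
from itertools import product

n = 2
points = list(product((0, 1), repeat=n))

# Basic sets of {0,1}^2: the coordinate hyperplanes {x : x_i = a}.
basic = [frozenset(p for p in points if p[i] == a)
         for i in range(n) for a in (0, 1)]

# cost[S] = size of the smallest formula-style derivation of S,
# computed by repeated relaxation until no cost improves.
cost = {b: 1 for b in basic}
changed = True
while changed:
    changed = False
    for u, v in product(list(cost), repeat=2):
        for w in (u | v, u & v):
            c = cost[u] + cost[v] + 1
            if c < cost.get(w, 10**9):
                cost[w] = c
                changed = True

parity = frozenset(p for p in points if sum(p) % 2 == 1)
print(len(cost), cost[parity])
```

For $n = 2$ all 16 subsets are derivable (the hyperplanes come in complementary pairs, so intersections give singletons and unions give everything else), and the odd-parity set $\{(0,1),(1,0)\}$ needs 7 steps: two singleton intersections and a final union.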

Given a complexity structure $\Sigma$, we can define a game that I call the *shrinking-neighbourhoods game*. For convenience let us take $n$ to be $2m$ for some positive integer $m$. Then the players take turns specifying coordinates: that is, they make declarations of the form $x_i = a$. The only rules governing these specifications are the following.

- Player I must specify coordinates from $1$ to $m$.
- Player II must specify coordinates from $m+1$ to $2m$.
- At every stage of the game, there must be at least one $x \in \Sigma$ that satisfies all the specifications so far (so that the game can continue until all coordinates are specified).

Note that I do not insist that the coordinates are specified in any particular order: just that Player I’s specifications concern the first half and Player II’s the second.

To determine who wins the game, we need a *payoff set*, which is simply a subset $A \subseteq \Sigma$. Player I wins if the sequence $x$ that the two players have specified belongs to $A$, and otherwise Player II wins. I call a set $A$ *I-winning* if Player I has a winning strategy for getting into $A$ and *II-winning* if Player II has a winning strategy for getting into $A$. (Just in case there is any confusion here, I really do mean that $A$ is II-winning if Player II has a winning strategy for getting into $A$. I didn’t mean to write $A^c$.)

Because the game is finite, it is determined. Therefore, we have the following Ramseyish statement: given any 2-colouring of a complexity structure $\Sigma$, either the red set is I-winning or the blue set is II-winning. (Normally with a Ramsey statement one talks about *containing* a structure of a certain kind. If we wanted to, we could do that here by looking at minimal I-winning and minimal II-winning sets.)
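Determinacy of the finite game can be checked directly by backward induction. A small Python sketch of my own, with one simplifying assumption not made in the post: coordinates are specified in increasing order, so Player I fixes coordinates 0 and 1 and then Player II fixes 2 and 3.

```python
from itertools import product

def winner(S, payoff, prefix=()):
    """Backward induction: which player wins from this position?"""
    i = len(prefix)
    if i == 4:
        return "I" if prefix in payoff else "II"
    player = "I" if i < 2 else "II"
    # A move is legal if some point of S is still consistent with it.
    moves = [a for a in (0, 1) if any(x[:i + 1] == prefix + (a,) for x in S)]
    outcomes = {winner(S, payoff, prefix + (a,)) for a in moves}
    return player if player in outcomes else ("II" if player == "I" else "I")

S = set(product((0, 1), repeat=4))          # the structure {0,1}^4
odd = {x for x in S if sum(x) % 2 == 1}     # odd-parity payoff set
print(winner(S, odd), winner(S, S))
```

Here Player II moves last and can always fix the parity, so the odd-parity set is II-winning, while the whole of $S$ is trivially I-winning: the output is `II I`.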

Given a complexity structure $\Sigma$, I define a *lift* of $\Sigma$ to be a complexity structure $\Sigma'$ together with a map $\pi : \Sigma' \to \Sigma$ that satisfies the condition set out earlier. I define a lift to be *Ramsey* if $\pi(E)$ is a winning subset of $\Sigma$ whenever $E$ is a winning subset of $\Sigma'$, and moreover it is winning for the same player. A more accurate name would be “winning-set preserving”, but I think of “Ramsey” as an abbreviation for that.

This gives us a potential method for showing that a subset $A$ of a complexity structure $\Sigma$ is I-winning: we can find a Ramsey lift $\pi : \Sigma' \to \Sigma$ such that $\pi^{-1}(A)$ is simple enough for it to be easy to show that it is a I-winning subset of $\Sigma'$. Then the Ramsey property guarantees that $\pi(\pi^{-1}(A))$, and hence $A$, is I-winning in $\Sigma$.

The definition of a Ramsey lift is closely modelled on Martin’s definition of a lift from one game to another.

Suppose that we have a suitable definition of “simple”. Then I would like to prove the following.

- If a set $A \subseteq \{0,1\}^n$ has polynomial circuit complexity, then there exists a Ramsey lift $\pi : \Sigma' \to \{0,1\}^n$ such that $\pi^{-1}(A)$ is simple and the cardinality of the alphabet of $\Sigma'$ is much less than doubly exponential.
- If $A$ is a random subset of $\{0,1\}^n$, then with high probability the smallest Ramsey lift that makes $A$ simple has an alphabet of doubly exponential size.
- There exists an NP set $A$ such that the smallest Ramsey lift that makes $A$ simple has an alphabet of doubly exponential size.

Obviously, the first and third statements combined would show that P $\neq$ NP. For the time being, I would be delighted even with just the first of these three statements, since that would give an example of a property of functions that follows non-trivially from low circuit complexity. (That’s not guaranteed, since there might conceivably be a very simple way of constructing lifts from circuits. However, I think that is unlikely.)

Having the first and second statements would be a whole lot better than just having the first, since then we would have not just a property that follows non-trivially from low circuit complexity, but a property that distinguishes between functions of low circuit complexity and random functions. Even if we could not then go on to show that it distinguished between functions of low circuit complexity and some function in NP, we would at least have got round the natural-proofs barrier, which, given how hard that seems to be to do, would be worth doing for its own sake. (Again this is not quite guaranteed, since again one needs to be confident that the distinguishing property is interestingly different from the property of having low circuit complexity.)

As I said in my previous post, I think there are three reasons that, when combined, justify thinking about this potential distinguishing property, despite the small probability that it will work. The first is of course that the P versus NP problem is important and difficult enough that it is worth pursuing any approach that you don’t yet know to be hopeless. The second is that the property didn’t just come out of nowhere: it came from thinking about a possible analogy with an infinitary result (that in some rather strange sense it is harder to prove determinacy of analytic sets than it is to prove determinacy of Borel sets). And finally, the property appears not to be even close to a natural property in the Razborov-Rudich sense: for one thing it quantifies over all possible complexity structures that are not too much bigger than $\{0,1\}^n$, and then it demands that the maps should preserve the I-winning and II-winning properties.

It is conceivable that the property might turn out to be natural after all. For instance, maybe the property of preserving I-winning and II-winning sets is so hard to achieve (I have certainly found it hard to come up with examples) that all possible Ramsey lifts are of some very special type, and perhaps that makes checking whether there is a Ramsey lift that simplifies a given set possible with a polynomial-time algorithm (as always, polynomial in $2^n$). But I think I can at least say that if the above property is natural, then that is an interesting and surprising theorem rather than just a simple observation.

Let $L_1, \dots, L_m$ be a straight-line computation of a set $A \subseteq \{0,1\}^n$. That is, each $L_j$ is either a *coordinate hyperplane* (a set of the form $\{x : x_i = \epsilon\}$ for some $i$ and some $\epsilon \in \{0,1\}$), or the intersection or union of two earlier sets in the sequence, and $L_m = A$. We would like to find a complexity structure $\Sigma$ with an alphabet that is not too large, together with a map $\pi : \Sigma \to \{0,1\}^n$ that has the properties required of a Ramsey lift, such that $\pi^{-1}(A)$ is simple. Since a composition of Ramsey lifts is a Ramsey lift, and since taking inverse images (under the kinds of maps we are talking about) preserves simple sets, whatever definition of “simple” we are likely to take, as well as preserving all Boolean operations, a natural approach is an inductive one. The inductive hypothesis is that we have found a Ramsey lift $\pi_j : \Sigma_j \to \{0,1\}^n$ such that the sets $\pi_j^{-1}(L_h)$ are simple for every $h \le j$. We now look at $\pi_j^{-1}(L_{j+1})$. By the inductive hypothesis, this is a union or intersection of two simple sets, so we now look for a Ramsey lift $\rho : \Sigma_{j+1} \to \Sigma_j$ such that $\rho^{-1}(\pi_j^{-1}(L_{j+1}))$ is simple. Setting $\pi_{j+1} = \pi_j \circ \rho$, we then have a Ramsey lift such that $\pi_{j+1}^{-1}(L_h)$ is simple for every $h \le j+1$.

Thus, if we can find a very efficient Ramsey lift that turns a given intersection or union of two simple sets into a simple set, then we will be done. “Very efficient” means efficient enough that repeating the process $m$ times (where $m$ is polynomial in $n$ — though even superlinear in $n$ would be interesting) does not result in an alphabet of doubly exponential size. Note that if our definition of “simple” is such that the complement of a simple set is simple, then it is enough to prove this just for intersections or just for unions.

What might we take as our definition of “simple”? The idea I had that ran into trouble was the following. I defined “simple” to be “basic”. I then tried to find a very efficient lift — I was hoping to multiply the size of the alphabet by a constant — that would take the intersection of two basic sets to a basic set.

Let us very temporarily define a basic set to be $i$-*basic* if it is defined by means of a restriction of the $i$th coordinate. That is, it is of the form $\{x : x_i \in B\}$. (I want this definition to be temporary because most of the time I prefer to use “$k$-basic” to refer to an intersection of at most $k$ basic sets.) If $E$ is $i$-basic and $F$ is $j$-basic, then it is natural to expect that if we can lift $E \cap F$ to a basic set, that basic set should be either $i$-basic or $j$-basic. Furthermore, by symmetry we ought to be able to choose whether we want it to be $i$-basic or $j$-basic. But then if we let $E$ be the trivial 1-basic set (the whole structure, regarded as a restriction of the first coordinate) and let $F$ be any other basic set, that tells us that we can lift $F = E \cap F$ so that it becomes a 1-basic set.

Now let us apply that to the coordinate hyperplanes in $\{0,1\}^n$. If we can lift these very efficiently one by one until they all become 1-basic sets, then we have a complexity structure $\Sigma'$ with a small alphabet and a map $\pi : \Sigma' \to \{0,1\}^n$ such that $\pi^{-1}(H)$ is 1-basic for every coordinate hyperplane $H$. But applying Boolean operations to 1-basic sets yields 1-basic sets, and every subset of $\{0,1\}^n$ is a Boolean combination of coordinate hyperplanes. Therefore, *every* subset of $\{0,1\}^n$ has become a 1-basic set!

This is highly undesirable, because it means that we have shown that the property “Can be made simple by means of an efficient Ramsey lift” does not distinguish functions of low circuit complexity from arbitrary functions.

Because of that undesirability, I have not tried as hard as I might have to find such a lift. An initial attempt can be found in this tiddler. Note that the argument I have just given does not show that there cannot be a Ramsey lift that turns an $i$-basic set into a 1-basic set at the cost of multiplying the size of the alphabet by a constant. What I have shown is that *if* this could be done, then there would be a Ramsey lift that converted all sets simultaneously into 1-basic sets, with an alphabet of size at most singly exponential in $n$. If that were the case, then I think the approach would be completely dead. (Correction: the approach if the sets to be preserved are I-winning and II-winning sets would almost certainly be dead, and I don’t have any reason to think that if one tried to preserve other classes of sets, then the situation would be any different.) So that is one possible way to kill it off.

**Problem 1.** Let $\Sigma$ be a complexity structure and let $E$ be a basic subset of $\Sigma$. Must there exist a complexity structure $\Sigma'$ and a Ramsey lift $\pi : \Sigma' \to \Sigma$ such that $\pi^{-1}(E)$ is 1-basic and the alphabet of $\Sigma'$ is at most a constant factor larger than that of $\Sigma$?

In fact, if all one wants to do is disprove the statement that for a random set there is a doubly exponential lower bound, it is enough to obtain a bound here that is weak enough that iterating it polynomially many times stays below doubly exponential.

The above observation tells us that we are in trouble if we have a definition of “simple” such that simple sets are closed under unions and intersections. More generally, we have a problem if we can modify our existing definition so that it becomes closed under unions and intersections. (What I have in mind when I write this is the example of basic sets. Those are not closed under intersections and unions, but if one could prove that every intersection of two basic sets can be lifted to a basic set, then, as I argued above, one could probably strengthen that result and show that every intersection of two basic sets can be lifted to a 1-basic set. And the 1-basic sets *are* closed under intersections and unions.)

Before I go on to discuss what other definitions of “simple” one might try, I want to discuss a second difficulty, because it gives rise to another statement that, if true, would deal a serious blow to this approach.

In the previous post, I gave an example of a lift that provides us with what I think of as the “trivial upper bound”: a Ramsey lift that turns every single subset of $\{0,1\}^n$ into a basic set, with an alphabet of doubly exponential size. So if we want an inductive argument of the kind I have discussed above, we will need to show that an intersection or union of two simple sets can be lifted to a simple set with the size of the alphabet increasing in such a way that if one iterates that increase polynomially many times, the resulting size will be less than doubly exponential. (Actually, that isn’t quite necessary: maybe we could establish a lower bound for a function in NP that exceeds the upper bound we can establish for functions of polynomial circuit complexity.) This makes it highly problematic if we want to do anything that *squares* the size of the alphabet after only polynomially many steps. If we do that, then the size of the alphabet after $n$ times that polynomial number of steps, which is of course still a polynomial number of steps, will be at least $2^{2^n}$ and we will have proved nothing.
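The arithmetic behind this can be seen in a couple of lines. A back-of-envelope Python sketch of my own, tracking $\log_2$ of the alphabet size under the two growth rules:

```python
def grow(log_size, steps, rule):
    """Apply a growth rule to log2(alphabet size) repeatedly."""
    for _ in range(steps):
        log_size = rule(log_size)
    return log_size

n = 64

# Multiplying the size by 2 at each of n^2 (polynomially many) steps
# keeps log2(size) polynomial in n: the size stays singly exponential.
mult = grow(1, n ** 2, lambda L: L + 1)

# Squaring the size doubles log2(size), so a mere n squarings already
# give log2(size) = 2^n: the trivial doubly exponential bound.
sq = grow(1, n, lambda L: 2 * L)

print(mult, sq)   # 4097 and 2**64
```

So constant-factor growth survives polynomially many iterations, while squaring is already fatal after linearly many.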

The reason this is troubling is that even if I forget all about simplifying any set, I find it very hard to come up with examples of Ramsey lifts. (All I mean by a Ramsey lift of $\Sigma$ here is a complexity structure $\Sigma'$ and a map $\pi : \Sigma' \to \Sigma$ that takes I-winning sets to I-winning sets and II-winning sets to II-winning sets.) The only ones I know about can be found in this tiddler. And they all have the property that the players have to provide “extra information” of a kind that at the very least squares the size of the alphabet. In fact, it is usually quite a lot worse than that.

Maybe I can try to be slightly more precise about what I mean there. All the lifts I have considered (and I don’t think this is much of a restriction) take the form of sets $\Sigma'$ where a typical sequence in $\Sigma'$ is of the form $((x_1,u_1),\dots,(x_n,u_n))$ and the map $\pi$ takes that sequence to $(x_1,\dots,x_n)$. What makes it interesting is that we do not take *all* sequences of the above form (that is, for arbitrary $x \in \Sigma$ and arbitrary $u_1,\dots,u_n$). Rather, we take only *some* of those sequences. (It is this restriction that makes it possible to simplify sets. Otherwise, there would be nothing interesting about lifts.) So if Player I makes an opening move $(x_1,u_1)$, we can think of this as a move $x_1$ in the original game together with a binding obligation on the two players that the eventual sequence will have at least one preimage whose first coordinate is $(x_1,u_1)$. The set of all sequences with such a preimage may well be a proper subset of $\Sigma$.

Suppose now that this extra information is enough to determine some other coordinate $u_j$. Then unless there are already very few options for how to choose $u_j$, the number of possibilities for the pair $(x_j,u_j)$ will be comparable in size to the size of the alphabet, and therefore the size of the alphabet is in serious danger of squaring, and certainly of raising itself to the power $3/2$, say. And that is, as I have just pointed out, much too big an increase to iterate superlinearly many times.

So it looks as though any “extra information” we declare has to be rather crude, in the sense that it does not cut down too drastically the set in which the game is played. But I have no example of a Ramsey lift with this property. What’s more, the kind of difficulty I run into makes me worry that such a lift may not exist. If it doesn’t, then that too will be a serious blow to the approach.

Let me pose a concrete problem, the answer to which would, I think, be very useful. It is a considerable weakening of Problem 1.

**Problem 2.** Let $\Sigma$ be a complexity structure with alphabet $A$. Does there necessarily exist a non-trivial Ramsey lift $\pi : \Sigma' \to \Sigma$ with alphabet $B$ such that $|B|$ is bounded above by a function of $|A|$?

The main concern is that $|B|$ should *not* depend on $n$.

I have not sorted out completely what “non-trivial” means here, but let me give a class of examples that I consider trivial. Let $B$ be a large enough set and let $\sigma : B \to A$ be a surjection onto the alphabet $A$ of $\Sigma$. Define a map $\pi : B^n \to A^n$ by $\pi(y_1,\dots,y_n) = (\sigma(y_1),\dots,\sigma(y_n))$. Finally, let $\Sigma' = \pi^{-1}(\Sigma)$. Then we can think of $\pi$ as a map from $\Sigma'$ to $\Sigma$. Note that $\Sigma'$ is in some sense just like $\Sigma$: it’s just that the letters of the alphabet may have been repeated.

I claim that this is a Ramsey lift. Indeed, suppose that $E$ is a I-winning subset of $\Sigma$. Then a winning strategy for Player I for $\pi^{-1}(E)$ is simply to project the game so far to $\Sigma$, play a winning strategy in $\Sigma$, and choose arbitrarily how to lift each specification of a coordinate in $\Sigma$ to a specification of the corresponding coordinate in $\Sigma'$.

To put that more formally, if the specifications so far are $y_{i_j} = b_j$ for $j \le k$ and it is Player I’s turn, then she works out the specification she would make in $\Sigma$ in response to the specifications $x_{i_j} = \sigma(b_j)$ for $j \le k$. If this specification is $x_i = a$, then she picks an arbitrary preimage $b$ of $a$ and makes the specification $y_i = b$.

A similar argument works for winning sets for Player II.

It is the fact that this can always be done that makes the lift in some sense “trivial”. Another way of thinking about it is that there is an equivalence relation on $\Sigma'$ such that replacing a point by an equivalent point makes no difference.
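Here is a miniature version of this trivial lift in Python, checking that the coordinatewise surjection behaves as claimed on basic sets (the alphabets and the surjection are invented for illustration, not taken from the post):

```python
from itertools import product

# A surjection sigma from a bigger alphabet B onto A = {0, 1},
# applied coordinatewise, as in the "trivial" lift.
A = (0, 1)
B = ("a0", "b0", "a1")                       # 0 gets two copies, 1 gets one
sigma = {"a0": 0, "b0": 0, "a1": 1}

n = 3
Sigma = set(product(A, repeat=n))            # base structure: {0,1}^3
SigmaP = set(product(B, repeat=n))           # lifted structure: B^3

def pi(y):
    """The coordinatewise map induced by sigma."""
    return tuple(sigma[c] for c in y)

# The inverse image of a basic set {x : x_i = a} is again basic:
# it is {y : y_i in sigma^{-1}(a)}, a restriction of the same coordinate.
basic = {x for x in Sigma if x[1] == 0}
pulled = {y for y in SigmaP if pi(y) in basic}
assert pulled == {y for y in SigmaP if y[1] in ("a0", "b0")}
print("inverse image of a basic set is basic")
```

This is exactly the sense in which the lifted structure is “just like” the original: every point of $\Sigma'$ is a point of $\Sigma$ with its letters renamed, so nothing about the game changes.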

As far as I can tell at this stage, the problem is interesting if one takes “non-trivial” to mean not of the form I have just described. However, I reserve the right to respond to other examples by enlarging this definition of triviality. The real test of non-triviality is that an interesting Ramsey lift is one that has the potential to simplify sets.

A positive answer to the problem above will not help us if $|B|$ is an enormously large function of $|A|$. However, for now my main concern is to decide whether it is possible to obtain a bound independent of $n$. If it is, then a major source of worry is removed. If it is not, then the approach will be in serious trouble.

I stopped writing for a few hours after that last paragraph, and during those few hours I realized that my definition of non-triviality was not wide enough. Before I explain why not, I want to discuss a worry I have had for a while, and a very simple observation that explains why I don’t have it any more.

Because the worry was unfounded, it is rather hard to explain it, but let me try. Let’s suppose that we are trying to find an interesting Ramsey lift $\pi : \Sigma' \to \Sigma$. Suppose also that we choose a random subset $A$ of $\Sigma'$ with the critical probability. That is, we choose elements independently with the probability $p$ that makes the probability that $A$ is a I-winning set equal to $1/2$. Then it seems highly likely that $A$ will be “only just” a I-winning set if it is one. And we’ll need to make sure that every time $A$ just happens to be I-winning, then $\pi(A)$ is I-winning, and every time it just fails to be I-winning, $\pi(A^c)$ is II-winning. This seems extraordinarily delicate, unless somehow the winning strategies in $\Sigma$ are derived rather directly from the winning strategies in $\Sigma'$ (as seems to be the case for the examples we have so far).

The observation I have now made is almost embarrassingly simple: if $A$ is only just a I-winning set, we do not mind if $\pi(A^c)$ is a II-winning set. That is because $\pi(A^c)$ is not usually the complement of $\pi(A)$. In fact, if $A$ is a random set and every element of $\Sigma$ has many preimages in $\Sigma'$, then both $\pi(A)$ and $\pi(A^c)$ will be pretty well all of $\Sigma$.

It is worth drawing attention to the way that it seems to be most convenient to prove that a lift is Ramsey. Instead of taking a winning subset of $\Sigma'$ and trying to prove that its image is winning (for the same player) in $\Sigma$, I have been taking a winning subset of $\Sigma$ and trying to prove that its inverse image is winning (for the same player) in $\Sigma'$. Let me prove a very easy lemma that shows that this is OK.

**Lemma.** Suppose that $\pi : \Sigma' \to \Sigma$ is a lift. Then the following two statements are equivalent.

(i) The image of every winning subset of $\Sigma'$ is winning in $\Sigma$ for the same player.

(ii) The inverse image of every winning subset of $\Sigma$ is winning in $\Sigma'$ for the same player.

**Proof.** Suppose that the second condition holds and let $E$ be a winning subset of $\Sigma'$. If $\pi(E)$ is not a winning subset of $\Sigma$ for the same player, then by determinacy $\pi(E)^c$ is a winning subset of $\Sigma$ for the other player, which implies that $\pi^{-1}(\pi(E)^c)$ is a winning subset of $\Sigma'$ for the other player. But $\pi^{-1}(\pi(E)^c) \subseteq E^c$, so this contradicts $E$ being a winning set for the original player.

Conversely, suppose that the first condition holds and let $F$ be a winning subset of $\Sigma$. Then if $\pi^{-1}(F)$ is not a winning subset of $\Sigma'$ for the same player, then $\pi^{-1}(F)^c = \pi^{-1}(F^c)$ is a winning subset of $\Sigma'$ for the other player, which implies that $\pi(\pi^{-1}(F^c))$ is a winning subset of $\Sigma$ for the other player. But $\pi(\pi^{-1}(F^c)) \subseteq F^c$, so this contradicts $F$ being a winning set for the original player. QED

Another way of saying all this is that if we want to prove that a map $\pi$ is a Ramsey lift, then the only winning subsets $E$ of $\Sigma'$ for which we need to prove that $\pi(E)$ is also a winning set are inverse images of subsets of $\Sigma$. And the reason for that is that one can replace $E$ by the superset $\pi^{-1}(\pi(E))$ without affecting the image.

The quick description of these is as follows: take a trivial Ramsey lift of the kind I described earlier (one that duplicates each coordinate several times) and pass to a random subset of it.

Let me sketch an argument for why that, or something similar to it, works. The reason is basically the same as the reason that the trivial lift works. For the sake of clarity let me introduce a little notation. I’ll start with a complexity structure $\Sigma$. I’ll then take $\Sigma'$ to be a random subset of the set of all sequences $((x_1,u_1),\dots,(x_n,u_n))$ with $x \in \Sigma$ and each $u_i \in U$, where $U$ is some set. The map $\pi$ takes this sequence to $(x_1,\dots,x_n)$. I’m thinking of $U$ as a fairly large set, and the elements of $\Sigma'$ are chosen independently with some suitable probability $q$.

Now let $E$ be a winning subset of $\Sigma$. I want to show that $\pi^{-1}(E)$ is a winning subset of $\Sigma'$ for the same player. So let $\tau$ be a winning strategy for $E$ for Player I (the case of Player II is very similar, so I won’t discuss it). Then in $\Sigma'$ she can play as follows. If it is her turn and the specifications so far are of pairs $(x_i,u_i)$ for various coordinates $i$, then she looks at what the strategy $\tau$ dictates in $\Sigma$ in response to the specifications of the $x_i$, ignoring the $u_i$. This will involve specifying some $x_j$. Now she must find some $u_j$ such that there exists a sequence in $\Sigma'$ that satisfies the specifications so far as well as the specification of the pair $(x_j,u_j)$.

Typically, the proportion of elements of $U$ that will serve as a suitable $u_j$ is approximately $q$, so what we need, roughly speaking, is that $q|U|$ should be reasonably large. It’s not quite as simple as that, since if the alphabet is very very large, then there may be occasional pieces of extraordinary bad luck. However, I’m pretty sure it will be possible to modify the above idea to make it watertight.
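In symbols (with notation that is mine, not from the post): if each candidate value $u \in U$ works independently with probability roughly $q$, then the probability that *no* suitable $u_j$ exists at a given position is about

```latex
(1 - q)^{|U|} \le e^{-q|U|},
```

so a union bound over all positions that can arise during the game suggests that $q|U|$ needs to exceed a fixed multiple of the logarithm of the number of positions, which is the sense in which $q|U|$ should be "reasonably large". The "extraordinary bad luck" is exactly the failure event controlled by this bound.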

Let $\Sigma$ and $\Sigma'$ be complexity structures and $\pi : \Sigma' \to \Sigma$ a Ramsey lift. Let us say that $\pi$ is trivial if for any set of specifications $y_{i_j} = b_j$ ($j \le k$) that can arise during the game in $\Sigma'$, for any set of specifications $x_{i_j} = a_j$ ($j \le k$) with $\pi(b_j) = a_j$ (this is a slight abuse of notation) and for any further specification $x_i = a$, there exists a further specification $y_i = b$, consistent with all the previous ones, such that $\pi(b) = a$.

This is an attempt to describe the property that makes it very easy to lift strategies in $\Sigma$ to strategies in $\Sigma'$: you just see what you would do at each stage in $\Sigma$ and lift that to $\Sigma'$ — a policy that does not work in general but works in some simple cases.

One thing that is probably true but that it would be good to confirm is that a Ramsey lift of this simple kind cannot be used to simplify sets. I’ll state this as a problem, but I’m expecting it to be an easy exercise.

**Problem 3.** Let $\pi : \Sigma' \to \Sigma$ be a lift that is trivial in the above sense. Is it the case that for every $E \subseteq \Sigma$ the straight-line complexity of $\pi^{-1}(E)$ is equal to the straight-line complexity of $E$?

(A quick reminder: in a general complexity structure, I define the straight-line complexity of a set $E$ to be the length of the smallest sequence of sets that ends with $E$, where each set in the sequence is either a basic set or a union or intersection of two earlier sets.)

Assuming that the answer to Problem 3 is yes, then the next obvious question is this. It’s the same as Problem 2 except that now we have a candidate definition of “non-trivial”.

**Problem 4.** Let $\Sigma$ be a complexity structure. Does there necessarily exist a non-trivial Ramsey lift $\pi : \Sigma' \to \Sigma$ where the size of the alphabet goes up by at most a factor that depends only on the size of the alphabet of $\Sigma$ (and not on $n$)?

I very much hope that the answer is yes. I was beginning to worry that it was no, but after the simple observation above, my perception of how difficult it is to create Ramsey lifts has altered. In that direction, let me ask a slightly more specific problem.

**Problem 5.** Is there a “just-do-it” approach to creating Ramsey lifts?

What I mean there is a procedure for enumerating all the winning sets in $\Sigma$ and then building up $\Sigma'$ and $\pi$ in stages, ensuring for each winning set in turn that its inverse image is a winning set for the same player. I would be surprised if this could be done efficiently, but I think that it would make it much clearer what a typical Ramsey lift looked like.

Let me also recall a problem from the previous post.

**Problem 6.** Let $A$ be the set of all sequences in $\{0,1\}^n$ of odd parity. Does there exist a Ramsey lift $\pi : \Sigma' \to \{0,1\}^n$ such that $\pi^{-1}(A)$ is a basic set and the alphabet of $\Sigma'$ is not too large?

I would also be interested in a Ramsey lift that made $\pi^{-1}(A)$ simple in some other sense. Indeed, I suspect that the best hope for this approach is that the answer to Problem 6 is no, but that for some less restrictive definition of “simple” it is yes.
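As a sanity check on why a lift is needed at all, one can verify by brute force that in $\{0,1\}^n$ itself the odd-parity set is not already basic. A small Python sketch of my own:

```python
from itertools import product

# Check that for n = 3 the odd-parity set is not a basic subset of
# {0,1}^n, i.e. not of the form {x : x_i in B} for any i and B.
n = 3
points = list(product((0, 1), repeat=n))
parity = {p for p in points if sum(p) % 2 == 1}

basics = [{p for p in points if p[i] in B}
          for i in range(n)
          for B in (set(), {0}, {1}, {0, 1})]

assert all(parity != b for b in basics)
print("the odd-parity set is not basic in {0,1}^3")
```

(The same check succeeds for any $n \ge 2$: the parity set has half the points of the cube but is not a coordinate restriction.)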

Maybe that’s enough mathematics for one post. I’d like to finish by trying to clarify what I mean by “micro-publication” on the TiddlySpace document. I can’t do that completely, because I’m expecting to learn on the job to some extent.

I’ll begin by saying that Jason Dyer answered a question I asked in the previous post, and thereby became the first person to be micro-published. I don’t know whether it was his intention, but anyway I was pleased to have a contribution suitable for this purpose. He provided an example that showed that a certain lift that turns the parity function into a basic function was (as expected) not a Ramsey lift. It can be found here. There are several related lifts for which examples have not yet been found. See this tiddler for details.

Jason’s micro-publication should not be thought of as typical, however, since it just takes a question and answers it. Obviously it’s great if that can be done, but what I think of as the norm is not answering questions but more like this: you take a question, decide that it cannot be answered straight away, and instead generate new questions that should ideally have the following two properties.

- They are probably easier than the original question.
- If they can be answered, then the original question will probably become easier.

One could call questions of that kind “splitting questions”, because in a sense they split up the original task into smaller and simpler tasks — or at least there is some chance that they do so.

What I have not quite decided is what constitutes a micro-publication. Suppose, for example, somebody has a useful comment about a question, but does not generate any new questions. Does that count? And what if somebody else, motivated by the useful comment, comes up with a good question? I think what I’ll probably want to do in a case like that is write a tiddler with the useful comment and the splitting question or questions, carefully attributing each part to the person who contributed it, with links to the relevant blog comments.

Also, I think that when someone asks a good question, I will automatically create an empty tiddler for it. So one way of working out quickly where there are loose ends that need tying up is to look for empty tiddlers. (TiddlySpace makes this easy — their titles are in italics.)

Some people may be tempted to think hard about a question and then present a fairly highly developed answer to it. If you feel this temptation, then I’d be very grateful if you could do one of the following two things.

- Resist it.
- Keep a careful record of all the questions you ask in the process of answering the original question, so that your thought processes can be properly represented on the proof-discovery tree.

By “resist it”, what I mean is not that you should avoid thinking hard about a question, but merely that each time you generate new questions, you should write up your thoughts so far in the form of blog comments, so that we get the thought process and not just the answer. The main point is that if we end up proving something interesting, then I would like it to be as clear as possible how we did it. With this project, I am at least as interested in trying to improve my understanding of the research process as I am in trying to make progress on the P versus NP problem.


As long-term readers of this blog will be aware, the P versus NP problem is one of my personal mathematical diseases (in Richard Lipton’s sense). I had been in remission for a few years, but last academic year I set a Cambridge Part III essay on barriers in complexity theory, and after marking the essays in June I thought I would just spend an hour or two thinking about the problem again, and that hour or two accidentally turned into about three months (and counting).

The trouble was that I had an idea that has refused to die, despite my best efforts to kill it. Like a particularly awkward virus, it has accomplished this by mutating rapidly, so that what it looks like now is very different from what it looked like at the beginning of the summer. (For example, at that stage I hadn’t thought of trying to model a proof on the proof of Borel determinacy.) So what am I to do?

An obvious answer is this: expose my ideas to public scrutiny. Then if there is a good reason to think that they can’t be made to work, it is likely that that reason will come to light more quickly than if I, as one individual with judgment possibly skewed by my emotional attachment to the approach, think about it on my own.

But what if they *can* be made to work? Do I want to make them public in their current only partially developed state? I’ve thought about this, and my view is that (i) it is very unlikely that the ideas will work, not just because it is *always* unlikely that any given attack on a notoriously hard problem will work, but also because there are certain worrying analogies that suggest, without actually conclusively demonstrating, that the approach has a good chance of running into a certain kind of well-known difficulty (roughly, that I’ll end up not managing to show that the parity function is simpler than an arbitrary function) and (ii) if, by some miracle, the approach *does* work, I’ll have put enough into it to be able to claim a reasonable share of the credit, and I’ll probably get to that stage far more quickly and enjoyably than if I work secretly. So in the first case, I gain something precious — time — and in the second case I also gain time and end up with an amount of credit that any sensible person ought to be satisfied with. [Confession: I wrote that some time ago and then had some further ideas that made me feel more optimistic, so I worked on them on my own for another two or three weeks. I'm going public only after starting to feel a bit bogged down again.]

And of course, if I go down the public route, it gives me another chance to try to promote the Polymathematical way of doing research, which on general grounds I think ought to be far more efficient. This is a strong additional motivation.

There is one question that I will not be able to suppress so I might as well get it out of the way: if the problem did get solved this way, then what would happen to the million dollars? The answer is that I don’t know, but I am not too bothered, since the situation is very unlikely to arise, and if it does, then it’s the Clay Mathematics Institute’s problem — they have a committee for making that kind of decision — and not mine. And I think it would be very wrong indeed if the existence of a prize like that had the effect of making research on a major mathematical question more secret, and therefore more inefficient, than it needed to be.

I have two remaining anxieties about going public. One is that it looks a bit attention-grabbing to say that I’m working on the P versus NP problem. It’s probably a hopeless thing to ask, but I’d like it if this project could be thought of in a suitably low-key way. What I have at the moment has not yet been sufficiently tested to count as a serious approach to the problem: as I’ve already said, complexity experts may be able to see quickly why it can’t work. Of course I dream that it might turn into a serious approach, but I’m not claiming that status for it unless it survives other people’s attempts to kill it off. To begin with, one should probably think of Polymath9 as devoted to the question, “Why could nothing like this work?” which is rather less exciting than “Please help me finish off my proof that P$\ne$NP.” (However, I think the approach is different enough from other approaches that a sufficiently general explanation of why nothing like it can work would be of some interest in itself.)

The other is that I may have missed some simple argument that immediately demolishes the approach in its entirety. If someone points out such an argument, it will sting a bit, and it makes me feel quite apprehensive about clicking on the “Publish” button I see in front of me. But let me feel the fear and do it anyway, since it’s probably better to feel embarrassed when that happens than it is to spend another two or three months working on an approach that is doomed to fail uninterestingly.

Although multiple online collaboration has not been widely adopted as a way of doing research, there have been enough different quite serious projects, each one with its own distinctive characteristics, to provide some evidence of what works and what doesn’t. Let me mention three examples that in different ways *have* worked. I think that Polymath1 (the density Hales-Jewett theorem) worked well partly because we started with not just a problem, but also the beginning of an approach to that problem. (The approach later changed and eventually had little in common with how it had been when it started, but having a clear starting point was still helpful.) Polymath3 (the Erdős discrepancy problem) started with just a statement of the problem to be tackled, and did not end up solving it, but I still count the project as a success of a kind, in that we rapidly reached a much better understanding of the problem, found plenty of revealing experimental data, and generated a number of interesting subquestions and variants of the initial question. More recently, Polymath8 (improving Zhang’s bound for prime gaps) worked well because the problem was not a yes/no question. Rather, it was a how-far-can-we-push-this-proof question, very well suited to a group of people looking together at a paper and reaching an understanding of the arguments that allowed them to improve the bound significantly. It was also good to undertake a project that was guaranteed to produce at least something — though I think the current bound is probably better than most people would have predicted at the beginning of the process.

Having said all that, there are some aspects of the Polymath projects so far that have left me not fully satisfied. I don’t mean that I have been positively dissatisfied, but I have been left with the feeling that more could be achieved. For one thing, it is slightly disappointing that there have not been more projects. (I bear some responsibility for this, since I have not been involved in any Polymath projects for quite a while, apart from a brief attempt to revive Polymath3.) I think there are various reasons for this, some of which it may be possible to do something about.

My original fantasy was that it would be possible for lots of people to make small contributions to a project and for those small contributions to add up almost magically to something greater than the sum of its parts. I think that to a small extent that happened, in the sense that reading other people’s comments was quite unexpectedly stimulating. However, I came to think that there was significant room for improvement in the way that a Polymathematical discussion takes place. At the moment it principally takes place in two ways: as a sequence of comments on blog posts, and as a wiki that is gradually built up by the participants. But neither of these conveys in a truly transparent way the logical structure of the discussion, or makes it easy for new people to join the discussion once it has got going. What I would like to see is the gradual building up of what I call a *proof discovery tree*. I don’t have a precise definition of this concept, but the rough idea is this. You start with the initial question you are trying to answer. You can’t just write down the answer, so you have to find new questions to ask. (At the very beginning they will be extremely hazy questions like, “What could a proof of this statement conceivably look like?”.) Those questions will probably generate further questions, and in that way one builds up a tree of questions. When one has gone deep enough into the tree, one starts to reach questions that one can answer. Sometimes the answer will have the effect of telling you that you have reached a dead end. Occasionally it will transform your approach to the entire problem.

I think something like that is a reasonable description of the research process, though of course it leaves a lot out. I also think that with modern technology it is possible to record one’s attempt to prove a result in a tree-like format rather than in the linear format that is encouraged by paper and pen, or even by TeX files, blog posts and the like.

What would be the advantage of writing notes on proof attempts in a tree-like form? I think there is one huge advantage: if at some point you feel stuck, or for some other reason want others to join in, then setting out your thoughts in a more structured way could make it much easier for others to take up where you left off rather than having to start from scratch. For example, when you stop, there will probably be a number of leaves of your proof-discovery tree that are not dead ends, but rather are questions that you just haven’t got round to answering. If you ask the questions in isolation (on Mathoverflow, say), then they will seem fairly unmotivated. But if they live on a proof-discovery tree, you can follow a path back to the root, seeing that the question asked was motivated by an earlier question, which itself was motivated by an earlier question, and so on. Or, if you just want to add a new leaf to the proof-discovery tree, you can ignore all that motivation (or perhaps skim it briefly) and simply try to answer the question.
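To make this concrete, here is a minimal Python sketch of the sort of object I have in mind — the class and its methods are purely illustrative names of my own, not part of TiddlySpace or of any proposed standard:

```python
class QuestionNode:
    """A node of a (toy) proof-discovery tree: a question, possibly an
    answer, and the child questions generated while thinking about it."""

    def __init__(self, question):
        self.question = question
        self.answer = None          # filled in if the question is resolved
        self.children = []          # follow-up questions, in order asked

    def ask(self, question):
        child = QuestionNode(question)
        self.children.append(child)
        return child

    def open_leaves(self):
        """The unanswered questions with no follow-ups yet: these are the
        natural entry points for a new collaborator."""
        if not self.children:
            return [] if self.answer is not None else [self]
        return [leaf for c in self.children for leaf in c.open_leaves()]

root = QuestionNode("Is P equal to NP?")
vague = root.ask("What could a proof of this statement conceivably look like?")
vague.ask("Is there a finite analogue of Martin's proof of Borel determinacy?")
vague.ask("Why do the known barriers not rule such a proof out?")
print(len(root.open_leaves()))   # prints 2: the loose ends so far
```

Following a path from an open leaf back to the root then recovers exactly the chain of motivation described above.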

Would it be practical for people to keep adding leaves to a tree like this? Wouldn’t it all get a bit out of hand, with people disagreeing about what constitutes an appropriate link from an existing vertex? I think it might. One way round that problem is the following. There is an informal discussion that takes place in the usual way — with comments on blog posts. But the participants also keep an eye on a somewhat more formal proof-discovery tree that develops as the discussion progresses, and if somebody makes a comment that looks as though it could be developed into a useful new node of the tree, it is proposed for a sort of “micro-publication”. If it is accepted, then whoever is moderating the discussion adds it to the proof-discovery tree, possibly rewriting it in the process. The node of the tree comes with links to the comment that inspired it, and the name of the author of that comment. So this process of micro-publication provides a similar kind of motivation to the one that traditional publication provides, but on a much smaller scale.

Is there any good software out there for creating a proof-discovery tree of this kind? I asked this question on Google Plus and got a variety of helpful answers. I opted to go with the third answer, given by Robert Schöftner, who suggested TiddlySpace with a MathJax plugin. If you’re the kind of person to whom that sounds complicated, you should take the fact that I managed to get it to work fairly easily as strong evidence that it is not. And I’ve fallen in love with TiddlySpace. It feels very unhealthy to have written that, but also, given the amount of time I’ve spent with it, not too wide of the mark.

Its main advantage, as I see it, over a traditional wiki is that a “tiddler”, which roughly corresponds to a wiki page or short blog post, is not on a separate web page. Rather, it forms part of a “tiddlyspace”, roughly speaking a collection of interlinked tiddlers, that all live on the same page and can be opened and closed as you like. Amazingly (to me at any rate), you can open, close, create and edit tiddlers even when you are offline, without losing anything. When you’re next online, everything gets saved. (I imagine if you close the page then you *will* lose everything, but it’s not exactly challenging not to do that.) One can also add nice gadgets such as what they call “sliders” — boxes you click on to make some text appear and click on again to make it disappear. I’ve used that in a few places to make it convenient for people to be reminded of definitions that they may have forgotten or not seen.

Now I’m not trying to say that everyone should use TiddlySpace. I’m sure people have very strong views about different kinds of wiki software being better than others. But I *would* like to encourage people to try writing their research attempts in a tree-like format, so that if they don’t succeed in solving a problem but do have interesting ideas about it, then they can present their incomplete proof attempt in a nice way. If you prefer some other software to TiddlySpace, then by all means use that instead.

As a matter of fact, TiddlySpace, while having a lot in common with what I have often thought would be great to have, also lacks a few features that I’d really like. I have included a tiddler with a site map that indicates the tree structure of all the pages by suitably indenting the titles. But what I’d prefer is a much more graphical representation, with actual nodes and links. The nodes could be circles (or perhaps different shapes for different kinds of pages) with text in them, and could increase in size as you hovered over them (like the icons on the dock of a Mac if you have too many of them) and open up if you clicked on them. Similarly, the edges would have text associated with them. So it might look more like the Stacks Project visualizations.

If writing up proof attempts became standard practice, and if somewhere there was an index of links to incomplete proof-discovery trees, then people who wanted something to think about could search through the leaves of the proof-discovery trees for problems that look interesting and well-motivated. (Maybe the Selected Papers Network could be used for this indexing purpose, though these would not be papers in any traditional sense.) In that way, collaborations could start up. Some of these might be very open and public. Others might be much quieter (e.g. someone emails the author of the proof-discovery tree with an answer to a question, and that leads to a private collaboration with the author). Also, even if every path of a proof-discovery tree led to a dead end, that would *still* be a useful document: it would give a detailed and thorough record of a proof attempt that doesn’t work. That’s something else that I’ve long thought would be a nice thing to have, partly because it may save time for other people, and partly because even a failed proof attempt may contain ideas that are useful for other problems. Also, as I found with Polymath1, there is the surprising phenomenon that other people’s ideas can be immensely stimulating *even if you don’t use them*. So even if my ideas about the P versus NP problem turn out to be fruitless, there is a chance, as long as they are not completely ridiculous (a possibility I cannot rule out at this stage), that they could provoke someone else into having better ideas that lead to interesting progress on the problem. If they do that for you, maybe you could buy me a drink some time.

For the above reasons, I see the publication (in the sense of making public on the internet) of partial proof-discovery trees as one possible way of getting Polymathematical research to become more accepted. Each such “publication” would be a kind of proposal for a project, as well as a record of progress so far. I also think that “micro-publishing” contributions to a proof-discovery tree has the potential to provide a motivation that is similar to the motivation that drives people to answer questions on Mathoverflow: it offers slightly more of a reward than you get from people responding to a comment you have made on a blog.

Yet another potential advantage of writing partial proof-discovery trees is that if you present your ideas so far in a structured format, it can result in a much more systematic approach to *your own* research. You may go from, “Oh dear, all this is getting complicated — I think I’ll try another problem,” to “Ah, now I see how all those various ideas link up (or don’t link up) — I think there is more to say about that leaf there.” I have found that when writing down my ideas about games and the P versus NP problem. So there is something to be said for doing it, even if you have no intention of making your thoughts public. (But what I would like to see in that case is people eventually deciding that they are unlikely to add to their private proof-discovery trees and making them public.)

Because I’m keen to see whether something like this could work, I have spent a couple of weeks taking my thoughts from over the summer (which I had written into a LaTeX file that stretched to 80 pages — in the interests of full disclosure I might make that file public too, though it is disorganized and I wouldn’t particularly recommend reading it), throwing out some of them that seemed to go nowhere, and putting the rest into a tree-structured TiddlySpace wiki. I have tried to classify the tiddlers themselves and (more importantly) the links between the tiddlers. Most tiddlers are devoted to discussions of questions, so the link classifications are saying for what kind of reason I pass from trying to answer one question to trying to answer another. (Some simple examples might be that I want to try the first non-trivial case, or that I want to see whether a generalization of the original statement has a chance of being true.) I haven’t put as much thought into this link classification as I might have, so I am very open to suggestions for how to improve it, especially if these would make the connections clearer. (I can foresee two sorts of improvements: reclassifications of links within the scheme as it now is, and revisions to the scheme itself.) The result of doing that was to stimulate a lot more thought about the approach, so I’ve added that to the tree as well. A link to the whole thing can be found at the end of this post.

To summarize, this is what I suggest.

1. The aim of the project is **either** to dispose quickly of the approach I am putting forward, by finding a compelling reason to believe that it won’t work (I have tried to highlight the most vulnerable parts, to make this as easy as possible — if it is possible) **or** to build on the existing partial proof-discovery tree until it yields an interesting theorem. While the big prize in the second (and much less likely) case would of course be to prove that P$\ne$NP, there are more realistic weaker targets such as finding *any* property that follows “interestingly” from a function’s having low circuit complexity. In the first case, there is the prospect of finding a new barrier to proving lower bounds. For that one would need the approach to fail for an interesting reason. I explain below why I don’t think the approach obviously naturalizes, so there seems at least some chance of this.

2. I will give anyone who might be interested a week or two to browse in my partial proof-discovery tree. There is quite a lot to read (though I hope that its tree structure makes it possible to understand the approach without reading anything like everything), so I won’t open a mathematical comment thread for a while. (I’ve got other things I need to do in the meantime, so this works quite well for me.) However, before that starts I would very much welcome comments about the use I have made of TiddlySpace. I wanted to create a document that set out a proof attempt in a more transparent way than is possible if you are forced into a linear structure by a TeX document, but what I’ve actually produced was not planned all that carefully and I think there is room for improvement.

3. Once the comment thread is open (on this blog), I’ll act as a moderator in the way described above: if someone (or more than one person) provides input that would make a good page to add as a new leaf to the tree, then I will “micro-publish” that page. I don’t want to be too dictatorial about this, so I will welcome proposals for inclusion — either of your own questions and observations or of somebody else’s. I will make clear who the author is of each of these “micro-publications”. If I do not give credit to somebody who deserves it (e.g. if I base a page on a blog comment that builds on another blog comment that I had forgotten about) then I will welcome having that pointed out.

4. A typical “page” will consist of a question that is motivated by an existing page, together with a discussion of that question. If other questions arise naturally in that discussion, can’t be answered instantly, and seem worth answering, then they will be designated as “open tasks”. An open task can become a page if somebody makes enough observations about it to reduce the task to subtasks that look easier. (This does not have to be a logical reduction — it can simply be replacing the initial task by something that deserves to be attempted first.)

5. As a rule of thumb, if a question arises during the writing of a page that is sufficiently different from the question that the page is about that it is most naturally regarded as a new question, then it gets a new page. But this is a matter of judgment. For example, if the question is very minor and easy to answer, then it probably counts as more of a remark and doesn’t deserve a page to itself.

6. The underlying principle behind a link is this. You have a page discussing a question. If you can argue convincingly that the right approach (or at least a good approach) to the question is to think about another question or questions, then the argument forms the main content of the page, and the subquestion or questions form the headings for potential new pages. Links are classified into various types: if your link cannot be classified easily but is of a clearly recognisable type that does not belong to the current classification system, then I will consider adding that link type.

7. The main criterion for micro-publication is *not* the mathematical quality of the proposed page, but the suitability of that page as a new leaf of the tree. This principle reflects my conviction/prejudice that a good piece of mathematics can always be broken up into smaller units that are fairly natural things to try. I want to use the proof-discovery tree as a way of encouraging the process of exploring reasonably obvious avenues to be as systematic as possible. So I will normally insist that any proposed leaf is joined to an existing node by means of a link of one of a small number of types I have listed. (The list can be found over at the TiddlySpace space.) I will consider proposals for new link types, but will accept them only if there are compelling reasons to do so — which there may well be to start with.

8. If maintaining the partial proof-discovery tree becomes too much work to do on my own, then I will consider giving editing rights to one or more “core” participants. But to start with I will be the sole moderator.

There is a lot to read on my TiddlySpace. If you’d rather have some idea of what’s there before investing any time in looking at it, then this section is for you. I’ll try to give the main idea, though not fully precisely and without much of the motivation. If that gets you interested, you can try to use the proof-discovery tree to understand the motivation and a lot more detail about the approach.

The main idea, as I’ve already said, is to try to find a proof that relates to sets of low circuit complexity in the same way that Martin’s proof of determinacy relates to Borel sets. There are two instant reactions one might have to this proposal, one pessimistic and one optimistic. The pessimistic reaction is that the analogy between Borel sets and sets of low circuit complexity has already been explored, and it seems that a better analogue for the Borel sets is sets that can be computed by polynomial-sized circuits *of constant depth*. This fits with the fact that there is no natural Borel analogue of the parity function, and the parity function cannot be computed by polynomial-sized circuits of constant depth.

The optimistic reaction is that Martin’s proof is different enough from the proof, say, that the set of graphs containing an infinite clique is not Borel, that there is a chance that the objection just given does not apply. In particular, to prove that Borel sets of level $n$ are determined, one needs to apply the power set operation to the natural numbers roughly $n$ times, and the statement that all analytic sets are determined (analytic sets corresponding to NP functions) needs large cardinal axioms. Could this be peculiar enough to enable us to find some non-natural analogue in the finite set-up?

An important thing to stress is that the property that I hope will distinguish between sets of low circuit complexity and random sets (or, even better, some set in NP, but for now I am not really thinking about that) is *not* an analogue of determinacy. That’s because it is a very easy exercise to show that the intersection of two determined sets does not have to be determined. (Roughly speaking, each set may have a nice part and a nasty part, with the nice parts disjoint and the nasty parts intersecting.) For this reason, Martin can’t prove determinacy by showing that the class of determined sets is closed under complements and countable unions and intersections. Instead what he does is prove inductively that Borel sets can be lifted to much simpler sets in such a way that (i) it is easy to show that the simpler sets are determined and (ii) it follows from that that the original sets are determined.

I won’t give all the definitions here, but the condition that is needed to get (ii) to work is basically this: given any set in the lifted game for which one of the players has a winning strategy, the same player has a winning strategy for the image of that set in the original game.

For various reasons, I’m convinced that certain features of the analysis of the infinite game have to be modified somewhat. The pages of my TiddlySpace set out my reasons in gory detail, but here let me simply jump to the set-up that I have been led to consider.

I define a *complexity structure* to be a subset $X$ of a set $A_1\times\dots\times A_n$. I call the union of the $A_i$ the *alphabet* associated with the structure. Often I consider the case where $X=\{0,1\}^n$. The maps between complexity structures that I consider (if you like, you can call them the morphisms in my category) are maps $\kappa$ such that for each $i$, the coordinate $\kappa(x)_i$ depends only on $x_i$. To put that another way, if $Y\subseteq B_1\times\dots\times B_n$ is another complexity structure, the maps I consider are ones of the form $\kappa(x_1,\dots,x_n)=(\kappa_1(x_1),\dots,\kappa_n(x_n))$, where each $\kappa_i$ maps $A_i$ to $B_i$. I call a subset of a complexity structure *basic* if it is of the form $\{x:x_i\in B\}$ for some $i$ and some $B\subseteq A_i$. The motivation for the restriction on the maps is that I want the inverse image of a basic set to be basic.

The non-trivial basic sets in the complexity structure $\{0,1\}^n$ are the coordinate hyperplanes $\{x:x_i=0\}$ and $\{x:x_i=1\}$. The circuit complexity of a subset of $\{0,1\}^n$ measures how easily it can be built up from basic sets using intersections and unions. The definition carries over almost unchanged to an arbitrary complexity structure, and the property of the maps ensures that the inverse image of a set of circuit complexity $k$ has circuit complexity at most $k$.
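Since everything in sight is finite, these definitions are easy to experiment with. Here is a small Python sanity check — my own illustrative code, with $n=3$ and alphabet $\{0,1\}$ — of the fact that the inverse image of a basic set under a coordinatewise map is again basic:

```python
from itertools import product

n = 3
X = set(product([0, 1], repeat=n))           # the complexity structure {0,1}^3

# a coordinatewise map: flip the first coordinate, fix the others
kappa_i = [lambda a: 1 - a, lambda a: a, lambda a: a]

def kappa(x):
    return tuple(f(a) for f, a in zip(kappa_i, x))

def basic(structure, i, B):
    """The basic set {x : x_i in B}."""
    return {x for x in structure if x[i] in B}

# the inverse image of the basic set {y : y_1 = 1} is the basic set {x : x_1 = 0}
A = basic(X, 0, {1})
preimage = {x for x in X if kappa(x) in A}
print(preimage == basic(X, 0, {0}))          # prints True
```

The same computation works for any per-coordinate functions $\kappa_i$: the preimage of $\{y:y_i\in B\}$ is always $\{x:x_i\in\kappa_i^{-1}(B)\}$.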

Given a complexity structure $X$, we can define a game that I call the *shrinking-neighbourhoods game*. For convenience let us take $X$ to be a subset of $\{0,1\}^{2m}$ for some positive integer $m$. Then the players take turns specifying coordinates: that is, they make declarations of the form $x_i=\epsilon$ with $\epsilon\in\{0,1\}$. The only rules governing these specifications are the following.

- Player I must specify coordinates from $1$ to $m$.
- Player II must specify coordinates from $m+1$ to $2m$.
- At every stage of the game, there must be at least one $x\in X$ that satisfies all the specifications so far (so that the game can continue until all coordinates are specified).

Note that I do not insist that the coordinates are specified in any particular order: just that Player I’s specifications concern the first half and Player II’s the second.

To determine who wins the game, we need a *payoff set*, which is simply a subset $A\subseteq X$. Player I wins if the sequence that the two players have specified belongs to $A$, and otherwise Player II wins. I call a set $A$ *I-winning* if Player I has a winning strategy for getting into $A$ and *II-winning* if Player II has a winning strategy for getting into $A$. (Just in case there is any confusion here, I really do mean that $A$ is II-winning if Player II has a winning strategy for getting into $A$. I didn’t mean to write $X\setminus A$.)

Because the game is finite, it is determined. Therefore, we have the following Ramseyish statement: given any 2-colouring of a complexity structure $X$, either the red set is I-winning or the blue set is II-winning. (Normally with a Ramsey statement one talks about *containing* a structure of a certain kind. If we wanted to, we could do that here by looking at minimal I-winning and minimal II-winning sets.)
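Because the game is finite, such statements can be checked by brute force in tiny cases. The following Python sketch is my own toy implementation: it takes $m=2$, takes $X$ to be the whole cube $\{0,1\}^4$, assumes that Player I moves first, and decides by backward induction whether Player I can force the final sequence into a given payoff set:

```python
from itertools import product

m = 2
n = 2 * m
X = set(product([0, 1], repeat=n))      # take X to be the whole cube {0,1}^4

def consistent(spec):
    """Elements of X satisfying all specifications made so far."""
    return [x for x in X if all(x[i] == v for i, v in spec)]

def player_I_wins(A, spec=(), player=0):
    """Backward induction: can Player I force the final sequence into A?"""
    if len(spec) == n:                  # every coordinate has been specified
        return consistent(spec)[0] in A
    half = range(m) if player == 0 else range(m, n)
    done = {i for i, _ in spec}
    moves = [(i, v) for i in half if i not in done for v in (0, 1)
             if consistent(spec + ((i, v),))]   # rule: X must stay non-empty
    quantifier = any if player == 0 else all    # I needs one good move; II resists all
    return quantifier(player_I_wins(A, spec + ((i, v),), 1 - player)
                      for i, v in moves)

even = {x for x in X if sum(x) % 2 == 0}
print(player_I_wins(even))                         # False: II moves last, so controls parity
print(player_I_wins({x for x in X if x[0] == 1}))  # True: I opens by setting the first coordinate to 1
```

The same recursion with the roles of the quantifiers swapped shows that whenever Player I cannot win a payoff set, Player II wins its complement — which is exactly the determinacy used above.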

Given a complexity structure $X$, I define a *lift* of $X$ to be a complexity structure $Y$ together with a map $\pi:Y\to X$ that satisfies the condition set out earlier. I define a lift to be *Ramsey* if $\pi(B)$ is a winning subset of $X$ whenever $B$ is a winning subset of $Y$, and moreover it is winning for the same player. A more accurate name would be “winning-set preserving”, but I think of “Ramsey” as an abbreviation for that.

This gives us a potential method for showing that a subset is I-winning: we can find a Ramsey lift such that is simple enough for it to be easy to show that it is a I-winning subset of . Then the Ramsey property guarantees that , and hence , is I-winning in .

The definition of a Ramsey lift is closely modelled on Martin’s definition of a lift from one game to another, though there are also some important differences that I will not discuss here.

Now let me say what the property is that I hope will distinguish sets of low circuit complexity from some set in NP. I stress once again that this is a rather weak kind of hope: I think it probably won’t work, and the main reason I have not yet established for certain that it doesn’t work is that the definition of a Ramsey lift is complicated enough to make it fairly hard to prove even rather simple facts about it. However, I think the difficulties are reasonable ones rather than unreasonable ones. That is, I think that there are a number of questions that are tricky to answer, but that should yield reasonably quickly. I do *not* think that the difficulties are a disguised form of the usual difficulties connected with circuit complexity. So the most likely outcome of opening up the approach to public scrutiny is that the answers to these smaller questions will be found and they will not be what I want them to be.

To explain the property, let me first give an example of a Ramsey lift that converts every subset of into a basic set. I will take to be the set of sequences with the following properties.

- There exists such that is an ordered pair of the form , where and is a I-winning subset of with a winning strategy that begins with the move .
- For every other , is an element of .
- For every , is an ordered pair of the form , where , , and .
- .

The map is the obvious one that takes the sequence above to .

Given a set , its inverse image is equal to the set of all such that for some . This is a basic subset of , as claimed earlier.

It remains to show that is a Ramsey lift of . Let be a I-winning subset of and let be a winning strategy for Player I for getting into .

Suppose that Player I’s first move is of the form for some and some I-winning subset for which is the first move of a winning strategy. Player II can now play an arbitrary move of the form , where , , and . Since is a winning strategy for getting into , the result will always be a win for Player I. Therefore, for every with there exists with . Let be the set of sequences such that . Then . So a winning strategy for Player I for is to begin with the move and then to play the rest of the strategy that gets into , which will get her into .

Now suppose that Player I’s first move is of the form . This time, Player II is free to choose an arbitrary such that and play the move . After that, Player I is guaranteed to produce a sequence in , which implies that contains all sequences with . Therefore, Player I has a winning strategy for , since she can simply start with the move .

Now let be a II-winning subset of and let be a winning strategy for Player II for getting into . Then for every opening move that Player I might choose to make, Player II can defeat that move. It follows that there exists with such that the sequence

belongs to . Therefore, for every Player I winning set there exists such that . It follows that Player I does not have a winning strategy for , so Player II does have a winning strategy for .

I called that an important example because it gives us a “trivial upper bound” on the size we need to have if we want to find a Ramsey lift from to that makes a set simple. The lift above makes every single subset of into a basic set. Note that this bound is quite large: there are doubly exponentially many winning sets . (Slightly less obviously, there are doubly exponentially many *minimal* winning sets. I haven’t written out a full proof of this, but here is why I believe it. If you take a random set with a certain critical probability, then it should be a I-winning set, but it should not be possible to remove lots of elements from it and still have a I-winning set. Therefore, we need to have a collection of sets of density almost as big as the critical probability such that almost every set with the critical probability has a subset in the collection. That should make the collection doubly exponential in size. It would be good to make this argument rigorous.)

What I would like to prove is something like this. There is one part of what I want that is unfortunately a little vague, which is the definition of “simple”. I’ll discuss that in a moment.

- If a set has polynomial circuit complexity, then there exists a Ramsey lift of with such that is simple and the cardinality of is much less than doubly exponential.
- If is a random subset of , then with high probability the smallest Ramsey lift that makes simple is doubly exponential.
- There exists an NP set such that the smallest Ramsey lift that makes simple is doubly exponential.

If one could prove 1 and 3, then one would have shown that P ≠ NP. If one could prove 1 and 2, then one would have exhibited a non-trivial property that distinguishes between functions of polynomial circuit complexity and random functions. That in itself would not prove that P ≠ NP, but it might point the way towards other methods of defining “unnatural” properties, which is a necessary first step towards proving that P ≠ NP.

I’ll say once again that I don’t yet consider this to be a serious approach, even if I ignore the problem that I don’t yet know what a “simple” set is. Given a precise definition of “simple” (and I have some candidates for this), I have just exhibited a pair of statements the conjunction of which would imply that P ≠ NP. However, for an observation like that to count as a serious approach to proving , there are two other properties one wants. The first is good evidence that is actually true, which I do not have — I do not count my failure to disprove it so far as good evidence, and the analogy with Martin’s theorem has certain drawbacks that make me think that it is likely that either 1 will be false, or else 1 will be true but only because *all* sets can be efficiently lifted, so that both 2 and 3 will be false. The second requirement is some reason to believe that might be easier to prove than . Here I think the implication above fares slightly better: while I have no idea how to prove lower bounds on the “Ramsey-lift complexity” of a set, the fact that proving upper bounds doesn’t seem to be easy for sets of low circuit complexity suggests that if one *did* manage to prove such upper bounds, one would have a reduction of the problem that didn’t feel trivially equivalent in difficulty to the original problem, though it might in practice turn out to be very hard as well. If further thought about 1-3 led people to believe that they were likely to be true after all, then and only then would I want to say that this was a serious approach. But as I’ve said several times now, I think that is fairly unlikely.

A key question I’d like to know the answer to is whether there is an efficient (that is, much smaller than doubly exponential) Ramsey lift for the parity function, or rather the set of points with an odd number of 1s. The reason is that it looks to me at the moment as though the most likely thing to go wrong will be that the most efficient lift blows up rapidly as the circuit complexity of a function increases — so rapidly that it becomes doubly exponential for circuits of linear size. (All we would need for this is for the size of to square at each increase by 1 in the length of a straight-line computation. Obviously, slightly weaker statements would also suffice.) If that is indeed the case, it may well be that what determines the size of the smallest lift is closely related to the noise sensitivity of the set , in which case the parity function is a good one to use as a test.

Another possibility for a cheap demolition of the whole approach is if you can spot a simple Ramsey lift that converts an arbitrary set into a basic set and needs an alphabet of only exponential size. I haven’t really tried to find such a lift, so I could easily have missed something obvious.

For Martin a simple set was one that is open and closed. The best analogue I can think of for the notion of an open set is the following. Let be a complexity structure. Call a subset of -*basic* if it is an intersection of basic sets, and call it -*open* if it is a union of -basic sets. Call it -*closed* if its complement is -open. Then we could look at sets that are -open and -closed for some suitably small .

But how small? The only natural candidates seem to be 1, 2, or something around , but there appear to be difficulties with all these choices.

Why not define a set to be simple if it is basic? I have a problem with that, which is that if it is too easy to lift a set to a basic set, then we will probably be able to lift all the coordinate hyperplanes in (which are basic already) to sets of the form — that is, to sets that are not just basic but defined by restrictions of the *first* coordinate. But if we can do that, then *all* sets lift to basic sets in .

If you want an even easier question to think about than the one above about the parity function, I have not yet even managed to determine whether one particular lift works. Here’s how it is defined. I take the set of all sequences of the form , where and is the parity of . The map takes this sequence to . Thus, the game in is the same as the game in except that when Player II specifies the th coordinate, he must also commit himself to a particular parity and to playing his last move to ensure that has this parity.

This extra condition on Player II should disadvantage him, so there should be payoff sets that Player I can get into if Player II has the extra restriction but cannot get into otherwise. I’m pretty sure it will be easy to find such a payoff set, but my first few attempts have failed, so I have not managed to do it yet. I do have a heuristic argument that suggests that a suitably chosen random set ought to be an example, so one approach would be to make that argument rigorous.

One can also consider small variants of the above lift, for some of which it is not at all clear that a random set should work, so what I’d really like to see is either a proof that some simple variant is in fact, contrary to expectations, a Ramsey lift, or an argument that is sufficiently general to rule out a large class of similar constructions.

For anyone wondering whether to invest any time in helping me think about my approach, this is of course the key question. It’s hard to say for *sure* that the approach wouldn’t naturalize. Perhaps one could come up with a clever criterion that would say which sets can be efficiently lifted. But I’m fairly confident that the property “there exists a complexity structure with an alphabet of significantly less than doubly exponential size and a map that takes I-winning sets to I-winning sets and II-winning sets to II-winning sets such that is simple” is not easy to reformulate as a property with polynomial (or quasipolynomial, or anything at all small) circuit complexity (in the truth table of ). In fact, I think it is not even in NP, since we existentially quantify over and but then require to have a property that holds for a very large class of rather complicated subsets of . So even if we allow to be “only exponential” in size (which is not required by the approach), the natural formulation of the property appears to be .

Of course, it’s one thing to write down a strange property, and quite another to expect it to hold for functions of low circuit complexity. But the fact that I have been strongly motivated by the proof of Borel determinacy gives me some small reason to hope that a miracle might occur. It is in the nature of miracles that it probably won’t occur, but the subjective probability I associate with it is far enough from zero that I don’t want to give up on it until I am absolutely sure that it won’t.

I should add that even if the property I have given above does not work, it may be that some related property does, as there are various details that could be changed. For example, we could replace the classes of I-winning and II-winning sets by other classes of sets and ask for our maps to preserve those.

Finally, here is the proof-discovery tree as I have developed it so far. To get started, I recommend clicking on “PvsNP Sitemap” in the toolbar at the top of the page. Even if the approach collapses almost immediately, I hope you may enjoy looking at it and getting an idea of what can be done on TiddlySpace.


The system I have in mind works as follows. It’s a multilevel representative democracy. Suppose for convenience that for some positive integer . (It is easy, but slightly tedious, to modify what I am about to write to take care of more general .) Suppose that the country is divided into three “super-constituencies”, each of which gets a vote in the top-level decision-making body (known as the triumvirate). Suppose that decisions in that body are passed by a majority vote. A group of people that wants to control the country can do so as long as it can control at least two votes in the triumvirate.

How are the members of the triumvirate chosen? They are elected by another triumvirate one level down. The representative in the top-level triumvirate is representing the views of the three people in the triumvirate one level down, and is worried about stepping out of line, since then he/she risks being deselected by the three people in the level-2 body.

So if a merry band of fanatics wants to control a representative in the top-level triumvirate, it is enough to control at least two of the representatives in the second-level triumvirate that selects the top-level representative.

Of course, we can iterate this argument. So how many people do we need to control the country? We need two at the top level, and therefore four at the second level, and so on. Therefore, we need at the bottom level. (Note that the representatives do not have to be fanatics themselves — if they don’t vote in the way that the fanatics want, then they get deselected by the people one level down, losing all those lovely perks that go with a high-level job in politics.) If , then , so we’re done.

One might want to make small adjustments to the bound to allow all the different levels of influence to be disjoint. So then . But this is within a constant of . Similarly, if we start with some that is not of that precise form, that again affects the estimate by just a constant factor.

So the conclusion is that in principle people can mess up a country with population . If you have more people than that, then the main thing you want is a system with a few levels of groups within groups — not necessarily formal at every level — and a distribution that is not too concentrated and not too diffuse. (If it is too concentrated, then you’ll end up wasting a lot of votes on controlling representatives who are already controlled, but if it is too diffuse, then you won’t control anybody except at very low levels. In the extreme case, what you want is to be arranged in what can be viewed as a discrete approximation to the Cantor set: in less extreme cases you still want to be somewhat “fractal” and “Cantor-like”.)
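The doubling count in the last few paragraphs is easy to sketch numerically. In this sketch the level count k = 20 is just an illustrative choice:

```python
import math

# Controlling one triumvirate needs a majority, i.e. 2 of its 3
# members, so the number of bottom-level voters needed doubles at
# each of the k levels of the hierarchy.

def people_needed(k):
    need = 1  # at the bottom you need just the one voter
    for _ in range(k):
        need *= 2  # control 2 of the 3 sub-bodies one level down
    return need

k = 20
population = 3 ** k
fanatics = people_needed(k)       # 2^k
print(fanatics, population)       # 1048576 3486784401
# Expressed as a power of the population: exponent log 2 / log 3.
print(round(math.log(fanatics) / math.log(population), 3))  # -> 0.631
```

So roughly n^0.63 well-placed people suffice for a population of n, which is what the “within a constant” remarks above are estimating.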


The purpose of this post is to add some rigour to what I wrote in the previous post, and in particular to the subsection entitled “Why should we believe that the set of easily computable functions is a ‘random-like’ set?” There I proved that *if* the Rubik’s-cube-like problem is as hard as it looks, then there can be no polynomial-time-computable property that distinguishes between a random composition of 3-bit scramblers and a purely random Boolean function. This implies that there can be no polynomial-time-computable “simplicity” property that is satisfied by all Boolean functions of circuit complexity at most that is not satisfied by almost all Boolean functions.

I personally find the assumption that the Rubik’s-cube-like problem is hard very plausible. However, if you disagree with me, then I don’t have much more I can say (though see Boaz Barak’s first comment on the previous post). What Razborov and Rudich did was to use a different set of random polynomial-time-computable functions that has a better theoretical backing. They build them out of a pseudorandom function generator, which in turn is built out of a pseudorandom generator, which is known to exist if the discrete logarithm problem is hard. And the discrete logarithm problem is hard if factorizing large integers is hard. Since many people have tried hard to find an algorithm for factorizing large integers, there is some quite strong empirical evidence for this problem’s being hard. It’s true that there are also people who think that it is not hard, but the existence of a pseudorandom generator does not depend on the hardness of factorizing. Perhaps a more significant advantage of the Razborov-Rudich argument is that *any* pseudorandom generator will do. So the correctness of their conclusion is based on a weaker hypothesis than the one I used earlier.

It’s time I said in more detail what a pseudorandom generator is. Suppose you have a Boolean function , with . Then you have two obvious probability distributions on . The first is just the uniform distribution, which we can think of as choosing a random 01-string of length . The second is obtained by choosing an element uniformly at random from and applying the function . This we can think of as a *pseudo*random 01-string of length . The idea is that if mixes things up sufficiently, then there is no efficient algorithm that will give significantly different results when fed a purely random 01-string and a pseudorandom 01-string.

We can be slightly more formal about this as follows. Suppose is a Boolean function. Define to be the probability that when is chosen randomly from . Define to be the probability that when is chosen randomly from . We say that is an -*hard pseudorandom generator* if whenever can be computed in time .

It may look a little strange that appears twice there. Shouldn’t one talk about a -hard pseudorandom generator, where the number of steps is and the difference in the probabilities is at most ? The reason for setting equal to is that, up to a polynomial, it is the only interesting value, for the following reason. Suppose that the difference in the probabilities is . Then if we run the algorithm times, the difference in the expected number of times we get 1 is . If that is significantly bigger than , then the probability that the difference in the actual number of times we get a 1 is not at least will be small, so we can detect the difference between the two with high probability by counting how many 1s each one gives. This happens when is proportional to . Speaking a little roughly, if the probabilities differ by , then you need at least runs of the experiment and at most runs to tell the difference between random and pseudorandom, where and are fixed polynomial functions. Since running the experiment times doesn’t affect the complexity of the detection process by more than a polynomial amount when depends polynomially on , we might as well set : if you prefer a bigger you can get it by repeating the experiment, and there is nothing to be gained from a smaller since the difference between random and pseudorandom is already hard to detect.

Intuitively, a pseudorandom generator is a function from a small Boolean cube to a big one whose output “looks random”. The formal definition is making precise what “looks random” means. I took it to mean “looks random to a computer program that runs in polynomial time” but one can of course use a similar definition for *any* model of computation, or indeed any class whatever of potential distinguishing functions. If no function in that class can distinguish between random functions and images of with reasonable probability, then is pseudorandom for that class.

A pseudorandom generator produces a small subset of (technically it’s a multiset, but this isn’t too important) with the property that it is hard to distinguish between a random string in and a purely random string. However, sometimes we want more than this. For example, sometimes we would like to find a function from to that “looks random”. We could of course think of such a function as a 01 string of length (that is, as a list of the values taken by the function at each point of ) and use a pseudorandom generator to generate it, but that will typically be very inefficient.

Here is a much better method, which was devised by Goldreich, Goldwasser and Micali. (Their paper is here.) Let be a pseudorandom generator. (Later I’ll be more precise about how hard it needs to be.) We can and will think of this as a pair of functions , each one from to . If we are now given a string , we can use it and the functions and to define a function . We simply take the composition of s and s that correspond to . That is, . To put that another way, you use the digits of to decide which of and to apply. For instance, if , then you apply , then , then , then , then .

What we actually want is a function to , but if we take the first digit then we’ve got one. We can think of it as a function of two variables: given and , then equals the first digit of . But a function of two variables can also be thought of as a collection of functions of one variable in two different ways. We’ve thought so far of as indexing functions and as being the argument, but now let’s switch round: we’ll take as the index and as the argument. That is, let’s write for , which is itself just the first digit of .
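The construction of Goldreich, Goldwasser and Micali described above can be sketched as follows. The length-doubling generator here is faked with SHA-256, purely as a stand-in with no hardness claim; the names `g` and `ggm` and the 16-bit seed length are illustrative choices, not the post’s notation.

```python
import hashlib

N = 16  # seed length n in bits (illustrative)

def g(seed: bytes, bit: int) -> bytes:
    """Stand-in for one half of a length-doubling generator: bit=0
    plays the role of the first n output bits, bit=1 the last n."""
    return hashlib.sha256(bytes([bit]) + seed).digest()[: N // 8]

def ggm(x: bytes, y: str) -> int:
    """The function indexed by x: walk down the binary tree along the
    digits of y, then take the first bit of the final seed."""
    seed = x
    for b in y:              # each digit of y picks which half to apply
        seed = g(seed, int(b))
    return seed[0] >> 7      # first bit of the result

x = b"\x01\x02"              # the random index string
print(ggm(x, "0110"), ggm(x, "0111"))
```

Note the role switch described above: x is fixed once and for all as the key, and y ranges over inputs, so each x gives a function on n-bit strings computable with n generator evaluations.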

We now have two probability distributions on the set of all functions from to . One is just the uniform distribution — this is what we mean by a random function. The other is obtained by choosing a random string of length , applying the composition to it and taking the first digit — this is what we mean by a pseudorandom function (associated with this particular construction).

Note that we are now in a very similar situation to the one we were in with 3-bit scramblers earlier. We have a small bunch of efficiently computable functions — the functions — and it is hard to distinguish between those and entirely random functions. But now we shall be able to prove that it is hard, subject to widely believed hardness hypotheses. Also, even if you don’t believe those hypotheses, the reduction to them is interesting.

How easily can be computed? Well, given , we have to calculate the result of applying to a composition of functions, each of which is a polynomial-time-computable function from to . So the number of steps we need is for some polynomial . This is polynomial in provided that is polynomial in . Accordingly, we take for some large constant .

The idea now is to show that if we can distinguish between a random function of the form and a genuinely random function, then the pseudorandom generator is not after all very hard: in fact, it will have hardness at most , which is substantially less than exponential in . Since the pseudorandom generator was arbitrary, this will show that no pseudorandom generator of that hardness exists.

By the way, let me draw attention to the parts of this proof that have always caused me difficulty (though I should say again that it’s the kind of difficulty that can be overcome if one is sufficiently motivated to do so — I’ve just been lazy about it up to now). The first is the point about the roles of the two variables and above and the way those roles switch round. Another is a wrong argument that has somehow made me feel that what is going on must be subtler than it actually is. That argument is that a pseudorandom generator is a function defined on , so its hardness is a reasonable function of , while the kind of pseudorandomness we’re interested in takes place at the level of Boolean functions defined on , which have domains of size , so breaking those in polynomial time in will surely have no bearing on the far smaller function that makes the pseudorandom generator.

I didn’t express that wrong argument very well — necessarily, since it’s wrong — but the thing I’ve been missing is that is quite large compared with , and we are making really quite a strong assumption about the hardness of the pseudorandom generator. Specifically, we’re not just assuming that the generator has superpolynomial (in ) hardness: we’re assuming that its hardness is at least for some small positive constant . That way the hardness can easily be comparable to . So there isn’t some clever way of “dropping down a level” from subsets of to subsets of or anything like that.

The third thing that got in the way of my understanding the proof *is* connected with levels. It’s surprising how often something easy can feel hard because it is talking about, say, sets of sets of sets. Here it is important to get clear about what we’re about to discuss, which is a sequence of probability distributions of Boolean functions from to . This shouldn’t be *too* frightening, since we’ve already discussed two such probability distributions: the uniform distribution and the distribution where you pick a random and take the function . What we’re going to do now is create, in a natural way, a sequence of probability distributions that get gradually more and more random. That is, we’ll start with the distribution that’s uniform over all the , and step by step we’ll introduce more randomness into the picture until after steps we’ll have the uniform distribution over all functions from to .

I’m going to describe the sequence of distributions in a slightly different way from the way that Razborov and Rudich describe it. (In particular, their distributions start with the uniform distribution and get less and less random.) I don’t claim any particular advantage for my tiny reformulation — I just find it slightly easier. I also feel that if I’ve reworked something in even a minor way then it’s a sign that I understand it, so to a large extent I’m doing it for my own benefit.

First of all, let us take the set of binary sequences of length less than or equal to . These sequences form a binary tree if we join each sequence to the two sequences you get by appending a 0 and a 1. Let us take a sequence of trees , where consists just of the root of (that is, the empty sequence) and is the full tree , and let us do so in such a way that each is obtained from by finding a leaf and adding its children: that is, picking a sequence in that is not a sequence of length and is not contained in any other sequence in and adding the sequences and . It is not hard to see that this can be done, and since we add two sequences at a time and get from the tree that consists just of the empty sequence to the tree of all binary sequences of length at most , there are trees in this sequence of trees.
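The leaf-expansion process just described can be carried out mechanically. Here is a small sketch; representing a tree as a set of binary strings, with "" as the root, is my choice, not the post’s.

```python
# Build the sequence of trees T_0, ..., each obtained from the previous
# one by picking a vertex of length < n that has no children yet and
# adding both of its children.  A tree is a set of binary strings.

def tree_sequence(n):
    trees = [{""}]
    while True:
        t = set(trees[-1])
        expandable = [s for s in t if len(s) < n and s + "0" not in t]
        if not expandable:
            break                      # t is now the full tree of depth n
        s = min(expandable, key=len)   # any choice of vertex works
        t.update({s + "0", s + "1"})
        trees.append(t)
    return trees

trees = tree_sequence(3)
print(len(trees))      # 8 = 2^3 trees: one new tree per pair of children
print(len(trees[-1]))  # 15 = 2^4 - 1 vertices in the full tree
```

Since the full tree of depth n has 2^(n+1) - 1 vertices and each step adds two, the sequence contains exactly 2^n trees.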

Given a tree , we create a probability distribution on the set of functions from as follows. For every we let be maximal such that the subsequence belongs to . We then take a random , apply the composition , and take the first digit of the result. If then we interpret this to mean that we simply pick a random . An important point to be clear about here is that the random points *do not depend on* . So what I should really have said is that for each vertex of we pick a random . Then for each we find the maximal subsequence that belongs to , apply to the composition , and pass to the first digit. If we interpret the composition as the identity function, so we simply take .

Note that if , then this is just applying the composition to and taking the first digit, which is exactly what it means to take a random function of the form . Note also that if , then for every all we do is take , which is another way of saying that whatever is, we choose a random and take its first digit, which of course gives us a function from to chosen uniformly at random. So it really is the case that the first distribution in our sequence is uniform over all and the last distribution is uniform over all functions from to .
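The hybrid distributions can be sketched concretely too. This again fakes the generator with SHA-256 (no hardness claim) and reuses the set-of-strings tree representation; the per-vertex random seeds are drawn once, independently of the input x, exactly as the paragraph above insists.

```python
import hashlib
import random
from itertools import product

N = 16  # seed length in bits (illustrative)

def g(seed: bytes, bit: int) -> bytes:
    """Stand-in for one half of a length-doubling generator."""
    return hashlib.sha256(bytes([bit]) + seed).digest()[: N // 8]

def hybrid_function(tree, n, rng):
    """Sample one function from the distribution attached to `tree`,
    a prefix-closed set of binary strings containing the root ""."""
    # one random seed per vertex, chosen before any input x is seen
    seeds = {v: bytes(rng.randrange(256) for _ in range(N // 8))
             for v in tree}

    def f(x: str) -> int:
        # the maximal initial segment of x that lies in the tree
        prefix = max((x[:i] for i in range(n + 1) if x[:i] in tree),
                     key=len)
        seed = seeds[prefix]
        for b in x[len(prefix):]:   # apply the remaining composition
            seed = g(seed, int(b))
        return seed[0] >> 7         # first digit of the result

    return f

rng = random.Random(1)
n = 4
f0 = hybrid_function({""}, n, rng)   # tree T_0: a pseudorandom function

all_strings = ["".join(t) for t in product("01", repeat=n)]
full_tree = {s[:i] for s in all_strings for i in range(n + 1)}
f_last = hybrid_function(full_tree, n, rng)  # full tree: purely random

print(f0("0101"), f_last("0101"))
```

With the one-vertex tree the whole composition is applied to a single random seed, which is the pseudorandom case; with the full tree each input gets its own independent random seed, which is the uniform case.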

Now let’s think about the difference between the distribution that comes from and the distribution that comes from . Let and be the binary sequences that belong to but not to . Let us also write for the random element of associated with any given 01-sequence . Let be the length of . Then the one thing we do differently when evaluating the random image of is this. If has as an initial segment, then instead of evaluating the first digit of we evaluate the first digit of . If does not have as an initial segment, then nothing changes.

Note that the first of these evaluations can be described as the first digit of . The basic idea now is that if we can distinguish between that and , then we can distinguish between and . But is a purely random sequence in whereas is a random output from the pseudorandom generator.

Let us now remind ourselves what we are trying to prove. Suppose that is a simplicity property that can be computed in time (which is another way of saying that it can be computed in time that is polynomial in ). By “simplicity property” I mean that holds whenever has circuit complexity at most , where is the polynomial function described earlier, and does not hold for almost all functions. Actually, we can be quite generous in our interpretation of the latter statement: we shall assume that if is a purely random function, then holds with probability at most .

If has those two properties, then holds for every , and therefore

I wrote there to mean the probability when is chosen uniformly at random, and to mean the probability when is chosen uniformly at random.

Let me now write for the probability distribution associated with . From the above inequality and the fact that there are of these distributions it follows that there exists such that

That is, the probability that holds if you choose a function randomly using the st distribution is greater by than the probability if you use the th distribution.

What we would like to show now is that this implies that the hardness of the pseudorandom generator is at most . To do that, we condition on the values of for all sequences other than , and . (Recall that was defined to be the unique sequence such that and belong to but not to .) By averaging, there must be some choice of all those sequences such that, conditioned on that choice, we still have

We now have a way of breaking the pseudorandom generator . Suppose we are given a sequence and want to guess whether it is a random sequence of the form (with chosen uniformly from ) or a purely random element of . We create a function from to as follows. For each , let be the maximal initial segment of that belongs to . If is not equal to , then take the first digit of , where is the length of and is the fixed sequence from the set of sequences on which we have conditioned. If and has length , then apply to the left half of if the next digit of is 0 and to the right half of if it is 1. Then take the first digit of the result.

If is a random sequence of the form , then what we are doing is choosing a random and taking the first digit of . Therefore, we are choosing a random function according to the distribution , conditioned on the choices of . If on the other hand is a purely random sequence in , then we are choosing a random function according to the distribution under the same conditioning. Since the probabilities that holds differ by at least and can (by hypothesis) be computed in time , it follows that the hardness of is at most .

Since for an arbitrarily small constant , it follows that if there is a polynomial-time-computable property that distinguishes between random and pseudorandom functions from to , then no pseudorandom generator from to can have hardness greater than .

A small remark to make at this point is that the hardness of the generator needs to be defined in terms of circuit complexity for this argument to work. Basically, this is because it is not $C$ itself that we are using to distinguish between random and pseudorandom sequences but a function that is created out of $C$ (using in particular the conditioned values $v_t$, whose existence we obtained by averaging rather than by any explicit construction) in a not necessarily uniform way. So even if $C$ can be computed in polynomial time, it does not follow that there is an *algorithm* (as opposed to circuit) that will break the generator in time $2^{O(n)}$.

Recall that earlier I proposed a way of getting round the natural-proofs barrier and proceeded to argue that it almost certainly failed, for reasons very similar to the reasons for the natural-proofs barrier itself. The question I would like to consider here is whether that argument can be made to rely on the hardness of factorizing rather than on the hardness of a problem based on 3-bit scramblers that does not, as far as I know, connect to the main body of hardness assumptions that are traditionally made in theoretical computer science.

Here is an informal proposal for doing so. Let $n$ be a positive integer, let $N=\binom{n}{2}$, identify the points of $\{0,1\}^N$ with graphs on $n$ (labelled) vertices in some obvious way, and let $f_0$ take the value 1 if the corresponding graph contains a clique of size $m$ (for some fixed $m$) and 0 otherwise. Also, for $1\le i\le N$, let $f_i$ be the function that is 1 if the $i$th edge belongs to the graph and 0 otherwise. Those functions are chosen so that the function $F=(f_0,f_1,\dots,f_N):\{0,1\}^N\to\{0,1\}^{N+1}$ is (trivially) an injection.

Now there is a concept in theoretical computer science called a *pseudorandom permutation* from $\{0,1\}^{N+1}$ to $\{0,1\}^{N+1}$. I’ll define it properly in a moment, but for now it’s enough to say that if you try to invent the definition for yourself, you’ll probably get it right. Roughly, however, it’s a permutation of $\{0,1\}^{N+1}$ that depends on a randomly chosen string in $\{0,1\}^k$ for some suitable $k$ and is hard to distinguish from a purely random permutation of $\{0,1\}^{N+1}$. Importantly, pseudorandom permutations exist if pseudorandom functions exist (I’ll discuss this too), as was shown by Luby and Rackoff.
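Since the Luby–Rackoff construction is mentioned but not spelled out, here is a minimal sketch of the idea in Python: three Feistel rounds built from a pseudorandom function give an efficiently invertible permutation. The use of SHA-256 as a stand-in for the pseudorandom function, and all the names below, are my illustrative assumptions, not part of the theorem itself.

```python
import hashlib

def prf(key: bytes, round_no: int, half: bytes) -> bytes:
    # Stand-in pseudorandom function (an illustrative assumption: the real
    # Luby-Rackoff theorem starts from an arbitrary pseudorandom function).
    digest = hashlib.sha256(key + bytes([round_no]) + half).digest()
    return digest[:len(half)]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def feistel(key: bytes, block: bytes) -> bytes:
    # Three Feistel rounds turn a (generally non-invertible) function
    # into a permutation of the set of even-length blocks.
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for r in (0, 1, 2):
        left, right = right, xor(left, prf(key, r, right))
    return left + right

def feistel_inv(key: bytes, block: bytes) -> bytes:
    # The inverse just runs the rounds backwards, so it is exactly as
    # easy to compute as the forward direction.
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for r in (2, 1, 0):
        left, right = xor(right, prf(key, r, left)), left
    return left + right
```

The point of the round structure is visible in the code: each round is invertible whatever `prf` is, so the composition is a permutation, and both directions cost the same three calls to `prf`.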

So let’s compose the function $F$ with a pseudorandom permutation $\sigma$ of $\{0,1\}^{N+1}$, obtaining a function $\sigma\circ F:\{0,1\}^N\to\{0,1\}^{N+1}$.

Actually, one thing you might not guess if you try to define a pseudorandom permutation is that the permutation *and its inverse* should both be efficiently computable functions. Because of that, if we are provided with the value of $\sigma\circ F$ at a graph $G$, we can easily tell whether or not $G$ contains a clique of size $m$: we just compute $\sigma^{-1}(\sigma\circ F(G))$, which gives us $F(G)$, and look at the first digit.

Now let’s suppose that we have a progress-towards-cliques property $P$ that takes the value 1 for a sequence of functions $(g_0,\dots,g_N)$ if, given $(g_0(G),\dots,g_N(G))$, it is easy to determine whether $G$ contains a clique of size $m$, and suppose that $P$ does not apply to almost all sequences of functions. That is, let us suppose that if $(g_0,\dots,g_N)$ is a purely random sequence of functions (subject to the condition that the resulting function to $\{0,1\}^{N+1}$ is an injection) then the probability that it satisfies $P$ is at most 1/2, say.

Next, suppose that we have a permutation $\sigma$ and we want to guess whether it has been chosen randomly or pseudorandomly. Composing it with the function $F$ and applying $P$ to the resulting sequence of functions, we get 1 if $\sigma$ has been chosen pseudorandomly. Note that to do this calculation we need at most $2^{O(N)}$ steps to calculate $\sigma\circ F$ and, if $P$ is polynomial-time computable (as a function of its input, which has size around $2^N$), at most $2^{O(N)}$ steps to determine whether $P$ holds. So if $\sigma$ has hardness at least $2^{CN}$ for a large enough constant $C$, it follows that the probability that a random injection yields functions that satisfy $P$ is at least $1-o(1)$. For sufficiently large $N$, this is a contradiction.

I’m fairly confident that I can make the above argument precise and rigorous, but it may be a known result, or folklore, or uninteresting for some reason I haven’t noticed. If anyone who knows what they are talking about thinks it’s worth my turning it into a note and at least putting it on the arXiv, then I’m ready to do that, but perhaps it is better left in its current slightly informal state.


I have a secondary motivation for the posts, which is to discuss a way in which one might try to get round the natural-proofs barrier. Or rather, it’s to discuss a way in which one might initially think of trying to get round it, since what I shall actually do is explain why a rather similar barrier seems to apply to this proof attempt. It might be interesting to convert this part of the discussion into a rigorous argument similar to that of Razborov and Rudich, which is what prompts me to try to understand their paper properly.

But first let me take a little time to talk about what the result says. It concerns a very natural (hence the name of the paper) way that one might attempt to prove that P does not equal NP. Let $\mathcal{F}_n$ be the set of all Boolean functions $f:\{0,1\}^n\to\{0,1\}$. Then the strategy they discuss is to show on the one hand that all functions in $\mathcal{F}_n$ that can be computed in fewer than $n^k$ steps have some property $\mathcal{S}$ of “simplicity”, and on the other hand that some particular function in NP does not have that simplicity property.

Now if one wants to design a proof along those lines, it is important that the simplicity property shouldn’t be *trivial*. By that I mean that it shouldn’t be a property such as “can be computed in fewer than $n^k$ steps”. The good news about that kind of property is that it is probably true that it distinguishes between your favourite NP-complete function and functions that can be computed in fewer than $n^k$ steps. But the obvious bad news is that proving that this is the case is trivially equivalent to the problem we are trying to solve.

The moral of that silly example is that we are looking for a property that is in some sense *genuinely different* from the property of being computable in at most $n^k$ steps. Without that, we’ve done nothing.

There are plenty of other silly examples, like “agrees with a function computable in fewer than $n^k$ steps on 99% of inputs”, which are slightly less trivial but unsatisfactory in exactly the same way. So what we really want is some kind of simplicity property that isn’t obviously to do with how easy a function is to compute — it is that air of tautology that makes certain properties useless for our purposes.

Now one way that we might hope to make the simplicity property non-trivial in this sense is if it is somehow simpler to deal with than the property of being computable in a certain number of steps.

Let me pause here to stress that there are two levels at which I am talking about simplicity here. One is the level of Boolean functions: we want to show that some Boolean functions are simple and some are not, according to some as yet unformulated definition of simplicity. The other is the level of *properties* of Boolean functions: we want our simplicity property to be in some sense simple itself, so that it doesn’t have the drawback of the tautologous examples. So one form of simplicity concerns subsets of $\{0,1\}^n$ (which are equivalent to Boolean functions in $\mathcal{F}_n$) and the other concerns subsets of $\mathcal{F}_n$, which can be identified with subsets of $\{0,1\}^{2^n}$, a $2^n$-dimensional discrete cube.

Why do we want the simplicity property to be itself simple? There are two potential advantages. One is that it will be easier to prove that some Boolean functions are simple and other Boolean functions are not simple if the simplicity property is not too strange and complicated. The other is that if the simplicity property is simple, then it will give us some confidence that it is not one of the semi-tautologous properties that get us nowhere.

This second point isn’t obvious — how do we know that the property of being computable in at most $n^k$ steps corresponds to a very complicated subset of the $2^n$-dimensional Boolean cube? The result of Razborov and Rudich gives a surprisingly precise answer to this question, but if you just want to convince yourself that it is probably true, then a short and easy nonrigorous argument is enough, and provides a good introduction to the slightly longer rigorous argument of Razborov and Rudich.

The basic philosophy behind the argument is this: a random efficiently computable function is almost impossible to distinguish from a random function. So if we let $\mathcal{P}_n$ be the subset of $\mathcal{F}_n$ that consists of all Boolean functions computable in at most $n^k$ steps, then $\mathcal{P}_n$ looks very like a random subset of $\mathcal{F}_n$. (Recall that $\mathcal{F}_n$ is the set of *all* Boolean functions on $\{0,1\}^n$, so it is a set of size $2^{2^n}$.)

Let me briefly argue *very* nonrigorously (this is not the nonrigorous argument I was talking about two paragraphs ago, but an even vaguer one). A property of Boolean functions can be identified with a subset of $\mathcal{F}_n$. A *simple* property of Boolean functions can therefore be thought of as a simple subset of $\mathcal{F}_n$. A very general heuristic tells us that if $X$ is a set, $A$ is a simple subset of $X$ of density $\alpha$ and $B$ is a “random-like” subset of $X$ of density $\beta$, then $A\cap B$ has density roughly $\alpha\beta$ in $X$. That is, there is almost no correlation between a simple set and a random-like set. If we were to say “random” instead of “random-like”, then this kind of statement can often be proved using an easy counting argument: for each $A$, the probability that $A\cap B$ has density significantly different from $\alpha\beta$ is very small. (I’m taking $B$ to be a random set of density $\beta$.) Since there aren’t very many simple sets, most sets $B$ have the property that they do not correlate with *any* simple sets.

Suppose that that argument transferred from random sets to “random-like” sets and that the set $\mathcal{P}_n$ of functions computable in at most $n^k$ steps is a “random-like” subset of $\mathcal{F}_n$. That will tell us that if $\mathcal{S}$ is any simple simplicity property, then the probability that a random function in $\mathcal{P}_n$ has property $\mathcal{S}$ is almost the same as the probability that a random function in $\mathcal{F}_n$ has property $\mathcal{S}$. It follows that if all functions in $\mathcal{P}_n$ are simple (as we want for the proof strategy to get off the ground), then almost all functions in $\mathcal{F}_n$ must be simple. But that’s saying that a *random* function should be simple (with high probability), which hardly sounds like the sort of simplicity property we know and love.

The statement of Razborov and Rudich’s main theorem is starting to take shape. What the above argument suggests is that if we want to use a simplicity property $\mathcal{S}$ to show that $P\ne NP$ then we have an unwelcome choice: either $\mathcal{S}$ has to be a strange and complicated property or almost all Boolean functions must have property $\mathcal{S}$. Razborov and Rudich formulate a precise version of this statement and prove it subject to the assumption that pseudorandom generators exist — an assumption that is widely believed to be true.

The previous section leaves a number of unjustified statements. Before I attempt to justify them, let me make two remarks. The first is that I haven’t said what I mean by the set of all functions computable in at most $n^k$ steps. Am I putting some bound on the size of the Turing machine that does the computation? If so, how?

The second remark is that for the general idea to be valid (that simple simplicity properties won’t distinguish between efficiently computable functions and arbitrary functions), it is enough if we can find *some* set $\mathcal{P}_n$ of efficiently computable functions and convince ourselves that it looks like a random set. It doesn’t matter whether $\mathcal{P}_n$ is the set of “all” efficiently computable functions, so we don’t have to decide what “all efficiently computable functions” even means. So that deals with the problem just mentioned.

I now want to describe a set of functions of low circuit complexity. This is not the same as low computational complexity, so the remarks I am about to make concern the question of how one might distinguish between NP and the class of functions of polynomial circuit complexity. Since functions computable in polynomial time can be computed with polynomial-sized circuits, this would be enough to show that $P\ne NP$; indeed, it is one of the main strategies for showing it.

Let us define a *3-bit scrambler* to be a function $\sigma:\{0,1\}^n\to\{0,1\}^n$ of the following form. Let $A$ be a subset of $\{1,2,\dots,n\}$ of size 3, and assume for convenience that $A=\{1,2,3\}$. Let $\pi$ be a permutation of $\{0,1\}^3$. (That is, it takes the eight points in $\{0,1\}^3$ and permutes them in some way — it doesn’t matter how.) Then $\sigma$ takes an $n$-bit Boolean sequence $x$ and “does $\pi$ to $(x_1,x_2,x_3)$“. I hope that that informal definition will be enough for most people, but if you want a formal definition then here goes. Let’s define $p$ to be the projection that takes an $n$-bit sequence $x$ to the sequence $(x_1,x_2,x_3)$, and let’s define $j$ to be the “insertion” that takes a pair of sequences $x$ and $(y_1,y_2,y_3)$ and replaces the bits $x_1,x_2$ and $x_3$ by $y_1,y_2$ and $y_3$, respectively. Finally, if $x$ is an $n$-bit sequence, define $\sigma(x)$ to be $j(x,\pi(p(x)))$. In other words, we isolate the bits in $A$, apply the permutation $\pi$, and then stick the resulting three bits back into the slots where the original three bits came from.

A simple example of a 3-bit scrambler is the map that takes an $n$-bit sequence and performs the following operation. If the first three bits are $011$, then it replaces them by $101$; if the first three bits are $101$, then it replaces them by $011$; otherwise it does nothing.
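To make the formal definition concrete, here is a short sketch in Python (the helper names are mine, and the particular permutation below is just an arbitrary illustrative choice):

```python
from itertools import product

def make_scrambler(A, pi):
    """Return the permutation of {0,1}^n that applies pi to the three
    bits indexed by A and leaves every other bit alone."""
    def scramble(x):
        bits = tuple(x[i] for i in A)   # project onto the coordinates in A
        new_bits = pi[bits]             # apply the permutation of {0,1}^3
        y = list(x)
        for i, b in zip(A, new_bits):   # insert the permuted bits back
            y[i] = b
        return tuple(y)
    return scramble

# An example scrambler: swap the patterns 011 <-> 101 on the first three
# bits and leave every other pattern alone.
pi = {t: t for t in product((0, 1), repeat=3)}
pi[(0, 1, 1)], pi[(1, 0, 1)] = (1, 0, 1), (0, 1, 1)
sigma = make_scrambler((0, 1, 2), pi)
```

Since `pi` here is an involution, applying `sigma` twice returns any sequence to where it started; an arbitrary choice of `pi` gives a scrambler whose inverse is the scrambler built from the inverse permutation.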

It is easy to see that any 3-bit scrambler can be created using a circuit of bounded size. Therefore, a composition of $m$ 3-bit scramblers has circuit complexity at most $Cm$ for some absolute constant $C$.

What’s nice about 3-bit scramblers is that they give us a big supply of pretty random looking functions of low circuit complexity: you just pick a random sequence of 3-bit scramblers and compose them. That gives you a function from $\{0,1\}^n$ to $\{0,1\}^n$, but if you want a function from $\{0,1\}^n$ to $\{0,1\}$ you can simply take the first digit.

Now I would like to convince you, with a complete absence of anything so vulgar as an actual proof, that a random function created in this way is hard to distinguish from a genuinely random function. Let’s think about what a 3-bit scrambler looks like geometrically. If we have the scrambler $\sigma$ given by $A$ and $\pi$, then there is a sense in which what it does depends only on the bits in $A$. But what is that sense, since the image of a sequence $x$ depends on all the bits of $x$? A nice way to look at it is this. The Boolean cube $\{0,1\}^n$ can be partitioned into eight parts according to the values at the three bits in $A$. Each of these parts is a subcube of codimension 3. The effect of $\sigma$ is to apply a permutation to those eight parts, which it carries out in the simplest way possible. For example, if part X is to move to part Y, then it is simply translated there: the bits in $A$ are changed but the bits outside $A$ are not changed. So you chop up the big cube into eight pieces and swap those pieces around without rotating them or altering their internal structure in any way.

I like to think of this as a sort of gigantic Rubik’s cube operation. The analogy is not perfect, since rotation does take place in a Rubik’s cube. However, what the two situations have in common is a set of fairly simple permutations that can combine to create much more complicated ones. In fact, the 3-bit scramblers generate every even permutation of the set $\{0,1\}^n$. This isn’t obvious, but isn’t a massively hard result either. It is false for 2-bit scramblers, because those are all affine over $\mathbb{F}_2$.

Consider now the following problem: you are given a scrambled Rubik’s cube and asked to unscramble it in at most 15 moves. The worst positions are known to need 20 moves. Of course, I’m assuming that at most 15 moves have been used for the scrambling — in fact, let’s assume that those 15 moves were selected randomly. As far as I know, finding an economical unscrambling is a hard problem, one that in general you shouldn’t expect to be able to solve except by brute force. A good reason for expecting it to be hard is that it’s very much in the territory of problems that are known to be not just hard but impossible, such as solving the word problem in groups.

And now consider a closely related problem: you are given a Rubik’s cube to which a random 15 moves have been applied, and another Rubik’s cube that is scrambled uniformly at random (that is, it is in a random position chosen uniformly from all positions reachable from the starting configuration), and are asked to guess which is which. Is there some quick way of making a guess that is significantly better than chance?

If you agree that the answer is probably no, then you should be even readier to agree that the answer is no for the corresponding problem concerning 3-bit scramblers, since those are all the more complicated. But I suppose I shouldn’t say that without providing a little bit of evidence that they really are complicated. For that I’ll refer to a paper of mine that was published in 1996, where I showed that if you compose a random sequence of $k$ 3-bit scramblers, then the resulting permutation of the Boolean cube $\{0,1\}^n$ is *almost $m$-wise independent* for some $m$ that depends in a power-type way on $n$ and $k$, meaning that if you choose any $m$ distinct sequences, then their images are approximately uniformly and independently distributed. This gives a reasonably strong sense in which a random composition of 3-bit scramblers looks like a random permutation of $\{0,1\}^n$. Of course, it’s a long way from a proof that a random composition of 3-bit scramblers cannot be efficiently distinguished from a random permutation, but that’s not something we’re going to be able to prove any time soon, since it would imply that $P\ne NP$. However, it is a reassuring piece of evidence: although the idea that these random scramblings are hard to distinguish from genuinely random functions is quite plausible, it is good to have some reason to believe that this plausibility is not a mirage.

It is important to be clear here about what “hard to distinguish” means, so let’s pause for a moment and think how we could distinguish between random compositions of 3-bit scramblers and genuinely random even permutations of $\{0,1\}^n$. (Again, if you want to talk about functions to $\{0,1\}$ instead, then take first digits. It doesn’t affect the discussion much.) To be precise about what the problem is asking, you are given two even permutations of $\{0,1\}^n$, one a random composition of 3-bit scramblers and the other an even permutation chosen uniformly at random. Your task is to guess which is which with a probability significantly better than 1/2 of being correct. The question is how much computer power you need to do that.

The only obvious strategy is brute force: you look at every composition of 3-bit scramblers and see whether any of the resulting permutations is equal to one of the two permutations you’ve been given. If it is, then with very high probability that’s the one that was not chosen purely randomly. (It’s possible, but extraordinarily unlikely, that a purely random even permutation just happens to be a composition of 3-bit scramblers.)

The number of compositions of $m$ 3-bit scramblers is around $n^{cm}$, which is bigger than exponential, so this strategy is very expensive indeed. In fact, it’s superpolynomial not just in $n$ but also in $2^n$, which is a more appropriate measure, since to specify the problem we need to specify in the region of $n2^{n+1}$ bits of information: the values taken by the two permutations. (It’s actually more like $2\log_2(2^n!)$, though that’s a slight overestimate since we know that both functions are even permutations.)
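In case it is helpful, here is the back-of-envelope count behind these estimates (writing $m$ for the length of the compositions considered, with $C$ an absolute constant):

```latex
\underbrace{\binom{n}{3}\cdot 8!}_{\text{choices for one scrambler}} \le C n^{3},
\qquad
\#\{\text{compositions of length } m\} \le (Cn^{3})^{m} = n^{(3+o(1))m}.
```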

What is $n^{cm}$ in terms of $N$? Well, let’s write $N=2^n$. Then $n^{cm}=2^{cm\log n}$ (here $c$ is a constant that can vary from expression to expression), so $n^{cm}=N^{cm\log n/n}$. A polynomial function of $N$ takes the form $N^c$, so for large $m$ this is distinctly bigger.

I said that this part of the post would not be rigorous, but that is slightly misleading, since I *have* just proved something rigorous: that *if* being able to detect the output of a 3-bit scrambler with probability better than chance is a hard problem, in the sense that the best algorithm is not much better than brute force, then the ugly choice described earlier really is necessary: if you want a property that distinguishes between functions computable by polynomial-sized circuits and arbitrary functions, then either that property will have to be one that cannot be computed in polynomial time (as a function of $2^n$) or it will have to apply to almost all functions.

The drawback with this argument is that its interest depends on the unsupported assertion that the 3-bit-scrambler problem is hard. What Razborov and Rudich did was similar, but they used a different assertion — also unproved, but more convincingly supported — namely that factorizing is hard.

Before I get on to how Razborov and Rudich did that, I want to discuss an approach to showing that $P\ne NP$ that initially appears to get round the difficulty I’ve just described. Recall that the difficulty is this. If $\mathcal{S}$ is a property of Boolean functions that applies to all functions of circuit complexity at most $n^k$, then if certain problems that look very hard really are very hard, it follows that either $\mathcal{S}$ is not computable in polynomial time (as a function of $2^n$) or $\mathcal{S}$ applies to almost all functions.

In the latter case, it seems unreasonable to think of $\mathcal{S}$ as a “simplicity” property. But so what? Do we need a simplicity property? Another idea is to have what one might think of as a “making-progress” property. Suppose, for example, that we are trying to prove that the problem of detecting whether a graph has a clique of size $m$ is of superpolynomial circuit complexity. Perhaps we could define some kind of measure that we could apply to Boolean functions, such that the higher that measure was, the more information the Boolean functions would, in some sense, contain about which graphs contained cliques of size $m$ and which did not.

There is a well-known argument that instantly kills this idea. Let’s suppose that our measure of progress towards detecting cliques is not completely stupid. In that case, a random function will, with very high probability, have made absolutely no progress towards detecting cliques. But now let $f$ be the function that’s 1 if your graph contains a clique of size $m$ and 0 otherwise, and let $g$ be a Boolean function chosen uniformly at random. Then the function $h=g\oplus f$ is also a Boolean function chosen uniformly at random. So $g$ and $h$ have, individually, made no progress whatever towards detecting cliques. However, $f=g\oplus h$, so in one very simple operation, the exclusive OR, we get from no progress at all to the clique function itself.
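This XOR observation can be checked mechanically on small examples. The sketch below (all names are mine) builds the triangle-detection function on graphs with 4 labelled vertices, masks it with a uniformly random function, and recovers it:

```python
import random
from itertools import product, combinations

n = 4                                    # graphs on 4 labelled vertices
edges = list(combinations(range(n), 2))  # the 6 possible edges
graphs = list(product((0, 1), repeat=len(edges)))

def clique(g):
    # 1 if the graph contains a triangle (a clique of size 3), else 0.
    present = {e for e, bit in zip(edges, g) if bit}
    return int(any({(a, b), (a, c), (b, c)} <= present
                   for a, b, c in combinations(range(n), 3)))

f = {g: clique(g) for g in graphs}                  # the clique function
g_rand = {g: random.randint(0, 1) for g in graphs}  # uniformly random function
h = {g: g_rand[g] ^ f[g] for g in graphs}           # also uniformly random

# Neither g_rand nor h alone says anything about cliques, yet their XOR
# recovers the clique function exactly:
assert all(g_rand[x] ^ h[x] == f[x] for x in graphs)
```

The point, as in the text, is that `h` is distributed exactly like a fresh uniform function (XOR with a fixed function preserves the uniform distribution), so no sensible per-function progress measure can separate `g_rand` and `h` from random noise.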

But does that really kill the idea? A natural response to this example is to think not about individual functions but about *ensembles* of functions. Is there a useful sense in which, while neither $g$ nor $h$ on its own carries any information about whether graphs contain cliques, the pair $(g,h)$ does?

There is obviously *some* sense in which the pair contains this information, since if you are given the functions $g$ and $h$ you can easily determine whether a graph contains a clique of size $m$. However, we would like to generalize this very simple example. Here is a strategy one might try to adopt to prove that $P\ne NP$.

1. Choose your favourite NP-complete function, such as the clique function.

2. Define a “clique usefulness” property on ensembles of functions: roughly speaking this would tell you, given a set of Boolean functions, whether it had any chance of helping you determine in a short time whether a graph contains a clique of size $m$.

3. Prove that the set of coordinate functions (that is, the functions $e_i$ defined by $e_i(x)=x_i$) does not have the clique usefulness property.

Note that if we do things backwards like this, focusing very much on the target (to detect cliques) rather than the initial information (whether or not each edge belongs to the graph), then the property of “getting close to the target” is naturally small. So could we use this kind of idea to get round the difficulty that any reasonably simple simplicity property has to apply to almost all functions?

I think the answer is no, for reasons that are fairly similar to the reasons discussed in the previous section. Again I’ll use 3-bit scramblers to make my point. Let’s suppose that we have a property $U$ that applies to ensembles of functions, and that measures, in some sense, “how much information they contain about cliques”. Now let me define a collection of ensembles of functions using 3-bit scramblers. I’ll start with the clique function itself, which I’ll call $f_0$, and I’ll also take some random Boolean functions $f_1,\dots,f_N$. (It isn’t actually important that there are $N$ of them, but there should be around that many.) Putting those functions together gives me a function $F=(f_0,f_1,\dots,f_N)$ from $\{0,1\}^N$ to $\{0,1\}^{N+1}$. Now I’ll simply compose $F$ with a random composition of 3-bit scramblers. That is, I’ll let $\sigma_1,\dots,\sigma_k$ be random 3-bit scramblers (with each $A_i\subseteq\{1,2,\dots,N+1\}$) and I’ll define $g_j$ to be the Boolean function $x\mapsto(\sigma_k\cdots\sigma_1F(x))_j$.

Suppose I know $\sigma_1,\dots,\sigma_k$ and the functions $g_0,\dots,g_N$. Then it is easy to reconstruct $F$, since I can just take the composition $\sigma_1^{-1}\cdots\sigma_k^{-1}(g_0,\dots,g_N)$. Thus, if I am given the values $g_0(G),\dots,g_N(G)$ for a graph $G$, then with the help of a polynomial-sized circuit (to calculate the composition of the inverses of the 3-bit scramblers) I can reconstruct $F(G)$. Taking the first digit, I find out whether or not my graph contains a clique of size $m$.
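To see how mechanical this unscrambling is, here is a sketch in Python in which each scrambler is stored as a lookup table on a small cube, so that inverting the composition is trivial. The representation and all the names are my own, and the cube is far smaller than the one in the text:

```python
import random

s = 8                       # dimension of the cube on which the scramblers act
points = range(2 ** s)

def random_scrambler():
    # A 3-bit scrambler, represented directly as a permutation of {0,1}^s:
    # pick three coordinates and a random permutation pi of the 8 bit-patterns.
    a, b, c = random.sample(range(s), 3)
    pi = list(range(8))
    random.shuffle(pi)
    def act(x):
        bits = ((x >> a) & 1) << 2 | ((x >> b) & 1) << 1 | ((x >> c) & 1)
        new = pi[bits]
        for pos, bit in ((a, new >> 2), (b, new >> 1), (c, new)):
            x = (x & ~(1 << pos)) | ((bit & 1) << pos)
        return x
    return [act(x) for x in points]     # the permutation as a lookup table

def invert(perm):
    inv = [0] * len(perm)
    for x, y in enumerate(perm):
        inv[y] = x
    return inv

scramblers = [random_scrambler() for _ in range(20)]

def scramble(x):                        # the composition sigma_k ... sigma_1
    for p in scramblers:
        x = p[x]
    return x

def unscramble(y):                      # sigma_1^{-1} ... sigma_k^{-1}
    for p in map(invert, reversed(scramblers)):
        y = p[y]
    return y

# Knowing the scramblers, the scrambled values are easy to undo:
assert all(unscramble(scramble(x)) == x for x in points)
```

The hardness in the text, of course, lies in the situation where the scramblers themselves are *not* known and only the scrambled ensemble is given.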

Therefore, any “clique usefulness” property is going to have to do something that looks rather hard: it must distinguish between ensembles produced in the manner just described, and genuinely random ensembles of Boolean functions. Note that what is not random about the functions $g_0,\dots,g_N$ is not the functions themselves but the very subtle dependencies between them.

There is a slightly unsatisfactory feature of this problem, which is that it depends on a very specific function, namely the clique function. Also, when we create the function $F$, we don’t create a bijection, since it is not the case that exactly half of all graphs contain a clique of size $m$. To deal with the latter criticism, let’s increase the number of random functions, so now we start with $(f_0,f_1,\dots,f_{N+t})$ for some $t$ that’s large enough that the resulting function $F$ is an injection. (It won’t have to be very large for this — linear in $N$ will be fine.) Now compose with random 3-bit scramblers, where the sets $A_i$ are subsets of $\{1,2,\dots,N+t+1\}$. The result of this is some functions $g_0,g_1,\dots,g_{N+t}$.

The problem we would now like to solve is this. Given the functions $g_0,\dots,g_{N+t}$, find a sequence of 3-bit scramblers $\sigma_1,\dots,\sigma_k$ defined on the Boolean cube $\{0,1\}^{N+t+1}$ such that, writing $\sigma$ for the composition $\sigma_k\cdots\sigma_1$ and $H$ for the function $(g_0,\dots,g_{N+t})$ (so $H=\sigma\circ F$ and $F=\sigma^{-1}\circ H$), we have that the first digit of $\sigma^{-1}H(G)$ is 1 if and only if $G$ contains a clique of size $m$.

This is a special case of the following problem. Suppose you are given sequences of points $u_1,\dots,u_r$ and $v_1,\dots,v_r$ of $\{0,1\}^s$. Does there exist a composition $\sigma$ of 3-bit scramblers such that the first digit of $\sigma^{-1}(u_i)$ is 1 for every $i$ and the first digit of $\sigma^{-1}(v_i)$ is 0 for every $i$?

Actually, that isn’t quite the problem that’s of interest, but it is very closely related. The real problem is more like this. Suppose you are given points $u_1,\dots,u_r$ and $v_1,\dots,v_r$ in $\{0,1\}^s$ and told that one of the following two situations holds. Either they have been chosen randomly, or we have chosen points $x_1,\dots,x_r$ randomly with first coordinate 1 and points $y_1,\dots,y_r$ randomly with first coordinate 0, and taken a random composition $\sigma$ of 3-bit scramblers, setting $u_i=\sigma(x_i)$ and $v_i=\sigma(y_i)$ for each $i$. Can you guess which is the case, with a chance of being correct that is significantly better than 1/2, without using vast amounts of computer power?

This doesn’t look at all easy, so it looks very much as though something rather similar to the natural-proofs statement holds in this reverse direction as well. It would say something like this. Suppose that you have some polynomially computable property $I$ (for “informativeness”) of sets of functions $(g_0,\dots,g_{N+t})$, such that $(g_0,\dots,g_{N+t})$ has property $I$ whenever the clique function (or any other NP function of your choice) can be efficiently computed given the values of $g_0,\dots,g_{N+t}$. Then almost every sequence of functions has property $I$. The “proof” is similar to the earlier argument: a polynomially computable property can’t distinguish, even statistically, between genuinely random sets of functions and random sets of functions that have been cooked up to have just enough dependence to be informative about cliques. Since all the latter must have property $I$, almost all the former must have property $I$ as well.

In the next post I’ll turn to the actual argument of Razborov and Rudich.
