The next morning kicked off (after breakfast at the place on the corner opposite my hotel, which served decent espressos) with Jim Arthur, who gave a talk about the Langlands programme and his role in it. He told us at the beginning that he was under strict instructions to make his talk comprehensible — which is what you are supposed to do as a plenary lecturer, but this time it was taken more seriously, which resulted in a higher than average standard. Ingrid Daubechies deserves a lot of credit for that. He explained that in response to that instruction, he was going to spend about two thirds of his lecture giving a gentle introduction to the Langlands programme and about one third talking about his own work. In the event he messed up the timing and left only about five minutes for his contribution, but for everybody except him that was just fine: we all knew he was there because he had done wonderful work, and most of us stood to learn a lot more from hearing about the background than from hearing about the work itself.

I’ve made a few attempts to understand the Langlands programme — not by actually studying it, you understand, but by attending general-audience talks or reading general-audience articles. It’s a bit of a two-steps-forward (during the talk) and one-step-back (during the weeks and months after the talk) process, but this was a very good lecture and I really felt I learned things from it. Some of them I immediately forgot, but have in my notes, and perhaps I’ll fix them slightly better in my brain by writing about them here.

For example, if you had asked me what the central problem in algebraic number theory is, I would never have thought of saying this. Given a fixed polynomial $f$ and a prime $p$, we can factorize $f$ into irreducibles over the field $\mathbb{F}_p$. It turns out to be inconvenient if any of these irreducible factors occurs with multiplicity greater than 1, so an initial assumption is that $f$ has distinct roots over $\mathbb{C}$ (or at least I think that’s the assumption). [Insert: looking at my notes, I realize that a better thing to write is $E$, the splitting field of $f$, rather than $\mathbb{C}$, though I presume that that gives the same answer.] But even then, it may be that over some primes there are repeated irreducible factors. The word “ramified”, which I had always been slightly scared of, comes in here. I can’t remember what ramifies over what, or which way round is ramified and which unramified, so let me quickly look that up. Hmm, that was harder than I expected, because the proper definition is to do with rings, extension fields and the like. But it appears that “ramified” refers to the case where you have multiplicity greater than 1 somewhere. For the purposes of this post, let’s say that a prime $p$ is ramified (I’ll take the polynomial $f$ as given) if $f$ has an irreducible factor over $\mathbb{F}_p$ with multiplicity greater than 1. The main point to remember is that the set of ramified primes is small. I think Arthur said that it was always finite.

So what is the fundamental problem of algebraic number theory? Well, when you decompose a polynomial into irreducible factors, those factors have degrees. If the degree of $f$ is $n$, then the degrees of the irreducible factors form a partition of $n$: that is, a collection of positive integers that add up to $n$. The question is this: which (unramified) primes $p$ give rise to which partitions of $n$?

How on earth is *that* the fundamental problem of algebraic number theory? What’s interesting about it? Aren’t number theorists supposed (after a long and circuitous route) to be solving Diophantine equations and things like that?

Arthur gave us a pretty convincing partial answer to these questions by discussing the example $f(x)=x^2+1$. The splitting field is $\mathbb{Q}(i)$ (that is, rational linear combinations of 1 and $i$) and the only ramified prime is 2. (The reason 2 is ramified is that over $\mathbb{F}_2$ we have $x^2+1=(x+1)^2$.)

Since the degree of $f$ is 2, the two partitions of the degree are $2$ and $1+1$. The first occurs if and only if $x^2+1$ cannot be factorized over $\mathbb{F}_p$, which is the same as saying that $-1$ is not a quadratic residue mod $p$. So in this case, the question becomes, “For which odd primes $p$ is $-1$ a quadratic residue?” to which the answer is, famously, all primes congruent to 1 mod 4. So Arthur’s big grown-up question is a generalization of a familiar classical result of number theory.
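To make the question concrete, here is a small Python sketch (my own illustration, not something from the talk) that computes the partition attached to a polynomial and a prime by distinct-degree factorization. The function names and the convention of storing coefficients lowest degree first are just my choices, and the code assumes that the prime is unramified, that is, that the polynomial has no repeated factors mod $p$.

```python
from itertools import zip_longest

def trim(a):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    while a and a[-1] == 0:
        a.pop()
    return a

def divmod_poly(a, b, p):
    """Quotient and remainder of a by b over F_p (b reduced mod p, nonzero)."""
    a = trim([c % p for c in a])
    inv = pow(b[-1], -1, p)          # inverse of the leading coefficient
    q = [0] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        c = a[-1] * inv % p
        q[shift] = c
        for i, bi in enumerate(b):
            a[shift + i] = (a[shift + i] - c * bi) % p
        trim(a)
    return q, a

def mulmod(a, b, m, p):
    """Product a*b reduced modulo the polynomial m, over F_p."""
    if not a or not b:
        return []
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    return divmod_poly(prod, m, p)[1]

def powmod(a, e, m, p):
    """a**e modulo m over F_p, by repeated squaring."""
    result = divmod_poly([1], m, p)[1]
    while e:
        if e & 1:
            result = mulmod(result, a, m, p)
        a = mulmod(a, a, m, p)
        e >>= 1
    return result

def gcd_poly(a, b, p):
    """A greatest common divisor of a and b over F_p (not normalized)."""
    a, b = trim(a[:]), trim(b[:])
    while b:
        a, b = b, divmod_poly(a, b, p)[1]
    return a

def factor_degrees(f, p):
    """Degrees of the irreducible factors of f over F_p.

    Assumes f is squarefree mod p (p unramified in the sense of the post).
    """
    f = trim([c % p for c in f])
    x = [0, 1]
    h = x                      # h will hold x^(p^d) mod f
    degrees, d = [], 0
    while len(f) - 1 >= 2 * (d + 1):
        d += 1
        h = powmod(h, p, f, p)
        diff = trim([(u - v) % p for u, v in zip_longest(h, x, fillvalue=0)])
        g = gcd_poly(f, diff, p)   # product of the irreducible factors of degree d
        k = (len(g) - 1) // d
        if k:
            degrees += [d] * k
            f = divmod_poly(f, g, p)[0]
            h = divmod_poly(h, f, p)[1]
    if len(f) > 1:                 # whatever is left is a single irreducible factor
        degrees.append(len(f) - 1)
    return sorted(degrees)

if __name__ == "__main__":
    x_squared_plus_one = [1, 0, 1]
    for p in [3, 5, 7, 11, 13, 17]:
        print(p, p % 4, factor_degrees(x_squared_plus_one, p))
```

For $x^2+1$ the partition $1+1$ appears exactly at the primes in the list that are congruent to 1 mod 4, and the partition $2$ at the others. Trying a cubic such as $x^3-2$ (coefficients `[-2, 0, 0, 1]`) shows all three partitions of 3 turning up for different primes.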

To answer the question for quadratic polynomials, Gauss’s law of quadratic reciprocity is a massive help. I think it is correct to say that the Langlands programme is all about trying to find vast generalizations of quadratic reciprocity that will address the far more general question about the degrees of irreducible factors of arbitrary polynomials. But perhaps it is more general still — at the time of writing I’m not quite sure.

Actually, I think I am sure. One thing Arthur described was Artin L-functions, which are a way of packaging up the data I’ve just described. Here is the definition he gave. You start with a representation $r$ of the Galois group of $f$. For simplicity he assumed that the Galois group was actually $S_n$ (where $n$ is the degree of $f$). Then for each unramified prime $p$ the partition of $n$ you get can be thought of as the cycle type of a permutation and thus as a conjugacy class in $S_n$. The image of this conjugacy class under $r$ is a conjugacy class in $GL_N(\mathbb{C})$, which we denote by $c_p(r)$, and since conjugate linear maps have the same determinant, the quantity $\det(1-c_p(r)p^{-s})$ is well defined. The Artin L-function is then defined to be

$$L(s,r)=\prod_p \det\bigl(1-c_p(r)p^{-s}\bigr)^{-1}.$$

If you expand this out, you get a Dirichlet series, of which this is the Euler product. And Dirichlet series that have Euler products are basically L-functions. Just as the Riemann zeta function packages up lots of important information about the primes, so the Artin L-functions package up lots of important information about the fundamental problem of algebraic number theory discussed earlier.
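Here is a numerical sanity check of the "Euler product equals Dirichlet series" statement in the simplest case I can think of. It is my own illustration rather than anything from the talk: I take the polynomial $x^2+1$ and the one-dimensional sign representation of its Galois group, so that the factor at an odd prime $p$ is $(1-\chi(p)p^{-s})^{-1}$, where $\chi(p)=1$ if $x^2+1$ splits mod $p$ (that is, $p\equiv 1$ mod 4) and $\chi(p)=-1$ otherwise. The names `chi`, `euler_product` and `dirichlet_series` are hypothetical helpers, and the ramified prime 2 is simply given the trivial factor.

```python
def chi(n):
    """+1 if n = 1 (mod 4), -1 if n = 3 (mod 4), 0 if n is even."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def euler_product(s, bound):
    """Product over primes p <= bound of (1 - chi(p) p^(-s))^(-1)."""
    product = 1.0
    for p in primes_up_to(bound):
        product *= 1.0 / (1.0 - chi(p) * p ** (-s))
    return product

def dirichlet_series(s, bound):
    """Sum over n <= bound of chi(n) n^(-s)."""
    return sum(chi(n) * n ** (-s) for n in range(1, bound + 1))

if __name__ == "__main__":
    print(euler_product(2, 10_000))
    print(dirichlet_series(2, 10_000))
```

At $s=2$ both truncations agree to several decimal places (the common value is Catalan's constant, roughly 0.916), which is the expansion of the product into a series seen numerically.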

One interesting thing that Arthur told us was that in order to do research in this area, you have to use results from many different areas. This makes it difficult to get started, so most young researchers start by scouring the textbooks for the key theorems and using them as black boxes, understanding them fully only much later.

For example, certain Riemannian manifolds are particularly important, because automorphic forms come from solutions to differential equations (based on the Laplacian) on those manifolds. Arthur didn’t tell us exactly what these “special Riemannian manifolds” were, but he did say that they corresponded to reductive algebraic groups. (An algebraic group is roughly speaking a group defined using polynomials. For example, $SL_n$ is algebraic, because the condition of having determinant 1 is expressible as a polynomial in the entries of a matrix, and the group operation, matrix multiplication, is also a polynomial operation. What “reductive” means I don’t know.) He then said that many beginners memorize ten key theorems about reductive algebraic groups and don’t bother themselves with the proofs.

Where does Langlands come into all this? He defined some L-functions that have a formula very similar to the formula for Artin L-functions: in fact, all you have to do is replace the $r$ in that formula with a $\pi$. So a lot depends on what $\pi$ is. Apparently it’s an automorphic representation. I’m not sure what those are.

A big conjecture is that every arithmetic L-function is an automorphic L-function. This would give us a non-Abelian class field theory. (Classical class field theory studies Abelian field extensions, and can tell you things like which numbers are cubic residues mod $p$.)

This conjecture is a special case of Langlands’s famous principle of functoriality, which Arthur described as *the* fundamental problem. (OK, I’ve already described something else as the fundamental problem, but this is somehow the *real* fundamental problem.) I can’t resist stating the problem, because it looks as though it ought to be easy. I can imagine getting hooked on it in a parallel life, because it screams out, “Think about me in the right way and I’ll drop out.” Of course, that’s a very superficial impression, and probably once one actually does think about it, one quickly loses any feeling that it should at some sufficiently deep level be easy.

The principle says this.

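**Conjecture.** *Given two groups $G$ and $G'$, an automorphic representation $\pi'$ of $G'$ and an analytic homomorphism between their dual groups*

$$\rho:{}^LG'\to{}^LG,$$

*there is an automorphic representation $\pi$ of $G$ such that $c(\pi)=\rho(c(\pi'))$; that is,*

$$c_p(\pi)=\rho(c_p(\pi'))$$

*as conjugacy classes in ${}^LG$.*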

To me it looks like the kind of trivial-but-not-trivially-trivial statement one proves in a basic algebra course, but obviously it is far more than that.

One quite nice thing that Arthur did was to draw an extended analogy with a situation that held in physics a century or so ago. It was observed that the absorption spectra of starlight had black lines where certain frequencies were absent, and these corresponded to the wavelengths emitted by familiar elements. This suggested that the chemistry of stars was similar to the chemistry on earth. Furthermore, because these absorption spectra were red-shifted to various extents, it also suggested that the stars were moving away from us, and ultimately suggested the Big Bang theory. However, exactly *why* these black lines appeared was a mystery, which was not solved until the formulation of quantum mechanics.

Something like this is how Arthur sees number theory today. Automorphic forms tell us about other number-theoretic worlds. Spectra come from differential equations that are quite similar to the Schrödinger equation — in particular, they are based on Laplacians — that come from the geometry of the special Riemannian manifolds I mentioned above. But exactly how the connection between the number theory and the spectral theory works is still a mystery.

To end on a rather different note, the one other thing I got out of this excellent talk was to see Gerhard “I was at ICM2014” Paseman, of Mathoverflow fame. Later I even got to meet him, and he gave me a Mathoverflow teeshirt. I became aware of him because there were some small technical problems during the talk, and GP offered advice from the audience.

That’s a fairly easy question, so let’s follow it up with another one: how surprised should we be about this? Is there unconscious bias towards mathematicians with this property? Of this year’s 21 plenary lecturers, the only one with the property was Mirzakhani, and out of the 20 plenary lecturers in 2010, the only one with the property was Avila. What is going on?

On to more serious matters. After Candès’s lecture I had a solitary lunch in the subterranean mall (Korean food of some description, but I’ve forgotten exactly what) and went to hear Martin Hairer deliver his Fields medal lecture, which I’m not going to report on because I don’t have much more to say about his work than I’ve already said.

By and large, the organization of the congress was notably good — for example, I almost never had to queue for anything, and never for any length of time — but there was a little lapse this afternoon, in that Hairer’s lecture was scheduled to finish at 3pm, exactly the time that the afternoon’s parallel sessions started. In some places that might have been OK, but not in the vast COEX Seoul conference centre. I had to get from the main hall to a room at the other end of the centre where theoretical computer science talks were taking place, which was probably about as far as walking from my house in Cambridge to the railway station. (OK, I live close to the station, but even so.)

Inevitably, therefore, I arrived late to Boaz Barak’s talk, but he welcomed me, and a few others in my position, with the reassuring words that everything he had said up to now was bullshit and we didn’t need to worry about it. (He was quoting a juggler he had seen in Washington Square.)

I always like it when little themes recur at ICMs in different contexts. I’ve already mentioned the theme of looking at big spaces of objects in order to understand typical objects. Another one I mentioned when describing Candès’s lecture: that one should not necessarily be afraid of NP-complete problems, a theme which was present in Barak’s talk as well. I’m particularly fond of it because I’ve spent a lot of time in the last few years thinking about the well-known NP-complete problem where the input is a mathematical statement and the task (in the decision version) is to say whether there is a proof of that statement of length at most $n$, in some appropriate formal system. The fact that this problem is NP-complete does not deter mathematicians from spending their lives solving instances of it. What explains this apparent success? I dream that there might be a very nice answer to this question, rather than just a hand-wavy one that says that the instances studied by mathematicians are far from general.

Barak was talking about something a little different, however. He too has a dream, which is to obtain a very precise understanding of why certain problems are hard (in the complexity sense of not being soluble with efficient algorithms) and others easy. He is not satisfied with mere lists of easy and hard problems, with algorithms for the former and reductions to NP-complete or other “known hard” problems for the latter. He wants a theory that will say which problems are hard and which easy, or at least do that for large classes of problems. And the way he wants to do it is to find “meta-algorithms” — which roughly speaking means very general algorithmic approaches with the property that if they work then the problem is easy and if they fail then it’s hard.

Why is there the slightest reason to think that that can be done? Isn’t there a wide variety of algorithms, each of which requires a lot of ingenuity to find? If one approach fails, might there not be some clever alternative approach that nobody had thought of?

These are all perfectly reasonable objections, but the message, or at least *a* message, of Barak’s talk is that it is not completely outlandish to think that it really is the case that there is what one might call a “best possible” meta-algorithm, in the sense that if it fails, then nothing else can succeed. Again, I stress that this would be for large and interesting classes of algorithmic problems (e.g. certain optimization problems) and not for every single describable Boolean function. One reason to hold out hope is that if you delve a little more deeply into the algorithms we know about, you find that actually many of them are based on just a few ideas, such as linear and semidefinite programming, solving simultaneous linear equations, and so on. Of course, that could just reflect our lack of imagination, but it could be an indication that something deeper is going on.

Another reason for optimism is that he has a candidate: the sum-of-squares algorithm. This is connected with Hilbert’s 17th problem, which asked whether every multivariate polynomial that takes only non-negative values can be written as a sum of squares of rational functions. (It turns out that they can’t necessarily be written as a sum of squares of polynomials: a counterexample is the Motzkin polynomial $x^4y^2+x^2y^4-3x^2y^2+1$.) An interesting algorithmic problem is to write a polynomial as a sum of squares when it can be so written. One of the reasons this problem interests Barak is that many other problems can be reduced to it. Another, which I don’t properly understand but I think I would understand if I watched his talk again (it is here, by the way), is that if the unique games conjecture is false, and recall that it too is sort of saying that a certain algorithm is best possible, then the sum-of-squares algorithm is waiting in the wings to take over as the new candidate that will do the job.
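For what it is worth, the non-negativity half of such a counterexample can be very easy to check. Take the Motzkin polynomial $x^4y^2+x^2y^4-3x^2y^2+1$, which is the standard textbook example of a non-negative polynomial that is not a sum of squares of polynomials (I am assuming that this is the kind of example meant; I have not checked it against the talk). The arithmetic-geometric mean inequality applied to the three non-negative quantities $x^4y^2$, $x^2y^4$ and $1$ gives

$$\frac{x^4y^2+x^2y^4+1}{3}\ \geq\ \sqrt[3]{x^4y^2\cdot x^2y^4\cdot 1}\ =\ x^2y^2,$$

which rearranges to $x^4y^2+x^2y^4-3x^2y^2+1\geq 0$, with equality when $x^2=y^2=1$. The hard part, which this little calculation does not touch, is showing that no representation as a sum of squares of polynomials exists.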

An unfortunate aspect of going to Barak’s talk was that I missed Harald Helfgott’s. However, the sacrifice was rewarded, and I can always watch Harald on Youtube.

After another longish walk, but with a non-zero amount of time for it, I arrived at my next talk of the afternoon, given by Bob Guralnick. This was another very nice talk, just what an ICM invited lecture should be like. (By that I mean that it should be aimed principally at non-experts, while at the same time conveying what has been going on recently in the field. In other words, it should be more of a colloquium style talk than a research seminar.)

Guralnick’s title was Applications of the Classification of Finite Simple Groups. One thing he did was talk about the theorem itself, how a proof was announced in 1983 but not actually completed for another twenty years, and how there are now — or will be soon — “second-generation” proofs that are shorter, though still long, and use new ideas. He also mentioned a few statements that can be proved with the classification theorem and are seemingly completely out of reach without it. Here are a few of them.

1. Every finite simple group is generated by two elements.

2. The probability that a random pair of elements generates a finite simple group tends to 1 as the size of the group tends to infinity.

3. For every non-identity element $x$ of a finite simple group, there exists an element $y$ such that $x$ and $y$ generate the group.

4. For every finite simple group there exist conjugacy classes $C$ and $D$ such that for every $x\in C$ and every $y\in D$ the elements $x$ and $y$ generate the group.
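For a single small group one can of course just check statements like these by brute force, which gives some feeling for what they say. The sketch below is my own and has nothing to do with how the theorems are actually proved: it takes the smallest non-Abelian finite simple group $A_5$, represented as even permutations of $\{0,1,2,3,4\}$, counts the generating pairs, and verifies statement 3. The helper names are hypothetical.

```python
from itertools import permutations

def compose(a, b):
    """The permutation 'a after b', with permutations stored as tuples."""
    return tuple(a[b[i]] for i in range(len(b)))

def is_even(perm):
    """True if the permutation has an even number of inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return inversions % 2 == 0

A5 = [p for p in permutations(range(5)) if is_even(p)]
IDENTITY = tuple(range(5))

def generated_size(gens):
    """Size of the subgroup generated by gens.

    In a finite group, closing the identity under left multiplication by
    the generators already gives the whole subgroup they generate.
    """
    seen = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return len(seen)

# Statement 1 for A5: count ordered pairs (x, y) that generate the group.
generating_pairs = sum(1 for x in A5 for y in A5
                       if generated_size((x, y)) == len(A5))

# Statement 3 for A5: every non-identity x has a partner y.
every_element_has_partner = all(
    any(generated_size((x, y)) == len(A5) for y in A5)
    for x in A5 if x != IDENTITY
)

if __name__ == "__main__":
    print(len(A5), generating_pairs, every_element_has_partner)
```

For $A_5$ the proportion of generating pairs comes out at 19/30, so statement 2 is genuinely an asymptotic statement: in small groups a random pair fails quite often.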

Why does the classification of finite simple groups help with these problems? Because it means that instead of having to give an abstract proof that somehow uses the condition of having no proper normal subgroups, you have the option of doing a proof that involves calculations in concrete groups. Because the list of (families of) groups you have to consider is finite, this is a feasible approach. Actually, it’s not just that there are only finitely many families, but also that the families themselves are very nice, especially the various families of Lie type. As far as I can tell from the relevant Wikipedia article, there isn’t a formal definition of “group of Lie type”, but basically it means a group that’s like a Lie group but defined over a finite field instead of over $\mathbb{R}$ or $\mathbb{C}$. So things like PSL$(2,q)$ are finite simple groups of Lie type.

Just as the geometrization theorem didn’t kill off research in 3-manifolds, the classification of finite simple groups didn’t kill off group theory, even though in the past many mathematicians have thought that it would. It’s easy to see how that perception might have arisen: the project of classifying finite simple groups became such a major focus for group theorists that once it was done, a huge chunk of what they were engaged in was no longer available.

So what’s left? One answer, one might imagine, is that not all groups are simple. That is not a completely satisfactory answer, because groups can be put together from simple groups in such a way that for many problems it is enough to solve them just for simple groups (just as in number theory one can often prove a result for primes and prove that the product of two numbers that satisfy the result also satisfies the result). But it is part of the answer. For example $p$-groups (that is, groups of prime power order) are built out of copies of a cyclic group of prime order, but that doesn’t begin to answer all the questions people have about $p$-groups.

Another answer, which is closer to the reason that 3-manifold theory survived Perelman, is that proving results even for specific families of groups is often far from easy. For example, have a go at proving that a random pair of (equivalence classes of) matrices generates PSL$(2,q)$ with high probability when $q$ is large: it’s a genuine theorem rather than simply a verification.

I want to mention a very nice result that I think is due to Guralnick and his co-authors, though he didn’t quite explicitly say so. Let be a polynomial of degree , with coprime to . Then for every , either is bijective on the field or the set of values it takes has size at most .

What’s so nice about that? Well, the result is interesting, but even more interesting (at least to me) is the fact that the proof involved the classification of finite simple groups, and Guralnick described it (or more accurately, a different result just before it but I think the same remark applies) as untouchable without CFSG, even though the statement is about polynomials rather than groups.

Here is the video of Guralnick’s lecture.

The third invited lecture I went to was given by Francis Brown. Although I was expecting to understand very little, I wanted to go to it out of curiosity, because I knew Francis Brown when he was an undergraduate at Trinity (I think I taught him once or twice). After leaving Trinity he went to France, where he had been ever since, until very recently taking up a (highly prestigious) professorial fellowship at All Souls in Oxford. It was natural for him to go to France, because his mother is French and he is bilingual, another aspect that interests me since two of my children are in the same position. I heard nothing of him for a long time, but then in the last few years he suddenly popped up again as the person who has proved some important results concerning motivic zeta functions.

The word “motivic” scares me, and I’m not going to try to say what it means, because I can’t. I first heard of motives about twenty years ago, when the message I got was that they were objects that people studied even though they didn’t know how to define them. That may be a caricature, but my best guess as to the correct story is that even though people don’t know the right definition, they do know what *properties* this definition should have. In other words, there is a highly desirable theory that would do lots of nice things, if only one could find the objects that got you started.

However, what Brown was doing appeared not to be based on layers of conjecture, so I suppose it must be that “motivic versions” of certain objects have been shown to exist.

This was a talk in which I did not take notes. To do a decent job describing it, I’d need to watch it again, but rather than do that, I’ll just describe the traces it left in my memory.

One was that he mentioned the famous problem of the irrationality of $\zeta(n)$ for odd $n$, and more generally the problem of whether the vector space over the rationals generated by $\zeta(3),\zeta(5),\dots,\zeta(2n+1)$ has dimension $n$. (It has been shown by Ball and Rivoal to have dimension that tends to infinity with $n$, which was a major result when it was proved.)

Another was that he defined multiple zeta values, which are zeta-like functions of more than one integer variable, which come up naturally when one takes two zeta values, multiplies them together, and expands out the result. They were defined by Euler.
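To spell out how the product of two zeta values leads to multiple zeta values, here is the calculation in the simplest case, written with one common convention for the double zeta value (conventions differ on the order of the arguments, so treat the notation as my assumption rather than Brown's). For integers $a,b\geq 2$ set

$$\zeta(a,b)=\sum_{m>n\geq 1}\frac{1}{m^a n^b}.$$

Then splitting the double sum according to whether $m>n$, $m<n$ or $m=n$ gives

$$\zeta(a)\zeta(b)=\sum_{m,n\geq 1}\frac{1}{m^a n^b}=\zeta(a,b)+\zeta(b,a)+\zeta(a+b).$$

So the product of two ordinary zeta values is automatically a linear combination of double and single zeta values, and the same splitting trick applied to longer products produces multiple zeta values with more arguments.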

He also talked about periods, a very interesting concept defined (I think for the first time) by Kontsevich and Zagier. I highly recommend looking at their paper, available here in preprint form. At least the beginning of it is accessible to non-experts, and contains a very interesting open problem. Roughly speaking, a period is anything you can define using reasonably nice integrals. For example, $\pi$ is a period because it is the area of the unit disc, which has a nice polynomial equation $x^2+y^2\leq 1$. The nice problem is to prove that an explicit number is not a period. There are only countably many periods, so such numbers exist in abundance. If you want a specific number to try, then you can begin with $e$. Best of luck.

While discussing what motivic zeta values are, he said that there were two approaches one could use, one involving Betti numbers and the other involving de Rham cohomology. He preferred the de Rham approach. “Betti” and “de Rham” became a sort of chorus throughout the talk, and even now I have ringing in my head phrases like “-Betti or -de Rham”.

If I understood correctly, linear dependences between motivic zeta values (which are much fancier objects that still depend on tuples of integers) imply the corresponding dependences between standard zeta values. (I’m talking about both single and multiple zeta values here.) That’s not much help if you are trying to prove *independence* of standard zeta values, but it does do two things for you. One is that it provides a closely related context in which the world seems to be a tidier place. As I understand it, all the conjectures one wants to be true for standard zeta values are true for their motivic cousins. But it has also enabled Brown to discover unexpected dependences between standard zeta values: for instance every multiple zeta value is a linear combination of multiple zeta values where every argument is 2 or 3. (I suppose multiple must mean “genuinely multiple” here.) Actually, looking very briefly at the relevant part of the talk, which is round about the 27th minute, I see that this was proving something called the Hoffman conjecture, so perhaps it is wrong to call it unexpected. But it is still a very interesting result, given that the proof was highly non-trivial and went via motivic zeta values.

My remaining memory trace is that the things Brown was talking about were related to a lot of other important parts of mathematics, and even theoretical physics. I’d love to understand this kind of thing better.

So although a lot of this talk (which is here) went over my head, enough of it didn’t that my attention was engaged throughout. Given the type of material, that was far from obviously going to be the case, so this was another very good talk, to round off a pretty amazing day.

One person who doesn’t lose any sleep over doubts like this is Emmanuel Candès, who gave the second plenary lecture I went to. He began by talking a little about the motivation for the kinds of problems he was going to discuss, which one could summarize as follows: his research is worthwhile because *it helps save the lives of children*. More precisely, it used to be the case that if a child had an illness that was sufficiently serious to warrant an MRI scan, then doctors faced the following dilemma. In order for the image to be useful, the child would have to keep completely still for two minutes. The only way to achieve that was to stop the child’s breathing for those two minutes. But depriving a child’s brain (or indeed any brain, I’d imagine) of oxygen for two minutes is not without risk, to put it mildly.

Now, thanks to the famous work of Candès and others on compressed sensing, one can reconstruct the image using many fewer samples, which reduces the time the child must keep still to 15 seconds. Depriving the brain of oxygen for 15 seconds is not risky at all. Candès told us about a specific boy who had something seriously wrong with his liver (I’ve forgotten the details) who benefited from this. If you want a ready answer for when people ask you about the point of doing maths, and if you’re sick of the Hardy-said-number-theory-useless-ha-ha-but-what-about-public-key-cryptography-internet-security-blah-blah example, then I recommend watching at least some of Candès’s lecture, which is available here, and using that instead. Then you’ll really have seized the moral high ground.

Actually, I recommend watching it *anyway*, because it was a fascinating lecture from start to finish. In that case, you may like to regard this post as something like a film review with spoilers: if you mind spoilers, then you’d better stop reading here.

I have to admit that as I started this post, I realized that there was something fairly crucial that I didn’t understand, that meant I couldn’t give a satisfactory account of what I wanted to describe. I didn’t take many notes during the talk, because I just wanted to sit back and enjoy it, and it felt as though I would remember everything easily, but there was one important mathematical point that I missed. I’ll come back to it in a moment.

Anyhow, the basic mathematical problem that the MRI scan leads to is this. A full scan basically presents you with the Fourier transform of the image you want, so to reconstruct the image you simply invert the Fourier transform. But if you are sampling in only two percent of directions and you take an inverse Fourier transform (it’s easy to make sense of that, but I won’t bother here), then you get a distorted image with all sorts of strange lines all over it — Candès showed us a picture — and it is useless for diagnostic purposes.

So, in a moment that Candès described as one of the luckiest in his life, a radiologist approached him and asked if there was any way of getting the right image from the much smaller set of samples. On the face of it, the answer might seem to be no, since the dimension of the space of possible outputs has become much smaller, so there must be many distinctions between inputs that are not detectable any more. However, in practice the answer is yes, for reasons that I’ll discuss after I’ve mentioned Candès’s second example.

The second example was related to things like the Netflix challenge, which was to find a good way of predicting which films somebody would like, given the preferences of other people and at least some of the preferences of the person in question. If we make the reasonable hypothesis that people’s preferences depend by and large on a fairly small number of variables (describing properties of the people and properties of the films), then we might expect that a matrix where the $(i,j)$th entry represents the strength of preference of person $i$ for film $j$ would have fairly small rank. Or more reasonably, one might expect it to be a small perturbation of a matrix with small rank.

And thus we arrive at the following problem: you are given a few scattered entries of a matrix, and you want to find a low-rank matrix that agrees pretty well with the entries you observe. Also, you want the low-rank matrix to be unique (up to a small perturbation) since otherwise you can’t use it for prediction.

As Candès pointed out, simple examples show that the uniqueness condition cannot always be obtained. For example, suppose you have 99 people with very similar preferences and one person whose preferences are completely different. Then the underlying matrix that describes their preferences has rank 2 — basically, one row for describing the preferences of the 99 and one for describing the preference of the one eccentric outlier. If all you have is a few entries for the outlier’s preferences, then there is nothing you can do to guess anything else about those preferences.

However, there is a natural assumption you can make, which I’ve now forgotten, that rules out this kind of example, and if a matrix satisfies this assumption then it can be reconstructed exactly.

Writing this, I realize that Candès was actually discussing a slight idealization of the problem I’ve described, in that he didn’t have perturbations. In other words, the problem was to reconstruct a low-rank matrix exactly from a few entries. An obvious necessary condition is that the number of samples should exceed the number of degrees of freedom of the set of low-rank matrices. But there are other conditions such as the one I’ve mentioned, and also things like that every row and every column should have a few samples. But given those conditions (or perhaps the sampling is done at random — I can’t remember) it turns out to be possible to reconstruct the matrix exactly.

The MRI problem boils down to something like this. You have a set of linear equations to solve (because you want to invert a Fourier transform) but the number of unknowns is significantly larger than the number of equations (because you have a sparse set of samples of the Fourier transform you want to invert). This is an impossible problem unless you make some assumption about the solution, and the assumption Candès makes is that it should be a *sparse vector*, meaning that it has only a few non-zero entries. This reduces the number of degrees of freedom considerably, but the resulting problem is no longer pure linear algebra.

The point that I missed was what sparse vectors have to do with MRI scans, since the image you want to reconstruct doesn’t appear to be a sparse vector. But looking back at the video I see that Candès addressed this point as follows: although the *image* is not sparse, the *gradient* of the image is sparse. Roughly speaking, you get quite a lot of patches of fairly constant colour, and if you assume that that is the case, then the number of degrees of freedom in the solution goes right down and you have a chance of reconstructing the image.

Going back to the more general problem, there is another condition that is needed in order to make it soluble, which is that the matrix of equations should not have too many sparse rows, since typically a sparse row acting on a sparse vector will give you zero, which doesn’t help you to work out what the sparse vector was.

I don’t want to say too much more, but there was one point that particularly appealed to me. If you try to solve these problems in the obvious way, then you might try to find algorithms for solving the following problems.

1. Given a system of underdetermined linear equations, find the sparsest solution.

2. Given a set of entries of a matrix, find the lowest rank matrix consistent with those entries.

Unfortunately, no efficient algorithms are known for these problems, and I think in the second case it’s even NP complete. However, what Candès and his collaborators did was consider *convex relaxations* of these problems.

1. Given a system of underdetermined linear equations, find the solution with smallest $\ell_1$ norm.

2. Given a set of entries of a matrix, find the matrix with smallest nuclear norm consistent with those entries.

If you don’t know what the nuclear norm is, it’s simple to define. Whereas the rank of a matrix $A$ is the smallest number of rank-1 matrices such that $A$ is a linear combination of those matrices, the nuclear norm of $A$ is the minimum of $\sum_i|\lambda_i|$ such that you can write $A=\sum_i\lambda_iB_i$ with each $B_i$ a rank-1 matrix of norm 1. So it’s more like a quantitative notion of rank.
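A small worked example may help. It uses the standard fact (not mentioned in the talk, as far as I recall) that the nuclear norm of a matrix $A$ equals the sum of its singular values $\sigma_i(A)$, while the rank is the number of non-zero singular values. The two diagonal matrices below are chosen purely for illustration.

```latex
\|A\|_* \;=\; \sum_i \sigma_i(A), \qquad \operatorname{rank}(A) \;=\; \#\{i : \sigma_i(A)\neq 0\}.

A=\begin{pmatrix}3&0\\0&1\end{pmatrix}:\quad \operatorname{rank}(A)=2,\quad \|A\|_*=3+1=4;
\qquad
A'=\begin{pmatrix}3&0\\0&0.001\end{pmatrix}:\quad \operatorname{rank}(A')=2,\quad \|A'\|_*=3.001.
```

The rank cannot tell $A$ from $A'$, whereas the nuclear norm registers that $A'$ is very nearly a rank-1 matrix, which is the sense in which it is a quantitative version of rank.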

It’s a standard fact that convex relaxations of problems tend to be much easier than the problems themselves. But usually that comes at a significant cost: the solutions you get out are not solutions of the form you originally wanted, but more like convex combinations of such solutions. (For example, if you relax the graph-colouring problem, you can solve the relaxation but you get something called a fractional colouring of your graph, where the total amount of each colour at two adjacent vertices is at most 1, and that can’t easily be converted into a genuine colouring.)

However, in the cases that Candès was telling us about, it turns out that if you solve the convex relaxations, you get exactly correct solutions to the original problems. So you have the following very nice situation: a problem is NP-complete, but if you nevertheless go ahead and try to solve it using an algorithm that is doomed to fail in general, the algorithm still works in a wide range of interesting cases.

At first this seems miraculous, but Candès spent the rest of the talk explaining to us why it isn’t. It boiled down to a very geometrical picture: you have a convex body and a plane through one of its extreme points, and if the plane is tangent to the body then the algorithm will work. It is this geometrical condition that underlies the necessary conditions I mentioned earlier.

For me this lecture was one of the highlights of the ICM, and I met many other people who greatly enjoyed it too.


Eventually I just made it, by going back to a place that was semi-above ground (meaning that it was below ground but you entered it from a sunken area that was not covered by a roof) that I had earlier rejected on the grounds that it didn’t have a satisfactory food option, and just had an espresso. Thus fortified, I made my way to the talk and arrived just in time, which didn’t stop me getting a seat near the front. That was to be the case at all talks: if I marched to the front, I could get a seat. I think part of the reason was that there were “Reserved” stickers on several seats, which had been there for the opening ceremony and not been removed. But maybe it was also because some people like to sit some way back so that they can zone out of the talk if they want to, maybe even getting out their laptops. (However, although wireless was in theory available throughout the conference centre, in practice it was very hard to connect.)

The first talk was by Ian Agol. I was told before the talk that I would be unlikely to understand it — the comment was about Agol rather than about me — and the result of this lowering of my expectations was that I enjoyed the talk. In fact, I might even have enjoyed it without the lowering of expectations. Having said that, I did hear one criticism afterwards that I will try to explain, since it provides a good introduction to the content of the lecture.

When I first heard of Thurston’s famous geometrization conjecture, I thought of it as the ultimate aim of the study of 3-manifolds: what more could you want than a complete classification? However, this view was not correct. Although a proof of the geometrization conjecture would be (and later was) a massive step forward, it wouldn’t by itself answer all the questions that people really wanted to answer about 3-manifolds. But some very important work by Agol and others since Perelman’s breakthrough has, in some sense that I don’t understand, finished off some big programme in the subject. The criticism I heard was that Agol didn’t really explain what this programme was. I hadn’t really noticed that as a problem during the talk — I just took it on trust that the work Agol was describing was considered very important by the experts (and I was well aware of Agol’s reputation) — but perhaps he could have done a little more scene setting.

What he actually did by way of introduction was to mention three questions from a famous 1982 paper of Thurston (Three-dimensional manifolds, Kleinian groups and hyperbolic geometry) in which he asked 24 questions. The ones Agol mentioned were questions 16-18. I’ve just had a look at the Thurston paper, and it’s well worth a browse, as it’s a relatively gentle survey written for the Bulletin of the AMS. It also has lots of nice pictures. I didn’t get a sense from my skim through it that questions 16-18 were significantly more important than the others (apart from the geometrization conjecture), but perhaps the story is that when the dust had settled after Perelman’s work, it was those questions that were still hard. Maybe someone who knows what they’re talking about can give a better explanation in a comment.

One definition I learned from the lecture is this: a 3-manifold is said to have a property P *virtually* if it has a finite-sheeted cover with property P. I presume that a finite-sheeted cover is another 3-manifold and a suitable surjection to the first one such that each point in the first has $n$ preimages for some finite $n$ (that doesn’t depend on the point).

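Thurston’s question 16 asks whether every aspherical 3-manifold is virtually Haken. (I had presumed that “aspherical” just meant that it isn’t a 3-sphere, but the standard meaning is stronger: all the higher homotopy groups vanish, or equivalently the universal cover is contractible.)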

A little later in the talk, Agol told us what “Haken” meant, other than being the name of a very well-known mathematician. Here’s the definition he gave, which left me with very little intuitive understanding of the concept. A compact 3-manifold with hyperbolic interior is *Haken* if it contains an embedded $\pi_1$-injective surface. An example, if my understanding of my rapidly scrawled notes is correct, is a knot complement, one of the standard ways of constructing interesting 3-manifolds. If you take the complement of a knot in $S^3$ you get a 3-manifold, and if you take a tubular neighbourhood of that knot, then its boundary will be your $\pi_1$-injective surface. (I’m only pretending to know what $\pi_1$-injective means here.)

Thurston, in the paper mentioned earlier, describes Haken manifolds in a different, and for me more helpful, way. Let me approach the concept in top-down fashion: that is, I’ll define it in terms of other mysterious concepts, then work backwards through Thurston’s paper until everything is defined (to my satisfaction at least).

Thurston writes, “A 3-manifold $M^3$ is called a Haken manifold if it is prime and it contains a 2-sided incompressible surface (whose boundary, if any, is on $\partial M$) which is not a 2-sphere.”

Incidentally, one thing I picked up during Agol’s talk is that it seems to be conventional to refer to a 3-manifold as $M^3$ the first time you mention it and as $M$ thereafter.

Now we need to know what “prime” and “incompressible” mean. The following paragraph of Thurston defines “prime” very nicely.

> The decomposition referred to really has two stages. The first stage is the prime decomposition, obtained by repeatedly cutting a 3-manifold $M^3$ along 2-spheres embedded in $M^3$ so that they separate the manifold into two parts neither of which is a 3-ball, and then gluing 3-balls to the resulting boundary components, thus obtaining closed 3-manifolds which are “simpler”. Kneser proved that this process terminates after a finite number of steps. The resulting pieces, called the prime summands of $M^3$, are uniquely determined by $M^3$ up to homeomorphism.

Hmm, perhaps the rule is more general: you refer to it as $M^3$ to start with and after that it’s sort of up to you whether you want to call it $M^3$ or $M$.

The equivalent process in two dimensions could be used to simplify a two-holed torus. You first identify a circle that cuts it into two pieces and doesn’t bound a disc: basically what you get if you chop the surface into two with one hole on each side. Then you have two surfaces with circles as boundaries. You fill in those circles with discs and then you have two tori. At this point you can’t chop the surface in two in a non-trivial way, so a torus is prime. Unless my intuition is all wrong, that’s more or less telling us that the prime decomposition of an arbitrary orientable surface (without boundary) is into tori, one for each hole, except that the sphere would be prime.

What about “incompressible”? Thurston offers us this.

> A surface $M^2$ embedded in a 3-manifold $M^3$ is two-sided if $M^2$ cuts a regular neighborhood of $M^2$ into two pieces, i.e., the normal bundle to $M^2$ is oriented. Since we are assuming that $M^3$ is oriented, this is equivalent to the condition that $M^2$ is oriented. A two-sided surface is incompressible if every simple curve on $M^2$ which bounds a disk in $M^3$ with interior disjoint from $M^2$ also bounds a disk on $M^2$.

I think we can forget the first part there: just assume that everything in sight is oriented. Let’s try to think what it would mean for an embedded surface not to be incompressible. Consider for example a copy of the torus embedded in the 3-sphere. Then a loop that goes round the torus bounds a disc in the 3-sphere with no problem, but it doesn’t bound a disc in the torus. So that torus fails to be incompressible. But suppose we embedded the torus into a 3-dimensional torus in a natural way, by taking the 3D torus to be the quotient of $\mathbb{R}^3$ by $\mathbb{Z}^3$ and the 2D torus to be the set of all points with $x$-coordinate an (equivalence class of an) integer. Then the loops that don’t bound discs in the 2-torus don’t bound discs in the 3-torus either, so that surface is (again, if what seems likely to be true actually is true) incompressible. It seems that an incompressible surface sort of spans the 3-manifold in an essential way rather than sitting inside a boring part of the 3-manifold and pretending that it isn’t boring.

OK, that’s what Haken manifolds *are*, but for the non-expert that’s not enough. We want to know why we should care about them. Thurston gives us an answer to this too. Here is a very useful paragraph about them.

> It is hard to say how general the class of Haken manifolds is. There are many closed manifolds which are Haken and many which are not. Haken manifolds can be analyzed by inductive processes, because as Haken proved, a Haken manifold can be cut successively along incompressible surfaces until one is left with a collection of 3-balls. The condition that a 3-manifold has an incompressible surface is useful in proving that it has a hyperbolic structure (when it does), but intuitively it really seems to have little to do with the question of existence of a hyperbolic structure.

To put it more vaguely, Haken manifolds are good because they can be chopped into pieces in a way that makes them easy to understand. So I’d guess that the importance of showing that every aspherical 3-manifold is virtually Haken is that finite-sheeted coverings are sufficiently nice that even knowing that a manifold is *virtually* Haken means that in some sense you understand it.

One very nice thing Agol did was give us some basic examples of 3-manifolds, by which I mean not things like the 3-sphere, but examples of the kind that one wouldn’t immediately think of and that improve one’s intuition about what a typical 3-manifold looks like.

The first one was a (solid) dodecahedron with opposite faces identified, with a twist. I meant the word “twist” literally, but I suppose you could say that the twist is that there is a twist, meaning that given two opposite faces, you don’t identify each vertex with the one opposite it, but rather you first rotate one of the faces through a fixed angle and *then* identify opposite vertices. (Obviously you’ll have to do that in a consistent way somehow.)

There are some questions here that I can’t answer in my head. For example, if you take a vertex of the dodecahedron, then it belongs to three faces. Each of these faces is identified in a twisty way with the opposite face, so if we want to understand what’s going on near the vertex, then we should glue three more dodecahedra to our original one at those faces, keeping track of the various identifications. Now do the identifications mean that those dodecahedra all join up nicely so that the point is at the intersection of four copies of the dodecahedron? Or do we have to do some *more* gluing before everything starts to join together? One thing we *don’t* have to worry about is that there isn’t room for all those dodecahedra, which in a certain sense would be the case if the solid angle at a vertex is greater than 1. (I’m defining, I hope standardly, the solid angle of a cone to be the size of the intersection of that cone with a unit sphere centred at the apex, or whatever one calls it. Since a unit sphere has surface area $4\pi$, the largest possible solid angle is $4\pi$.)

Anyhow, as I said, this doesn’t matter. Indeed, far from mattering, it is to be positively welcomed, since if the solid angles of the dodecahedra that meet at a point add up to more than $4\pi$, then it indicates that the geometry of the resulting manifold will be hyperbolic, which is exactly what we want. I presume that another way of defining the example is to start with a tiling of hyperbolic 3-space by regular dodecahedra and then identify neighbouring dodecahedra using little twists. I’m guessing here, but opposite faces of a dodecahedron are parallel, while not being translates of one another. So maybe as you come out of a face, you give it the smallest (anticlockwise, say) twist you can to make it a translate of the opposite face, which will be a rotation by an angle of $\pi/5$, and then re-enter the opposite face by the corresponding translated point. But it’s not clear to me that that is a consistent definition. (I haven’t said which dodecahedral tiling I’m even taking. Perhaps the one where all the pentagons have right angles at their vertices.)

The other example was actually a pair of examples. One was a figure-of-eight-knot complement, and the other was the complement of the Whitehead link. Agol showed us drawings of the knot and link: I’ll leave you to Google for them if you are interested.

How does a knot complement give you a 3-manifold? I’m not entirely sure. One thing that’s clear is that it gives you a 3-manifold with boundary, since you can take a tubular neighbourhood of the knot/link and take the complement of that, which will be a 3D region whose boundary is homeomorphic to a torus but sits in $S^3$ in a knotted way. I also know (from Thurston, but I’ve seen it before) that you can produce lots of 3-manifolds by defining some non-trivial homeomorphism from a torus to itself, removing a tubular neighbourhood of a knot from $S^3$ and gluing it back in again, but only after applying the homeomorphism to the boundary. That is, given your solid knot and your solid-knot-shaped hole, you identify the boundary of the knot with the boundary of the hole, but not in the obvious way. This process is called Dehn surgery, and in fact can be used to create all 3-manifolds.

But I still find myself unable to explain how a knot complement is *itself* a 3-manifold, unless it is a 3-manifold with boundary, or one compactifies it somehow, or something. So I had the illusion of understanding during the talk but am found out now.

The twisted-dodecahedron example was discovered by Seifert and Weber, and is interesting because it is a non-Haken manifold (a discovery of Burton, Rubinstein and Tillmann) that is virtually Haken.

Going back to the question of why the geometrization conjecture didn’t just finish off the subject, my guess is that it is probably possible to construct lots of complicated 3-manifolds that obviously satisfy the geometrization conjecture because they are already hyperbolic, but that are not by virtue of that fact alone easy to understand. What Agol appeared to say is that the role of the geometrization conjecture is essentially to reduce the whole problem of understanding 3-manifolds to that of understanding hyperbolic 3-manifolds. He also said something that is more or less a compulsory remark in a general lecture on 3-manifolds, namely that although they are topological objects, they are studied by geometrical means. (The corresponding compulsory remark for 4-manifolds is that 4D is the odd dimension out, where lots of weird things happen.)

As I’ve said, Agol discussed two other problems. I think the virtual Haken conjecture was the big one (after all, that was the title of his lecture), but the other two were, as he put it, stronger statements that were easier to think about. Question 17 asks whether every aspherical 3-manifold virtually has positive first Betti number, and question 18 asks whether it virtually fibres over the circle. I’ll pass straight to the second of these questions.

A 3-manifold $M$ *fibres over the circle* if there is a (suitably nice) map $f:M\to S^1$ such that the preimage of every point in $S^1$ is a surface (the fibre at that point).

Let me state Agol’s main results without saying what they mean. In 2008 he proved that if $M$ is virtually special cubulated, then it is virtually fibred. In 2012 he proved that cubulations with hyperbolic fundamental group are virtually special, answering a 2011 conjecture of Wise. A corollary is that every closed hyperbolic 3-manifold virtually fibres over the circle, which answers questions 16-18.

There appears to be a missing step there, namely to show that every closed hyperbolic 3-manifold has a cubulation with hyperbolic fundamental group. That I think must have been the main message of what he said in a fairly long discussion about cubulations that preceded the statements of these big results, and about which I did not take detailed notes.

What I remember about the discussion was a number of pictures of cube complexes made up of cubes of different dimensions. An important aspect of these complexes was a kind of avoidance of positive curvature, which worked something like this. (I’ll discuss a low-dimensional situation, but it generalizes.) Suppose you have three squares that meet at a vertex just as they do if they are faces of a cube. Then at that vertex you’ve got some positive curvature, which is what you want to avoid. So to avoid it, you’re obliged to fill in the entire cube, and now the positive curvature is rendered harmless because it’s just the surface of some bit of 3D stuff. (This feels a bit like the way we don’t pay attention to embedded surfaces unless they are incompressible.)

I haven’t given the definition because I don’t remember it. The term CAT(0) came up a lot. At the time I felt I was following what was going on reasonably well, helped by the fact that I had seen an excellent talk by my former colleague Vlad Markovic on similar topics. (Markovic was mentioned in Agol’s talk, and himself was an invited speaker at the ICM.) The main message I remember now is that there is some kind of dictionary between cube complexes and 3-manifolds, so you try to find “cubulations” with particular properties that will enable you to prove that your 3-manifolds have corresponding properties. Note that although the manifolds are three-dimensional, the cubes in the corresponding cube complexes are not limited to three dimensions.

That’s about all I can remember, even with the help of notes. In case I have given the wrong impression, let me make clear that I very much enjoyed this lecture and thought it got the “working” part of the congress off to a great start. And it’s clear that the results of Agol and others are a big achievement. If you want to watch the lecture for yourself, it can be found here.


When the announcement was made a few hours earlier, my knowledge of Subhash Khot could be summarized as follows.

- He’s the person who formulated the unique games conjecture.
- I’ve been to a few talks on that in the past, including at least one by him, and there have been times in my life when I have briefly understood what it says.
- It’s a hardness conjecture that is a lot stronger than the assertion that P$\ne$NP, and therefore a lot less obviously true.

What I hoped to get out of the laudatio was a return to the position of understanding what it says, and also some appreciation of what was so good about Khot’s work. Anybody can make a conjecture, but one doesn’t usually win a major prize for it. But sometimes a conjecture is so far from obvious, or requires such insight to formulate, or has such an influence on a field, that it is at least as big an achievement as proving a major theorem: the Birch–Swinnerton-Dyer conjecture and the various conjectures of Langlands are two obvious examples.

The unique games conjecture starts with a problem at the intersection of combinatorics and linear algebra.

Suppose you are given a collection of linear equations over the field $\mathbb{F}_p$. Then you can use Gaussian elimination to determine whether or not they have a solution. Now suppose that you find out that they do *not* have a solution. Then something you might consider doing is looking for an assignment to the variables that solves as many of the equations as possible. If $p=2$, then a random assignment will solve on average half the equations, so it must be possible to solve at least half the equations. So the interesting thing is to do better than 50%. A famous result of Johan Håstad states that in general this cannot be done efficiently, even when each equation involves just three variables. (Actually, that restriction to three variables is not the surprising aspect: there are many situations where doing something for 2 is easy and the difficulty kicks in at 3. For example, it is easy to determine whether a graph is 2-colourable: you just start at a vertex, colour all its neighbours differently, etc. etc., and since all moves are forced apart from when you start again at a new connected component, if the process doesn’t yield a colouring then you know there isn’t one. But it is NP-hard to determine whether a graph is 3-colourable.)
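The “random assignment solves half the equations” observation is easy to check by brute force on a toy system over the field with two elements. The sketch below is only an illustration: the function names and the particular four equations are made up for the purpose, and each equation is encoded (my choice) as a tuple of variable indices together with a right-hand side.

```python
from fractions import Fraction
from itertools import product


def count_satisfied(x, equations):
    """Number of equations sum(x[v] for v in variables) = rhs (mod 2) that x satisfies."""
    return sum(sum(x[v] for v in variables) % 2 == rhs for variables, rhs in equations)


def average_satisfied(num_vars, equations):
    """Average number of satisfied equations over all 2**num_vars assignments."""
    total = sum(count_satisfied(x, equations) for x in product((0, 1), repeat=num_vars))
    return Fraction(total, 2 ** num_vars)


def max_satisfied(num_vars, equations):
    """Largest number of equations any single assignment satisfies (exponential search)."""
    return max(count_satisfied(x, equations) for x in product((0, 1), repeat=num_vars))


# Hypothetical inconsistent system: the first two equations contradict each other.
equations = [((0, 1, 2), 0), ((0, 1, 2), 1), ((1, 2, 3), 1), ((0, 2, 3), 0)]

print(average_satisfied(4, equations))  # average over all assignments
print(max_satisfied(4, equations))      # best possible
```

Each equation is satisfied by exactly half of all assignments, so the average is half the number of equations, and some assignment must do at least that well. The exhaustive search for the best assignment is of course only feasible for tiny systems.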

More precisely, Håstad’s result says that for any fixed $\epsilon>0$, if there were a polynomial-time algorithm that could tell you whether it was possible to satisfy at least a proportion $1/2+\epsilon$ of a collection of linear equations over $\mathbb{F}_2$ (each equation involving three variables), then P would equal NP. His proof relies on one of the big highlights of theoretical computer science: the PCP theorem.

The unique games conjecture also concerns maximizing the number of linear equations you can solve, but this time we work mod $p$ and the equations are very special: they take the form $x_i-x_j=c_{ij}$.

To get a little intuition about this, I suppose one should do something I haven’t done until this very moment, and think about how one might go about finding a good algorithm for solving as many equations of this type from some collection as possible. An obvious observation is that once we’ve chosen $x_i$, the value of $x_j$ is determined if we want to solve the equation $x_i-x_j=c_{ij}$. And that may well determine another variable, and so on. It feels natural to think of these equations as a labelled directed graph with the variables as vertices and with an edge from $x_i$ to $x_j$ labelled $c_{ij}$ if the above equation is present in the system. Then following the implications of a choice of variables is closely related to exploring the component of that vertex in the graph. However, since our aim is to solve as many equations as possible, rather than all of them, we have the option of removing edges to make our task easier, though we want to remove as few edges as possible.
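The propagation idea can be sketched in a few lines. The code below handles only the easy question of whether *all* the equations can be satisfied, by exploring each component of the labelled graph and checking for contradictions. The function name and the encoding of an equation $x_i-x_j=c$ (mod $p$) as a triple `(i, j, c)` are my own choices for illustration.

```python
from collections import deque


def solve_all(num_vars, p, equations):
    """equations: list of (i, j, c) meaning x_i - x_j = c (mod p).

    Returns a list of values satisfying every equation, or None if none exists.
    """
    neighbours = {v: [] for v in range(num_vars)}
    for i, j, c in equations:
        neighbours[i].append((j, (-c) % p))  # knowing x_i forces x_j = x_i - c
        neighbours[j].append((i, c % p))     # knowing x_j forces x_i = x_j + c
    x = {}
    for start in range(num_vars):
        if start in x:
            continue
        x[start] = 0  # the only free choice: one value per connected component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, shift in neighbours[u]:
                forced = (x[u] + shift) % p
                if v not in x:
                    x[v] = forced
                    queue.append(v)
                elif x[v] != forced:
                    return None  # two paths force different values: a contradiction
    return [x[v] for v in range(num_vars)]


consistent = [(0, 1, 2), (1, 2, 4), (0, 2, 1)]
inconsistent = [(0, 1, 2), (1, 2, 4), (0, 2, 0)]
print(solve_all(3, 5, consistent))
print(solve_all(3, 5, inconsistent))
```

Deciding whether everything can be satisfied is therefore easy. The difficulty only appears when the system is inconsistent and one has to decide which edges to give up.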

Maybe those few remarks will make it seem reasonably natural that the unique games conjecture can be connected with something called the *max cut problem*. This is the problem of finding a partition of the vertices of a graph into two sets such that the number of edges from one side to the other is as big as possible.

Actually, while browsing some slides of Håstad, I’ve just seen the following connection, which seems worth mentioning. If $p=2$ and all the $c_{ij}$ equal 1, then $x_i-x_j=c_{ij}$ if and only if the variables $x_i$ and $x_j$ get different assignments. So in this case, solving as many equations as possible is precisely the same as the max cut problem.
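This equivalence can be checked by brute force on a small graph. In the sketch below (function names and the example graph are mine), an equation $x_i-x_j=c$ mod $p$ is encoded as a triple `(i, j, c)`, and each edge of the graph becomes the equation with $p=2$ and $c=1$.

```python
from itertools import product


def max_satisfied(num_vars, p, equations):
    """Largest number of equations x_i - x_j = c (mod p) satisfied by one assignment."""
    return max(
        sum((x[i] - x[j]) % p == c % p for i, j, c in equations)
        for x in product(range(p), repeat=num_vars)
    )


def max_cut(num_vertices, edges):
    """Largest number of edges crossing a partition of the vertices into two sides."""
    return max(
        sum(side[u] != side[v] for u, v in edges)
        for side in product((0, 1), repeat=num_vertices)
    )


# A triangle with one pendant edge attached to vertex 2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
equations = [(u, v, 1) for u, v in edges]

print(max_cut(4, edges), max_satisfied(4, 2, equations))
```

The two numbers agree because an assignment of 0s and 1s to the variables is exactly a choice of side for each vertex, and an equation is satisfied precisely when its edge crosses the cut.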

However, before we get too carried away with this, let me say what the unique games conjecture actually says. Apparently it has been reformulated a few times, and this version comes from 2004, whereas the original version was 2002. It says that even if 99% of the equations (of the form $x_i-x_j=c_{ij}$ over $\mathbb{F}_p$) can be simultaneously satisfied, then it is still NP hard to determine whether 1% of them can be simultaneously satisfied. Note that it is important to allow $p$ to be large here, since the random approach gives you a proportion $1/p$ straight away. Also, I think 99% and 1% are a friendly way of saying $1-\epsilon$ and $\epsilon$ for an arbitrary fixed $\epsilon>0$.

In case the statement isn’t clear, let me put it slightly more formally. The unique games conjecture says the following. Suppose that for some $\epsilon>0$ there exists a polynomial-time algorithm that outputs YES if a proportion $1-\epsilon$ of the equations can be solved simultaneously and NO if it is impossible to solve more than a proportion $\epsilon$ of them, with no requirements on what the algorithm should output if the maximum proportion lies between $\epsilon$ and $1-\epsilon$. Then P=NP.

At this point I should explain why the conjecture is called the unique games conjecture. But I’m not going to because I don’t know. I’ve been told a couple of times, but it never stays in my head, and when I do get told, I am also told that the name is something of a historical accident, since the later reformulations have nothing to do with games. So I think the name is best thought of as a strange type of label whose role is simply to identify the conjecture and not to describe it.

To give an idea of why the UGC is important, Arora took us back to an important paper of Goemans and Williamson from 1993 concerning the max cut problem. The simple random approach tells us that we can find a partition such that the size of the resulting cut is at least half the number of edges in the graph, since each edge has a 50% chance of joining a vertex in one half to a vertex in the other half. (Incidentally, there are standard “derandomization” techniques for converting observations like this into algorithms for finding the cuts. This is another beautiful idea from theoretical computer science, but it’s been around for long enough that people have got used to it.)
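Here is one standard way the derandomization can go for max cut, the method of conditional expectations, which in this case collapses to a simple greedy rule: place the vertices one at a time, each on the side that cuts at least half of the edges to its already-placed neighbours. This is a sketch of a textbook technique rather than anything Arora presented, and it assumes a simple graph with no loops. The function name and the example are mine.

```python
def greedy_cut(num_vertices, edges):
    """Return (side, cut_size) where the cut contains at least half of the edges."""
    neighbours = {v: [] for v in range(num_vertices)}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    side = {}
    for v in range(num_vertices):
        placed = [side[u] for u in neighbours[v] if u in side]
        # Go opposite the majority of already-placed neighbours, so that at
        # least half of the edges back to them end up crossing the cut.
        side[v] = 0 if placed.count(1) >= placed.count(0) else 1
    cut_size = sum(side[u] != side[v] for u, v in edges)
    return side, cut_size


# Complete graph on four vertices: 6 edges, and the best possible cut has 4.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
side, cut_size = greedy_cut(4, k4_edges)
print(side, cut_size)
```

Every edge is “decided” at the moment its later endpoint is placed, and at each step at least half of the newly decided edges are cut, which is where the guarantee of half the edges comes from.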

Goemans and Williamson were the first people to go beyond 50%. They used semidefinite programming to devise an algorithm that could find a cut for which the number of edges was at least 0.878 times the size of the max cut. I don’t know what that 0.878 really is (presumably some irrational number that came out of the proof) but it was sufficiently unnatural looking that there was a widespread belief that the bound would in due course be improved further. However, a check on that belief was given in 2004 by Khot, Kindler, Mossel and O’Donnell and in 2005 by Mossel, O’Donnell and Oleszkiewicz (how they all contributed to the result I don’t know), who showed the very surprising result that if UGC is true, then the Goemans-Williamson bound is optimal. From what I understand, the proof is a lot more than just a clever observation that max cut can be reduced to unique games. If you don’t believe me, then try to explain to yourself how the constant 0.878 can arise in a simple way from a conjecture that involves only the constants “nearly 0” and “nearly 1”.
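For what it is worth, the constant can be computed. As far as I know, the Goemans-Williamson analysis gives the ratio as the minimum over angles $0<\theta\le\pi$ of $\frac{2}{\pi}\cdot\frac{\theta}{1-\cos\theta}$, which compares the probability $\theta/\pi$ that a random hyperplane separates two unit vectors at angle $\theta$ with the contribution $(1-\cos\theta)/2$ of that pair to the semidefinite programme. That formula is quoted from memory of the standard analysis and was not part of the talk. The crude grid search below is just a numerical check, and the names in it are my own.

```python
import math


def gw_ratio(theta):
    """Ratio of separation probability theta/pi to SDP contribution (1 - cos theta)/2."""
    return (2 / math.pi) * theta / (1 - math.cos(theta))


steps = 200_000
# Minimise over a fine grid of angles in (0, pi]; theta = 0 is excluded.
alpha = min(gw_ratio(math.pi * k / steps) for k in range(1, steps + 1))
print(round(alpha, 5))
```

So the mysterious number is the minimum of an explicit trigonometric expression, attained at an angle a little over 133 degrees.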

In general, it turns out that UGC implies sharp thresholds for approximability for many problems. What this means is that there is some threshold, below which you can do what you want with a polynomial-time algorithm and above which doing what you want is NP hard. (So in the max cut example the threshold is 0.878: getting smaller than that proportion can be done in polynomial time, and getting above that proportion is NP hard — at least if you believe UGC.)

Almost as interesting is that the thresholds predicted by UGC all come from rather standard techniques such as semidefinite programming and linear programming. So in some sense it is telling us not just that a certain *bound* is best possible but that a certain *technique* is best possible. To put it a bit crudely and inaccurately, it’s saying that for one of these problems, the best you can do with semidefinite programming is the best you can do full stop.

Arora said something even stronger that I haven’t properly understood, but I reproduce it for completeness. Apparently UGC even tells us that the failure of a standard algorithm to beat the threshold *on a single instance* implies that no algorithm can do better. I suppose that must mean that one can choose a clever instance in such a way that if the standard algorithm succeeds with that instance, then that fact can be converted into a machine for solving arbitrary instances of UGC. How you get from one instance of one problem to lots of instances of another is mysterious to me, but Arora did say that this result came as a big surprise.

There were a couple of other things that Arora said at the end of his talk to explain why Khot’s work was important. Apparently while the UGC is just a conjecture, and not even a conjecture that is confidently believed to be true (indeed, if you want to become famous, then it may be worth trying your hand at finding an efficient algorithm for it, since there seems to be a non-negligible chance that such an algorithm exists), it has led to a number of non-obvious predictions that have then been proved unconditionally.

Soon after Arora’s laudatio, Khot himself gave a talk. This was an odd piece of scheduling, since there was necessarily a considerable overlap between the two talks (in their content, that is). I’ll end by mentioning a reformulation of UGC that Khot talked about and Arora didn’t.

A very important concept in graph theory is that of *expansion*. Loosely speaking, a graph is called an expander if for any (not too large) set of vertices, there are many edges from that set to its complement. More precisely, if $G$ is a $d$-regular graph and $S$ is a set of vertices, then we define the expansion of $S$ to be the number of edges leaving $S$ divided by $d|S|$ (the latter being the most such edges there could possibly be). Another way of looking at this is that you pick a random point $x\in S$ and a random neighbour $y$ of $x$, and define the expansion of $S$ to be the probability that $y$ is not in $S$.

The expansion of the graph as a whole is the minimum expansion over all subsets of size at most (where is the number of vertices of ). If this quantity is high, it is saying that is “highly interconnected”.

Khot is interested in *small-set* expansion. That is, he picks a small and takes the minimum over sets of size at most rather than at most .
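
To make the definitions concrete, here is a minimal brute-force sketch. It assumes the graph is given as an adjacency dictionary of a regular graph; the function names `expansion` and `small_set_expansion` are made up for this illustration, and the exhaustive search is only feasible for tiny graphs (it says nothing about how hard the problem is in general).

```python
from itertools import combinations

def expansion(adj, subset):
    """Edges leaving `subset` divided by d*|subset|, for a d-regular graph."""
    subset = set(subset)
    d = len(adj[next(iter(subset))])  # regular graph: every vertex has degree d
    leaving = sum(1 for v in subset for w in adj[v] if w not in subset)
    return leaving / (d * len(subset))

def small_set_expansion(adj, max_size):
    """Minimum expansion over all non-empty vertex sets of size <= max_size."""
    vertices = list(adj)
    return min(
        expansion(adj, s)
        for k in range(1, max_size + 1)
        for s in combinations(vertices, k)
    )

# Toy example (an assumption for illustration): the cycle on 8 vertices.
n = 8
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
print(expansion(cycle, [0, 1, 2, 3]))      # an arc: 2 edges leave, d*|A| = 8
print(small_set_expansion(cycle, n // 2))  # ordinary expansion of the cycle
print(small_set_expansion(cycle, 2))       # small sets only
```

On the cycle the worst sets are arcs, which is why restricting to smaller sets gives a larger minimum.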

The precise reformulation I’m about to give is not in fact the one that Khot gave but rather a small modification that Boaz Barak, another well-known theoretical computer scientist, gave in his invited lecture a day later. The unique games conjecture is equivalent to the assertion that it is NP hard to distinguish between the following two classes of graphs.

- Graphs where there exists a set of size at most $\delta n$ with small expansion.
- Graphs where every set of size at most $\delta n$ has very big expansion.

I think for the latter one can take the expansion to be at least 1/2 for each such set, whereas for the former it is at most $\epsilon$ for some small $\epsilon$ that you can probably choose.

What is interesting here is that for ordinary expansion there is a simple characterization in terms of the size of the second largest eigenvalue of the adjacency matrix. Since eigenvalues can be approximated efficiently, there is an efficient method for determining whether a graph is an expander. UGC is equivalent to saying that when the sets get small, their expansion properties can “hide” in the graph in a rather strong way: you can’t tell the difference between a graph that has very good small-set expansion and a graph where there’s a set that fails very badly.

I had lunch with Boaz Barak on one of the days of the congress, so I asked him whether he believed UGC. He gave me a very interesting answer (a special case of the more general proposition that Boaz Barak has a lot of very interesting things to say about complexity), which I have unfortunately mostly forgotten. However, my rapidly fading memory is that he would like it to be true, because it would be a beautiful description of the boundary of what algorithms can do, but thinks it may very well be false. He thought that one possibility was that solving the problems that UGC says are NP hard is not in fact NP hard, but not possible in polynomial time either. It is perfectly possible for a problem to be of intermediate difficulty.

Although it wouldn’t directly contradict NP hardness, it would be very interesting to find an algorithm that solved the small-set expansion problem in a time that was only modestly superpolynomial: something like $n^{\log n}$, say. That would probably get you an invitation to speak at an ICM.


The most concrete thing I remember (without being 100% sure I’ve got it right) is that one of Mirzakhani’s major results concerns counting closed geodesics in Riemann surfaces. A geodesic is roughly speaking a curve that feels like a straight line to an inhabitant of the surface. Another way of putting it is that if you take two points that are close together on a geodesic, then the part of the geodesic between those points is the shortest curve that joins those two points. (Hmm, on writing that I feel that I’ve made an elementary mistake of exposition, in that I have assumed that you know what a Riemann surface is, and then gone to a little trouble to say what a geodesic is, when not many people will know the former without also knowing the latter. To atone for that, let me add a link to the Wikipedia article on Riemann surfaces, though I’m afraid that article is not much good for the beginner. A beginner’s definition, not precise at all but perhaps adequate for the purposes of reading this post, is that a Riemann surface is a surface like a sphere or a torus, but with some very important extra structure that comes from the fact that each little patch of surface looks like a little patch of the complex plane.)

If you follow your nose inside a Riemann surface, then sometimes you get back to where you started and are pointing in the same direction. In that case, you follow your original path all over again and the geodesic is called *closed*. But sometimes that doesn’t happen.

We can further classify closed geodesics into two types: those that cross themselves and those that don’t. The ones that don’t are called *simple*. An example of a simple closed geodesic is a great circle on the surface of a sphere. Apparently, the problem of counting closed geodesics was pretty much solved, but the problem of counting *simple* closed geodesics was significantly harder. It is this problem that Mirzakhani solved. (I’m not quite sure what “solved” meant here — perhaps her work means that if someone gives you a Riemann surface, you can tell them how many simple closed geodesics it contains.)

The more I write, the more I realize that the counting must be up to some kind of equivalence, since otherwise it seems to me that there will almost certainly either be no simple closed geodesics or uncountably many. But I’ll have to wait to look at my notes to get more precise about that.

The other main thing I remember from the talk is that moduli spaces were a very important part of Mirzakhani’s work, which provided another nice thematic connection between the work of different medallists. Just as Avila studied whole families of dynamical systems, a moduli space is a whole family of Riemann surfaces. And in both cases the family is far more than merely a *set* of objects: it is a set *with geometrical structure*. For example, if you take all interval exchange maps that chop $[0,1)$ into five parts and permute them in a certain specified way, then each one is uniquely determined by the end points of the intervals other than $0$ and $1$. So we can naturally associate with each one an element of the set

$$\{(x_1,x_2,x_3,x_4) : 0\leq x_1\leq x_2\leq x_3\leq x_4\leq 1\}.$$

(Those include some degenerate examples.) This is a polyhedral subset of $\mathbb{R}^4$, so it has nice geometrical, topological and measure-theoretic structure, which allows one to talk about almost all interval exchange maps, or nowhere dense sets of interval exchange maps, and so on.

An example that people often give to demonstrate what a moduli space is (and I should say that my entire knowledge of this concept comes from my memory of editing a very nice article by David Ben-Zvi on the subject for the Princeton Companion to Mathematics, though obviously anything I say about them that is false is not his fault) is the space of all tori. If you are not used to Riemann surfaces, then you may think that there is just one torus up to isomorphism, but there you would be wrong. Topologically it is true, but we want an isomorphism *of Riemann surfaces*, and the maps that you are allowed to use are much more rigid. So for example if you take the complex plane and quotient out by the square lattice $\mathbb{Z}+i\mathbb{Z}$, you get a torus that is not isomorphic to the torus you get if instead you quotient out by the triangular lattice. (Roughly speaking, the obvious attempt to define an isomorphism would involve shearing the plane, but shears are not holomorphic.)

If we quotient by two lattices, when will the results give isomorphic tori? If one is an expansion of the other, then they will, and if one is a rotation of the other, then they will again. From that we get that if two complex numbers generate a lattice, then the isomorphism type of the torus depends only on their ratio. So we have already reduced the family of tori to a single complex parameter. However, that isn’t the whole story as different complex parameters do not necessarily give rise to different tori. But it gives some idea that the tori form a “space” that itself has an interesting geometrical structure. For reasons I don’t fully understand, moduli spaces are very helpful in the study of Riemann surfaces, and are also extremely interesting objects in their own right.
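
Here is a rough sketch of why different parameters can give the same torus. It relies on the standard fact (not spelled out above, so treat it as an assumption of this example) that if $\tau$ is the ratio of two generators, then $\tau+1$ and $-1/\tau$ describe the same lattice up to rotation and scaling. The helper `reduce_tau` is a hypothetical name; it uses floating point, so it is only reliable away from the boundary of the fundamental domain.

```python
import math

def reduce_tau(tau, max_steps=1000):
    """Move tau (with Im tau > 0) into the region |Re tau| <= 1/2, |tau| >= 1
    using only the moves tau -> tau + k (k an integer) and tau -> -1/tau."""
    if tau.imag <= 0:
        raise ValueError("tau must lie in the upper half-plane")
    for _ in range(max_steps):
        # Translate so that the real part lies in [-1/2, 1/2).
        tau = complex(tau.real - math.floor(tau.real + 0.5), tau.imag)
        if abs(tau) >= 1:
            return tau
        tau = -1 / tau  # inversion increases the imaginary part when |tau| < 1
    raise RuntimeError("did not converge")

tau0 = complex(0.3, 1.7)            # already in the fundamental region
print(reduce_tau(tau0 + 3))         # same torus, shifted parameter
print(reduce_tau(-1 / tau0))        # same torus, inverted parameter
```

Both calls land back at (approximately) the same point, whereas two parameters that reduce to genuinely different points give non-isomorphic tori.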

OK that’s about it for what I remember. But before I look at my notes, I’d like to mention briefly one other connection with Avila, which is that Mirzakhani is also very interested in billiards in polygons, though this wasn’t mentioned in the laudatio.

Actually, that reminds me of one other thing, which is that one of Mirzakhani’s results is strongly reminiscent of famous results of Marina Ratner. Maybe I’ll be able to say more about that after looking at my notes.

OK, now I’ve looked at my notes I find that, as I thought, I had forgotten quite a bit.

One important detail is that Mirzakhani looked at surfaces of genus at least two (that is, surfaces with at least two “holes”, so not tori). This is important because it means that the metrics on them are hyperbolic. It turns out that the moduli space $\mathcal{M}_g$ of Riemann surfaces of genus $g$ is a complex variety of complex dimension $3g-3$, and is also a symplectic orbifold. (An orbifold is a bit like a manifold but is allowed to have a few singularities. In the torus example, one of these singularities arises as a result of the fact that the triangular lattice has a symmetry, namely rotation by 60 degrees, that most lattices do not have.)

The moduli spaces are totally inhomogeneous. That is very important, but I don’t know what it means. (I can’t remember whether McMullen told us — probably he did.)

McMullen concentrated on three aspects of Mirzakhani’s work. The first was what I’ve already mentioned, namely counting simple closed geodesics. My feeling that there would be uncountably many of these unless one looked at equivalence classes somehow was based on the sphere and the torus, so maybe things are different when the geometry becomes hyperbolic.

He told us that if $X$ is a Riemann surface of genus $g$, then the number of simple loops grows like $L^{6g-6}$. I can’t remember what the parameter $L$ means. I’ve written to indicate what is being counted.

It seems a bit silly not to try to find out what is going on here, so let me have a quick look at the citation.

Ah, that makes much more sense! $L$ stands for length. So the formula is an estimate for the number of simple loops of length at most $L$. If you look at all closed geodesics (i.e., allowing self-crossing ones too) then the growth rate is $e^L/L$.

This apparently led to a new proof of a famous conjecture of Witten — a formula for intersection numbers on the moduli space — which was originally proved by Kontsevich in 1992.

Another consequence is the result that the probability that a random simple loop in genus 2 cuts the surface into two pieces is 1/7.

The second major topic was complex geodesics in $\mathcal{M}_g$. I don’t know the precise definition, but I presume that the idea is that if you take a point in $\mathcal{M}_g$ that is surrounded by a copy of an infinitesimally small part of the complex plane, then there is a unique way of continuing that “in the same direction” and getting what I presume is a Riemann surface that lives inside $\mathcal{M}_g$. So it would be a little bit like a 2D generalization of a geodesic but would also involve the complex structure. Ah, I see that I have written that a complex geodesic is a holomorphic isometry from the hyperbolic plane to $\mathcal{M}_g$, though I wonder whether that should be a local isometry: that is, that for each point in the hyperbolic plane there is a neighbourhood such that the restriction of the map to that neighbourhood is an isometry.

I’ve written that there are complex geodesics through every point in $\mathcal{M}_g$ in every direction, and that they are called Teichmüller discs.

Apparently real geodesics are usually dense in $\mathcal{M}_g$. Sometimes they can be exotic shapes such as fractal cobwebs (whatever those are), defying classification. What about in two dimensions? Can we get some 2D analogue of fractal cobwebs? No we can’t. Mirzakhani and her coworkers showed that you always get an algebraic subvariety. This is strongly reminiscent of work of Margulis and Ratner.

What is remarkable about this result is that it is an analogue of the Margulis/Ratner results in a totally inhomogeneous situation, which was completely unexpected.

I’ve just cheated and looked at the citation again, because it seemed to be particularly important to get some idea of what “totally inhomogeneous” means. The answer is fairly simple. A homogeneous space is one where the geometry at every point is the same. To say that $\mathcal{M}_g$ is totally inhomogeneous is to say that at *no* two points is the geometry the same. While looking for that, I also saw that Mirzakhani solved the simple-loop-counting problem by connecting it to a certain volume computation in the moduli space $\mathcal{M}_g$. So it was a definite case where looking at the entire family helps you to prove things about the individual members of the family.

The third aspect of Mirzakhani’s work that McMullen talked about concerned something called earthquake flow that was defined by Thurston. I thought I had some understanding of what this was when I was watching the talk, but can’t really remember now. On watching the explanation again, I find that I can understand part of what McMullen says (about deforming Riemann surfaces by cutting along closed geodesics and giving them a twist, and then doing something similar but with an entire “lamination” of closed geodesics), but I still don’t quite get how that leads to a flow. (If you want to try, then the video is here and the explanation starts at 25:24.)

The result is that the earthquake flow is ergodic and mixing, and this means something like that if you randomly apply earthquakes then you get all shapes of genus $g$. Apparently, Mirzakhani established a measurable isomorphism between earthquake flow and horocycle flow, and this was a big surprise. Those are just words to me, but when I hear someone like Curt McMullen say that a result is very surprising, then I am impressed.


I was rescued by an extraordinary piece of luck. When I got to the gate with my boarding card, the woman who took it from me tore it up and gave me another one, curtly informing me that I had been upgraded. I have no idea why. I wonder whether it had anything to do with the fact that in order to avoid standing any longer than necessary I waited until almost the end before boarding. But perhaps the decision had been made well before that: I have no idea how these things work. Anyhow, it meant that I could make my seat pretty well horizontal and I slept for quite a lot of the journey. Unfortunately, I wasn’t feeling well enough to make full use of all the perks, one of which was a bar where one could ask for single malt whisky. I didn’t have any alcohol or coffee and only picked at my food. I also didn’t watch a single film or do any work. If I’d been feeling OK, the day would have been very different. However, perhaps the fact that I wasn’t feeling OK meant that the difference it made to me to be in business class was actually greater than it would have been otherwise. I rather like that way of looking at it.

An amusing thing happened when we landed in Paris. We landed out on the tarmac and were met by buses. They let the classy people off first (even we business-class people had to wait for the first-class people, just in case we got above ourselves), so that they wouldn’t have to share a bus with the riff raff. One reason I had been pleased to be travelling business class was that it meant that I had after all got to experience the top floor of an Airbus 380. But when I turned round to look, there was only one row of windows, and then I saw that it had been a Boeing 777. Oh well. It was operated by Air France. I’ve forgotten the right phrase: something like “shared code”. A number of little anomalies resolved themselves, such as the fact that the take-off didn’t feel like the one in Paris, that the slope of the walls didn’t seem quite correct if we were on the top floor, etc.

I thought that as an experiment I would see what I could remember about the laudatio for Martin Hairer without the notes I took, and then after that I would see how much more there was to say *with* the notes. So here goes. The laudatio was given by Ofer Zeitouni, one of the people on the Fields Medal committee. Early on, he made a link with what Ghys had said about Avila, by saying that Hairer too studied situations where physicists don’t know what the equation is. However, these situations were somewhat different: instead of studying typical dynamical systems, Hairer studied stochastic PDEs. As I understand it, an important class of stochastic PDEs is conventional PDEs with a noise term added, which is often some kind of Brownian motion term.

Unfortunately, Brownian motion can’t be differentiated, but that isn’t by itself a huge problem because it can be differentiated if you allow yourself to work with distributions. However, while distributions are great for many purposes, there are certain things you can’t do with them — notably multiply them together.

Hairer looked at a stochastic PDE that modelled a physical situation that gives rise to a complicated fractal boundary between two regions. I think the phrase “interface dynamics” may have been one of the buzz phrases here. The naive approach to this stochastic PDE led quickly to the need to multiply two distributions together, so it didn’t work. So Hairer added a “mollifier” (that is, he smoothed the noise slightly). Associated with this mollifier was a parameter $\epsilon$: the smaller $\epsilon$ was, the less smoothing took place. So he then solved the smoothed system, let $\epsilon$ tend to zero, showed that the smoothed solutions tended to a limit, and defined that limit to be the solution of the original equation.

The way I’ve described it, that sounds like a fairly obvious thing to do, so what was so good about it?

A first answer is that in this particular case it was far from obvious that the smoothed solutions really did tend to a limit. In order to show this, it was necessary to do a renormalization (another thematic link with Avila), which involved subtracting a constant $C_\epsilon$. The only other thing I remember was that the proof also involved something a bit like a Taylor expansion, but that a key insight of Hairer was that instead of expanding with respect to a fixed basis of functions, one should instead let the basis of functions depend on the function one was expanding, or something like that anyway.

I was left with the feeling that a lot of people are very excited about what Hairer has done, because with his new theoretical framework he has managed to go a long way beyond what people thought was possible.

OK, now let me look at the notes and see whether I want to add anything.

My memory seems to have served me quite well. Here are a couple of extra details. An important one is that Zeitouni opened with a brief summary of Hairer’s major contributions, which makes them sound like much more than a clever trick to deal with one particular troublesome stochastic PDE. These were

1. a theory of regularity structures, and

2. a theory of ergodicity for infinite-dimensional systems.

I don’t know how those two relate to the solution of the differential equation, which, by the way, is called the KPZ equation, and is the following.

$$\partial_t h = \partial_x^2 h + (\partial_x h)^2 + \xi$$

It models the evolution of interfaces. (So maybe “interface dynamics” was not after all the buzz phrase.)

When I said that the noise was Brownian, I should have said that the noise was completely uncorrelated in time, and therefore makes no sense pointwise, but it integrates to Brownian motion.

The mollifiers are functions $\xi_\epsilon$ that replace the noise term $\xi$. The constants $C_\epsilon$ I mentioned earlier depend on your choice of mollifier, but the limit doesn’t (which is obviously very important).
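
Putting the pieces together, the scheme can be sketched in symbols as follows. This assumes the KPZ equation is normalised as $\partial_t h=\partial_x^2 h+(\partial_x h)^2+\xi$ (the constants in front of each term are a matter of convention, and the notation $h_\epsilon$, $\xi_\epsilon$, $C_\epsilon$ is an assumption of this sketch rather than something taken from the talk).

```latex
% Smoothed and renormalized equation, solved for each fixed \epsilon > 0:
\partial_t h_\epsilon
  = \partial_x^2 h_\epsilon + (\partial_x h_\epsilon)^2 - C_\epsilon + \xi_\epsilon ,
\qquad
% the solution of the original equation is then defined as the limit
h = \lim_{\epsilon \to 0} h_\epsilon .
```

The point is that $C_\epsilon$ may blow up as $\epsilon\to 0$ and may depend on how the noise was smoothed, while the limit $h$ does not.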

What Zeitouni actually said about Taylor expansion was that one should measure smoothness by expansions that are tailored (his word not mine) to the equation, rather than with respect to a universal basis. This was a key insight of Hairer.

One of the major tools introduced by Hairer is a generalization of something called rough-path theory, due to Terry Lyons. Another is his renormalization procedure.

Zeitouni summarized by saying that Hairer had invented new methods for defining solutions to PDEs driven by rough noise, and that these methods were robust with respect to mollification. He also said something about quantitative behaviour of solutions.

If you find that account a little vague and unsatisfactory, bear in mind that my aim here is not to give the clearest possible presentation of Hairer’s work, but rather to discuss what it was like to be at the ICM, and in particular to attend this laudatio. One doesn’t usually expect to come out of a maths talk understanding it so well that one could give the same talk oneself. As I’ve mentioned in another post, there are some very good accounts of the work of all the prizewinners here. (To see them, follow the link and then follow further links to press releases.)

**Update:** if you want to appreciate some of these ideas more fully, then here is a very nice blog post: it doesn’t say much more about Hairer’s work, but it does a much better job than this post of setting his work in context.


Dick Gross also gave an excellent talk. He began with some of the basic theory of binary quadratic forms over the integers, that is, expressions of the form $ax^2+bxy+cy^2$. One assumes that they are *primitive* (meaning that $a$, $b$ and $c$ don’t have some common factor). The *discriminant* of a binary quadratic form is the quantity $b^2-4ac$. The group SL$_2(\mathbb{Z})$ then acts on these by a change of basis. For example, if we take the matrix $\begin{pmatrix}2&1\\5&3\end{pmatrix}$, we’ll replace $(x,y)$ by $(2x+y,5x+3y)$ and end up with the form $a(2x+y)^2+b(2x+y)(5x+3y)+c(5x+3y)^2$, which can be rearranged to

$$(4a+10b+25c)x^2+(4a+11b+30c)xy+(a+3b+9c)y^2$$

(modulo any mistakes I may have made). Because the matrix is invertible over the integers, the new form can be transformed back to the old one by another change of basis, and hence takes the same set of values. Two such forms are called *equivalent*.

For some purposes it is more transparent to write a binary quadratic form as

$$\begin{pmatrix}x&y\end{pmatrix}\begin{pmatrix}a&b/2\\ b/2&c\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}.$$

If we do that, then it is easy to see that replacing a form by an equivalent form does not change its discriminant since it is just $-4$ times the determinant of the matrix of coefficients, which gets multiplied by a couple of matrices of determinant 1 (the base-change matrix and its transpose).
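
The invariance of the discriminant is easy to check by machine. The following is a small sketch: a form $ax^2+bxy+cy^2$ is stored as a triple `(a, b, c)`, and the function names and the particular matrix are chosen purely for illustration.

```python
def transform(form, matrix):
    """Substitute (x, y) -> (p*x + q*y, r*x + s*y) into a*x^2 + b*x*y + c*y^2
    and return the coefficients of the resulting form."""
    a, b, c = form
    (p, q), (r, s) = matrix
    return (
        a * p * p + b * p * r + c * r * r,
        2 * a * p * q + b * (p * s + q * r) + 2 * c * r * s,
        a * q * q + b * q * s + c * s * s,
    )

def discriminant(form):
    a, b, c = form
    return b * b - 4 * a * c

form = (1, 1, 6)                 # x^2 + xy + 6y^2, discriminant -23
M = ((2, 1), (5, 3))             # determinant 1
M_inv = ((3, -1), (-5, 2))       # its inverse, also with integer entries

new_form = transform(form, M)
print(new_form, discriminant(form), discriminant(new_form))
print(transform(new_form, M_inv))  # the inverse substitution undoes the change
```

The two forms look quite different, but they have the same discriminant and the inverse substitution recovers the original.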

Given any equivalence relation it is good if one can find nice representatives of each equivalence class. In the case of (positive definite) binary quadratic forms, there is a unique representative such that $-a<b\leq a<c$ or $0\leq b\leq a=c$. From this it follows that up to equivalence there are finitely many forms with any given discriminant. The question of how many there are with discriminant $D$ is a very interesting one.
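
As a sketch of how finiteness turns into an actual count, one can list the reduced forms of a given negative discriminant directly. The code below assumes the standard reduction conditions for positive definite forms ($-a<b\leq a<c$ or $0\leq b\leq a=c$) and keeps only primitive forms; `reduced_forms` is a hypothetical helper name.

```python
from math import gcd, isqrt

def reduced_forms(D):
    """Primitive reduced positive definite forms (a, b, c) with b^2 - 4ac = D < 0."""
    if D >= 0 or D % 4 not in (0, 1):
        raise ValueError("D must be negative and congruent to 0 or 1 mod 4")
    forms = []
    # For a reduced form |b| <= a <= c, so 3a^2 <= 4ac - b^2 = -D.
    for a in range(1, isqrt(-D // 3) + 1):
        for b in range(-a + 1, a + 1):          # -a < b <= a
            if (b * b - D) % (4 * a):
                continue                        # c would not be an integer
            c = (b * b - D) // (4 * a)
            if c < a or (c == a and b < 0):
                continue                        # not the chosen representative
            if gcd(gcd(a, b), c) == 1:
                forms.append((a, b, c))
    return forms

print(reduced_forms(-23))
print(len(reduced_forms(-163)))
```

The search is finite because the reduction conditions bound $a$ in terms of the discriminant, which is exactly the finiteness argument in the text.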

Even more interesting is that the equivalence classes form an Abelian group under a certain composition law that was defined by Gauss. Apparently it occupied about 30 pages of the *Disquisitiones*, which are possibly the most difficult part of the book.

Going back to the number of forms of discriminant $D$, Gauss did some calculations and stated (without proof) the formula

There was, however, a heuristic justification for the formula. (I can’t remember whether Dick Gross said that Gauss had explicitly stated this justification or whether it was simply a reconstruction of what he must have been thinking.) It turns out that the sum on the left-hand side works out as the number of integer points in a certain region of $\mathbb{R}^3$ (or at least I assume it is since the binary form has three coefficients), and this region has volume . Unfortunately, however, the region is not convex, or even bounded, so this does not by itself prove anything. What one has to do is show that certain cusps don’t accidentally contain lots of integer points, and that is quite delicate.

One rather amazing thing that Bhargava did, though it isn’t his main result, was show that if a positive definite quadratic form with integer coefficients (in any number of variables, not just two) represents all the positive integers up to 290 then it represents all positive integers, and that this bound is best possible. (I may have misremembered the numbers. Also, one doesn’t have to know that it represents every single number up to 290 in order to prove the result: there is some proper subset of $\{1,2,\dots,290\}$ that does the job.)

But the first of his Fields-medal-earning results was quite extraordinary. As a PhD student, he decided to do what few people do, and actually read the *Disquisitiones*. He then did what even fewer people do: he decided that he could improve on Gauss. More precisely, he felt that Gauss’s definition of the composition law was hard to understand and that it should be possible to replace it by something better and more transparent.

I should say that there are more modern ways of understanding the composition law, but they are also more abstract. Bhargava was interested in a definition that would be computational but better than Gauss’s. I suppose it isn’t completely surprising that Gauss might have produced something suboptimal, but what is surprising is that it was suboptimal *and* nobody had improved it in 200 years.

The key insight came to Bhargava, if we are to believe the story he tells us, when he was playing with a Rubik’s cube. He realized that if he put the letters $a$ to $h$ at the vertices of the cube, then there were three ways of slicing the cube to produce two $2\times 2$ matrices. One could then do something with their determinants, the details of which I have forgotten, and end up producing three binary quadratic forms that are related, and this relationship leads to a natural way of defining Gauss’s composition law. Unfortunately, I couldn’t keep the precise definitions in my head.
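
For what it is worth, here is one way the slicing can be made explicit. This is a sketch under assumptions: the labelling of the vertices, the choice of the three pairs of matrices $(M_i,N_i)$ and the sign convention $Q_i(x,y)=-\det(M_ix-N_iy)$ follow the usual published presentation of Bhargava’s cubes as best I can reconstruct it, not anything said in the talk. The thing one can check directly is that the three forms always have the same discriminant.

```python
def slice_forms(cube):
    """cube = (a, b, c, d, e, f, g, h), the integers at the vertices.
    Return the three forms Q_i(x, y) = -det(M_i*x - N_i*y) as triples."""
    a, b, c, d, e, f, g, h = cube
    slicings = [
        (((a, b), (c, d)), ((e, f), (g, h))),   # front / back
        (((a, c), (e, g)), ((b, d), (f, h))),   # left / right
        (((a, e), (b, f)), ((c, g), (d, h))),   # top / bottom
    ]
    forms = []
    for M, N in slicings:
        det_m = M[0][0] * M[1][1] - M[0][1] * M[1][0]
        det_n = N[0][0] * N[1][1] - N[0][1] * N[1][0]
        cross = (M[0][0] * N[1][1] + N[0][0] * M[1][1]
                 - M[0][1] * N[1][0] - N[0][1] * M[1][0])
        # det(M*x - N*y) = det_m*x^2 - cross*x*y + det_n*y^2, then negate.
        forms.append((-det_m, cross, -det_n))
    return forms

forms = slice_forms((1, 2, 3, 4, 5, 6, 7, 9))
discs = [B * B - 4 * A * C for A, B, C in forms]
print(forms, discs)
```

The three forms look unrelated, yet their discriminants agree, which is the first hint that the cube is encoding a relationship between them.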

Here’s a fancier way that Dick Gross put it. Bhargava reinvented the composition law by studying the action of SL$_2(\mathbb{Z})^3$ on $\mathbb{Z}^2\otimes\mathbb{Z}^2\otimes\mathbb{Z}^2$. The orbits are in bijection with triples $(I_1,I_2,I_3)$ of ideal classes for the quadratic ring of discriminant $D$ that satisfy $I_1I_2I_3=1$. That’s basically the abstract way of thinking about what Bhargava did computationally.

In this way, Bhargava found a symmetric reformulation of Gauss composition. And having found the right way of thinking about it, he was able to do what Gauss couldn’t, namely generalize it. He found 14 more integral representations on objects like $\mathbb{Z}^2\otimes\mathbb{Z}^2\otimes\mathbb{Z}^2$ above, which gave composition laws for higher degree forms.

He was also able to enumerate number fields of small degree, showing that the number of fields of degree $n$ and discriminant less than $X$ grows like $c_nX$. This Gross described as a fantastic generalization of Gauss’s work.

I spent the academic years 2000-2002 at Princeton and as a result had the privilege of attending Bhargava’s thesis defence, at which he presented these results. It must have been one of the best PhD theses ever written. Are there any reasonable candidates for better ones? Perhaps Simon Donaldson’s would offer decent competition.

It’s not clear whether those results would have warranted a Fields medal on their own, but the matter was put beyond the slightest doubt when Bhargava and Shankar proved a spectacular result about elliptic curves. Famously, an elliptic curve comes with a group law: given two points, you take the line through them, see where it cuts the elliptic curve again, and define that to be the inverse of the product. This gives an Abelian group. (Associativity is not obvious: it can be proved by direct computation, but I don’t know what the most conceptual argument is.) The group law takes rational points to rational points, and a famous theorem of Mordell states that the rational points form a finitely generated subgroup. The structure theorem for Abelian groups tells us that for some $r$ it must be a product of $\mathbb{Z}^r$ with a finite group. The integer $r$ is called the *rank* of the curve.
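
The chord-and-tangent recipe is short enough to write down. The sketch below assumes the curve is in the form $y^2=x^3+ax+b$, represents the identity (the point at infinity) by `None`, and uses exact rational arithmetic; the function name and the test curve $y^2=x^3+17$ are choices made for this illustration.

```python
from fractions import Fraction

def add_points(P, Q, a):
    """Add two points on y^2 = x^3 + a*x + b (b is not needed explicitly).
    None stands for the identity element."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return None                                  # vertical line: P = -Q
    if P == Q:
        slope = Fraction(3 * x1 * x1 + a, 2 * y1)    # tangent at P
    else:
        slope = Fraction(y2 - y1, x2 - x1)           # chord through P and Q
    x3 = slope * slope - x1 - x2     # third point where the line meets the curve
    y3 = slope * (x1 - x3) - y1      # reflect it in the x-axis to get the sum
    return (x3, y3)

# Three rational points on y^2 = x^3 + 17 (so a = 0).
P, Q, R = (-2, 3), (-1, 4), (2, 5)
print(add_points(P, Q, 0))
print(add_points(add_points(P, Q, 0), R, 0))
print(add_points(P, add_points(Q, R, 0), 0))
```

The last two lines give the same point, which is associativity in one instance (a check, of course, not a proof), and every output has rational coordinates, as the text says it should.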

It is conjectured that the rank can be arbitrarily large, but not everyone agrees with that conjecture. The record so far is held by the curve

discovered by Noam Elkies (who else?) and shown to have rank 19. According to Wikipedia, from which I stole that formula, there are curves of unknown rank that are known to have rank at least 28, so in another sense the record is 28, in that that is the highest known integer for which there is proved to be an elliptic curve of rank at least that integer.

Bhargava and Shankar proved that the *average* rank is less than 1. Previously this was not even known to be finite. They also showed that at least 80% of elliptic curves have rank 0 or 1.

The Birch–Swinnerton-Dyer conjecture concerns ranks of elliptic curves, and one consequence of their results (or perhaps it is a further result — I’m not quite sure) is that the conjecture is true for at least 66% of elliptic curves. Gross said that there was some hope of improving 66% to 100%, but cautioned that that would not prove the conjecture, since 0% of all elliptic curves doesn’t mean no elliptic curves. But it is still a stunning advance. As far as I know, nobody had even thought of trying to prove average statements like these.

I think I also picked up that there were connections between the delicate methods that Bhargava used to enumerate number fields (which again involved counting lattice points in unbounded sets) and his more recent work with Shankar.

Finally, Gross reminded us that Faltings showed that for hyperelliptic curves (a curve of the form $y^2=f(x)$ for a polynomial $f$; when $f$ is a cubic you get an elliptic curve) the number of rational points is finite. Another result of Bhargava is that for almost all hyperelliptic curves there are in fact no rational points.

While it is clear from what people have said about the work of the four medallists that they have all proved amazing results and changed their fields, I think that in Bhargava’s case it is easiest for the non-expert to understand just *why* his work is so amazing. I can’t wait to see what he does next.


The first one was an excellent talk by Etienne Ghys on the work of Artur Avila. (The only other talk I’ve heard by Ghys was his plenary lecture at the ICM in Madrid in 2006, which was also excellent.) It began particularly well, with a brief sketch of the important stages in the history of dynamics. These were as follows.

1. Associated with Newton is the idea that you are given a differential equation, and you try to find solutions. This has of course had a number of amazing successes.

2. However, after a while it became clear that the differential equations for which one could hope to find a solution were not typical. The next stage, initiated by Poincaré, was to aim for something less. One could summarize it by saying that now, given a differential equation, one tries merely to say something interesting about its solutions.

3. In the 1960s, Smale and Thom went a stage further, trying to take on board the realization that often physicists don’t actually know the equation that models the phenomenon they are looking at. As Ghys put it, the endeavour now can be summed up as follows: you are not given a differential equation and you want to say something interesting about its solutions.

Of course, once the well-deserved laugh had died down, he explained a bit further what he meant. One way he put it was to ask what a typical dynamical system looks like.

He then talked about four important results of Avila that fit into this broad framework. One concerns iterates of unimodal maps, which are maps that look like upside-down parabolas (they are zero at 0 and 1 and have a single local maximum in between, which lies above the line $y=x$). Avila showed that given an analytic family of such maps, almost every function in the family gives rise either to a very structured dynamical system or a rather random-like one. More precisely, for almost every $f$ in the family, either almost every orbit converges to an attracting cycle (such systems are called *regular*) or there is an absolutely continuous measure $\mu$ such that almost every orbit in $[0,1]$ is distributed according to $\mu$.

The main tool in the proof is something called the renormalization operator. I didn’t fully understand what this was, but I got a partial understanding. A discrete dynamical system is a set $X$ together with a map $f:X\to X$ (usually assumed to have extra properties such as continuity or preservation of measure, which of course requires $X$ to have some structure so that those properties make sense) that one iterates. We are interested in orbits, which are simply sequences of the form $x, f(x), f(f(x)), \dots$.

Now suppose you have a subset $Y$ of $X$. Often you can define a dynamical system $g$ on $Y$ by simply setting $g(y)$ to be $f^n(y)$ for the smallest positive integer $n$ for which $f^n(y)\in Y$. And often this dynamical system is closely related to the big dynamical system on $X$. In a way I didn’t pick up from the lecture, the renormalization operator exploits this close relationship to turn maps from $Y$ to $Y$ into maps from $X$ to $X$. We can use this basic idea to define a renormalization operator on the space of all unimodal maps.
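
The first-return construction is simple to sketch in code. This illustrates only the induced map on the subset, not the renormalization operator itself, and the names `first_return`, `f` and `Y` as well as the toy system are assumptions made for the example.

```python
def first_return(f, in_subset, y, max_steps=10_000):
    """Return (f^n(y), n) for the smallest n >= 1 with f^n(y) in the subset."""
    x = y
    for n in range(1, max_steps + 1):
        x = f(x)
        if in_subset(x):
            return x, n
    raise RuntimeError("orbit did not return within max_steps")

# Toy system: X = Z/10 with f(x) = x + 3 mod 10, and the subset Y = {0, 1, 2}.
f = lambda x: (x + 3) % 10
Y = {0, 1, 2}
induced = {y: first_return(f, Y.__contains__, y) for y in sorted(Y)}
print(induced)   # induced map on Y together with the return times
```

Here the induced map permutes the three points of the subset, with different points taking different numbers of steps to come back.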

It is not obvious to me why this is a good thing to do, except that it fits into the general philosophy, that applies in many many contexts, that considering a lot of objects of a certain type at once is often a great way to learn about individual objects of that type. (This theme was to reappear in a big way in the talk about Mirzakhani’s work.) Avila did not invent the renormalization map, but according to Ghys he is an absolute master at using it, and has in that way made it his own.

The second result was about interval exchange maps. These are maps that take a unit interval, chop it up into finitely many pieces (of varying lengths if you want the map to be interesting) and reassemble them in a different order. In 2007, Avila and Giovanni Forni proved that almost all interval exchange maps are weak mixing. This means that if $T$ is the map and you take any two sets $A$ and $B$, then for almost every $n$ the measure of $A\cap T^{-n}B$ is approximately what you would expect if $T^{-n}B$ were a “random set”: that is, the product of the measures of $A$ and $B$.
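For readers who have not met interval exchange maps, here is a minimal sketch of one. The representation (a list of piece lengths plus the left-to-right order in which the pieces are reassembled) and the function name are assumptions made for this example. Rational lengths are used only so that the arithmetic is exact; with rational lengths every orbit is periodic, so this particular map is not one of the typical ones the theorem is about.

```python
from fractions import Fraction
from itertools import accumulate


def make_iet(lengths, order):
    """Build an interval exchange map on [0, 1).

    lengths[i] is the length of the i-th piece, counted from the left.
    order lists the piece indices in their new left-to-right arrangement.
    """
    # Left endpoints of the pieces before the exchange.
    lefts = [Fraction(0)] + list(accumulate(lengths))[:-1]
    # Left endpoints of the pieces after the exchange.
    new_left = {}
    pos = Fraction(0)
    for i in order:
        new_left[i] = pos
        pos += lengths[i]

    def T(x):
        for i, left in enumerate(lefts):
            if left <= x < left + lengths[i]:
                return new_left[i] + (x - left)  # translate the piece rigidly
        raise ValueError("x must lie in [0, 1)")

    return T


if __name__ == "__main__":
    # Three pieces of lengths 1/2, 1/3, 1/6, reassembled in reverse order.
    T = make_iet([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)], [2, 1, 0])
    x = Fraction(1, 4)
    for _ in range(5):
        print(x)
        x = T(x)
```

Each piece is translated rigidly, so the map preserves length, which is the measure the statement about weak mixing refers to.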

Renormalization was the tool here too. Apparently the key to proving this result was to show that the renormalization map on the space of interval exchange maps is chaotic. I don’t know exactly what this means.

I have always had a soft spot for interval exchange maps, because I once heard a fascinating open problem and thought about it very hard with no success. Suppose you are given a polygonal but not necessarily convex room lined with mirrors and you switch a light on. Must it illuminate the whole room? (Assume that the light comes from a point source.) There is a very nice construction called Kafka’s study, which shows that the answer can be no in a room with a smooth boundary. To draw it, you begin by drawing an ellipse, cutting it in half along the line joining its two foci, which I’ll take to be horizontal, keeping only one half, and then creating a sort of mushroom shape with the half ellipse at the top and a curve that goes horizontally through the two foci but also dips down between the foci (to make the “stalk” of the mushroom). If a beam of light comes out of one focus and hits the boundary of the ellipse, then it bounces back to the other focus. From this it is easy to see that if you switch on a light in the stalk part of the room, then the two other bits that do not lie in the top half of the ellipse will remain dark. I think the idea behind the name was that Kafka could work in the side parts without being disturbed by noise from the stalk part.

Another way of thinking about this is as a billiards problem. If you fire off a billiards ball (infinitesimally small of course) from the stalk part of the room, then however much it bounces, it will never reach the side parts.

What about the polygonal case? If a room is polygonal and all the sides make an angle with the horizontal that’s a rational multiple of $\pi$, then a billiard ball will only ever travel in one of a finite number of directions, so we can define a map from the set of pairs of the form (boundary point, possible direction from that boundary point) to itself, which, if you think about it for a bit, can be seen to be an interval exchange map.
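The finiteness claim can be checked mechanically. A ball travelling in direction $\theta$ that bounces off a side making angle $\alpha$ with the horizontal leaves in direction $2\alpha-\theta$. The sketch below measures all angles in units of $\pi$, stores them as exact fractions modulo 2, and collects every direction reachable by repeated reflections. The function name and the example rooms are assumptions for illustration, and the code ignores the geometry of which side is actually hit, so it gives an upper bound on the set of directions.

```python
from fractions import Fraction


def reachable_directions(side_angles, start):
    """Directions reachable from `start` by reflecting in sides with the
    given angles to the horizontal. All angles are in units of pi."""
    seen = {start % 2}
    frontier = [start % 2]
    while frontier:
        t = frontier.pop()
        for a in side_angles:
            r = (2 * a - t) % 2  # reflection of direction t in a side at angle a
            if r not in seen:
                seen.add(r)
                frontier.append(r)
    return seen


if __name__ == "__main__":
    square = [Fraction(0), Fraction(1, 2)]
    triangle = [Fraction(0), Fraction(1, 3), Fraction(2, 3)]
    start = Fraction(1, 7)
    print(sorted(reachable_directions(square, start)))
    print(sorted(reachable_directions(triangle, start)))
```

For a square room this gives four directions and for an equilateral triangle six. The loop terminates because composing two reflections is a rotation by a multiple of $2\pi/q$, where $q$ is a common denominator of the side angles, so at most $2q$ directions can ever appear.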

Years ago I managed to prove to my own satisfaction the known (I’m pretty sure, though I don’t know enough about the area to know where to find it) result that for almost every direction you send a billiard ball out in, the resulting orbit will be dense. However, once the angles stop being nice rational multiples of $\pi$, the dynamical system becomes a rather unpleasant map that moves bits of the plane about while also applying affine transformations to them.

As a means of simplifying the problem, I decided to consider a natural 2D analogue of interval exchange maps. This time you take a square, chop it up into finitely many rectangles, and reassemble the rectangles in some other way into the square. That led to a question I spent a long time on and couldn’t answer. (This was probably in about 1989 or so.) Take a rectangle exchange map of the kind I’ve just described, and take a point in the square. Is it recurrent? That is, will its iterates necessarily come back arbitrarily close to the original point? In the 1D case the answer is yes, and I seem to remember that was a key lemma in the proof about dense orbits.

Note that I’m not asking whether *almost* all points are recurrent: that is an easy exercise (and a result of Poincaré). I really want them all.

Incidentally, a few years after I was obsessed with the billiards-in-polygons problem, a paper came out that purported to solve it. Imagine my surprise when the polygon in question had rational angles. It turned out that the paper did something like assuming that corners absorbed light, or something like that. Anyhow, as far as I know the following two questions are still open, but if not, then I’d be interested to be pointed to the appropriate literature.

1. If you have a light source that’s more like a real light in that light comes in all directions from everywhere in a non-empty open set, then must an arbitrary polygonal room be illuminated?

2. If you take a point in a polygonal room and send off a billiard ball, is it true that for almost every direction you might choose the trajectory of the ball will be dense? (As far as I know “almost every” could mean for every direction not belonging to some countable set.)

Moving on to the other two of Avila’s results, I’m going to say much less. The first one was a solution of the ten-martini problem, so called because Mark Kac offered ten martinis to whoever solved it. Unfortunately, he had died by the time Avila was in a position to claim them. I didn’t really understand the problem, but it was to do with the Schrödinger equation and boiled down to a problem in spectral theory, which Avila, remarkably, solved using dynamical systems.

The last problem was one that Etienne Ghys told us most people assume must be easy when they hear it for the first time, and often offer incorrect proofs. Maybe because he had said that, I didn’t have any particular feeling that it should be easy, but perhaps you, dear reader, will.

It is known that a $C^1$ diffeomorphism on a manifold can be approximated (in $C^1$) by a $C^\infty$ diffeomorphism. Avila showed that if the $C^1$ diffeomorphism is volume preserving, then the $C^\infty$ one can be taken to be volume preserving as well. The proof was apparently very hard.

The main other thing I remember from the talk was that Ghys prepared a sequence of photos that flashed up in front of us in a seemingly endless sequence, of all Avila’s collaborators. The fact that he has so many is one of the remarkable things about him: he is apparently very generous with his ideas, a great illustration of how that kind of generosity can be hugely beneficial not just to the people who are on the receiving end but also to those who exhibit it.


I didn’t manage to maintain my ignorance of the fourth Fields medallist, because I was sitting only a few rows behind the medallists, and when Martin Hairer turned up wearing a suit, there was no longer any room for doubt. However, there was a small element of surprise in the way that the medals were announced. Ingrid Daubechies (president of the IMU) told us that they had made short videos about each medallist, and also about the Nevanlinna Prize winner, who was Subhash Khot. So for each winner in turn, she told us that a video was about to start. An animation of a Fields medal then rotated on the large screens at the front of the hall, and when it settled down one could see the name of the next winner. The beginning of each video was drowned out by the resulting applause (and also a cheer for Bhargava and an even louder one for Mirzakhani), but they were pretty good. At the end of each video, the winner went up on stage, to more applause, and sat down. Then when the five videos were over, the medals were presented, to each winner in turn, by the president of Korea.

Here they are, getting their medals/prize. It wasn’t easy to get good photos with a cheap camera on maximum zoom, but they give some idea.

After those prizes were announced, we had the announcements of the Gauss prize and the Chern medal. The former is for mathematical work that has had a strong impact outside mathematics, and the latter is for lifetime achievement. The Gauss prize went to Stanley Osher and the Chern medal to Phillip Griffiths.

If you haven’t already seen it, the IMU page about the winners has links to very good short (but not too short) summaries of their work. I’m quite glad about that because I think it means I can get away with writing less about them myself. I also recommend this Google Plus post by John Baez about the work of Mirzakhani.

I have one remark to make about the Fields medals, which is that I think that this time round there were an unusually large number of people who could easily have got medals, including other women. (This last point is important — one should think of Mirzakhani’s medal as the new normal rather than as some freak event.) I have two words to say about them: Mikhail Gromov. To spell it out, he is an extreme, but by no means unique, example of a mathematician who did not get a Fields medal but whose reputation would be pretty much unaltered if he had. In the end it’s the theorems that count, and there have been some wonderful theorems proved by people who just missed out this year.

Other aspects of the ceremony were much as one would expect, but there was rather less time devoted to long and repetitive speeches about the host country than I have been used to at other ICMs, which was welcome.

That is not to say that interesting facts about the host country were entirely ignored. The final speech of the ceremony was given by Martin Groetschel, who told us several interesting things, one of which was the number of mathematics papers published in international journals by Koreans in 1981. He asked us to guess, so I’m giving you the opportunity to guess before reading on.

Now Korea is 11th in the world for the number of mathematical publications. Of course, one can question what this really means, but it certainly means something when you hear that the answer to the question above is 3. So in just one generation a serious mathematical tradition has been created from almost nothing.

He also told us the names of the people on various committees. Here they are, except that I couldn’t quite copy all of them down fast enough.

The Fields Medal committee consisted of Daubechies, Ambrosio, Eisenbud, Fukaya, Ghys, Dick Gross, Kirwan, Kollar, Kontsevich, Struwe, Zeitouni and Günter Ziegler.

The program committee consisted of Carlos Kenig (chair), Bolthausen, Alice Chang, de Melo, Esnault, me, Kannan, Jong Hae Keum, Le Bris, Lubotzky, Nesetril and Okounkov.

The ICM executive committee (if that’s the right phrase) for the next four years will be Shigefumi Mori (president), Helge Holden (secretary), Alicia Dickenstein (VP), Vaughan Jones (VP), Dick Gross, Hyungju Park, Christiane Rousseau, Vasudevan Srinivas, John Toland and Wendelin Werner.

He also told us about various initiatives of the IMU, one of which sounded interesting (by which I don’t mean that the others didn’t). It’s called the adopt-a-graduate-student initiative. The idea is that the IMU will support researchers in developed countries who want to provide some kind of mentorship for graduate students in less developed countries working in a similar area who might otherwise not find it easy to receive appropriate guidance. Or something like that.

Ingrid Daubechies also told us about two other initiatives connected with the developing world. One was that the winner of the Chern Medal gets to nominate a good cause to receive a large amount of money. Stupidly I seem not to have written it down, but it may have been $250,000. Anyhow, that order of magnitude. Phillip Griffiths chose the African Mathematics Millennium Science Initiative, or AMMSI. The other was that the five winners of the Breakthrough Prizes in mathematics, Donaldson, Kontsevich, Lurie, Tao and Taylor, have each given $100,000 towards a $500,000 fund for helping graduate students from the developing world. I don’t know exactly what form the help will take, but the phrase “breakout graduate fellowships” was involved.

When I get time, I’ll try to write something about the Laudationes, but right now I need to sleep. I have to confess that during Jim Simons’s talk, my jet lag caught up with me in a major way and I simply couldn’t keep awake. So I don’t really have much to say about it, except that there was an amusing Q&A session where several people asked long rambling “questions” that left Jim Simons himself amusingly nonplussed. His repeated requests for short pithy questions were ignored.

Just before I finish, I’ve remembered an amusing thing that happened during the early part of the ceremony, when some traditional dancing was taking place (or at least I assume it was traditional). At one point some men in masks appeared, who looked like this.

Just while we’re at it, here are some more dancers.

Anyhow, when the men in masks came on stage, there were screams of terror from Mirzakhani’s daughter, who looked about two and a half, and delightful, and she (the daughter) took a long time to be calmed down. I think my six-year-old son might have felt the same way — he had to leave a pantomime version of Hansel and Gretel, to which he had been taken as a birthday treat when he was five, almost the instant it started, and still has those tendencies.


The flight over was not exactly fun — a night flight never is — but I watched two passable films, got a little bit of work done, missed out on the hot towels (which was good news because it meant I must have been properly asleep), and had possibly the best inflight meal of my life. The last was probably a well-known dish but it happened not to be known to me. I had a choice between beef, chicken, and bibimbap, with the first two being western and the third Korean. That was a no-brainer, but when I asked for the bibimbap I was given not just the bibimbap itself but a leaflet explaining how to assemble it. The steps were as follows.

1. Please put the steamed rice into the “Bibimbap” bowl.

2. Add gochujang (Korean hot pepper paste).

Spicy level 1. (Mild): 1/2 of tube.

Spicy level 2. (Hot): Full tube.

3. Add sesame oil.

4. Mix the “Bibimbap” together.

5. Enjoy the “Bibimbap” with side dish and soup.

I squeezed out almost all the tube of hot pepper sauce and the result was pleasantly hot without threatening to be painful. It was also delicious and substantial. The soup, which I think may have been seaweed soup, was also very good.

I now regret choosing omelette for breakfast when I could have had something called rice porridge, which also looked interesting. (The omelette wasn’t.)

The one other notable thing about the flight was that the plane was so vast that it took off before it felt as though it had picked up enough speed to do so. It also satisfied the “law of turbulence”: that no matter how big a plane is, it gets buffeted about just as much as any other plane. I wonder if there is some scaling law there: for instance, the faster you go, the more dramatic the changes in pressure and wind direction, or something like that.

Seoul was fairly similar to what I expected, though a bit more spread out perhaps. My impression of the place is gleaned from just one bus journey (over an hour) from airport to hotel. Maybe I’ll have more to say about it later.

When I arrived, I immediately went to register. That was quick and efficient, and I picked up my unusually tasteful conference bag, which resembles a large handbag. I had a choice between black and brown, and went daringly for the latter. It had the usual kinds of things in it, with one exception: no notepad. (For the younger generation out there, that means a number of sheets of paper conveniently joined together, rather than some kind of tablet computer.) That will make my note-taking work slightly harder, but I’ll think of something.

The first event of the ICM was an opening reception, which took place in a huge room in the conference centre. There was an extraordinary amount of food there, and also beer, which was very welcome. The food was good, and some of it interestingly Korean, but it didn’t quite reach the heights of the bibimbap (or should that be “Bibimbap”?).

Although I’m not strictly forced to leave the hotel, I’m not sure I’m ready to pay $40 for breakfast, so I’m going to nip out quickly and try to find some coffee and a bun or something like that. I noticed from the bus that there were lots of quite promising looking coffee places: it will certainly be a bonus if, as looks as though will be the case, Korea is a country where one can get a good cup of coffee. And then it’s off to the opening ceremony. More later.

Actually, more sooner, because I’ve just remembered that I was going to mention an amusing story that I was told at the reception yesterday. Apparently the Pope is visiting Korea, and asked for an audience with the president today. And the president told him that he would have to wait till tomorrow, because today she was otherwise occupied. It’s heartening to know that mathematics takes precedence over the Catholic church.

And slightly more again: I have a bit of battery left on my laptop, which I was allowed to bring into the opening ceremony. As was advised, I got here very much earlier than the start time, which makes an already long ceremony a significant chunk longer. We’ve been treated to Beatles songs arranged for some Korean instrument that I don’t know the name of — it looks a bit like a lyre but sits horizontally on the lap. Meanwhile, it seems that the names of the Fields Medallists have, disappointingly, been leaked. Despite that, I’ve managed to maintain my ignorance. (To be more accurate, I am now certain about three of the names but still don’t know who the fourth person is. We’ll see whether I can avoid learning that before it is announced.)


Just as the last ICM was the first (and still only) time I had been to India, this one will be my first visit to Korea. I’m looking forward to that aspect too, though my hotel is right next to where the congress is taking place and the programme looks pretty packed, so I’m not sure I’ll see much of the country. Talking of the packedness of the programme, I can already see that there are going to be some agonising decisions. For example, Tom Sanders is giving an invited lecture at the same time as Ryan Williams, two speakers I very much want to listen to. I suppose I’ll just have to read the proceedings article of the one I don’t go to. Equally unfortunate is that Ben Green’s plenary lecture is not until next week, when I’ll have gone. But I hope that I’ll still be able to get some kind of feel for where mathematics is now, what people outside my area consider important, and so on, and that I’ll be able to convey some of that in the next few posts.

I’d better stop this now, since I’ll soon be getting on to an Airbus 380 — a monstrously large double-decker plane. One of my children is something of a transport enthusiast and told me in advance that this would be the case (he had looked it up on the internet). I had hoped to end up on the top floor, but that turns out to be for business class only. The flight is about 11 hours: it leaves at 9pm French time and arrives at around 2:30pm Korean time. The challenge will be not to be utterly exhausted by the time of the opening ceremony on Wednesday morning. My memory of Hyderabad is that by the end of the four days I was so tired that I was almost getting anxious about my health. I plan to look after myself a bit better this time, but it may be difficult.


What I wrote gives some kind of illustration of the twists and turns, many of them fruitless, that people typically take when solving a problem. If I were to draw a moral from it, it would be this: when trying to solve a problem, it is a mistake to expect to take a direct route to the solution. Instead, one formulates subquestions and gradually builds up a useful bank of observations until the direct route becomes clear. Given that we’ve just had the football world cup, I’ll draw an analogy that I find not too bad (though not perfect either): a team plays better if it patiently builds up to an attack on goal than if it hoofs the ball up the pitch or takes shots from a distance. Germany gave an extraordinary illustration of this in their 7-1 defeat of Brazil.

I imagine that the rest of this post will be much more interesting if you yourself solve the problem before reading what I did. I in turn would be interested in hearing about other people’s experiences with the problem: were they similar to mine, or quite different? I would very much like to get a feel for how varied people’s experiences are. If you’re a competitor who solved the problem, feel free to join the discussion!

If I find myself with some spare time, I might have a go at doing the same with some of the other questions.

What follows is exactly what I wrote (or rather typed), with no editing at all, apart from changing the LaTeX so that it compiles in WordPress and adding two comments that are clearly marked in red.

**Problem** *Let $a_0<a_1<a_2<\dots$ be an infinite sequence of positive integers. Prove that there exists a unique integer $n\geq 1$ such that*

$$a_n<\frac{a_0+a_1+\dots+a_n}{n}\leq a_{n+1}.$$

Slight bafflement.

The expression in the middle is not an average. If we were to replace it by an average we would have the second inequality automatically.

Try looking at simple cases. Here we could consider what happens when , for example. Then the inequality says

Here we automatically have the first inequality, but there is no reason for the second inequality to be true.

Putting those observations together, we see that the first inequality is true when , and the second inequality is “close to being true” as gets large, since it is true if we replace by in the denominator.

If the inequality holds for a unique , then a plausible guess is that the first inequality fails at some and if is minimal such that it fails, then both inequalities are true for . I shall investigate that in due course, but I have another idea.

It is clear that WLOG . Can we now choose in such a way that we always get equality for the second inequality? We can certainly solve the equations, so the question is whether the resulting sequence will be increasing.

We get , so I’d better set and then continue constructing a sequence.

So , , , and so on. Thus all the with are equal, which they are not supposed to be. This feels significant.

Out of interest, what happens to the inequalities when we (illegally) take the above sequence? We get , so we get equality on both sides except when when we get .

Try to disprove the result.

Try to find the simplest counterexample you can.

An obvious thing to do is to try to make the inequality true when and when . So let’s go. Without loss of generality , . We now need .

For we need . That can be rearranged to , exactly contradicting what we had before.

That doesn’t solve the problem but it looks interesting. In particular, it suggests rearranging the first inequality in the general case, to

That’s quite nice because the right hand side is a genuine average this time.

Actually, if getting an average is what we care about, we could also rearrange the first inequality by simply multiplying through by , which gives

I think it is time to revisit that guess, in order to try to prove at least that there *exists* a solution. So we know that the first inequality holds when , since all it says then is that . Can it always hold? If so, then again WLOG and , and after that we get , , etc.

Let’s write and for . Then we have , , , etc. We also require .

Let’s set . Now the first condition becomes but . Is that possible?

Is it possible with equality? WLOG . Then we have , , , etc.

I’m starting to wonder whether the integer must be something like 1 or 2. Let’s think about it. We know that . If then we have our . Suppose instead that . Then , so . Now if then we are again done, so suppose that .

But since , we can simply insert in between the two. Why can’t we continue doing that kind of thing? Let me try.

If , then , so we can insert in between the two.

I seem to have disproved the result, so I’d better see where I’m going wrong. I’ll try to construct a sequence explicitly. I’ll take , . I need , so I’ll take . Now I need , so I’ll take . Now I need , so I’ll take .

I don’t seem to be getting stuck, so let me try to prove that I can always continue. Suppose I’ve already chosen . Then the condition I need is that

By induction we already have that , from which it follows that and therefore that . We may therefore find between these two numbers, as desired.

You idiot Gowers, read the question: the $a_n$ have to be positive integers.

Fortunately, the work I’ve done so far is not a complete waste of time. [The half-conscious thought in the back of my mind here, which is clearer in retrospect, was that the successive differences in the example I had just constructed were getting smaller and smaller. So it seemed highly likely that using the same general circle of thoughts I would be able to prove that I couldn't take the $a_n$ to be integers.]

Here’s a trivial observation: if the second inequality fails, then . So if , then . How long can we keep that going with positive integers? Answer: for ever, since we can take .

Never mind about that. I want to go back to an earlier idea. [It isn't obvious what I mean by "earlier idea" here. Actually, I had earlier had the idea of defining the as below, but got distracted by something else and ended up not writing it down. So a small part of the record of my journey to the proof is missing.] It is simply to define and for . Then for if the first inequality holds we have

So each new is less than the average of the up to that , and hence less than the average of the before that . But that means that the average of the forms a decreasing sequence. That also means that the are bounded above by , something I could have observed ages ago. So they can’t be an increasing sequence of integers.

I’ve now shown that the first inequality must fail at some point. Suppose is the first point at which it fails. Then we have

and

The second inequality tells us that exceeds the average of , which implies that it exceeds the average of . That gives us the inequality

So now I’ve proved that there exists an integer such that the inequalities both hold. It remains to prove uniqueness. This formulation with the ought to help. We’ve picked the first point at which is at least as big as the average of . Does that imply that is at least as big as the average of ? Yes, because is at least as big as that average, and is bigger than . In other words, we can prove easily that if the first inequality fails for then it fails for , and hence by induction for all .
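As a sanity check on the conclusion (and no substitute for the proof), one can test the statement numerically. The sketch below takes the problem to be that for a strictly increasing sequence of positive integers $a_0<a_1<\dots$ exactly one $n\geq 1$ satisfies $a_n<(a_0+\dots+a_n)/n\leq a_{n+1}$, and works with finite prefixes of such sequences. The function names and the random test parameters are assumptions for illustration. The inequalities are multiplied through by $n$ so that only integer arithmetic is needed.

```python
from itertools import accumulate
import random


def valid_indices(a):
    """All n >= 1 (with n + 1 < len(a)) such that
    a[n] < (a[0] + ... + a[n]) / n <= a[n + 1]."""
    prefix = list(accumulate(a))  # prefix[n] = a[0] + ... + a[n]
    return [
        n
        for n in range(1, len(a) - 1)
        if n * a[n] < prefix[n] <= n * a[n + 1]
    ]


def random_increasing(rng, length=30, max_start=50, max_step=10):
    """A random strictly increasing list of positive integers."""
    a = [rng.randint(1, max_start)]
    for _ in range(length - 1):
        a.append(a[-1] + rng.randint(1, max_step))
    return a


if __name__ == "__main__":
    print(valid_indices(list(range(1, 31))))          # a_n = n + 1
    print(valid_indices([10 + n for n in range(30)]))  # a_n = n + 10
    rng = random.Random(0)
    assert all(len(valid_indices(random_increasing(rng))) == 1 for _ in range(200))
```

A prefix of length 30 is enough here: the quantity $(a_0+\dots+a_n)-na_n$ starts at $a_0$ when $n=1$ and drops by at least $n$ at each step, so with $a_0\leq 50$ the special index cannot exceed 10.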


Just before I start this post, let me say that I do still intend to write a couple of follow-up posts to my previous one about journal prices. But I’ve been busy with a number of other things, so it may still take a little while.

This post is about the next European Congress of Mathematics, which takes place in Berlin in just over two years’ time. I have agreed to chair the scientific committee, which is responsible for choosing approximately 10 plenary speakers and approximately 30 invited lecturers, the latter to speak in four or five parallel sessions.

The ECM is less secretive than the ICM when it comes to drawing up its scientific programme. In particular, the names of the committee members were made public some time ago, and you can read them here.

I am all in favour of as much openness as possible, so I am very pleased that this is the way that the European Mathematical Society operates. But what is the maximum reasonable level of openness in this case? Clearly, public discussion of the merits of different candidates is completely out of order, but I think anything else goes. In particular, and this is the main point of the post, I would very much welcome suggestions for potential speakers. If you know of a mathematician who is European (and for these purposes Europe includes certain not obviously European countries such as Russia and Israel), has done exciting work (ideally recently), and will not already be speaking about that work at the International Congress of Mathematicians in Seoul, then we would like to hear about it. Our main aim is that the congress should be rewarding for its participants, so we will take some account of people’s ability to give a good talk. This applies in particular to plenary speakers.

~~I shall moderate all comments on this post. If you suggest a possible speaker, I will not publish your comment, but will note the suggestion.~~ More general comments are also welcome and will be published, assuming that they are the kinds of comments I would normally allow.

[In parentheses, let me say what my comment policy now is. The volume of spam I get on this blog has reached a level where I have decided to implement a feature that WordPress allows, where if you have never had a comment accepted, then your comment will automatically be moderated. I try to check the moderation queue quite frequently. If you have had a comment accepted in the past, then your comments will appear as normal.

I am very reluctant to delete comments, but I do delete obvious spam, and I also delete any comment that tries to use this blog as a form of self-promotion (such as using a comment to draw attention to the author's proof of the Riemann hypothesis, or to the author's fascinating blog, etc. etc.). I sometimes delete pingbacks as well -- it depends whether I think readers of my blog might conceivably be interested in the post from which the pingback originates.]

Going back to the European Congress, if you would prefer to make your suggestion by getting in contact directly with a committee member, then that is obviously fine too. The list of committee members includes email addresses.

However you make your suggestions, it would be very helpful if you could give not just a name but a brief reason for the suggestion: what the work is that you think should be recognised, and why it is important.

The main other thing I am happy to be open about is the stage that the committee has reached in its deliberations, and the plans for how it will carry out its work. Right now, we are at the stage of trying to put together a longlist of possible speakers. I have asked the other committee members to suggest to me at least six potential speakers each, of whom at least three should be broadly in their area. I hope that will give us enough candidates to make it possible to achieve a reasonable subject balance. We will of course also strive for other forms of balance, such as gender and geographical balance, to the extent that we can. Once we have a decent-sized longlist, we will cut it down to the right sort of size.

We are aiming to produce a near-complete list of speakers by around November. This is rather a long time in advance of the Congress itself, which worried me a bit, but I have permission from the EMS to leave open a few slots so that if somebody does something spectacular after November, then we will have the option of inviting them to speak.


**Further update: figures in from Nottingham too.**

**Further update: figures now in from Oxford.**

**Final update: figures in from LSE.**

A little over two years ago, the Cost of Knowledge boycott of Elsevier journals began. Initially, it seemed to be highly successful, with the number of signatories rapidly reaching 10,000 and including some very high-profile researchers, and Elsevier making a number of concessions, such as dropping support for the Research Works Act and making papers over four years old from several mathematics journals freely available online. It has also contributed to an increased awareness of the issues related to high journal prices and the locking up of articles behind paywalls.

However, it is possible to take a more pessimistic view. There were rumblings from the editorial boards of some Elsevier journals, but in the end, while a few individual members of those boards resigned, no board took the more radical step of resigning en masse and setting up with a different publisher under a new name (as some journals have done in the past), which would have forced Elsevier to sit up and take more serious notice. Instead, they waited for things to settle down, and now, two years later, the main problems, bundling and exorbitant prices, continue unabated: in 2013, Elsevier’s profit margin was up to 39%. (The profit is a little over £800 million on a little over £2 billion.) As for the boycott, the number of signatories appears to have reached a plateau of about 14,500.

Is there anything more that can be done? One answer that is often given is that the open access movement is now unstoppable, and that it is only a matter of time before the current system will have changed significantly. However, the pace of change is slow, and the alternative system that is most strongly promoted — open access articles paid for by article processing charges — is one that mathematicians tend to find unpalatable. (And not only mathematicians: they are extremely unpopular in the humanities.) I don’t want to rehearse the arguments for and against APCs in this post, except to say that there is no sign that they will help to bring down costs any time soon and no convincing market mechanism by which one might expect them to.

I have come to the conclusion that if it is not possible to bring about a rapid change to the current system, then the next best thing to do, which has the advantage of being a lot easier, is to obtain as much information as possible about it. Part of the problem with trying to explain what is wrong with the system is that there are many highly relevant factual questions to which we do not yet have reliable answers. Amongst them are the following.

1. How willing would researchers be to do without the services provided by Elsevier?

2. How easy is it on average to find on the web copies of Elsevier articles that can be read legally and free of charge?

3. To what extent are libraries actually suffering as a result of high journal prices?

4. What effect are Elsevier’s Gold Open Access articles having on their subscription prices?

5. How much are our universities paying for Elsevier journals?

The main purpose of this post is to report on efforts that I and others have made to start obtaining answers to these questions. I shall pay particular attention to the last one, since it is about that that I have most to say. I will try to keep the post as factual as possible and give my opinions about some of the facts in a separate post.

I have two small pieces of evidence. The first is an interesting comment that was made on a Google Plus post of mine by Benoît Kloeckner, who wrote the following.

In France, when the national consortium “Couperin” was dealing with Springer for the 2012-2014 contract, we issued a petition asserting that some terms (notably interdiction to unsubscribe from a number of journals) were unacceptable and that we, mathematicians, would agree not to get access to Springer journals. This was done to give negotiators more strength, but had little effect despite a significant number of signatures.

This points to a problem that I will discuss in more detail in my next post: that different subjects have different needs. Part of the reason mathematicians find the current system so objectionable is that we have already got to the stage where we don’t really need journals for anything other than the very crude measure of quality that it gives us, since a fairly high, and ever increasing, proportion of the articles that interest us are freely available in preprint form. But in some subjects, such as biology or medicine, this is much less true, and as a result people rely far more on journal articles.

I tried to take the temperature in the mathematics faculty in Cambridge by asking my colleagues to complete a very brief questionnaire: there were two questions, with multiple-choice answers. The questions were as follows.

1. How easily could you do without access to Elsevier journals via ScienceDirect and print copies?

2. For those who negotiate on our behalf to be in a strong bargaining position, they have to be able to risk our losing access to Elsevier products (other than those that are freely available) for a significant length of time. How willing would you be for them to take that risk?

In case the results were interestingly different, I got people in DAMTP (the department of applied mathematics and theoretical physics) to answer one copy of the questionnaire and people in DPMMS (the department of pure mathematics and mathematical statistics) to answer another. The results were as follows. There were 96 responses from DAMTP and 80 from DPMMS. I give the DAMTP figure first and then the DPMMS figure, both as percentages.

1. How easily could you do without access to Elsevier journals via ScienceDirect and print copies?

(i) It would be no problem at all. [27.1, 23.8]

(ii) It would be OK, but a minor inconvenience. [26.0, 38.8]

(iii) It would be OK most of the time, but occasionally very inconvenient. [24.0, 32.5]

(iv) It would be a significant inconvenience. [14.6, 5.0]

(v) It would have a strongly negative impact on my research. [8.3, 0.0]

2. For those who negotiate on our behalf to be in a strong bargaining position, they have to be able to risk our losing access to Elsevier products (other than those that are freely available) for a significant length of time. How willing would you be for them to take that risk?

(i) Very willing [46.9, 55.7]

(ii) Willing [31.3, 39.2]

(iii) Unwilling [14.6, 3.8]

(iv) Very unwilling [7.3, 1.3]

Thus, if the responses were representative, then in both departments, most people would not suffer too much inconvenience if they had to do without Elsevier’s products and services, and a large majority were willing to risk doing without them if that would strengthen the bargaining position of those who negotiate with Elsevier.
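As a quick check on that summary, here is a minimal sketch in Python that adds up the percentages quoted above. The variable names are mine, and treating options (i) to (iii) of the first question as "not too much inconvenience" is an assumption about how to read the categories, not something fixed by the questionnaire.

```python
# Percentages quoted above, as (DAMTP, DPMMS) pairs, in the order the options were listed.
access = [(27.1, 23.8), (26.0, 38.8), (24.0, 32.5), (14.6, 5.0), (8.3, 0.0)]
risk = [(46.9, 55.7), (31.3, 39.2), (14.6, 3.8), (7.3, 1.3)]


def combined(rows, how_many):
    """Sum the first `how_many` options for each department, to one decimal place."""
    damtp = round(sum(r[0] for r in rows[:how_many]), 1)
    dpmms = round(sum(r[1] for r in rows[:how_many]), 1)
    return damtp, dpmms


# Assumption: options (i)-(iii) of question 1 count as "not too much inconvenience".
manageable = combined(access, 3)
# Options (i) and (ii) of question 2 are "Very willing" and "Willing".
willing = combined(risk, 2)

print("Could manage without access (DAMTP, DPMMS):", manageable)
print("Willing to risk losing access (DAMTP, DPMMS):", willing)
```

On these figures, roughly three quarters of the DAMTP respondents and well over nine tenths of the DPMMS respondents fall into the first three categories of question 1, and the proportions willing to take the risk are very similar.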

Another question I might have asked is how much the answers would have changed if the departments were to subscribe to just a few important journals. That is an important question, since it might be that the University of Cambridge should follow the examples of Harvard, MIT, Cornell and others (that link is from 2004 so the situation may have changed), stop paying for a Big Deal contract and switch to paying for individual journals at list prices instead.

It is very easy to find websites where surveys like the one I conducted can be set up for no charge. (But be a little careful: I accidentally chose one called SurveyMonkey that allowed only 100 responses, as a result of which I had to ask people to do it again.) I would be extremely interested if other people could do similar surveys in their own departments, both in mathematics and in other subjects.

My impression has for some time been that in mathematics a significant proportion of articles are available on the arXiv or on authors’ home pages, to the point where I almost never need to look at the journal version. There also appears to be a distinct positive correlation between the quality of a journal and the proportion of its articles freely available. And there seem to be national differences in the extent to which people make their papers available. But until recently it was a rather long and tedious process to obtain any hard figures about this.

Recently, however, Scott Morrison has set up a website called The Mathematics Literature Project, to which you can contribute if you have the time. Although one still has to input the information manually, Scott has written software that automates the process to some extent and makes it much quicker. The project is still in its infancy, but it already demonstrates that a large proportion of articles in various different journals, not all of them Elsevier journals, are indeed freely available in preprint form. And there is some evidence for the correlation with quality: for example, Discrete Mathematics is a less good journal than the Journal of Combinatorial Theory A and B, and a lot fewer of its articles can be found. (For JCTA the proportion is over 80%, whereas for Discrete Mathematics it is more like 30%.)

Thus, there is plenty of evidence that mathematicians at least do not really need their universities to pay large sums of money to Elsevier. Unfortunately, because of bundling, that fact on its own has had almost no effect on prices.

I’m tempted just to suggest that you go and talk to a librarian. You won’t be left in much doubt about the answer, at least qualitatively speaking. In brief, libraries suffer because bundling means that they have very little control over their budgets. If Elsevier raises its prices, then libraries simply have to pay them or else lose the entire bundle, so effectively they are forced to make cuts elsewhere. And this happens. For example, Phil Sykes, former chair of Research Libraries UK, shared a document with me that includes many interesting figures, one of which is that between 2001 and 2009, mean expenditure on books went up by 0.17%, which is a substantial real-terms cut, while mean expenditure on journals went up by 82%. Apparently, the expenditure on books as a proportion of total expenditure went down from 11% to just over 7% between 1999 and 2009.

But this distortion is not confined to books. Journals that belong to a large bundle are artificially protected, at the expense of other, potentially more useful, journals that do not belong to the bundle. If you think that this is just a theoretical possibility, then take a look at the example of the Université de Paris Descartes. This is the top university in Paris for medicine, the university you try to get into if you are French and want to be a doctor.

It would seem a safe bet that a top medical university would subscribe to at least some journals from the Nature publishing group, such as Nature Medicine, which describes itself as the premier journal for medical research, or Nature, which likes to think of itself as the premier journal full stop. But no: subscriptions to all Nature journals as well as many others were cancelled this year. In the long list of cancelled subscriptions, you won’t find any mention of Elsevier journals, because they are bundled together.

From time to time, a library decides that enough is enough. A couple of years ago, the mathematics department of the Technische Universität München decided to cancel all its subscriptions to Elsevier journals. And very recently the entire Universität Konstanz, also in Germany, decided to cancel its license negotiations and replace its license by “alternative procurement channels”. Given the evidence that we are becoming less reliant on journal subscriptions, it would seem rational for other libraries to consider whether to take similar measures.

Recall that Gold Open Access refers to the practice where a publisher makes an article freely available online in return for an article processing charge (APC), which is typically paid by an author’s institution or by a grant-awarding body. Elsevier now has various journals that are funded that way, as well as “hybrid” journals — that is, journals to which libraries still subscribe but which allow authors to make their articles open access in return for an APC. The proportion of Elsevier articles for which APCs have been paid is currently very small, but it is likely to increase, since various funding bodies are starting to insist that the academics they fund should make their articles open access, and often (but not always) the assumption is that this should be done via an APC.

A few months ago, it occurred to me to wonder what would happen if the proportion of Gold Open Access articles did indeed increase. Would Elsevier continue to rake in its subscription revenue and receive the APCs on top? This would seem particularly unjust in the case of hybrid journals, since libraries with Big Deal contracts cannot cancel their subscriptions to them, and in any case if several of the articles are not open access they may well not want to. So there would seem to be a danger that Elsevier is receiving substantial article processing charges that are not needed to cover the cost of processing (the additional cost of making an article open access is at least an order of magnitude less than the APCs), or to compensate Elsevier for loss of subscription revenue.

I then discovered that, not surprisingly, many other people had been concerned about this point. There is even a technical term for the practice of effectively charging twice for the same article: it is called *double dipping*. I found a page on Elsevier’s website where they stated that they had a no-double-dipping policy. However, that mentioned only the list prices of journals, so it did not address my concern at all, given that most libraries have Big Deal contracts. I decided to write to Elsevier to ask about this, and the result was that they updated the relevant page.

I think one can summarize what they say on the page now as follows: they set their prices based on the number of non-open-access articles included in the Freedom Collection; this has gone up, so they feel no compunction about charging more for the Freedom Collection. So they are at least *implying* that if enough open-access articles were published that the total volume of non-open-access articles went down, they would lower their prices.

That leaves me with two concerns. The first is that if their Big Deal contracts are confidential, then we have no way of knowing whether they are sticking to their official policy. The second is that what matters should not be the number of open access articles as a proportion of the whole, but the proportion of open access articles *amongst the articles that people actually want to read*. If, for example, half the articles in journals such as Cell and The Lancet became open access but Elsevier launched a handful of joke journals that published a comparable volume of articles, then the value of the non-open-access component to libraries would have gone down substantially, but according to Elsevier’s stated policy their charges would not be decreased.

On top of all that is a remarkable scandal that has attracted a great deal of attention recently, which is that Elsevier has been double dipping in the most direct way possible: charging people to download articles for which APCs have been paid. Mike Taylor spotted this about two years ago. Elsevier’s response, coordinated by Alicia Wise, was less than swift, not surprisingly given their strong incentive to drag their feet about it. Peter Murray-Rust has been vigorously campaigning about this issue. If you’re interested, you can check out the March 2014 archive of his blog and work backwards.

Now we come to the big question. One of the most annoying aspects of the current situation in academic publishing is that the big publishers don’t want us to know what our universities are paying for their journals, so they insist on confidentiality clauses. As a result, we can’t tell whether we are getting good value for money, though there is plenty of indirect evidence, and even some direct evidence, that we are not.

There have been a few attempts in the past to use freedom-of-information legislation to get round these confidentiality clauses, some successful and others not. Also, some information has been made available by other means. Here are the cases I know about, but this list is very likely to be incomplete. (If I am notified of further useful information, I will be happy to add it to the list with appropriate acknowledgement.)

1. In 2009 public-record requests were made by Paul Courant, Ted Bergstrom and Preston McAfee to a large number of US universities asking for details of their Big Deal contracts with publishers. They had considerable success with this, obtaining information from 36 institutions. Elsevier made strenuous efforts to prevent the disclosures, contesting the request to Washington State University, but a judge ruled against them. See this page for further details. Together with Michael Williams they wrote an analysis of what they discovered, which ~~will soon become available in preprint form.~~ has now been published. It includes the following figures for what a number of universities spent on Elsevier contracts. The first figure in each row is the cost in dollars of the Elsevier Freedom Package and the second is the enrolment. (The latter is not by any means a perfect measure of the size of a university, but it gives at least some idea.)

| University | Cost in dollars | Enrolment |
| --- | --- | --- |
| Arizona Universities* | 2,724,888 | 123,473 |
| Auburn | 1,252,544 | 22,654 |
| Clemson | 1,296,044 | 16,582 |
| Colorado State | 1,319,633 | 24,409 |
| Cornell | 1,969,908 | 20,340 |
| Georgia State | 934,764 | 25,135 |
| Louisiana State | 1,198,237 | 28,467 |
| New York U. | 1,878,962 | 40,291 |
| U of Alabama | 1,018,614 | 22,971 |
| U of California** | 8,760,968 | 218,320 |
| U of Colorado | 1,725,023 | 28,333 |
| U of Denver | 467,406 | 10,036 |
| U of Georgia | 1,854,419 | 33,079 |
| U of Idaho | 750,808 | 10,008 |
| Illinois Universities*** | 2,319,383 | 72,751 |
| U of Iowa | 1,420,484 | 27,361 |
| U of Maryland | 1,760,173 | 31,573 |
| U of Michigan | 2,164,830 | 39,447 |
| U of Tennessee | 579,815 | 27,635 |
| U of Texas, Arlington | 620,042 | 20,136 |
| U of Texas, Austin | 1,539,380 | 46,537 |
| U of Wisconsin | 1,215,516 | 35,295 |
| U of Wyoming | 497,014 | 10,478 |

*A consortium of three universities in Arizona

**A joint license for ten University of California campuses

***A joint license for three University of Illinois campuses

If you like this kind of thing, then take a look at the appendix to their paper, from which the above table comes, and which is not behind a paywall. In case you have access to PNAS, the article is here.
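One rough way to compare the rows of that table is to divide each cost by the corresponding enrolment. The following Python sketch does this with the figures copied from the table. The per-student normalisation is my own choice for illustration, not something taken from the paper, and as noted above enrolment is far from a perfect measure of the size of a university.

```python
# (university, cost of the Elsevier package in dollars, enrolment), copied from the table above.
rows = [
    ("Arizona Universities", 2_724_888, 123_473),
    ("Auburn", 1_252_544, 22_654),
    ("Clemson", 1_296_044, 16_582),
    ("Colorado State", 1_319_633, 24_409),
    ("Cornell", 1_969_908, 20_340),
    ("Georgia State", 934_764, 25_135),
    ("Louisiana State", 1_198_237, 28_467),
    ("New York U.", 1_878_962, 40_291),
    ("U of Alabama", 1_018_614, 22_971),
    ("U of California", 8_760_968, 218_320),
    ("U of Colorado", 1_725_023, 28_333),
    ("U of Denver", 467_406, 10_036),
    ("U of Georgia", 1_854_419, 33_079),
    ("U of Idaho", 750_808, 10_008),
    ("Illinois Universities", 2_319_383, 72_751),
    ("U of Iowa", 1_420_484, 27_361),
    ("U of Maryland", 1_760_173, 31_573),
    ("U of Michigan", 2_164_830, 39_447),
    ("U of Tennessee", 579_815, 27_635),
    ("U of Texas, Arlington", 620_042, 20_136),
    ("U of Texas, Austin", 1_539_380, 46_537),
    ("U of Wisconsin", 1_215_516, 35_295),
    ("U of Wyoming", 497_014, 10_478),
]

# Dollars per enrolled student: a crude normalisation, used here only for comparison.
per_student = {name: cost / enrolment for name, cost, enrolment in rows}

# Print the institutions from most to least expensive per enrolled student.
for name, value in sorted(per_student.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24} {value:7.2f}")
```

On this crude measure the spread is wide: the figure runs from about $21 per enrolled student at the University of Tennessee to about $97 at Cornell.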

One related thing I have found, which interests me a lot because of its relevance to this post, is a judgment from Greg Abbott, the Attorney General of Texas, that the University of Texas should release details of its contracts with publishers. The part that interests me starts near the bottom of page 3, where there is a detailed discussion of what constitutes a trade secret. Roughly speaking, information is a trade secret of one company if disclosing it to other companies would cause substantial competitive harm to the first company. The Attorney General concludes in robust terms that the Big Deal contracts do not meet the definition of a trade secret, which I agree with because the different publishing companies are not competing to sell the same product.

2. There is a fascinating blog post by David Colquhoun written in December 2011, which I would certainly have referred to before if I had been aware of it, in which he discusses in detail the situation at his institution, which is University College London. In it, he says, “I’ve found some interesting numbers, with help from librarians, and through access to The Journal Usage Statistics Portal (JUSP).” The word “interesting” is an understatement. The first number is that UCL then paid Elsevier €1.25 million for electronic-only access to Elsevier journals. But as interesting as that headline figure is his analysis of the usage of Elsevier (and other) journals. As one might expect, but it is very good to see this confirmed, there are a few journals that are used a lot, but the usage tails off extremely rapidly.

3. In this country, there have been Freedom of Information requests to De Montfort University in 2010 (successful), Swansea University in 2014 (unsuccessful), and the University of Edinburgh in 2014 (successful). I recommend at this point that you read the refusal letter by Swansea. For reasons that I’ll come to, it is fairly clear that the letter was basically written by Elsevier, so it gives us some insight into their official reasons for wanting to keep their contracts secret. As I’ll discuss later, their arguments are very weak.

There was also a successful request to Swansea in 2013, but this one asked for the amount spent on all journal subscriptions, rather than just Elsevier subscriptions. It reveals that the amount went up from £1,514,890.88 in 2007/8 to £1,861,823.92 in 2011/12. (From the wording, it seems that these figures include VAT, but I’m not quite sure.) That’s a whopping 23% increase in four years. Of course, that may be because Swansea University decided to increase significantly the number of journals it subscribed to, but that explanation seems a trifle unlikely in the current economic climate. Whatever the explanation, the amount of money is very high.

The successful request to Edinburgh was made on January 16th by Sean Williams. The response was delayed, but on April 8th they finally responded, giving full details for two years and the totals for three. This reveals that Edinburgh spends around £845,000 plus VAT per year.
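Returning for a moment to the Swansea totals quoted above, the 23% figure is easy to check, and it can also be converted into an equivalent annual rate. Here is a minimal sketch in Python. The assumption of steady compound growth over the four years is mine, purely for illustration, since the response gives no indication of how the increase was distributed.

```python
# Swansea's total spend on journal subscriptions, as quoted above (in pounds).
spend_2007_08 = 1_514_890.88
spend_2011_12 = 1_861_823.92
years = 4  # from 2007/8 to 2011/12

# Overall proportional increase across the whole period.
total_increase = spend_2011_12 / spend_2007_08 - 1

# Assumption: steady compound growth, which the response itself does not claim.
annual_rate = (spend_2011_12 / spend_2007_08) ** (1 / years) - 1

print(f"Total increase: {total_increase:.1%}")
print(f"Equivalent annual rate: {annual_rate:.1%}")
```

Under that assumption, the increase corresponds to a little over 5% a year.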

4. Recently there was a long negotiation between Elsevier and Couperin, a large consortium representing French academic institutions. (Actually, I say long, but Elsevier apparently has an annoying habit of not beginning the process of negotiation in earnest until close to the end of the existing contract, so that the other side must either make decisions very quickly or risk large numbers of academics temporarily losing access to Elsevier journals.) The result was what one might call a Huge Deal, one that gave complete access to ScienceDirect to all academic institutions, from the very largest to the very smallest. Couperin professed to be pleased with the deal. I do not yet know whether that satisfaction is shared by the universities that are actually paying for it. If you want to know how much France is paying for access to ScienceDirect, then I recommend typing “Elsevier Couperin” into Google. After at most a couple of minutes of digging, you will find a document that tells you. Three important aspects of this deal are (i) that it lasts for five years, (ii) that the total amount paid to Elsevier is initially lower than before but goes up each year and ends up higher and (iii) that the access is now spread to many more institutions. What I do not know is what the effect of this is on the large universities that were paying for Elsevier journals before. Does the fact that many more institutions are involved mean that prices have gone down substantially? Or are most of the institutions that have newly been granted access paying very little for it and therefore not saving much money for the others? It would be good to have some insight into these questions. The bottom line, though, is that Elsevier’s profits in France are protected by the deal.

5. Brazil too has a national agreement with Elsevier, and refuses to sign a confidentiality clause. Somewhere I did once find, or get referred to, a page with details about the deal, but have not managed to find it again. My memory of it was that it was rather hard to understand.

**Update 25/4/2014**: many thanks to Rafael Pezzi, whose comment below I reproduce here, for more information about the situation in Brazil.

From the Brazilian open science mailing list:

Brazil has an nation wide agreement providing journal access to 423 academic and research institutions. It is called Portal de Periódicos, provided by CAPES. According to its 2013 financial report [1], last year CAPES spent US$ 93,872,151.11 (with US$ 31,644,204.12 paid to Elsevier).

Some institutions that are not covered by the agreement, as they do not meet the eligibility criteria, had to pay in separate in order to get access to this portal, spending additional US$ 11,560,438.93.

Rafael

[1] http://www.capes.gov.br/images/stories/download/Contas_Publicas/Relatorio-de-Gestao-2013.pdf

6. A comment by Anonymous below points me to a blog post that says that at the end of 2011 Purdue agreed a $2.9 million deal with Elsevier and describes the general situation facing libraries when they negotiate these deals. It also links to a post about Pittsburgh (with less precise figures).

In early January, I decided to try to find out more about what UK universities are paying by making a request under the Freedom of Information Act. As in France, the negotiations are carried out by a consortium: the British one is called JISC Collections. (It’s surprisingly hard to find out what JISC stands for: the answer is Joint Information Systems Committee.) Initially (to be precise, on the 8th of January), I wrote to Lorraine Estelle, who is the head of JISC Collections. I made an FOI request, asking how much JISC had agreed to pay Elsevier in the most recent round of negotiations, and how that payment was shared between the institutions represented by JISC.

She suggested that we should speak on the phone, which we did. I learned some important things from the phone call, which I will come to later, but I did not get the information I had actually asked for. She explained why on the phone, and some time later, when I found that I couldn’t quite remember her explanation, I asked for a clarification in writing. She provided me with the following.

Your question: As I understood it, you didn’t actually have the data that I was asking for. Is that correct? And do you mean that you negotiated a total — which, presumably, you would know — but do not know how it was split between the various universities?

Answer: We do have the data and we do know the split – but because we do not actually aggregate the subscriptions ourselves for the Elsevier deal, I have to get the total sum and the split from Elsevier.

I interpret that as meaning that for legal purposes she did not have the information in a form that might have obliged her to disclose it under the Freedom of Information Act.

And thus, I was passed on to Alicia Wise. As many people who have had dealings with Alicia Wise have found, including Peter Murray-Rust in his attempts to stop Elsevier charging for access to open access articles, this is not a good situation to be in.

Obviously she didn’t say, “Of course, I’d be happy to provide you with that information.” But I’d have been satisfied with a clear statement from her that she was not prepared to provide it, and I couldn’t get that either. Here is a sample of our correspondence. (Incidentally, owing first to some misunderstanding and then, apparently, to Alicia Wise wanting to check that Lorraine Estelle had not given me any confidential information, which she hadn’t, the correspondence didn’t even begin until about a fortnight after Lorraine Estelle had passed on my request.)

Her first email message, sent on February 5th, explained that Elsevier makes “an array of pricing information publicly available” and provided some links. These were to list prices of journals, which, because of bundling, give no indication of what universities actually pay. She also proposed that we should meet, or perhaps talk on the phone. I wrote back on the 7th suggesting that a phone conversation would be more convenient. I got no response for four days, so on the 11th I sent my reply again, which prompted a suggestion of several possible dates for a meeting. She said,

Sorry, should have sent you a receipt acknowledgment. We’ve worked out internally that Chris Greenwell and I should, together, be able to answer questions that arise (although I am also contemplating inviting someone from our pricing team along in case you have very very detailed questions!)

At this point I had a little worry, so I put it to her.

But before we actually arrange anything, and in particular before we decide whether it is better to meet physically or by phone, perhaps it is worth clarifying what could come out of such a meeting. The main question I asked in my FOI request was the following: “there is one particular thing I would like to know, and that is details of the most recent round of negotiations between JISC and Elsevier. I would like to know what annual payment was agreed, and how that payment was shared between the higher education institutions represented.”

If you are prepared to answer that question in full (I’m talking actual amounts of money rather than the general principles underlying the negotiations), and without binding me to any confidentiality agreement, then we have something serious to talk about. If not, then I’m not sure there is any point in having a discussion. However, in the second case, it would still be useful to know your reasons for not being prepared to divulge the information.

She responded as follows.

Thanks for this. I continue to think a call or meeting would be helpful as my immediate question is what hypothesis do you have, or are you testing, that require data at this level of granularity? The data you request are commercially sensitive. I am wondering if publicly available data – for example the attached which is from publications by the Society of College, University, and National Libraries (http://www.sconul.ac.uk/) – might serve your purpose? If we could understand better what you are after and why, we might be better able to come up with data that helps you. (And, yes, we would have even greater flexibility if you were prepared to consider treating some information in confidence but I appreciate you might be unwilling to do so.)

To which I said this.

Thanks for sending those slides, though of course you must have known perfectly well that they would not be of any help to me.

I can’t see what is unclear about what I am after. As I said, I would like to know what the UK universities represented by JISC are paying annually for Elsevier journals (a combination of Core Collections and access to Science Direct). My main reason for wanting to know that is that I think it is in the public interest for people to know how much universities are spending.

However, there are more specific reasons that I am interested in the data. One is that because the cost to universities of their Core Collections is based on historic spend on print journals, there is the potential for very similar universities to pay very different amounts for a similar service from Elsevier. I have been told that this is the case — for example, Cambridge suffers because historically college libraries have subscribed to journals — but would like to have the data so that I can confirm this.

If you won’t give me this information on the grounds of commercial sensitivity, then just let me know, and it will save us all time.

That was on February 12th. Her next reply came on March 7th, and said this.

Thanks for this. I did intend for the slides to be useful to you, but now that you have explained more clearly what you are after can see this was not the case. They have, however, helped to move our conversation on. We are focused on delivering value for money to all our customers, including Cambridge. The most direct way to find out the information you are looking for with respect to Cambridge might be a conversation with the library there?

So after all that, I still didn’t have a straight answer. However, by then I had long since lost patience: on February 19th, I submitted Freedom of Information requests to all 24 Russell Group universities, with the exceptions of Cardiff, where my email kept bouncing back, and Exeter, which I missed out accidentally. (Later I sent requests to them too.) My request was as follows.

Dear [Head of university library],

I would like to make a request under the Freedom of Information Act. I am interested to know what [name of university] currently spends annually for access to Elsevier journals. I understand that this is typically split into three parts, a subscription price for core content, which is based on historic spend, a content fee for accessing those journals via ScienceDirect, and a further fee for accessing unsubscribed titles from the Freedom Collection, also via ScienceDirect. I would like to know the total fee, and how it is split up into those three components.

Many thanks in advance for any help you can give me on this.

Yours sincerely,

Timothy Gowers

When I sent these requests, I had very little idea what my chances were of finding anything out at all. Lorraine Estelle had told me that JISC Collections are firmly against confidentiality clauses, but that Elsevier had insisted. But also, and crucially, there was a clause about FOI requests that made it not completely certain that they would fail. Unfortunately, this clause cannot be made public. (Yes, you read that correctly: the confidentiality clause is itself confidential.) However, as we shall see, the responses by some of the universities give some indication of what is probably in it.

In the end, the result was that, to my surprise and delight, a substantial majority of universities decided to give me the information I wanted, though many of them gave me just the total and not the breakdown into its three components. Here are the figures from the 18 universities that were brave and public spirited enough to give me them, together with Edinburgh, which, for reasons I don’t understand, refused to give any figures to me but provided them to Sean Williams. The figures *exclude* VAT, which adds a not exactly negligible 20% to the cost, but at least that goes back to the taxpayer rather than swelling even further the coffers of Elsevier. The price is rounded to the nearest pound. I obtained the enrolment figures from this page.

**Update 25/4/2014:** Richard van Noorden has kindly pointed me to a document from which I can obtain staff numbers. So I’ve now added a third column to the table, which gives the number of full-time academic staff followed by the number of part-time academic staff. (These figures are for the academic year 2012/3. Again, they may not be a perfect measure of how much people are using Elsevier journals, but they are probably better than student numbers.)

**Update 28/4/2014** Imperial College London has responded to my request for a review of their initial decision by providing me with their total figure (but not the breakdown).

**Update 30/4/2014** The University of Nottingham has done the same. The breakdown is not provided because they “consider the likelihood and scale of prejudice here [to both Elsevier's and the University's commercial interests] to be very high and therefore the test favours application of the exemption.” It is clear that there is some kind of game going on here, since everybody knows that the breakdown is basically that almost the entire amount is the subscription fee, with the content fee and Freedom Collection fee being a tiny proportion of the whole. (See below for an explanation of what I am talking about here.) So there is no imaginable effect that publishing the exact numbers could possibly have. However, equally, it is not all that important to know them.

**Update 16/5/2014** Queen Mary University of London has supplied their total figure to Edward Hughes, who is there.

**Update 23/5/2014** I now have the figures from Oxford.

**Update 31/5/2014** Figures from LSE added.

| University | Cost (excl. VAT) | Enrolment | Academic staff (full-time + part-time) |
|---|---|---|---|
| Birmingham | £764,553 | 31,070 | 2355 + 440 |
| Bristol | £808,840 | 19,220 | 2090 + 525 |
| Cambridge | £1,161,571 | 19,945 | 4205 + 710 |
| Cardiff | £720,533 | 30,000 | 2130 + 825 |
| *Durham | £461,020 | 16,570 | 1250 + 305 |
| **Edinburgh | £845,000 | 31,323 | 2945 + 540 |
| *Exeter | £234,126 | 18,720 | 1270 + 290 |
| Glasgow | £686,104 | 26,395 | 2000 + 650 |
| Imperial College London | £1,340,213 | 16,000 | 3295 + 535 |
| King’s College London | £655,054 | 26,460 | 2920 + 1190 |
| Leeds | £847,429 | 32,510 | 2470 + 655 |
| Liverpool | £659,796 | 21,875 | 1835 + 530 |
| §London School of Economics | £146,117 | 9,805 | 755 + 825 |
| Manchester | £1,257,407 | 40,860 | 3810 + 745 |
| Newcastle | £974,930 | 21,055 | 2010 + 495 |
| Nottingham | £903,076 | 35,630 | 2805 + 585 |
| Oxford | £990,775 | 25,595 | 5190 + 775 |
| * ***Queen Mary U of London | £454,422 | 14,860 | 1495 + 565 |
| Queen’s U Belfast | £584,020 | 22,990 | 1375 + 170 |
| Sheffield | £562,277 | 25,965 | 2300 + 460 |
| Southampton | £766,616 | 24,135 | 2065 + 655 |
| University College London | £1,381,380 | 25,525 | 4315 + 1185 |
| Warwick | £631,851 | 27,440 | 1535 + 305 |
| *York | £400,445 | 17,405 | 1205 + 285 |

*Joined the Russell Group two years ago.

**Information obtained by Sean Williams.

***Information obtained by Edward Hughes.

§LSE subscribes to a package of subject collections rather than to the full Freedom Collection.

~~The universities for which I still do not have the information are~~ ~~Imperial College London~~, ~~London School of Economics and Political Science,~~ ~~Nottingham,~~ ~~and Oxford.~~ ~~, and Queen Mary University of London.~~ ~~I still have hopes of finding out the figures for~~ ~~Imperial~~, ~~Nottingham and~~ ~~Oxford, and will provide them if I do.~~

A striking aspect of these amounts is just how much they vary. How does it come about, for example, that University College London pays over twice as much as King’s College London, and almost six times as much as Exeter? In order to explain this, I need to say something about the system as it is at the moment. It is here that I am indebted to Lorraine Estelle.
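For anyone who wants to check comparisons of this kind, here is a minimal sketch in Python. The costs and enrolment numbers are copied from the table above; the function names are mine and purely illustrative, and cost per enrolled student is only one crude way of normalizing (as noted earlier, student numbers are an imperfect measure of use).

```python
# Annual cost in pounds (excluding VAT), copied from the table above.
costs = {
    "UCL": 1_381_380,
    "KCL": 655_054,
    "Exeter": 234_126,
}

# Enrolment figures from the same table.
enrolment = {
    "UCL": 25_525,
    "KCL": 26_460,
    "Exeter": 18_720,
}


def cost_ratio(a: str, b: str) -> float:
    """How many times more university a pays than university b."""
    return costs[a] / costs[b]


def cost_per_student(name: str) -> float:
    """Annual cost divided by enrolment: a crude size-adjusted figure."""
    return costs[name] / enrolment[name]


print(f"UCL / KCL:    {cost_ratio('UCL', 'KCL'):.2f}")
print(f"UCL / Exeter: {cost_ratio('UCL', 'Exeter'):.2f}")
for name in costs:
    print(f"{name}: about £{cost_per_student(name):.2f} per enrolled student")
```

With the figures in the table, the two ratios come out at roughly 2.1 and 5.9, which is where "over twice as much" and "almost six times as much" come from.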

The present system (as it is in the UK, but my guess is that these remarks apply more generally) would be inexplicable were it not for the fact that it grew out of an older system that existed before the internet. Given that fact, though, it makes a lot more sense. (I don’t mean that it is fair — just that its existence is comprehensible.) If you were an Elsevier executive managing the transition from a world of print journals to a world where most people want to read articles online, what service would you offer and what would you do about prices? Since it costs almost nothing to make articles that are already online available to more people, and since it is convenient for a university to have access to everything, the obvious service to offer is complete access to all Elsevier journals. But what should you charge for this service?

Up to now, different universities have spent significantly different amounts on Elsevier journals, so if you start all over again and work out a price for the complete package, either some universities will have to pay much more than they did before, which they would probably be unwilling to do, or some universities will end up paying much less than they did before and profits will suffer quite badly. So you try to devise a system that will give universities the new service at prices that are based on the old service. That way, no university ends up paying significantly more or less than it did before. But because this is unfair — after all, now different universities will be paying very different amounts for the same service — you feel that you can’t let the universities know what other universities are paying.

The current system in the UK is very much as the above thought experiment would lead one to expect. So it is easy to see why Elsevier wants confidentiality clauses. It also explains the rather strange structure of the deals that universities have with Elsevier. Typically they have a certain “core content” (roughly, the journals they subscribed to before the transition), for which they pay something close to list prices and receive print copies. They then pay a small extra fee for permanent electronic access to that core content, and another small extra fee for electronic access to all other Elsevier journals, but this time only while the university continues to have a contract with Elsevier. Of course, in such a situation a university would like to cut down its core content to zero, but that is not allowed: there are strict controls on what they are allowed to cancel. The buzz phrase here is “historic spend”, which roughly means what universities spent on print subscriptions before the transition to electronic access. The system ensures that what universities pay now closely matches their historic spend.

Here is how Lorraine Estelle explains it.

Prior to the move to online journal, each institution subscribed to titles on a title by title basis.

When NESLI was set up, our negotiations were confined to the “e-fee” or “top-up fee”.

This was the fee that institutions needed to pay in order to have access to all a publisher’s content in electronic format. Their “subscribed titles” plus all other titles from that publisher. (This is the deal that has become known as “The Big Deal’ and adopted by all major publishers).

The “e-fee” or “top-up fee” was (and usually is still) contingent of the institutions maintaining the level of spend for the “subscribed titles”. This article provides the background to NESLI http://www.uksg.org/serials/nesli back in 1998

As institutions have moved to e-only – we negotiate with most publishers on the total cost across the consortium. However, in most (but not all) deals the division of spend across the UK library consortium is uneven – and still depends on the level of historic spend on subscribed titles. So an institution that used to subscribe to many titles, will still pay more than one that used to subscribe to fewer.

We negotiate the total increase – known as the price cap, the cancellation allowance (which means institutions can cancel a percentage of historically subscribed titles and still retain e-access), and the licence terms and conditions. This is not unique and it is the model employed by most academic library consortia across the world.

The deal is negotiated by Jisc Collections – but we do have support and input from the institutions. Oversight of our negotiations is provided by our Electronic Information Resources working group http://www.jisc-collections.ac.uk/About-JISC-Collections/Advisory-Groups/Electronic-Resources-Information-Group/ It is very rare for an institution to negotiate its own deal, because it would be difficult for them to get the same terms on an individual basis. The few exceptions are where an institution has a special relationship with a publisher – University of Oxford for OUP titles, for example.

All this is important, because it shows that a certain picture of how Elsevier operates, one that I used to believe in, is an oversimplification. In that picture, Elsevier insists on confidentiality clauses in order to be able to screw each university for whatever it can get. However, such a description is misleading on two counts. First, Elsevier negotiates with JISC rather than directly with universities, and secondly, the amount that universities pay is based on historic spend rather than on what Elsevier manages to wring out of them.

I say “an oversimplification” rather than “wrong” because if Elsevier *did* operate in the way I had previously imagined, the results would probably be rather similar. What is the maximum that Elsevier would be likely to persuade a university to pay? It would be very hard to persuade a university to agree to a huge leap in prices, so in each year one would expect the maximum to be whatever the university paid in the previous year plus a small real-terms increase. And all the evidence suggests that that is more or less exactly what Elsevier has managed to achieve.

Another factor that is perhaps worth briefly discussing is the fact that Durham, Exeter, Queen Mary University of London and York joined the Russell Group only two years ago. This probably helps to explain why (apart from QMUL, which refused to provide me with its figures) these universities are paying significantly less than most of the others. Whether Elsevier had an explicit policy of charging less to supposedly less prestigious universities (though the list of universities not in the Russell Group contains several that appear to me to be at least as prestigious as several that are in the Russell Group), or whether there is merely a strong correlation between membership of the Russell Group and historic spend on Elsevier journals, I don’t know. I think the former may be the case, since I have heard librarians talking about a “banding system” (I don’t know any details about how it works), and also because Bergstrom et al mention in their paper that in the US there is a classification of universities into different types according to how research intensive they are, with prices depending to a considerable extent on this classification.

A further factor that may possibly explain some of the data is that some institutions have recently merged with others. For example, The University of Manchester, one of the universities that pays most, merged in 2004 with UMIST (University of Manchester Institute of Science and Technology), and UCL merged in 2012 with The School of Pharmacy, University of London. The latter fact may help to explain why they are paying so much more now than what David Colquhoun said they were paying in 2011.

Although the differences between the amounts that different universities pay are eye-catching, it is important to be clear that they are a *symptom* of what is wrong with the system, and not the problem itself. The problem is quite simply that Elsevier has a monopoly over a product for which the demand is still very inelastic (the lack of elasticity being largely the fault of the academic community), with the result that the prices are unreasonably high for the service that Elsevier provides. (It bears repeating that the refereeing process and editorial selection are not paid for by Elsevier — those services are provided free of charge by academics.) If Elsevier were to equalize the prices (or equalize some suitable quantity such as price divided by size of university, or price per use) while keeping the aggregate the same, this would *not* solve the underlying problem.

As I have explained above, the price that a typical university pays to Elsevier in its Big Deal is divided into three components. One is a “subscription fee”, which is to pay for a certain collection of journals at something comparable to their list prices. Another is a “content fee”, which is to pay for electronic access in perpetuity to those titles (via ScienceDirect). The third is a “Freedom Collection fee”, which is to pay for electronic access to the rest of Elsevier’s journals, but this access, unlike the access covered by the content fee, is lost if you cancel the Big Deal.

I have got breakdowns from seven universities, but rather than give them here, I would rather simply make a few general points about them.

1. The content fee (that is, the fee for electronic access to the subscribed titles) is, in all the cases I know about, very close to 5.8824% of the subscription fee. Since 1/17=0.05882352941, I think that is saying that the content fee is exactly one seventeenth of the subscription fee, with the tiny differences coming from rounding errors. Of course, the precise details here are unimportant: what matters is that it is a very small amount compared with the subscription fee itself.

2. The Freedom Collection fees do not have an obvious relationship with the subscription fee, but, amusingly, with the seven examples I have, the more you pay for the latter, the less you pay for the former. That actually makes some kind of sense, since the more you are paying the content fee, the bigger the chunk of the Freedom Collection you are already subscribing to. I haven’t managed to reverse-engineer any kind of simple quantitative relationship between the two prices, however.

3. The inverse relationship in point 2 might seem to make things fairer, and to a very small extent it does, but we are talking about fees of between £10,000 and £25,000 here, so even for a university with a small subscription fee the Freedom Collection fee is well under a tenth of its subscription fee. In fact, it is too small even to make up for the discrepancy in the content fees. Of course, it is grotesquely misleading to say that the Freedom Collection costs so little, because the price you pay for it is conditional on not cancelling the subscriptions that keep the subscription fee extremely high. Indeed, the entire “breakdown” is misleading for that reason: the effective cost of the Freedom Collection is far higher than its nominal cost.
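The arithmetic behind point 1 can be sketched as follows. This is only an illustration of my inference: the one-seventeenth ratio is a guess from the breakdowns I have seen, not a published Elsevier figure, and the subscription fee used in the example is a made-up round number, not any university's actual fee.

```python
from fractions import Fraction

# Assumption inferred from the observed 5.8824%; not an official figure.
CONTENT_FEE_SHARE = Fraction(1, 17)


def as_percent(share: Fraction, places: int = 4) -> float:
    """Express a fraction as a percentage rounded to the given number of places."""
    return round(float(share) * 100, places)


def split_core_payment(subscription_fee: int):
    """Return (content fee, subscription fee + content fee) under the 1/17 assumption."""
    content_fee = subscription_fee * CONTENT_FEE_SHARE
    return content_fee, subscription_fee + content_fee


# One seventeenth, written as a percentage to four decimal places.
print(as_percent(CONTENT_FEE_SHARE))  # 5.8824

# Hypothetical subscription fee, chosen only because it divides evenly by 17.
content_fee, core_total = split_core_payment(170_000)
print(content_fee, core_total)
```

Under this assumption the subscription fee and content fee together are always 18/17 of the subscription fee, which is why the content fee tells you nothing that the subscription fee does not.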

The moral of all this is that the figures giving the total cost are what matter. What universities actually need is electronic access to Elsevier’s journals. In order to get that access, Elsevier insists that they nominally pay for something else, namely subscriptions that they are not allowed to cancel (even when they are duplicates, as has happened in Cambridge because of college libraries, and probably in Manchester and UCL as a result of mergers). But that is of no practical importance. It’s a bit like those advertisements that say “FREE OFFER!” and then in very small print they add “when you spend over £X,” which of course means that the so-called free offer is not free at all.

While I was still not at all sure that I would get any information about prices, I comforted myself with the thought that an institution that refuses a FOI request has to give reasons, and those reasons might well be informative. For example, they might reveal that the main reason for confidentiality is to protect Elsevier’s profits, which would conflict with Elsevier’s official reasons.

Or would it? If you’ve read this far, then your reward is the following rather wonderful video (which has done the rounds for a while, so you may have seen it) of David Tempest, from Elsevier, explaining why confidentiality clauses are necessary. Many thanks to Mike Taylor for obtaining it. A transcript can be found on his blog.

The person who asked the question is Stephen Curry, from Imperial College London. ~~I’m sorry to say that, as mentioned above, Imperial is one of the universities I have not managed to get figures from.~~ I’m glad to say that at last he can know what his university library is spending on his behalf.

David Tempest’s lapse aside, Elsevier usually does not admit that the confidentiality clauses are there to protect its profits. But the refusal letters I received tell a different story. A good example is the first response I had from any university (other than an acknowledgement), which was a refusal from Queen’s University Belfast. I will quote it in full.

Dear Mr Gowers

Freedom of Information Request – Elsevier Journals

My letter, dated 21 February 2014, in relation to the above refers. [sic]

Having reviewed your request and consulted with appropriate colleagues, I would respond as set out below:

I would like to make a request under the Freedom of Information Act. I am interested to know what Queen’s University Belfast currently spends annually for access to Elsevier journals. I understand that this is typically split into three parts, a subscription price for core content, which is based on historic spend, a content fee for accessing those journals via ScienceDirect, and a further fee for accessing unsubscribed titles from the Freedom Collection, also via ScienceDirect. I would like to know the total fee, and how it is split up into those three components.

I can confirm that whilst the University does hold this information, it is not being provided to you as it is considered exempt under Section 43(2) of the Act.

Section 43(2) of the Act provides that information is exempt if its disclosure under the Act would be likely to prejudice the commercial interests of any person, including the public authority itself.

Commercial interests relate to the ability to successfully participate in a commercial activity. This could be the ability to buy or sell goods or services or the disclosure of financial and planning information to market competitors. It is, therefore, necessary to decide whether release of this information will have an impact on the commercial activity of Elsevier or the University.

In making this determination, the University has consulted with Elsevier regarding the disclosure of the requested information and whether such disclosure would be likely to prejudice Elsevier’s commercial interests.

In written representations to the University, Elsevier has indicated that the disclosure of the amount of money spent annually on access to Elsevier journals would reveal pricing information, specifically the licensing fees that have been negotiated with the University in circumstances that may include a level of discount.

The disclosure of this information would be likely to have a detrimental effect on Elsevier’s future negotiating position with that of the University and, indeed, the wider HE sector – which represents a large percentage of their market.

The University accepts this argument and also considers that disclosure of information that would reveal pricing would also be likely to prejudice the commercial interests of the University itself, insofar as it could have a detrimental impact on the future negotiation of tailored solutions for licensing of Elsevier’s products and discounts from list prices.

Section 43(2) is a qualified exemption and the University must, therefore, consider where the balance of the public interest lies.

The University accepts the need for transparency and accountability for decision making. The requirement, however, for transparency and accountability needs to be weighed against the harm to the commercial interests of third parties or the University itself through disclosure. The University has, therefore, weighed the prejudice caused by disclosure of the requested information against the likely benefit to the wider public.

In considering arguments in favour of disclosing the information, the University has taken into account the wider interest of the general public in having access to information on how public funds are spent. In this instance, there is a public interest in demonstrating that the University has negotiated a competitive rate in relation to the procurement of Elsevier’s products and services.

The University considers, however, that this public interest is already met by the significant amount of pricing information that Elsevier currently makes publicly available – such information is available at:

http:\www.elsevier.com/librarians/journal-pricing and

http:\www.elsevier.com/librarians/physical-sciences/mathematics/journal-pricing.

In relation to those factors favouring non-disclosure, the University has a duty to protect commercially sensitive information that is held about any third party. In this instance, disclosure of the amount of money spent by the University on Elsevier products would reveal pricing information that was acknowledged by both the University and Elsevier at the time the contract was entered into as being commercially confidential. Disclosure of this information would be likely to prejudice not only the commercial interests of Elsevier but also the interests of the University itself, along with the relationship that the University has with its supplier.

It is reasonable, therefore, in all the circumstances of this case that the exemption should be maintained and the requested information not disclosed.

If you are dissatisfied with the response provided, please put your complaint in writing to me at the above address. If this fails to resolve the matter, you have the right to apply to the Information Commissioner.

Yours sincerely

Amanda Aicken

Information Compliance Unit

I responded as follows.

Dear Amanda Aicken,

Thank you for your response to my Freedom of Information Request (reference FOI/14/42). You invited me to write to you if I was dissatisfied with it. I have a number of reasons for dissatisfaction, so I am taking you up on your invitation.

My main objection is that I disagree with several of your reasons for declining my request. I will present them as a numbered list.

1. You say that the disclosure of the information I ask for would be likely to have a detrimental effect on Elsevier’s future negotiating position with that of the university. You also say that it would be likely to prejudice the commercial interests of the university itself. I do not find these two statements easy to reconcile. Could you please explain how it is possible for both parties to lose out?

2. You agree with me that there is a public interest in demonstrating that the university has negotiated a competitive rate in relation to the procurement of Elsevier’s products and services. You go on to say that this public interest is already met by the information that Elsevier has made publicly available online. However, this is manifestly untrue. The only figures provided by Elsevier are for the list prices of their journals. But since universities pay for Elsevier’s Freedom Collection with a Big Deal, the list prices do not give me any way of verifying that the university has negotiated a competitive rate. Indeed, they do not even allow me to work out the order of magnitude of how much Queen’s University is paying to Elsevier. Please would you either retract your statement that this public interest has already been met by Elsevier, or else explain to me how to use the list prices to estimate the total amount paid by Queen’s University?

3. Your letter implies that there are direct negotiations between Elsevier and Queen’s University of Belfast. However, this is also not true. The negotiations are mediated through JISC. Therefore, there is no obvious mechanism whereby disclosing the prices would cause any commercial harm to the university.

4. It has not escaped my notice that the letter you sent is remarkably similar to a letter sent by the University of Swansea to somebody else who made a similar request. It is clear that you used that letter as a template, or else that you and the University of Swansea used the same template, perhaps provided by Elsevier. This suggests to me that you have not considered the balance of arguments for and against disclosure with sufficient independence.

In summary, the main two points that I cannot accept are that the financial interests of Queen’s University are likely to be prejudiced by the disclosure of this information, and that there is sufficient information in the public domain to enable me to determine whether the university has negotiated a competitive rate. If you are going to refuse to disclose the information, then I would like it to be for reasons that are not obviously false.

Yours sincerely,

Timothy Gowers

The Swansea letter I referred to is this one, which I have already mentioned. It was the formulaic nature of the response, with ghastly Orwellian phrases such as “tailored solutions” and misleading references to “a level of discount” that appeared not just in these two letters but in many other refusal letters that I was to receive, that got me annoyed enough to express my dissatisfaction, which in the case of Queen’s University Belfast and a handful of other universities eventually resulted in success. The response I received to my letter above was as follows. It did not really address my arguments, but since it gave me the information, that was not a big concern.

Dear Mr Gowers,

Freedom of Information Request – Elsevier Journals – Internal Review

Your email to Mrs Amanda Aicken, dated 5 March 2014, requesting an internal review of the University’s response to your Freedom of Information request on the above, refers.

On 21 February 2014, you submitted a request for information in relation to the University’s annual expenditure on access to Elsevier Journals. You requested details of the total fee and how this is split up into three components: a subscription price for core content; a contnet fee for accessing those journals via ScienceDirect; and a further fee for accessing unsubscribed titles from the Freedom Collection.

On 4 March 2014, the University responded to your request, confirming that whilst this information was held, it was not being provided to you as it was considered commercially sensitive information and, therefore, was exempt under Section 43(2) of the Act. The University had made this determination following consultation with Elsevier, which had indicated that the disclosure of the requested information would prejudice its commercial interests by revealing pricing information. In particular, Elsevier argued that disclosure of the information would reveal the licensing fees that had been negotiated with the University in circumstances that may have included a level of discount.

I understand that you, subsequently, lodged a complaint in respect of the University’s response to your request and this complaint has been handled as an internal review of the decision not to provide the requested information.

You have expressed dissatisfaction with the response on the grounds that you ‘cannot accept (are) that the financial interests of Queen’s University are likely to be prejudiced by the disclosure of this information, and that there is sufficient information in the public domain to enable me to determine whether the University has negotiated a competitive rate’.

I have now completed my review and my findings are detailed below.

I have reconsidered the nature of the requested information and the application of the exemption to withhold this information. In doing so, I have taken into account written advice from relevant senior staff in the University’s McClay Library and advice received from JISC regarding the detail of the contract with Elsevier. I have also noted your comments regarding the need for transparency and the public interest in demonstrating that the University has negotiated a competitive rate in relation to the procurement of Elsevier’s products and services.

At the time of your request, the University was clearly of the view that disclosure of the requested information would be likely to have a detrimental effect on Elsevier’s future negotiating position with that of the University and, indeed, the wider HE sector. An additional, albeit secondary argument, was the possibility that disclosure would prejudice the interests of the University itslef with respect to the relationship that the University has with Elsevier as a supplier. I am persuaded that that [sic] this was not, in the circumstances, an unreasonable view.

I do, however, believe that on balance, the public interest in disclosure was greater than that in maintaining the commercial interests exemption. I also understand that subsequent to your original request, several institutions have disclosed information, either in relation to the total annual expenditure on access to Elsevier Journals, or on the detailed breakdown of expenditure as requested.

In light of the above, it is my view that the information should now be disclosed. I am, therefore, providing the requested information in relation to 2014 — this is provided in the table below.

I have had several correspondences like this. I would like to pick out a couple of excerpts from other refusal letters that are not essentially contained in the Belfast letter. I had this rather chilling paragraph from Queen Mary University of London.

However, in addition to the reasons outlined above already, revealing this information to the world at large may damage the relationship that QML has with Elsevier including the prospect of legal action that may be taken against QML. This could result in QML being unable to offer Elsevier products which would have the knock-on effect of impacting our resources, our research and even student recruitment. Since these would imperil QML’s finances, in financially tough times and while receiving less and less from the public purse, this cannot be said to be in the public interest.

It would be interesting to know what Elsevier said to them to provoke that. Because of this paragraph, I felt sorry for QMUL and decided not to request a review of their decision (16/5/2014 — they have now provided the total figure to Edward Hughes, perhaps reasoning that there was safety in numbers).

However, the following paragraph from Oxford had the opposite effect on me.

Maintaining confidentiality with regard to the information requested enables the University and Elsevier to arrive at a fair and competitive negotiated and customised price. Full pricing transparency would mean that the best pricing model publishers could offer would be list price, which would be likely to result in increased costs to the University. Disclosure of pricing terms would inhibit publishers’ ability to develop flexible, tailored solutions suitable for a particular customer’s needs.

Part of my response to that was that the statement beginning “Full pricing transparency” was manifestly false: publishers could offer any model they like. Also, that “tailored solutions” phrase is a red rag to a bull: knowing about how the system works, and how little it is “tailored for a particular customer’s needs”, I cannot read it without getting annoyed. I have requested a review from Oxford ~~but not yet heard back (though they should, legally, have responded by now).~~ and they have now given me their total figure.

Incidentally, although I wrote initially to librarians, they were legally obliged to pass my requests on to their Freedom of Information offices, so the letters I got back were (mostly) from bureaucrats. So when I got refusals, this did not necessarily reflect the wishes of the librarians, who stand to gain from the prices being known.

When it comes to high prices and confidentiality contracts, Elsevier are not the only offenders, though there is some anecdotal evidence that they are the leaders, in the sense that other publishers use Elsevier as a benchmark to see what they can get away with. So why submit Freedom of Information requests for Elsevier contracts without doing the same for Springer, Wiley, Taylor-Francis, etc.?

There is no good reason. My answer to this inevitable question is that I do not regard the work of finding out about journal prices as finished. I will report on this blog if and when I or other people find out about other publishers and other universities.

There is a great deal more that could be said about journal prices and what should be done about them. However, this post has passed the 10,000-word mark, so I shall leave further discussion for a second post. Among the questions I intend to address are the following, many of which concern other big publishers just as much as they concern Elsevier.

1. Is it fair to say that Elsevier is a monopoly?

2. Does Elsevier’s pricing policy violate competition law?

3. What would be a fair system for charging for electronic access to a large collection of journals?

4. Are the current prices really all that unreasonable, given the importance to science of journal articles?

5. Is it better for university libraries to form consortia or should they negotiate individually?

6. What would be the implications for Cambridge (and perhaps other universities too) of a switch to paying list prices for individual journals?

7. Different subjects have very different publishing cultures and very different needs. Are they better off campaigning together in a single open access movement or would it be better to have a fragmented movement, with different subjects campaigning separately for their different interests?

8. What more can be done to accelerate a move towards a cheaper journal system?

]]>

A good way to test your basic knowledge of (some of) the course would be to do a short multiple-choice quiz devised by Vicky Neale. If you don’t get the right answer first time for every question, then it will give you an idea of the areas of the course that need attention.

Terence Tao has also created a number of multiple-choice quizzes, some of which are relevant to the course. They can be found on this page. The quiz on continuity expects you to know the definitions of adherent points and limit points, which I did not discuss in lectures.

The first five posts on this blog in the IA Analysis category are devoted to the questions on this course in the 2003 Tripos. The course has not changed much since then, so these questions are similar to the kind of thing that could be set now. I try to say not just what the answers are but how I thought of them, how I decided what to write out in detail and what just to assume, and so on. They may be of some use when you prepare for the exams.

A long time ago I wrote a number of informal discussions of undergraduate mathematical topics. My ideas about some of these are not always identical to what they were then, but again you may find some of them helpful, particularly the ones on analysis.

If I think of further resources, I’ll add them to the post.

Finally, I’ve very much enjoyed giving this course — thanks for being a great audience (if that’s the right word).

]]>

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$$

and

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$$

relate to things like the opposite, adjacent and hypotenuse. Using the power-series definitions, we proved several facts about trigonometric functions, such as the addition formulae, their derivatives, and the fact that they are periodic. But we didn’t quite get to the stage of proving that if $x^2+y^2=1$ and $\theta$ is the angle that the line from $(0,0)$ to $(x,y)$ makes with the line from $(0,0)$ to $(1,0)$, then $x=\cos\theta$ and $y=\sin\theta$. So how does one establish that? How does one even *define* the angle? In this post, I will give one possible answer to these questions.

A cheating and not wholly satisfactory method would be to define the angle $\theta$ to be $\cos^{-1}x$. Then it would be trivial that $x=\cos\theta$ and we could use facts we know to prove that $y=\sin\theta$. (Or could we? Wouldn’t we just get that it was $\pm\sin\theta$? The fact that many angles have the same $\cos$ and $\sin$ creates annoying difficulties for this approach, though ones that could in principle be circumvented.) But if we did this, how could we be confident that the notion of angle we had just defined coincided with what we think angle should be? The problem has not been fully solved.

Another approach might be to define trigonometric functions geometrically, prove that they have the basic properties that we established using the power series definitions, and prove that these properties characterize the trigonometric functions (meaning that any two functions $c$ and $s$ that have the properties must be $\cos$ and $\sin$). However, this still requires us to make sense of the notion of angle somehow, and we might also feel slightly worried about whether the geometric arguments we used to justify the addition formulae and the like were truly rigorous. (I’m not saying it can’t be done satisfactorily, just that I don’t immediately see a good way of doing it, and I have a different approach to present.)

How are radians defined? You take a line L starting at the origin, and it hits the unit circle at some point P. Then the angle that line makes with the horizontal (or rather, the horizontal heading out to the right) is defined to be the length of the circular arc that goes anticlockwise round the unit circle from $(1,0)$ to P. (This defines a number between 0 and $2\pi$, but we can worry about numbers outside this range later.)

There is nothing wrong with this definition, except that it requires us to make rigorous sense of the length of a circular arc. How are we to do this?

For simplicity, let’s assume that our point P is $(x,y)$ and that both $x$ and $y$ are positive. So P is in the top right quadrant of the unit circle. How can we define and then calculate the length of the arc from $(1,0)$ to $(x,y)$, or equivalently from $(x,y)$ to $(1,0)$?

One non-rigorous but informative way of thinking about this is that for each $t$ between $x$ and $1$, we should take an interval $[t,t+dt]$, work out the length of the bit of the circle vertically above this interval, and sum up all those lengths. The bit of the circle in question is a straight line (since $dt$ is infinitesimally small) and by similar triangles its length is $dt/\sqrt{1-t^2}$.

How did I write that down? Well, the big triangle I was thinking of was one with vertices $(0,0)$, $(t,0)$ and the point on the circle directly above $(t,0)$, which is $(t,\sqrt{1-t^2})$, by Pythagoras’s theorem. The little triangle has one side of length $dt$, which corresponds to the side in the big triangle of length $\sqrt{1-t^2}$. Since the hypotenuse of the big triangle has length 1, the hypotenuse of the little triangle is $dt/\sqrt{1-t^2}$, as I claimed.

Adding all these little lengths up, we get $\int_x^1\frac{dt}{\sqrt{1-t^2}}$, so it remains to evaluate this integral.

This is of course a very standard integral, usually solved by substituting $\cos\theta$ or $\sin\theta$ for $t$. If you do that, you find that the length works out as $\cos^{-1}x$, which is just what we hoped. However, we haven’t discussed integration by substitution in this course, so let us see it in a more elementary way (not that proving an appropriate form of the integration-by-substitution rule is especially hard).

Using the rules for differentiating inverses, we find that

$$\frac{d}{dt}\cos^{-1}t=\frac{1}{-\sin(\cos^{-1}t)},$$

and since $\sin^2\theta+\cos^2\theta=1$, this gives us $-1/\sqrt{1-t^2}$. So the integrand $1/\sqrt{1-t^2}$ has $-\cos^{-1}t$ as an antiderivative, and therefore, by the fundamental theorem of calculus,

$$\int_x^1\frac{dt}{\sqrt{1-t^2}}=-\cos^{-1}(1)+\cos^{-1}(x)=\cos^{-1}(x).$$

So the angle $\theta$ between the horizontal and the line joining the origin to $(x,y)$ is (by definition) the length of the arc from $(1,0)$ to $(x,y)$, which we have calculated to be $\cos^{-1}x$. Therefore, $x=\cos\theta$.

The process I just went through, of saying “Let’s add up a whole lot of infinitesimal lengths; that says we should write down the following integral; calculating the integral gives us L, so the length is L,” is a process that one often goes through when calculating similar quantities. Why are we so confident that it is OK?

I sometimes realize with mathematical questions like this that I have been a mathematician for many years and never bothered to worry about them. It’s just sort of obvious that if a function is reasonably nice, then writing something down that’s approximately true with $\delta x$ and turning $\delta x$ into $dx$ and writing a nice $\int$ sign in front gives you a correct expression for the quantity in question. But let’s try to think a bit about how we might define length rigorously.

First, we should say what a curve is. There are various definitions, according to how much niceness one wants to assume, but let me take a basic definition: a curve is a continuous function from an interval $[a,b]$ to $\mathbb{R}^2$. (I haven’t defined continuous functions to $\mathbb{R}^2$, but it simply means that if $f(t)=(g(t),h(t))$, then $g$ and $h$ are both continuous functions from $[a,b]$ to $\mathbb{R}$.)

This is an example of a curious habit of mathematicians of defining objects as things that they clearly aren’t. Surely a curve is not a function: it’s a special sort of subset of the plane. In fact, shouldn’t a curve be defined as the *image* of a continuous function from $[a,b]$ to $\mathbb{R}^2$? It’s true that that corresponds more closely to what we are thinking of when we use the word “curve”, but the definition I’ve just given turns out to be more convenient, though it’s important to add that two curves (as I’ve defined them) $f_1:[a,b]\to\mathbb{R}^2$ and $f_2:[c,d]\to\mathbb{R}^2$ are *equivalent* if there is a strictly increasing continuous bijection $\phi:[a,b]\to[c,d]$ such that $f_1(t)=f_2(\phi(t))$ for every $t\in[a,b]$. In this situation, we think of $f_1$ and $f_2$ as different ways of representing the same curve.

Incidentally, if you want a reason not to identify curves with their images, then one quite good reason is the existence of objects called *space-filling curves*. These are continuous functions from intervals of reals to $\mathbb{R}^2$ that fill up entire two-dimensional sets. Here’s a picture of one, lifted from Wikipedia.

It shows the first few iterations of a process that gives you a sequence of functions that converge to a continuous limit that fills up an entire square.

Going back to lengths, let’s think about how one might define them. The one thing we know how to define is the length of a line segment. (Strictly speaking, I’m not allowed to say that, since a line segment isn’t a function, but let’s understand it as a particularly simple function from an interval to a line segment in the plane.) Given that, a reasonable definition of length would seem to be to approximate a given curve by a whole lot of little line segments. That leads to the following idea for at least approximating the length of a curve $f:[a,b]\to\mathbb{R}^2$. We take a dissection $a=t_0<t_1<\dots<t_n=b$ and add up all the little distances $d(f(t_{i-1}),f(t_i))$. Here I am defining the distance between two points in $\mathbb{R}^2$ in the normal way by Pythagoras’s theorem. This gives us the expression

$$\sum_{i=1}^n d(f(t_{i-1}),f(t_i))$$

for the approximate length given by the dissection. We then hope that as the differences $t_i-t_{i-1}$ get smaller and smaller, these estimates will tend to a limit. It isn’t hard to see that if you refine a dissection, then the estimate increases (you are replacing the length of a line segment that joins two points by the length of a path that consists of line segments and joins the same two points).

Actually, that hope is not always fulfilled: sometimes the estimates tend to infinity. Indeed, for space-filling curves, or fractal-like curves such as the Koch snowflake, the estimates *do* tend to infinity. In this case, we say that they have infinite length. But if the estimates tend to a limit as the maximum of the differences $t_i-t_{i-1}$ tends to zero, we call that limit the length of the curve. A curve that has a finite length defined this way is called *rectifiable*.

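Suppose now that we have a curve $f:[a,b]\to\mathbb{R}^2$ given by $f(t)=(g(t),h(t))$ and that the two functions $g$ and $h$ are continuously differentiable. Then both $g'$ and $h'$ are bounded on $[a,b]$, so let’s suppose that $M$ is an upper bound for $|g'|$ and $|h'|$. Then by the mean value theorem, for any dissection $a=t_0<t_1<\dots<t_n=b$,

$$d(f(t_{i-1}),f(t_i))=\sqrt{(g(t_i)-g(t_{i-1}))^2+(h(t_i)-h(t_{i-1}))^2}\le\sqrt{2}\,M(t_i-t_{i-1}).$$

Therefore, $\sum_{i=1}^n d(f(t_{i-1}),f(t_i))\le\sqrt{2}\,M(b-a)$ for every dissection, which implies that the curve is rectifiable. (Remark: I didn’t really use the continuity of the derivatives there, just their boundedness.)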

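We can say slightly more than this, however. The differentiability of $g$ tells us that $g(t_i)-g(t_{i-1})=(t_i-t_{i-1})g'(c_i)$ for some $c_i\in(t_{i-1},t_i)$. And similarly for $h$ with some $d_i\in(t_{i-1},t_i)$. Therefore, the estimate for the length can be written

$$\sum_{i=1}^n(t_i-t_{i-1})\sqrt{g'(c_i)^2+h'(d_i)^2}.$$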

This looks very similar to the kind of thing we write down when doing Riemann integration, so let’s see whether we can find a precise connection. We are concerned with the function $k(t)=\sqrt{g'(t)^2+h'(t)^2}$. If we now *do* use the continuity of $g'$ and $h'$, then $k$ is continuous too, so it can be integrated. Now since $c_i$ and $d_i$ belong to the interval $[t_{i-1},t_i]$, $\sum_i(t_i-t_{i-1})k(c_i)$ and $\sum_i(t_i-t_{i-1})k(d_i)$ both lie between the lower and upper sums given by the dissection. That implies the same for

$$\sum_{i=1}^n(t_i-t_{i-1})\sqrt{g'(c_i)^2+h'(d_i)^2}.$$

Since $k$ is integrable, the limit of this sum as the largest $t_i-t_{i-1}$ (which is often called the *mesh* of the dissection) tends to zero is $\int_a^b k(t)\,dt$.

We have shown that the length of the curve $t\mapsto(g(t),h(t))$, defined on $[a,b]$, is given by the formula

$$\int_a^b\sqrt{g'(t)^2+h'(t)^2}\,dt.$$

Now, finally, let’s see whether we can justify our calculation of the length of the arc of the unit circle between $(x,y)$ and $(1,0)$. It would be nice to parametrize the circle as $(\cos\theta,\sin\theta)$, but we can’t do that, since we are defining $\theta$ using length, so we would end up with a circular definition (in more than one sense). [Actually, we *can* do something very close to this. See the final section of the post for details.] So let’s parametrize it as follows. We’ll define $f$ on the interval $[x,1]$ and we’ll send $t$ to $(g(t),h(t))=(t,\sqrt{1-t^2})$. Then $g'(t)=1$ and $h'(t)=-t/\sqrt{1-t^2}$, so

$$\sqrt{g'(t)^2+h'(t)^2}=\sqrt{1+\frac{t^2}{1-t^2}}=\frac{1}{\sqrt{1-t^2}}.$$

So the length is $\int_x^1\frac{dt}{\sqrt{1-t^2}}$, which is exactly the expression we wrote down earlier.
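As a numerical sanity check of the definition of length by dissections, here is a minimal Python sketch. The helper names `polygonal_length` and `upper_arc` and the sample value `x = 0.3` are my own choices for illustration. It adds up straight-line distances along the parametrization $t\mapsto(t,\sqrt{1-t^2})$ on $[x,1]$ for finer and finer equal dissections, and compares the result with `math.acos(x)`.

```python
import math

def polygonal_length(curve, a, b, n):
    """Sum of the distances between consecutive points curve(t_0), ..., curve(t_n)
    for the dissection of [a, b] into n equal pieces."""
    points = [curve(a + (b - a) * i / n) for i in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def upper_arc(t):
    # The parametrization t -> (t, sqrt(1 - t^2)); max() guards against
    # a tiny negative value caused by rounding when t is very close to 1.
    return (t, math.sqrt(max(0.0, 1.0 - t * t)))

x = 0.3
# Each dissection refines the previous one, so the estimates should increase.
estimates = [polygonal_length(upper_arc, x, 1.0, n) for n in (4, 8, 16, 1024)]
target = math.acos(x)
print(estimates, target)
```

The estimates should increase towards the arc length from below, as the refinement argument predicts, even though the derivative of $\sqrt{1-t^2}$ blows up at $t=1$.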

Let me make two quick remarks about that. First, you might argue that although I have shown that the final *expression* is indeed correct, I haven’t shown that the informal *argument* is (essentially) correct. But I more or less have, since what I have effectively done is calculate the lengths of the hypotenuses of the little triangles in a slightly different way. Before, I used the fact that one side was $dt$ and used similar triangles. Here I’ve used the fact that one side is $dt$ and another side is $-t\,dt/\sqrt{1-t^2}$ and used Pythagoras.

A slightly more serious objection is that for this calculation I used a general result that depended on the assumption that both $g$ and $h$ are continuously differentiable, but didn’t check that the appropriate conditions held, which they don’t. The problem is that $h(t)=\sqrt{1-t^2}$, so $h'(t)=-t/\sqrt{1-t^2}$, which tends to infinity in modulus as $t\to 1$ and is undefined at $t=1$.

However, it is easy to get round this problem. What we do is integrate from $x$ to $1-\epsilon$, in which case the argument is valid, and then let $\epsilon$ tend to zero. The integral between $x$ and $1-\epsilon$ is $\cos^{-1}(x)-\cos^{-1}(1-\epsilon)$, and that tends to $\cos^{-1}(x)$.

One final remark is that this length calculation explains why the usual substitution of $\cos\theta$ for $x$ in an integral of the form $\int\frac{dx}{\sqrt{1-x^2}}$ is not a piece of unmotivated magic. It is just a way of switching from one parametrization of a circular arc (using the x-coordinate) to another (using the angle, or equivalently the distance along the circular arc) that one expects to be simpler.

Thanks to a comment of Jason Fordham below, I now realize that we can after all parametrize the circle as $(\cos\theta,\sin\theta)$. However, this $\theta$ is not the $\theta$ I’m trying to calculate, so let’s call it $\phi$. I’m just taking $\phi$ to be an ordinary real number, and I’m defining $\cos\phi$ and $\sin\phi$ using the power-series definition. Then the arc of the unit circle that goes from $(1,0)$ to $(x,y)$ can be defined as the curve defined on the interval $[0,\cos^{-1}x]$ by the formula $\phi\mapsto(\cos\phi,\sin\phi)$. The general formula for the length of a curve then gives us

$$\int_0^{\cos^{-1}x}\sqrt{\sin^2\phi+\cos^2\phi}\,d\phi=\cos^{-1}x.$$

So the length $\theta$ of the arc satisfies $\cos\theta=x$.

]]>

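A preliminary question about this is why it is not more or less obvious. After all, writing $f(z)=\sum_{n=0}^\infty a_nz^n$, we have the following facts.

- Writing $S_N(z)=\sum_{n=0}^N a_nz^n$, we have that $S_N(z)\to f(z)$.
- For each $N$, $S_N'(z)=\sum_{n=1}^N na_nz^{n-1}$.

If we knew that $S_N'(z)\to f'(z)$, then we would be done.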

Ah, you might be thinking, how do we know that the sequence $(S_N'(z))$ of partial sums $S_N'(z)=\sum_{n=1}^N na_nz^{n-1}$ converges? But it turns out that that is not the problem: it is reasonably straightforward to show that it converges. (Roughly speaking, inside the circle of convergence the series converges at least as fast as a GP, and multiplying the $n$th term by $n$ doesn’t stop a GP converging, as can easily be seen with the help of the ratio test.) So, writing $g(z)$ for $\sum_{n=1}^\infty na_nz^{n-1}$, we have the following facts at our disposal.

- $S_N(z)\to f(z)$.
- $S_N'(z)\to g(z)$.

Doesn’t it follow from that that $f'(z)=g(z)$?

We are appealing here to a general principle, which is that if some functions $f_n$ converge to $f$ and their derivatives $f_n'$ converge to $g$, then $f$ is differentiable with $f'=g$. Is this general principle correct?

Unfortunately, it isn’t. Suppose we take some continuous functions $g_n$ that converge to a step function. (Roughly speaking, you make $g_n$ be 0 up to 0, then linear with gradient $n$ until it hits 1, then 1 from that point onwards.) And suppose we then let $f_n$ be the function that differentiates to $g_n$ and is 0 up to 0. Then the $f_n$ converge to the function that is 0 up to 0 and $x$ for positive $x$. This function *almost* differentiates to the step function, but it isn’t differentiable at 0.

So we’ve somehow got to use particular facts about power series in order to prove our result — we can’t appeal to general considerations, because then we are appealing to a principle that isn’t true. (Actually, in principle some compromise might be possible, where we show that functions defined by power series have a certain property and then use nothing apart from that property from that point on. But as it happens, we shall not do this.)

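We have a formula for $f(z)$, namely $\sum_{n=0}^\infty a_nz^n$. Why don’t we write out a formula for $(f(z+h)-f(z))/h$ and see if we can tell what happens when $h\to 0$?

That is certainly a sensible first thing to try, so let’s see what happens.

$$\frac{f(z+h)-f(z)}{h}=\sum_{n=1}^\infty a_n\frac{(z+h)^n-z^n}{h}.$$

What can we do with that? Perhaps we’d better apply the binomial theorem. Then we find that the right-hand side is equal to

$$\sum_{n=1}^\infty a_n\left(nz^{n-1}+\binom{n}{2}z^{n-2}h+\dots+h^{n-1}\right).$$

Part of the above expression gives us what we want, namely $\sum_{n=1}^\infty na_nz^{n-1}$. So we’re left wanting to prove that

$$\sum_{n=2}^\infty a_n\left(\binom{n}{2}z^{n-2}h+\binom{n}{3}z^{n-3}h^2+\dots+h^{n-1}\right)$$

tends to 0 as $h\to 0$.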

Unfortunately, as $n$ gets big, some of those binomial coefficients get pretty big too. Indeed, when $n$ is bigger than $1/|h|$, the growth in the binomial coefficients seems to outstrip the shrinking of the powers of $h$. What can we do?

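Fortunately, there is a better (for our purposes at least) way of writing $(z+h)^n-z^n$. We just expanded out $(z+h)^n$ using the binomial theorem. But we could instead have used the expansion

$$x^n-y^n=(x-y)(x^{n-1}+x^{n-2}y+\dots+xy^{n-2}+y^{n-1}).$$

Applying that with $x=z+h$ and $y=z$, we get

$$(z+h)^n-z^n=h\left((z+h)^{n-1}+(z+h)^{n-2}z+\dots+z^{n-1}\right).$$

Just before we continue, note that this gives us an alternative, and in my view nicer, way to see that the derivative of $z^n$ is $nz^{n-1}$, since if you divide the right-hand side by $h$ and let $h\to 0$ then each of the $n$ terms tends to $z^{n-1}$.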

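Anyhow, if we use this trick, then $(f(z+h)-f(z))/h$ works out to be

$$\sum_{n=1}^\infty a_n\sum_{k=0}^{n-1}(z+h)^kz^{n-1-k}.$$

Now let’s subtract the thing we want this to tend to, which is $\sum_{n=1}^\infty na_nz^{n-1}$. (This is not valid unless we know that this series converges. So at some stage we will need to prove that.) If we think of $nz^{n-1}$ as a sum of $n$ copies of $z^{n-1}$, then we can write the difference as

$$\sum_{n=1}^\infty a_n\sum_{k=0}^{n-1}\left((z+h)^kz^{n-1-k}-z^{n-1}\right),$$

which equals

$$\sum_{n=1}^\infty a_n\sum_{k=0}^{n-1}z^{n-1-k}\left((z+h)^k-z^k\right).$$

Now $(z+h)^k-z^k$ is another example of the expansion we had above. That is, we can write it as

$$h\left((z+h)^{k-1}+(z+h)^{k-2}z+\dots+z^{k-1}\right).$$

We haven’t yet mentioned the radius of convergence of the original power series, but let’s do so now. Suppose it is $R$, that $r$ is such that $|z|<r<R$, and that we have chosen $h$ small enough that $|z+h|<r$. Then the modulus of the expression above is at most $k|h|r^{k-1}$.

It follows that

$$\left|\sum_{n=1}^\infty a_n\sum_{k=0}^{n-1}z^{n-1-k}\left((z+h)^k-z^k\right)\right|\le|h|\sum_{n=1}^\infty|a_n|\sum_{k=0}^{n-1}kr^{n-2}.$$

Since $\sum_{k=0}^{n-1}k=\binom{n}{2}$, this is equal to $|h|\sum_{n=2}^\infty\binom{n}{2}|a_n|r^{n-2}$.

So this will tend to zero as $h\to 0$ as long as we can prove that the sum $\sum_{n=2}^\infty\binom{n}{2}|a_n|r^{n-2}$ converges.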

Let’s prove a lemma to deal with that last point. It says that if $r$ is smaller than the radius of convergence of the power series $\sum_n a_nz^n$, then the series $\sum_n n|a_n|r^{n-1}$ converges.

The proof is very similar to an argument we have seen already. Let $R$ be the radius of convergence, and pick $s$ with $r<s<R$. Then the power series $\sum_n a_ns^n$ converges, so the terms $|a_ns^n|$ are bounded above, by $C$, say. Then $n|a_n|r^{n-1}\le\frac{C}{s}\,n\left(\frac{r}{s}\right)^{n-1}$.

But the series $\sum_n n(r/s)^{n-1}$ converges, by the ratio test. Therefore, by the comparison test, the series $\sum_n n|a_n|r^{n-1}$ converges.

This shows also that if $|z|<R$ then the power series $\sum_n na_nz^{n-1}$ converges (since we have just proved that it converges absolutely). So if we differentiate a power series term by term, we get a new power series that has the same radius of convergence, something we needed earlier.

If we apply this lemma a second time, we get that the series $\sum_n n(n-1)|a_n|r^{n-2}$ converges, and dividing by 2 that gives us what we wanted above, namely that $\sum_n\binom{n}{2}|a_n|r^{n-2}$ converges.

An obvious way of applying the result is to take some of your favourite power series and differentiate them term by term. This illustrates the very important general point that if you can obtain something in two different ways, then you usually end up proving something interesting.

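So let’s take the function $e^z=\sum_{n=0}^\infty\frac{z^n}{n!}$, which we have shown converges everywhere. Then we can obtain the derivative either by differentiating the function itself or by differentiating the power series term by term. That tells us that

$\frac{d}{dz}e^z=\sum_{n=1}^\infty\frac{nz^{n-1}}{n!}$, which simplifies to $\sum_{n=1}^\infty\frac{z^{n-1}}{(n-1)!}$, which in turn simplifies to $\sum_{n=0}^\infty\frac{z^n}{n!}$, which equals $e^z$.

Earlier we proved this result by writing $\frac{e^{z+h}-e^z}{h}$ as $e^z\,\frac{e^h-1}{h}$ and proving that $\frac{e^h-1}{h}\to 1$. I still prefer that proof, but you are at liberty to disagree.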

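As another example, let us consider the power series $\sum_{n=0}^\infty z^n$. When $|z|<1$ this equals $\frac{1}{1-z}$, by the formula for summing a GP. We can now differentiate the power series term by term, and we can also differentiate the function $\frac{1}{1-z}$. Doing so tells us the interesting fact that

$$\sum_{n=1}^\infty nz^{n-1}=\frac{1}{(1-z)^2}.$$

We can see that in another way as well. By our result on multiplying power series, the product of $\sum_{n=0}^\infty z^n$ with itself is the power series $\sum_{n=0}^\infty c_nz^n$, where $(c_n)$ is the convolution of the constant sequence $(1,1,1,\dots)$ with itself. That is, $c_n=\sum_{k=0}^n a_kb_{n-k}$ with every $a_k$ and $b_{n-k}$ equal to 1, which gives us $c_n=n+1$. (This agrees with the previous answer, since $\sum_{n=0}^\infty(n+1)z^n$ is the same as $\sum_{n=1}^\infty nz^{n-1}$.)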
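Both routes to $\sum_{n\ge 1}nz^{n-1}=1/(1-z)^2$ are easy to check numerically. The following is a small Python sketch; the function names and the sample point `z` are hypothetical choices of mine, and a partial sum of 200 terms stands in for the infinite series.

```python
def derivative_series_partial_sum(z, terms):
    """Partial sum of the term-by-term derivative of 1 + z + z^2 + ..."""
    return sum(n * z ** (n - 1) for n in range(1, terms + 1))

def self_convolution(seq):
    """c_m = sum_{k=0}^{m} seq[k] * seq[m-k], for m below len(seq)."""
    return [sum(seq[k] * seq[m - k] for k in range(m + 1))
            for m in range(len(seq))]

z = 0.5 + 0.25j                      # a sample point with |z| < 1
approx = derivative_series_partial_sum(z, 200)
exact = 1 / (1 - z) ** 2             # derivative of 1/(1-z)

# Coefficients of the product of the geometric series with itself.
coeffs = self_convolution([1] * 10)
print(abs(approx - exact), coeffs)
```

The convolution of the all-ones sequence with itself should come out as 1, 2, 3, and so on, matching the coefficients $n+1$ found above.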

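In the proof above, we used the identity

$$x^n-y^n=(x-y)(x^{n-1}+x^{n-2}y+\dots+y^{n-1})$$

with $x=z+h$ and $y=z$, and then we used it again to calculate what happened when we subtracted $nz^{n-1}$. Can we get those calculations out of the way in advance? That is, can we begin by finding a nice formula for $\frac{(z+h)^n-z^n}{h}-nz^{n-1}$?

We obviously can, by subtracting $nz^{n-1}$ from the right-hand side and simplifying, much as we did in the proof above (with $x=z+h$ and $y=z$). However, we can do things a bit more slickly as follows. Start with the identity

$$x^n-y^n=(x-y)(x^{n-1}+x^{n-2}y+\dots+y^{n-1}).$$

Differentiating both sides with respect to $x$, we get

$$nx^{n-1}=\left(x^{n-1}+x^{n-2}y+\dots+y^{n-1}\right)+(x-y)\left((n-1)x^{n-2}+(n-2)x^{n-3}y+\dots+y^{n-2}\right).$$

If we now take $z$ for $x$ and $z+h$ for $y$, we deduce that $\frac{(z+h)^n-z^n}{h}-nz^{n-1}$ is equal to

$$h\left((n-1)z^{n-2}+(n-2)z^{n-3}(z+h)+\dots+(z+h)^{n-2}\right).$$

In particular, if $|z|$ and $|z+h|$ are both at most $r$, then $\left|\frac{(z+h)^n-z^n}{h}-nz^{n-1}\right|\le|h|\binom{n}{2}r^{n-2}$, which is the main fact we needed in the proof.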
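For readers who like to see such identities confirmed on a machine, here is a hedged Python sketch. It assumes the formula $\frac{(z+h)^n-z^n}{h}-nz^{n-1}=h\sum_{j=0}^{n-2}(n-1-j)z^{n-2-j}(z+h)^j$ and the bound $|h|\binom{n}{2}r^{n-2}$ with $r=\max(|z|,|z+h|)$; the function names and the sample values of `z` and `h` are mine, chosen only for illustration.

```python
from math import comb

def difference_quotient_error(z, h, n):
    """((z+h)^n - z^n)/h - n z^(n-1), computed directly."""
    return ((z + h) ** n - z ** n) / h - n * z ** (n - 1)

def clever_form(z, h, n):
    """h * sum_{j=0}^{n-2} (n-1-j) z^(n-2-j) (z+h)^j."""
    return h * sum((n - 1 - j) * z ** (n - 2 - j) * (z + h) ** j
                   for j in range(n - 1))

z, h = 0.4 + 0.3j, 0.05 - 0.02j      # arbitrary sample values
r = max(abs(z), abs(z + h))

checks = []
for n in range(2, 12):
    direct = difference_quotient_error(z, h, n)
    bound = abs(h) * comb(n, 2) * r ** (n - 2)
    identity_ok = abs(direct - clever_form(z, h, n)) < 1e-9
    bound_ok = abs(direct) <= bound + 1e-12   # small slack for rounding
    checks.append((identity_ok, bound_ok))
print(checks)
```

The slack terms are there only because floating-point arithmetic cannot reproduce the identity exactly; for $n=2$ the bound is attained with equality.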

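Armed with this fact, we could argue as follows. Writing $f(z)=\sum_{n=0}^\infty a_nz^n$, we want to show that

$$f(z+h)-f(z)-h\sum_{n=1}^\infty na_nz^{n-1}=\sum_{n=1}^\infty a_n\left((z+h)^n-z^n-hnz^{n-1}\right)$$

is $o(h)$. By the inequality we have just proved, if $|z|$ and $|z+h|$ are at most $r$, then the modulus of this expression is at most

$$|h|^2\sum_{n=2}^\infty\binom{n}{2}|a_n|r^{n-2},$$

and an earlier lemma told us that this converges within the circle of convergence. So the quantity we want to be $o(h)$ is in fact bounded above by a multiple of $|h|^2$. (Sometimes people use the notation $O(h^2)$ for this. The $O$ means “bounded above in modulus by a constant multiple of the modulus of”.)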

The proof in this post has relied heavily on the idea, which appeared to come from nowhere, of writing $(z+h)^n-z^n$ not in the obvious way, which is

$$nz^{n-1}h+\binom{n}{2}z^{n-2}h^2+\dots+h^n,$$

but in a “clever” way, namely

$$h\left((z+h)^{n-1}+(z+h)^{n-2}z+\dots+z^{n-1}\right).$$

Is this something one just has to remember, or can it be regarded as the natural thing to do?

I chose the words “can it be regarded as” quite carefully, since I want to argue that it is the natural thing to do, but when I was preparing this lecture, I didn’t find it the natural thing to do, as I shall now explain. I came to this result with the following background. Many years ago, I lectured a IB course called Further Analysis, which was a sort of combination of the current courses Metric and Topological Spaces and Complex Analysis, all packed into 16 lectures. (Amazingly, it worked quite well, though it was a challenge to get through all the material.) As a result of lecturing that, I learnt a proof that power series can be differentiated term by term inside their circle of convergence, but the proof uses a number of results from complex analysis. I then believed what some people say, which is that the complex analysis proof of this result is a very good advertisement for complex analysis, since a direct proof is horrible. And then at some point I was chatting to Imre Leader about the reorganization of various courses, and he told me that it was a myth that proving the result directly was hard. It wasn’t trivial, he said, but it was basically fine. In fact, it may even be thanks to him that the result is in the course.

Until a few days ago, I didn’t bother to check for myself that the proof wasn’t too bad: I just believed what he said. And then with the lecture coming up, I decided that the time had finally come to check it: something that I assumed would be a reasonably simple exercise. I duly did the obvious thing, including expanding $(z+h)^n$ using the binomial theorem, and got stuck.

I would like to be able to say that I then thought hard about why I was stuck, and after a while thought of the idea of expanding $(z+h)^n-z^n$ using the expansion of $x^n-y^n$. But actually that is not what happened. What happened was that I thought, “Damn, I’m going to have to look up the proof.” I found a few proofs online that looked dauntingly complicated and I couldn’t face reading them properly, apart from one that was quite nice and that for a while I thought I would use. But one thing all the proofs had in common was the use of that expansion, so that was how the idea occurred to me.

So what follows is a rational reconstruction of what I *wish* had been my thought processes, rather than of what actually went on in my mind.

Let’s go back to the question of how to differentiate $z^n$. I commented above that one could do it using the $x^n-y^n$ expansion, and said that I even preferred that approach. But how might one think of doing it that way? There is a very simple answer to that, which is to use one of the alternative definitions of differentiability, namely that $f$ is differentiable at $z$ with derivative $\lambda$ if $\frac{f(w)-f(z)}{w-z}\to\lambda$ as $w\to z$. This is simply replacing $z+h$ by $w$, but that is nice because it has the effect of making the expression more symmetrical. (One might argue that since we are talking about differentiability *at* $z$, the variables $z$ and $w$ are playing different roles, so there is not much motivation for symmetry. And indeed, that is why calling one point $z$ and the other $z+h$ is often a good idea. But symmetry is … well … sort of good to have even when not terribly strongly motivated.)

If we use this definition, then the derivative of $z^n$ is the limit as $w\to z$ of $\frac{w^n-z^n}{w-z}$, and now there is no temptation to use the binomial expansion (we would first have to write $w$ as $z+(w-z)$ and the whole thing would be disgusting) and the absolutely obvious thing to do is to observe that we have a nice formula for the ratio in question, namely

$$w^{n-1}+w^{n-2}z+\dots+wz^{n-2}+z^{n-1},$$

which obviously tends to $nz^{n-1}$ as $w\to z$.

In fact, the whole proof is arguably nicer if one uses $z$ and $w$ rather than $z$ and $z+h$.

Thus, the “clever” expansion is the natural one to do with the symmetric definition of differentiation, whereas the binomial expansion is the natural one to do with the $z+h$ definition. So in the presentation above, I have slightly obscured the origins of the argument by applying the clever expansion to the $z+h$ definition.

Another way of seeing that it is natural is to think about how we prove the statement that a product of limits is the limit of the products. The essence of this is to show that if $a$ is close to $A$ and $b$ is close to $B$, then $ab$ is close to $AB$. This we do by arguing that $ab$ is close to $Ab$, and that $Ab$ is close to $AB$.

Suppose we apply a similar technique to try to show that $w^n$ is close to $z^n$. How might we represent their difference? A natural way of doing it would be to convert all the $w$s into $z$s in a sequence of $n$ steps. That is, we would argue that $w^n$ is close to $w^{n-1}z$, which is close to $w^{n-2}z^2$, and so on.

But the difference between $w^{n-k}z^k$ and $w^{n-k-1}z^{k+1}$ is $(w-z)w^{n-k-1}z^k$, so if we adopt this approach, then we will end up showing precisely that

$$w^n-z^n=(w-z)\left(w^{n-1}+w^{n-2}z+\dots+z^{n-1}\right).$$

]]>

The problem is to show that if $(\epsilon_n)$ is an infinite sequence of $\pm 1$s, then for every $C$ there exist $d$ and $m$ such that $\sum_{i=1}^m\epsilon_{id}$ has modulus at least $C$. This result is straightforward to prove by an exhaustive search when $C=2$. One thing that the Polymath project did was to discover several sequences of length 1124 such that no sum has modulus greater than 2, and despite some effort nobody managed to find a longer one. That was enough to convince me that 1124 was the correct bound.
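The exhaustive search for $C=2$ is small enough to run in a few lines. Here is a minimal Python sketch; the function names are my own. It finds the longest $\pm 1$ sequence in which every sum $\epsilon_d+\epsilon_{2d}+\dots+\epsilon_{md}$ has modulus at most 1. It is a plain depth-first search and is only intended for the bound 1: for the bound 2 it would be hopelessly slow, which is where the SAT solvers come in.

```python
def discrepancy(seq):
    """Largest |eps_d + eps_2d + ... + eps_md| over all d, m that fit in seq.
    seq[0] holds eps_1."""
    n = len(seq)
    worst = 0
    for d in range(1, n + 1):
        total = 0
        for k in range(d, n + 1, d):
            total += seq[k - 1]
            worst = max(worst, abs(total))
    return worst

def longest_with_discrepancy_at_most(bound):
    """Depth-first search over +-1 sequences; returns (max length, an example)."""
    best = []

    def extend(seq):
        nonlocal best
        if len(seq) > len(best):
            best = list(seq)
        for eps in (1, -1):
            seq.append(eps)
            n = len(seq)
            # Only sums that end at the new position n can newly break the bound,
            # and those are the ones whose step d divides n.
            ok = all(abs(sum(seq[k - 1] for k in range(d, n + 1, d))) <= bound
                     for d in range(1, n + 1) if n % d == 0)
            if ok:
                extend(seq)
            seq.pop()

    extend([])
    return len(best), best

length, example = longest_with_discrepancy_at_most(1)
print(length, example)
```

The search stops at length 11: every $\pm 1$ sequence of length 12 contains a sum of modulus at least 2.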

However, the new result shows the danger of this kind of empirical evidence. The authors used state of the art SAT solvers to find a sequence of length 1160 with no sum having modulus greater than 2, and also showed that this bound is best possible. Of this second statement, they write the following: “The negative witness, that is, the DRUP unsatisfiability certificate, is probably one of longest proofs of a non-trivial mathematical result ever produced. Its gigantic size is comparable, for example, with the size of the whole Wikipedia, so one may have doubts about to which degree this can be accepted as a proof of a mathematical statement.”

I personally am relaxed about huge computer proofs like this. It is conceivable that the authors made a mistake somewhere, but that is true of conventional proofs as well. The paper is by Boris Konev and Alexei Lisitsa and appears here.

]]>

I have always found this situation annoying, because a part of me said that the result ought to be a straightforward generalization of the mean value theorem, in the following sense. The mean value theorem applied to the interval $[x,x+h]$ tells us that there exists $y\in(x,x+h)$ such that $f'(y)=\frac{f(x+h)-f(x)}{h}$, and therefore that $f(x+h)=f(x)+hf'(y)$. Writing $y=x+\theta h$ for some $\theta\in(0,1)$ we obtain the statement $f(x+h)=f(x)+hf'(x+\theta h)$. This is the case $n=1$ of Taylor’s theorem. So can’t we find some kind of “polynomial mean value theorem” that will do the same job for approximating $f$ by polynomials of higher degree?

Now that I’ve been forced to lecture this result again (for the second time actually — the first was in Princeton about twelve years ago, when I just suffered and memorized the Cauchy mean value theorem approach), I have made a proper effort to explore this question, and have realized that the answer is yes. I’m sure there must be textbooks that do it this way, but the ones I’ve looked at all use the Cauchy mean value theorem. I don’t understand why, since it seems to me that the way of proving the result that I’m about to present makes the whole argument completely transparent. I’m actually looking forward to lecturing it (as I add this sentence to the post, the lecture is about half an hour in the future), since the demands on my memory are going to be close to zero.

We know that we want a statement that will involve the first derivatives of at , the th derivative at some point in the interval , and the value of at . The idea with Rolle’s theorem is to make a whole lot of stuff zero, and then with the mean value theorem we take a more general function and subtract a linear part to obtain a function to which Rolle’s theorem applies. So let’s try a similar trick here: we’ll make as much as we can equal to zero. In fact, I’ll go even further and make the values of and zero.

So here’s what I’ll assume: that and also that . That’s as much as I can reasonably set to be zero. And what should be my conclusion? That there is some such that . Note that if we set then we are assuming that and trying to find such that , so this result really does generalize Rolle’s theorem. (I’m also assuming that is times differentiable on an open interval that contains . This is a slightly stronger condition than necessary, but it will hold in the situations where we want to use Taylor’s theorem.)

The proof of this generalization is almost trivial, given Rolle’s theorem itself. Since , there exists such that . But as well, so by Rolle’s theorem, this time applied to , we find such that . Continuing like this, we eventually find such that . So we can set and we are done.

For what it’s worth, I didn’t use the fact that $f(x)=f(x+h)=0$, but just that $f(x)=f(x+h)$.

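Now let’s take an arbitrary function $f$ that is $n$-times differentiable on an open interval containing $[x,x+h]$. To prove the mean value theorem, we subtracted a linear function so as to obtain a function that satisfied the hypotheses of Rolle’s theorem. Here, the obvious thing to do is to subtract a polynomial $P$ of degree $n$ to obtain a function that satisfies the hypotheses of our higher-order Rolle theorem.

The properties we need $P$ to have are that $P(x)=f(x)$, $P'(x)=f'(x)$, and so on all the way up to $P^{(n-1)}(x)=f^{(n-1)}(x)$, and finally $P(x+h)=f(x+h)$. It turns out that we can more or less write down such a polynomial, once we have observed that the polynomial $Q_k(u)=(u-x)^k/k!$ has the convenient property that $Q_k^{(j)}(x)=0$ except when $j=k$, when it is 1. This allows us to build a polynomial that has whatever derivatives we want at $x$. So let’s do that. Define a polynomial $Q$ by

$$Q=f(x)Q_0+f'(x)Q_1+\dots+f^{(n-1)}(x)Q_{n-1}.$$

Then $Q^{(k)}(x)=f^{(k)}(x)$ for $k=0,1,\dots,n-1$. A more explicit formula for $Q(u)$ is

$$Q(u)=f(x)+(u-x)f'(x)+\frac{(u-x)^2}{2!}f''(x)+\dots+\frac{(u-x)^{n-1}}{(n-1)!}f^{(n-1)}(x).$$

Now $Q(x+h)$ doesn’t necessarily equal $f(x+h)$, so we need to add a multiple of $(u-x)^n$ to correct for this. (Doing that won’t affect the derivatives we’ve got at $x$.) So we want our polynomial to be of the form

$$P(u)=Q(u)+\lambda(u-x)^n$$

and we want $P(x+h)=f(x+h)$. So we want $Q(x+h)+\lambda h^n$ to equal $f(x+h)$, which gives us $\lambda=(f(x+h)-Q(x+h))/h^n$. That is,

$$P(u)=Q(u)+\frac{(u-x)^n}{h^n}\bigl(f(x+h)-Q(x+h)\bigr).$$

A quick check: if we substitute in $x+h$ for $u$ we get $Q(x+h)+\frac{h^n}{h^n}\bigl(f(x+h)-Q(x+h)\bigr)$, which does indeed equal $f(x+h)$.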
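If you would like to see the construction with actual numbers, here is a minimal Python sketch. It builds the Taylor part from a list of derivatives at $x$, adds the multiple of $(u-x)^n$ that makes the value at $x+h$ come out right, and then evaluates the result. The function names, and the choice of $\exp$ with $x=0.5$, $h=0.3$, $n=4$, are illustrative assumptions of mine and not part of the argument.

```python
import math

def taylor_part(derivs_at_x, x, u):
    """Evaluate sum_k f^(k)(x) (u - x)^k / k! for the derivatives supplied."""
    return sum(d * (u - x) ** k / math.factorial(k)
               for k, d in enumerate(derivs_at_x))

def corrected_poly(f, derivs_at_x, x, h):
    """Return (P, lam), where P(u) = taylor_part(u) + lam * (u - x)^n
    and lam is chosen so that P(x + h) == f(x + h)."""
    n = len(derivs_at_x)
    lam = (f(x + h) - taylor_part(derivs_at_x, x, x + h)) / h ** n

    def P(u):
        return taylor_part(derivs_at_x, x, u) + lam * (u - x) ** n

    return P, lam

# Illustrative choice: f = exp, all of whose derivatives at x equal exp(x).
x, h, n = 0.5, 0.3, 4
P, lam = corrected_poly(math.exp, [math.exp(x)] * n, x, h)
print(P(x + h), math.exp(x + h))
```

The correction term vanishes to order $n$ at $x$, so only the value at $x+h$ changes. For this example $n!\,\lambda$ lands strictly between $e^{x}$ and $e^{x+h}$, which is what one would expect if it is a value of the $n$th derivative somewhere inside the interval.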

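For the moment, we can forget the *formula* for $P$. All that matters is its *properties*, which, just to remind you, are these.

- $P$ is a polynomial of degree $n$.
- $P^{(k)}(x)=f^{(k)}(x)$ for $k=0,1,\dots,n-1$.
- $P(x+h)=f(x+h)$.

The second and third properties tell us that if we set $g=f-P$, then $g^{(k)}(x)=0$ for $k=0,1,\dots,n-1$ and $g(x+h)=0$. Those are the conditions needed for our higher-order Rolle theorem. Therefore, there exists $\theta\in(0,1)$ such that $g^{(n)}(x+\theta h)=0$, which implies that $f^{(n)}(x+\theta h)=P^{(n)}(x+\theta h)$.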

Let us just highlight what we have proved here.

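**Theorem.** *Let $f$ be continuous on the interval $[x,x+h]$ and $n$-times differentiable on an open interval that contains $[x,x+h]$. Let $P$ be the unique polynomial of degree $n$ such that $P^{(k)}(x)=f^{(k)}(x)$ for $k=0,1,\dots,n-1$ and $P(x+h)=f(x+h)$. Then there exists $\theta\in(0,1)$ such that $f^{(n)}(x+\theta h)=P^{(n)}(x+\theta h)$.*

Note that since $P$ is a polynomial of degree $n$, the function $P^{(n)}$ is constant. In the case $n=1$, the constant is $(f(x+h)-f(x))/h$, the gradient of the line joining $(x,f(x))$ to $(x+h,f(x+h))$, and the theorem is just the mean value theorem.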

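Actually, the result we have just proved *is* Taylor’s theorem! To see that, all we have to do is use the explicit formula for $P$ and a tiny bit of rearrangement. To begin with, let us use the formula

$$P(u)=Q(u)+\frac{(u-x)^n}{h^n}\bigl(f(x+h)-Q(x+h)\bigr).$$

Note that $P^{(n)}(u)=\frac{n!}{h^n}\bigl(f(x+h)-Q(x+h)\bigr)$ for every $u$, since $Q$ has degree less than $n$, so the theorem tells us that there exists $\theta\in(0,1)$ such that

$$f^{(n)}(x+\theta h)=\frac{n!}{h^n}\bigl(f(x+h)-Q(x+h)\bigr).$$

Rearranging, that gives us that

$$f(x+h)=Q(x+h)+\frac{h^n}{n!}f^{(n)}(x+\theta h).$$

Finally, using the formula for $Q$, which was

$$Q(u)=f(x)+(u-x)f'(x)+\frac{(u-x)^2}{2!}f''(x)+\dots+\frac{(u-x)^{n-1}}{(n-1)!}f^{(n-1)}(x),$$

and setting $u=x+h$, we can rewrite our conclusion as

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2!}f''(x)+\dots+\frac{h^{n-1}}{(n-1)!}f^{(n-1)}(x)+\frac{h^n}{n!}f^{(n)}(x+\theta h),$$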

which is Taylor’s theorem with the Lagrange form of the remainder.
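For a function whose derivatives we know explicitly, one can actually solve for the intermediate point. The sketch below does this for $f=\exp$, where $f^{(n)}(x+\theta h)=e^{x+\theta h}$, so $\theta$ can be recovered with a logarithm. It assumes $h>0$ and moderate values of $n$ (for large $n$ the subtraction loses precision), and the function name is a label I have made up for illustration.

```python
import math

def lagrange_theta_exp(x, h, n):
    """For f = exp and h > 0, return the theta in (0, 1) with
    exp(x + h) = sum_{k<n} exp(x) h^k / k!  +  (h^n / n!) exp(x + theta h)."""
    partial = sum(math.exp(x) * h ** k / math.factorial(k) for k in range(n))
    remainder = math.exp(x + h) - partial
    # Solve (h^n / n!) * exp(x + theta * h) = remainder for theta.
    return (math.log(remainder * math.factorial(n) / h ** n) - x) / h

for n in (1, 2, 4):
    print(n, lagrange_theta_exp(0.0, 1.0, n))
```

The theorem only promises that some $\theta\in(0,1)$ exists. Here it is unique because $\exp$ is strictly increasing, but for a general function there may be several.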

I think it is quite rare for a proof of Taylor’s theorem to be asked for in the exams. However, pretty well every year there is a question that requires you to understand the *statement* of Taylor’s theorem. (I am writing this post without any knowledge of what will be in this year’s exam, and the examiners will be entirely within their rights to ask for anything that’s on the syllabus. So I certainly don’t recommend not learning the proof of Taylor’s theorem.)

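You may at school have seen the following style of reasoning. Suppose we want to calculate the power series of $\sin x$. Then we write

$$\sin x=a_0+a_1x+a_2x^2+a_3x^3+\dots$$

Taking $x=0$ we deduce that $a_0=0$. Differentiating we get that

$$\cos x=a_1+2a_2x+3a_3x^2+\dots$$

and taking $x=0$ we deduce that $a_1=1$. In general, differentiating $n$ times and setting $x=0$ we deduce that $a_n=0$ if $n$ is even, $a_n=1/n!$ if $n\equiv 1$ mod 4, and $a_n=-1/n!$ if $n\equiv 3$ mod 4. Therefore,

$$\sin x=x-\frac{x^3}{3!}+\frac{x^5}{5!}-\frac{x^7}{7!}+\dots$$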

There are at least two reasons that this argument is not rigorous. (I’ll assume that we have defined trigonometric functions and proved rigorously that their derivatives are what we think they are. Actually, I plan to define them using power series later in the course, in which case they have their power series by definition, but it is possible to define them in other ways, e.g. using the differential equation $f''=-f$, so this discussion is not a complete waste of time.) One is that we assumed that $\sin x$ could be expanded as a power series. That is, at best what we have just shown is that *if* $\sin x$ can be expanded as a power series, then the power series must be that one.

A second reason is that we just assumed that the power series could be differentiated term by term. That holds under certain circumstances, as we shall see later in the course, and those circumstances hold for this particular power series, but until we’ve proved that $\sin x$ is given by this particular power series we don’t know that the conditions hold.

Taylor’s theorem helps us to clear up these difficulties. Applying it with $x$ replaced by 0 and $h$ replaced by $x$, we find that

$$\sin x=\sum_{k=0}^{n-1}\frac{x^k}{k!}\sin^{(k)}(0)+\frac{x^n}{n!}\sin^{(n)}(\theta x)$$

for some $\theta\in(0,1)$. All the terms apart from the last one are just the expected terms in the power series for $\sin x$, so we get that $\sin x$ is equal to the partial sum of the power series up to the term in $x^{n-1}$ plus a remainder term.

The remainder term is $\frac{x^n}{n!}\sin^{(n)}(\theta x)$, so its magnitude is at most $|x|^n/n!$. It is not hard to prove that $|x|^n/n!$ tends to zero as $n\to\infty$. (One way to do this is to observe that the ratio of successive terms has magnitude at most 1/2 once $n$ is bigger than $2|x|$.) Therefore, the power series converges for every $x$, and converges to $\sin x$.
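To see the bound doing its job, here is a short Python sketch that compares the actual error of the partial sums of the sine series with $|x|^n/n!$. The helper names and the sample point $x=3$ are arbitrary choices for illustration.

```python
import math

def sin_partial_sum(x, n):
    """Sum of the terms of the sine power series of degree less than n."""
    total = 0.0
    for k in range(1, n, 2):            # only odd powers appear
        sign = 1 if k % 4 == 1 else -1  # +, -, +, - on degrees 1, 3, 5, 7
        total += sign * x ** k / math.factorial(k)
    return total

def remainder_bound(x, n):
    """The bound |x|^n / n! on the Lagrange remainder for sine."""
    return abs(x) ** n / math.factorial(n)

x = 3.0
for n in (5, 10, 20, 30):
    error = abs(math.sin(x) - sin_partial_sum(x, n))
    print(n, error, remainder_bound(x, n))
```

For large $n$ the computed error stops shrinking at around floating-point precision, so at that point the comparison only makes sense up to rounding.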

The basic technique here is as follows.

(i) Write down what Taylor’s theorem gives you for your function.

(ii) Prove that for each $x$ (in the range where you want to prove that the power series converges) the remainder term tends to zero as $n$ tends to infinity.

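The material in this section is not on the course, but is still worth thinking about. It begins with the definition of a derivative, which, as I said in lectures, can be expressed as follows. A function $f$ is differentiable at $x$ with derivative $\lambda$ if

$$f(x+h)=f(x)+\lambda h+o(h).$$

We can think of $f(x)+\lambda h$ as the best linear approximation to $f(x+h)$ for small $h$.

Once we’ve said that, it becomes natural to ask for the best quadratic approximation, and in general for the best approximation by a polynomial of degree $n$.

Let’s think about the quadratic case. In the light of Taylor’s theorem it is natural to expect that

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2),$$

in which case $f(x)+hf'(x)+\frac{h^2}{2}f''(x)$ would indeed be the best quadratic approximation to $f(x+h)$ for small $h$.

What Taylor’s theorem as stated above gives us is

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x+\theta h)$$

for some $\theta\in(0,1)$. If we know that $f''$ is continuous at $x$, then $f''(x+\theta h)\to f''(x)$ as $h\to 0$, so we can write $f''(x+\theta h)=f''(x)+\epsilon(h)$, where $\epsilon(h)\to 0$. But then $f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2)$, as we wanted, since $\frac{h^2}{2}\epsilon(h)=o(h^2)$.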

However, this result does not need the continuity assumption, so let me briefly prove it. To keep the expressions simple I will prove only the quadratic case, but the general case is pretty well exactly the same.

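I’ll do the same trick as usual, by which I mean I’ll first prove it when various things are zero and then I’ll deduce the general case. So let’s suppose that $f(x)=f'(x)=f''(x)=0$. We want to prove now that $f(x+h)=o(h^2)$.

Since $f'(x)=f''(x)=0$, we have that

$$\lim_{h\to 0}\frac{f'(x+h)}{h}=0.$$

Therefore, for every $\epsilon>0$ we can find $\delta>0$ such that $|f'(x+h)|\leq\epsilon|h|$ for every $h$ with $|h|<\delta$.

This gives us several inequalities, one of which is that $f'(x+h)\leq\epsilon h$ for every $h$ such that $0\leq h<\delta$. If we now set $g(h)$ to be $f(x+h)-\epsilon h^2/2$, then we have that $g'(h)\leq 0$ for every $h\in[0,\delta)$. So by the mean value theorem, $g(h)\leq g(0)=0$ for every such $h$, which implies that $f(x+h)\leq\epsilon h^2/2$.

If we run a similar argument using the fact that $f'(x+h)\geq-\epsilon h$ we get that $f(x+h)\geq-\epsilon h^2/2$. And we can do similar arguments with $h<0$ as well, and the grand conclusion is that whenever $|h|<\delta$ we have $|f(x+h)|\leq\epsilon h^2/2$.

What we have shown is that for every $\epsilon>0$ there exists $\delta>0$ such that $|f(x+h)|\leq\epsilon h^2/2$ whenever $|h|<\delta$, which is exactly the statement that $f(x+h)/h^2\to 0$ as $h\to 0$, which in turn is exactly the statement that $f(x+h)=o(h^2)$.

That does the proof when $f(x)=f'(x)=f''(x)=0$. Now let’s take a general $f$ and define a function $g$ by

$$g(u)=f(u)-f(x)-(u-x)f'(x)-\frac{(u-x)^2}{2}f''(x).$$

Then $g(x)=g'(x)=g''(x)=0$, so $g(x+h)=o(h^2)$, from which it follows that

$$f(x+h)-f(x)-hf'(x)-\frac{h^2}{2}f''(x)=o(h^2),$$

which after rearranging gives us the statement we wanted:

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2).$$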

As I said above, this argument generalizes straightforwardly and gives us Taylor’s theorem with what is known as *Peano’s form of the remainder*, which is the following statement.

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2!}f''(x)+\dots+\frac{h^n}{n!}f^{(n)}(x)+o(h^n).$$

For that we need $f^{(n)}(x)$ to exist but we do not need $f^{(n)}$ to exist anywhere else, so we certainly don’t need any continuity assumptions on $f^{(n)}$.

This version of Taylor’s theorem is not as useful as versions with an explicit formula for the remainder term, as you will see if you try to use it to prove that $\sin x$ can be expanded as a power series: the information that the remainder term is $o(h^n)$ is, for fixed $h$, of no use whatever. But the information that it is $\frac{h^n}{n!}f^{(n)}(x+\theta h)$ gives us an expression that we can prove tends to zero.

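However, one amusing (but not, as far as I know, useful) thing it gives us is a direct formula for the second derivative. By direct I mean that we do not go via the first derivative. Let us take the quadratic result and apply it to both $h$ and $-h$. We get

$$f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2)$$

and

$$f(x-h)=f(x)-hf'(x)+\frac{h^2}{2}f''(x)+o(h^2).$$

From this it follows that

$$f(x+h)-2f(x)+f(x-h)=h^2f''(x)+o(h^2).$$

Dividing through by $h^2$ we get that

$$\frac{f(x+h)-2f(x)+f(x-h)}{h^2}\to f''(x)$$

as $h\to 0$.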
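A quick numerical look at this formula, as a hedged sketch: the quotient $(f(x+h)-2f(x)+f(x-h))/h^2$ is evaluated for $f=\sin$ at $x=1$, where the second derivative is $-\sin 1$. The function name and the sample values of $h$ are my own choices.

```python
import math

def second_difference(f, x, h):
    """The symmetric quotient (f(x+h) - 2 f(x) + f(x-h)) / h^2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

for h in (1e-1, 1e-2, 1e-3):
    print(h, second_difference(math.sin, 1.0, h), -math.sin(1.0))
```

In floating-point arithmetic one should not push $h$ too close to zero: the numerator is a difference of nearly equal numbers, so rounding error eventually swamps the quantity being measured.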

I’m not claiming the converse, which would say that if this limit exists, then $f$ is twice differentiable at $x$. In fact, $f$ doesn’t even have to be once differentiable at $x$. Consider, for example, the following function. For every integer $n$ (either positive or negative) and every $x$ in the interval $[2^n,2^{n+1})$ we set $f(x)$ equal to $2^n$. We also set $f(0)=0$, and we take $f(x)=-f(-x)$ when $x<0$. (That is, for negative $x$ we define $f(x)$ so as to make it an odd function.)

Then $f(-h)=-f(h)$ for every $h$, so $\frac{f(h)-2f(0)+f(-h)}{h^2}=0$ for every $h\neq 0$, and in particular it tends to 0 as $h\to 0$. However, $f$ is not differentiable at 0. To see this, note that when $h=2^n$ we have $f(h)/h=1$, whereas when $h$ is close to (but less than) $2^{n+1}$ we have $f(h)/h$ close to $1/2$. Therefore, the ratio $f(h)/h$ does not converge as $h\to 0$, which tells us that $f$ is not differentiable at 0.

If you want an example that is continuous everywhere, then take . This again has the property that for every , and it is not differentiable at 0.

Even if we assume that $f$ is differentiable, we can’t get a proper converse. For example, the condition

$$f(x+h)=f(x)+hf'(x)+o(h^2)$$

does not imply that $f''(x)$ exists and equals 0. For a counterexample, take a function such as $f(x)=x^3\sin(1/x^3)$ (and 0 at 0). Then $f(h)$ must lie between $\pm|h|^3$ and therefore certainly be $o(h^2)$. But the oscillations near zero are so fast that $f'$ is unbounded near zero, so $f''$ doesn’t exist at 0.


Suppose I were to ask you to memorize the sequence 5432187654321. Would you have to learn a string of 13 symbols? No, because after studying the sequence you would see that it is just counting down from 5 and then counting down from 8. What you want is for your memory of a proof to be like that too: you just keep doing the obvious thing except that from time to time the next step isn’t obvious, so you need to remember it. Even then, the better you can understand why the non-obvious step was in fact sensible, the easier it will be to memorize it, and as you get more experienced you may find that steps that previously seemed clever and nonobvious start to seem like the natural thing to do.

For some reason, Analysis I contains a number of proofs that experienced mathematicians find easy but many beginners find very hard. I want to try in this post to explain why the experienced mathematicians are right: in a rather precise sense many of these proofs *really are easy*, in the sense that if you just repeatedly do the obvious thing you will solve them. Others are mostly like that, with perhaps one smallish idea needed when the obvious steps run out. And even the hardest ones have easy parts to them.

I feel so strongly about this that a few years ago I teamed up with a colleague of mine, Mohan Ganesalingam, to write a computer program to solve easy problems. And after a lot of effort, we produced one that can solve several (but not yet all — there are still difficulties to sort out) problems of the kind I am talking about: easy for the experienced mathematician, but hard for the novice. Now you have some huge advantages over a computer. For example, you understand the English language. Also, you can be presented with a vague instruction such as “Do any obvious simplifications to the expression and then see whether it reminds you of anything,” and you will be able to follow it. (In principle, so could the program, but only if we spent a long time agonizing about what exactly constitutes an “obvious” simplification, what kind of similarity should be sufficient for one mathematical expression to trigger the program to call up another, and so on.) So if a mere computer can solve these problems, you should definitely be able to solve them.

What I plan to do in this post is basically explain how the program would go about proving some of the theorems we’ve proved in the course. To explain *exactly* how it works would be complicated. However, because you are humans, there are lots of technical details that I don’t need to worry about, and what remains of the algorithm when you ignore those details is really pretty simple.

The rough idea is that you should equip yourself with a small set of “moves” and simply apply these moves when the opportunity arises. That is an oversimplification, since sometimes one can do the moves in “silly” ways, but merely being consciously aware of the moves is very useful. (Incidentally, the notion of “silliness” is hard to define formally but is something that humans find easy to recognise when they see examples of it. So that’s another example of the kind of advantage you have over the computer.)

I’m going to describe a way of keeping track of where you have got to in your discovery of a proof. It’s not something I suggest you do for the rest of your mathematical lives. Rather, it is something that you might like to consider doing if you find it hard to come up with typical Analysis I proofs. If you use this technique a few times, then it should get easier, and after a while you will find that you don’t need to use the technique any more.

The technique is simply to record what statements you are likely to want to use, and what statement you are trying to prove. Both of these can change during the course of your proof discovery, as we shall see.

I think the easiest way to explain this and the moves is to begin by giving an example of the whole process in action. Then I’ll talk about the moves in a more abstract way. Let’s take as an example the proof that if a Cauchy sequence has a convergent subsequence then the sequence itself is convergent.

To begin with, we have nothing we obviously need to use, and a statement that we want to prove. That statement is the following.

—————————————————-

Every Cauchy sequence with a convergent subsequence converges

Let us begin by writing that very slightly more formally, to bring out the fact that it starts with $\forall$.

—————————————————-

$\forall (a_n)\ \ \ (a_n)$ is Cauchy and $(a_n)$ has a convergent subsequence

$\implies (a_n)$ converges

The next step is to apply the “let” move, which I’ve talked about several times in lectures. If you ever have a statement to prove of the form “For every $x$ such that $P(x)$ holds, $Q(x)$ also holds,” then you can just automatically write “Let $x$ be such that $P(x)$ holds,” and change your target to that of establishing that $Q(x)$ holds.

In our case, we write, “Let $(a_n)$ be a Cauchy sequence that has a convergent subsequence,” and modify our target to that of proving that $(a_n)$ converges. So now we represent where we’ve got to as follows.

$(a_n)$ is a Cauchy sequence

$(a_n)$ has a convergent subsequence

——————————————-

$(a_n)$ converges

Maybe the purpose of those strange horizontal lines is becoming clearer at this point. I am listing statements that we can *assume* above the line and ones that we are trying to *prove* below the line.

At this point it seems natural to give a name to the convergent subsequence that we are given. Let us call it $(a_{n_k})$. This again is just one instance of a very general move: if you are told you’ve got something, then give it a name. This sequence has two properties: it is a subsequence of $(a_n)$ and it converges. I’ll list those two properties separately.

$(a_n)$ is a Cauchy sequence

$(a_{n_k})$ is a subsequence of $(a_n)$

$(a_{n_k})$ converges

——————————————-

$(a_n)$ converges

Having done that, I think I’ll remove the second hypothesis, since the fact that $(a_{n_k})$ is a subsequence of $(a_n)$ is implicit in the notation.

$(a_n)$ is a Cauchy sequence

$(a_{n_k})$ converges

——————————————-

$(a_n)$ converges

The second hypothesis here is again telling us we’ve got something: a limit of the subsequence. So let’s apply the naming move again, calling this limit $a$.

$(a_n)$ is a Cauchy sequence

$a_{n_k}\to a$

——————————————-

$(a_n)$ converges

That’s enough reformulation of our assumptions. It’s time to think about what we are trying to prove. To do that, we use a process called *expansion*. That means taking a definition and writing it out in more detail. It tends to be good to *avoid* expanding definitions unless you are genuinely stuck: that way you won’t miss opportunities to *use results from the course* rather than proving everything from first principles. However, here a proof from first principles is what is required. I’m going to do a partial expansion to start with: a sequence converges if there exists a real number that it converges to.

$(a_n)$ is a Cauchy sequence

$a_{n_k}\to a$

——————————————-

$\exists b\ \ \ (a_n)$ converges to $b$

Now our target has changed to an existential statement. How are we going to find a $b$ that the sequence converges to?

Sometimes proving existential statements is very hard, but here it is easy, since we have a candidate for the limit staring us in the face, and better still it is the only candidate around. So let us make a very reasonable guess that the sequence is going to converge to $a$, and make proving that our new target.

$(a_n)$ is a Cauchy sequence

$a_{n_k}\to a$

——————————————-

$a_n\to a$

That’s nice because we’ve got rid of that existential quantifier. But what do we do next? We must continue to expand: this time the definition of $a_n\to a$. Note that if you want to be able to do this, it is absolutely vital that you *know your definitions*. Otherwise, you obviously can’t do this expansion move. And if you can’t do that, then you can kiss goodbye to any hopes you might have had of proving this kind of result.

$(a_n)$ is a Cauchy sequence

$a_{n_k}\to a$

——————————————-

$\forall\epsilon>0\ \ \exists N\ \ \forall n\geq N\ \ |a_n-a|<\epsilon$

Now we have a target that begins with a universal quantifier, so it’s time for the “let” move again.

$(a_n)$ is a Cauchy sequence

$a_{n_k}\to a$

$\epsilon>0$

——————————————-

$\exists N\ \ \forall n\geq N\ \ |a_n-a|<\epsilon$

Now things become slightly harder, because this time we do *not* have a candidate staring us in the face for the thing we are trying to find. (The thing we are trying to find is $N$.) It’s not a bad idea in this situation to try to write out in vague terms what the key statements mean. One can do something like this.

Eventually all terms of $(a_n)$ are close to each other

Eventually all terms of $(a_{n_k})$ are close to $a$

————————————————

Eventually all terms of $(a_n)$ are close to $a$

The rough idea of the proof should now be clear: if all terms in the subsequence are close to $a$ and all terms are close to each other, then eventually for each term we can say that it is close to a term in the subsequence, which is itself close to $a$.

Since we are going to need to take two steps from a term in $(a_n)$, one to the subsequence and one from the subsequence to $a$, it seems a good idea to apply the two main hypotheses with $\epsilon/2$. So let’s just go ahead and do that and see what we get.

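$\exists N_1\ \ \forall p,q\geq N_1\ \ |a_p-a_q|<\epsilon/2$

$\exists K\ \ \forall k\geq K\ \ |a_{n_k}-a|<\epsilon/2$

——————————————-

$\exists N\ \ \forall n\geq N\ \ |a_n-a|<\epsilon$

Now we are once again in a position where we have been “given” something, in this case $N_1$ and $K$. So let’s quietly drop the existential quantifiers and use the names $N_1$ and $K$. (Purists might object to using the same names for the particular choices of $N_1$ and $K$ that we used when merely asserting that they exist. But this is very common practice amongst mathematicians and does not lead to confusion.)

$\forall p,q\geq N_1\ \ |a_p-a_q|<\epsilon/2$

$\forall k\geq K\ \ |a_{n_k}-a|<\epsilon/2$

——————————————-

$\exists N\ \ \forall n\geq N\ \ |a_n-a|<\epsilon$

How do we propose to “force” $|a_n-a|$ to be less than $\epsilon$? We are going to try to ensure, for suitable $k$, that $|a_n-a_{n_k}|<\epsilon/2$ and $|a_{n_k}-a|<\epsilon/2$. The first hypothesis tells us that we will be able to get the first condition if $n$ and $n_k$ are both at least $N_1$, and the second hypothesis tells us that we will be able to get the second condition if $k\geq K$.

So our plan is going to be to choose $N$ and $k$. For the plan to work, we shall need $n\geq N_1$, $n_k\geq N_1$, and $k\geq K$.

We are now in a position to choose $N$. We want our conclusion to hold when $n\geq N$, and the tool we use works when $n\geq N_1$, so it makes sense to take $N=N_1$. If we substitute that in, we lose the existential quantifier in the target and arrive at the following.

$\forall p,q\geq N_1\ \ |a_p-a_q|<\epsilon/2$

$\forall k\geq K\ \ |a_{n_k}-a|<\epsilon/2$

——————————————-

$\forall n\geq N_1\ \ |a_n-a|<\epsilon$

Now we can apply the “let” move again, to get rid of the universal quantifier in the target statement.

$\forall p,q\geq N_1\ \ |a_p-a_q|<\epsilon/2$

$\forall k\geq K\ \ |a_{n_k}-a|<\epsilon/2$

$n\geq N_1$

——————————————-

$|a_n-a|<\epsilon$

We know we’re going to take $p=n$, and that we can, since $n\geq N_1$, so let’s go ahead and choose that value for $p$ in the first hypothesis. That leaves us with the following.

$\forall q\geq N_1\ \ |a_n-a_q|<\epsilon/2$

$\forall k\geq K\ \ |a_{n_k}-a|<\epsilon/2$

——————————————-

$|a_n-a|<\epsilon$

Just to make clear what I did there, it was a move called *substitution*. If you have a hypothesis of the form $\forall x\ (P(x)\implies Q(x))$ and a hypothesis $P(y)$, then you can substitute in $y$ for $x$ and get out $Q(y)$. (One can also call this *modus ponens*: I prefer to call it substitution in this case because the condition $P(y)$ is somehow not a very serious hypothesis, but more like a “restriction” applied on $y$.)

Since I’ve used the hypothesis $n\geq N_1$ and am unlikely to need it again, I have deleted it.

Now we have to decide how to choose $q$ and how to choose $k$. Recall that we needed $n_k\geq N_1$ and $k\geq K$. In a human proof one just writes, “Let $k$ be such that $n_k\geq N_1$ and $k\geq K$.” It’s a bit trickier for a computer to find it obvious that such a $k$ exists, but again that doesn’t matter to us here. I’ll use $k_0$ to denote the $k$ I’m choosing, and write down the conditions I’ve made sure $k_0$ satisfies.

$\forall q\geq N_1\ \ |a_n-a_q|<\epsilon/2$

$\forall k\geq K\ \ |a_{n_k}-a|<\epsilon/2$

$n_{k_0}\geq N_1$

$k_0\geq K$

——————————————-

$|a_n-a|<\epsilon$

Now we can substitute $n_{k_0}$ into the first hypothesis.

$|a_n-a_{n_{k_0}}|<\epsilon/2$

$\forall k\geq K\ \ |a_{n_k}-a|<\epsilon/2$

$k_0\geq K$

——————————————-

$|a_n-a|<\epsilon$

We can also substitute $k_0$ into the second hypothesis.

$|a_n-a_{n_{k_0}}|<\epsilon/2$

$|a_{n_{k_0}}-a|<\epsilon/2$

——————————————-

$|a_n-a|<\epsilon$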

And now we are done by the triangle inequality.
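For comparison, here is roughly what the finished proof looks like when the steps are written out in the usual order. This is only a sketch of a write-up: the labels $N_1$, $K$ and $k_0$ are names chosen for this sketch, and the `proof` environment assumes a package such as `amsthm`.

```latex
\begin{proof}
Let $(a_n)$ be a Cauchy sequence with a subsequence $(a_{n_k})$ that
converges to a limit $a$, and let $\epsilon>0$.

Since $(a_n)$ is Cauchy, there exists $N_1$ such that
$|a_p-a_q|<\epsilon/2$ whenever $p,q\geq N_1$.
Since $a_{n_k}\to a$, there exists $K$ such that
$|a_{n_k}-a|<\epsilon/2$ whenever $k\geq K$.

Let $n\geq N_1$. Choose $k_0\geq K$ such that $n_{k_0}\geq N_1$,
which is possible because $n_k\to\infty$ as $k\to\infty$. Then
\[
  |a_n-a|\leq |a_n-a_{n_{k_0}}|+|a_{n_{k_0}}-a|
         <\epsilon/2+\epsilon/2=\epsilon .
\]
Since $\epsilon>0$ was arbitrary, $a_n\to a$.
\end{proof}
```

Each sentence corresponds to one or two of the moves: the first is a pair of “let” moves, the next two are expansion, substitution and naming, and the choice of $k_0$ is the one place where something has to be found rather than read off.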

Now that we have gone through a proof, let me list the main proof-generating moves we used.

If you are trying to prove a statement of the form “For every $x$ such that $P(x)$ holds, $Q(x)$ also holds,” then write, “Let $x$ be such that $P(x)$ holds,” (or words to that effect) and adjust your target to proving that $Q(x)$ holds.

If you are told that something exists, then give it a name. For example, if you are given the hypothesis that $(a_n)$ is convergent, then you are told that a limit exists. So give it a name such as $a$ and change the hypothesis to $a_n\to a$.

If you are trying to prove something and you can’t find a high-level argument (by which I mean one that uses results from the course that are relevant to the statement you are trying to prove), and if what you are trying to prove involves concepts such as convergence or continuity that can be written out in low-level language (often, but not always, involving quantifiers), then rephrase what you are trying to prove in this lower-level way. That is, expand out the definition.

If you are given a hypothesis of the form $\forall x\ P(x)$, then given any object $y$ of the same type as $x$, you are free to substitute it in for $x$ and obtain the hypothesis $P(y)$.

For example, in the proof above, we had the hypothesis “$(a_n)$ is Cauchy”. In expanded form, this reads

$$\forall\epsilon>0\ \ \exists N\ \ \forall p,q\geq N\ \ |a_p-a_q|<\epsilon.$$

We decided to substitute in $\epsilon/2$, which is of the same type of thing as $\epsilon$ (both are positive real numbers), which yielded for us the statement

$$\exists N\ \ \forall p,q\geq N\ \ |a_p-a_q|<\epsilon/2.$$

(We then applied the “naming” move to get rid of the $\exists N$.)

Often a hypothesis takes a slightly more general form, where *conditions* are assumed. That is, it takes the form

$$\forall x\ \ (P(x)\implies Q(x))$$

or still more generally

$$\forall x\ \ (P_1(x)\wedge\dots\wedge P_k(x)\implies Q(x)).$$

There the symbol $\wedge$ means “and”, so this is saying that whenever you can find a $y$ that satisfies the conditions $P_1,\dots,P_k$, then you can give yourself the hypothesis $Q(y)$.

Suppose that you are trying to prove a statement of the form $\exists x\ P(x)$, and suppose you have identified an object $y$ of the same type as $x$ that you believe is going to do the job. Then you can change your target statement from $\exists x\ P(x)$ to $P(y)$. (In words, instead of trying to show that there exists something that satisfies $P$, you are going to try to show that $y$ satisfies $P$.)

We did this when we moved from trying to prove that $(a_n)$ converges to *something* to trying to prove that it converges to $a$.

This is not a complete set of useful moves. However, it is a start, and I hope it will help to back up my assertion that a large fraction of the proof steps that I take when writing out proofs in lectures are fairly automatic, and steps that you too will find straightforward if you put in the practice. I’ll try to discuss more moves in future posts.


I cannot promise to follow the amazing example of Vicky Neale, my predecessor on this course, who posted after every single lecture. However, her posts are still available online, so in some ways you are better off than the people who took Analysis I last year, since you will have her posts as well as mine. (I am making the assumption here that my posts will not contribute negatively to your understanding — I hope that proves to be correct.) Having said that, I probably won’t cover exactly the same material in each lecture as she did, so the correspondence between my lectures and her posts won’t be as good as the correspondence between her lectures and her posts. Nevertheless, I strongly recommend you look at her posts and see whether you find them helpful.

You will find this course *much* easier to understand if you are comfortable with basic logic. In particular, you should be clear about what “implies” means and should not be afraid of the quantifiers $\forall$ and $\exists$. You may find a series of posts I wrote a couple of years ago helpful, and in particular the ones where I wrote about logic (NB, as with Vicky Neale’s posts above, they appear in reverse order). I also have a few old posts that are directly relevant to the Analysis I course (since they are old posts you may have to click on “older entries” a couple of times to reach them), but they are detailed discussions of Tripos questions rather than accompaniments to lectures. You may find them useful in the summer, and you may even be curious to have a quick look at them straight away, but for now your job is to learn mathematics rather than trying to get good at one particular style of exam, so I would not recommend devoting much time to them yet.

For the rest of this post, I want to describe briefly the prerequisites for this course. One of the messages I want to get across is that in a sense the entire course is built on one axiom, namely the least upper bound axiom for the real numbers. I don’t really mean that, but it would be correct to say that it is built on one *new* axiom, together with other properties of the real numbers that you are so familiar with that you hardly give them a second’s thought.

If I want to say that more precisely, then I will say that the course is built on the following assumption: there is, up to isomorphism, exactly one complete ordered field. If the phrase “complete ordered field” is unfamiliar to you, it doesn’t matter, though I will try to explain what it means in a moment. Roughly speaking, this assumption is saying that there is exactly one mathematical structure that has all the arithmetical and order properties that you would expect of the real numbers, and also satisfies the least upper bound axiom. And that structure is the one we call the real numbers.

And now let me make *that* more precise.

A field is a set $\mathbb{F}$ with two binary operations $+$ and $\times$ that behave in the same nice ways that addition and multiplication behave in the real numbers. That is, they have the following properties.

(i) $+$ is commutative and associative and has an identity element. Every element of $\mathbb{F}$ has an inverse under $+$.

(ii) $\times$ is commutative and associative and has an identity element. Every element of $\mathbb{F}$ other than the identity of $+$ has an inverse under $\times$.

(iii) $\times$ is distributive over $+$. That is, for any three elements $x,y,z$ of $\mathbb{F}$ we have $x\times(y+z)=x\times y+x\times z$.

If we define an algebraic structure with some notions of addition and multiplication, then to say that it is a field is to say that all the usual rules we use to do algebraic manipulations are valid. It can be amusing and instructive to prove facts such as that $0\times x=0$ for every $x$, assuming nothing more than the field axioms, but in this course I shall take these slightly less elementary facts as read as well. But I assure you that they *do* follow from the field axioms.

Some examples of fields that you have already met are $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ and $\mathbb{F}_p$. (That last one is the field that consists of integers mod $p$ for a prime $p$, with addition and multiplication mod $p$. The only axiom that is not easy to verify is the existence of multiplicative inverses for non-zero elements of the field, which follows from the fact that if $a$ and $p$ are coprime then there are integers $r$ and $s$ such that $ra+sp=1$.)
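The fact about coprime integers is constructive: the extended Euclidean algorithm produces the two integers, and reducing the first of them mod $p$ gives the multiplicative inverse. Here is a minimal Python sketch of that idea. The function names `bezout` and `inverse_mod` are mine, and the prime 13 in the usage line is just an example.

```python
def bezout(a, b):
    """Return (g, r, s) with r*a + s*b == g, where g = gcd(a, b).
    Assumes a and b are non-negative integers, not both zero."""
    old_rem, rem = a, b
    old_r, r = 1, 0
    old_s, s = 0, 1
    while rem != 0:
        q = old_rem // rem
        # Invariant: old_rem == old_r*a + old_s*b and rem == r*a + s*b.
        old_rem, rem = rem, old_rem - q * rem
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    return old_rem, old_r, old_s

def inverse_mod(a, p):
    """Multiplicative inverse of a modulo p, if a and p are coprime."""
    g, r, _ = bezout(a % p, p)
    if g != 1:
        raise ValueError("a has no inverse mod p")
    return r % p

print([inverse_mod(a, 13) for a in range(1, 13)])
```

Since $ra+sp=1$ says exactly that $ra\equiv 1$ mod $p$, nothing more than the reduction `r % p` is needed at the end.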

This question splits into two. First we need to know what an ordering is, and then we need to know how the ordering relates to the algebraic operations. Let me take these two in turn.

A *totally ordered set* is a set $X$ together with a relation $<$ that has the following properties.

- $<$ is *transitive*: that is, if $x<y$ and $y<z$, then $x<z$.
- $<$ satisfies the *law of trichotomy*: that is, for any $x,y\in X$ exactly one of the statements $x<y$, $x=y$, $y<x$ holds.

Note that the trichotomy law implies that $<$ is *antisymmetric*: that is, if $x<y$ then it cannot also be the case that $y<x$.

In the above situation, we say that $<$ is a *total ordering* on $X$. Given a total ordering we can make some obvious further definitions. For instance, we can define $>$ by saying that $x>y$ if and only if $y<x$. (Note that $>$ is also a total ordering on $X$.) Also, we can define $\leq$ by saying that $x\leq y$ if and only if either $x<y$ or $x=y$, and similarly we can define $\geq$.

Here’s an example of a totally ordered set that is not just a subset of the real numbers. We take $X$ to be the set of all polynomials with real coefficients, and if $P$ and $Q$ are two polynomials, we say that $P<Q$ if there exists a real number $t$ such that $P(x)<Q(x)$ for every $x\geq t$. (That is, $P<Q$ if $Q$ is “eventually bigger than $P$”.) It is easy to check that this relation is transitive, and an instructive exercise to prove that the trichotomy law holds. (It is also not too hard, so I think it is better not to give the proof here.)

How should we define an ordered field? A first guess might be to say that it is a field with a total ordering on it. But a moment’s thought shows that that is a ridiculous definition, since we could define a “stupid” total ordering that had nothing to do with any natural ordering we might want to put on the field. For example, we could define an ordering on the rationals as follows: given two rational numbers and , written in their lowest terms with and positive, say that if either or and . That is certainly a total ordering on the rationals, but it is a rather strange one. For example, with this ordering we have and also .

What has gone wrong? The answer is that it is not interesting to have two structures on a set (in this case, the algebraic structure and the order structure) unless those structures *interact*. In fact, we have already seen this in the field axioms themselves: we have addition and multiplication, and it is absolutely crucial to have some kind of relationship between them. The relation we have is the distributivity law. Without that, we would allow “stupid” examples of pairs of binary operations that had nothing to do with each other.

An *ordered field* is a field $\mathbb{F}$ together with a total ordering $<$ that satisfies the following properties.

- For every $x,y,z\in\mathbb{F}$, if $x<y$, then $x+z<y+z$.
- For every $x,y,z\in\mathbb{F}$, if $x<y$ and $z>0$, then $xz<yz$.

Basically what these properties are saying is that the usual rules we use when manipulating inequalities, such as adding the same thing to both sides, apply.

In practice, we tend to use a rather larger set of rules. For example, if we know that $x<y$, we will feel free to deduce that $-y<-x$. And nobody will bat an eyelid if you have a real number $x$ and state without proof that $x^2\geq 0$. Both these facts can be deduced fairly easily from the properties of ordered fields, and again it is quite a good exercise to do this if you haven’t already. However, in this course we shall take the following attitude. There are the axioms for an ordered field. There are also some simple deductions from these axioms that provide us with some further rules for manipulating equations and inequalities. All of these we will treat in the same way: we just use them without comment.

Before I get on to the most important axiom, and the one that very definitely will *not* be used without comment, I want to discuss a distinction that it is important to understand: the distinction between the abstract and the concrete approaches to mathematics. The abstract approach is to concentrate on the *properties* that mathematical structures have. We are given a bunch of properties and we see what we can deduce from them, and we do that quite independently of whether any object with those properties exists. Of course, we do like to check that the properties are consistent, which we do by finding an object that satisfies them, but once we have carried out that check we go back to concentrating on the properties themselves.

The concrete approach to mathematics is much more focused on the objects themselves. We take an object, such as the set of all prime numbers, and try to describe it, prove results about it, and so on.

The boundary between the two approaches is extremely fuzzy, because we often like to convert the concrete approach into a more abstract one. For example, consider the function $\sin$. This can be defined concretely as the function given by the formula $\sin x=\sum_{n=0}^\infty(-1)^n\frac{x^{2n+1}}{(2n+1)!}$. (That’s just a concise way of writing $x-\frac{x^3}{3!}+\frac{x^5}{5!}-\dots$.) And a similar definition can be given for $\cos$. But somewhere along the line we will want to prove basic facts such as that $\frac{d}{dx}\sin x=\cos x$, or that $\sin(x+y)=\sin x\cos y+\cos x\sin y$, or that $\sin^2x+\cos^2x=1$. And once we’ve proved a few of those facts, we find that we no longer want to use the formula, because everything we need to know follows from those basic facts. And that is because with just a couple more facts of the above kind, we find that we have *characterized* the trigonometric functions: that is, we have written down properties that are satisfied by $\sin$ and $\cos$ and *by no other pair of functions*. When this kind of thing happens, our approach has shifted from the concrete (we are given the formulae and want to prove things about the resulting functions) to the abstract (we are given some properties and want to use them to deduce other properties).

Something very similar happens with the real numbers. Up to now (at least until taking Numbers and Sets), you will have been used to thinking of the real numbers as infinite decimals. In other words, the real number system is just out there, an object that you look at and prove things about. But at university level one takes the abstract approach. We start with a set of properties (the properties of ordered fields, together with the least upper bound axiom) and use those to deduce everything else. It’s important to understand that this is what is going on, or else you will be confused when your lecturers spend time proving things that appear to be completely obvious, such as that the sequence $1,\frac12,\frac13,\dots$ converges to 0. Isn’t that obvious? Well, yes it is if you think of a real number as one of those things with a decimal expansion. But it takes quite a lot of work to prove, using just the properties of a complete ordered field, that every real number has a decimal expansion, and rather than rely on all that work it is much easier to prove directly that $\frac1n$ converges to 0.

Let $A$ be a set of real numbers. A real number $x$ is an *upper bound* for $A$ if $a\leq x$ for every $a\in A$. For example, if $A$ is the open interval $(0,1)$, then $2$ is an upper bound for $A$.

A real number $s$ is *the least upper bound* of $A$ if it has the following two properties.

- $s$ is an upper bound for $A$.
- If $t<s$, then $t$ is not an upper bound for $A$.

Another way of writing these two properties is as follows. I’ll use quantifiers.

- $\forall a\in A\ \ a\leq s$.
- $\forall t<s\ \ \exists a\in A\ \ a>t$.

In words, everything in $A$ is less than or equal to $s$, and for any $t<s$ there is some $a\in A$ that is bigger than $t$.

As an example, $1$ is the least upper bound of the open interval $(0,1)$. Why? Because if $a\in(0,1)$ then $a\leq 1$, and if $t<1$ then we can find $a\in(0,1)$ such that $a>t$. (How do we do this? Well, if $t\leq 0$ then take $a=1/2$ and if $t>0$ then take $a=(1+t)/2$.)
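The choice of witness in that example can be checked mechanically. Here is a minimal sketch, assuming the interval in question is $(0,1)$ and using exact rational arithmetic; the function name `witness` is purely illustrative.

```python
from fractions import Fraction

def witness(t):
    """Given a rational t < 1, return an element of (0, 1) that exceeds t."""
    t = Fraction(t)
    if t >= 1:
        raise ValueError("t must be less than 1")
    if t <= 0:
        return Fraction(1, 2)   # anything in (0, 1) beats a non-positive t
    return (1 + t) / 2          # the midpoint of t and 1

# Every t < 1 fails to be an upper bound for (0, 1), so 1 is the least one.
for t in (Fraction(-3), Fraction(0), Fraction(1, 3), Fraction(999, 1000)):
    a = witness(t)
    assert 0 < a < 1 and a > t
```

Of course a finite check like this proves nothing; it just makes the two cases of the argument concrete.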

The least upper bound property is the following statement: every non-empty subset of the reals that has an upper bound has a least upper bound.

But since we are thinking abstractly, we will not think of this as a *property* (of the previously given real numbers) but more as an *axiom*. To do so we can state it as follows.

Let $F$ be an ordered field. We say that $F$ has the *least upper bound property* if every non-empty subset of $F$ that has an upper bound has a least upper bound.

For reasons that will become clear only after the course has started, we say that an ordered field with the least upper bound property is *complete*. There are then two very important theorems that we shall assume.

**Theorem 1.** *There exists a complete ordered field.*

**Theorem 2.** *There is only one complete ordered field, in the sense that any two complete ordered fields are isomorphic.*

I don’t propose to give proofs of either of these results, but let me at least give some indication, for those who are interested, of how they can be proved. The proofs are not required knowledge for the course, but it’s not a bad idea to have some inkling of how they go.

How might one prove Theorem 1? One answer is that *the reals are a complete ordered field*! That is, if you take the good old infinite decimals that you are used to, and you say very carefully what it means to add or multiply two of them together, and you order them in the obvious way, then you can actually prove rigorously that you have a complete ordered field. It’s not very pretty (partly because of the fact that point nine recurring equals 1) but it can be done.

Here’s how one can prove the least upper bound property. For convenience let us take a non-empty set $A$ that consists of positive numbers only. Assuming that $A$ is bounded above, we would like to find a least upper bound. We can do this as follows. First, find the smallest integer that is an upper bound for $A$. (We know that there must be an integer upper bound: just take any integer that is bigger than the upper bound we are given for $A$. If we are defining the reals as infinite decimals, then it is genuinely obvious that such an integer exists: you just chop off everything beyond the decimal point and add 1.) Call this integer $n_0$. Next, we find the smallest multiple of $\frac1{10}$ that is an upper bound for $A$. This will be one of the numbers $n_0-\frac9{10},n_0-\frac8{10},\dots,n_0-\frac1{10},n_0$. Then you take the smallest multiple of $\frac1{100}$ that is an upper bound for $A$, and so on. This gives you a sequence that might be something like $4,\ 3.2,\ 3.15,\ 3.142,\ 3.1416,\dots$. If you look at an individual digit of the numbers in this sequence, such as the fifth after the decimal point, it will eventually stabilize, and if you take these stabilized digits as the digits of a certain number, then that number will be an upper bound for $A$ and no smaller number will be. (Both these statements need to be checked, but both are reasonably straightforward.)
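The procedure just described is easy to simulate. The following is only a sketch under stated assumptions: the set is taken to be $\{x>0:x^2<2\}$ (my choice of example, not one from the course), it is represented by a test for whether a rational is an upper bound, and the names `decimal_upper_bounds` and `bounds_sqrt2_set` are hypothetical.

```python
from fractions import Fraction

def decimal_upper_bounds(is_upper_bound, start, digits):
    """Return the smallest upper bounds that are multiples of 1, 1/10, ..., 10**-digits.

    `start` must be an integer that is known to be an upper bound.
    """
    bound = Fraction(start)
    step = Fraction(1)
    bounds = []
    for _ in range(digits + 1):
        # Walk down in units of `step` for as long as we still have an upper bound.
        while is_upper_bound(bound - step):
            bound -= step
        bounds.append(bound)
        step /= 10
    return bounds

def bounds_sqrt2_set(q):
    """q is an upper bound for {x > 0 : x*x < 2} exactly when q > 0 and q*q >= 2."""
    return q > 0 and q * q >= 2

approximations = decimal_upper_bounds(bounds_sqrt2_set, 5, 4)
print([float(b) for b in approximations])  # 2, 1.5, 1.42, 1.415, 1.4143
```

The digits of the successive bounds stabilize to those of $\sqrt2=1.41421\dots$, which is the least upper bound of this particular set.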

A more elegant way to prove the existence of a complete ordered field is to use objects called *Dedekind cuts*. A Dedekind cut is a partition of the rational numbers into two non-empty subsets $L$ and $R$ such that every element of $L$ is less than every element of $R$, and such that $R$ does not have a minimal element.

To see why this might be a reasonably sensible definition, consider the sets $L$ and $R$, where $L$ consists of all rationals $x$ such that either $x\leq 0$ or $x^2<2$, and $R$ consists of all positive rationals $x$ such that $x^2>2$. This is the Dedekind cut that corresponds to our ordinary conception of the number $\sqrt2$.
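To make that example concrete, here is a small sketch, assuming the cut is the one for $\sqrt2$ and representing the two sets by membership tests on exact rationals. The helper `smaller_in_right` is my own addition: it takes an average of $r$ and $2/r$ to show why the right-hand set has no minimal element.

```python
from fractions import Fraction

def in_left(x):
    """Membership test for L: x <= 0 or x^2 < 2."""
    return x <= 0 or x * x < 2

def in_right(x):
    """Membership test for R: x > 0 and x^2 > 2."""
    return x > 0 and x * x > 2

def smaller_in_right(r):
    """Given r in R, return a strictly smaller element of R."""
    return (r + 2 / r) / 2

# Every sampled rational lies in exactly one of the two sets.
samples = [Fraction(p, q) for p in range(-12, 13) for q in range(1, 8)]
assert all(in_left(x) != in_right(x) for x in samples)

r = Fraction(3, 2)
s = smaller_in_right(r)   # 17/12, which is still in R but smaller than 3/2
assert in_right(s) and s < r
```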

The condition that $R$ should not have a minimal element is to make sure that we don’t have two different Dedekind cuts representing each rational number. (If the rational number is $q$, the partition we are ruling out is $L=\{x:x<q\}$ and $R=\{x:x\geq q\}$. We just allow the partition $L=\{x:x\leq q\}$ and $R=\{x:x>q\}$.)

If $(L_1,R_1)$ and $(L_2,R_2)$ are two Dedekind cuts, we can define their sum to be $(L_1+L_2,R_1+R_2)$, where $L_1+L_2$ is defined to be the set of all numbers $x+y$ such that $x\in L_1$ and $y\in L_2$, and similarly for $R_1+R_2$. (One small wrinkle: occasionally a single rational belongs to neither set, as happens when the cuts for $\sqrt2$ and $-\sqrt2$ are added, and it should then be put into $L_1+L_2$.) It’s a bit harder to define products: you may like to try it. It’s not so hard to define a sensible total ordering on the set of all Dedekind cuts. And then there’s a lot of checking needed to prove that what results is a complete ordered field. (I may as well admit at this point that I’ve never bothered to check this for myself, or to read a proof in a book. I’m happy to know that it can be done, just as I’m happy to fly in an aeroplane without checking that the lift will be enough to keep me in the sky.)

How might one prove Theorem 2? Here’s one answer. You just go back to your notes in Numbers and Sets and look at the proof that every real number has a decimal expansion. Obviously if you define real numbers to be things with decimal expansions, then this is saying nothing at all, but that’s not what Professor Leader did. He deduced the existence of decimal expansions from the properties of complete ordered fields. So effectively he proved the following result: *every element of a complete ordered field has a decimal expansion*. We can say slightly more: it has a decimal expansion that does not end with an infinite sequence of 9s. Oh, and two different elements have different decimal expansions. So now if you want an isomorphism between two complete ordered fields, you just match up an element of one with the element of the other that has the same decimal expansion.

Let me very briefly sketch a neater approach. You first match up 1 with 1. (That is, you match up the multiplicative identity with the multiplicative identity.) Then you match up 1+1 with 1+1, and so on, until you have “the positive integers” inside your two complete ordered fields matched together. Then you match up 0 with 0 and the additive inverses of the positive integers with the additive inverses of the positive integers. Then you match up the reciprocals of the positive integers (or rather, their multiplicative inverses) with the reciprocals of the positive integers, and finally all the rationals with all the rationals. What I’m saying here is that in any complete ordered field you can make sense in only one reasonable way of the fraction $p/q$ when $p$ and $q$ are integers with $q\ne 0$, and you send each $p/q$ in one complete ordered field to its counterpart in the other.

Now let’s take *any* element $x$ of a complete ordered field. We can associate with $x$ the set $A_x$ of all “rationals” less than $x$ and map that set over to the other complete ordered field, using our correspondence between rationals. That gives us a set $B_x$ in the other complete ordered field. The least upper bound of $B_x$ is then the element that corresponds to $x$.

As ever, there is work needed if you want to turn the above idea into a complete proof: if the map you’ve defined is $\phi$, then you need to check things like that $\phi(x+y)=\phi(x)+\phi(y)$ or that if a set $A$ has least upper bound $s$, then $\phi(A)$ has least upper bound $\phi(s)$. But all that can be done.

If you found what I’ve just written a bit intimidating, let me remind you that all you need to take away from it is that everything in this course will be deduced from the familiar algebraic and order properties of the reals, together with the least upper bound property. Since the algebraic and order properties should be very familiar to you, that means that the main things you need to learn are the definition of a least upper bound and the statement of the least upper bound property. The details matter, so a vague idea is not enough, but even so it’s not very much to learn.

]]>

When I got to my office, those other things I’ve been thinking about (the project with Mohan Ganesalingam on theorem proving) commanded my attention and the post didn’t get written. And then in the evening, with impeccable timing, Pavel Pudlak sent me an email with an observation that shows that one of the statements that I was hoping was false is in fact true: every subset of $\{0,1\}^n$ can be Ramsey lifted to a very simple subset of a not much larger set. (If you have forgotten these definitions, or never read them in the first place, I’ll recap them in a moment.)

How much of a disaster is this? Well, it’s *never* a disaster to learn that a statement you wanted to go one way in fact goes the other way. It may be disappointing, but it’s much better to know the truth than to waste time chasing a fantasy. Also, there can be far more to it than that. The effect of discovering that your hopes are dashed is often that you readjust your hopes. If you had a subgoal that you now realize is unachievable, but you still believe that the main goal might be achievable, then your options have been narrowed down in a potentially useful way.

Is that the case here? I’ll offer a few preliminary thoughts on that question and see whether they lead to an interesting discussion. If they don’t, that’s fine — my general attitude is that I’m happy to think about all this on my own, but that I’d be even happier to discuss it with other people. The subtitle of this post is supposed to reflect the fact that I have gained something from making my ideas public, in that Pavel’s observation, though simple enough to understand, is one that I might have taken a long, or even infinite, time to make if I had worked entirely privately. So he has potentially saved me a lot of time, and that is one of the main points of mathematics done in the open.

The basic idea I was pursuing was that perhaps we can find a property of the following kind that distinguishes between subsets of $\{0,1\}^n$ (or Boolean functions) of low Boolean complexity and general subsets/functions: a low-complexity set/function can be “lifted” from $\{0,1\}^n$ to a larger, but not too much larger, structure inside which it sits more simply. This basic idea was inspired by Martin’s proof that Borel sets are determined. After considering various possible ways of making the above ideas precise, and rejecting some of them when I realized that they couldn’t work, I arrived at the following set of definitions.

An *$n$-dimensional complexity structure with alphabet* $\Gamma$ is a subset $X\subset\Gamma^n$. (Sometimes it is convenient to define it as a subset of $\Gamma_1\times\dots\times\Gamma_n$, in which case the definitions have to be modified slightly.) If $X$ and $Y$ are two $n$-dimensional complexity structures, then I call a function $\phi:X\to Y$ a *map* if for every $i$, $\phi(x)_i$ depends only on $x_i$. Equivalently, $\phi$ is of the form $\phi(x_1,\dots,x_n)=(\phi_1(x_1),\dots,\phi_n(x_n))$.

A *basic $i$-set* in a complexity structure $X$ is a subset of the form $\{x\in X:x_i\in E\}$ for some $E\subset\Gamma$. A *basic set* is any set that is a basic $i$-set for some $i$. Note that if $\phi:X\to Y$ is a map and $B$ is a basic set in $Y$, then $\phi^{-1}(B)$ is a basic set in $X$.

The *circuit complexity* or *straight-line complexity* of a subset $A$ of a complexity structure $X$ is the minimal $m$ for which there exists a sequence $A_1,\dots,A_m$ of subsets of $X$ such that every $A_i$ is a basic set or a union or intersection of two sets earlier in the sequence, and $A_m=A$. If $\phi:X\to Y$ is a map and $A\subset Y$, then the circuit complexity of $\phi^{-1}(A)$ is at most the circuit complexity of $A$, since $\phi^{-1}$ preserves basic sets and Boolean operations.
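To illustrate the definition of straight-line complexity, here is a minimal sketch for the structure $\{0,1\}^n$. The tuple encoding of the steps and the name `run_straight_line` are assumptions made for the example, and coordinates are indexed from 0. The length of a program that computes a set is an upper bound for the circuit complexity of that set.

```python
from itertools import product

def run_straight_line(n, program):
    """Evaluate a straight-line computation over {0,1}^n and return its last set.

    Each step is ("basic", i, allowed_values), ("union", j, k) or
    ("intersection", j, k), where j and k index earlier steps.
    """
    cube = set(product((0, 1), repeat=n))
    sets = []
    for step in program:
        kind = step[0]
        if kind == "basic":
            _, i, allowed = step
            sets.append({x for x in cube if x[i] in allowed})
        elif kind == "union":
            sets.append(sets[step[1]] | sets[step[2]])
        elif kind == "intersection":
            sets.append(sets[step[1]] & sets[step[2]])
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return sets[-1]

# Seven steps computing the set of points of {0,1}^2 whose coordinates differ.
xor_program = [
    ("basic", 0, {1}), ("basic", 0, {0}), ("basic", 1, {1}), ("basic", 1, {0}),
    ("intersection", 0, 3),   # x_0 = 1 and x_1 = 0
    ("intersection", 1, 2),   # x_0 = 0 and x_1 = 1
    ("union", 4, 5),
]
assert run_straight_line(2, xor_program) == {(0, 1), (1, 0)}
```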

I often, and perhaps slightly confusingly, describe a map $\phi:X\to Y$ as a *lift* of $Y$. That’s because it’s really $\phi^{-1}$ and its effect on subsets of $Y$ that I am interested in.

Let $X\subset\Gamma^n$ be a complexity structure. A *coordinate specification* is a statement of the form $x_i=\gamma$ for some $i\in\{1,\dots,n\}$ and $\gamma\in\Gamma$.

Let us assume that $n$ is even and let $m=n/2$. Then the *shrinking-neighbourhoods game* is a two-player game played according to the following rules.

- Player I starts, and the players alternately make coordinate specifications.
- Player I’s specifications must be of coordinates $x_i$ with $i\leq m$, and Player II’s must be of coordinates $x_i$ with $i>m$.
- No coordinate may be specified more than once.
- At every stage of the game, there must exist a sequence $x\in X$ that obeys all the specifications made so far.

A subset $A$ of $X$ is *I-winning* if Player I has a winning strategy for ensuring that after all coordinates have been specified, the sequence $x$ that satisfies those specifications (which is obviously unique) belongs to $A$. It is *II-winning* if Player II has a winning strategy for ensuring that the final sequence belongs to $A$.

Since finite games are determined, if $A$ is any subset of $X$, then either $A$ is I-winning or $X\setminus A$ is II-winning.
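For very small structures the game can be solved by brute force, which gives a way of experimenting with these definitions. The sketch below is an illustration rather than anything from the original argument: the function names and the five-point structure are arbitrary choices of mine, and coordinates are indexed from 0, so that Player I owns coordinates $0,\dots,m-1$.

```python
from itertools import combinations

def can_force(X, target, player):
    """Return True if `player` (0 for Player I, 1 for Player II) has a strategy
    that forces the final sequence of the game on X into `target`."""
    X = list(X)
    n = len(X[0])
    m = n // 2

    def solve(live, done, turn):
        # `live` holds the points of X that obey every specification so far.
        if len(done) == n:
            return live[0] in target          # exactly one point is left
        owned = range(m) if turn == 0 else range(m, n)
        # A specification x_i = v is legal iff some live point satisfies it.
        moves = {(i, x[i]) for x in live for i in owned if i not in done}
        outcomes = (solve([x for x in live if x[i] == v], done | {i}, 1 - turn)
                    for i, v in moves)
        return any(outcomes) if turn == player else all(outcomes)

    return solve(X, frozenset(), 0)

def all_subsets(X):
    """Yield every subset of X as a set."""
    X = list(X)
    for r in range(len(X) + 1):
        for combo in combinations(X, r):
            yield set(combo)

# A small structure inside {0,1,2}^2, so that n = 2 and m = 1.
X = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0)]
for A in all_subsets(X):
    # Determinacy: exactly one of "A is I-winning" and "X \ A is II-winning" holds.
    assert can_force(X, A, 0) != can_force(X, set(X) - A, 1)
```

The search is exponential in the size of the structure, so this is useful only for sanity checks on toy examples.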

This can be thought of as a kind of Ramsey property, something that I mention only to explain what would otherwise be a rather strange piece of terminology. I say that a map $\phi:X\to Y$ between complexity structures is *Ramsey* if for every I-winning subset $A$ of $Y$, $\phi^{-1}(A)$ is a I-winning subset of $X$, and for every II-winning subset $A$ of $Y$, $\phi^{-1}(A)$ is a II-winning subset of $X$. In other words, Ramsey maps preserve winning sets and the player that wins.

It is an easy exercise to show that $\phi$ is Ramsey if and only if for every subset $B\subset X$, if $B$ is I-winning then $\phi(B)$ is I-winning and if $B$ is II-winning then $\phi(B)$ is II-winning. (This isn’t quite a triviality, however: it uses finite determinacy.) This formulation is often more convenient.

I don’t want to be too precise about this, because part of what I hoped was that the correct statement would to some extent emerge from the proof. But roughly what I wanted was the following.

1. If $A\subset\{0,1\}^n$ is a set with low circuit complexity, then there is a complexity structure $Y$ that is not too large, and a Ramsey map $\pi:Y\to\{0,1\}^n$, such that $\pi^{-1}(A)$ is simple.
2. If $A$ is a random set then no such pair $(Y,\pi)$ exists.
3. There is an NP set $A$ for which no such pair exists.

Achieving 1 and 2 together would give a non-trivial example of a property that distinguishes between sets of low circuit complexity and random sets, which is a highly desirable thing to do, given the difficulties associated with the natural-proofs barrier, even if it doesn’t immediately solve the P versus NP problem. And achieving 1 and 3 together would show that P doesn’t equal NP.

However, it was far from clear whether these statements were true under any reasonable interpretation. Perhaps even sets of low circuit complexity require enormous sets $Y$, or perhaps there is some simple way of lifting arbitrary sets with only a small $Y$. Either of these possibilities would show that the existence of efficient Ramsey lifts does not distinguish between sets of low circuit complexity and arbitrary sets. What Pavel sent me yesterday was an observation that basically shows that the second difficulty occurs. That is, he showed that one can lift an arbitrary set quite simply.

Before I present his example, I’ll just briefly mention that I had a philosophical reason for thinking that such an example was unlikely to exist, which was that any truly simple example ought to have an infinite counterpart, but in the infinite case it is not true that arbitrary sets can be efficiently lifted. I’ll try to give some sort of indication later of why this argument does not apply to Pavel’s example.

I’ll begin by describing the example in an informal way and then I’ll make it more formal. (Pavel provided both descriptions in his message to me, so I’m not adding anything here.)

Let $A\subset\{0,1\}^n$ be any set and define an auxiliary game played on $\{0,1\}^n$ as follows. It’s just like the shrinking-neighbourhoods game, except that at some point each player must declare a bit, and the parity of the two bits they declare must be odd if the final sequence belongs to $A$ and even if it doesn’t. (So they must both play consistently with this restriction.)

Suppose that Player I has a winning strategy for the original game for some payoff set $B$. Then she can win the auxiliary game with payoff set $B$ as follows. Let $m=n/2$ as usual. For her first $m-1$ moves, she simply plays her winning strategy for the original game (ignoring the extra bit that Player II declares if he declares it). Then for her last move, she continues to play the winning strategy, but she also declares her extra bit. If Player II has declared his bit, then she looks at the two possible sequences that can result after Player II’s final move. If they are both in $A$ or both in $A^c$, then she makes sure that the parity of the two bits is odd in the first case and even in the second. If one sequence is in $A$ and the other in $A^c$ then it does not matter what she chooses for her extra bit. If Player II has not declared his bit, then she can play her extra bit arbitrarily, which will oblige Player II to ensure that the parity of the two bits is equal to 1 if the final sequence is in $A$ and 0 otherwise.

Now suppose that Player II has a winning strategy for $B$ in the original game. In this case the proof is even simpler. He just plays this strategy, ignoring Player I’s extra bit when she plays it, and declares his extra bit right at the end, making sure that the final parity of the two bits correctly reflects whether the sequence is in $A$.

Finally, note that to tell whether the eventual sequence in the auxiliary game belongs to $A$, it is only necessary to look at the two extra bits. So whether or not a point belongs to the lifted copy of $A$ can be determined by just two coordinates of that point (though which coordinates they are can vary from point to point). That makes the lifted copy of $A$ a very simple set (I call it 2-open, since it is a union of “2-basic open” sets), even though the “board” on which the auxiliary game is played is not very large.

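Now let me give a precise definition of the complexity structure $Y$. It consists of all sequences $y=(y_1,\dots,y_n)$ with the following properties.

- For exactly one $i\leq m$, $y_i\in\{0,1\}^2$. In this case we will write $y_i=(x_i,\epsilon_1)$.
- For exactly one $i$ with $i>m$, $y_i\in\{0,1\}^2$. In this case we will write $y_i=(x_i,\epsilon_2)$.
- For all other $i$ we have $y_i\in\{0,1\}$ and write $y_i=x_i$.
- $\epsilon_1+\epsilon_2$ is 1 mod 2 if $x=(x_1,\dots,x_n)\in A$ and 0 otherwise.

So there are six possibilities for each coordinate (since it can be an arbitrary element of $\{0,1\}\cup\{0,1\}^2$). Thus, we can regard $Y$ as a subset of $[6]^n$, which is not that much bigger than $\{0,1\}^n$.

The map $\pi:Y\to\{0,1\}^n$ does the obvious thing and takes $y$ to $x=(x_1,\dots,x_n)$. It is then easy to see that the shrinking-neighbourhoods game in $Y$ with payoff set $\pi^{-1}(B)$ is basically the same as the auxiliary game with payoff set $B$ that I described earlier.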
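The construction is small enough to build explicitly for tiny $n$. The sketch below rests on my reading of the definition above: in each half exactly one coordinate carries an extra bit, stored here as a pair, and the parity of the two extra bits records membership of $A$. The names `pavel_lift`, `project` and `extra_bit_parity` are illustrative, coordinates are indexed from 0, and the example set $A$ is arbitrary. The code checks only that the lifted set is determined by the two extra bits. It does not check the Ramsey property.

```python
from itertools import product

def pavel_lift(A, n):
    """Build the structure Y for a set A of 0/1 tuples of even length n."""
    m = n // 2
    Y = []
    for x in product((0, 1), repeat=n):
        for i, j, e1 in product(range(m), range(m, n), (0, 1)):
            e2 = e1 ^ int(x in A)     # forces e1 + e2 to be odd exactly when x is in A
            y = list(x)
            y[i] = (x[i], e1)         # Player I's coordinate that carries her extra bit
            y[j] = (x[j], e2)         # Player II's coordinate that carries his extra bit
            Y.append(tuple(y))
    return Y

def project(y):
    """The map from Y to {0,1}^n: forget the extra bits."""
    return tuple(c[0] if isinstance(c, tuple) else c for c in y)

def extra_bit_parity(y):
    """Parity of the two extra bits of a point of Y."""
    return sum(c[1] for c in y if isinstance(c, tuple)) % 2

n = 4
A = {x for x in product((0, 1), repeat=n) if sum(x) in (1, 2)}
Y = pavel_lift(A, n)

# Membership of the lifted set depends only on the two extra bits.
assert all((project(y) in A) == (extra_bit_parity(y) == 1) for y in Y)

# Each coordinate takes one of six values: two plain bits and four pairs.
alphabet = {c for y in Y for c in y}
assert len(alphabet) == 6
```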

Is this the end of the project? It may be, but I think it would be a mistake to abandon the project immediately without thinking fairly hard about what has gone wrong so far. Is it a sign that nothing even remotely like this idea could work, or is it a sign that the problems are more “local” and that certain definitions should be adjusted? In the latter case, what might a new set of definitions look like?

I’ll try to explain in a future post why I think that it is worth exploring the general strategy of attempting to show that sets of low circuit complexity can be lifted (in some sense yet to be determined) to simple sets (also in some sense yet to be determined). For now, I’d just like to make the general point that there are many aspects of the definitions above that could be changed. For the moment, I still like the definition of a complexity structure, because when I came up with it I felt myself “forced” to it. (It would take a bit of time to remember why this was, however.) I also quite like the idea that the maps we want to consider are ones that preserve some class of sets, since that gives quite a bit of flexibility. We need the class of sets to be fairly complicated, since otherwise there is a danger that verifying that the sets are preserved becomes too easy, which could then mean that the property “can be efficiently lifted” becomes too simple and is ruled out by known complexity barriers. (I’m thinking here not just of the natural-proofs barrier but also of an interesting extension of it due to Rudich.)

Looking for a class of sets might seem a hopelessly complicated task, but there are several constraints on what the class of sets can be like for the proof to work. One important one is that it should be definable in any complexity structure. So it needs to be defined in a way that isn’t too specific to $\{0,1\}^n$. It might be worth making precise what this restriction actually means.

Why does the argument about infinite counterparts not apply to Pavel’s example? The rough idea is that in Pavel’s example it is possible to provide the extra information (that is, the extra bits in the auxiliary game) right at the end of the game. In an infinite game there is no such thing as “right at the end of the game”: whenever you play, you’re still very near the beginning. This difference has caused me difficulties in the past, and I think it is worth focusing on again. Is there some natural way of ruling out this postponing of the extra information?

One crude idea is to rule it out by … ruling it out. For example, we could define a set to be $k$-winning for Player I/II if there is a winning strategy for Player I/II such that after her/his first $k$ moves the outcome of the game is already decided. There is probably some serious drawback with such a simple-minded approach, but it is worth finding that drawback. I have given very little thought to it, so there may be something very obviously bad about it. One small point is that if, as I think is likely to be necessary, a proof that low-complexity sets can be lifted is inductive in nature, then we will want a composition of simplifying lifts to be a simplifying lift. So we would want our lifts to be such that $k$-winning sets lift to $k$-winning sets for the same player (and not just that winning sets lift to winning sets). So we would preserve the $k$-winning sets we’ve already created, and attempt to create some new ones.

I think that one of the reasons Polymath9 hasn’t taken off is that I presented too much material all at once. (I did try to make it clear that it wasn’t necessary to wade through it all, but even so I can see that it might have been off-putting.) In an effort to avoid that mistake this time, I’m going to resist the temptation to think further about how to respond to Pavel’s lift and go ahead and put up this post. If I do have further ideas, I’ll post them as comments.

]]>

If you are reasonably comfortable with the kind of basic logic needed in an undergraduate course, then you may enjoy trying to find the flaw in the following argument, which must have a flaw, since I’m going to prove a general statement and then give a counterexample to it. If you find the exercise extremely easy, then you may prefer to hold back so that others who find it harder will have a chance to think about it. Or perhaps I should just say that if you don’t find it easy, then I think it would be a good exercise to think about it for a while before looking at other people’s suggested solutions.

First up is the general statement. In fact, it’s a very general statement. Suppose you are trying to prove a statement $Q$ and you have a hypothesis $\forall x\ P(x)$ to work with. In other words, you are trying to prove the statement

$$(\forall x\ P(x))\implies Q.$$

Now if $R$ and $S$ are two statements, then $R\implies S$ is true if and only if either $R$ is false or $S$ is true. Hence what we are trying to prove can be rewritten as follows.

$$\neg(\forall x\ P(x))\vee Q$$

Now we can bring the $\neg$ inside the $\forall x$ as long as we convert the $\forall$ into $\exists$, so let’s do that. What we want to prove becomes this.

$$(\exists x\ \neg P(x))\vee Q$$

I’ll assume here that we haven’t done something foolish and given the name $x$ to one of the variables involved in the statement $Q$. So now I’m going to use the general rule that $(\exists x\ R(x))\vee S$ is equivalent to $\exists x\ (R(x)\vee S)$ to rewrite what we want to prove as the following.

$$\exists x\ (\neg P(x)\vee Q)$$

Finally, let’s rewrite what’s inside the brackets using the $\implies$ sign.

$$\exists x\ (P(x)\implies Q)$$

Every single step I took there was a logical equivalence, so the conclusion is that if you want to show that $\forall x\ P(x)$ implies $Q$, your task is the same as that of finding a single $x$ such that $P(x)\implies Q$.

Now let me give a counterexample to that useful logical principle. Let $A$ be a set of real numbers. Define the *diameter* of $A$ to be $\sup\{|x-y|:x,y\in A\}$. I’ll write it $\mathrm{diam}(A)$.

Consider the following implication.

$$(\forall x\in A\ \ |x|\leq 1)\implies\mathrm{diam}(A)\leq 2$$

That is clearly correct: if every element of $A$ has modulus at most 1, then $A$ is contained in the interval $[-1,1]$, so clearly can’t have diameter greater than 2.

But then, by the logical principle just derived, there must be a single element of $A$ such that if *that* element has modulus at most 1, then the diameter of $A$ is at most 2. In other words,

$$\exists x\in A\ \ (|x|\leq 1\implies\mathrm{diam}(A)\leq 2).$$

But that is clearly nonsense. If all we know is that one particular element of $A$ has modulus at most 1, it can’t possibly imply that $A$ has diameter at most 2.

What has gone wrong here? If you can give a satisfactory answer, then you will have a good grasp of what mathematicians mean by “implies”.

]]>

I’ve thought a little about what phrase to attach to the project (the equivalent of “density Hales-Jewett” or “Erdős discrepancy problem”). I don’t want to call it “P versus NP” because that is misleading: the project I have in mind is much more specific than that. It is to assess whether there is any possibility of proving complexity lower bounds by drawing inspiration from Martin’s proof of Borel determinacy. Only if the answer turned out to be yes, which for various reasons seems unlikely at the moment, would it be reasonable to think of this as a genuine attack on the P versus NP problem. So the phrase I’ve gone for is “discretized Borel determinacy”. That’s what DBD stands for above. It’s not a perfect description, but it will do.

For the rest of this post, I want to set out once again what the approach is, and then I want to explain where I am running into difficulties. I’m doing that to try to expose the soft underbelly of my proof attempt, in order to make it as easy as possible for somebody else to stick the knife in. (One could think of this as a kind of Popperian method of assessing the plausibility of the approach.) Another thing I’ll try to do is ask a number of precise questions that ought not to be impossible to solve and that can be thought about in isolation. Answers to any of these questions would, I think, be very helpful, either in demolishing the approach or in advancing it.

This section is copied from my previous post.

I define a *complexity structure* to be a subset $X$ of a set $\Gamma_1\times\dots\times\Gamma_n$. I call the union of the $\Gamma_i$ the *alphabet* associated with the structure. Often I consider the case where $\Gamma_1=\dots=\Gamma_n$. The maps between complexity structures that I consider (if you like, you can call them the morphisms in my category) are maps $\phi:X\to Y$ such that for each $i$, the coordinate $\phi(x)_i$ depends only on $x_i$. To put that another way, if $Y\subset\Delta_1\times\dots\times\Delta_n$ is another complexity structure, the maps I consider are ones of the form $\phi(x_1,\dots,x_n)=(\phi_1(x_1),\dots,\phi_n(x_n))$. I have found it inconvenient not having a name for these, but I can’t think of a good one. So I hereby declare that when I use the word “map” to talk about a function between complexity structures, I shall *always* mean a map with this property.

I call a subset of a complexity structure $X$ *basic* if it is of the form $\{x\in X:x_i\in E\}$ for some $i$ and some $E\subset\Gamma_i$. The motivation for the restriction on the maps is that I want the inverse image of a basic set to be basic.
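Here is a toy check of the claim that inverse images of basic sets are basic. It assumes that a basic set is given by a coordinate together with a set of allowed values, and it represents each coordinate function as a dictionary. The structures, the function `fold` and all the names are invented for the illustration, and coordinates are indexed from 0.

```python
from itertools import product

def basic_set(structure, i, allowed):
    """The basic set {x in structure : x_i in allowed}."""
    return {x for x in structure if x[i] in allowed}

def inverse_image(Y, coordinate_maps, subset):
    """Inverse image of `subset` under the map that applies one function per coordinate."""
    return {y for y in Y
            if tuple(f[c] for f, c in zip(coordinate_maps, y)) in subset}

Y = set(product((0, 1, 2), repeat=2))
X = set(product((0, 1), repeat=2))
fold = {0: 0, 1: 1, 2: 0}          # each coordinate map sends 2 to 0
maps = [fold, fold]

B = basic_set(X, 0, {0})
pulled_back = inverse_image(Y, maps, B)

# The inverse image is again basic: it constrains the same coordinate,
# and the allowed values are the preimage of {0} under `fold`.
assert pulled_back == basic_set(Y, 0, {0, 2})
```

If a coordinate of the image were allowed to depend on two coordinates of the input, the inverse image of a basic set would in general constrain both of them, which is what the restriction on maps is there to prevent.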

The non-trivial basic sets in the complexity structure $\{0,1\}^n$ are the coordinate hyperplanes $\{x:x_i=0\}$ and $\{x:x_i=1\}$. The circuit complexity of a subset of $\{0,1\}^n$ measures how easily it can be built up from basic sets using intersections and unions. The definition carries over almost unchanged to an arbitrary complexity structure, and the property of maps ensures that the inverse image of a set of circuit complexity $m$ has circuit complexity at most $m$.

Given a complexity structure $X$, we can define a game that I call the *shrinking-neighbourhoods game*. For convenience let us take $n$ to be $2m$ for some positive integer $m$. Then the players take turns specifying coordinates: that is, they make declarations of the form $x_i=\gamma$. The only rules governing these specifications are the following.

- Player I must specify coordinates from $x_1$ to $x_m$.
- Player II must specify coordinates from $x_{m+1}$ to $x_{2m}$.
- At every stage of the game, there must be at least one $x\in X$ that satisfies all the specifications so far (so that the game can continue until all coordinates are specified).

Note that I do not insist that the coordinates are specified in any particular order: just that Player I’s specifications concern the first half and Player II’s the second.

To determine who wins the game, we need a *payoff set*, which is simply a subset $A\subset X$. Player I wins if the sequence $x$ that the two players have specified belongs to $A$, and otherwise Player II wins. I call a set $A$ *I-winning* if Player I has a winning strategy for getting into $A$ and *II-winning* if Player II has a winning strategy for getting into $A$. (Just in case there is any confusion here, I really do mean that $A$ is II-winning if Player II has a winning strategy for getting into $A$. I didn’t mean to write $A^c$.)

Because the game is finite, it is determined. Therefore, we have the following Ramseyish statement: given any 2-colouring of a complexity structure $X$, either the red set is I-winning or the blue set is II-winning. (Normally with a Ramsey statement one talks about *containing* a structure of a certain kind. If we wanted to, we could do that here by looking at minimal I-winning and minimal II-winning sets.)

Given a complexity structure $X$, I define a *lift* of $X$ to be a complexity structure $Y$ together with a map $\pi:Y\to X$ that satisfies the condition set out earlier. I define a lift to be *Ramsey* if $\pi^{-1}(A)$ is a winning subset of $Y$ whenever $A$ is a winning subset of $X$, and moreover it is winning for the same player. A more accurate name would be “winning-set preserving”, but I think of “Ramsey” as an abbreviation for that.

This gives us a potential method for showing that a subset $A\subset X$ is I-winning: we can find a Ramsey lift $\pi:Y\to X$ such that $\pi^{-1}(A)$ is simple enough for it to be easy to show that it is a I-winning subset of $Y$. Then the Ramsey property guarantees that $\pi(\pi^{-1}(A))$, and hence $A$, is I-winning in $X$.

The definition of a Ramsey lift is closely modelled on Martin’s definition of a lift from one game to another.

Suppose that we have a suitable definition of “simple”. Then I would like to prove the following.

- If a set $A\subset\{0,1\}^n$ has polynomial circuit complexity, then there exists a Ramsey lift $(Y,\pi)$ of $\{0,1\}^n$ with $Y\subset\Gamma^n$ such that $\pi^{-1}(A)$ is simple and the cardinality of $\Gamma$ is much less than doubly exponential.
- If $A$ is a random subset of $\{0,1\}^n$, then with high probability the smallest Ramsey lift that makes $A$ simple has an alphabet of doubly exponential size.
- There exists an NP set $A$ such that the smallest Ramsey lift that makes $A$ simple has an alphabet of doubly exponential size.

Obviously, the first and third statements combined would show that P$\ne$NP. For the time being, I would be delighted even with just the first of these three statements, since that would give an example of a property of functions that follows non-trivially from low circuit complexity. (That’s not guaranteed, since there might conceivably be a very simple way of constructing lifts from circuits. However, I think that is unlikely.)

Having the first and second statements would be a whole lot better than just having the first, since then we would have not just a property that follows non-trivially from low circuit complexity, but a property that distinguishes between functions of low circuit complexity and random functions. Even if we could not then go on to show that it distinguished between functions of low circuit complexity and some function in NP, we would at least have got round the natural-proofs barrier, which, given how hard that seems to be to do, would be worth doing for its own sake. (Again this is not quite guaranteed, since again one needs to be confident that the distinguishing property is interestingly different from the property of having low circuit complexity.)

As I said in my previous post, I think there are three reasons that, when combined, justify thinking about this potential distinguishing property, despite the small probability that it will work. The first is of course that the P versus NP problem is important and difficult enough that it is worth pursuing any approach that you don’t yet know to be hopeless. The second is that the property didn’t just come out of nowhere: it came from thinking about a possible analogy with an infinitary result (that in some rather strange sense it is harder to prove determinacy of analytic sets than it is to prove determinacy of Borel sets). And finally, the property appears not to be even close to a natural property in the Razborov-Rudich sense: for one thing it quantifies over all possible complexity structures that are not too much bigger than , and then it demands that the maps should preserve the I-winning and II-winning properties.

It is conceivable that the property might turn out to be natural after all. For instance, maybe the property of preserving I-winning and II-winning sets is so hard to achieve (I have certainly found it hard to come up with examples) that all possible Ramsey lifts are of some very special type, and perhaps that makes checking whether there is a Ramsey lift that simplifies a given set possible with a polynomial-time algorithm (as always, polynomial in ). But I think I can at least say that if the above property is natural, then that is an interesting and surprising theorem rather than just a simple observation.

Let $A_1,\dots,A_m$ be a straight-line computation of a set $A\subset\{0,1\}^n$. That is, each $A_i$ is either a *coordinate hyperplane* (a set of the form $\{x:x_j=\epsilon\}$ for some $j\leq n$ and some $\epsilon\in\{0,1\}$), or the intersection or union of two earlier sets in the sequence, and $A_m=A$. We would like to find a complexity structure $T$ whose alphabet is not too large, together with a map $\phi:T\to\{0,1\}^n$ that has the properties required of a Ramsey lift, such that $\phi^{-1}(A)$ is simple.

Since a composition of Ramsey lifts is a Ramsey lift, and since taking inverse images (under the kinds of maps we are talking about) preserves simple sets, whatever definition of “simple” we are likely to take, as well as preserving all Boolean operations, a natural approach is an inductive one. The inductive hypothesis is that we have found a Ramsey lift $\phi_k:T_k\to\{0,1\}^n$ such that the sets $\phi_k^{-1}(A_i)$ are simple for every $i\leq k$. We now look at $\phi_k^{-1}(A_{k+1})$. By the inductive hypothesis, this is a union or intersection of two simple sets, so we now look for a Ramsey lift $\psi:T_{k+1}\to T_k$ such that $\psi^{-1}(\phi_k^{-1}(A_{k+1}))$ is simple. Setting $\phi_{k+1}=\phi_k\circ\psi$, we then have a Ramsey lift such that $\phi_{k+1}^{-1}(A_i)$ is simple for every $i\leq k+1$.
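The definition of a straight-line computation can be made concrete with a small sketch. It assumes that the ground set is $\{0,1\}^n$ and that a coordinate hyperplane fixes one coordinate to 0 or 1; the function names and the step encoding (`"H"`, `"and"`, `"or"`) are hypothetical choices made purely for illustration.

```python
from itertools import product

def hyperplane(n, i, eps):
    """Coordinate hyperplane {x in {0,1}^n : x[i] == eps}, as a frozenset of tuples."""
    return frozenset(x for x in product((0, 1), repeat=n) if x[i] == eps)

def run_straight_line(n, steps):
    """Evaluate a straight-line computation and return the list of sets it produces.

    Each step is ("H", i, eps) for a coordinate hyperplane, or ("and", j, k) /
    ("or", j, k), where j and k are indices of earlier steps.
    """
    sets = []
    for op, a, b in steps:
        if op == "H":
            sets.append(hyperplane(n, a, b))
        elif op in ("and", "or"):
            if not (0 <= a < len(sets) and 0 <= b < len(sets)):
                raise ValueError("steps may only refer to earlier sets")
            sets.append(sets[a] & sets[b] if op == "and" else sets[a] | sets[b])
        else:
            raise ValueError(f"unknown operation {op!r}")
    return sets

# Odd parity on two bits, computed in seven steps.
parity_steps = [
    ("H", 0, 1), ("H", 1, 0), ("and", 0, 1),   # x0 = 1 and x1 = 0
    ("H", 0, 0), ("H", 1, 1), ("and", 3, 4),   # x0 = 0 and x1 = 1
    ("or", 2, 5),
]
odd_parity = run_straight_line(2, parity_steps)[-1]
```

The number of steps is an upper bound for the straight-line complexity of the final set. The inductive approach described above would process these steps one at a time, lifting after each union or intersection.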

Thus, if we can find a very efficient Ramsey lift that turns a given intersection or union of two simple sets into a simple set, then we will be done. “Very efficient” means efficient enough that repeating the process times (where is polynomial in — though even superlinear in would be interesting) does not result in an alphabet of doubly exponential size. Note that if our definition of “simple” is such that the complement of a simple set is simple, then it is enough to prove this just for intersections or just for unions.

What might we take as our definition of “simple”? The idea I had that ran into trouble was the following. I defined “simple” to be “basic”. I then tried to find a very efficient lift — I was hoping to multiply the size of the alphabet by a constant — that would take the intersection of two basic sets to a basic set.

Let us very temporarily define a basic set to be $i$-*basic* if it is defined by means of a restriction of the $i$th coordinate. That is, it is of the form $\{x:x_i\in E\}$ for some set $E$. (I want this definition to be temporary because most of the time I prefer to use “$k$-basic” to refer to an intersection of at most $k$ basic sets.) If $A$ is $i$-basic and $B$ is $j$-basic, then it is natural to expect that if we can lift $A\cap B$ to a basic set, that basic set should be either $i$-basic or $j$-basic. Furthermore, by symmetry we ought to be able to choose whether we want it to be $i$-basic or $j$-basic. But then if we let $A$ be the 1-basic set consisting of the whole structure and let $B$ be any other basic set, so that $A\cap B=B$, that tells us that we can lift $B$ so that it becomes a 1-basic set.

Now let us apply that to the coordinate hyperplanes in $\{0,1\}^n$. If we can lift these very efficiently one by one until they all become 1-basic sets, then we have a complexity structure $T$ with a small alphabet and a map $\phi:T\to\{0,1\}^n$ such that $\phi^{-1}(H)$ is 1-basic for every coordinate hyperplane $H$. But applying Boolean operations to 1-basic sets yields 1-basic sets, and every subset of $\{0,1\}^n$ is a Boolean combination of coordinate hyperplanes. Therefore, *every* subset of $\{0,1\}^n$ has become a 1-basic set!
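The fact used in that argument, that every subset of the discrete cube is a Boolean combination of coordinate hyperplanes, is easy to check by brute force. Here is a minimal sketch, assuming the cube is $\{0,1\}^n$: each point is written as an intersection of $n$ hyperplanes, and the set as the union of its points. The function names are hypothetical.

```python
from itertools import product

def hyperplane(n, i, eps):
    """Coordinate hyperplane {x in {0,1}^n : x[i] == eps}, as a frozenset of tuples."""
    return frozenset(x for x in product((0, 1), repeat=n) if x[i] == eps)

def as_boolean_combination(n, A):
    """Rebuild A using only unions and intersections of coordinate hyperplanes."""
    result = frozenset()
    for x in A:
        # The singleton {x} is the intersection of the n hyperplanes that agree with x.
        point = hyperplane(n, 0, x[0])
        for i in range(1, n):
            point &= hyperplane(n, i, x[i])
        result |= point
    return result
```

This representation uses about $n|A|$ operations, so it says nothing about efficiency. That is exactly why it is worrying if the lift can turn every hyperplane into a 1-basic set cheaply: closure under Boolean operations then does the rest for free.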

This is highly undesirable, because it means that we have shown that the property “Can be made simple by means of an efficient Ramsey lift” does not distinguish functions of low circuit complexity from arbitrary functions.

Because of that undesirability, I have not tried as hard as I might have to find such a lift. An initial attempt can be found in this tiddler. Note that the argument I have just given does not show that there cannot be a Ramsey lift that turns an -basic set into a 1-basic set at the cost of multiplying the size of the alphabet by a constant. What I have shown is that *if* this could be done, then there would be a Ramsey lift that converted all sets simultaneously into 1-basic sets, with an alphabet of size at most . If that were the case, then I think the approach would be completely dead. (Correction: the approach if the sets to be preserved are I-winning and II-winning sets would almost certainly be dead, and I don’t have any reason to think that if one tried to preserve other classes of sets, then the situation would be any different.) So that is one possible way to kill it off.

**Problem 1.** Let be a complexity structure and let be a basic subset of . Must there exist a complexity structure and a Ramsey lift such that is 1-basic and ?

In fact, if all one wants to do is disprove the statement that for a random set there is a doubly exponential lower bound, it is enough to obtain a bound here of the form .

The above observation tells us that we are in trouble if we have a definition of “simple” such that simple sets are closed under unions and intersections. More generally, we have a problem if we can modify our existing definition so that it becomes closed under unions and intersections. (What I have in mind when I write this is the example of basic sets. Those are not closed under intersections and unions, but if one could prove that every intersection of two basic sets can be lifted to a basic set, then, as I argued above, one could probably strengthen that result and show that every intersection of two basic sets can be lifted to a 1-basic set. And the 1-basic sets *are* closed under intersections and unions.)

Before I go on to discuss what other definitions of “simple” one might try, I want to discuss a second difficulty, because it gives rise to another statement that, if true, would deal a serious blow to this approach.

In the previous post, I gave an example of a lift that provides us with what I think of as the “trivial upper bound”: a Ramsey lift that turns every single subset of into an -basic set, with an alphabet of doubly exponential size. So if we want an inductive argument of the kind I have discussed above, we will need to show that an intersection or union of two simple sets can be lifted to a simple set with the size of the alphabet increasing in such a way that if one iterates that increase polynomially many times, the resulting size will be less than doubly exponential. (Actually, that isn’t quite necessary: maybe we could establish a lower bound of for a function in NP and an upper bound of for functions of circuit complexity , where .) This makes it highly problematic if we want to do anything that *squares* the size of the alphabet after only polynomially many steps. If we do that, then the size of the alphabet after $n$ times that polynomial number of steps, which is of course still a polynomial number of steps, will be at least $2^{2^n}$ and we will have proved nothing.
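The arithmetic behind that last remark can be checked directly. The following is a minimal sketch, assuming the alphabet starts with size 2; `steps_per_squaring` is a hypothetical name for the polynomial number of steps after which the alphabet size squares.

```python
def alphabet_after(steps, steps_per_squaring, start=2):
    """Alphabet size after `steps` steps if the size squares every `steps_per_squaring` steps."""
    size = start
    for _ in range(steps // steps_per_squaring):
        size = size * size
    return size

# With n = 4 and a squaring every 10 steps, 40 steps already give 2^(2^4).
n, p = 4, 10
size = alphabet_after(n * p, p)
```

Each squaring doubles the exponent, so $n$ squarings turn $2 = 2^{2^0}$ into $2^{2^n}$, which is the doubly exponential size of the trivial upper bound.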

The reason this is troubling is that even if I forget all about simplifying any set , I find it very hard to come up with examples of Ramsey lifts. (All I mean by a Ramsey lift of is a complexity structure and a map that takes I-winning sets to I-winning sets and II-winning sets to II-winning sets.) The only ones I know about can be found on this tiddler here. And they all have the property that the players have to provide “extra information” of a kind that at the very least squares the size of the alphabet. In fact, it is usually quite a lot worse than that.

Maybe I can try to be slightly more precise about what I mean there. All the lifts I have considered (and I don’t think this is much of a restriction) take the form of sets where a typical sequence in is of the form and the map takes that sequence to . If , then . What makes it interesting is that we do not take *all* sequences of the above form (that is, for arbitrary and arbitrary ). Rather, we take only *some* of those sequences. (It is that that makes it possible to simplify sets. Otherwise, there would be nothing interesting about lifts.) So if Player I makes an opening move , we can think of this as a move in the original game together with a binding obligation on the two players that the eventual sequence will have at least one preimage such that . The set of all such sequences is a set that may well be a proper subset of .

Suppose now that this extra information is enough to determine some other coordinate . Then unless there are already very few options for how to choose , the number of possibilities for will be comparable in size to the size of the alphabet, and therefore the size of the alphabet is in serious danger of squaring, and certainly of raising itself to the power 3/2, say. And that is, as I have just pointed out, much too big an increase to iterate superlinearly many times.

So it looks as though any “extra information” we declare has to be rather crude, in the sense that it does not cut down too drastically the set in which the game is played. But I have no example of a Ramsey lift with this property. What’s more, the kind of difficulty I run into makes me worry that such a lift may not exist. If it doesn’t, then that too will be a serious blow to the approach.

Let me ask a concrete problem, the answer to which would I think be very useful. It is a considerable weakening of Problem 1.

**Problem 2.** Let be a complexity structure. Does there necessarily exist a non-trivial Ramsey lift with and bounded above by a function of ?

The main concern is that should *not* depend on .

I have not sorted out completely what “non-trivial” means here, but let me give a class of examples that I consider trivial. Let be a large enough set and let be a surjection. Define a map by . Finally, let . Then we can think of as a map from to . Note that is in some sense just like : it’s just that the coordinates of may have been repeated.

I claim that this is a Ramsey lift. Indeed, suppose that is a I-winning subset of . Then a winning strategy for Player I for is simply to project the game so far to , play a winning strategy in , and choose arbitrarily how to lift each specification of a coordinate of to a specification of the corresponding coordinate of .

To put that more formally, if the specifications so far are for and it is Player I’s turn, then she works out the specification she would make in in response to the specifications for . If this specification is , then she picks an arbitrary preimage of and makes the specification .

A similar argument works for winning sets for Player II.
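For very small cases the claim can be tested by brute force. The sketch below rests on several assumptions that are mine rather than taken from the text above: a complexity structure is modelled as a set of equal-length tuples, the players alternately specify a value for a not-yet-specified coordinate (Player I first) in such a way that at least one sequence of the structure remains consistent, and Player I wins if the final sequence lies in the target set. The names `first_player_wins` and `trivial_lift` are hypothetical.

```python
from functools import lru_cache
from itertools import product

def first_player_wins(S, W):
    """Return True if Player I can force the final sequence of the game on S into W."""
    S, W = frozenset(S), frozenset(W)
    n = len(next(iter(S)))

    @lru_cache(maxsize=None)
    def win(spec, player_one_to_move):
        if None not in spec:          # every coordinate has been specified
            return spec in W
        # Sequences of S that agree with all specifications made so far.
        live = [x for x in S if all(s is None or s == xi for s, xi in zip(spec, x))]
        # Legal moves: fix an unspecified coordinate to a value some live sequence takes.
        moves = {(i, x[i]) for x in live for i in range(n) if spec[i] is None}
        results = (win(spec[:i] + (a,) + spec[i + 1:], not player_one_to_move)
                   for i, a in moves)
        return any(results) if player_one_to_move else all(results)

    return win((None,) * n, True)

def trivial_lift(S, f):
    """Lift S along a surjection f (a dict from the new alphabet onto the old one).

    Returns the lifted structure T (all sequences that project into S) and the
    coordinatewise projection phi.
    """
    n = len(next(iter(S)))

    def phi(y):
        return tuple(f[b] for b in y)

    T = frozenset(y for y in product(f.keys(), repeat=n) if phi(y) in S)
    return T, phi
```

Under these assumptions one can enumerate every subset of a small structure and confirm that it is I-winning exactly when its inverse image under the projection is I-winning. That is what the strategy-lifting argument predicts for this kind of lift.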

It is the fact that this can always be done that makes the lift in some sense “trivial”. Another way of thinking about it is that there is an equivalence relation on such that replacing a point by an equivalent point makes no difference.

As far as I can tell at this stage, the problem is interesting if one takes “non-trivial” to mean not of the form I have just described. However, I reserve the right to respond to other examples by enlarging this definition of triviality. The real test of non-triviality is that an interesting Ramsey lift is one that has the potential to simplify sets.

A positive answer to the problem above will not help us if is an enormously large function of . However, for now my main concern is to decide whether it is possible to obtain a bound independent of . If it is, then a major source of worry is removed. If it is not, then the approach will be in serious trouble.

I stopped writing for a few hours after that last paragraph, and during those few hours I realized that my definition of non-triviality was not wide enough. Before I explain why not, I want to discuss a worry I have had for a while, and a very simple observation that explains why I don’t have it any more.

Because the worry was unfounded, it is rather hard to explain it, but let me try. Let’s suppose that we are trying to find an interesting Ramsey lift . Suppose also that we choose a random subset of with the critical probability . That is, we choose elements with the probability that makes the probability that is a I-winning set equal to . Then it seems highly likely that will be “only just” a I-winning set if it is one. And we’ll need to make sure that every time just happens to be I-winning, then is I-winning, and every time it just fails to be I-winning, is II-winning. This seems extraordinarily delicate, unless somehow the winning strategies in are derived rather directly from the winning strategies in (as seems to be the case for the examples we have so far).

The observation I have now made is almost embarrassingly simple: if is only just a I-winning set, we do not mind if is a II-winning set. That is because is not usually the complement of . In fact, if is a random set and every element of has many preimages in , then both and will be pretty well all of .

It is worth drawing attention to the way that it seems to be most convenient to prove that a lift is Ramsey. Instead of taking a winning subset of and trying to prove that its image is winning (for the same player) in , I have been taking a winning subset of and trying to prove that its inverse image is winning (for the same player) in . Let me prove a very easy lemma that shows that this is OK.

**Lemma.** Suppose that $\phi:T\to S$ is a lift. Then the following two statements are equivalent.

(i) The image of every winning subset of $T$ is winning in $S$ for the same player.

(ii) The inverse image of every winning subset of $S$ is winning in $T$ for the same player.

**Proof.** Suppose that the second condition holds and let $A$ be a winning subset of $T$. If $\phi(A)$ is not a winning subset of $S$ for the same player, then $S\setminus\phi(A)$ is a winning subset of $S$ for the other player, which implies that $\phi^{-1}(S\setminus\phi(A))$ is a winning subset of $T$ for the other player. But $A\cap\phi^{-1}(S\setminus\phi(A))=\emptyset$, so this contradicts $A$ being a winning set for the original player.

Conversely, suppose that the first condition holds and let $B$ be a winning subset of $S$. If $\phi^{-1}(B)$ is not a winning subset of $T$ for the same player, then $\phi^{-1}(S\setminus B)$ is a winning subset of $T$ for the other player, which implies that $\phi(\phi^{-1}(S\setminus B))$ is a winning subset of $S$ for the other player. But $\phi(\phi^{-1}(S\setminus B))\subset S\setminus B$, so this contradicts $B$ being a winning set for the original player. QED

Another way of saying all this is that if we want to prove that a map $\phi:T\to S$ is a Ramsey lift, then the only winning sets $A\subset T$ for which we need to prove that $\phi(A)$ is also a winning set are inverse images of sets $B\subset S$. And the reason for that is that one can replace $A$ by the superset $\phi^{-1}(\phi(A))$ without affecting the image.

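Now for the further examples that showed me that my definition of non-triviality was not wide enough. The quick description of these is as follows: take a trivial Ramsey lift of the kind I described earlier (one that duplicates each coordinate several times) and pass to a random subset of it.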

Let me sketch an argument for why that, or something similar to it, works. The reason is basically the same as the reason that the trivial lift works. For the sake of clarity let me introduce a little notation. I’ll start with a complexity structure . I’ll then take to be a random subset of , where is some set and I write a typical element of as a sequence . The map takes this sequence to . I’m thinking of as a fairly large set, and the elements of are chosen independently from with some suitable probability .

Now let be a winning subset of . I want to show that is a winning subset of for the same player. So let be a winning strategy for for Player I (the case of Player II is very similar, so I won’t discuss it). Then in she can play as follows. If it is her turn and the specifications so far are of for , then she looks at what the strategy dictates in in response to the specifications of the , ignoring the . This will involve specifying some . Now she must find some such that there exists a sequence in that satisfies the specifications so far as well as the specification .

Typically, the proportion of that will serve as a suitable is approximately , so what we need, roughly speaking, is that should be bigger than . It’s not quite as simple as that, since if the alphabet is very very large, then there may be occasional pieces of extraordinary bad luck. However, I’m pretty sure it will be possible to modify the above idea to make it watertight.

Let and be complexity structures and a Ramsey lift. Let us say that is trivial if for any set of specifications () that can arise during the game in , for any set of specifications () with (this is a slight abuse of notation) and for any further specification , there exists a further specification , consistent with all the previous ones, such that .

This is an attempt to describe the property that makes it very easy to lift strategies in to strategies in : you just see what you would do at each stage in and lift that to — a policy that does not work in general but works in some simple cases.

One thing that is probably true but that it would be good to confirm is that a Ramsey lift of this simple kind cannot be used to simplify sets. I’ll state this as a problem, but I’m expecting it to be an easy exercise.

**Problem 3.** Let be a lift that is trivial in the above sense. Is it the case that for every the straight-line complexity of is equal to the straight-line complexity of ?

(A quick reminder: in a general complexity structure, I define the straight-line complexity of a set to be the length of the smallest sequence of sets that ends with , where all earlier sets in the sequence are either basic sets or unions or intersections of two earlier sets.)

Assuming that the answer to Problem 3 is yes, then the next obvious question is this. It’s the same as Problem 2 except that now we have a candidate definition of “non-trivial”.

**Problem 4.** Let be a complexity structure. Does there necessarily exist a non-trivial Ramsey lift where the size of the alphabet goes up by at most a factor that depends on only?

I very much hope that the answer is yes. I was beginning to worry that it was no, but after the simple observation above, my perception of how difficult it is to create Ramsey lifts has altered. In that direction, let me ask a slightly more specific problem.

**Problem 5.** Is there a “just-do-it” approach to creating Ramsey lifts?

What I mean there is a procedure for enumerating all the winning sets in and then building up and in stages, ensuring for each winning set in turn that its inverse image is a winning set for the same player. I would be surprised if this could be done efficiently, but I think that it would make it much clearer what a typical Ramsey lift looked like.

Let me also recall a problem from the previous post.

**Problem 6.** Let be the set of all sequences in of odd parity. Does there exist a Ramsey lift such that is a basic set and the alphabet of is not too large?

I would also be interested in a Ramsey lift that made simple in some other sense. Indeed, I suspect that the best hope for this approach is that the answer to Problem 6 is no, but that for some less restrictive definition of “simple” it is yes.

Maybe that’s enough mathematics for one post. I’d like to finish by trying to clarify what I mean by “micro-publication” on the TiddlySpace document. I can’t do that completely, because I’m expecting to learn on the job to some extent.

I’ll begin by saying that Jason Dyer answered a question I asked in the previous post, and thereby became the first person to be micro-published. I don’t know whether it was his intention, but anyway I was pleased to have a contribution suitable for this purpose. He provided an example that showed that a certain lift that turns the parity function into a basic function was (as expected) not a Ramsey lift. It can be found here. There are several related lifts for which examples have not yet been found. See this tiddler for details.

Jason’s micro-publication should not be thought of as typical, however, since it just takes a question and answers it. Obviously it’s great if that can be done, but what I think of as the norm is not answering questions but more like this: you take a question, decide that it cannot be answered straight away, and instead generate new questions that should ideally have the following two properties.

- They are probably easier than the original question.
- If they can be answered, then the original question will probably become easier.

One could call questions of that kind “splitting questions”, because in a sense they split up the original task into smaller and simpler tasks — or at least there is some chance that they do so.

What I have not quite decided is what constitutes a micro-publication. Suppose, for example, somebody has a useful comment about a question, but does not generate any new questions. Does that count? And what if somebody else, motivated by the useful comment, comes up with a good question? I think what I’ll probably want to do in a case like that is write a tiddler with the useful comment and the splitting question or questions, carefully attributing each part to the person who contributed it, with links to the relevant blog comments.

Also, I think that when someone asks a good question, I will automatically create an empty tiddler for it. So one way of working out quickly where there are loose ends that need tying up is to look for empty tiddlers. (TiddlySpace makes this easy — their titles are in italics.)

Some people may be tempted to think hard about a question and then present a fairly highly developed answer to it. If you feel this temptation, then I’d be very grateful if you could do one of the following two things.

- Resist it.
- Keep a careful record of all the questions you ask in the process of answering the original question, so that your thought processes can be properly represented on the proof-discovery tree.

By “resist it”, what I mean is not that you should avoid thinking hard about a question, but merely that each time you generate new questions, you should write up your thoughts so far in the form of blog comments, so that we get the thought process and not just the answer. The main point is that if we end up proving something interesting, then I would like it to be as clear as possible how we did it. With this project, I am at least as interested in trying to improve my understanding of the research process as I am in trying to make progress on the P versus NP problem.
