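If $\mathcal{A}$ is a union-closed family on a ground set $X$, and $Y\subset X$, then we can take the family $\mathcal{A}_Y=\{A\cap Y:A\in\mathcal{A}\}$. The map $\phi:A\mapsto A\cap Y$ is a homomorphism (in the sense that $\phi(A\cup B)=\phi(A)\cup\phi(B)$), so it makes sense to regard $\mathcal{A}_Y$ as a quotient of $\mathcal{A}$.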
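Here is a minimal Python sketch of this quotient, assuming it is obtained by intersecting every set of the family with the chosen subset of the ground set. The toy family and the helper names `restrict` and `is_union_closed` are purely illustrative.

```python
from itertools import combinations

def is_union_closed(family):
    """Return True if the union of any two members is again a member."""
    fam = set(family)
    return all((a | b) in fam for a, b in combinations(fam, 2))

def restrict(family, y):
    """Quotient family: intersect every set of the family with y."""
    y = frozenset(y)
    return {a & y for a in family}

# Hypothetical toy family: all non-empty unions of {1}, {2, 3} and {3, 4}.
family = {frozenset(s) for s in
          [(1,), (2, 3), (3, 4), (1, 2, 3), (1, 3, 4), (2, 3, 4), (1, 2, 3, 4)]}
y = frozenset({1, 3})
quotient = restrict(family, y)

# Intersecting with y commutes with taking unions, which is the
# homomorphism property, so the quotient is union closed as well.
assert all(((a | b) & y) == ((a & y) | (b & y)) for a in family for b in family)
assert is_union_closed(quotient)
print(sorted(sorted(a) for a in quotient))
```
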

If instead we take an equivalence relation on , we can define a set-system to be the set of all unions of equivalence classes that belong to .

Thus, subsets of give quotient families and quotient sets of give subfamilies.

Possibly the most obvious product construction of two families and is to make their ground sets disjoint and then to take . (This is the special case with disjoint ground sets of the construction that Tom Eccles discussed earlier.)

Note that we could define this product slightly differently by saying that it consists of all pairs with the “union” operation . This gives an algebraic system called a join semilattice, and it is isomorphic in an obvious sense to with ordinary unions. Looked at this way, it is not so obvious how one should define abundances, because does not have a ground set. Of course, we can define them via the isomorphism to but it would be nice to do so more intrinsically.

Tobias Fritz, in this comment, defines a more general “fibre bundle” construction as follows. Let be a union-closed family of sets (the “base” of the system). For each let be a union-closed family (one of the “fibres”), and let the elements of consist of pairs with . We would like to define a join operation on by

for a suitable . For that we need a bit more structure, in the form of homomorphisms whenever . These should satisfy the obvious composition rule .

With that structure in place, we can take to be , and we have something like a union-closed system. To turn it into a union-closed system one needs to find a concrete realization of this “join semilattice” as a set system with the union operation. This can be done in certain cases (see the comment thread linked to above) and quite possibly in all cases.

First, here is a simple construction that shows that Conjecture 6 from the previous post is false. That conjecture states that if you choose a random non-empty and then a random , then the average abundance of is at least 1/2. It never seemed likely to be true, but it survived for a surprisingly long time, before the following example was discovered in a comment thread that starts here.

Let be a large integer and let be disjoint sets of size and . (Many details here are unimportant — for example, all that actually matters is that the sizes of the sets should increase fairly rapidly.) Now take the set system

.

To see that this is a counterexample, let us pick our random element of a random set, and then condition on the five possibilities for what that set is. I’ll do a couple of the calculations and then just state the rest. If , then its abundance is 2/3. If it is in , then its abundance is 1/2. If it is in , then the probability that it is in is , which is very small, so its abundance is very close to 1/2 (since with high probability the only three sets it belongs to are , and ). In this kind of way we get that for large enough we can make the average abundance as close as we like to

.

One thing I would like to do (or would like someone to do) is come up with a refinement of this conjecture that isn’t so obviously false. What this example demonstrates is that, because of duplication, for the conjecture to have been true the following apparently much stronger statement would have had to be true. For each non-empty , let be the minimum abundance of any element of . Then the average of over is at least 1/2.

How can we convert the average over into the minimum over ? The answer is simple: take the original set system and write the elements of the ground set in decreasing order of abundance. Now duplicate the first element (that is, the element with greatest abundance) once, the second element $n$ times, the third $n^2$ times, and so on. For very large $n$, the effect of this is that if we choose a random element of (after the duplications have taken place) then it will, with high probability, have minimal abundance in .

So it seems that duplication of elements kills off this averaging argument too, but in a slightly subtler way. Could we somehow iterate this thought? For example, could we choose a random by first picking a random non-empty , then a random such that , and finally a random element ? And could we go further — e.g., picking a random chain of the form , etc., and stopping when we reach a set whose points cannot be separated further?

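Tobias Fritz came up with a nice strengthening that again turned out (as expected) to be false. The thought was that it might be nice to find a “bijective” proof of FUNC. Defining $\mathcal{A}_x$ to be $\{A\in\mathcal{A}:x\in A\}$ and $\mathcal{A}_{\bar x}$ to be $\{A\in\mathcal{A}:x\notin A\}$, we would prove FUNC for $\mathcal{A}$ if we could find an injection from $\mathcal{A}_{\bar x}$ to $\mathcal{A}_x$.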

For such an argument to qualify as a proper bijective proof, it is not enough merely to establish the existence of an injection — that follows from FUNC on mere grounds of cardinality. Rather, one should define it in a nice way somehow. That makes it natural to think about what properties such an injection might have, and a particularly natural requirement that one might think about is that it should preserve unions.

It turns out that there are set systems for which there does not exist any with a union-preserving injection from to . After several failed attempts, I found the following example. Take a not too small pair of positive integers — it looks as though works. Then take a Steiner -system — that is, a collection of sets of size 5 such that each set of size 3 is contained in exactly one set from . (Work of Peter Keevash guarantees that such a set system exists, though this case was known before his amazing result.)

The counterexample is generated by all complements of sets in , though it is more convenient just to take and prove that there is no intersection-preserving injection from to . To establish this, one first proves that any such injection would have to take sets of size to sets of size , which is basically because you need room for all the subsets of size of a set to map to distinct subsets of the image of . Once that is established, it is fairly straightforward to show that there just isn’t room to do things. The argument can be found in the comment linked to above, and the thread below it.

Thomas Bloom came up with a simpler example, which is interesting for other reasons too. His example is generated by the sets , all -subsets of , and the 6 sets , , , , , . I asked him where this set system had come from, and the answer turned out to be very interesting. He had got it by staring at an example of Renaud and Sarvate of a union-closed set system with exactly one minimal-sized set, which has size 3, such that that minimal set contains no element of abundance at least 1/2. Thomas worked out how the Renaud-Sarvate example had been pieced together, and used similar ideas to produce his example. Tobias Fritz then went on to show that Thomas’s construction was a special case of his fibre-bundle construction.

This post is by no means a comprehensive account of all the potentially interesting ideas from the last post. For example, Gil Kalai has an interesting slant on the conjecture that I think should be pursued further, and there are a number of interesting questions that were asked in the previous comment thread that I have not repeated here, mainly because the post has taken a long time to write and I think it is time to post it.


Something I like to think about with Polymath projects is the following question: if we end up *not* solving the problem, then what can we hope to achieve? The Erdős discrepancy problem project is a good example here. An obvious answer is that we can hope that enough people have been stimulated in enough ways that the probability of somebody solving the problem in the not too distant future increases (for example because we have identified more clearly the gap in our understanding). But I was thinking of something a little more concrete than that: I would like at the very least for this project to leave behind it an online resource that will be essential reading for anybody who wants to attack the problem in future. The blog comments themselves may achieve this to some extent, but it is not practical to wade through hundreds of comments in search of ideas that may or may not be useful. With past projects, we have developed Wiki pages where we have tried to organize the ideas we have had into a more browsable form. One thing we didn’t do with EDP, which in retrospect I think we should have, is have an official “closing” of the project marked by the writing of a formal article that included what we judged to be the main ideas we had had, with complete proofs when we had them. An advantage of doing that is that if somebody later solves the problem, it is more convenient to be able to refer to an article (or preprint) than to a combination of blog comments and Wiki pages.

With an eye to this, I thought I would make FUNC1 a data-gathering exercise of the following slightly unusual kind. For somebody working on the problem in the future, it would be very useful, I would have thought, to have a list of natural strengthenings of the conjecture, together with a list of “troublesome” examples. One could then produce a table with strengthenings down the side and examples along the top, with a tick in the table entry if the example disproves the strengthening, a cross if it doesn’t, and a question mark if we don’t yet know whether it does.

A first step towards drawing up such a table is of course to come up with a good supply of strengthenings and examples, and that is what I want to do in this post. I am mainly selecting them from the comments on the previous post. I shall present the strengthenings as statements rather than questions, so they are not necessarily true.

Let $w$ be a function from the power set of a finite set $X$ to the non-negative reals. Suppose that the weights satisfy the condition $w(A\cup B)\geq\max\{w(A),w(B)\}$ for every $A,B\subset X$ and that at least one non-empty set has positive weight. Then there exists $x\in X$ such that the sum of the weights of the sets containing $x$ is at least half the sum of all the weights.

~~Note that if all weights take values 0 or 1, then this becomes the original conjecture. It is possible that the above statement *follows* from the original conjecture, but we do not know this (though it may be known).~~

This is not a good question after all, as the deleted statement above is false. When is 01-valued, the statement reduces to saying that for every up-set there is an element in at least half the sets, which is trivial: all the elements are in at least half the sets. Thanks to Tobias Fritz for pointing this out.

Let $w$ be a function from the power set of a finite set $X$ to the non-negative reals. Suppose that the weights satisfy the condition $w(A\cup B)\geq\min\{w(A),w(B)\}$ for every $A,B\subset X$ and that at least one non-empty set has positive weight. Then there exists $x\in X$ such that the sum of the weights of the sets containing $x$ is at least half the sum of all the weights.

Again, if all weights take values 0 or 1, then the collection of sets of weight 1 is union closed and we obtain the original conjecture. It was suggested in this comment that one might perhaps be able to attack this strengthening using tropical geometry, since the operations it uses are addition and taking the minimum.

Tom Eccles suggests (in this comment) a generalization that concerns two set systems rather than one. Given set systems and , write for the union set . A family is union closed if and only if . What can we say if and are set systems with small? There are various conjectures one can make, of which one of the cleanest is the following: if and are of size and is of size at most , then there exists such that , where denotes the set of sets in that contain . This obviously implies FUNC.

Simple examples show that can be much smaller than either or — for instance, it can consist of just one set. But in those examples there always seems to be an element contained in many more sets. So it would be interesting to find a good conjecture by choosing an appropriate function to insert into the following statement: if , , and , then there exists such that .

Let be a union-closed family of subsets of a finite set . Then the average size of is at least .

This is false, as the example shows for any .

Let be a union-closed family of subsets of a finite set and suppose that *separates points*, meaning that if , then at least one set in contains exactly one of and . (Equivalently, the sets are all distinct.) Then the average size of is at least .

This again is false: see Example 2 below.

In this comment I had a rather amusing (and typically Polymathematical) experience of formulating a conjecture that I thought was obviously false in order to think about how it might be refined, and then discovering that I couldn’t disprove it (despite temporarily thinking I had a counterexample). So here it is.

As I have just noted (and also commented in the first post), very simple examples show that if we define the “abundance” of an element to be , then the average abundance does not have to be at least . However, that still leaves open the possibility that some kind of naturally defined *weighted* average might do the job. Since we want to define the weighting in terms of and to favour elements that are contained in lots of sets, a rather crude idea is to pick a random non-empty set and then a random element , and make that the probability distribution on that we use for calculating the average abundance.

A short calculation reveals that the average abundance with this probability distribution is equal to the *average overlap density*, which we define to be

where the averages are over . So one is led to the following conjecture, which implies FUNC: if is a union-closed family of sets, at least one of which is non-empty, then its average overlap density is at least 1/2.
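The "short calculation" can be checked numerically. The sketch below assumes the average overlap density is the average of |A ∩ B| / |A| over non-empty A and arbitrary B in the family, which is what the random-set-then-random-element distribution gives. The toy family and function names are illustrative only.

```python
from fractions import Fraction

def weighted_average_abundance(family):
    """Pick a random non-empty set A, then a random x in A; average the abundance of x."""
    fam = list(family)
    nonempty = [a for a in fam if a]
    total = Fraction(0)
    for a in nonempty:
        for x in a:
            abundance = Fraction(sum(x in b for b in fam), len(fam))
            total += abundance / len(a)
    return total / len(nonempty)

def average_overlap_density(family):
    """Average of |A & B| / |A| over non-empty A and all B in the family."""
    fam = list(family)
    nonempty = [a for a in fam if a]
    total = sum(Fraction(len(a & b), len(a)) for a in nonempty for b in fam)
    return total / (len(nonempty) * len(fam))

# Hypothetical toy family: the empty set, {1} and {1, 2, 3}.
family = [frozenset(), frozenset({1}), frozenset({1, 2, 3})]
print(weighted_average_abundance(family), average_overlap_density(family))
```

Both quantities come out equal on this family, as the calculation predicts. Exact fractions are used so that the comparison is not blurred by rounding.
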

A not wholly pleasant feature of this conjecture is that the average overlap density is very far from being isomorphism invariant. (That is, if you duplicate elements of , the average overlap density changes.) Initially, I thought this would make it easy to find counterexamples, but that seems not to be the case. It also means that one can give some thought to how to put a measure on that makes the average overlap density as small as possible. Perhaps if the conjecture is true, this “worst case” would be easier to analyse. (It’s not actually clear that there is a worst case — it may be that one wants to use a measure on that gives measure zero to some non-empty set , at which point the definition of average overlap density breaks down. So one might have to look at the “near worst” case.)

This conjecture comes from a comment by Igor Balla. Let be a union-closed family and let . Define a new family by replacing each by if and leaving it alone if . Repeat this process for every and the result is an *up-set* , that is, a set-system such that and implies that .

Note that each time we perform the “add if you can” operation, we are applying a bijection to the current set system, so we can compose all these bijections to obtain a bijection from to .
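A small sketch of the "add if you can" operation may make this concrete. It assumes the operation replaces a set by itself plus the chosen element exactly when the enlarged set is not already in the current family. The names `up_compress` and `is_up_set` and the toy family are illustrative, not taken from the comment.

```python
def up_compress(family, x):
    """Add x to each set whose enlargement is not already in the family."""
    fam = set(family)
    return {a if (a | {x}) in fam else a | {x} for a in fam}

def is_up_set(family, ground):
    """True if adding any single ground element to a member gives a member."""
    fam = set(family)
    return all((a | {x}) in fam for a in fam for x in ground)

# Hypothetical union-closed toy family on the ground set {1, 2}.
ground = [1, 2]
family = {frozenset(), frozenset({1}), frozenset({1, 2})}

compressed = family
for x in ground:
    compressed = up_compress(compressed, x)

# Each step is a bijection, so the number of sets is preserved.
assert len(compressed) == len(family)
assert is_up_set(compressed, ground)
print(sorted(sorted(a) for a in compressed))
```

On this family only the empty set moves (it becomes {2}), and the result is an up-set of the same size.
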

Suppose now that are distinct sets. It can be shown that there is no set such that and . In other words, is never a subset of .

Now the fact that is an up-set means that each element is in at least half the sets (since if then ). Moreover, it seems hard for too many sets in to be “far” from their images , since then there is a strong danger that we will be able to find a pair of sets and with .

This leads to the conjecture that Balla makes. He is not at all confident that it is true, but has checked that there are no small counterexamples.

**Conjecture.** Let be a set system such that there exist an up-set and a bijection with the following properties.

- For each , .
- For no distinct do we have .

Then there is an element that belongs to at least half the sets in .

The following comment by Gil Kalai is worth quoting: “Years ago I remember that Jeff Kahn said that he bet he will find a counterexample to every meaningful strengthening of Frankl’s conjecture. And indeed he shot down many of those and a few I proposed, including weighted versions. I have to look in my old emails to see if this one too.” So it seems that even to find a conjecture that genuinely strengthens FUNC without being obviously false (at least to Jeff Kahn) would be some sort of achievement. (Apparently the final conjecture above passes the Jeff-Kahn test in the following weak sense: he believes it to be false but has not managed to find a counterexample.)

If is a finite set and is the power set of , then every element of has abundance 1/2. (Remark 1: I am using the word “abundance” for the *proportion* of sets in that contain the element in question. Remark 2: for what it’s worth, the above statement is meaningful and true even if is empty.)

Obviously this is not a counterexample to FUNC, but it was in fact a counterexample to an over-optimistic conjecture I very briefly made and then abandoned while writing it into a comment.

This example was mentioned by Alec Edgington. Let be a finite set and let be an element that does not belong to . Now let consist of together with all sets of the form such that .

If , then has abundance , while each has abundance . Therefore, only one point has abundance that is not less than 1/2.

A slightly different example, also used by Alec Edgington, is to take all subsets of together with the set . If , then the abundance of any element of is while the abundance of is . Therefore, the average abundance is

When is large, the amount by which exceeds 1/2 is exponentially small, from which it follows easily that this average is less than 1/2. In fact, it starts to be less than 1/2 when (which is the case Alec mentioned). This shows that Conjecture 5 above (that the average abundance must be at least 1/2 if the system separates points) is false.
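The average can be computed by brute force for small cases. The sketch below assumes the example consists of all subsets of {1, ..., n} together with the single extra set {1, ..., n, x}. That is one reading of the description, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def edgington_family(n):
    """All subsets of {1, ..., n} plus the set {1, ..., n, 'x'} (assumed reading)."""
    ground = range(1, n + 1)
    subsets = {frozenset(c) for r in range(n + 1) for c in combinations(ground, r)}
    return subsets | {frozenset(ground) | {"x"}}

def average_abundance(family):
    """Uniform average, over the ground set, of the proportion of sets containing an element."""
    ground = frozenset().union(*family)
    total = sum(Fraction(sum(y in a for a in family), len(family)) for y in ground)
    return total / len(ground)

for n in range(1, 6):
    print(n, average_abundance(edgington_family(n)))
```

Under this reading, the printed values show how the average compares with 1/2 as n grows.
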

Let be a positive integer and take the set system that consists of the sets and . This is a simple example (or rather class of examples) of a set system for which although there is certainly an element with abundance at least 1/2 (the element has abundance 2/3), the *average* abundance is close to 1/3. Very simple variants of this example can give average abundances that are arbitrarily small — just take a few small sets and one absolutely huge set.

I will not explain these in detail, but just point you to an interesting comment by Uwe Stroinski that suggests a number-theoretic way of constructing union-closed families.

I will continue with methods of building union-closed families out of other union-closed families.

I’ll define this process formally first. Let be a set of size and let be a collection of subsets of . Now let be a collection of disjoint non-empty sets and define to be the collection of all sets of the form for some . If is union closed, then so is .

One can think of as “duplicating” the element of times. A simple example of this process is to take the set system and let and . This gives the set system of Example 3 above.
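As a rough illustration of duplication, the sketch below replaces each element of the ground set by a block of new elements and takes unions of blocks. The starting family (the empty set, {1} and {1, 2}) and the block sizes are assumptions chosen to mimic the simple example, and `duplicate` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations

def duplicate(family, blocks):
    """Replace each element x of each set by the block blocks[x].

    The blocks are assumed to be pairwise disjoint and non-empty.
    """
    return {frozenset().union(*(blocks[x] for x in a)) for a in family}

def is_union_closed(family):
    fam = set(family)
    return all((a | b) in fam for a, b in combinations(fam, 2))

def average_abundance(family):
    ground = frozenset().union(*family)
    total = sum(Fraction(sum(y in a for a in family), len(family)) for y in ground)
    return total / len(ground)

n = 10  # hypothetical size of the ground set after duplication
base = {frozenset(), frozenset({1}), frozenset({1, 2})}
blocks = {1: {1}, 2: set(range(2, n + 1))}  # element 2 is duplicated n - 1 times
big = duplicate(base, blocks)

assert is_union_closed(big)
print(average_abundance(base), average_abundance(big))
```

The duplicated system has the same lattice structure as the base, but its uniform average abundance drops because the less abundant element now carries most of the measure.
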

Let us say that if for some suitable set-valued function . And let us say that two set systems are *isomorphic* if they are in the same equivalence class of the symmetric-transitive closure of the relation . Equivalently, they are isomorphic if we can find and such that .

The effect of duplication is basically that we can convert the uniform measure on the ground set into any other probability measure (at least to an arbitrary approximation). What I mean by that is that the uniform measure on the ground set of , which is of course , gives you a probability of of landing in , so has the same effect as assigning that probability to and sticking with the set system . (So the precise statement is that we can get any probability measure where all the probabilities are rational.)

If one is looking for an averaging argument, then it would seem that a nice property that such an argument might have is (as I have already commented above) that the average should be with respect to a probability measure on that is constructed from in an isomorphism-invariant way.

It is common in the literature to outlaw duplication by insisting that separates points. However, it may be genuinely useful to consider different measures on the ground set.

Tom Eccles, in his off-diagonal conjecture, considered the set system, which he denoted by , that is defined to be . This might more properly be denoted , by analogy with the notation for sumsets, but obviously one can’t write it like that because that notation already stands for something else, so I’ll stick with Tom’s notation.

It’s trivial to see that if and are union closed, then so is . Moreover, sometimes it does quite natural things: for instance, if and are any two sets, then , where is the power-set operation.
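The union operation on set systems is easy to experiment with. This sketch assumes it consists of all unions of one set from the first family with one set from the second, and that the power-set remark says the union set of two power sets is the power set of the union of the ground sets. The names `union_set` and `power_set` are illustrative.

```python
from itertools import combinations

def power_set(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return {frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)}

def union_set(fam_a, fam_b):
    """All unions of one set from fam_a with one set from fam_b."""
    return {a | b for a in fam_a for b in fam_b}

x, y = {1, 2}, {2, 3, 4}
assert union_set(power_set(x), power_set(y)) == power_set(x | y)

# A family is union closed exactly when taking its union set with itself adds nothing.
chain = {frozenset(), frozenset({1}), frozenset({1, 2})}
not_closed = {frozenset({1}), frozenset({2})}
print(union_set(chain, chain) == chain, union_set(not_closed, not_closed) == not_closed)
```
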

Another remark is that if and are disjoint, and and , then the abundance of in is equal to the abundance of in .

I got this from a comment by Thomas Bloom. Let and be disjoint finite sets and let and be two union-closed families living inside and , respectively, and assume that and . We then build a new family as follows. Let be some function from to . Then take all sets of one of the following four forms:

- sets with ;
- sets with ;
- sets with ;
- sets with .

It can be checked quite easily (there are six cases to consider, all straightforward) that the resulting family is union closed.

Thomas Bloom remarks that if consists of all subsets of and consists of all subsets of , then (for suitable ) the result is a union-closed family that contains no set of size less than 3, and also contains a set of size 3 with no element of abundance greater than or equal to 1/2. This is interesting because a simple argument shows that if is a set with two elements in a union-closed family then at least one of its elements has abundance at least 1/2.

Thus, this construction method can be used to create interesting union-closed families out of boring ones.

Thomas discusses what happens to abundances when you do this construction, and the rough answer is that elements of become less abundant but elements of become quite a lot more abundant. So one can’t just perform this construction a few times and end up with a counterexample to FUNC. However, as Thomas also says, there is plenty of scope for modifying this basic idea, and maybe good things could flow from that.

I feel as though there is much more I could say, but this post has got quite long, and has taken me quite a long time to write, so I think it is better if I just post it. If there are things I wish I had mentioned, I’ll put them in comments and possibly repeat them in my next post.

I’ll close by remarking that I have created a wiki page. At the time of writing it has almost nothing on it but I hope that will change before too long.


A less serious problem is what acronym one would use for the project. For the density Hales-Jewett problem we went for DHJ, and for the Erdős discrepancy problem we used EDP. That general approach runs into difficulties with Frankl’s union-closed conjecture, so I suggest FUNC. This post, if the project were to go ahead, could be FUNC0; in general I like the idea that we would be engaged in a funky line of research.

The problem, for anyone who doesn’t know, is this. Suppose you have a family $\mathcal{A}$ that consists of $n$ distinct subsets of a set $X$. Suppose also that it is *union closed*, meaning that if $A,B\in\mathcal{A}$, then $A\cup B\in\mathcal{A}$ as well. Must there be an element of $X$ that belongs to at least $n/2$ of the sets? This seems like the sort of question that ought to have an easy answer one way or the other, but it has turned out to be surprisingly difficult.
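For anyone who wants to experiment, here is a minimal Python sketch that checks union-closedness and computes, for each element, the proportion of sets containing it. The toy family and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def is_union_closed(family):
    """Return True if the union of any two members is again a member."""
    fam = set(family)
    return all((a | b) in fam for a, b in combinations(fam, 2))

def abundances(family):
    """Map each ground-set element to the proportion of sets containing it."""
    fam = set(family)
    ground = frozenset().union(*fam)
    return {x: Fraction(sum(x in a for a in fam), len(fam)) for x in ground}

# Hypothetical toy family: all unions (including the empty one) of {1}, {2, 3}, {3, 4}.
family = {frozenset(s) for s in
          [(), (1,), (2, 3), (3, 4), (1, 2, 3), (1, 3, 4), (2, 3, 4), (1, 2, 3, 4)]}

assert is_union_closed(family)
abund = abundances(family)
best = max(abund, key=abund.get)
print(best, abund[best])
```

Here the element 3 lies in six of the eight sets, so this family certainly satisfies the conjecture.
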

If you are potentially interested, then one good thing to do by way of preparation is look at this survey article by Henning Bruhn and Oliver Schaudt. It is very nicely written and seems to be a pretty comprehensive account of the current state of knowledge about the problem. It includes some quite interesting reformulations (interesting because you don’t just look at them and see that they are trivially equivalent to the original problem).

For the remainder of this post, I want to discuss a couple of failures. The first is a natural idea for generalizing the problem to make it easier that completely fails, at least initially, but can perhaps be rescued, and the second is a failed attempt to produce a counterexample. I’ll present these just in case one or other of them stimulates a useful idea in somebody else.

An immediate reaction of any probabilistic combinatorialist is likely to be to wonder whether in order to prove that there *exists* a point in at least half the sets it might be easier to show that in fact an *average* point belongs to half the sets.

Unfortunately, it is very easy to see that that is false: consider, for example, the three sets , , and . The average (over ) of the number of sets containing a random element is , but there are three sets.

However, this example doesn't feel like a genuine counterexample somehow, because the set system is just a dressed up version of : we replace the singleton by the set and that's it. So for this set system it seems more natural to consider a *weighted* average, or equivalently to take not the uniform distribution on , but some other distribution that reflects more naturally the properties of the set system at hand. For example, we could give a probability 1/2 to the element 1 and to each of the remaining 12 elements of the set. If we do that, then the average number of sets containing a random element will be the same as it is for the example with the uniform distribution (not that the uniform distribution is obviously the most natural distribution for that example).
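To make the comparison concrete, here is a sketch under stated assumptions: the three sets are taken to be the empty set, {1} and {1, ..., n} for a hypothetical n, and the weighted distribution puts probability 1/2 on the element 1 and spreads the other 1/2 equally over the remaining elements.

```python
from fractions import Fraction

def expected_count(family, dist):
    """Expected number of sets containing an element drawn from dist."""
    return sum(p * sum(x in a for a in family) for x, p in dist.items())

n = 10  # hypothetical size of the large set
family = [frozenset(), frozenset({1}), frozenset(range(1, n + 1))]

uniform = {x: Fraction(1, n) for x in range(1, n + 1)}
weighted = {1: Fraction(1, 2)}
weighted.update({x: Fraction(1, 2 * (n - 1)) for x in range(2, n + 1)})

print(expected_count(family, uniform))   # well below half of the three sets
print(expected_count(family, weighted))  # exactly half of the three sets
```

With the weighted distribution the expected count no longer depends on n, which is the sense in which the large set is just a dressed-up singleton.
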

This suggests a very slightly more sophisticated version of the averaging-argument idea: does there *exist* a probability distribution on the elements of the ground set such that the expected number of sets containing a random element (drawn according to that probability distribution) is at least half the number of sets?

With this question we have in a sense the opposite problem. Instead of the answer being a trivial no, it is a trivial yes — if, that is, the union-closed conjecture holds. That’s because if the conjecture holds, then some belongs to at least half the sets, so we can assign probability 1 to that and probability zero to all the other elements.

Of course, this still doesn’t feel like a complete demolition of the approach. It just means that for it not to be a trivial reformulation we will have to put *conditions* on the probability distribution. There are two ways I can imagine getting the approach to work. The first is to insist on some property that the distribution is required to have that means that its existence does *not* follow easily from the conjecture. That is, the idea would be to prove a stronger statement. It seems paradoxical, but as any experienced mathematician knows, it can sometimes be easier to prove a stronger statement, because there is less room for manoeuvre. In extreme cases, once a statement has been suitably strengthened, you have so little choice about what to do that the proof becomes almost trivial.

A second idea is that there might be a nice way of defining the probability distribution in terms of the set system. This would be a situation rather like the one I discussed in my previous post, on entropy and Sidorenko’s conjecture. There, the basic idea was to prove that a set had cardinality at least by proving that there is a probability distribution on with entropy at least . At first, this seems like an unhelpful idea, because if then the uniform distribution on will trivially do the job. But it turns out that there is a different distribution for which it is easier to *prove* that it does the job, even though it usually has lower entropy than the uniform distribution. Perhaps with the union-closed conjecture something like this works too: obviously the best distribution is supported on the set of elements that are contained in a maximal number of sets from the set system, but perhaps one can construct a different distribution out of the set system that gives a smaller average in general but about which it is easier to prove things.

I have no doubt that thoughts of the above kind have occurred to a high percentage of people who have thought about the union-closed conjecture, and can probably be found in the literature as well, but it would be odd not to mention them in this post.

To finish this section, here is a wild guess at a distribution that does the job. Like almost all wild guesses, its chances of being correct are very close to zero, but it gives the flavour of the kind of thing one might hope for.

Given a finite set and a collection of subsets of , we can pick a random set (uniformly from ) and look at the events for each . In general, these events are correlated.

Now let us define a matrix by . We could now try to find a probability distribution on that minimizes the sum . That is, in a certain sense we would be trying to make the events as uncorrelated as possible on average. (There may be much better ways of measuring this — I’m just writing down the first thing that comes into my head that I can’t immediately see is stupid.)

What does this give in the case of the three sets , and ? We have that if or or and . If and , then , since if , then is one of the two sets and , with equal probability.

So to minimize the sum we should choose so as to maximize the probability that and . If , then this probability is , which is maximized when , so in fact we get the distribution mentioned earlier. In particular, for this distribution the average number of sets containing a random point is , which is precisely half the total number of sets. (I find this slightly worrying, since for a successful proof of this kind I would expect equality to be achieved only in the case that you have disjoint sets and you take all their unions, including the empty set. But since this definition of a probability distribution isn’t supposed to be a serious candidate for a proof of the whole conjecture, I’m not too worried about being worried.)

Just to throw in another thought, perhaps some entropy-based distribution would be good. I wondered, for example, about defining a probability distribution as follows. Given any probability distribution, we obtain weights on the sets by taking to be the probability that a random element (chosen from the distribution) belongs to . We can then form a probability distribution on by taking the probabilities to be proportional to the weights. Finally, we can choose a distribution on the elements to maximize the entropy of the distribution on .

If we try that with the example above, and if is the probability assigned to the element 1, then the three weights are and , so the probabilities we will assign will be and . The entropy of this distribution will be maximized when the two non-zero probabilities are equal, which gives us , so in this case we will pick out the element 1. It isn’t completely obvious that that is a bad thing to do for this particular example — indeed, we will do it whenever there is an element that is contained in all the non-empty sets from . Again, there is virtually no chance that this rather artificial construction will work, but perhaps after a lot of thought and several modifications and refinements, something like it could be got to work.

I find the non-example I’m about to present interesting because I don’t have a good conceptual understanding of why it fails — it’s just that the numbers aren’t kind to me. But I think there *is* a proper understanding to be had. Can anyone give me a simple argument that no construction that is anything like what I tried can possibly work? (I haven’t even checked properly whether the known positive results about the problem ruled out my attempt before I even started.)

The idea was as follows. Let and be parameters to be chosen later, and let be a random set system obtained by choosing each subset of of size with probability , the choices being independent. We then take as our attempted counterexample the set of all unions of sets in .

Why might one entertain even for a second the thought that this could be a counterexample? Well, if we choose to be rather close to , but just slightly less, then a typical pair of sets of size have a union of size close to , and more generally a typical union of sets of size has size at least this. There are vastly fewer sets of size greater than than there are of size , so we could perhaps dare to hope that almost all the sets in the set system are the ones of size , so the average size is close to , which is less than . And since the sets are spread around, the elements are likely to be contained in roughly the same number of sets each, so this gives a counterexample.

Of course, the problem here is that although a typical union is large, there are many atypical unions, so we need to get rid of them somehow — or at least the vast majority of them. This is where choosing a random subset comes in. The hope is that if we choose a fairly sparse random subset, then all the unions will be large rather than merely almost all.

However, this introduces a new problem, which is that if we have passed to a *sparse* random subset, then it is no longer clear that the size of that subset is bigger than the number of possible unions. So it becomes a question of balance: can we choose small enough for the unions of those sets to be typical, but still large enough for the sets of size to dominate the set system? We’re also free to choose of course.

I usually find when I’m in a situation like this, where I’m hoping for a miracle, that a miracle doesn’t occur, and that indeed seems to be the case here. Let me explain my back-of-envelope calculation.

I’ll write for the set of unions of sets in . Let us now take and give an upper bound for the expected number of sets in of size . So fix a set of size and let us give a bound for the probability that . We know that must contain at least two sets in . But the number of pairs of sets of size contained in is at most and each such pair has a probability of being a pair of sets in , so the probability that is at most . Therefore, the expected number of sets in of size is at most .

As for the expected number of sets in , it is , so if we want the example to work, we would very much like it to be the case that when , we have the inequality

.

We can weaken this requirement by observing that the expected number of sets in of size is also trivially at most , so it is enough to go for

.

If the left-hand side is not just greater than the right-hand side, but greater by a factor of say for each , then we should be in good shape: the average size of a set in will be not much greater than and we’ll be done.

If is not much bigger than , then things look quite promising. In this case, will be comparable in size to , but will be quite small — it equals , and is small. A crude estimate says that we’ll be OK provided that is significantly smaller than . And that looks OK, since is a lot smaller than , so we aren’t being made to choose a ridiculously small value of .

If on the other hand is quite a lot larger than , then is much much smaller than , so we’re in great shape as long as we haven’t chosen so tiny that is also much much smaller than .

So what goes wrong? Well, the problem is that the first argument requires smaller and smaller values of as gets further and further away from , and the result seems to be that by the time the second regime takes over, has become too small for the trivial argument to work.

Let me try to be a bit more precise about this. The point at which becomes smaller than is of course the point at which . For that value of , we require , so we need . However, an easy calculation reveals that

,

(or observe that if you multiply both sides by , then both expressions are equal to the multinomial coefficient that counts the number of ways of writing an -element set as with and ). So unfortunately we find that however we choose the value of there is a value of such that the number of sets in of size is greater than . (I should remark that the estimate for the number of sets in of size can be improved to , but this does not make enough of a difference to rescue the argument.)
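The obstruction can be checked numerically. In the sketch below the symbols are my own labels, since the formulas are not visible here: `n` is the size of the ground set, `k` the size of the randomly chosen sets, `m` the size of a union, and `p` the selection probability. Under that reading, the identity in question is C(n,m)·C(m,k) = C(n,k)·C(n−k,m−k), and at the crossover value p = 1/C(m,k) the requirement p·C(n,k) > C(n,m) becomes C(n,k) > C(n,m)·C(m,k).

```python
from math import comb

def identity_holds(n, k, m):
    """Both sides count ways to split an n-set into parts of size k, m-k, n-m."""
    return comb(n, m) * comb(m, k) == comb(n, k) * comb(n - k, m - k)

def requirement_at_crossover(n, k, m):
    """At p = 1/comb(m, k), p*comb(n, k) > comb(n, m) is equivalent to this."""
    return comb(n, k) > comb(n, m) * comb(m, k)

n = 20  # arbitrary small ground-set size for illustration
triples = [(n, k, m) for k in range(1, n) for m in range(k + 1, n + 1)]
print(all(identity_holds(*t) for t in triples))
print(any(requirement_at_crossover(*t) for t in triples))
```

Since C(n−k,m−k) is always at least 1, the identity forces C(n,m)·C(m,k) ≥ C(n,k), so the requirement can never hold at the crossover, whatever the parameters.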

So unfortunately it turns out that the middle of the range is worse than the two ends, and indeed worse by enough to kill off the idea. However, it seemed to me to be good to make at least some attempt to find a counterexample in order to understand the problem better.

From here there are two obvious ways to go. One is to try to modify the above idea to give it a better chance of working. The other, which I have already mentioned, is to try to generalize the failure: that is, to explain why that example, and many others like it, had no hope of working. Alternatively, somebody could propose a completely different line of enquiry.

I’ll stop there. Experience with Polymath projects so far seems to suggest that, as with individual projects, it is hard to predict how long they will continue before there is a general feeling of being stuck. So I’m thinking of this as a slightly tentative suggestion, and if it provokes a sufficiently healthy conversation and interesting new (or at least new to me) ideas, then I’ll write another post and launch a project more formally. In particular, only at that point will I call it Polymath11 (or should that be Polymath12? — I don’t know whether the almost instantly successful polynomial-identities project got round to assigning itself a number). Also, for various reasons I don’t want to get properly going on a Polymath project for at least a week, though I realize I may not be in complete control of what happens in response to this post.

Just before I finish, let me remark that Polymath10, attempting to prove Erdős’s sunflower conjecture, is still continuing on Gil Kalai’s blog. What’s more, I think it is still at a stage where a newcomer could catch up with what is going on — it might take a couple of hours to find and digest a few of the more important comments. But Gil and I agree that there may well be room to have more than one Polymath project going at the same time, since a common pattern is for the group of participants to shrink down to a smallish number of “enthusiasts”, and there are enough mathematicians to form many such groups.

And a quick reminder, as maybe some people reading this will be new to the concept of Polymath projects. The aim is to try to make the problem-solving process easier in various ways. One is to have an open discussion, in the form of blog posts and comments, so that anybody can participate, and with luck a process of self-selection will take place that results in a team of enthusiastic people with a good mixture of skills and knowledge. Another is to encourage people to express ideas that may well be half-baked or even wrong, or even completely *obviously* wrong. (It’s surprising how often a completely obviously wrong idea can stimulate a different idea that turns out to be very useful. Naturally, expressing such an idea can be embarrassing, but it shouldn’t be, as it is an important part of what we do when we think about problems privately.) Another is to provide a mechanism where people can get very quick feedback on their ideas — this too can be extremely stimulating and speed up the process of thought considerably. If you like the problem but don’t feel like pursuing either of the approaches I’ve outlined above, that’s of course fine — your ideas are still welcome and may well be more fruitful than those ones, which are there just to get the discussion started.


The proof is very simple. For each , let be the characteristic function of the neighbourhood of . That is, if is an edge and otherwise. Then is the sum of the degrees of the , which is the number of edges of , which is . If we set , then this tells us that . By the Cauchy-Schwarz inequality, it follows that .

But by the Cauchy-Schwarz inequality again,

That last expression is times the number of quadruples such that all of and are edges, and our previous estimate shows that it is at least . Therefore, the probability that a random such quadruple consists entirely of edges is at least , as claimed (since there are possible quadruples to choose from).
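As a sanity check on the inequality just proved, here is a small Python sketch that counts the quadruples directly. The representation (a 0/1 matrix with rows indexed by one vertex class and columns by the other) and the function names are my own choices for illustration.

```python
from itertools import product
import random

def edge_density(adj):
    """Fraction of (row, column) pairs that are edges."""
    nx, ny = len(adj), len(adj[0])
    return sum(map(sum, adj)) / (nx * ny)

def four_cycle_density(adj):
    """Probability that a random quadruple (x, x', y, y') spans four edges.

    For fixed x, x' the number of valid (y, y') is the squared codegree.
    """
    nx, ny = len(adj), len(adj[0])
    total = 0
    for x, x2 in product(range(nx), repeat=2):
        codegree = sum(adj[x][y] * adj[x2][y] for y in range(ny))
        total += codegree ** 2
    return total / (nx ** 2 * ny ** 2)

rng = random.Random(0)  # fixed seed so the example is reproducible
adj = [[1 if rng.random() < 0.3 else 0 for _ in range(7)] for _ in range(5)]
delta = edge_density(adj)
print(four_cycle_density(adj) >= delta ** 4)
```

Degenerate quadruples (with x = x' or y = y') are included in the count, as they are in the argument above.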

Essentially the same proof applies to a weighted bipartite graph. That is, if you have some real weights for each of the edges, and if the average weight is , then

.

(One can also generalize this statement to complex weights if one puts in appropriate conjugates.) One way of thinking of this is that the sum on the left-hand side goes down if you apply an averaging projection to — that is, you replace all the values of by their average.

Thus, an appropriate weighted count of 4-cycles is minimized over all systems of weights with a given average when all the weights are equal.

Sidorenko’s conjecture is the statement that the same is true for any bipartite graph one might care to count. Or to be more precise, it is the statement that if you apply the averaging projection to a graph (rather than a weighted graph), then the count goes down: I haven’t checked whether the conjecture is still reasonable for weighted graphs, though standard arguments show that if it is true then it will still be true for weighted graphs if the weights are between 0 and 1, and hence (by rescaling) for all non-negative weights.

Here is a more formal statement of the conjecture.

**Conjecture** *Let be a bipartite graph with vertex sets and . Let the number of edges of be . Let be a bipartite graph of density with vertex sets and and let and be random functions from to and from to . Then the probability that is an edge of for every pair such that is an edge of is at least .*
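For small graphs the probability in the conjecture can be computed by brute force. The sketch below is only an illustration: the function name, the edge-list encoding of the small graph and the 0/1-matrix encoding of the large graph are all assumptions of mine. It is tried on the path of length 3, a case that is proved later in the post.

```python
from itertools import product

def sidorenko_probability(h_edges, h_left, h_right, g_adj):
    """Probability that random maps send every edge of H to an edge of G.

    h_edges: list of (u, v) with u in range(h_left), v in range(h_right).
    g_adj:   0/1 matrix, rows = one vertex class of G, columns = the other.
    """
    nx, ny = len(g_adj), len(g_adj[0])
    good = total = 0
    for phi in product(range(nx), repeat=h_left):
        for psi in product(range(ny), repeat=h_right):
            total += 1
            if all(g_adj[phi[u]][psi[v]] for u, v in h_edges):
                good += 1
    return good / total

# Path of length 3: left vertices {0, 1}, right vertices {0, 1}.
path3 = [(0, 0), (1, 0), (1, 1)]
g = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
delta = sum(map(sum, g)) / 9
prob = sidorenko_probability(path3, 2, 2, g)
print(prob, delta ** len(path3))
```

The running time is exponential in the number of vertices of the small graph, so this is useful only for experimenting with very small cases.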

This feels like the kind of statement that is either false with a simple counterexample or true with a simple proof. But this impression is incorrect.

My interest in the problem was aroused when I taught a graduate-level course in additive combinatorics and related ideas last year, and set a question that I wrongly thought I had solved, which is Sidorenko’s conjecture in the case of a path of length 3. That is, I asked for a proof or disproof of the statement that if has density , then the number of quadruples such that and are all edges is at least . I actually thought I had a counterexample, which I didn’t, as this case of Sidorenko’s conjecture is a theorem that I shall prove later in this post.

I also tried to prove the statement using the Cauchy-Schwarz inequality, but nothing I did seemed to work, and eventually I filed it away in my mind under the heading “unexpectedly hard problem”.

At some point around then I heard about a paper of Balázs Szegedy that used entropy to prove all (then) known cases of Sidorenko’s conjecture as well as a few more. I looked at the paper, but it was hard to extract from it a short easy proof for paths of length 3, because it is concerned with proving as general a result as possible, which necessitates setting up quite a lot of abstract language. If you’re like me, what you really need in order to understand a general result is (at least in complicated cases) not the actual proof of the result, but more like a proof of a special case that is general enough that once you’ve understood that you know that you could in principle think hard about what you used in order to extract from the argument as general a result as possible.

In order to get to that point, I set the paper as a mini-seminar topic for my research student Jason Long, and everything that follows is “joint work” in the sense that he gave a nice presentation of the paper, after which we had a discussion about what was actually going on in small cases, which led to a particularly simple proof for paths of length 3, which works just as well for all trees, as well as a somewhat more complicated proof for 4-cycles. (These proofs are entirely due to Szegedy, but we stripped them of all the abstract language, the result of which, to me at any rate, is to expose what is going on and what was presumably going on in Szegedy’s mind when he proved the more general result.) Actually, this doesn’t explain the whole paper, even in the sense just described, since we did not discuss a final section in which Szegedy deals with more complicated examples that don’t follow from his first approach, but it does at least show very clearly how entropy enters the picture.

I think it is possible to identify three ideas that drive the proof for paths of length 3. They are as follows.

1. Recall that the *entropy* of a probability distribution on a finite set is . By Jensen’s inequality, this is maximized for the uniform distribution, when it takes the value . It follows that one way of proving that is to identify a probability distribution on with entropy greater than .
2. That may look like a mad idea at first, since if anything is going to work, the uniform distribution will. However, it is not necessarily a mad idea, because there might in principle be a non-uniform distribution with entropy that was much easier to calculate than that of the uniform distribution.
3. If is a graph, then there is a probability distribution, on the set of quadruples of vertices of such that and are all edges, that is not in general uniform and that has very nice properties.

The distribution mentioned in 3 is easy to describe. You first pick an edge of uniformly at random (from all the edges of ). Note that the probability that is the first vertex of this edge is proportional to the degree of , and the probability that is the end vertex is proportional to the degree of .

Having picked the edge you then pick uniformly at random from the neighbours of , and having picked you pick uniformly at random from the neighbours of . This guarantees that is a path of length 3 (possibly with repeated vertices, as we need for the statement of Sidorenko’s conjecture).

The beautiful property of this distribution is that the edges and are identically distributed. This follows from the fact that has the same distribution as and is a random neighbour of , which means that has the same distribution as and is a random neighbour of .
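The claim about identically distributed edges is easy to verify exactly on a small example. In this sketch the graph, the adjacency-list encoding and the function names are hypothetical choices of mine. "Edge" means ordered edge, so each undirected edge is counted in both directions.

```python
from fractions import Fraction
from collections import defaultdict

def path_distribution(neighbours):
    """Exact distribution of (x, y, z, w): uniform ordered edge xy,
    then z uniform among neighbours of y, then w uniform among neighbours of z."""
    m = sum(len(ns) for ns in neighbours.values())  # number of ordered edges
    dist = {}
    for x, ns in neighbours.items():
        for y in ns:
            for z in neighbours[y]:
                for w in neighbours[z]:
                    dist[(x, y, z, w)] = (Fraction(1, m)
                                          * Fraction(1, len(neighbours[y]))
                                          * Fraction(1, len(neighbours[z])))
    return dist

def edge_marginal(dist, i):
    """Marginal distribution of the i-th edge of the path (i = 0, 1, 2)."""
    marginal = defaultdict(Fraction)
    for path, pr in dist.items():
        marginal[(path[i], path[i + 1])] += pr
    return dict(marginal)

# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
NEIGHBOURS = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
dist = path_distribution(NEIGHBOURS)
for i in range(3):
    print(i, set(edge_marginal(dist, i).values()))
```

Each of the three marginals assigns the same probability to every ordered edge, even though the graph is far from regular and the distribution on paths is not uniform.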

Now let us turn to point 2. The only conceivable advantage of using a non-uniform distribution and an entropy argument is that if we are lucky then the entropy will be easy to calculate and will give us a good enough bound to prove what we want. The reason this stands a chance of being true is that entropy has some very nice properties. Let me briefly review those.

It will be convenient to talk about entropy of random variables rather than probability distributions: if is a random variable on a finite set , then its entropy is , where I have written as shorthand for .

We also need the notion of *conditional* entropy . This is : that is, it is the expectation of the entropy of if you are told the value of .

If you think of the entropy of as the average amount of information needed to specify the value of , then is the average amount *more* information you need to specify if you have been told the value of . A couple of extreme cases are that if and are independent, then (since the distribution of is the same as the distribution of for each ), and if is a function of , then (since is concentrated at a point for each ). In general, we also have the formula

where is the entropy of the joint random variable . Intuitively this says that to know the value of you need to know the value of and then you need to know any extra information needed to specify . It can be verified easily from the formula for entropy.

Now let us calculate the entropy of the distribution we have defined on labelled paths of length 3. Let , , and be random variables that tell us where and are. We want to calculate . By a small generalization of the formula above — also intuitive and easy to check formally — we have

Now the nice properties of our distribution come into play. Note first that if you know that , then and are independent (they are both random neighbours of ). It follows that . Similarly, given the value of , is independent of the pair , so .

Now for each let be the degree of and let . Then , and , so

,

and because the edges all have the same distribution, we get the same answer for and .

As for the entropy , it is equal to

.

Putting all this together, we get that

.

By Jensen’s inequality, the second term is minimized if for every , and in that case we obtain

.

If the average degree is , then , which gives us

The moral here is that once one uses the nice distribution and calculates its entropy, the calculations follow very easily from the standard properties of conditional entropy, and they give exactly what we need — that the entropy must be at least , from which it follows that the number of labelled paths of length 3 must be at least .
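The whole calculation can be checked numerically on a small graph. The sketch below uses my own notation, which may differ from the original formulas: `n` is the number of vertices, `m` is the sum of the degrees (the number of ordered edges), and the density is taken to be m/n², so that the target bound is log(m³/n²). The graph and the function names are hypothetical.

```python
import math

def path_entropy(neighbours):
    """Entropy of the non-uniform distribution on labelled paths x, y, z, w."""
    m = sum(len(ns) for ns in neighbours.values())
    h = 0.0
    for x, ns in neighbours.items():
        for y in ns:
            for z in neighbours[y]:
                for w in neighbours[z]:
                    pr = 1.0 / (m * len(neighbours[y]) * len(neighbours[z]))
                    h -= pr * math.log(pr)
    return h

def closed_form(neighbours):
    """log m + 2 * sum_x (d(x)/m) log d(x), from the chain rule."""
    m = sum(len(ns) for ns in neighbours.values())
    return math.log(m) + 2 * sum(
        len(ns) / m * math.log(len(ns)) for ns in neighbours.values())

def count_paths(neighbours):
    """Number of labelled paths of length 3, repeated vertices allowed."""
    return sum(len(neighbours[z])
               for ns in neighbours.values() for y in ns for z in neighbours[y])

# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
NEIGHBOURS = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
n = len(NEIGHBOURS)
m = sum(len(ns) for ns in NEIGHBOURS.values())
print(math.log(m ** 3 / n ** 2), path_entropy(NEIGHBOURS),
      math.log(count_paths(NEIGHBOURS)))
```

The three printed numbers should be in increasing order: the Jensen lower bound, the entropy of the distribution, and the logarithm of the number of labelled paths.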

Now I’ll prove the conjecture for 4-cycles. In a way this is a perverse thing to do, since as we have already seen, one can prove this case easily using Cauchy-Schwarz. However, the argument just given generalizes very straightforwardly to trees (and therefore forests), which makes the 4-cycle the simplest case that we have not yet covered, since it is the simplest bipartite graph that contains a cycle. Also, it is quite interesting that there should be a genuinely different proof for the 4-cycles case that is still natural.

What we would like for the proof to work is a distribution on the 4-cycles (as usual we do not insist that these vertices are distinct) such that the marginal distribution of each edge is uniform amongst all edges of . A natural guess about how to do this is to pick a random edge , pick a random neighbour of , and then pick a random from the intersection of the neighbourhoods of and . But that gives the wrong distribution on . Instead, when we pick we need to choose it according to the distribution we have on paths (that is, choose a random edge and then let be a random neighbour of ) and then condition on and . Note that for fixed and that is exactly the distribution we already have on : once that is pointed out, it becomes obvious that this is the only reasonable thing to do, and that it will work.

So now let us work out the entropy of this distribution. Again let be the (far from independent) random variables that tell us where the vertices go. Then we can write down a few equations using the usual rule about conditional entropy and exploiting independence where we have it.

From the second and third equations we get that

,

and substituting this into the first gives us

As before, we have that

and

.

Therefore,

.

As for the term , we now use the trivial upper bound . If the average degree is , then

.

The remaining part is minimized when every is equal to , in which case it gives us , so we end up with a lower bound for the entropy of , exactly as required.
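For reference, here is one way the chain of equations for the 4-cycle can be written out. This is my own reconstruction under assumed notation, not necessarily the formulas as originally displayed: $X,Y,Z,W$ are the four vertices in cyclic order, $d(x)$ is the degree of $x$, $m=\sum_x d(x)$, the graph has $n$ vertices and average degree $\delta n$, so that $m=\delta n^2$.

```latex
\begin{align*}
H(X,Y,Z,W) &= H(X,Y,Z) + H(W \mid X,Z)
  && \text{($W$ is independent of $Y$ given $X,Z$)} \\
H(X,Y,Z)   &= H(X,Z) + H(Y \mid X,Z) \\
H(W \mid X,Z) &= H(Y \mid X,Z)
  && \text{($W$ and $Y$ have the same conditional law)} \\[4pt]
\Longrightarrow\quad
H(X,Y,Z,W) &= 2H(X,Y,Z) - H(X,Z) \\
           &= 2\log m + 2\sum_x \frac{d(x)}{m}\log d(x) - H(X,Z) \\
           &\ge 2\log(\delta n^2) + 2\log(\delta n) - 2\log n
            = \log(\delta^4 n^4).
\end{align*}
```

The second-to-last line uses $H(X,Y,Z)=H(X)+2H(Y\mid X)=\log m+\sum_x \frac{d(x)}{m}\log d(x)$, and the last line uses the trivial bound $H(X,Z)\le\log n^2$ together with Jensen's inequality.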

This kind of argument deals with a fairly large class of graphs — to get the argument to work it is necessary for the graph to be built up in a certain way, but that covers many cases of the conjecture.


First, I’ll just briefly say that things are going well with the new journal Discrete Analysis, and I think we’re on course to launch, as planned, early next year with a few very good accepted papers — we certainly have a number of papers in the pipeline that look promising to me. Of course, we’d love to have more.

Secondly, a very interesting initiative has recently been started by Martin Eve, called the Open Library of Humanities. The rough idea is that they provide a platform for humanities journals that are free to read online and free for authors (or, as some people like to say, are Diamond OA journals). Perhaps the most interesting aspect of this initiative is that it is funded by a consortium of libraries. Librarians are the people who feel the pain of ridiculous subscription prices, so they have great goodwill towards people who are trying to build new and cheaper publication models. I think there is no reason that the sciences couldn’t do something similar — in fact, it should be even easier to find money.

The OLH is actively encouraging existing humanities journals to move to their platform, which brings me to the third event I wanted to mention: the resignation of the editorial board of the Elsevier journal Lingua, which is in linguistics. The story in brief is that the editors made demands of Elsevier that were both reasonable and unreasonable: reasonable in the sense that they would be fine if we had a sane publication system, but unreasonable in the sense that it was quite obvious that Elsevier wouldn’t agree to them. They wanted to become an open access journal with publication fees of $400, way below the usual rate for an Elsevier journal. Since Elsevier owns the title, Lingua has now become its Greek counterpart Glossa — or, if you look at it Elsevier’s way, an entirely new journal has been founded called Glossa with an editorial board that has an entirely coincidental resemblance to what was until very recently the editorial board of Lingua, and it just happens also that the future editorial board of Lingua will be disjoint from what was recently the editorial board of Lingua. A nice term has been coined for what Lingua (that is, the Elsevier version) is about to become: a zombie journal. Maybe it will go the way of another famous zombie journal, Topology, the soul of which entered a new body called the Journal of Topology, and which staggered on for a couple of years before being put out of its misery. Here is an article about the Lingua story, which includes some priceless quotes from the managing editor. And here is Elsevier’s response, which is as facepalmish as usual. For example, at one point they say the following, which needs no comment from me.

> Lingua is a hybrid open access journal which means that every author who wants to publish open access (i.e., free-of-charge for the reader), can do so. However, we have observed little uptake of the open access option in Lingua or elsewhere in linguistics at price points that would be economically viable.

The Open Library of Humanities will be helping to support Glossa.

Lastly, there is a story brewing at the LMS, which made the decision to close one of its journals, the LMS Journal of Computation and Mathematics, which has been going since 1998. Somebody with a paper submitted to the journal told me that he received an email saying the following.

> Dear [TITLE LAST-NAME],
>
> I am writing with news that may have a bearing on your consideration of publishing your article in the LMS JCM, ‘[TITLE OF THE PAPER]’, by [FIRST-NAME LAST-NAME]
>
> As you may be aware, the LMS Journal of Computation and Mathematics has been running for some years as a ‘free’ journal and the costs of publishing the journal have been borne by the London Mathematical Society. From the outset, it was intended that the journal should progress to at least break even and, for a few years, it ran as a subscription journal but did not manage to acquire sufficient support from libraries to cover the costs of subscription management. Over the last few years, we have been considering how to best get the journal to a satisfactory and successful state and, last Friday, the LMS Council (whose members are the Officers and Trustees of the London Mathematical Society) considered the LMS Publications Committee’s proposal for the JCM, which included moving the journal to a gold open access model.
>
> However, the LMS Council did not accept the proposal, and decided instead that the journal should be closed, one reason being that it felt the move to a gold open access model would likely lead to a slow decline that could be more damaging to its reputation. Council felt that the general area of computation and mathematics was one that the Society should, in the long run, continue to be present in, but thought that there were probably better ways to use its resources in this direction. Of course the Society will continue to make the papers already published available in perpetuity.
>
> While we are happy to continue the process of publication of your paper, we are giving all authors yet to be published the opportunity to withdraw their papers. We will continue to publish any papers still in the pipeline providing you are willing to continue.
>
> If you wish to withdraw your paper, please let us know and we will do this on your behalf. If you do not wish to withdraw your paper, no further action is necessary on your part.

Not too surprisingly, this has annoyed a lot of people. The following letter, with many signatures, has been sent to the LMS Council to urge them to reverse the decision.

> In accordance with Statute 19 of the LMS Charter and Statutes, we, members of the LMS, make a requisition to convene a Special General Meeting of the Society; the object of the meeting shall be the reversal of the LMS Council’s decision to close down The LMS Journal of Computation and Mathematics.
>
> The Council’s decision to close the Journal seems to conflict with the public benefit statement of the Trustees’ Annual Report. Moreover, closing The LMS Journal of Computation and Mathematics may be at odds with the charitable aims of the LMS as spelled out in its Charter. Indeed, Article 3 of the Charter says:
>
> “The objects for which the Society is incorporated shall be: […]
>
> (vi) To *make grants of money* or donations in aid of mathematical investigations or *the publication of mathematical works* [our emphasis] or other matters or things for the purpose of promoting invention and research in mathematical science, or its applications, or in subjects connected therewith; […]”
>
> We trust that our requisition will be treated in line with Statute 19 of the LMS Charter and Statutes:
>
> “19. The Council shall within twenty-eight days of the receipt of a requisition in writing of not less than twenty Members of the Society stating the objects for which the meeting is desired convene a General Meeting of the Society. If upon a requisition the Council fails to convene a Special General Meeting within twenty-eight days of a receipt of the requisition then a Special General Meeting to be held within three months of the expiration of the said period of twenty-eight days may be convened by the President or the requisitionists.”

The LMS Journal of Computation and Mathematics is an electronic journal, so very cheap to run. Perhaps the LMS feels that to run a cheap journal at a small loss sets a dangerous precedent, given that it depends so heavily on the income it gets from its journals. But some sort of line has surely been crossed when a mathematical society closes down a journal that is successful mathematically on the grounds that it is insufficiently successful economically.


The problem, as with many problems in combinatorics, is easy to state, but fascinatingly hard to solve. It is a classic extremal problem, in that it asks how big some combinatorial object needs to be before it is guaranteed to contain a subobject of some particular kind. In this case, the object is a –*uniform hypergraph*, which just means a collection of sets of size . The subobject one would like to find is a *sunflower* of size , which means a collection of sets such that we can find disjoint sets with the disjoint and with for each . I have used the letters and to stand for “head” and “petal” — is the head of the sunflower and are the petals.

How many sets of size do you need to guarantee a sunflower of size ? A simple argument gives not too bad an upper bound of . We argue as follows. Let be a collection of sets, each of size . If we can find disjoint sets in then we are done (with an empty head and the petals being the sets themselves). Otherwise, let be the union of a maximal disjoint collection of sets in , and note that has cardinality at most , and that every set in has a non-empty intersection with .

By the pigeonhole principle, we can find and a subcollection of containing at least sets, such that every set contains .

Now remove from all the sets in to create a collection of sets of size . By induction, this collection contains a sunflower of size , and putting back the element then gives us a sunflower in . (The base case, when , states that sets are enough to guarantee a sunflower of size .)

How about a lower bound? An easy one is obtained as follows. Let be disjoint sets of size and let consist of all sets that intersect each in exactly one place. There are such sets, and there cannot be a sunflower, since there is not enough room in each to make it possible to have disjoint petals.
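The lower-bound construction is small enough to test by brute force. In the sketch below I assume the usual parameters, which are not visible in the text here: the sets have size `k`, the sunflower sought has size `r`, and the construction uses `k` disjoint blocks of size `r - 1`, giving `(r - 1) ** k` sets. The function names are hypothetical.

```python
from itertools import combinations, product

def is_sunflower(sets):
    """True if every pairwise intersection equals the common intersection."""
    head = frozenset.intersection(*sets)
    return all((a & b) == head for a, b in combinations(sets, 2))

def has_sunflower(family, r):
    """Brute-force search for a sunflower consisting of r distinct sets."""
    return any(is_sunflower(c) for c in combinations(family, r))

def lower_bound_family(k, r):
    """All sets meeting each of k disjoint blocks of size r - 1 exactly once."""
    blocks = [[(i, j) for j in range(r - 1)] for i in range(k)]
    return [frozenset(choice) for choice in product(*blocks)]

family = lower_bound_family(3, 3)
print(len(family), has_sunflower(family, 3))
```

The search is exponential in the size of the family, so it is meant only as a check on tiny cases such as this one.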

The main question of interest is the dependence on : for fixed , does the correct bound grow exponentially (like the lower bound just given), or more like , or like something in between? Even when , the first non-trivial case, the answer is not known (though there are bounds known that improve on the simple ones I’ve just given).

For more information, as well as a discussion about how a homological approach might be useful, see Gil’s post.

At the time of writing, Gil’s post has attracted 60 comments, but it is still at what one might call a warming-up stage, so if you are interested in the problem and understand what I have written above, it should still be easy to catch up with the discussion. I strongly recommend contributing — even small remarks can be very helpful for other people, sparking off ideas that they might not have had otherwise. And there’s nothing quite like thinking about a problem, writing regular bulletins of the little ideas you’ve had, and getting feedback on them from other Polymath participants. This problem has the same kind of notoriously hard feel about it that the Erdős discrepancy problem had — it would be wonderful if a Polymath collaboration could contribute to its finally getting solved.

If you have comments specific to what I’ve written above, such as to point out typos or inaccuracies, then by all means write them here, but if you have mathematical thoughts about the problem, please write them on Gil’s blog.


This post is therefore the final post of the polymath5 project. I refer you to Terry’s posts for the mathematics. I will just make a few comments about what all this says about polymath projects in general.

After the success of the first polymath project, which found a purely combinatorial proof of the density Hales-Jewett theorem, there was an appetite to try something similar. However, the subsequent experience made it look as though the first project had been rather lucky, and not necessarily a good indication of what the polymath approach will typically achieve. I started polymath2, about a Banach-space problem, which never really got off the ground. Gil Kalai started polymath3, on the polynomial Hirsch conjecture, but the problem was not solved. Terence Tao started polymath4, about finding a deterministic algorithm to output a prime between and , which did not find such an algorithm but did prove some partial results that were interesting enough to publish in an AMS journal called Mathematics of Computation. I started polymath5, with the aim of solving the Erdős discrepancy problem (after this problem was chosen by a vote from a shortlist that I drew up), and although we had some interesting ideas, we did not solve the problem. The most obviously successful polymath project was polymath8, which aimed to bring down the size of the gap in Zhang’s prime-gaps result, but it could be argued that success for that project was guaranteed in advance: it was obvious that the gap could be reduced, and the only question was how far.

Actually, that last argument is not very convincing, since a lot more came out of polymath8 than just a tightening up of the individual steps of Zhang’s argument. But I want to concentrate on polymath5. I have always felt that that project, despite not solving the problem, was a distinct success, because by the end of it I, and I was not alone, understood the problem far better and in a very different way. So when I discussed the polymath approach with people, I described its virtues as follows: a polymath discussion tends to go at lightning speed through all the preliminary stages of solving a difficult problem — trying out ideas, reformulating, asking interesting variants of the question, finding potentially useful reductions, and so on. With some problems, once you’ve done all that, the problem is softened up and you can go on and solve it. With others, the difficulties that remain are still substantial, but at least you understand far better what they are.

In the light of what has now happened, the second case seems like a very accurate description of the polymath5 project, since Terence Tao used ideas from that project in an essential way, but also recent breakthroughs in number theory by Kaisa Matomäki and Maksim Radziwiłł that led on to work by those authors and Terry himself that led on to the averaged form of the Elliott conjecture that Terry has just proved. Thus, if the proof of the Erdős discrepancy problem in some sense requires these ideas, then there was no way we could possibly have hoped to solve the problem back in 2010, when polymath5 was running, but what we did achieve was to create a sort of penumbra around the problem, which had the effect that when these remarkable results in number theory became available, the application to the Erdős discrepancy problem was significantly easier to spot, at least for Terence Tao …

I’ll remark here that the approach to the problem that excited me most when we were thinking about it was a use of duality to reduce the problem to an existential statement: you “just” have to find a function with certain properties and you are done. Unfortunately, finding such a function proved to be extremely hard. Terry’s work proves abstractly that such a function exists, but doesn’t tell us how to construct it. So I’m left feeling that perhaps I was a bit too wedded to that duality approach, though I also think that it would still be very nice if someone managed to make it work.

There are a couple of other questions that are interesting to think about. The first is whether polymath5 really did play a significant role in the discovery of the solution. Terry refers to the work of polymath5, but one of the key polymath5 steps he uses was contributed by him, so perhaps he could have just done the whole thing on his own.

At the very least I would say that polymath5 got him interested in the problem, and took him quickly through the stage I talked about above of looking at it from many different angles. Also, the Fourier reduction argument that Terry found was a sort of response to observations and speculations that had taken place in the earlier discussion, so it seems likely that in some sense polymath5 played a role in provoking Terry to have the thoughts he did. My own experience of polymath projects is that they often provoke me to have thoughts I wouldn’t have had otherwise, even if the relationship between those thoughts and what other people have written is very hard to pin down — it can be a bit like those moments where someone says A, and then you think of B, which appears to have nothing to do with A, but then you manage to reconstruct your daydreamy thought processes to see that A made you think of C, which made you think of D, which made you think of B.

Another question is what should happen to polymath projects that don’t result in a solution of the problem that they are trying to solve, but do have useful ideas. Shouldn’t there come a time when the project “closes” and the participants (and others) are free to think about the problem individually? I feel strongly that there should, since otherwise there is a danger that a polymath project could actually delay progress on a problem by discouraging research on it. With polymath5 I tried to signal such a “closure” by writing a survey article that was partly about the work of polymath5. And Terry has now written up his work as an individual author, but been careful to say which ingredients of his proof were part of the polymath5 discussion and which were new. That seems to me to be exactly how things should work, but perhaps the lesson for the future is that the closing of a polymath project should be done more explicitly: up to now several of them have just quietly died. I had at one time intended to do rather more than what I did in the survey article, and write up, on behalf of polymath5 and published under the polymath name, a proper paper that would contain the main ideas discovered by polymath5 with full proofs. That would have been a better way of closing the project and would have led to a cleaner situation, since Terry could have referred to that paper just as anyone refers to a mathematical paper. But while I regret not getting round to that, I don’t regret it too much, because I also quite like the idea that polymath5’s ideas are freely available on the internet but not in the form of a traditional journal article. (I still think that on balance it would have been better to write up the ideas though.)

Another lesson for the future is that it would be great to have some more polymath projects. We now know that Polymath5 has accelerated the solution of a famous open problem. I think we should be encouraged by this and try to do the same for several other famous open problems, but this time with the idea that as soon as the discussion stalls, the project will be declared to be finished. Gil Kalai has said on his blog that he plans to start a new project: I hope it will happen soon. And at some point when I feel slightly less busy, I would like to start one too, on another notorious problem with an elementary statement. It would be interesting to see whether a large group of people thinking together could find anything new to say about, for example, Frankl’s union-closed conjecture, or the asymptotics of Ramsey numbers, or the cap-set problem, or …


Part of the motivation for starting the journal is, of course, to challenge existing models of academic publishing and to contribute in a small way to creating an alternative and much cheaper system. However, I hope that in due course people will get used to this publication model, at which point the fact that Discrete Analysis is an arXiv overlay journal will no longer seem interesting or novel, and the main interest in the journal will be the mathematics it contains.

The members of the editorial board so far — but we may well add further people in the near future — are Ernie Croot, me, Ben Green, Gil Kalai, Nets Katz, Bryna Kra, Izabella Laba, Tom Sanders, Jozsef Solymosi, Terence Tao, Julia Wolf, and Tamar Ziegler. For the time being, I will be the managing editor. I interpret this as meaning that I will have the ultimate responsibility for the smooth running of the journal, and will have to do a bit more work than the other editors, but that decisions about journal policy and about accepting or rejecting papers will be made democratically by the whole editorial board. (For example, we had quite a lot of discussion, including a vote, about the title, and the other editors have approved this blog post after suggesting a couple of minor changes.)

I will write the rest of this post as a series of questions and answers.

The members of the editorial board all have an interest in additive combinatorics, but they also have other interests that may be only loosely related to additive combinatorics. So the scope of the journal is best thought of as a cluster of related subjects that cannot easily be pinned down with a concise definition, but that can be fairly easily recognised. (Wittgenstein refers to this kind of situation as a family resemblance.) Some of the subjects we will welcome in the journal are harmonic analysis, ergodic theory, topological dynamics, growth in groups, analytic number theory, combinatorial number theory, extremal combinatorics, probabilistic combinatorics, combinatorial geometry, convexity, metric geometry, and the more mathematical side of theoretical computer science. The phrase “discrete analysis” was coined by Ben Green when he wanted a suitable name for a seminar in Cambridge: despite its oxymoronic feel, it is in fact a good description of many parts of mathematics where the structures being studied are discrete, but the tools are analytical in character. (A particularly good example is the use of discrete Fourier analysis to solve combinatorial problems in number theory.)

We do not want the journal to be a fully general mathematical journal, but we do want it to be broad. If you are in doubt about whether the subject matter of your paper is suitable, then feel free to consult an editor. We will try to err on the side of inclusiveness.

No. This journal is what some people call a *diamond* open access journal: there are no charges for readers (obviously, since the papers are on the arXiv), and no charges for authors.

The software for managing the refereeing process will be provided by Scholastica, an outfit that was set up a few years ago by some graduates from the University of Chicago with the aim of making it very easy to create electronic journals. However, the look and feel of Discrete Analysis will be independent: the people at Scholastica are extremely helpful, and one of the services they provide is a web page designed to the specifications you want, with a URL that does not contain the word “scholastica”. Scholastica does charge for this service — a whopping $10 per submission. (This should be compared with typical article processing charges of well over 100 times this from more conventional journals.) Cambridge University has kindly agreed to provide a small grant to the journal, which means that we will be able to cover the cost of the first 500 or so submissions. I am confident that by the time we have had that many submissions, we will be able to find additional funding. The absolute worst that could happen is that in a few years’ time, we will have to ask people to pay an amount roughly equal to the cost of a couple of beers to submit a paper, but it is unlikely that we will ever have to charge anything.

Whatever happens, this journal will demonstrate the following important principle: if you trust authors to do their own typesetting and copy-editing to a satisfactory standard, with the help of suggestions from referees, then the cost of running a mathematics journal can be at least two orders of magnitude lower than the cost incurred by traditional publishers. In theory, this offers a way out of the current stranglehold that the publishers have over us: if enough universities set up enough journals at these very modest costs, then we will have an alternative and much cheaper publication system up and running, and it will look more and more pointless to submit papers to the expensive journals, which will save the universities huge amounts of money. Just to drive the point home, the cost of submitting an article from the UK to the Journal of the London Mathematical Society is, if you want to use their open-access option, £2,310. If Discrete Analysis gets 50 submissions per year (which is more than I would expect to start with), then this single article processing charge would cover our costs for well over five years.
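To make the comparison concrete, here is a rough sketch of the arithmetic. The $10 fee, the 50 submissions per year and the £2,310 charge are the figures quoted above; the exchange rate of 1.5 dollars to the pound is purely my illustrative assumption, and the function name is made up for this example.

```python
def years_covered(apc_gbp: float, usd_per_gbp: float,
                  fee_per_submission_usd: float, submissions_per_year: int) -> float:
    """Years of submission fees that a single article processing charge would pay for."""
    apc_usd = apc_gbp * usd_per_gbp
    annual_cost_usd = fee_per_submission_usd * submissions_per_year
    return apc_usd / annual_cost_usd


# ASSUMPTION: 1.5 USD per GBP is an illustrative rate, not a figure from the post.
ASSUMED_USD_PER_GBP = 1.5

years = years_covered(apc_gbp=2310, usd_per_gbp=ASSUMED_USD_PER_GBP,
                      fee_per_submission_usd=10, submissions_per_year=50)
print(f"One such charge would cover about {years:.1f} years of submissions")
```

Under that assumed rate the answer comes out at a little under seven years; a different rate changes the number but not the order of magnitude.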

Furthermore, even these modest costs could have been lower. We happened to have funds that allowed us to use Scholastica’s facilities, and decided to do that, but another possibility would have been the Episciences platform, which has been specifically designed for the setting up of overlay journals, and which does not charge anything. It is still in its very early stages, but it already has two mathematics journals (which existed before and migrated to the Episciences platform), and it would be very good to see more. Another possibility that some people might find it worth considering is Open Journal Systems, though that requires a degree of technical skill that I for one do not possess, whereas setting up a journal with Scholastica has been extremely easy, and I think using the Episciences platform would be easy as well.

Could a malevolent person — let us call him or her the Evil Seer — bankrupt the journal by submitting 1000 computer-generated papers? Is it reasonable for us to be charged $10 for instantly rejecting a two-page proof of the Riemann hypothesis that uses nothing more than high-school algebra? I have taken this up with Scholastica, and they have told me that in such cases we just need to tell them and will not be charged.

Yes. As already mentioned, the articles will be peer-reviewed in the traditional way. There will also be a numbering system for the articles, so that when they are cited, they look like journal articles rather than “mere” arXiv preprints. They will be exclusive to Discrete Analysis. They will have DOIs, and the journal will have an ISSN. Whether the journal will at some point have an impact factor I do not know, but I hope that most people who consider submitting to it will in any case have a healthy contempt for impact factors. We will adhere to the “best practice” as set out in MathSciNet’s Policy on Indexing Electronic Journals, so our articles should be listed there and on Zentralblatt — we are in the process of checking whether this will definitely happen.

No. Another example is SIGMA (Symmetry, Integrability and Geometry: Methods and Applications), though as well as giving arXiv links it hosts its own copies of its articles. And another, which is a mathematically oriented computer science journal, is Logical Methods in Computer Science. I would guess that there are several others that I am unaware of. But one can at least say that Discrete Analysis is an early adopter of the arXiv overlay model.

The current plan is that people are free to submit articles immediately, via a temporary website that has been set up for the purpose. We hope that we will be able to process a few good papers quickly, which will allow us to have an official launch of the journal in early 2016 with some articles already published.

It is difficult to be precise about this, especially before we have received any submissions. However, broadly speaking, we would like to publish genuinely interesting papers in the areas described above. So if you have proved a result that you think is likely to interest the editors, then please consider Discrete Analysis for it. We would like the journal to be consistently interesting, but we do not want to set the standard so high that we do not publish anything.

It would be a pity to exclude the editors from the journal, given that their areas of research are by definition suitable for it. Our policy will be to allow editors to be authors, but to apply slightly more rigorous standards to submissions from editors. In practice, that will mean that in borderline cases a paper will be at a disadvantage if one of its authors is an editor. It goes without saying that editors will be completely excluded from the discussion of any paper that might lead to a conflict of interest. Scholastica’s software makes it very easy to do this.

We have not (yet) discussed the question of whether I as *managing* editor should be allowed to submit to the journal, but I shall probably follow the policy of many reputable journals and avoid doing so (albeit with some regret) and send any papers that would have been suitable to other journals with publication models that I want to support.

An obvious partial answer to this question is that the list of links on our journal website will be a list of certificates that certain arXiv preprints have been peer reviewed and judged to be of a suitable standard for Discrete Analysis. Thus, it will provide information that the arXiv alone does not provide.

However, we intend to do slightly more than this. For each paper, we will give not just a link, but also a short description. This will be based on the abstract and introduction, and on any further context that one of our editors or referees may be able to give us. The advantage of this is that it will be possible to browse the journal and get a good idea of what it contains, without having to keep clicking back and forth to arXiv preprints. In this way, we hope to make visiting the Discrete Analysis home page a worthwhile experience.

Another thing we will be able to do with these descriptions is post links to newer versions of the articles. If an author wishes to update an article after it has been published, we will provide two links: one to the “official” version (that is, not the first submitted version, but the “final” version that takes into account comments by the referee), and one to the new updated version, with a brief summary of what has changed.

The mathematical community is now sufficiently dependent on the arXiv that it is very unlikely that the arXiv will fold, and if it does then there will be greater problems than the fate of Discrete Analysis. However, in this hypothetical situation, we will download all the articles accepted by Discrete Analysis, as well as those still under review, and find another way of hosting them. Note that articles posted to the arXiv are automatically uploaded to HAL as well, so one possibility would be simply to change the arXiv links to HAL links. As for Scholastica, they perform regular backups of all their data, so even if their main site were to be wiped out, all the information concerning their journals would be recoverable. In short, barring a catastrophic failure of the entire internet, articles published in Discrete Analysis will be secure and permanent.

The editors have widely differing views about these sorts of ideas. For now, we are taking a cautious approach, trying to make the journal as conventional as possible so as to maximize its chances of becoming successful. If at some point in the future we decide to experiment with newer methods of peer review, we shall continue to be cautious, and will always give authors the chance to opt out of them.

First, post it on the arXiv, selecting one of the CC-BY options when it asks you which licence you want to use (this is important for ensuring that the journal complies with the open-access requirements of various funding bodies, but if you have already posted the article under a more restrictive licence, you can always use a CC-BY licence for the version that is revised in the light of comments from referees). Then go to the journal’s temporary website, click on the red “Submit Manuscript” button in the top right-hand corner, and follow the simple instructions.

Not everybody reads blogs, so one way that you can support the journal is to bring it to the attention of anybody you know who might conceivably have a suitable paper for it. The sooner we can build up an initial list of interesting papers, the sooner the journal can become established, and the sooner the cheap arXiv overlay model can start competing with the expensive traditional models of publication.


As we have always said, the party with the most votes and the most seats in this election has the first right to seek to form a Government. The British people would rightly question the legitimacy of a coalition that didn’t allow the party with the largest number of seats and votes the opportunity to attempt to form a Government first.

I’m proud that the Liberal Democrats have proved we can form a strong and stable coalition government, able to bring prosperity to Britain.

Just like we would not put UKIP in charge of Europe, we are not going to put the SNP in charge of Britain – a country they want to rip apart.

The current projections at Five Thirty-Eight put the Conservatives on 281 seats, Labour on 268, the Scottish Nationalists on 49 and the Liberal Democrats on 26. If these are correct, then Clegg is saying that he will try first to form a Government with the Conservatives. I claim that this is inconsistent with all four of the fundamental Liberal values I mentioned.

It is obviously inconsistent with wanting to promote a broadly centre-left political programme. From what Ed Miliband is saying, it seems unlikely that there will be a formal coalition between Labour and the SNP. However, there does seem to be room for a looser agreement, since the SNP would hate to be responsible for there being another Conservative government, as Nicola Sturgeon has made very clear. Furthermore, Sturgeon has also been clear that she will not press for another referendum on Scottish independence, so there is no reason to suppose that a loose alliance between Labour and the SNP would be a threat to the UK. Thus, Clegg’s choice is between supporting a right wing party or a centre-left alliance of two parties.

The right-wing party is flirting dangerously with leaving the European Union: David Cameron’s official position is that he wants to renegotiate the treaty and then campaign to stay inside a reformed Europe. He has not said what he will do if, as is almost inevitable, he fails to reform the EU, but it is hard to see how he could campaign to remain inside an EU that has humiliated him by refusing his demands for reform. Labour and the SNP, by contrast, are committed to staying in the EU (unless it changes radically) and will not hold a referendum. Yet Clegg says that he will attempt to form a government with the right-wing party, which, I might add, is also full of climate-change deniers, as right-wing parties tend to be.

What is Clegg’s rationale for this? He talks about democratic legitimacy, and here his views are utterly inconsistent with the basic principles that lie behind arguments for voting reform. One of the strongest arguments is that under the current system if you have two parties with broadly similar views, they can split the vote and be heavily penalized, giving power to a much less representative party. And yet that is exactly what Clegg, in effect, supports. The great irony of his position is that by saying that he will support the largest party, he is advocating a first-past-the-post system for forming a government. If the Conservatives get the most seats but are greatly outnumbered in the House of Commons by people of a broadly centre-left persuasion, Clegg claims that a centre-left alliance will nevertheless lack democratic legitimacy. Has he forgotten why he argued against the first-past-the-post system?

To get a full idea of how wrong his position is, let’s imagine a different scenario. Suppose that Labour and the Conservatives had a very similar number of seats and the Lib Dems held the balance of power. According to Clegg, the Lib Dems should form a coalition with whichever of Labour and the Conservatives have the most seats. But isn’t he forgetting something? What about the political preferences of Lib Dem voters? Do they count for nothing? The democratically legitimate option is to choose the major party that best represents the interests of Lib Dem voters, since then the largest number of voters get roughly what they voted for.

I have been sufficiently loyal to the party to forgive it for some pretty dreadful mistakes over the last few years, such as killing off any hope of voting reform in my lifetime, and breaking their promises about tuition fees — I put these down to naivety resulting from inexperience with coalitions. But there is no excuse this time, and Clegg’s basic principles about coalition-building are simply wrong. It may be that he will try but fail to form a government with the Conservatives and end up in just the kind of alliance I would like to see. But he will be wrong even to try.

It will be difficult not to vote for Julian Huppert, our MP for the last five years, who has been excellent and independent-minded (for instance, he voted against tuition fees). I do not want to punish him for the sins of Nick Clegg. But I care even more about the values that have led me to vote Liberal in the past, and it now seems to me that every seat that Labour can pick up from the Lib Dems increases the chances of those values being promoted in the House of Commons. If any card-carrying Lib Dems want to try to persuade me otherwise in the comments below, they will be most welcome to do so.


Of course, any change will have to be in the direction of making the deal less generous for those with pensions. Indeed, changes have already been made. Until a few years ago, the amount you got at the end was based on your final salary. More precisely, you got one 80th of your final salary per year after retirement for each year that you contributed to the scheme, up to a maximum of 40 years of contributions (and thus a maximum of half your final salary when you retire). But a few years ago they closed this final-salary scheme to new entrants, because (they said) it had become too expensive. This was partly because now a much larger proportion of academics end up as professors, so their final salaries are higher, and also of course because people live for longer.

They now propose to close the final-salary scheme even for existing participants. That of course raises the question of what happens to the contributions we have already made to the scheme. If the USS really can’t afford to keep going with the present arrangements, it is perhaps reasonable to say that we cannot continue to make contributions under those arrangements, but our past contributions were made under the very clear understanding that each year of contributions would add one 80th of our final salary to our eventual annual pension payments. Will that still be the case?

I received a letter from the USS yesterday that included the following reassuring paragraph.

As an active member of the Final Salary section of the scheme, you would be affected by the proposed changes. Under the proposals, the pension benefits provided to you in the future would be different to those that are currently provided through the scheme. It is important to note that the pension rights you have already earned are protected by law and in the scheme rules; the proposed changes will only affect the pension benefits that you will be able to build up in the future if the changes are implemented as proposed.

Does this mean, then, that the pension I have already built up is safe? No, it decidedly doesn’t. If you received a similar letter and were reassured by the above paragraph, then please unreassure yourself, since it is hiding the fact that you stand to lose a *lot* of money (the precise amount depending on your circumstances — I will discuss this later in the post).

The key to how this can be lies in a paragraph from a leaflet that I received with the letter. It says the following.

If you are a member of the current final salary section, the benefits you have built up — your accrued benefits — will be calculated using your pensionable salary and pensionable service immediately prior to the implementation date. Going forward, those accrued benefits will be revalued in line with increases in official pensions (currently the Consumer Prices Index — CPI) each April, up to the point of retirement or leaving the scheme.

In plain language, they are saying that for each year of contributions that you have made to the scheme, you will now earn one 80th of your salary *at the time that the changes to the scheme are implemented* and not at the time that you retire. So if, say, you are in mid career and your final salary ends up 25% higher than your current salary, then what you will get for your contributions so far will be reduced by 20%. (The difference between those two percentages is because if you increase a number by 25%, then to get back to the original number you have to decrease the new number by 20%.)
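For anyone who wants to try other figures, the relationship between the two percentages can be sketched as follows. The function name is hypothetical, and the sketch ignores inflation and the CPI revaluation mentioned in the leaflet.

```python
def accrued_benefit_reduction(salary_growth: float) -> float:
    """Fractional cut in accrued benefits if the final salary would have been
    (1 + salary_growth) times the salary at the implementation date."""
    return 1 - 1 / (1 + salary_growth)


# A final salary 25% above the current one means a 20% cut in accrued benefits.
print(f"{accrued_benefit_reduction(0.25):.0%}")
```

If your final salary would have been double your current one, the same function gives a cut of one half.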

Let’s illustrate this USS-style with a few hypothetical examples. I will ignore inflation, but it is straightforward to adjust for it.

1. Alice is a historian. She was appointed 19 years ago, when she was in her late 20s. Since then, she has had two children, which caused a temporary drop in her academic productivity, but she has made up for it since, and her career is going well. She has just become a reader, and is told that she is very likely to become a professor in the next two or three years. Her current salary is £56,482 per year and will be £58,172 next year.

Looking into the future, she does indeed become a professor, in 2018, and starts two notches up from the bottom of the professorial salary scale, at £71,506. Looking further into the future, she ends up at the top of Band 1 of the professorial scale, with a salary of £85,354 (plus inflationary increases).

Unfortunately for her, the changes to the scheme are implemented before she is promoted, so the 20 years of contributions that she has by then amassed earn her 20/80, or a quarter, of her reader’s salary of £58,172, per year. That is, they earn her £14,543 per year. (This is not her total pension, just the part of her pension that results from the contributions she has made so far.) Had the scheme not been changed, those contributions would have instead earned her a quarter of her final salary of £85,354, which would work out as £21,338.50 per year. So she has lost nearly £7,000 per year from her pension as a result of the changes. She is destined to live for 25 years after she retires, so her loss works out as roughly £170,000.

2. Bob is also a historian and a good friend of Alice. He was appointed at the same time, is the same age, and has had a very similar career, but he has progressed slightly earlier because he did not have a period of low academic productivity. He became a reader three years ago and will become a professor later this year, starting two notches above the bottom salary level, at £71,506. He too is destined to end his career at the top of professorial Band 1 with a salary of £85,354.

Under the new scheme, his pension contributions up to the time of the change will earn him a quarter of £71,506 per year, or £17,876.50. Under the current scheme, they would have earned him £21,338.50 per year, just as Alice’s would, since their final salaries are destined to be the same. So Bob too has lost out.

However, Bob was luckier than Alice because he was promoted just before the change to the system, as a result of which his salary at the time of the change will be substantially higher than that of Alice. Even though Alice will be promoted soon afterwards, she will end up much worse off than Bob, to the tune of £3,333.50 per year.

3. Carl is a mathematician. He proved some very good results in his early 30s and was promoted to professor at the age of 38. He too has put in 20 years of contributions by the time of the changes, by which time he is at the top of Band 1 with a salary of £85,354. Unfortunately, soon after he became a professor, he burnt out somewhat, never quite matching the achievements of his youth, so his salary is not going to increase any further. So for him the changes to the system make no difference: his current salary is his final salary. As with both Alice and Bob, under the current system his contributions would earn him £21,338.50 per year. But for Carl they will earn him £21,338.50 per year under the new system as well.
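The three examples can be checked with a few lines of code. This is only a sketch under the simplifications used above: inflation is ignored, each person has exactly 20 years of contributions at the time of the change, and the accrual rate is one 80th per year. The function and variable names are mine, not anything from the USS.

```python
ACCRUAL_DENOMINATOR = 80   # one 80th of salary per year of contributions
YEARS_CONTRIBUTED = 20     # assumed the same for all three people


def accrued_pension(salary: float, years: int = YEARS_CONTRIBUTED) -> float:
    """Annual pension earned by `years` of contributions, based on `salary`."""
    return salary * years / ACCRUAL_DENOMINATOR


# (salary at the time of the change, final salary), both in pounds per year
people = {
    "Alice": (58_172, 85_354),
    "Bob": (71_506, 85_354),
    "Carl": (85_354, 85_354),
}

for name, (salary_at_change, final_salary) in people.items():
    proposed = accrued_pension(salary_at_change)   # frozen at the change date
    current = accrued_pension(final_salary)        # based on final salary
    print(f"{name}: proposed £{proposed:,.2f}, current £{current:,.2f}, "
          f"annual loss £{current - proposed:,.2f}")
```

With these figures a quarter of £85,354 is £21,338.50, so Alice loses £6,795.50 per year, Bob loses £3,462.00 per year, and Carl loses nothing.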

There are two general points I want to make with these examples. The first is that the changes amount to the breaking of an agreement. We were not obliged to take out a pension with USS, but were told that it was crazy not to do so because the payout was based on our final salary. I started my pension late (out of sheer stupidity, but that’s another story) and decided that at considerable expense (because there was not an accompanying employers’ contribution) I would make additional voluntary contributions. When I was deciding to do this, it was explained to me that each year I bought would add one 80th of my final salary to my pension. I am on a salary scale and have not reached the top of it, so if the USS make the proposed changes then they will be reneging on that agreement.

Is this legal? Here again is what they said.

It is important to note that the pension rights you have already earned are protected by law and in the scheme rules; the proposed changes will only affect the pension benefits that you will be able to build up in the future if the changes are implemented as proposed.

A lot depends on what is meant by “the pension rights you have already earned”. I would understand that to mean my final salary multiplied by the number of years I have contributed to the scheme divided by 80, since that is what I was told I would be getting for the money I have paid in so far. However, I think it may be that in law what I have already earned is what I could take away if I left the scheme now, which would be based on my current salary, and that part of “building up in the future” is sticking around in Cambridge while my salary increases. If anybody knows the answer to this legal question, I would be very interested. I have tried to find out by looking at the Pension Schemes Act 1993, and in particular Chapter 4, but it is pretty impenetrable. (Lawyers often claim that this impenetrability is necessary in order to avoid ambiguity, but in this instance it seems to have the opposite effect.)

But even if it turns out that it is not illegal for USS to interpret “the pension rights you have already earned” in this way, it is quite clearly immoral: it is a straightforward breaking of the terms of the agreement I had with them when I decided to take out a USS pension and make additional voluntary contributions. And of course I am far from alone in this respect. I personally don’t expect my final salary to be all that much higher than my current salary, so I probably won’t lose too much, but people whose final salaries are likely to be a lot higher than their current salaries will lose hugely.

The second point is that the way the USS has decided to share out the pain hugely exacerbates unfairnesses that are already present in the system. It is not fair that scientists are typically promoted much earlier than those in the humanities. In many cases it is not fair when men are promoted earlier than women. But at least those who were promoted more slowly could console themselves with the thought that they would probably catch up eventually, and that their pensions would therefore be comparable. If the changes come into effect, then as the examples above illustrate, if two people are in mid career at the time of the changes and are destined to reach the same final salary, but one has been promoted more than the other at the time of the changes, then the first person will end up not just with all that extra salary as at present but also with a substantially higher pension.

There is a mathematical point to make here that applies to many different policies. It is very wrong if the effect of the policy does not depend roughly continuously on somebody’s circumstances. But if you belong to the final-salary section and are up for promotion soon, you had better hope that you get promoted just before the change rather than just after it, since the accumulated difference it will make to your pension will be very large, even though the difference to your career progression will be small.

If all this bothers you, please do two things. First, alert your colleagues to what is going on and to what is wrong with it. Secondly, consider signing a petition that has been set up to oppose the changes.

**Update.** There are two further points that have come to my attention that mean that the situation is worse than I described it. The first is that I forgot to mention the lump sum that one receives on retirement. This is worth three times one’s annual pension, so for each of Alice, Bob and Carl, what they stand to lose from the lump sum under the new system is three quarters of the difference between their current salary and their final salary. Thus, Alice loses around £21,000 from her lump sum, while Carl loses nothing from his.

However, it turns out that Carl is not quite as fortunate as I claimed above, owing to a further consideration that I did not know about, which is that academic salaries tend to rise faster than inflation. I don’t mean that the salary of any one individual rises faster as a result of salary increments. I mean that if you take the salary at a fixed place in the salary scale, then that tends to rise faster than inflation. So although Carl will remain on the same point at the top of Band 1 for the rest of his career, his salary is likely to be significantly higher in real terms when he retires than it is now. I am told that it is quite usual for salaries to go up by at least 1% more than inflation, so in 20 years’ time this could make a big difference. This second consideration makes the situation worse for Alice and Bob by the same amount that it does for Carl.


I’ll start with the case $k=1$. I want to phrase a familiar argument in a way that will make it easy to generalize in the way I want to generalize it. Let $R$ be the rectangle consisting of all integer points $(x,y)$ such that $1\leq x\leq n$ and $1\leq y\leq n+1$. We can partition $R$ into those points for which $y\leq x$ and those points for which $y>x$. The number of points of the first kind is $1+2+\dots+n$, since for each $x$ we get $x$ possibilities for $y$. The number of points of the second kind is $1+2+\dots+n$, since for each $y$ we get $y-1$ possibilities for $x$. Therefore, $2(1+2+\dots+n)=n(n+1)$ and we get the familiar formula.
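This first counting argument is easy to check by brute force. The sketch below assumes the rectangle has one coordinate running from 1 to n and the other from 1 to n+1; those dimensions and the name `split_rectangle` are my own choices for illustration rather than anything fixed by the argument.

```python
def split_rectangle(n):
    """Count lattice points (x, y) with 1 <= x <= n and 1 <= y <= n + 1
    (assumed dimensions), split by whether y <= x or y > x."""
    points = [(x, y) for x in range(1, n + 1) for y in range(1, n + 2)]
    lower = sum(1 for x, y in points if y <= x)
    upper = sum(1 for x, y in points if y > x)
    return lower, upper

for n in range(1, 8):
    lower, upper = split_rectangle(n)
    # Both halves have size 1 + 2 + ... + n, and together they fill the rectangle.
    assert lower == upper == sum(range(1, n + 1))
    assert lower + upper == n * (n + 1)
```

The two halves always have the same size, which is exactly what gives the formula for 1 + 2 + ... + n.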

Now let’s move to sums of squares. This time I’ll let be the set of points with , and . We can partition into three sets, the first consisting of points for which is maximal, the second of points for which is maximal and is not maximal, and the third of points for which is strictly larger than both and . The numbers of points in these sets are easily seen to be

and

,

respectively. This gives us the formula

from which we get the familiar formula for the sum of the first squares. Writing for the sum of the first th powers, we also get the relationship

A striking fact about power sums is that $1^3+2^3+\dots+n^3=(1+2+\dots+n)^2$. One way of explaining this can be found in a rather nice animation that I came across as a result of a Google Plus post of Richard Green. Another comes from continuing with the approach here.

This time I’ll let be the set of points such that and are between 1 and n and and are between 1 and . Again I’ll partition according to which of is the largest, taking the first one that’s largest when there is a tie. That gives me four sets. Here are their sizes.

first largest: .

first largest: .

first largest: .

first largest: .

These sizes can be written as , and . So we get , which gives us that . It also gives us a kind of explanation of that fact: for we decompose into two equal pieces of size , while for we decompose into four pieces that don’t quite all have size but the two errors cancel out.
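The four-piece decomposition can also be checked directly. This is a sketch under the assumption that the first two coordinates run from 1 to n and the last two from 1 to n+1, with ties broken in favour of the earliest coordinate. The helper names `first_largest_sizes` and `power_sum` are mine.

```python
from itertools import product

def first_largest_sizes(limits):
    """For the box of integer points with 1 <= x_i <= limits[i], return a list
    whose i-th entry counts the points whose first maximal coordinate is x_i."""
    sizes = [0] * len(limits)
    for point in product(*(range(1, top + 1) for top in limits)):
        sizes[point.index(max(point))] += 1  # index() returns the first maximum
    return sizes

def power_sum(k, n):
    """Sum of the first n k-th powers."""
    return sum(m ** k for m in range(1, n + 1))

n = 6
sizes = first_largest_sizes([n, n, n + 1, n + 1])  # assumed coordinate order
s1, s2, s3 = (power_sum(k, n) for k in (1, 2, 3))

# Two pieces have size exactly S_3; the other two are off by -S_2 and +S_2.
assert sizes == [s3, s3 - s2, s3 + s2, s3]
assert sum(sizes) == n ** 2 * (n + 1) ** 2 == 4 * s3
assert s3 == s1 ** 2
```

With a different ordering of the coordinates the individual piece sizes would change, but the two error terms would still cancel in the total.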

To see that this is a partial but not total coincidence, I’m going to jump to now. I’ll let be the set of points such that are between 1 and and are between 1 and . This time the calculations are as follows.

first largest: .

first largest: .

first largest: .

first largest: .

first largest: .

first largest: .

Adding all these up we find that . From that we get that

In general, if we use this method to calculate when is odd, then (as with other methods) we obtain a relationship between and earlier . But what is nice about it is that there is a lot of cancellation: all the for even make a contribution of zero.

Indeed, if , then we have sets, and their sizes are , , , and so on down to

and then the same thing but with all minus signs in these linear combinations replaced by plus signs. Adding it all up we get a linear combination of , , … , equalling , where if is even and if is odd. When is small, we don’t have to take into account too many , so the formulae remain quite nice for a while before they become disgusting.

Note that it is quite easy to work out the coefficients of the various in the above linear combination: they are just sums of binomial coefficients. Several other methods require one to solve simultaneous equations, though they are usually in triangular form, so not too bad.
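To illustrate how the coefficients come out as sums of binomial coefficients, here is a sketch for general odd k = 2r - 1. It assumes the box has r coordinates in the range 1 to n followed by r coordinates in the range 1 to n+1, so that the piece where coordinate i is the first largest has size equal to the sum of (m-1)^(i-1) m^(2r-i) over the allowed values m. The function names are hypothetical.

```python
from math import comb

def power_sum(k, n):
    """Sum of the first n k-th powers."""
    return sum(m ** k for m in range(1, n + 1))

def coefficients(r):
    """Return {j: c_j} such that sum_j c_j * S_j(n) = n^r (n+1)^r, where
    k = 2r - 1. Piece i (1-based) contributes sign^t * C(a, t) to the
    coefficient of S_{k-t}: a = i - 1 and sign = -1 for the first r pieces,
    a = 2r - i and sign = +1 for the last r pieces."""
    k = 2 * r - 1
    coeffs = {}
    for i in range(1, 2 * r + 1):
        a, sign = (i - 1, -1) if i <= r else (2 * r - i, 1)
        for t in range(a + 1):
            coeffs[k - t] = coeffs.get(k - t, 0) + sign ** t * comb(a, t)
    # The even-indexed power sums cancel, so drop the zero coefficients.
    return {j: c for j, c in coeffs.items() if c != 0}

assert coefficients(2) == {3: 4}         # 4 S_3 = n^2 (n+1)^2
assert coefficients(3) == {5: 6, 3: 2}   # 6 S_5 + 2 S_3 = n^3 (n+1)^3

for r in range(1, 6):
    for n in range(1, 8):
        total = sum(c * power_sum(j, n) for j, c in coefficients(r).items())
        assert total == n ** r * (n + 1) ** r
```

Only power sums of the same parity as k survive, and no simultaneous equations need to be solved to get the coefficients.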

A small remark is that the basic idea of this argument is to discretize a much easier continuous argument that shows that . That argument is to take the -dimensional cube consisting of all points such that each belongs to the interval and partition it into pieces according to which coordinate is largest. (I call the argument geometrical because these pieces are pyramids with -cube bases.) In the continuous case, we don’t have to worry about what happens if there is more than one largest coordinate, since that set has measure zero. Each piece has measure , and there are pieces, while the cube has measure , so we are done.

A second remark is that the method I previously knew of for calculating sums of th powers is to exploit the fact that deserves to be called the natural discrete analogue of the above. Define to be . This we think of as the discrete analogue of . Then we do a discrete analogue of differentiating, which is to look at , which equals . This is the discrete analogue of the fact that the derivative of is . Next, we use the discrete analogue of the fundamental theorem of calculus, which is the statement that to deduce that . This gives us for each a polynomial of degree that we can sum easily, namely , and then to work out the sum of the first th powers, we write it as a linear combination of the , sum everything, and simplify the answer. That works fine, but the calculations are quite a bit more complicated than what I did above, and the proof is too algebraic to explain why the answers have the fairly nice forms they do. (For example, why is a factor of the sum of the first th powers whenever is odd and at least 3? From the argument I gave earlier in the post, this follows fairly easily by induction.)
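For comparison, here is a sketch of the discrete-calculus method. Since the formulas have to be pinned down somehow, I assume the falling-factorial convention x^(k) = x(x-1)...(x-k+1) and the forward difference f(x+1) - f(x). The change of basis from ordinary powers uses Stirling numbers of the second kind, which is a standard choice but is my assumption about how to organize the "linear combination" step.

```python
def falling(x, k):
    """Falling factorial x (x-1) ... (x-k+1), equal to 1 when k = 0."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

def stirling2_row(k):
    """Row k of the Stirling numbers of the second kind, so that
    x^k = sum_j row[j] * falling(x, j)."""
    row = [1]
    for m in range(1, k + 1):
        row = [(row[j - 1] if j >= 1 else 0) + (j * row[j] if j < m else 0)
               for j in range(m + 1)]
    return row

def power_sum_via_falling(k, n):
    """Sum of x^k for x = 1..n (k >= 1), using the discrete fundamental
    theorem: sum_{x=0}^{n} falling(x, j) = falling(n + 1, j + 1) / (j + 1)."""
    row = stirling2_row(k)
    return sum(row[j] * falling(n + 1, j + 1) // (j + 1) for j in range(1, k + 1))

# Discrete analogue of d/dx x^k = k x^(k-1).
for x in range(10):
    for k in range(1, 6):
        assert falling(x + 1, k) - falling(x, k) == k * falling(x, k - 1)

for k in range(1, 7):
    for n in range(1, 10):
        assert power_sum_via_falling(k, n) == sum(x ** k for x in range(1, n + 1))
```

This works, but the Stirling numbers are an extra layer of bookkeeping that the geometric argument avoids.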


As a result, the first talk I went to was Manjul Bhargava’s plenary lecture, which was another superb example of what a plenary lecture should be like. Like Jim Arthur, he began by telling us an absolutely central general problem in number theory, but interestingly it wasn’t the same problem — though it is related.

Bhargava’s central problem was this: given a function on the integers/rationals that takes integer/rational values, when does it take square values? In order to persuade us that this problem had been a central preoccupation of number theorists for a very long time, he took as his first example the function . Asking for this to take square values is asking for a Pythagorean triple, and people have been interested in those for thousands of years. To demonstrate this, he showed us a cuneiform tablet, which was probably the Babylonian tablet Plimpton 322, which contains a list of Pythagorean triples, some of which involve what in decimal notation are 5-digit numbers, and therefore not the kind of example one stumbles on without some kind of systematic procedure for generating it.

If one takes one’s function to be a cubic in one variable, then one obtains an elliptic curve, and rational points on elliptic curves are of course a huge topic in modern number theory, one to which Bhargava has made a major contribution. I won’t say much more about that, since I have already said a reasonable amount about it when discussing his laudatio. But there were a few extra details that are worth reporting.

He told us that Goldfeld and Katz and Sarnak had conjectured that 50% of elliptic curves have rank 0 and 50% have rank 1 (so the density of elliptic curves with higher rank is zero). He then told us about some work of Brumer and McGuinness in 1990 that seems to cast doubt on this (later) conjecture: they found that rank 2 curves occur quite often and their frequency increases as the coefficients get larger. More recent computational work has very strongly suggested that the conjecture is false: if you draw a graph of the average rank of elliptic curves as the size goes from to , it increases quickly from 0.7 before tailing off and appearing to tend to about 0.87. Apparently the reaction of Katz and Sarnak was a cheerful, “Well, it will go down eventually.”

Bhargava was pretty sceptical about this, but became properly interested in the problem when he learnt about work of Brumer, who showed assuming the generalized Riemann hypothesis and the Birch–Swinnerton-Dyer conjecture that the average rank was bounded above by 2.3. As Bhargava put it, this was a result that depends on two million dollars worth of conjectures. But that meant that if one could prove that the average rank of elliptic curves was greater than 2.3, then one would have shown that at least one of the generalized Riemann hypothesis and the Birch–Swinnerton-Dyer conjecture was false.

Still using the two million dollars’ worth of conjectures, Heath-Brown got the bound down to 2 in 2004, and Young got it to 1.79 in 2009. Bhargava and Shankar managed to improve that by 0.9 and two million dollars: that is, they obtained an unconditional bound of 0.89, amusingly close to the apparent asymptote of the graph that comes from the computations. As Bhargava pointed out, if one could extend those computations and find that the average rank eventually surpassed 0.89, this would, paradoxically, be very good news for the conjecture of Katz and Sarnak, because it would prove that the graph did eventually have to start coming down.

More recently, with Chris Skinner, Bhargava got an unconditional lower bound of 0.2.

One thing I understood a bit better by the end of Bhargava’s lecture was the result that the Birch–Swinnerton-Dyer conjecture holds for a positive proportion of elliptic curves. Although this is a remarkable result, there is a sense in which it is a slight cheat. What I mean by that is that Bhargava and his collaborators have a clever way of proving that a positive proportion of elliptic curves have rank 1. Then of those curves, they have a clever way of showing that for a positive proportion of those curves the order of the L-function at s=1 is also 1. What this argument doesn’t do, if my understanding is correct, is show something like this (except perhaps in some trivial sense):

- Every elliptic curve that satisfies a certain criterion also satisfies the Birch–Swinnerton-Dyer conjecture.
- A positive proportion of elliptic curves satisfy that criterion.

So in some sense, it doesn’t really get us any closer to establishing a connection between the rank of an elliptic curve and the order of the associated L-function at s=1. Perhaps in that respect it is a bit like the various results that say that a positive proportion of the zeros of the zeta function lie on the critical line, though I’m not sure whether that is a good analogy. Nevertheless, it is a remarkable result, in the sense that it proves something that looked out of reach.

Perhaps my favourite moment in Bhargava’s talk came when he gave us a hint about how he proved things. By this time he was talking about hyperelliptic curves (that is, curves $y^2=f(x)$ where $f$ is a polynomial of degree at least 5), where his main result is that most of them don’t have any rational solutions. How does he show that? The following slide, which I photographed, gives us a huge clue.

He looked at polynomials of degree 6. If the hyperelliptic curve has a rational solution , then by applying the change of variable , we can assume without loss of generality that the rational solution occurs at , which tells us that for some rational . But then you get the remarkable identity shown in the slide: a pair of explicit matrices and such that det. Note that to get these matrices, it was necessary to split up as a product , so we really are using the fact that there is a rational point on the curve. And apparently one can show that for most polynomials of degree 6 such a pair of matrices does not exist, so most polynomials of degree 6 do not take square values.

Just as the Babylonians didn’t find huge Pythagorean triples without some method of producing them, so Bhargava and his collaborators clearly didn’t find those matrices and without some method of producing them. He didn’t tell us what that method was, but my impression was that it belonged to the same circle of ideas as his work on generalizing Gauss’s composition law.

The lecture was rapturously received, especially by non-mathematicians in the audience (that could be interpreted as a subtly negative remark, but it isn’t meant that way), who came away from it amazed to feel that they had understood quite a bit of it. Afterwards, he was mobbed in a way that film stars might be used to, but mathematicians rather less so. I photographed that too.

If you give the photo coordinates in , then Bhargava’s head is at around and he is wearing a dark red shirt.

At 2pm there was the Gauss Prize lecture. I thought about skipping it, but then thought that that would be hypocritical of me after my views about people who left the laudationes just before the one for the Nevanlinna Prize. I shouldn’t be prejudiced against applied mathematics, and in any case Stanley Osher’s work, or at least part of it, is about image processing, something that I find very interesting.

I went to the talk thinking it would be given by Osher himself, but in fact it was given by someone else about his work. The slides were fairly dense, and there was a surprising amount of emphasis on what people call metrics — numbers of papers, H-factors and so on. The fact that the speaker said, “I realize there is more to academic output than these metrics,” somehow didn’t help. I found myself gradually zoning out of this talk and as a result, despite my initial good intentions, do not have anything more to say about Osher’s work, clearly interesting though it is.

I then did skip the first of the afternoon’s parallel sessions. I wondered about going to hear Mohammed Abouzaid, because I have heard that he is a rising star (or rather, an already risen star who probably has even further to rise), but I found his abstract too intimidating.

So the first talk I actually did go to was in the second session, when I went to hear Craig Gentry, a theoretical computer scientist famous for something called homomorphic encryption, which I had heard about without quite understanding what it was. My target for the 45 minutes was to remedy this situation.

In the end two things happened, one good and one bad. The good one was that early on in the talk Gentry explained what homomorphic encryption was in a way that was easy to understand. The bad one was that I was attacked by one of my periodic waves of tiredness, so after the early success I took in very little else: I was too absorbed in the struggle to keep my eyes open (or rather, to ensure that the brief moments when I shut them didn’t accidentally turn into stretches of several minutes).

The basic idea of homomorphic encryption is this. Suppose you have some function $E$ that encrypts data, and let’s suppose that the items one encrypts are integers. Now suppose that you are given the encryptions $E(x)$ and $E(y)$ of $x$ and $y$ and want to work out the encryption of $x+y$. For an arbitrary encryption system there’s not much you can do other than decrypt $E(x)$ and $E(y)$, add up the results, and then encrypt again. In other words, you can’t do it unless you know how to decrypt. But what if you want people to be able to do things to encrypted data (such as, say, carrying out transactions on someone’s bank account) without having access to the original data? You’d like some weird operation $\oplus$ with the property that $E(x)\oplus E(y)=E(x+y)$. I think now it is clear what the word “homomorphic” is doing here: we want $E$ to be a homomorphism from (integers, +) to (encrypted integers, $\oplus$).

Having said that, I think Gentry told us (but can’t remember for sure) that just doing this for addition was already known, and his achievement has been to find a system that allows you to add and multiply. So I think his encryption may be a ring homomorphism. Something I haven’t stressed enough here is that it isn’t enough for the “funny” operations $\oplus$ and $\otimes$ to *exist*: you need to be able to compute them efficiently without being able to decrypt efficiently. The little I took in about how he actually did this made it sound as though it was very clever: it wasn’t just some little trick that makes things easy once you’ve observed it.
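To make the addition-only case concrete, here is a toy sketch of the Paillier cryptosystem, a standard scheme that is homomorphic for addition. It is not Gentry's construction and was not part of the talk as far as my notes go. The tiny primes and the function names `keygen`, `encrypt`, `decrypt` and `add_encrypted` are illustrative assumptions, and parameters this small offer no security at all.

```python
import random
from math import gcd

def keygen(p=17, q=19):
    """Toy key generation from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                           # inverse of lam modulo n
    return n, (lam, mu)

def encrypt(n, m):
    """E(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(1 + n, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, private, c):
    """Recover m from c using the private pair (lam, mu)."""
    lam, mu = private
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """The 'funny' operation: multiplying ciphertexts adds plaintexts mod n."""
    return c1 * c2 % (n * n)

n, private = keygen()
cx, cy = encrypt(n, 20), encrypt(n, 22)
# The sum is computed on ciphertexts only; the private key is used just to check.
assert decrypt(n, private, add_encrypted(n, cx, cy)) == 42
```

Anyone holding only `n` can call `add_encrypted`, which is the point: the operation on ciphertexts is efficient without the ability to decrypt. Supporting multiplication as well is the hard part, and nothing in this sketch attempts it.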

If you want to know more, the talk is here.

The last talk I went to, of the entire congress, was that of Tom Sanders, who was talking about the context surrounding his remarkable work on Roth’s theorem on arithmetic progressions. Sanders was the first to show that a subset of of density must contain an arithmetic progression of length 3. This is tantalizingly close to the density of the primes in that interval, and also tantalizingly close to the density needed to prove the first non-trivial case of Erdős’s famous conjecture that a subset of such that contains arithmetic progressions of all lengths.

Sanders discussed the general question of which configurations can be found in the primes, but also the question of *why* they can be found. For instance, quadruples such that can be found in the primes, but the proof has nothing to do with the primes other than their density: the number of pairs with prime and less than is about , and the number of possible sums is at most , so some sum can be achieved in several ways. By contrast, while there are many solutions of the equation in the primes (an example is ), one can easily find dense sets of integers with no solutions: for instance, the set of integers congruent to 1 mod 3 or the set of integers strictly between and .
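The contrast between the two equations can be seen in a few lines. The sketch below checks that two dense sets contain no solutions to x + y = z while the primes do. The cutoff `N = 200`, the reading of the second set as the integers strictly between N/2 and N, and the example 2 + 3 = 5 are my own choices for illustration.

```python
def has_solution(s):
    """True if some x, y, z in s (repeats allowed) satisfy x + y = z."""
    s = set(s)
    return any(x + y in s for x in s for y in s)

N = 200  # illustrative cutoff

one_mod_three = [m for m in range(1, N + 1) if m % 3 == 1]
upper_half = [m for m in range(1, N + 1) if N / 2 < m < N]
primes = [p for p in range(2, N + 1)
          if all(p % d for d in range(2, int(p ** 0.5) + 1))]

# Density about 1/3: a sum of two elements is 2 mod 3, so never in the set.
assert not has_solution(one_mod_three)
# Density about 1/2: a sum of two elements exceeds N, so never in the set.
assert not has_solution(upper_half)
# The primes are much sparser but do contain solutions, for example 2 + 3 = 5.
assert has_solution(primes)
```

So for this equation density alone cannot explain the solutions in the primes, whereas for x + y = z + w the pigeonhole count described above does.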

Roth’s theorem concerns the equation $x+z=2y$, and while it has been known for many decades that there are many solutions to this equation in the primes, there is no proof known that uses only the density of the primes, and also no counterexample known that shows that that density is insufficient.

I had a conversation with Sanders after the talk, in which I asked him what he thought the lowest possible density was that guaranteed a progression of length 3. The two natural candidates, given what we know so far, are somewhere around , and somewhere around . (The latter is the density of the densest known set with no progression of length 3.) Recent work of Schoen and Shkredov, building on Sanders’s ideas, has shown that the equation has non-trivial solutions in any set of density at least . I put it to him that the fact that Schoen and Shkredov needed the extra “smoothness” that comes from taking a fivefold sumset on the left-hand side rather than just a twofold one paradoxically casts doubt on the fact that this type of bound is correct for Roth’s theorem. Rather, it suggests that perhaps the smoothness is actually needed. Sanders replied that this was not necessarily the case: while a convolution of two characteristic functions of dense sets can have “gaps”, in the sense of points where the value is significantly less than expected, it is difficult for that value to go all the way down to zero.

That will be a bit too vague to be comprehensible if you are not an additive combinatorialist, so let me try to give a little bit more explanation. Let be a subset of (the integers mod ) of density . We say that is –*quasirandom* if the sizes of the intersections , which have mean , have standard deviation at most . Now one way for the standard deviation to be small is for most of the intersections to have roughly the same size, but for a few of them to be empty. That is the kind of situation that needs to happen if you want an unexpectedly dense set with no arithmetic progression of length 3. (This exact situation doesn’t have to happen, but I’m trying to convey the general feel of what does.) But in many situations, it seems to be hard to get these empty intersections, rather than merely intersections that are quite a bit smaller than average.
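Here is a small numerical illustration of the intersection sizes in question. It compares two subsets of the integers mod a prime p with the same density: the non-zero squares, which behave quasirandomly, and an interval, which does not. The choice p = 103 and the helper name `intersection_sizes` are assumptions made for the example.

```python
from statistics import pstdev

def intersection_sizes(A, N):
    """Sizes of the intersections of A with A + d, for each d in the integers mod N."""
    A = set(A)
    return [sum(1 for a in A if (a - d) % N in A) for d in range(N)]

p = 103  # a prime congruent to 3 mod 4 (illustrative choice)
squares = {x * x % p for x in range(1, p)}   # 51 non-zero squares mod p
interval = set(range(len(squares)))          # an interval of the same size

sq_sizes = intersection_sizes(squares, p)
int_sizes = intersection_sizes(interval, p)

# In both cases the sizes sum to |A|^2, so the mean is |A|^2 / p.
assert sum(sq_sizes) == sum(int_sizes) == len(squares) ** 2
# For the squares every non-trivial intersection has the same size.
assert set(sq_sizes[1:]) == {25}
# The interval has empty intersections and a much larger standard deviation.
assert min(int_sizes) == 0
assert pstdev(sq_sizes) < pstdev(int_sizes)
```

The interval gets its empty intersections only at the cost of a large standard deviation. The difficult thing is to have a few empty intersections while the standard deviation stays small.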

After Sanders’s talk (which is here), I went back to my room. By this time, the stomach bug that I mentioned a few posts ago had struck, which wasn’t very good timing given that the conference banquet was coming up. Before that, I went up to the top of the hotel, where there was a stunning view over much of Seoul, to have a drink with Günter Ziegler and one other person whose name I have forgotten (if you’re reading this, I enjoyed meeting you and apologize for this memory lapse). Günter too had a stomach bug, but like me he had had a similar one shortly before coming to Korea, so neither of us could be sure that Korean food had anything to do with it.

The banquet was notable for an extraordinary Kung Fu performance that was put on for our entertainment. It included things like performers forming a human pyramid that other performers would run up in order to do a backwards somersault, in the middle of which they would demolish a piece of wood with a sharp blow from the foot. It was quite repetitive, but the tricks were sufficiently amazing to bear quite a bit of repetition.

My last memory of ICM2014 was of meeting Artur Avila in the lobby of the hotel at about 5:25am. I was waiting for the bus that would take me to the airport. “Are you leaving too?” I naively asked him. No, he was just getting back from a night on the town.


As I’ve already mentioned, Day 3 started with Jim Arthur’s excellent lecture on the Langlands programme. (In a comment on that post, somebody questioned my use of “Jim” rather than “James”. I’m pretty sure that’s how he likes to be known, but I can’t find any evidence of that on the web.) The next talk was by Demetrios Christodoulou, famous for some extraordinarily difficult results he has proved in general relativity. I’m not going to say anything about the talk, other than that I didn’t follow much of it, because he had a series of dense slides that he read word for word. The slides may even have been a suitably chopped up version of his article for the ICM proceedings, but I have not been able to check that. Anyhow, after a gentle introduction of about three or four minutes, I switched off.

I switched on again for János Kollár’s lecture, which was, like some of the others, what I feel a plenary lecture should be: a lecture that gives the non-expert a feel for what is important in the area being talked about. The first thing I wrote down was his brief description of the minimal model problem, one of the central questions in algebraic geometry. I think that by that time he had spent a while telling us what algebraic sets were, explaining why the picture you get if you just work over the reals is somewhat incomplete (for example, you may get a graph with two components, when if you work over the extended complex plane you have a torus), and so on.

The minimal model problem is this: given an algebraic variety , find a variety (the “minimal model” of ) such that the space of meromorphic functions on is isomorphic to the space of meromorphic functions on and the geometry of is as simple as possible. The condition that the function spaces are isomorphic seems (from a glance at Wikipedia) to be another way of saying that the two varieties are birationally equivalent, which is a fundamental notion of equivalence in algebraic geometry. So one is trying to find a good representative of each equivalence class.

The problem was solved for curves by Riemann in 1851, for surfaces by Enriques in 1914 and by Kodaira in 1966 (I don’t know exactly what that means, but I suppose Enriques made major inroads into the problem and Kodaira finished it off). And for higher dimensions there was the Mori program of 1981. As I understand it, Mori made huge progress towards understanding the three-dimensional case, and Christopher Hacon and James McKernan, much more recently, made huge progress in higher dimensions.

Another major focus of research is the *moduli problem*. This, Kollár told us, asks what are the simplest families of algebraic varieties, and how can we transform any family into a simplest one? I don’t know what this means, but I would guess that when he said “families of algebraic varieties” he was talking about some kind of moduli space (partly because that seems the most likely meaning, and partly because of the word “moduli” in the name of the problem). So perhaps the problem is sort of like a “family version” of the minimal model problem: you want to find a simplest moduli space that is in some sense similar to the one you started with.

Anyhow, whatever the problem is, it was done for curves by Deligne and Mumford in 1969, for surfaces by Kollár and Shepherd-Barron in 1988 and Alexeev in 1996 (again I don’t know who did what), and apparently in higher dimensions the Kollár-Shepherd-Barron-Alexeev method works, but there are technical details. (Does that mean that Kollár is confident that the method works but that a full proof has not yet been written out? He may well have told us, but my notes don’t tell me now.)

Kollár then explained to us a third problem. A general technique for studying a variety is to find a variety that is birationally equivalent to and study the question for instead. Under these circumstances, there will be lower dimensional subvarieties and such that . So one is left needing to answer a similar question for and , and since these are of lower dimension, one has the basis for an inductive proof. But for that to work, we want to be adapted to the problem, so the question, “When is a variety simple?” arises.

Apparently this was not even a precisely formulated question until work of Mori and Reid (1980-2) and Kollár, Miyaoka and Mori (1992). The precise formulation involves the first Chern class.

And that’s all I have, other than a general memory that this lecture continued the generally high standard of plenary lectures at the congress.

At 2pm, Avila gave his Fields medallist’s lecture. As with Hairer, I don’t feel I have much to say that I have not already said when describing the laudatio, so I’ll move on to 3pm, or rather 3:05 — by today the conference organizers had realized that it took a non-zero amount of time to get from one talk to another — when David Conlon was speaking.

David is a former student and collaborator of mine, and quite a bit of what he talked about concerned that collaboration. I’ll very briefly describe our main result.

There are many combinatorial theorems that can be regarded as questions about arbitrary subsets of nice structures such as the complete graph on vertices or the cyclic group of order . For example, Ramsey’s theorem says that if you 2-colour the edges of the complete graph on vertices, then (as long as is large enough) one of the colour classes will contain a complete graph on vertices. And Szemerédi’s theorem is equivalent to the assertion that for every and every positive integer there exists such that for every subset of the integers mod of size at least there exist and such that all of belong to .

For many such questions, one can generalize them from the “nice” structures to arbitrary structures. For instance, one can ask of a given graph whether if you colour its edges with two colours then one of those colours must contain a complete subgraph with vertices. Obviously, the answer will be yes for some and no for others, but to make it an interesting question, one can ask what happens for a *random* . More precisely, how sparse can a random graph be and still have the Ramsey property?

This question was answered in full by Rödl and Rucinski, but our method gives a new proof of the upper bound (on how dense the random graph needs to be), and also gives a very general method that solves many problems of this type that were previously unsolved. For example, for Szemerédi’s theorem it tells us the following. Define a subset of to be –*Szemerédi* if every subset of size at least contains an arithmetic progression of length . Then if is large enough (depending on and only), then a random subset of where elements are chosen independently with probability is -Szemerédi with high probability.

This bound is within a constant of best possible, since if the probability dips below , around half the elements of the random set will not even belong to an arithmetic progression of length , so those elements form a dense set that proves that is not -Szemerédi.

The method David and I used was inspired by the “transference principle” that Green and Tao used to prove their famous result about arithmetic progressions in the primes, though it involved several additional ingredients. A completely different approach was discovered independently by Mathias Schacht. Like ours, his approach established a large number of previously open “sparse random versions” of well-known combinatorial theorems.

David always gives very nice talks, and this one was no exception.

After his talk, I went to hear Nets Katz, with some regret as it meant missing Maria Chudnovsky, who followed on from David in the combinatorics section. I must try to watch the video of her talk some time, though I’m bad at finding time to watch videos on the internet if they last for more than about three minutes.

Nets talked about work related to his famous solution with Larry Guth of the Erdős distance problem. That problem asks how many distinct distances there must be if you have $n$ points in the plane. If you put them evenly spaced along a line, you get $n-1$ distinct distances. You can do a bit better than that by putting them in a $\sqrt n\times\sqrt n$ grid: because the density of numbers that can be expressed as a sum of two squares is roughly $1/\sqrt{\log n}$, one gets around $n/\sqrt{\log n}$ distinct distances this way.
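The two configurations are easy to compare for small numbers of points. The sketch below counts distinct distances exactly by comparing squared distances. The function names and the choice of 16 points are mine.

```python
from itertools import combinations

def distinct_distances(points):
    """Number of distinct distances, compared via exact squared distances."""
    return len({(a - c) ** 2 + (b - d) ** 2
                for (a, b), (c, d) in combinations(points, 2)})

def on_a_line(n):
    """n evenly spaced points on a line."""
    return [(i, 0) for i in range(n)]

def square_grid(side):
    """side * side points arranged in a square grid."""
    return [(i, j) for i in range(side) for j in range(side)]

# 16 points on a line give 15 distinct distances.
assert distinct_distances(on_a_line(16)) == 15
# 16 points in a 4-by-4 grid give only 9: the squared distances are the
# numbers a^2 + b^2 with 0 <= a, b <= 3, not both zero.
assert distinct_distances(square_grid(4)) == 9
```

For such small sets the saving is modest, and the asymptotic advantage of the grid grows only very slowly with the number of points.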

Erdős asked whether this was anywhere near to being best possible. More precisely, he asked whether there was a lower bound of $n^{1-o(1)}$, and that is what Guth and Katz proved. This was a great result that answered a question that many people had worked on, but it is also notable because the proof was very interesting. One of the main tools they used was the *polynomial method*, which I will not attempt to describe here, but if you are curious, then Terence Tao has posted on it several times. Nets Katz’s talk is here.

Then it was back (quite some way) to the combinatorics room to hear Michael Krivelevich talking about games. (This link is quite hard to find because they’ve accidentally put his name as Michael Krivelerich.) By “games” I mean two-player positional games, which are defined as follows. You have a set $X$ (the board) and a collection $\mathcal{F}$ of subsets of $X$ (the winning positions). There are then two kinds of games that are typically studied. In both kinds, a *move* consists in choosing a point of $X$ that has not yet been chosen. In the first kind of game, the players alternate choosing points and the winner is the first player who can make a set in $\mathcal{F}$ out of his/her points. (If neither player can do this by the time the entire board is filled up, then the result is a draw.) Noughts and crosses (or tic-tac-toe) is an example of this: $X$ is a 3-by-3 grid and $\mathcal{F}$ consists of all lines of three points in that grid.

A well-known argument that goes back (at least) to John Nash when he was thinking about the game of Hex proves that the second player cannot have a winning strategy for this game. The argument, referred to as *strategy stealing*, is as follows. Suppose that the second player does have a winning strategy. Then the first player has a winning strategy as well, which works like this. First choose an arbitrary $x\in X$. Then ignore $x$, pretend that your opponent is the first player and play the second player’s winning strategy. If you ever find that you have already played the point that the strategy dictates, then play an arbitrary unoccupied point instead.

This contradiction (a contradiction since it is not possible for both players to have winning strategies) proves that the first player can guarantee a draw, but it is a highly inexplicit argument, so it gives no clue about *how* the first player can do that. An interesting open problem that Krivelevich mentioned relates this to the Hales-Jewett theorem. A consequence of the Hales-Jewett theorem is that if you play noughts and crosses on an $n$-dimensional board where each side has length $k$, then provided $n$ is large enough in terms of $k$, it is not possible for the outcome to be a draw, since there is no 2-colouring of the points of the grid that does not give rise to a monochromatic line. So we know that the first player has a winning strategy. However, no explicit strategy is known, even if $n$ is allowed to be ridiculously large. (I am talking here about general $k$: for small $k$ such as 3, and perhaps even 4, a winning strategy is known for fairly small $n$.)
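For the ordinary 3-by-3 board the inexplicit argument can be replaced by an exhaustive search, which is exactly what is not feasible on the large boards discussed above. The following sketch (names chosen here for illustration) computes the value of the game by plain minimax with memoization.

```python
from functools import lru_cache

# The eight winning lines of the 3-by-3 board, cells numbered 0..8 row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def has_line(cells):
    """True if the given set of cells contains a full winning line."""
    return any(all(c in cells for c in line) for line in LINES)


@lru_cache(maxsize=None)
def value(mine, theirs):
    """Value of the position for the player about to move.

    mine / theirs are frozensets of cells held by the mover / the opponent.
    Returns +1 for a forced win, 0 for a draw, -1 for a forced loss.
    """
    if has_line(theirs):
        return -1  # the opponent completed a line with the previous move
    free = [c for c in range(9) if c not in mine and c not in theirs]
    if not free:
        return 0   # board full, nobody has a line
    # After the mover takes cell c the roles swap, hence the sign change.
    return max(-value(theirs, mine | {c}) for c in free)


first_player_value = value(frozenset(), frozenset())

if __name__ == "__main__":
    print(first_player_value)
```

The value is computed from the first player's point of view, so a non-negative result is what strategy stealing guarantees in advance; the search additionally tells you which of "win" and "draw" actually occurs.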

I asked Krivelevich about this problem, and his opinion was that it was probably very hard. The difficulty is that the first player has to devote too much attention to stopping the second player from winning, so cannot concentrate on trying to build up a line.

Another open problem is to find an explicit strategy that proves the following statement: there exist positive integers $n_0$ and $t$ such that for every $n\geq n_0$, if the game is played on the complete graph on $n$ vertices (that is, players are alternately choosing edges), then the first player can create the first clique of size 5 in at most $t$ moves.

A moment I enjoyed in the talk was when Krivelevich mentioned something called the *extra set paradox*, which is the statement that if you add to the set of winning positions, a game that was previously a win for the first player can become a draw.

At first that seemed to me obviously false. When that happens, it is always interesting to try to analyse one’s thoughts and formulate the incorrect proof that has sprung to mind. The argument I had was something like that adding an extra set only increased the options available to the first player, so could not make it harder to win. And that argument is complete garbage, because it increases the options for the second player too. So if, for example, the first player plays as though the extra winning positions didn’t exist, the second player could potentially win by reaching one of those positions. The extra effort required to stop this can potentially (and sometimes does) kill the first player’s winning strategy.

Games of the kind I’ve just been discussing seem to be very hard to analyse, so attention has turned to a different kind of game, called a *maker-breaker game*. Here, the first player’s objective is to occupy a winning position, and the second player’s objective is to stop that happening. Also, the number of moves allotted to the two players is often different: we may allow one player to take $b$ moves for each move that the other player takes.

A typical question looked at is to take a graph property such as “contains a Hamilton cycle” and to try to find the threshold at which breaker can win. That is, if breaker gets $b$ moves for each move of maker, how large does $b$ need to be in order for breaker to be able to stop maker from making a Hamilton cycle? The answer to this, discovered by Krivelevich in 2011, is that the threshold is at $n/\log n$, in the sense that if $b\leq(1-\epsilon)n/\log n$ then maker wins, while if $b\geq(1+\epsilon)n/\log n$ then breaker wins.

What makes this result particularly interesting is that the threshold occurs when the number of edges that maker gets to put down is (approximately) equal to the number of edges a random graph needs to have in order to contain a Hamilton cycle. This is the so-called *random paradigm* that allows one to guess the answers to many of these questions. (It was Erdős who first conjectured that this paradigm should hold.) It seems to be saying that if both players play optimally, then the graph formed by maker will end up looking like a random graph. It is rather remarkable that this has in some sense actually been proved.

Next up, at 6pm (this was a very long day) was the Abel lecture. This is a tradition started in 2010, where one of the last four Abel Prize winners gives a lecture at the ICM. The chosen speaker this time was John Milnor, whose title was “Topology through four centuries.” I did not take notes during this lecture, so I have to rely on my memory. Here’s what I remember. First of all, he gave us a lot of very interesting history. A moment I enjoyed was when he discussed the proof of a certain result and said that he liked it because it was the first example he knew of of the use of Morse theory. A long time ago, when I had very recently got my PhD, I thought about a problem about convex bodies that caused me to look at Milnor’s famous book on Morse theory. I can’t now remember what the problem was, but I think I was trying to think hard about what happens if you take the surface of a symmetric convex body with a sphere inside, gradually shrink it until it is inside the sphere, and look at the intersection of the two surfaces. That gives you (generically) a codimension-1 subset of the sphere that appears, moves about, and eventually vanishes again. That’s exactly the kind of situation studied by Morse theory.

Much more recently, indeed, since the talk, I have had acute personal experience of Morse theory in the outdoor unheated swimming pool where I was staying in France. Because I am worried about setting my heart out of rhythm if I give it too much of a shock, I get into cold swimming pools very slowly, rather than jumping in and getting the discomfort over all at once. This results in what my father describes as a ring of pain: the one-dimensional part of the surface of your body that is not yet used to the water and not safely outside it. Of course, the word “ring” is an oversimplification. Ignoring certain details that are inappropriate for a family post such as this, what I actually experience is initially two rings that after a while fuse to become a figure of eight, which then instantly opens out into a single large ring, to be joined by two more small rings that fuse with the large ring to make a yet larger ring that then becomes a lot smaller before increasing in size for a while and finally shrinking down to a point.

It is clear that if you are given the cross-sections of a surface with all the planes in a certain direction that intersect it, then you can reconstruct the surface. As I understand it, the basic insight of Morse theory is that what really matters if you want to know about the topology of the surface is what happens at the various singular moments such as when there is a figure of eight, or when a ring first appears, etc. The bits in between where the rings are just moving about and minding their own business don’t really affect anything. How this insight plays out in detail I don’t know.

As one would expect from Milnor, the talk was a beautiful one. In traditional fashion, he talked about surfaces, then 3-manifolds, and finally 4-manifolds. I think he may even have started in one dimension with a discussion of the bridges-of-Königsberg problem, but my memory of that is hazy. Anyhow, an indication of just how beautiful the talk was is what happened at the end. He misjudged the time, leaving himself about two minutes to discuss 4-manifolds. So he asked the chairman what he should do about it, and the chairman (who was Helge Holden) told him to take as much time as he wanted. Normally that would be the cause for hate rays to emanate towards the chairman and the speaker from the brains of almost the entire audience. But with this talk, the idea of missing out on the 4-manifold equivalent of what we had just heard for 2-manifolds and 3-manifolds was unthinkable, and there was a spontaneous burst of applause for the decision. I’ve never seen anything like it.

The one other thing I remember was a piece of superhuman modesty. When Milnor discussed examples of extraordinary facts about differentiable structures on 4-manifolds, the one he mentioned was the fact that there are uncountably many distinct such structures on $\mathbb{R}^4$, which was discovered by Cliff Taubes. The way Milnor presented it, one could have been forgiven for thinking that the fact that there can be distinct differentiable structures on a differentiable manifold was easy, and the truly remarkable thing was getting uncountably many, whereas in fact one of Milnor’s most famous results was the first example of a manifold with more than one differentiable structure. (The result of Taubes is remarkable even given what went before it: the first exotic structures on $\mathbb{R}^4$ were discovered by Freedman and Kirby.)

Just to finish off the description of the day, I’ll mention that in the evening I went to a reception hosted by the Norwegians (so attending the Abel lecture was basically compulsory, though I’d have done so anyway). Two things I remember about that are a dish that contained a high density of snails and the delightful sight of Maryam Mirzakhani’s daughter running about in a forest of adult legs. Then it was back to my hotel room to try to gather energy for one final day.


The next morning kicked off (after breakfast at the place on the corner opposite my hotel, which served decent espressos) with Jim Arthur, who gave a talk about the Langlands programme and his role in it. He told us at the beginning that he was under strict instructions to make his talk comprehensible — which is what you are supposed to do as a plenary lecturer, but this time it was taken more seriously, which resulted in a higher than average standard. Ingrid Daubechies deserves a lot of credit for that. He explained that in response to that instruction, he was going to spend about two thirds of his lecture giving a gentle introduction to the Langlands programme and about one third talking about his own work. In the event he messed up the timing and left only about five minutes for his contribution, but for everybody except him that was just fine: we all knew he was there because he had done wonderful work, and most of us stood to learn a lot more from hearing about the background than from hearing about the work itself.

I’ve made a few attempts to understand the Langlands programme — not by actually studying it, you understand, but by attending general-audience talks or reading general-audience articles. It’s a bit of a two-steps-forward (during the talk) and one-step-back (during the weeks and months after the talk) process, but this was a very good lecture and I really felt I learned things from it. Some of them I immediately forgot, but have in my notes, and perhaps I’ll fix them slightly better in my brain by writing about them here.

For example, if you had asked me what the central problem in algebraic number theory is, I would never have thought of saying this. Given a fixed polynomial $P$ and a prime $p$, we can factorize $P$ into irreducibles over the field $\mathbb{F}_p$. It turns out to be inconvenient if any of these irreducible factors occurs with multiplicity greater than 1, so an initial assumption is that $P$ has distinct roots over $\mathbb{C}$ (or at least I think that’s the assumption). [Insert: looking at my notes, I realize that a better thing to write is $K$, the splitting field of $P$, rather than $\mathbb{C}$, though I presume that that gives the same answer.] But even then, it may be that over some primes there are repeated irreducible factors. The word “ramified”, which I had always been slightly scared of, comes in here. I can’t remember what ramifies over what, or which way round is ramified and which unramified, so let me quickly look that up. Hmm, that was harder than I expected, because the proper definition is to do with rings, extension fields and the like. But it appears that “ramified” refers to the case where you have multiplicity greater than 1 somewhere. For the purposes of this post, let’s say that a prime $p$ is ramified (I’ll take the polynomial as given) if $P$ has an irreducible factor over $\mathbb{F}_p$ with multiplicity greater than 1. The main point to remember is that the set of ramified primes is small. I think Arthur said that it was always finite.

So what is the fundamental problem of algebraic number theory? Well, when you decompose a polynomial into irreducible factors, those factors have degrees. If the degree of $P$ is $n$, then the degrees of the irreducible factors form a partition of $n$: that is, a collection of positive integers that add up to $n$. The question is this: which (unramified) primes give rise to which partitions of $n$?

How on earth is *that* the fundamental problem of algebraic number theory? What’s interesting about it? Aren’t number theorists supposed (after a long and circuitous route) to be solving Diophantine equations and things like that?

Arthur gave us a pretty convincing partial answer to these questions by discussing the example $P(x)=x^2+1$. The splitting field is $\mathbb{Q}(i)$ (that is, rational linear combinations of 1 and $i$) and the only ramified prime is 2. (The reason 2 is ramified is that over $\mathbb{F}_2$ we have $x^2+1=(x+1)^2$.)

Since the degree of $P$ is 2, the two partitions of the degree are $2$ and $1+1$. The first occurs if and only if $x^2+1$ cannot be factorized over $\mathbb{F}_p$, which is the same as saying that $-1$ is not a quadratic residue. So in this case, the question becomes, “For which odd primes $p$ is $-1$ a quadratic residue?” to which the answer is, famously, all primes congruent to 1 mod 4. So Arthur’s big grown-up question is a generalization of a familiar classical result of number theory.
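The classical fact quoted here can be checked directly for small primes. The sketch below (function names are illustrative, not from the talk) finds the partition of the degree for $x^2+1$ over $\mathbb{F}_p$ by brute-force root search and compares it with the residue of $p$ mod 4.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def factor_degrees(p):
    """Degrees of the irreducible factors of x^2 + 1 over F_p, p an odd prime.

    A quadratic splits into two linear factors exactly when it has a root,
    that is, when -1 is a square mod p; otherwise it is irreducible.
    """
    has_root = any((x * x + 1) % p == 0 for x in range(p))
    return (1, 1) if has_root else (2,)


odd_primes = [p for p in range(3, 200) if is_prime(p)]
table = {p: factor_degrees(p) for p in odd_primes}

# The partition is 1 + 1 exactly for the primes congruent to 1 mod 4.
matches_rule = all((table[p] == (1, 1)) == (p % 4 == 1) for p in odd_primes)

if __name__ == "__main__":
    for p in odd_primes[:10]:
        print(p, p % 4, table[p])
    print(matches_rule)
```

For a general polynomial one would need a genuine factorization routine over $\mathbb{F}_p$; the root search only suffices here because the degree is 2.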

To answer the question for quadratic polynomials, Gauss’s law of quadratic reciprocity is a massive help. I think it is correct to say that the Langlands programme is all about trying to find vast generalizations of quadratic reciprocity that will address the far more general question about the degrees of irreducible factors of arbitrary polynomials. But perhaps it is more general still — at the time of writing I’m not quite sure.

Actually, I think I am sure. One thing Arthur described was Artin L-functions, which are a way of packaging up the data I’ve just described. Here is the definition he gave. You start with a representation $r$ of the Galois group of $P$. For simplicity he assumed that the Galois group was actually $S_n$ (where $n$ is the degree of $P$). Then for each unramified prime $p$ the partition of $n$ you get can be thought of as the cycle type of a permutation and thus as a conjugacy class in $S_n$. The image of this conjugacy class under $r$ is a conjugacy class in $\mathrm{GL}_N(\mathbb{C})$, which is denoted by $c_p(r)$. The Artin L-function is then defined to be

$$L(s,r)=\prod_{p\ \text{unramified}}\det\bigl(1-c_p(r)p^{-s}\bigr)^{-1}.$$

It is easy to see that the determinant is well-defined: it follows from the fact that conjugate linear maps have the same determinant.

If you expand out this product, you get a Dirichlet series, of which this is the Euler product. And Dirichlet series that have Euler products are basically L-functions. Just as the Riemann zeta function packages up lots of important information about the primes, so the Artin L-functions package up lots of important information about the fundamental problem of algebraic number theory discussed earlier.
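As a worked instance, take the example $P(x)=x^2+1$ from earlier, whose Galois group is $S_2$, and let $r$ be the sign representation. That choice of $r$ is an assumption made here for illustration, as is the standard Euler-product form $\prod_p\det(1-c_p(r)p^{-s})^{-1}$ for the Artin L-function. The class $c_p(r)$ is then just a number: $+1$ when $x^2+1$ splits mod $p$ and $-1$ when it stays irreducible.

```latex
% chi(n) = (-1)^{(n-1)/2} for odd n, so chi(p) = +1 if p = 1 mod 4
% and chi(p) = -1 if p = 3 mod 4; this is c_p(r) for the sign representation.
\[
  L(s,r) \;=\; \prod_{p \text{ odd}} \bigl(1-\chi(p)\,p^{-s}\bigr)^{-1}
         \;=\; \sum_{n \text{ odd}} \chi(n)\,n^{-s}
         \;=\; 1 - 3^{-s} + 5^{-s} - 7^{-s} + 9^{-s} - \cdots
\]
```

The middle equality uses the fact that $\chi$ is completely multiplicative on odd integers, so expanding each factor as a geometric series and multiplying out gives every odd $n$ exactly once. The result is the familiar Dirichlet L-function of the non-trivial character mod 4.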

One interesting thing that Arthur told us was that in order to do research in this area, you have to use results from many different areas. This makes it difficult to get started, so most young researchers start by scouring the textbooks for the key theorems and using them as black boxes, understanding them fully only much later.

For example, certain Riemannian manifolds are particularly important, because automorphic forms come from solutions to differential equations (based on the Laplacian) on those manifolds. Arthur didn’t tell us exactly what these “special Riemannian manifolds” were, but he did say that they corresponded to reductive algebraic groups. (An algebraic group is roughly speaking a group defined using polynomials. For example, $\mathrm{SL}_n$ is algebraic, because the condition of having determinant 1 is expressible as a polynomial in the entries of a matrix, and the group operation, matrix multiplication, is also a polynomial operation. What “reductive” means I don’t know.) He then said that many beginners memorize ten key theorems about reductive algebraic groups and don’t bother themselves with the proofs.

Where does Langlands come into all this? He defined some L-functions that have a formula very similar to the formula for Artin L-functions: in fact, all you have to do is replace the $r$ in that formula with a $\pi$. So a lot depends on what $\pi$ is. Apparently it’s an automorphic representation. I’m not sure what those are.

A big conjecture is that every arithmetic L-function is an automorphic L-function. This would give us a non-Abelian class field theory. (Classical class field theory studies Abelian field extensions, and can tell you things like which numbers are cubic residues mod $p$.)

This conjecture is a special case of Langlands’s famous principle of functoriality, which Arthur described as *the* fundamental problem. (OK, I’ve already described something else as the fundamental problem, but this is somehow the *real* fundamental problem.) I can’t resist stating the problem, because it looks as though it ought to be easy. I can imagine getting hooked on it in a parallel life, because it screams out, “Think about me in the right way and I’ll drop out.” Of course, that’s a very superficial impression, and probably once one actually does think about it, one quickly loses any feeling that it should at some sufficiently deep level be easy.

The principle says this.

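**Conjecture.** *Given two groups $G$ and $G'$, an automorphic representation $\pi'$ of $G'$ and an analytic homomorphism between their dual groups*

$$\rho:\hat{G}'\to\hat{G},$$

*there is an automorphic representation $\pi$ of $G$ such that $c(\pi)=\rho(c(\pi'))$; that is,*

$$c_p(\pi)=\rho(c_p(\pi'))$$

*as conjugacy classes in $\hat{G}$.*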

To me it looks like the kind of trivial-but-not-trivially-trivial statement one proves in a basic algebra course, but obviously it is far more than that.

One quite nice thing that Arthur did was to draw an extended analogy with a situation that held in physics a century or so ago. It was observed that the absorption spectra of starlight had black lines where certain frequencies were absent, and these corresponded to the wavelengths emitted by familiar elements. This suggested that the chemistry of stars was similar to the chemistry on earth. Furthermore, because these absorption spectra were red-shifted to various extents, it also suggested that the stars were moving away from us, and ultimately suggested the Big Bang theory. However, exactly *why* these black lines appeared was a mystery, which was not solved until the formulation of quantum mechanics.

Something like this is how Arthur sees number theory today. Automorphic forms tell us about other number-theoretic worlds. Spectra come from differential equations that are quite similar to the Schrödinger equation — in particular, they are based on Laplacians — that come from the geometry of the special Riemannian manifolds I mentioned above. But exactly how the connection between the number theory and the spectral theory works is still a mystery.

To end on a rather different note, the one other thing I got out of this excellent talk was to see Gerhard “I was at ICM2014” Paseman, of Mathoverflow fame. Later I even got to meet him, and he gave me a Mathoverflow teeshirt. I became aware of him because there were some small technical problems during the talk, and GP offered advice from the audience.


That’s a fairly easy question, so let’s follow it up with another one: how surprised should we be about this? Is there unconscious bias towards mathematicians with this property? Of this year’s 21 plenary lecturers, the only one with the property was Mirzakhani, and out of the 20 plenary lecturers in 2010, the only one with the property was Avila. What is going on?

On to more serious matters. After Candès’s lecture I had a solitary lunch in the subterranean mall (Korean food of some description, but I’ve forgotten exactly what) and went to hear Martin Hairer deliver his Fields medal lecture, which I’m not going to report on because I don’t have much more to say about his work than I’ve already said.

By and large, the organization of the congress was notably good — for example, I almost never had to queue for anything, and never for any length of time — but there was a little lapse this afternoon, in that Hairer’s lecture was scheduled to finish at 3pm, exactly the time that the afternoon’s parallel sessions started. In some places that might have been OK, but not in the vast COEX Seoul conference centre. I had to get from the main hall to a room at the other end of the centre where theoretical computer science talks were taking place, which was probably about as far as walking from my house in Cambridge to the railway station. (OK, I live close to the station, but even so.)

Inevitably, therefore, I arrived late to Boaz Barak’s talk, but he welcomed me, and a few others in my position, with the reassuring words that everything he had said up to now was bullshit and we didn’t need to worry about it. (He was quoting a juggler he had seen in Washington Square.)

I always like it when little themes recur at ICMs in different contexts. I’ve already mentioned the theme of looking at big spaces of objects in order to understand typical objects. Another one I mentioned when describing Candès’s lecture: that one should not necessarily be afraid of NP-complete problems, a theme which was present in Barak’s talk as well. I’m particularly fond of it because I’ve spent a lot of time in the last few years thinking about the well-known NP-complete problem where the input is a mathematical statement and the task (in the decision version) is to say whether there is a proof of that statement of length at most $n$ in some appropriate formal system. The fact that this problem is NP-complete does not deter mathematicians from spending their lives solving instances of it. What explains this apparent success? I dream that there might be a very nice answer to this question, rather than just a hand-wavy one that says that the instances studied by mathematicians are far from general.

Barak was talking about something a little different, however. He too has a dream, which is to obtain a very precise understanding of why certain problems are hard (in the complexity sense of not being soluble with efficient algorithms) and others easy. He is not satisfied with mere lists of easy and hard problems, with algorithms for the former and reductions to NP-complete or other “known hard” problems for the latter. He wants a theory that will say which problems are hard and which easy, or at least do that for large classes of problems. And the way he wants to do it is to find “meta-algorithms” — which roughly speaking means very general algorithmic approaches with the property that if they work then the problem is easy and if they fail then it’s hard.

Why is there the slightest reason to think that that can be done? Isn’t there a wide variety of algorithms, each of which requires a lot of ingenuity to find? If one approach fails, might there not be some clever alternative approach that nobody had thought of?

These are all perfectly reasonable objections, but the message, or at least *a* message, of Barak’s talk is that it is not completely outlandish to think that there really is what one might call a “best possible” meta-algorithm, in the sense that if it fails, then nothing else can succeed. Again, I stress that this would be for large and interesting classes of algorithmic problems (e.g. certain optimization problems) and not for every single describable Boolean function. One reason to hold out hope is that if you delve a little more deeply into the algorithms we know about, you find that actually many of them are based on just a few ideas, such as linear and semidefinite programming, solving simultaneous linear equations, and so on. Of course, that could just reflect our lack of imagination, but it could be an indication that something deeper is going on.

Another reason for optimism is that he has a candidate: the sum-of-squares algorithm. This is connected with Hilbert’s 17th problem, which asked whether every multivariate polynomial that takes only non-negative values can be written as a sum of squares of rational functions. (It turns out that they can’t necessarily be written as a sum of squares of polynomials: a counterexample is $x^4y^2+x^2y^4-3x^2y^2+1$.) An interesting algorithmic problem is to write a polynomial as a sum of squares when it can be so written. One of the reasons this problem interests Barak is that many other problems can be reduced to it. Another, which I don’t properly understand but I think would understand if I watched his talk again (it is here, by the way), is that if the unique games conjecture is false, and recall that it too is sort of saying that a certain algorithm is best possible, then the sum-of-squares algorithm is waiting in the wings to take over as the new candidate that will do the job.

An unfortunate aspect of going to Barak’s talk was that I missed Harald Helfgott’s. However, the sacrifice was rewarded, and I can always watch Harald on Youtube.

After another longish walk, but with a non-zero amount of time for it, I arrived at my next talk of the afternoon, given by Bob Guralnick. This was another very nice talk, just what an ICM invited lecture should be like. (By that I mean that it should be aimed principally at non-experts, while at the same time conveying what has been going on recently in the field. In other words, it should be more of a colloquium style talk than a research seminar.)

Guralnick’s title was Applications of the Classification of Finite Simple Groups. One thing he did was talk about the theorem itself, how a proof was announced in 1983 but not actually completed for another twenty years, and how there are now — or will be soon — “second-generation” proofs that are shorter, though still long, and use new ideas. He also mentioned a few statements that can be proved with the classification theorem and are seemingly completely out of reach without it. Here are a few of them.

1. Every finite simple group is generated by two elements.

2. The probability that a random pair of elements generates a finite simple group tends to 1 as the size of the group tends to infinity.

3. For every non-identity element $x$ of a finite simple group, there exists an element $y$ such that $x$ and $y$ generate the group.

4. For every finite simple group there exist conjugacy classes $C$ and $D$ such that for every $x\in C$ and every $y\in D$ the elements $x$ and $y$ generate the group.
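Statements 1 to 3 can at least be explored by brute force in the smallest non-abelian simple group, the alternating group $A_5$ of order 60. The sketch below is only an illustration for that one group (the helper names are made up here): it represents permutations as tuples, computes the subgroup generated by each ordered pair, and records which pairs generate everything.

```python
from itertools import permutations


def compose(p, q):
    """The permutation 'apply q first, then p'; permutations are tuples."""
    return tuple(p[q[i]] for i in range(len(p)))


def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0


def generated_subgroup(gens):
    """Subgroup generated by gens, found by closing up under multiplication.

    In a finite group, products of the generators already give all of the
    generated subgroup, so inverses need not be added explicitly.
    """
    identity = tuple(range(len(gens[0])))
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


A5 = [p for p in permutations(range(5)) if is_even(p)]
IDENTITY = tuple(range(5))

# Ordered pairs (x, y) that generate the whole group.
generating = {
    (x, y) for x in A5 for y in A5
    if len(generated_subgroup((x, y))) == len(A5)
}

# Statement 2 concerns this proportion as the group grows; here it is one value.
probability = len(generating) / len(A5) ** 2

# Statement 3: every non-identity x has some partner y.
every_element_has_partner = all(
    any((x, y) in generating for y in A5) for x in A5 if x != IDENTITY
)

if __name__ == "__main__":
    print(len(A5), len(generating), probability, every_element_has_partner)
```

A single small group says nothing about the asymptotic statement, of course; the point is only that for a concrete group these properties become finite computations, which is the kind of leverage the classification provides.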

Why does the classification of finite simple groups help with these problems? Because it means that instead of having to give an abstract proof that somehow uses the condition of having no proper normal subgroups, you have the option of doing a proof that involves calculations in concrete groups. Because the list of (families of) groups you have to consider is finite, this is a feasible approach. Actually, it’s not just that there are only finitely many families, but also that the families themselves are very nice, especially the various families of Lie type. As far as I can tell from the relevant Wikipedia article, there isn’t a formal definition of “group of Lie type”, but basically it means a group that’s like a Lie group but defined over a finite field instead of over $\mathbb{R}$ or $\mathbb{C}$. So things like PSL$(2,q)$ are finite simple groups of Lie type.

Just as the geometrization theorem didn’t kill off research in 3-manifolds, the classification of finite simple groups didn’t kill off group theory, even though in the past many mathematicians have thought that it would. It’s easy to see how that perception might have arisen: the project of classifying finite simple groups became such a major focus for group theorists that once it was done, a huge chunk of what they were engaged in was no longer available.

So what’s left? One answer, one might imagine, is that not all groups are simple. That is not a completely satisfactory answer, because groups can be put together from simple groups in such a way that for many problems it is enough to solve them just for simple groups (just as in number theory one can often prove a result for primes and prove that the product of two numbers that satisfy the result also satisfies the result). But it is part of the answer. For example $p$-groups (that is, groups of prime power order) are built out of copies of a cyclic group of prime order, but that doesn’t begin to answer all the questions people have about $p$-groups.

Another answer, which is closer to the reason that 3-manifold theory survived Perelman, is that proving results even for specific families of groups is often far from easy. For example, have a go at proving that a random pair of (equivalence classes of) matrices generates PSL$(2,q)$ with high probability when $q$ is large: it’s a genuine theorem rather than simply a verification.

I want to mention a very nice result that I think is due to Guralnick and his co-authors, though he didn’t quite explicitly say so. Let be a polynomial of degree , with coprime to . Then for every , either is bijective on the field or the set of values it takes has size at most .

What’s so nice about that? Well, the result is interesting, but even more interesting (at least to me) is the fact that the proof involved the classification of finite simple groups, and Guralnick described it (or more accurately, a different result just before it but I think the same remark applies) as untouchable without CFSG, even though the statement is about polynomials rather than groups.

Here is the video of Guralnick’s lecture.

The third invited lecture I went to was given by Francis Brown. Although I was expecting to understand very little, I wanted to go to it out of curiosity, because I knew Francis Brown when he was an undergraduate at Trinity; I think I taught him once or twice. After leaving Trinity he went to France, where he had been ever since, until very recently taking up a (highly prestigious) professorial fellowship at All Souls in Oxford. It was natural for him to go to France, because his mother is French and he is bilingual, another aspect that interests me since two of my children are in the same position. I heard nothing of him for a long time, but then in the last few years he suddenly popped up again as the person who has proved some important results concerning motivic zeta functions.

The word “motivic” scares me, and I’m not going to try to say what it means, because I can’t. I first heard of motives about twenty years ago, when the message I got was that they were objects that people studied even though they didn’t know how to define them. That may be a caricature, but my best guess as to the correct story is that even though people don’t know the right definition, they do know what *properties* this definition should have. In other words, there is a highly desirable theory that would do lots of nice things, if only one could find the objects that got you started.

However, what Brown was doing appeared not to be based on layers of conjecture, so I suppose it must be that “motivic versions” of certain objects have been shown to exist.

This was a talk in which I did not take notes. To do a decent job describing it, I’d need to watch it again, but rather than do that, I’ll just describe the traces it left in my memory.

One was that he mentioned the famous problem of the irrationality of $\zeta(n)$ for odd $n$, and more generally the problem of whether the vector space over the rationals generated by $\zeta(3),\zeta(5),\dots,\zeta(2n+1)$ has dimension $n$. (It has been shown by Ball and Rivoal to have dimension that tends to infinity with $n$, which was a major result when it was proved.)

Another was that he defined multiple zeta values, which are zeta-like functions of more than one integer variable, which come up naturally when one takes two zeta values, multiplies them together, and expands out the result. They were defined by Euler.
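To make the “multiply and expand” remark concrete, here is the simplest instance as I understand it. The ordering convention for the double zeta value is an assumption on my part (some authors sum over $m<n$ instead), and I restrict to $a,b\geq 2$ so that everything converges.

```latex
\zeta(a)\,\zeta(b)
  = \sum_{m \ge 1} \frac{1}{m^{a}} \sum_{n \ge 1} \frac{1}{n^{b}}
  = \Bigl( \sum_{m > n \ge 1} + \sum_{n > m \ge 1} + \sum_{m = n \ge 1} \Bigr) \frac{1}{m^{a} n^{b}}
  = \zeta(a,b) + \zeta(b,a) + \zeta(a+b),
\qquad \text{where } \zeta(a,b) = \sum_{m > n \ge 1} \frac{1}{m^{a} n^{b}}.
```

So the product of two ordinary zeta values is forced to involve sums over pairs of integers, which is one way of seeing why the multiple versions turn up.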

He also talked about periods, a very interesting concept defined (I think for the first time) by Kontsevich and Zagier. I highly recommend looking at their paper, available here in preprint form. At least the beginning of it is accessible to non-experts, and contains a very interesting open problem. Roughly speaking, a period is anything you can define using reasonably nice integrals. For example, $\pi$ is a period because it is the area of the unit disc, which has a nice polynomial equation $x^2+y^2\leq 1$. The nice problem is to prove that an explicit number is not a period. There are only countably many periods, so such numbers exist in abundance. If you want a specific number to try, then you can begin with $e$. Best of luck.
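For what it is worth, here is how two familiar numbers look when written as periods. I am assuming the usual reading of “reasonably nice”: the integrand is a rational function with rational coefficients and the domain is cut out by polynomial inequalities with rational coefficients.

```latex
\pi = \iint_{x^{2} + y^{2} \le 1} \mathrm{d}x \, \mathrm{d}y,
\qquad
\log 2 = \int_{1}^{2} \frac{\mathrm{d}x}{x}.
```

Each such description uses only finitely many rational coefficients, which is why there are only countably many periods.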

While discussing what motivic zeta values are, he said that there were two approaches one could use, one involving Betti numbers and the other involving de Rham cohomology. He preferred the de Rham approach. “Betti” and “de Rham” became a sort of chorus throughout the talk, and even now I have ringing in my head phrases like “-Betti or -de Rham”.

If I understood correctly, linear dependences between motivic zeta values (which are much fancier objects that still depend on tuples of integers) imply the corresponding dependences between standard zeta values. (I’m talking about both single and multiple zeta values here.) That’s not much help if you are trying to prove *independence* of standard zeta values, but it does do two things for you. One is that it provides a closely related context in which the world seems to be a tidier place. As I understand it, all the conjectures one wants to be true for standard zeta values are true for their motivic cousins. But it has also enabled Brown to discover unexpected dependences between standard zeta values: for instance every multiple zeta value is a linear combination of multiple zeta values where every argument is 2 or 3. (I suppose multiple must mean “genuinely multiple” here.) Actually, looking very briefly at the relevant part of the talk, which is round about the 27th minute, I see that this was proving something called the Hoffman conjecture, so perhaps it is wrong to call it unexpected. But it is still a very interesting result, given that the proof was highly non-trivial and went via motivic zeta values.

My remaining memory trace is that the things Brown was talking about were related to a lot of other important parts of mathematics, and even theoretical physics. I’d love to understand this kind of thing better.

So although a lot of this talk (which is here) went over my head, enough of it didn’t that my attention was engaged throughout. Given the type of material, that was far from obviously going to be the case, so this was another very good talk, to round off a pretty amazing day.


One person who doesn’t lose any sleep over doubts like this is Emmanuel Candès, who gave the second plenary lecture I went to. He began by talking a little about the motivation for the kinds of problems he was going to discuss, which one could summarize as follows: his research is worthwhile because *it helps save the lives of children*. More precisely, it used to be the case that if a child had an illness that was sufficiently serious to warrant an MRI scan, then doctors faced the following dilemma. In order for the image to be useful, the child would have to keep completely still for two minutes. The only way to achieve that was to stop the child’s breathing for those two minutes. But depriving a child’s brain (or indeed any brain, I’d imagine) of oxygen for two minutes is not without risk, to put it mildly.

Now, thanks to the famous work of Candès and others on compressed sensing, one can reconstruct the image using many fewer samples, which reduces the time the child must keep still to 15 seconds. Depriving the brain of oxygen for 15 seconds is not risky at all. Candès told us about a specific boy who had something seriously wrong with his liver (I’ve forgotten the details) who benefited from this. If you want a ready answer for when people ask you about the point of doing maths, and if you’re sick of the Hardy-said-number-theory-useless-ha-ha-but-what-about-public-key-cryptography-internet-security-blah-blah example, then I recommend watching at least some of Candès’s lecture, which is available here, and using that instead. Then you’ll really have seized the moral high ground.

Actually, I recommend watching it *anyway*, because it was a fascinating lecture from start to finish. In that case, you may like to regard this post as something like a film review with spoilers: if you mind spoilers, then you’d better stop reading here.

I have to admit that as I started this post, I realized that there was something fairly crucial that I didn’t understand, that meant I couldn’t give a satisfactory account of what I wanted to describe. I didn’t take many notes during the talk, because I just wanted to sit back and enjoy it, and it felt as though I would remember everything easily, but there was one important mathematical point that I missed. I’ll come back to it in a moment.

Anyhow, the basic mathematical problem that the MRI scan leads to is this. A full scan basically presents you with the Fourier transform of the image you want, so to reconstruct the image you simply invert the Fourier transform. But if you are sampling in only two percent of directions and you take an inverse Fourier transform (it’s easy to make sense of that, but I won’t bother here), then you get a distorted image with all sorts of strange lines all over it — Candès showed us a picture — and it is useless for diagnostic purposes.

So, in a moment that Candès described as one of the luckiest in his life, a radiologist approached him and asked if there was any way of getting the right image from the much smaller set of samples. On the face of it, the answer might seem to be no, since the dimension of the space of possible outputs has become much smaller, so there must be many distinctions between inputs that are not detectable any more. However, in practice the answer is yes, for reasons that I’ll discuss after I’ve mentioned Candès’s second example.

The second example was related to things like the Netflix challenge, which was to find a good way of predicting which films somebody would like, given the preferences of other people and at least some of the preferences of the person in question. If we make the reasonable hypothesis that people’s preferences depend by and large on a fairly small number of variables (describing properties of the people and properties of the films), then we might expect that a matrix where the $(i,j)$th entry represents the strength of preference of person $i$ for film $j$ would have fairly small rank. Or more reasonably, one might expect it to be a small perturbation of a matrix with small rank.

And thus we arrive at the following problem: you are given a few scattered entries of a matrix, and you want to find a low-rank matrix that agrees pretty well with the entries you observe. Also, you want the low-rank matrix to be unique (up to a small perturbation) since otherwise you can’t use it for prediction.

As Candès pointed out, simple examples show that the uniqueness condition cannot always be obtained. For example, suppose you have 99 people with very similar preferences and one person whose preferences are completely different. Then the underlying matrix that describes their preferences has rank 2 — basically, one row for describing the preferences of the 99 and one for describing the preference of the one eccentric outlier. If all you have is a few entries for the outlier’s preferences, then there is nothing you can do to guess anything else about those preferences.
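Here is a quick sanity check of the rank claim, as a sketch rather than anything Candès showed. The ratings are made up, and I idealize “very similar” to “identical” so that the rank is exactly 2; the helper `matrix_rank` is my own, using exact rational arithmetic.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        # Look for a row at or below position rk with a non-zero entry in this column.
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

# Hypothetical ratings of five films (assumption: the 99 agree exactly).
typical = [5, 4, 1, 1, 3]
outlier = [1, 1, 5, 4, 2]
preferences = [typical] * 99 + [outlier]

rank_of_preferences = matrix_rank(preferences)
print(rank_of_preferences)
```

The point of the example in the text is that the low rank is no help here: the outlier’s row is independent of everything else, so the unobserved entries in it are unconstrained.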

However, there is a natural assumption you can make, which I’ve now forgotten, that rules out this kind of example, and if a matrix satisfies this assumption then it can be reconstructed exactly.

Writing this, I realize that Candès was actually discussing a slight idealization of the problem I’ve described, in that he didn’t have perturbations. In other words, the problem was to reconstruct a low-rank matrix exactly from a few entries. An obvious necessary condition is that the number of samples should exceed the number of degrees of freedom of the set of low-rank matrices. But there are other conditions such as the one I’ve mentioned, and also things like that every row and every column should have a few samples. But given those conditions (or perhaps the sampling is done at random — I can’t remember) it turns out to be possible to reconstruct the matrix exactly.

The MRI problem boils down to something like this. You have a set of linear equations to solve (because you want to invert a Fourier transform) but the number of unknowns is significantly larger than the number of equations (because you have a sparse set of samples of the Fourier transform you want to invert). This is an impossible problem unless you make some assumption about the solution, and the assumption Candès makes is that it should be a *sparse vector*, meaning that it has only a few non-zero entries. This reduces the number of degrees of freedom considerably, but the resulting problem is no longer pure linear algebra.

The point that I missed was what sparse vectors have to do with MRI scans, since the image you want to reconstruct doesn’t appear to be a sparse vector. But looking back at the video I see that Candès addressed this point as follows: although the *image* is not sparse, the *gradient* of the image is sparse. Roughly speaking, you get quite a lot of patches of fairly constant colour, and if you assume that that is the case, then the number of degrees of freedom in the solution goes right down and you have a chance of reconstructing the image.

Going back to the more general problem, there is another condition that is needed in order to make it soluble, which is that the matrix of equations should not have too many sparse rows, since typically a sparse row acting on a sparse vector will give you zero, which doesn’t help you to work out what the sparse vector was.

I don’t want to say too much more, but there was one point that particularly appealed to me. If you try to solve these problems in the obvious way, then you might try to find algorithms for solving the following problems.

1. Given a system of underdetermined linear equations, find the sparsest solution.

2. Given a set of entries of a matrix, find the lowest rank matrix consistent with those entries.

Unfortunately, no efficient algorithms are known for these problems, and I think in the second case it’s even NP-complete. However, what Candès and his collaborators did was consider *convex relaxations* of these problems.

1. Given a system of underdetermined linear equations, find the solution with smallest $\ell_1$ norm.

2. Given a set of entries of a matrix, find the matrix with smallest nuclear norm consistent with those entries.

If you don’t know what the nuclear norm is, it’s simple to define. Whereas the rank of a matrix $A$ is the smallest number $k$ of rank-1 matrices such that $A$ is a linear combination of those matrices, the nuclear norm of $A$ is the minimum of $\sum_i|\lambda_i|$ such that you can write $A=\sum_i\lambda_iB_i$ with each $B_i$ a rank-1 matrix of norm 1. So it’s more like a quantitative notion of rank.
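A small worked example may help. It relies on a standard fact that was not part of the talk as I remember it: writing $A$ for the matrix and $\sigma_i(A)$ for its singular values, the nuclear norm equals the sum of the singular values.

```latex
\|A\|_{*} = \sum_{i} \sigma_{i}(A), \qquad
A = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
  = 3 \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
  + 1 \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
\;\Longrightarrow\; \operatorname{rank}(A) = 2, \quad \|A\|_{*} = 3 + 1 = 4.
```

If the 1 in the bottom corner is replaced by 0.001, the rank is still 2 but the nuclear norm drops to 3.001, which is the sense in which the nuclear norm measures “how much” rank there is.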

It’s a standard fact that convex relaxations of problems tend to be much easier than the problems themselves. But usually that comes at a significant cost: the solutions you get out are not solutions of the form you originally wanted, but more like convex combinations of such solutions. (For example, if you relax the graph-colouring problem, you can solve the relaxation but you get something called a fractional colouring of your graph, where the total amount of each colour at two adjacent vertices is at most 1, and that can’t easily be converted into a genuine colouring.)

However, in the cases that Candès was telling us about, it turns out that if you solve the convex relaxations, you get exactly correct solutions to the original problems. So you have the following very nice situation: a problem is NP-complete, but if you nevertheless go ahead and try to solve it using an algorithm that is doomed to fail in general, the algorithm still works in a wide range of interesting cases.
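The agreement between the hard problem and its relaxation can be seen on a toy system. The sketch below is mine, not Candès’s: the matrix is made up, and instead of a real convex solver it uses brute force, relying on the fact that (for a system whose rows are independent) both a sparsest solution and an $\ell_1$-smallest solution can be found among the “basic” solutions supported on as many columns as there are equations.

```python
from fractions import Fraction
from itertools import combinations

def solve_square(matrix, rhs):
    """Solve a square system exactly by Gauss-Jordan elimination; None if singular."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def basic_solutions(a, b):
    """All solutions of a x = b supported on len(a) columns forming an invertible block."""
    m, n = len(a), len(a[0])
    found = []
    for support in combinations(range(n), m):
        block = [[a[i][j] for j in support] for i in range(m)]
        sol = solve_square(block, b)
        if sol is not None:
            x = [Fraction(0)] * n
            for j, v in zip(support, sol):
                x[j] = v
            found.append(x)
    return found

# Two equations, three unknowns (made-up example): x0 + x2 = 1, x1 + x2 = 1.
a = [[1, 0, 1], [0, 1, 1]]
b = [1, 1]
candidates = basic_solutions(a, b)

sparsest = min(candidates, key=lambda x: sum(v != 0 for v in x))
smallest_l1 = min(candidates, key=lambda x: sum(abs(v) for v in x))
print(sparsest == smallest_l1)
```

Here both searches return the solution $(0,0,1)$. Brute force over supports is of course exponential in general, which is exactly why one wants the convex formulation in practice.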

At first this seems miraculous, but Candès spent the rest of the talk explaining to us why it isn’t. It boiled down to a very geometrical picture: you have a convex body and a plane through one of its extreme points, and if the plane is tangent to the body then the algorithm will work. It is this geometrical condition that underlies the necessary conditions I mentioned earlier.

For me this lecture was one of the highlights of the ICM, and I met many other people who greatly enjoyed it too.


Eventually I just made it, by going back to a place that was semi-above ground (meaning that it was below ground but you entered it via a sunken area that was not covered by a roof) that I had earlier rejected on the grounds that it didn’t have a satisfactory food option, and just had an espresso. Thus fortified, I made my way to the talk and arrived just in time, which didn’t stop me getting a seat near the front. That was to be the case at all talks: if I marched to the front, I could get a seat. I think part of the reason was that there were “Reserved” stickers on several seats, which had been there for the opening ceremony and not been removed. But maybe it was also because some people like to sit some way back so that they can zone out of the talk if they want to, maybe even getting out their laptops. (However, although wireless was in theory available throughout the conference centre, in practice it was very hard to connect.)

The first talk was by Ian Agol. I was told before the talk that I would be unlikely to understand it — the comment was about Agol rather than about me — and the result of this lowering of my expectations was that I enjoyed the talk. In fact, I might even have enjoyed it without the lowering of expectations. Having said that, I did hear one criticism afterwards that I will try to explain, since it provides a good introduction to the content of the lecture.

When I first heard of Thurston’s famous geometrization conjecture, I thought of it as the ultimate aim of the study of 3-manifolds: what more could you want than a complete classification? However, this view was not correct. Although a proof of the geometrization conjecture would be (and later was) a massive step forward, it wouldn’t by itself answer all the questions that people really wanted to answer about 3-manifolds. But some very important work by Agol and others since Perelman’s breakthrough has, in some sense that I don’t understand, finished off some big programme in the subject. The criticism I heard was that Agol didn’t really explain what this programme was. I hadn’t really noticed that as a problem during the talk — I just took it on trust that the work Agol was describing was considered very important by the experts (and I was well aware of Agol’s reputation) — but perhaps he could have done a little more scene setting.

What he actually did by way of introduction was to mention three questions from a famous 1982 paper of Thurston (Three-dimensional manifolds, Kleinian groups and hyperbolic geometry) in which he asked 24 questions. The ones Agol mentioned were questions 16-18. I’ve just had a look at the Thurston paper, and it’s well worth a browse, as it’s a relatively gentle survey written for the Bulletin of the AMS. It also has lots of nice pictures. I didn’t get a sense from my skim through it that questions 16-18 were significantly more important than the others (apart from the geometrization conjecture), but perhaps the story is that when the dust had settled after Perelman’s work, it was those questions that were still hard. Maybe someone who knows what they’re talking about can give a better explanation in a comment.

One definition I learned from the lecture is this: a 3-manifold is said to have a property P *virtually* if it has a finite-sheeted cover with property P. I presume that a finite-sheeted cover is another 3-manifold and a suitable surjection to the first one such that each point in the first has $n$ preimages for some finite $n$ (that doesn’t depend on the point).

Thurston’s question 16 asks whether every aspherical 3-manifold (I presume that just means that it isn’t a 3-sphere) is virtually Haken.

A little later in the talk, Agol told us what “Haken” meant, other than being the name of a very well-known mathematician. Here’s the definition he gave, which left me with very little intuitive understanding of the concept. A compact 3-manifold with hyperbolic interior is *Haken* if it contains an embedded $\pi_1$-injective surface. An example, if my understanding of my rapidly scrawled notes is correct, is a knot complement, one of the standard ways of constructing interesting 3-manifolds. If you take the complement of a knot in $S^3$ you get a 3-manifold, and if you take a tubular neighbourhood of that knot, then its boundary will be your $\pi_1$-injective surface. (I’m only pretending to know what $\pi_1$-injective means here.)

Thurston, in the paper mentioned earlier, describes Haken manifolds in a different, and for me more helpful, way. Let me approach the concept in top-down fashion: that is, I’ll define it in terms of other mysterious concepts, then work backwards through Thurston’s paper until everything is defined (to my satisfaction at least).

Thurston writes, “A 3-manifold $M^3$ is called a Haken manifold if it is prime and it contains a 2-sided incompressible surface (whose boundary, if any, is on $\partial M$) which is not a 2-sphere.”

Incidentally, one thing I picked up during Agol’s talk is that it seems to be conventional to refer to a 3-manifold as $M^3$ the first time you mention it and as $M$ thereafter.

Now we need to know what “prime” and “incompressible” mean. The following paragraph of Thurston defines “prime” very nicely.

The decomposition referred to really has two stages. The first stage is the prime decomposition, obtained by repeatedly cutting a 3-manifold $M^3$ along 2-spheres embedded in $M^3$ so that they separate the manifold into two parts neither of which is a 3-ball, and then gluing 3-balls to the resulting boundary components, thus obtaining closed 3-manifolds which are “simpler”. Kneser proved that this process terminates after a finite number of steps. The resulting pieces, called the prime summands of $M^3$, are uniquely determined by $M^3$ up to homeomorphism.

Hmm, perhaps the rule is more general: you refer to it as $M^3$ to start with and after that it’s sort of up to you whether you want to call it $M^3$ or $M$.

The equivalent process in two dimensions could be used to simplify a two-holed torus. You first identify a circle that cuts it into two pieces and doesn’t bound a disc: basically what you get if you chop the surface into two with one hole on each side. Then you have two surfaces with circles as boundaries. You fill in those circles with discs and then you have two tori. At this point you can’t chop the surface in two in a non-trivial way, so a torus is prime. Unless my intuition is all wrong, that’s more or less telling us that the prime decomposition of an arbitrary orientable surface (without boundary) is into tori, one for each hole, except that the sphere would be prime.

What about “incompressible”? Thurston offers us this.

A surface $N^2$ embedded in a 3-manifold $M^3$ is two-sided if $N^2$ cuts a regular neighborhood of $N^2$ into two pieces, i.e., the normal bundle to $N^2$ is oriented. Since we are assuming that $M^3$ is oriented, this is equivalent to the condition that $N^2$ is oriented. A two-sided surface is incompressible if every simple curve on $N^2$ which bounds a disk in $M^3$ with interior disjoint from $N^2$ also bounds a disk on $N^2$.

I think we can forget the first part there: just assume that everything in sight is oriented. Let’s try to think what it would mean for an embedded surface not to be incompressible. Consider for example a copy of the torus embedded in the 3-sphere. Then a loop that goes round the torus bounds a disc in the 3-sphere with no problem, but it doesn’t bound a disc in the torus. So that torus fails to be incompressible. But suppose we embedded the torus into a 3-dimensional torus in a natural way, by taking the 3D torus to be the quotient of $\mathbb{R}^3$ by $\mathbb{Z}^3$ and the 2D torus to be the set of all points with $x$-coordinate an (equivalence class of an) integer. Then the loops that don’t bound discs in the 2-torus don’t bound discs in the 3-torus either, so that surface is (again if what seems likely to be true actually is true) incompressible. It seems that an incompressible surface sort of spans the 3-manifold in an essential way rather than sitting inside a boring part of the 3-manifold and pretending that it isn’t boring.

OK, that’s what Haken manifolds *are*, but for the non-expert that’s not enough. We want to know why we should care about them. Thurston gives us an answer to this too. Here is a very useful paragraph about them.

It is hard to say how general the class of Haken manifolds is. There are many closed manifolds which are Haken and many which are not. Haken manifolds can be analyzed by inductive processes, because as Haken proved, a Haken manifold can be cut successively along incompressible surfaces until one is left with a collection of 3-balls. The condition that a 3-manifold has an incompressible surface is useful in proving that it has a hyperbolic structure (when it does), but intuitively it really seems to have little to do with the question of existence of a hyperbolic structure.

To put it more vaguely, Haken manifolds are good because they can be chopped into pieces in a way that makes them easy to understand. So I’d guess that the importance of showing that every aspherical 3-manifold is virtually Haken is that finite-sheeted coverings are sufficiently nice that even knowing that a manifold is *virtually* Haken means that in some sense you understand it.

One very nice thing Agol did was give us some basic examples of 3-manifolds, by which I mean not things like the 3-sphere, but examples of the kind that one wouldn’t immediately think of and that improve one’s intuition about what a typical 3-manifold looks like.

The first one was a (solid) dodecahedron with opposite faces identified — with a twist. I meant the word “twist” literally, but I suppose you could say that the twist is that there is a twist, meaning that given two opposite faces, you don’t identify each vertex with the one opposite it, but rather you first rotate one of the faces through and *then* identify opposite vertices. (Obviously you’ll have to do that in a consistent way somehow.)

There are some questions here that I can’t answer in my head. For example, if you take a vertex of the dodecahedron, then it belongs to three faces. Each of these faces is identified in a twisty way with the opposite face, so if we want to understand what’s going on near the vertex, then we should glue three more dodecahedra to our original one at those faces, keeping track of the various identifications. Now do the identifications mean that those dodecahedra all join up nicely so that the point is at the intersection of four copies of the dodecahedron? Or do we have to do some *more* gluing before everything starts to join together? One thing we *don’t* have to worry about is that there isn’t room for all those dodecahedra, which in a certain sense would be the case if the solid angle at a vertex is greater than 1. (I’m defining, I hope standardly, the solid angle of a cone to be the size of the intersection of that cone with a unit sphere centred at the apex, or whatever one calls it. Since a unit sphere has surface area $4\pi$, the largest possible solid angle is $4\pi$.)

Anyhow, as I said, this doesn’t matter. Indeed, far from mattering, it is to be positively welcomed, since if the solid angles of the dodecahedra that meet at a point add up to more than $4\pi$, then it indicates that the geometry of the resulting manifold will be hyperbolic, which is exactly what we want. I presume that another way of defining the example is to start with a tiling of hyperbolic 3-space by regular dodecahedra and then identify neighbouring dodecahedra using little twists. I’m guessing here, but opposite faces of a dodecahedron are parallel, while not being translates of one another. So maybe as you come out of a face, you give it the smallest (anticlockwise, say) twist you can to make it a translate of the opposite face, which will be a rotation by an angle of $\pi/5$, and then re-enter the opposite face by the corresponding translated point. But it’s not clear to me that that is a consistent definition. (I haven’t said which dodecahedral tiling I’m even taking. Perhaps the one where all the pentagons have right angles at their vertices.)

The other example was actually a pair of examples. One was a figure-of-eight-knot complement, and the other was the complement of the Whitehead link. Agol showed us drawings of the knot and link: I’ll leave you to Google for them if you are interested.

How does a knot complement give you a 3-manifold? I’m not entirely sure. One thing that’s clear is that it gives you a 3-manifold with boundary, since you can take a tubular neighbourhood of the knot/link and take the complement of that, which will be a 3D region whose boundary is homeomorphic to a torus but sits in $S^3$ in a knotted way. I also know (from Thurston, but I’ve seen it before) that you can produce lots of 3-manifolds by defining some non-trivial homeomorphism from a torus to itself, removing a tubular neighbourhood of a knot from $S^3$ and gluing it back in again, but only after applying the homeomorphism to the boundary. That is, given your solid knot and your solid-knot-shaped hole, you identify the boundary of the knot with the boundary of the hole, but not in the obvious way. This process is called Dehn surgery, and in fact can be used to create all 3-manifolds.

But I still find myself unable to explain how a knot complement is *itself* a 3-manifold, unless it is a 3-manifold with boundary, or one compactifies it somehow, or something. So I had the illusion of understanding during the talk but am found out now.

The twisted-dodecahedron example was discovered by Seifert and Weber, and is interesting because it is a non-Haken manifold (a discovery of Burton, Rubinstein and Tillmann) that is virtually Haken.

Going back to the question of why the geometrization conjecture didn’t just finish off the subject, my guess is that it is probably possible to construct lots of complicated 3-manifolds that obviously satisfy the geometrization conjecture because they are already hyperbolic, but that are not by virtue of that fact alone easy to understand. What Agol appeared to say is that the role of the geometrization conjecture is essentially to reduce the whole problem of understanding 3-manifolds to that of understanding hyperbolic 3-manifolds. He also said something that is more or less a compulsory remark in a general lecture on 3-manifolds, namely that although they are topological objects, they are studied by geometrical means. (The corresponding compulsory remark for 4-manifolds is that 4D is the odd dimension out, where lots of weird things happen.)

As I’ve said, Agol discussed two other problems. I think the virtual Haken conjecture was the big one (after all, that was the title of his lecture), but the other two were, as he put it, stronger statements that were easier to think about. Question 17 asks whether every aspherical 3-manifold virtually has positive first Betti number, and question 18 asks whether it virtually fibres over the circle. I’ll pass straight to the second of these questions.

A 3-manifold $M$ *fibres over the circle* if there is a (suitably nice) map $f:M\to S^1$ such that the preimage of every point in $S^1$ is a surface (the fibre at that point).

Let me state Agol’s main results without saying what they mean. In 2008 he proved that if $M$ is virtually special cubulated, then it is virtually fibred. In 2012 he proved that cubulations with hyperbolic fundamental group are virtually special, answering a 2011 conjecture of Wise. A corollary is that every closed hyperbolic 3-manifold virtually fibres over the circle, which answers questions 16-18.

There appears to be a missing step there, namely to show that every closed hyperbolic 3-manifold has a cubulation with hyperbolic fundamental group. That I think must have been the main message of what he said in a fairly long discussion about cubulations that preceded the statements of these big results, and about which I did not take detailed notes.

What I remember about the discussion was a number of pictures of cube complexes made up of cubes of different dimensions. An important aspect of these complexes was a kind of avoidance of positive curvature, which worked something like this. (I’ll discuss a low-dimensional situation, but it generalizes.) Suppose you have three squares that meet at a vertex just as they do if they are faces of a cube. Then at that vertex you’ve got some positive curvature, which is what you want to avoid. So to avoid it, you’re obliged to fill in the entire cube, and now the positive curvature is rendered harmless because it’s just the surface of some bit of 3D stuff. (This feels a bit like the way we don’t pay attention to embedded surfaces unless they are incompressible.)

I haven’t given the definition because I don’t remember it. The term CAT(0) came up a lot. At the time I felt I was following what was going on reasonably well, helped by the fact that I had seen an excellent talk by my former colleague Vlad Markovic on similar topics. (Markovic was mentioned in Agol’s talk, and himself was an invited speaker at the ICM.) The main message I remember now is that there is some kind of dictionary between cube complexes and 3-manifolds, so you try to find “cubulations” with particular properties that will enable you to prove that your 3-manifolds have corresponding properties. Note that although the manifolds are three-dimensional, the cubes in the corresponding cube complexes are not limited to three dimensions.

That’s about all I can remember, even with the help of notes. In case I have given the wrong impression, let me make clear that I very much enjoyed this lecture and thought it got the “working” part of the congress off to a great start. And it’s clear that the results of Agol and others are a big achievement. If you want to watch the lecture for yourself, it can be found here.

**Update.** I have found a series of three nice-looking blog posts by Danny Calegari about the virtual Haken conjecture and Agol’s proof. Here are the links: part 1, part 2 and part 3.

]]>

When the announcement was made a few hours earlier, my knowledge of Subhash Khot could be summarized as follows.

- He’s the person who formulated the unique games conjecture.
- I’ve been to a few talks on that in the past, including at least one by him, and there have been times in my life when I have briefly understood what it says.
- It’s a hardness conjecture that is a lot stronger than the assertion that P$\neq$NP, and therefore a lot less obviously true.

What I hoped to get out of the laudatio was a return to the position of understanding what it says, and also some appreciation of what was so good about Khot’s work. Anybody can make a conjecture, but one doesn’t usually win a major prize for it. But sometimes a conjecture is so far from obvious, or requires such insight to formulate, or has such an influence on a field, that it is at least as big an achievement as proving a major theorem: the Birch–Swinnerton-Dyer conjecture and the various conjectures of Langlands are two obvious examples.

The unique games conjecture starts with a problem at the intersection of combinatorics and linear algebra.

Suppose you are given a collection of linear equations over the field $\mathbb{F}_p$. Then you can use Gaussian elimination to determine whether or not they have a solution. Now suppose that you find out that they do *not* have a solution. Then something you might consider doing is looking for an assignment to the variables that solves as many of the equations as possible. If $p=2$, then a random assignment will solve on average half the equations, so it must be possible to solve at least half the equations. So the interesting thing is to do better than 50%. A famous result of Johan Håstad states that this cannot be done, even when each equation involves just three variables. (Actually, that restriction to three variables is not the surprising aspect: there are many situations where doing something for 2 is easy and the difficulty kicks in at 3. For example, it is easy to determine whether a graph is 2-colourable: you just start at a vertex, colour all its neighbours differently, etc. etc., and since all moves are forced apart from when you start again at a new connected component, if the process doesn’t yield a colouring then you know there isn’t one. But it is NP-hard to determine whether it is 3-colourable.)
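The claim that a random assignment solves half the equations on average can be checked exhaustively on a toy example. The sketch below uses a made-up, deliberately inconsistent system of three-variable equations over the field with two elements; the system and the function names are illustrative assumptions, not anything from the talk.

```python
from fractions import Fraction
from itertools import product

# Each equation is (variables, rhs): the sum of the listed variables mod 2
# should equal rhs. This toy system is hypothetical; its first two
# equations contradict each other, so it has no exact solution.
EQUATIONS = [
    ((0, 1, 2), 0),
    ((0, 1, 2), 1),
    ((1, 2, 3), 1),
    ((0, 2, 3), 0),
]
NUM_VARS = 4


def satisfied(assignment, equations):
    """Count the equations satisfied by a 0/1 assignment."""
    return sum(
        1
        for variables, rhs in equations
        if sum(assignment[v] for v in variables) % 2 == rhs
    )


def average_fraction(equations, num_vars):
    """Average fraction of satisfied equations over all 2^n assignments."""
    total = sum(
        satisfied(a, equations) for a in product((0, 1), repeat=num_vars)
    )
    return Fraction(total, len(equations) * 2 ** num_vars)


def best_fraction(equations, num_vars):
    """Best achievable fraction, found by brute force."""
    best = max(
        satisfied(a, equations) for a in product((0, 1), repeat=num_vars)
    )
    return Fraction(best, len(equations))
```

For this system `average_fraction(EQUATIONS, NUM_VARS)` is exactly 1/2, while the brute-force optimum is 3/4. The hard question is how close to the optimum an efficient algorithm can be guaranteed to get.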

More precisely, Håstad’s result says that for any fixed $\epsilon>0$, if there were a polynomial-time algorithm that could tell you whether it was possible to satisfy at least a proportion $1/2+\epsilon$ of a collection of linear equations over $\mathbb{F}_2$ (each equation involving three variables), then P would equal NP. His proof relies on one of the big highlights of theoretical computer science: the PCP theorem.

The unique games conjecture also concerns maximizing the number of linear equations you can solve, but this time we work mod $p$ and the equations are very special: they take the form $x_i-x_j=c_{ij}$.

To get a little intuition about this, I suppose one should do something I haven’t done until this very moment, and think about how one might go about finding a good algorithm for solving as many equations of this type from some collection as possible. An obvious observation is that once we’ve chosen $x_i$, the value of $x_j$ is determined if we want to solve the equation $x_i-x_j=c_{ij}$. And that may well determine another variable, and so on. It feels natural to think of these equations as a labelled directed graph with the variables as vertices and with an edge from $x_i$ to $x_j$ labelled $c_{ij}$ if the above equation is present in the system. Then following the implications of a choice of variables is closely related to exploring the component of that vertex in the graph. However, since our aim is to solve as many equations as possible, rather than all of them, we have the option of removing edges to make our task easier, though we want to remove as few edges as possible.
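The propagation idea can be sketched as follows, assuming equations of the form "x_i minus x_j equals c, mod p" stored as triples `(i, j, c)`. The function name and data layout are illustrative. This is only the naive "follow the forced choices" procedure, not an algorithm that anyone claims is good at maximizing the number of satisfied equations.

```python
from collections import deque


def propagate(num_vars, p, equations):
    """Naive propagation for equations x_i - x_j = c (mod p).

    `equations` is a list of triples (i, j, c). One value is chosen freely
    in each connected component, and every other value in that component is
    then forced by the first equation that reaches it. Returns the
    assignment and the number of equations it satisfies.
    """
    # Knowing x_i forces x_j = x_i - c; knowing x_j forces x_i = x_j + c.
    neighbours = {v: [] for v in range(num_vars)}
    for i, j, c in equations:
        neighbours[i].append((j, -c))
        neighbours[j].append((i, c))

    value = {}
    for start in range(num_vars):
        if start in value:
            continue
        value[start] = 0  # the free choice for this component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, shift in neighbours[u]:
                if v not in value:
                    value[v] = (value[u] + shift) % p
                    queue.append(v)

    satisfied = sum(
        1 for i, j, c in equations if (value[i] - value[j]) % p == c % p
    )
    return value, satisfied
```

If the system is consistent, every equation ends up satisfied. If it is not, the equations that close up a cycle "wrongly" fail, and the real problem is to decide which edges to sacrifice.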

Maybe those few remarks will make it seem reasonably natural that the unique games conjecture can be connected with something called the *max cut problem*. This is the problem of finding a partition of the vertices of a graph into two sets such that the number of edges from one side to the other is as big as possible.

Actually, while browsing some slides of Håstad, I’ve just seen the following connection, which seems worth mentioning. If $p=2$ and all the $c_{ij}$ equal 1, then $x_i-x_j=c_{ij}$ if and only if the variables $x_i$ and $x_j$ get different assignments. So in this case, solving as many equations as possible is precisely the same as the max cut problem.
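Here is a small brute-force check of that equivalence, assuming the graph is given as a list of edges `(u, v)` and reading each edge as the equation "x_u minus x_v equals 1, mod 2". The helper names are illustrative, and the exhaustive search is only sensible for tiny graphs.

```python
from itertools import product


def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def satisfied_equations(edges, x):
    """Number of equations x_u - x_v = 1 (mod 2) that hold."""
    return sum(1 for u, v in edges if (x[u] - x[v]) % 2 == 1)


def max_cut(edges, num_vertices):
    """Brute-force max cut over all 2^n two-colourings of the vertices."""
    return max(
        cut_size(edges, side)
        for side in product((0, 1), repeat=num_vertices)
    )
```

For every 0/1 assignment the two counts agree, so the maximum number of satisfiable equations is exactly the size of the max cut (4 for a 5-cycle, for instance).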

However, before we get too carried away with this, let me say what the unique games conjecture actually says. Apparently it has been reformulated a few times, and this version comes from 2004, whereas the original version was 2002. It says that even if 99% of the equations (of the form $x_i-x_j=c_{ij}$ over $\mathbb{F}_p$) can be simultaneously satisfied, it is still NP hard to determine whether 1% of them can be simultaneously satisfied. Note that it is important to allow $p$ to be large here, since the random approach gives you a proportion $1/p$ straight away. Also, I think 99% and 1% are a friendly way of saying $1-\epsilon$ and $\epsilon$ for an arbitrary fixed $\epsilon>0$.

In case the statement isn’t clear, let me put it slightly more formally. The unique games conjecture says the following. Suppose that for some $\epsilon>0$ there exists a polynomial-time algorithm that outputs YES if a proportion $1-\epsilon$ of the equations can be solved simultaneously and NO if it is impossible to solve more than a proportion $\epsilon$ of them, with no requirements on what the algorithm should output if the maximum proportion lies between $\epsilon$ and $1-\epsilon$. Then P=NP.

At this point I should explain why the conjecture is called the unique games conjecture. But I’m not going to because I don’t know. I’ve been told a couple of times, but it never stays in my head, and when I do get told, I am also told that the name is something of a historical accident, since the later reformulations have nothing to do with games. So I think the name is best thought of as a strange type of label whose role is simply to identify the conjecture and not to describe it.

To give an idea of why the UGC is important, Arora took us back to an important paper of Goemans and Williamson from 1993 concerning the max cut problem. The simple random approach tells us that we can find a partition such that the size of the resulting cut is at least half the number of edges in the graph, since each edge has a 50% chance of joining a vertex in one half to a vertex in the other half. (Incidentally, there are standard “derandomization” techniques for converting observations like this into algorithms for finding the cuts. This is another beautiful idea from theoretical computer science, but it’s been around for long enough that people have got used to it.)
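One standard derandomization of the random-cut argument is the method of conditional expectations, which for max cut boils down to a greedy rule. The sketch below is a minimal illustration of that technique (not necessarily the one Arora had in mind), assuming a simple graph with no loops, given as a list of edges on vertices `0, ..., n-1`.

```python
def greedy_cut(num_vertices, edges):
    """Place vertices one at a time, always on the side that cuts at least
    half of the edges to neighbours that have already been placed."""
    adjacency = {v: [] for v in range(num_vertices)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    side = {}
    for v in range(num_vertices):
        placed = [side[u] for u in adjacency[v] if u in side]
        on_side_0 = placed.count(0)
        on_side_1 = placed.count(1)
        # Joining side 0 cuts the edges to side 1, and vice versa.
        side[v] = 0 if on_side_1 >= on_side_0 else 1

    cut = sum(1 for u, v in edges if side[u] != side[v])
    return side, cut
```

Each vertex cuts at least half of the edges joining it to earlier vertices, and every edge is counted exactly once in this way, so the final cut contains at least half of all the edges.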

Goemans and Williamson were the first people to go beyond 50%. They used semidefinite programming to devise an algorithm that could find a cut for which the number of edges was at least 0.878 times the size of the max cut. I don’t know what that 0.878 really is (presumably some irrational number that came out of the proof) but it was sufficiently unnatural looking that there was a widespread belief that the bound would in due course be improved further. However, a check on that belief was given in 2004 by Khot, Kindler, Mossel and O’Donnell and in 2005 by Mossel, O’Donnell and Oleszkiewicz (how they all contributed to the result I don’t know), who showed the very surprising result that if UGC is true, then the Goemans-Williamson bound is optimal. From what I understand, the proof is a lot more than just a clever observation that max cut can be reduced to unique games. If you don’t believe me, then try to explain to yourself how the constant 0.878 can arise in a simple way from a conjecture that involves only the constants “nearly 0” and “nearly 1”.
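For what it is worth, the constant is usually described as the minimum over $0<\theta\leq\pi$ of $\frac{2}{\pi}\cdot\frac{\theta}{1-\cos\theta}$. This compares the probability $\theta/\pi$ that a random hyperplane separates two unit vectors at angle $\theta$ with the contribution $(1-\cos\theta)/2$ of that pair to the semidefinite objective. The following sketch (function names are illustrative) evaluates it numerically with a crude grid search.

```python
import math


def gw_ratio(theta):
    """Ratio of the random-hyperplane cut probability theta/pi to the
    semidefinite contribution (1 - cos(theta)) / 2 of an edge at angle theta."""
    return (theta / math.pi) / ((1 - math.cos(theta)) / 2)


def gw_constant(steps=20_000):
    """Approximate the minimum of gw_ratio over (0, pi] by a grid search."""
    return min(gw_ratio(math.pi * k / steps) for k in range(1, steps + 1))
```

The minimum comes out at roughly 0.8786, so 0.878 is that value cut off after three decimal places.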

In general, it turns out that UGC implies sharp thresholds for approximability for many problems. What this means is that there is some threshold, below which you can do what you want with a polynomial-time algorithm and above which doing what you want is NP hard. (So in the max cut example the threshold is 0.878: getting smaller than that proportion can be done in polynomial time, and getting above that proportion is NP hard — at least if you believe UGC.)

Almost as interesting is that the thresholds predicted by UGC all come from rather standard techniques such as semidefinite programming and linear programming. So in some sense it is telling us not just that a certain *bound* is best possible but that a certain *technique* is best possible. To put it a bit crudely and inaccurately, it’s saying that for one of these problems, the best you can do with semidefinite programming is the best you can do full stop.

Arora said something even stronger that I haven’t properly understood, but I reproduce it for completeness. Apparently UGC even tells us that the failure of a standard algorithm to beat the threshold *on a single instance* implies that no algorithm can do better. I suppose that must mean that one can choose a clever instance in such a way that if the standard algorithm succeeds with that instance, then that fact can be converted into a machine for solving arbitrary instances of UGC. How you get from one instance of one problem to lots of instances of another is mysterious to me, but Arora did say that this result came as a big surprise.

There were a couple of other things that Arora said at the end of his talk to explain why Khot’s work was important. Apparently while the UGC is just a conjecture, and not even a conjecture that is confidently believed to be true (indeed, if you want to become famous, then it may be worth trying your hand at finding an efficient algorithm for it, since there seems to be a non-negligible chance that such an algorithm exists), it has led to a number of non-obvious predictions that have then been proved unconditionally.

Soon after Arora’s laudatio, Khot himself gave a talk. This was an odd piece of scheduling, since there was necessarily a considerable overlap between the two talks (in their content, that is). I’ll end by mentioning a reformulation of UGC that Khot talked about and Arora didn’t.

A very important concept in graph theory is that of *expansion*. Loosely speaking, a graph is called an expander if for any (not too large) set of vertices, there are many edges from that set to its complement. More precisely, if $G$ is a $d$-regular graph and $A$ is a set of vertices, then we define the expansion of $A$ to be the number of edges leaving $A$ divided by $d|A|$ (the latter being the most such edges there could possibly be). Another way of looking at this is that you pick a random point $x\in A$ and a random neighbour $y$ of $x$, and define the expansion of $A$ to be the probability that $y$ is not in $A$.

The expansion of the graph as a whole is the minimum expansion over all subsets $A$ of size at most $n/2$ (where $n$ is the number of vertices of $G$). If this quantity is high, it is saying that $G$ is “highly interconnected”.

Khot is interested in *small-set* expansion. That is, he picks a small $\delta$ and takes the minimum over sets of size at most $\delta n$ rather than at most $n/2$.
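These definitions are easy to play with on tiny graphs. The sketch below assumes a $d$-regular graph stored as adjacency lists and finds the minimum expansion by brute force over all small sets, which takes exponential time and is purely illustrative. The names `expansion`, `min_expansion` and `cycle` are made up for this example.

```python
from itertools import combinations


def expansion(adjacency, d, subset):
    """Edges leaving `subset`, divided by d * |subset|, in a d-regular graph."""
    inside = set(subset)
    leaving = sum(1 for u in inside for v in adjacency[u] if v not in inside)
    return leaving / (d * len(inside))


def min_expansion(adjacency, d, max_size):
    """Minimum expansion over all non-empty sets of at most max_size vertices."""
    vertices = list(adjacency)
    return min(
        expansion(adjacency, d, subset)
        for size in range(1, max_size + 1)
        for subset in combinations(vertices, size)
    )


def cycle(n):
    """Adjacency lists of the n-cycle, which is 2-regular."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
```

On the 8-cycle, sets of size up to 4 can have expansion as low as 1/4 (an arc of four vertices), whereas if only sets of size at most 2 are allowed then the minimum is 1/2. So the small-set quantity can be much larger than the ordinary one.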

The precise reformulation I’m about to give is not in fact the one that Khot gave but rather a small modification that Boaz Barak, another well-known theoretical computer scientist, gave in his invited lecture a day later. The unique games conjecture is equivalent to the assertion that it is NP hard to distinguish between the following two classes of graphs.

- Graphs where there exists a set of size at most $\delta n$ with small expansion.
- Graphs where every set of size at most $\delta n$ has very big expansion.

I think for the latter one can take the expansion to be at least 1/2 for each such set, whereas for the former it is at most $\epsilon$ for some small $\epsilon>0$ that you can probably choose.

What is interesting here is that for ordinary expansion there is a simple characterization in terms of the size of the second largest eigenvalue of the adjacency matrix. Since eigenvalues can be approximated efficiently, there is an efficient method for determining whether a graph is an expander. UGC is equivalent to saying that when the sets get small, their expansion properties can “hide” in the graph in a rather strong way: you can’t tell the difference between a graph that has very good small-set expansion and a graph where there’s a set that fails very badly.

I had lunch with Boaz Barak on one of the days of the congress, so I asked him whether he believed UGC. He gave me a very interesting answer (a special case of the more general proposition that Boaz Barak has a lot of very interesting things to say about complexity), which I have unfortunately mostly forgotten. However, my rapidly fading memory is that he would like it to be true, because it would be a beautiful description of the boundary of what algorithms can do, but thinks it may very well be false. He thought that one possibility was that solving the problems that UGC says are NP hard is not in fact NP hard, but not possible in polynomial time either. It is perfectly possible for a problem to be of intermediate difficulty.

Although it wouldn’t directly contradict NP hardness, it would be very interesting to find an algorithm that solved the small-set expansion problem in a time that was only modestly superpolynomial: something like , say. That would probably get you an invitation to speak at an ICM.

]]>

The most concrete thing I remember (without being 100% sure I’ve got it right) is that one of Mirzakhani’s major results concerns counting closed geodesics in Riemann surfaces. A geodesic is roughly speaking a curve that feels like a straight line to an inhabitant of the surface. Another way of putting it is that if you take two points that are close together on a geodesic, then the part of the geodesic between those points is the shortest curve that joins those two points. (Hmm, on writing that I feel that I’ve made an elementary mistake of exposition, in that I have assumed that you know what a Riemann surface is, and then gone to a little trouble to say what a geodesic is, when not many people will know the former without also knowing the latter. To atone for that, let me add a link to the Wikipedia article on Riemann surfaces, though I’m afraid that article is not much good for the beginner. A beginner’s definition, not precise at all but perhaps adequate for the purposes of reading this post, is that a Riemann surface is a surface like a sphere or a torus, but with some very important extra structure that comes from the fact that each little patch of surface looks like a little patch of the complex plane.)

If you follow your nose inside a Riemann surface, then sometimes you get back to where you started and are pointing in the same direction. In that case, you follow your original path all over again and the geodesic is called *closed*. But sometimes that doesn’t happen.

We can further classify closed geodesics into two types: those that cross themselves and those that don’t. The ones that don’t are called *simple*. An example of a simple closed geodesic is a great circle on the surface of a sphere. Apparently, the problem of counting closed geodesics was pretty much solved, but the problem of counting *simple* closed geodesics was significantly harder. It is this problem that Mirzakhani solved. (I’m not quite sure what “solved” meant here — perhaps her work means that if someone gives you a Riemann surface, you can tell them how many simple closed geodesics it contains.)

The more I write, the more I realize that the counting must be up to some kind of equivalence, since otherwise it seems to me that there will almost certainly either be no simple closed geodesics or uncountably many. But I’ll have to wait to look at my notes to get more precise about that.

The other main thing I remember from the talk is that moduli spaces were a very important part of Mirzakhani’s work, which provided another nice thematic connection between the work of different medallists. Just as Avila studied whole families of dynamical systems, a moduli space is a whole family of Riemann surfaces. And in both cases the family is far more than merely a *set* of objects: it is a set *with geometrical structure*. For example, if you take all interval exchange maps that chop $[0,1]$ into five parts and permute them in a certain specified way, then each one is uniquely determined by the end points of the intervals other than $0$ and $1$. So we can naturally associate with each one an element of the set

$$\{(x_1,x_2,x_3,x_4) : 0\leq x_1\leq x_2\leq x_3\leq x_4\leq 1\}.$$

(Those include some degenerate examples.) This is a polyhedral subset of $\mathbb{R}^4$, so it has nice geometrical, topological and measure-theoretic structure, which allows one to talk about almost all interval exchange maps, or nowhere dense sets of interval exchange maps, and so on.

An example that people often give to demonstrate what a moduli space is (and I should say that my entire knowledge of this concept comes from my memory of editing a very nice article by David Ben-Zvi on the subject for the Princeton Companion to Mathematics, though obviously anything I say about them that is false is not his fault) is the space of all tori. If you are not used to Riemann surfaces, then you may think that there is just one torus up to isomorphism, but there you would be wrong. Topologically it is true, but we want an isomorphism *of Riemann surfaces*, and the maps that you are allowed to use are much more rigid. So for example if you take the complex plane and quotient out by $\mathbb{Z}+i\mathbb{Z}$, you get a torus that is not isomorphic to the torus you get if instead you quotient out by the triangular lattice. (Roughly speaking, the obvious attempt to define an isomorphism would involve shearing the plane, but shears are not holomorphic.)

If we quotient by two lattices, when will the results give isomorphic tori? If one is an expansion of the other, then they will, and if one is a rotation of the other, then they will again. From that we get that if two complex numbers generate a lattice, then the isomorphism type of the torus depends only on their ratio. So we have already reduced the family of tori to a single complex parameter. However, that isn’t the whole story as different complex parameters do not necessarily give rise to different tori. But it gives some idea that the tori form a “space” that itself has an interesting geometrical structure. For reasons I don’t fully understand, moduli spaces are very helpful in the study of Riemann surfaces, and are also extremely interesting objects in their own right.

OK that’s about it for what I remember. But before I look at my notes, I’d like to mention briefly one other connection with Avila, which is that Mirzakhani is also very interested in billiards in polygons, though this wasn’t mentioned in the laudatio.

Actually, that reminds me of one other thing, which is that one of Mirzakhani’s results is strongly reminiscent of famous results of Marina Ratner. Maybe I’ll be able to say more about that after looking at my notes.

OK, now I’ve looked at my notes I find that, as I thought, I had forgotten quite a bit.

One important detail is that Mirzakhani looked at surfaces of genus at least two (that is, surfaces with at least two “holes”, so not tori). This is important because it means that the metrics on them are hyperbolic. It turns out that the moduli space $\mathcal{M}_g$ of Riemann surfaces of genus $g$ is a complex variety of complex dimension $3g-3$, and is also a symplectic orbifold. (An orbifold is a bit like a manifold but is allowed to have a few singularities. In the torus example, one of these singularities arises as a result of the fact that the triangular lattice has a symmetry, rotation by 60 degrees, that most lattices do not have.)

The moduli spaces are totally inhomogeneous. That is very important, but I don’t know what it means. (I can’t remember whether McMullen told us — probably he did.)

McMullen concentrated on three aspects of Mirzakhani’s work. The first was what I’ve already mentioned, namely counting simple closed geodesics. My feeling that there would be uncountably many of these unless one looked at equivalence classes somehow was based on the sphere and the torus, so maybe things are different when the geometry becomes hyperbolic.

He told us that if $X$ is a Riemann surface of genus $g$, then the number of simple loops grows like $L^{6g-6}$. I can’t remember what the parameter $L$ means. I’ve written to indicate what is being counted.

It seems a bit silly not to try to find out what is going on here, so let me have a quick look at the citation.

Ah, that makes much more sense! $L$ stands for length. So the formula is an estimate for the number of simple loops of length at most $L$. If you look at all closed geodesics (i.e., allowing self-crossing ones too) then the growth rate is $e^L/L$.

This apparently led to a new proof of a famous conjecture of Witten — a formula for intersection numbers on the moduli space — which was originally proved by Kontsevich in 1992.

Another consequence is the result that the probability that a random simple loop in genus 2 cuts the surface into two pieces is 1/7.

The second major topic was complex geodesics in the moduli space $\mathcal{M}_g$. I don’t know the precise definition, but I presume that the idea is that if you take a point in $\mathcal{M}_g$ that is surrounded by a copy of an infinitesimally small part of the complex plane, then there is a unique way of continuing that “in the same direction” and getting what I presume is a Riemann surface that lives inside $\mathcal{M}_g$. So it would be a little bit like a 2D generalization of a geodesic but would also involve the complex structure. Ah, I see that I have written that a complex geodesic is a holomorphic isometry from the hyperbolic plane to $\mathcal{M}_g$, though I wonder whether that should be a local isometry: that is, that for each point in the hyperbolic plane there is a neighbourhood such that the restriction of the map to that neighbourhood is an isometry.

I’ve written that there are complex geodesics through every point in $\mathcal{M}_g$ in every direction, and that they are called Teichmüller discs.

Apparently real geodesics are usually dense in $\mathcal{M}_g$. Sometimes they can be exotic shapes such as fractal cobwebs (whatever those are), defying classification. What about in two dimensions? Can we get some 2D analogue of fractal cobwebs? No we can’t. Mirzakhani and her coworkers showed that you always get an algebraic subvariety. This is strongly reminiscent of work of Margulis and Ratner.

What is remarkable about this result is that it is an analogue of the Margulis/Ratner results in a totally inhomogeneous situation, which was completely unexpected.

I’ve just cheated and looked at the citation again, because it seemed to be particularly important to get some idea of what “totally inhomogeneous” means. The answer is fairly simple. A homogeneous space is one where the geometry at every point is the same. To say that $\mathcal{M}_g$ is totally inhomogeneous is to say that at *no* two points is the geometry the same. While looking for that, I also saw that Mirzakhani solved the simple-loop-counting problem by connecting it to a certain volume computation in the moduli space $\mathcal{M}_g$. So it was a definite case where looking at the entire family helps you to prove things about the individual members of the family.

The third aspect of Mirzakhani’s work that McMullen talked about concerned something called earthquake flow that was defined by Thurston. I thought I had some understanding of what this was when I was watching the talk, but can’t really remember now. On watching the explanation again, I find that I can understand part of what McMullen says (about deforming Riemann surfaces by cutting along closed geodesics and giving them a twist, and then doing something similar but with an entire “lamination” of closed geodesics), but I still don’t quite get how that leads to a flow. (If you want to try, then the video is here and the explanation starts at 25:24.)

The result is that the earthquake flow is ergodic and mixing, and this means something like that if you randomly apply earthquakes then you get all shapes of genus $g$. Apparently, Mirzakhani established a measurable isomorphism between earthquake flow and horocycle flow, and this was a big surprise. Those are just words to me, but when I hear someone like Curt McMullen say that a result is very surprising, then I am impressed.

]]>

I was rescued by an extraordinary piece of luck. When I got to the gate with my boarding card, the woman who took it from me tore it up and gave me another one, curtly informing me that I had been upgraded. I have no idea why. I wonder whether it had anything to do with the fact that in order to avoid standing any longer than necessary I waited until almost the end before boarding. But perhaps the decision had been made well before that: I have no idea how these things work. Anyhow, it meant that I could make my seat pretty well horizontal and I slept for quite a lot of the journey. Unfortunately, I wasn’t feeling well enough to make full use of all the perks, one of which was a bar where one could ask for single malt whisky. I didn’t have any alcohol or coffee and only picked at my food. I also didn’t watch a single film or do any work. If I’d been feeling OK, the day would have been very different. However, perhaps the fact that I wasn’t feeling OK meant that the difference it made to me to be in business class was actually greater than it would have been otherwise. I rather like that way of looking at it.

An amusing thing happened when we landed in Paris. We landed out on the tarmac and were met by buses. They let the classy people off first (even we business-class people had to wait for the first-class people, just in case we got above ourselves), so that they wouldn’t have to share a bus with the riff raff. One reason I had been pleased to be travelling business class was that it meant that I had after all got to experience the top floor of an Airbus 380. But when I turned round to look, there was only one row of windows, and then I saw that it had been a Boeing 777. Oh well. It was operated by Air France. I’ve forgotten the right phrase: something like “shared code”. A number of little anomalies resolved themselves, such as that that take-off didn’t feel like the one in Paris, that the slope of the walls didn’t seem quite correct if we were on the top floor, etc.

I thought that as an experiment I would see what I could remember about the laudatio for Martin Hairer without the notes I took, and then after that I would see how much more there was to say *with* the notes. So here goes. The laudatio was given by Ofer Zeitouni, one of the people on the Fields Medal committee. Early on, he made a link with what Ghys had said about Avila, by saying that Hairer too studied situations where physicists don’t know what the equation is. However, these situations were somewhat different: instead of studying typical dynamical systems, Hairer studied stochastic PDEs. As I understand it, an important class of stochastic PDEs is conventional PDEs with a noise term added, which is often some kind of Brownian motion term.

Unfortunately, Brownian motion can’t be differentiated, but that isn’t by itself a huge problem because it can be differentiated if you allow yourself to work with distributions. However, while distributions are great for many purposes, there are certain things you can’t do with them — notably multiply them together.

Hairer looked at a stochastic PDE that modelled a physical situation that gives rise to a complicated fractal boundary between two regions. I think the phrase “interface dynamics” may have been one of the buzz phrases here. The naive approach to this stochastic PDE led quickly to the need to multiply two distributions together, so it didn’t work. So Hairer added a “mollifier” (that is, he smoothed the noise slightly). Associated with this mollifier was a parameter $\epsilon$: the smaller $\epsilon$ was, the less smoothing took place. So he then solved the smoothed system, let $\epsilon$ tend to zero, showed that the smoothed solutions tended to a limit, and defined that limit to be the solution of the original equation.

The way I’ve described it, that sounds like a fairly obvious thing to do, so what was so good about it?

A first answer is that in this particular case it was far from obvious that the smoothed solutions really did tend to a limit. In order to show this, it was necessary to do a renormalization (another thematic link with Avila), which involved subtracting a constant $C_\epsilon$. The only other thing I remember was that the proof also involved something a bit like a Taylor expansion, but that a key insight of Hairer was that instead of expanding with respect to a fixed basis of functions, one should instead let the basis of functions depend on the function one was expanding, or something like that anyway.

I was left with the feeling that a lot of people are very excited about what Hairer has done, because with his new theoretical framework he has managed to go a long way beyond what people thought was possible.

OK, now let me look at the notes and see whether I want to add anything.

My memory seems to have served me quite well. Here are a couple of extra details. An important one is that Zeitouni opened with a brief summary of Hairer’s major contributions, which makes them sound like much more than a clever trick to deal with one particular troublesome stochastic PDE. These were

1. a theory of regularity structures, and

2. a theory of ergodicity for infinite-dimensional systems.

I don’t know how those two relate to the solution of the differential equation, which, by the way, is called the KPZ equation, and is the following.

$$\partial_t h = \partial_x^2 h + (\partial_x h)^2 + \xi$$

It models the evolution of interfaces. (So maybe “interface dynamics” was not after all the buzz phrase.)

When I said that the noise was Brownian, I should have said that the noise was completely uncorrelated in time, and therefore makes no sense pointwise, but it integrates to Brownian motion.

The mollifiers are functions that replace the noise term $\xi$. The constants $C_\epsilon$ I mentioned earlier depend on your choice of mollifier, but the limit doesn’t (which is obviously very important).
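Putting the pieces together, the scheme is usually written along the following lines. This is only a sketch: the symbols $h_\epsilon$, $\xi_\epsilon$ and $C_\epsilon$ are labels chosen here for the smoothed solution, the smoothed noise and the renormalization constant, and the standard normalization of the KPZ equation is assumed.

```latex
% Smoothed, renormalized equation: the noise \xi is replaced by a mollified
% version \xi_\epsilon and a constant C_\epsilon is subtracted.
\partial_t h_\epsilon
  = \partial_x^2 h_\epsilon + (\partial_x h_\epsilon)^2 - C_\epsilon + \xi_\epsilon,
\qquad
h = \lim_{\epsilon \to 0} h_\epsilon .
```

The constants $C_\epsilon$ tend to infinity as $\epsilon\to 0$, which is why they cannot simply be dropped. The content of the theorem is that the limit $h$ exists and does not depend on how the smoothing was done.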

What Zeitouni actually said about Taylor expansion was that one should measure smoothness by expansions that are tailored (his word not mine) to the equation, rather than with respect to a universal basis. This was a key insight of Hairer.

One of the major tools introduced by Hairer is a generalization of something called rough-path theory, due to Terry Lyons. Another is his renormalization procedure.

Zeitouni summarized by saying that Hairer had invented new methods for defining solutions to PDEs driven by rough noise, and that these methods were robust with respect to mollification. He also said something about quantitative behaviour of solutions.

If you find that account a little vague and unsatisfactory, bear in mind that my aim here is not to give the clearest possible presentation of Hairer’s work, but rather to discuss what it was like to be at the ICM, and in particular to attend this laudatio. One doesn’t usually expect to come out of a maths talk understanding it so well that one could give the same talk oneself. As I’ve mentioned in another post, there are some very good accounts of the work of all the prizewinners here. (To see them, follow the link and then follow further links to press releases.)

**Update:** if you want to appreciate some of these ideas more fully, then here is a very nice blog post: it doesn’t say much more about Hairer’s work, but it does a much better job than this post of setting his work in context.

]]>

Dick Gross also gave an excellent talk. He began with some of the basic theory of binary quadratic forms over the integers, that is, expressions of the form $ax^2+bxy+cy^2$. One assumes that they are *primitive* (meaning that $a$, $b$ and $c$ don’t have some common factor). The *discriminant* of a binary quadratic form is the quantity $b^2-4ac$. The group $\mathrm{SL}_2(\mathbb{Z})$ then acts on these by a change of basis. For example, if we take the matrix , we’ll replace by and end up with the form , which can be rearranged to

(modulo any mistakes I may have made). Because the matrix is invertible over the integers, the new form can be transformed back to the old one by another change of basis, and hence takes the same set of values. Two such forms are called *equivalent*.

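For some purposes it is more transparent to write a binary quadratic form as

$$\begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$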

If we do that, then it is easy to see that replacing a form by an equivalent form does not change its discriminant since it is just -4 times the determinant of the matrix of coefficients, which gets multiplied by a couple of matrices of determinant 1 (the base-change matrix and its transpose).
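The invariance is easy to check numerically. Here is a minimal sketch: the form and the matrix are arbitrary illustrative choices (not taken from the talk), and the helper names `transform` and `discriminant` are made up for the example.

```python
def transform(form, matrix):
    """Substitute x -> p*x + q*y, y -> r*x + s*y into a*x^2 + b*x*y + c*y^2."""
    a, b, c = form
    (p, q), (r, s) = matrix
    new_a = a * p * p + b * p * r + c * r * r
    new_b = 2 * a * p * q + b * (p * s + q * r) + 2 * c * r * s
    new_c = a * q * q + b * q * s + c * s * s
    return (new_a, new_b, new_c)


def discriminant(form):
    """Return b^2 - 4ac for the form (a, b, c)."""
    a, b, c = form
    return b * b - 4 * a * c


form = (1, 1, 2)            # x^2 + xy + 2y^2, an arbitrary example
matrix = ((2, 1), (5, 3))   # determinant 2*3 - 1*5 = 1, so it lies in SL_2(Z)
new_form = transform(form, matrix)

print(new_form, discriminant(form), discriminant(new_form))
```

The coefficients change a great deal, but the discriminant does not, as the determinant argument predicts. With a matrix of determinant $d$ instead of 1, the discriminant would be multiplied by $d^2$.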

Given any equivalence relation it is good if one can find nice representatives of each equivalence class. In the case of binary quadratic forms, there is a unique representative such that $-a<b\leq a<c$ or $0\leq b\leq a=c$. From this it follows that up to equivalence there are finitely many forms with any given discriminant. The question of how many there are with discriminant is a very interesting one.

Even more interesting is that the equivalence classes form an Abelian group under a certain composition law that was defined by Gauss. Apparently it occupied about 30 pages of the *Disquisitiones*, which are possibly the most difficult part of the book.

Going back to the number of forms of discriminant , Gauss did some calculations and stated (without proof) the formula

There was, however, a heuristic justification for the formula. (I can’t remember whether Dick Gross said that Gauss had explicitly stated this justification or whether it was simply a reconstruction of what he must have been thinking.) It turns out that the sum on the left-hand side works out as the number of integer points in a certain region of (or at least I assume it is since the binary form has three coefficients), and this region has volume . Unfortunately, however, the region is not convex, or even bounded, so this does not by itself prove anything. What one has to do is show that certain cusps don’t accidentally contain lots of integer points, and that is quite delicate.

One rather amazing thing that Bhargava did, though it isn’t his main result, was show that if a positive definite quadratic form with integer coefficients (in any number of variables) represents all the positive integers up to 290 then it represents all positive integers, and that this bound is best possible. (I may have misremembered the numbers. Also, one doesn’t have to know that it represents every single number up to 290 in order to prove the result: there is some proper subset of $\{1,2,\dots,290\}$ that does the job.)

But the first of his Fields-medal-earning results was quite extraordinary. As a PhD student, he decided to do what few people do, and actually read the *Disquisitiones*. He then did what even fewer people do: he decided that he could improve on Gauss. More precisely, he felt that Gauss’s definition of the composition law was hard to understand and that it should be possible to replace it by something better and more transparent.

I should say that there are more modern ways of understanding the composition law, but they are also more abstract. Bhargava was interested in a definition that would be computational but better than Gauss’s. I suppose it isn’t completely surprising that Gauss might have produced something suboptimal, but what is surprising is that it was suboptimal *and* nobody had improved it in 200 years.

The key insight came to Bhargava, if we are to believe the story he tells us, when he was playing with a Rubik’s cube. He realized that if he put the letters $a$ to $h$ at the vertices of the cube, then there were three ways of slicing the cube to produce two $2\times 2$ matrices. One could then do something with their determinants, the details of which I have forgotten, and end up producing three binary quadratic forms that are related, and this relationship leads to a natural way of defining Gauss’s composition law. Unfortunately, I couldn’t keep the precise definitions in my head.
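For what it's worth, here is a sketch of the construction as it is usually described, which is an assumption on my part rather than a record of what was said in the talk. Each way of slicing the cube gives a pair of $2\times 2$ matrices $M$ and $N$, and the associated form is $-\det(Mx-Ny)$. The function names and the particular cube are hypothetical choices made for illustration.

```python
def slice_cube(cube, axis):
    """Cut a 2x2x2 array across `axis`, returning the two opposite 2x2 faces."""
    def entry(fixed, u, v):
        index = [u, v]
        index.insert(axis, fixed)   # put the fixed coordinate back in place
        i, j, k = index
        return cube[i][j][k]

    def face(fixed):
        return [[entry(fixed, u, v) for v in range(2)] for u in range(2)]

    return face(0), face(1)


def form_from_slices(m, n):
    """Coefficients (A, B, C) of -det(m*x - n*y) = A x^2 + B xy + C y^2."""
    det_m = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    det_n = n[0][0] * n[1][1] - n[0][1] * n[1][0]
    cross = (m[0][0] * n[1][1] + m[1][1] * n[0][0]
             - m[0][1] * n[1][0] - m[1][0] * n[0][1])
    return (-det_m, cross, -det_n)


def discriminant(form):
    a, b, c = form
    return b * b - 4 * a * c


# An arbitrary cube of integers: cube[i][j][k] is the label at vertex (i, j, k).
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 9]]]
forms = [form_from_slices(*slice_cube(cube, axis)) for axis in range(3)]
discriminants = [discriminant(f) for f in forms]
print(forms, discriminants)
```

The three forms look unrelated, but they all have the same discriminant, which is the first hint that they belong together. The composition law itself (the statement that the three classes multiply to the identity) is not checked by this sketch.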

Here’s a fancier way that Dick Gross put it. Bhargava reinvented the composition law by studying the action of SL$_2(\mathbb{Z})^3$ on $\mathbb{Z}^2\otimes\mathbb{Z}^2\otimes\mathbb{Z}^2$. The orbits are in bijection with triples of ideal classes for the ring that satisfy . That’s basically the abstract way of thinking about what Bhargava did computationally.

In this way, Bhargava found a symmetric reformulation of Gauss composition. And having found the right way of thinking about it, he was able to do what Gauss couldn’t, namely generalize it. He found 14 more integral representations on objects like above, which gave composition laws for higher degree forms.

He was also able to enumerate number fields of small degree, showing that the number of fields of degree and discriminant less than grows like . This Gross described as a fantastic generalization of Gauss’s work.

I spent the academic years 2000-2002 at Princeton and as a result had the privilege of attending Bhargava’s thesis defence, at which he presented these results. It must have been one of the best PhD theses ever written. Are there any reasonable candidates for better ones? Perhaps Simon Donaldson’s would offer decent competition.

It’s not clear whether those results would have warranted a Fields medal on their own, but the matter was put beyond the slightest doubt when Bhargava and Shankar proved a spectacular result about elliptic curves. Famously, an elliptic curve comes with a group law: given two points, you take the line through them, see where it cuts the elliptic curve again, and define that to be the inverse of the product. This gives an Abelian group. (Associativity is not obvious: it can be proved by direct computation, but I don’t know what the most conceptual argument is.) The group law takes rational points to rational points, and a famous theorem of Mordell states that the rational points form a finitely generated subgroup. The structure theorem for finitely generated Abelian groups tells us that for some $r$ it must be a product of $\mathbb{Z}^r$ with a finite group. The integer $r$ is called the *rank* of the curve.
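To make the group law concrete, here is a minimal sketch for curves of the form $y^2=x^3+ax+b$ over the rationals. The curve $y^2=x^3-2$ and the point $(3,5)$ are illustrative choices of mine, `None` stands for the point at infinity (the identity), and the function names are hypothetical.

```python
from fractions import Fraction


def on_curve(point, a, b):
    """Check that a point satisfies y^2 = x^3 + a*x + b (None is the identity)."""
    if point is None:
        return True
    x, y = point
    return y * y == x ** 3 + a * x + b


def add(p, q, a):
    """Chord-and-tangent addition on y^2 = x^3 + a*x + b."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and y1 == -y2:
        return None                          # vertical line: the sum is the identity
    if p == q:
        slope = (3 * x1 * x1 + a) / (2 * y1)  # tangent at p
    else:
        slope = (y2 - y1) / (x2 - x1)         # chord through p and q
    x3 = slope * slope - x1 - x2              # third intersection of the line
    y3 = slope * (x1 - x3) - y1               # reflect it in the x-axis
    return (x3, y3)


a, b = Fraction(0), Fraction(-2)
p = (Fraction(3), Fraction(5))                # 5^2 = 3^3 - 2
doubled = add(p, p, a)
tripled = add(doubled, p, a)
print(doubled, tripled)
```

Using `Fraction` keeps the arithmetic exact, so one can see directly that rational points are taken to rational points: the coordinates get more complicated at each step but never leave $\mathbb{Q}$.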

It is conjectured that the rank can be arbitrarily large, but not everyone agrees with that conjecture. The record so far is held by the curve

discovered by Noam Elkies (who else?) and shown to have rank 19. According to Wikipedia, from which I stole that formula, there are curves of unknown rank that are known to have rank at least 28, so in another sense the record is 28, in that that is the highest known integer for which there is proved to be an elliptic curve of rank at least that integer.

Bhargava and Shankar proved that the *average* rank is less than 1. Previously this was not even known to be finite. They also showed that at least 80% of elliptic curves have rank 0 or 1.

The Birch–Swinnerton-Dyer conjecture concerns ranks of elliptic curves, and one consequence of their results (or perhaps it is a further result — I’m not quite sure) is that the conjecture is true for at least 66% of elliptic curves. Gross said that there was some hope of improving 66% to 100%, but cautioned that that would not prove the conjecture, since 0% of all elliptic curves doesn’t mean no elliptic curves. But it is still a stunning advance. As far as I know, nobody had even thought of trying to prove average statements like these.

I think I also picked up that there were connections between the delicate methods that Bhargava used to enumerate number fields (which again involved counting lattice points in unbounded sets) and his more recent work with Shankar.

Finally, Gross reminded us that Faltings showed that for hyperelliptic curves (a curve of the form $y^2=f(x)$ for a polynomial $f$; when $f$ is a cubic you get an elliptic curve) the number of rational points is finite. Another result of Bhargava is that for almost all hyperelliptic curves there are in fact no rational points.

While it is clear from what people have said about the work of the four medallists that they have all proved amazing results and changed their fields, I think that in Bhargava’s case it is easiest for the non-expert to understand just *why* his work is so amazing. I can’t wait to see what he does next.

**Update.** Andrew Granville emailed me some corrections to what I had written above, which I reproduce with his permission.

A couple of major things — certainly composition was much better understood by Dirichlet (Gauss’s student) and his version is quite palatable (in fact rather easier to understand, I would say, than that of Bhargava). It also led, fairly easily, to re-interpretation in terms of ideals, and inspired Dedekind’s development of (modern) algebraic number theory. Where Bhargava’s version is interesting is that

1) It is the most extraordinarily surprising re-interpretation.

2) It is a beautiful example of an algebraic phenomenon (involving group actions on representations) that he has been able to develop in many extraordinary and surprising directions.

2/ 66% was proved by Bhargava, Skinner and Wei Zhang and goes some way beyond Bhargava/Shankar, involving some very deep ideas of Skinner (whereas most of Bhargava’s work is accessible to a widish audience).

]]>

The first one was an excellent talk by Etienne Ghys on the work of Artur Avila. (The only other talk I’ve heard by Ghys was his plenary lecture at the ICM in Madrid in 2006, which was also excellent.) It began particularly well, with a brief sketch of the important stages in the history of dynamics. These were as follows.

1. Associated with Newton is the idea that you are given a differential equation, and you try to find solutions. This has of course had a number of amazing successes.

2. However, after a while it became clear that the differential equations for which one could hope to find a solution were not typical. The next stage, initiated by Poincaré, was to aim for something less. One could summarize it by saying that now, given a differential equation, one tries merely to say something interesting about its solutions.

3. In the 1960s, Smale and Thom went a stage further, trying to take on board the realization that often physicists don’t actually know the equation that models the phenomenon they are looking at. As Ghys put it, the endeavour now can be summed up as follows: you are not given a differential equation and you want to say something interesting about its solutions.

Of course, once the well-deserved laugh had died down, he explained a bit further what he meant. One way he put it was to ask what a typical dynamical system looks like.

He then talked about four important results of Avila that fit into this broad framework. One concerns iterates of unimodal maps, which are maps that look like upside-down parabolas (they are zero at 0 and 1 and have a single local maximum in between, which lies above the line $y=x$). Avila showed that given an analytic family of such maps, almost every function in the family gives rise either to a very structured dynamical system or a rather random-like one. More precisely, for almost every $f$ in the family, either almost every orbit converges to an attracting cycle (such systems are called *regular*) or there is an absolutely continuous measure $\mu$ such that almost every orbit in $[0,1]$ is distributed according to $\mu$.

The main tool in the proof is something called the renormalization operator. I didn’t fully understand what this was, but I got a partial understanding. A discrete dynamical system is a set $X$ together with a map $f:X\to X$ (usually assumed to have extra properties such as continuity or preservation of measure, which of course requires $X$ to have some structure so that those properties make sense) that one iterates. We are interested in orbits, which are simply sequences of the form $x, f(x), f(f(x)), \dots$.

Now suppose you have a subset $Y$ of $X$. Often you can define a dynamical system on $Y$ by simply setting $g(y)$ to be $f^n(y)$ for the smallest positive integer $n$ for which $f^n(y)\in Y$. And often this dynamical system is closely related to the big dynamical system on $X$. In a way I didn’t pick up from the lecture, the renormalization operator exploits this close relationship to turn maps from $X$ to $X$ into maps from $Y$ to $Y$. We can use this basic idea to define a renormalization operator on the space of all unimodal maps.
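The induced map on a subset (the first-return map) is simple to write down. Here is a small sketch, in which the big system is a rotation of the circle $[0,1)$ and the subset is $[0,1/2)$; both are illustrative assumptions of mine, as are the names `first_return`, `rotate` and `in_half`. It shows only the first-return construction, not the renormalization operator itself.

```python
from fractions import Fraction


def first_return(f, in_subset, y, max_steps=10_000):
    """Iterate f from y until the orbit re-enters the subset.

    Returns the point of re-entry and the number of steps it took.
    """
    x = f(y)
    for n in range(1, max_steps + 1):
        if in_subset(x):
            return x, n
        x = f(x)
    raise RuntimeError("orbit did not return within max_steps")


alpha = Fraction(3, 10)


def rotate(x):
    """Rotation of the circle [0, 1) by alpha."""
    return (x + alpha) % 1


def in_half(x):
    """Membership test for the subset [0, 1/2)."""
    return 0 <= x < Fraction(1, 2)


print(first_return(rotate, in_half, Fraction(1, 10)))
print(first_return(rotate, in_half, Fraction(2, 5)))
```

Starting from 1/10 the orbit is back in the subset after one step, whereas starting from 2/5 it leaves and needs two steps, so the induced map is built from different powers of the original map on different parts of the subset.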

It is not obvious to me why this is a good thing to do, except that it fits into the general philosophy, that applies in many many contexts, that considering a lot of objects of a certain type at once is often a great way to learn about individual objects of that type. (This theme was to reappear in a big way in the talk about Mirzakhani’s work.) Avila did not invent the renormalization map, but according to Ghys he is an absolute master at using it, and has in that way made it his own.

The second result was about interval exchange maps. These are maps that take a unit interval, chop it up into finitely many pieces (of varying lengths if you want the map to be interesting) and reassemble them in a different order. In 2007, Avila and Giovanni Forni proved that almost all interval exchange maps are weak mixing. This means that if you write $T$ for the map and take any two sets $A$ and $B$, then for almost every $n$ the measure of $A\cap T^{-n}B$ is approximately what you would expect if $T^{-n}B$ were a “random set”, that is, the product of the measures of $A$ and $B$.
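An interval exchange map is determined by the lengths of the pieces and the order in which they are reassembled. Here is a minimal sketch of that definition; the function name, the convention for `order` and the two-piece example are my own illustrative choices.

```python
from fractions import Fraction
from itertools import accumulate


def interval_exchange(lengths, order):
    """Build the map on [0, 1) that cuts it into pieces and rearranges them.

    lengths[i] is the length of the i-th piece (from the left, before the
    exchange); order[k] is the index of the piece placed k-th afterwards.
    """
    starts = [Fraction(0)] + list(accumulate(lengths))[:-1]
    new_start = {}
    position = Fraction(0)
    for piece in order:
        new_start[piece] = position
        position += lengths[piece]

    def T(x):
        for i, (start, length) in enumerate(zip(starts, lengths)):
            if start <= x < start + length:
                return x - start + new_start[i]   # translate the piece
        raise ValueError("x must lie in [0, 1)")

    return T


# Two pieces of lengths 1/4 and 3/4, swapped.
T = interval_exchange([Fraction(1, 4), Fraction(3, 4)], [1, 0])
print(T(Fraction(0)), T(Fraction(1, 4)), T(Fraction(1, 2)))
```

With two pieces the map is just a rotation of the circle, which is why one needs at least three pieces (and suitable lengths) before statements such as weak mixing become interesting.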

Renormalization was the tool here too. Apparently the key to proving this result was to show that the renormalization map on the space of interval exchange maps is chaotic. I don’t know exactly what this means.

I have always had a soft spot for interval exchange maps, because I once heard a fascinating open problem and thought about it very hard with no success. Suppose you are given a polygonal but not necessarily convex room lined with mirrors and you switch a light on. Must it illuminate the whole room? (Assume that the light comes from a point source.) There is a very nice construction called Kafka’s study, which shows that the answer can be no in a room with a smooth boundary. To draw it, you begin by drawing an ellipse, cutting it in half along the line joining its two foci, which I’ll take to be horizontal, keeping only one half, and then creating a sort of mushroom shape with the half ellipse at the top and a curve that goes horizontally through the two foci but also dips down between the foci (to make the “stalk” of the mushroom). If a beam of light comes out of one focus and hits the boundary of the ellipse, then it bounces back to the other focus. From this it is easy to see that if you switch on a light in the stalk part of the room, then the two other bits that do not lie in the top half of the ellipse will remain dark. I think the idea behind the name was that Kafka could work in the side parts without being disturbed by noise from the stalk part.

Another way of thinking about this is as a billiards problem. If you fire off a billiards ball (infinitesimally small of course) from the stalk part of the room, then however much it bounces, it will never reach the side parts.

What about the polygonal case? If a room is polygonal and all the sides make an angle with the horizontal that’s a rational multiple of $\pi$, then a billiard ball will only ever travel in one of a finite number of directions, so we can define a map from the set of pairs of the form (boundary point, possible direction from that boundary point) to itself, which, if you think about it for a bit, can be seen to be an interval exchange map.

Years ago I managed to prove to my own satisfaction the known (I’m pretty sure, though I don’t know enough about the area to know where to find it) result that for almost every direction in which you send out a billiard ball, the resulting orbit will be dense. However, once the angles stop being nice rational multiples of $\pi$, the dynamical system becomes a rather unpleasant map that moves bits of the plane about while also applying affine transformations to them.

As a means of simplifying the problem, I decided to consider a natural 2D analogue of interval exchange maps. This time you take a square, chop it up into finitely many rectangles, and reassemble the rectangles in some other way into the square. That led to a question I spent a long time on and couldn’t answer. (This was probably in about 1989 or so.) Take a rectangle exchange map of the kind I’ve just described, and take a point in the square. Is it recurrent? That is, will its iterates necessarily come back arbitrarily close to the original point? In the 1D case the answer is yes, and I seem to remember that was a key lemma in the proof about dense orbits.

Note that I’m not asking whether *almost* all points are recurrent: that is an easy exercise (and a result of Poincaré). I really want them all.

Incidentally, a few years after I was obsessed with the billiards-in-polygons problem, a paper came out that purported to solve it. Imagine my surprise when the polygon in question had rational angles. It turned out that the paper did something like assuming that corners absorbed light, or something like that. Anyhow, as far as I know the following two questions are still open, but if not, then I’d be interested to be pointed to the appropriate literature.

1. If you have a light source that’s more like a real light in that light comes in all directions from everywhere in a non-empty open set, then must an arbitrary polygonal room be illuminated?

2. If you take a point in a polygonal room and send off a billiard ball, is it true that for almost every direction you might choose the trajectory of the ball will be dense? (As far as I know “almost every” could mean for every direction not belonging to some countable set.)

Moving on to the other two of Avila’s results, I’m going to say much less. The first one was a solution of the ten-martini problem, so called because Mark Kac offered ten martinis to whoever solved it. Unfortunately, he had died by the time Avila was in a position to claim them. I didn’t really understand the problem, but it was to do with the Schrödinger equation and boiled down to a problem in spectral theory, which Avila, remarkably, solved using dynamical systems.

The last problem was one that Etienne Ghys told us most people assume must be easy when they hear it for the first time, and often offer incorrect proofs. Maybe because he had said that I didn’t have any particular feeling that it should be easy, but perhaps you, dear reader, will.

It is known that a $C^1$ diffeomorphism on a manifold can be approximated (in $C^1$) by a $C^\infty$ diffeomorphism. Avila showed that if the $C^1$ diffeomorphism is volume preserving, then the $C^\infty$ one can be taken to be volume preserving as well. The proof was apparently very hard.

The main other thing I remember from the talk was that Ghys prepared a sequence of photos that flashed up in front of us in a seemingly endless sequence, of all Avila’s collaborators. The fact that he has so many is one of the remarkable things about him: he is apparently very generous with his ideas, a great illustration of how that kind of generosity can be hugely beneficial not just to the people who are on the receiving end but also to those who exhibit it.

]]>

I didn’t manage to maintain my ignorance of the fourth Fields medallist, because I was sitting only a few rows behind the medallists, and when Martin Hairer turned up wearing a suit, there was no longer any room for doubt. However, there was a small element of surprise in the way that the medals were announced. Ingrid Daubechies (president of the IMU) told us that they had made short videos about each medallist, and also about the Nevanlinna Prize winner, who was Subhash Khot. So for each winner in turn, she told us that a video was about to start. An animation of a Fields medal then rotated on the large screens at the front of the hall, and when it settled down one could see the name of the next winner. The beginning of each video was drowned out by the resulting applause (and also a cheer for Bhargava and an even louder one for Mirzakhani), but they were pretty good. At the end of each video, the winner went up on stage, to more applause, and sat down. Then when the five videos were over, the medals were presented, to each winner in turn, by the president of Korea.

Here they are, getting their medals/prize. It wasn’t easy to get good photos with a cheap camera on maximum zoom, but they give some idea.

After those prizes were announced, we had the announcements of the Gauss prize and the Chern medal. The former is for mathematical work that has had a strong impact outside mathematics, and the latter is for lifetime achievement. The Gauss prize went to Stanley Osher and the Chern medal to Phillip Griffiths.

If you haven’t already seen it, the IMU page about the winners has links to very good short (but not too short) summaries of their work. I’m quite glad about that because I think it means I can get away with writing less about them myself. I also recommend this Google Plus post by John Baez about the work of Mirzakhani.

I have one remark to make about the Fields medals, which is that I think that this time round there were an unusually large number of people who could easily have got medals, including other women. (This last point is important — one should think of Mirzakhani’s medal as the new normal rather than as some freak event.) I have two words to say about them: Mikhail Gromov. To spell it out, he is an extreme, but by no means unique, example of a mathematician who did not get a Fields medal but whose reputation would be pretty much unaltered if he had. In the end it’s the theorems that count, and there have been some wonderful theorems proved by people who just missed out this year.

Other aspects of the ceremony were much as one would expect, but there was rather less time devoted to long and repetitive speeches about the host country than I have been used to at other ICMs, which was welcome.

That is not to say that interesting facts about the host country were entirely ignored. The final speech of the ceremony was given by Martin Groetschel, who told us several interesting things, one of which was the number of mathematics papers published in international journals by Koreans in 1981. He asked us to guess, so I’m giving you the opportunity to guess before reading on.

Now Korea is 11th in the world for the number of mathematical publications. Of course, one can question what this really means, but it certainly means something when you hear that the answer to the question above is 3. So in just one generation a serious mathematical tradition has been created from almost nothing.

He also told us the names of the people on various committees. Here they are, except that I couldn’t quite copy all of them down fast enough.

The Fields Medal committee consisted of Daubechies, Ambrosio, Eisenbud, Fukaya, Ghys, Dick Gross, Kirwan, Kollár, Kontsevich, Struwe, Zeitouni and Günter Ziegler.

The program committee consisted of Carlos Kenig (chair), Bolthausen, Alice Chang, de Melo, Esnault, me, Kannan, Jong Hae Keum, Le Bris, Lubotzky, Nešetřil and Okounkov.

The ICM executive committee (if that’s the right phrase) for the next four years will be Shigefumi Mori (president), Helge Holden (secretary), Alicia Dickenstein (VP), Vaughan Jones (VP), Dick Gross, Hyungju Park, Christiane Rousseau, Vasudevan Srinivas, John Toland and Wendelin Werner.

He also told us about various initiatives of the IMU, one of which sounded interesting (by which I don’t mean that the others didn’t). It’s called the adopt-a-graduate-student initiative. The idea is that the IMU will support researchers in developed countries who want to provide some kind of mentorship for graduate students in less developed countries working in a similar area who might otherwise not find it easy to receive appropriate guidance. Or something like that.

Ingrid Daubechies also told us about two other initiatives connected with the developing world. One was that the winner of the Chern Medal gets to nominate a good cause to receive a large amount of money. Stupidly I seem not to have written it down, but it may have been $250,000. Anyhow, that order of magnitude. Phillip Griffiths chose the African Mathematics Millennium Science Initiative, or AMMSI. The other was that the five winners of the Breakthrough Prizes in mathematics, Donaldson, Kontsevich, Lurie, Tao and Taylor, have each given $100,000 towards a $500,000 fund for helping graduate students from the developing world. I don’t know exactly what form the help will take, but the phrase “breakout graduate fellowships” was involved.

When I get time, I’ll try to write something about the Laudationes, but right now I need to sleep. I have to confess that during Jim Simons’s talk, my jet lag caught up with me in a major way and I simply couldn’t keep awake. So I don’t really have much to say about it, except that there was an amusing Q&A session where several people asked long rambling “questions” that left Jim Simons himself amusingly nonplussed. His repeated requests for short pithy questions were ignored.

Just before I finish, I’ve remembered an amusing thing that happened during the early part of the ceremony, when some traditional dancing was taking place (or at least I assume it was traditional). At one point some men in masks appeared, who looked like this.

Just while we’re at it, here are some more dancers.

Anyhow, when the men in masks came on stage, there were screams of terror from Mirzakhani’s daughter, who looked about two and a half, and delightful, and she (the daughter) took a long time to be calmed down. I think my six-year-old son might have felt the same way — he had to leave a pantomime version of Hansel and Gretel, to which he had been taken as a birthday treat when he was five, almost the instant it started, and still has those tendencies.

]]>

The flight over was not exactly fun — a night flight never is — but I watched two passable films, got a little bit of work done, missed out on the hot towels (which was good news because it meant I must have been properly asleep), and had possibly the best inflight meal of my life. The last was probably a well-known dish but it happened not to be known to me. I had a choice between beef, chicken, and bibimbap, with the first two being western and the third Korean. That was a no-brainer, but when I asked for the bibimbap I was given not just the bibimbap itself but a leaflet explaining how to assemble it. The steps were as follows.

1. Please put the steamed rice into the “Bibimbap” bowl.

2. Add gochujang (Korean hot pepper paste).

Spicy level 1. (Mild): 1/2 of tube.

Spicy level 2. (Hot): Full tube.

3. Add sesame oil.

4. Mix the “Bibimbap” together.

5. Enjoy the “Bibimbap” with side dish and soup.

I squeezed out almost all the tube of hot pepper sauce and the result was pleasantly hot without threatening to be painful. It was also delicious and substantial. The soup, which I think may have been seaweed soup, was also very good.

I now regret choosing omelette for breakfast when I could have had something called rice porridge, which also looked interesting. (The omelette wasn’t.)

The one other notable thing about the flight was that the plane was so vast that it took off before it felt as though it had picked up enough speed to do so. It also satisfied the “law of turbulence”: that no matter how big a plane is, it gets buffeted about just as much as any other plane. I wonder if there is some scaling law there: for instance, the faster you go, the more dramatic the changes in pressure and wind direction, or something like that.

Seoul was fairly similar to what I expected, though a bit more spread out perhaps. My impression of the place is gleaned from just one bus journey (over an hour) from airport to hotel. Maybe I’ll have more to say about it later.

When I arrived, I immediately went to register. That was quick and efficient, and I picked up my unusually tasteful conference bag, which resembles a large handbag. I had a choice between black and brown, and went daringly for the latter. It had the usual kinds of things in it, with one exception: no notepad. (For the younger generation out there, that means a number of sheets of paper conveniently joined together, rather than some kind of tablet computer.) That will make my note-taking work slightly harder, but I’ll think of something.

The first event of the ICM was an opening reception, which took place in a huge room in the conference centre. There was an extraordinary amount of food there, and also beer, which was very welcome. The food was good, and some of it interestingly Korean, but it didn’t quite reach the heights of the bibimbap (or should that be “Bibimbap”?).

Although I’m not strictly forced to leave the hotel, I’m not sure I’m ready to pay $40 for breakfast, so I’m going to nip out quickly and try to find some coffee and a bun or something like that. I noticed from the bus that there were lots of quite promising looking coffee places: it will certainly be a bonus if, as looks likely, Korea is a country where one can get a good cup of coffee. And then it’s off to the opening ceremony. More later.

Actually, more sooner, because I’ve just remembered that I was going to mention an amusing story that I was told at the reception yesterday. Apparently the Pope is visiting Korea, and asked for an audience with the president today. And the president told him that he would have to wait till tomorrow, because today she was otherwise occupied. It’s heartening to know that mathematics takes precedence over the Catholic church.

And slightly more again: I have a bit of battery left on my laptop, which I was allowed to bring into the opening ceremony. As was advised, I got here very much earlier than the start time, which makes an already long ceremony a significant chunk longer. We’ve been treated to Beatles songs arranged for some Korean instrument that I don’t know the name of — it looks a bit like a lyre but sits horizontally on the lap. Meanwhile, it seems that the names of the Fields Medallists have, disappointingly, been leaked. Despite that, I’ve managed to maintain my ignorance. (To be more accurate, I am now certain about three of the names but still don’t know who the fourth person is. We’ll see whether I can avoid learning that before it is announced.)

]]>

Just as the last ICM was the first (and still only) time I had been to India, this one will be my first visit to Korea. I’m looking forward to that aspect too, though my hotel is right next to where the congress is taking place and the programme looks pretty packed, so I’m not sure I’ll see much of the country. Talking of the packedness of the programme, I can already see that there are going to be some agonising decisions. For example, Tom Sanders is giving an invited lecture at the same time as Ryan Williams, two speakers I very much want to listen to. I suppose I’ll just have to read the proceedings article of the one I don’t go to. Equally unfortunate is that Ben Green’s plenary lecture is not until next week, when I’ll have gone. But I hope that I’ll still be able to get some kind of feel for where mathematics is now, what people outside my area consider important, and so on, and that I’ll be able to convey some of that in the next few posts.

I’d better stop this now, since I’ll soon be getting on to an Airbus A380, a monstrously large double-decker plane. One of my children is something of a transport enthusiast and told me in advance that this would be the case (he had looked it up on the internet). I had hoped to end up on the top floor, but that turns out to be for business class only. The flight is about 11 hours: it leaves at 9pm French time and arrives at around 2:30pm Korean time. The challenge will be not to be utterly exhausted by the time of the opening ceremony on Wednesday morning. My memory of Hyderabad is that by the end of the four days I was so tired that I was almost getting anxious about my health. I plan to look after myself a bit better this time, but it may be difficult.

]]>