The structure of the story is wearily familiar after what happened with USS pensions. The authorities declare that there is a financial crisis, and that painful changes are necessary. They offer a consultation. In the consultation their arguments appear to be thoroughly refuted. The refutation is then ignored and the changes go ahead.

Here is a brief summary of the painful changes that are proposed for the Leicester mathematics department. The department has 21 permanent research-active staff. Six of those are to be made redundant. There are also two members of staff who concentrate on teaching. Their number will be increased to three. How will the six be chosen? Basically, almost everyone will be sacked and then invited to reapply for their jobs in a competitive process, and the plan is to get rid of “the lowest performers” at each level of seniority. Those lowest performers will be considered for “redeployment” — which means that the university will make efforts to find them a job of a broadly comparable nature, but doesn’t guarantee to succeed. It’s not clear to me what would count as broadly comparable to doing pure mathematical research.

How is performance defined? It’s based on things like research grants, research outputs, teaching feedback, good citizenship, and “the ongoing and potential for continued career development and trajectory”, whatever that means. In other words, on the typical flawed metrics so beloved of university administrators, together with some subjective opinions that will presumably have to come from the department itself — good luck with offering those without creating enemies for life.

Oh, and another detail is that they want to reduce the number of straight maths courses and promote actuarial science and service teaching in other departments.

There is a consultation period that started in late August and ends on the 30th of September. So the lucky members of the Leicester mathematics faculty have had a whole month to marshal their to-be-ignored arguments against the changes.

It’s important to note that mathematics is not the only department that is facing cuts. But it’s equally important to note that it *is* being singled out: the university is aiming for cuts of 4.5% on average, and mathematics is being asked to make a cut of more like 20%. One reason for this seems to be that the department didn’t score all that highly in the last REF. It’s a sorry state of affairs for a university that used to boast Sir Michael Atiyah as its chancellor.

I don’t know what can be done to stop this, but at the very least there is a petition you can sign. It would be good to see a lot of signatures, so that Leicester can see how damaging a move like this will be to its reputation.

]]>

I’ll consider three questions: why we need supranational organizations, to what extent we should care about sovereignty, and whether we should focus on the national interest.

In the abstract, the case for supranational organizations is almost too obvious to be worth making: just as it often benefits individual people to form groups and agree to restrict their behaviour in certain ways, so it can benefit nations to join groups and agree to restrict their behaviour in certain ways.

To see in more detail why this should be, I’ll look at some examples, starting with an example concerning individual people. It has sometimes been suggested that a simple way of dealing with the problem of drugs in sport would be to allow people to use whatever drugs they want. Even with the help of drugs, the Ben Johnsons of this world can’t set world records and win Olympic gold medals unless they are also amazing athletes, so if we allowed drugs, there would still be a great deal of room for human achievement.

There are many arguments against this proposal. A particularly powerful one is that allowing drugs has the effect of making them compulsory: they offer enough of a boost to performance that a drug-free athlete would almost certainly be unable to compete at the highest level if a large proportion of other athletes were taking drugs. Since taking drugs has serious adverse health effects — for instance, it has led to the deaths of several cyclists — it is better if competitors agree to forswear this method of gaining a competitive advantage. But just saying, “I won’t take drugs if you don’t” isn’t enough, since for any individual there will always be a huge temptation to break such an agreement. So one also needs organizations to which athletes belong, with precise rules and elaborate systems of testing.

This example has two features that are characteristic of many cooperative agreements.

- It is better for everybody if everybody cooperates than if everybody breaks the agreement.
- Whatever everybody else does, any individual will benefit from breaking the agreement (at least in the short term — of course, others may then follow suit).

These are the classic features of the Prisoner’s Dilemma, and whenever they occur, there is a case for an enforceable agreement. Such an agreement will leave everybody better off by forcing individuals not to act in their immediate self-interest.

The “individuals” in the Prisoner’s Dilemma need not be people: they can just as easily be countries. Here are a few examples.

Many people think that a country is better off if its workers are decently paid, do not work excessively long hours, and work in a safe environment. (If you are sufficiently right wing, then you may disagree, but that just means that you will need other examples to illustrate the abstract principle.) However, treating workers decently costs money, so if you are a company that is competing with companies from other countries, it is tempting to gain a competitive advantage by paying workers less, making them work longer hours, and cutting back on health and safety measures, which will enable you to reduce the price of your product. More generally, if you are a national government, it is tempting to gain a competitive advantage for your whole country by allowing companies to treat their workers less well. And it may be that that competitive advantage is of net benefit to your country: yes, some workers suffer, but the benefit to the economy in general reduces unemployment, helps your country to build more hospitals, and so on.

In such a situation, it may benefit an individual country to become “the sweatshop of Europe”. If that is the case, then in the absence of a supranational organization that forbids this, there is a pressure on all countries to do it, after which (i) there is no competitive advantage any more and (ii) workers are worse off. Thus, with a supranational organization, all countries are better off.

Another obvious example — so obvious that I won’t dwell on it — is the need to combat climate change. (Again, this will not appeal to a certain sort of right-winger who thinks that climate change is a big socialist conspiracy, but I doubt that many of those read this blog.) The world as a whole will be much better off if we all emit less carbon, but if you hold the behaviour of other countries constant, then whatever one country does to reduce carbon emissions makes less difference to its future interests than the cost of making the reductions. So again we need enforceable supranational agreements.

A third example is corporation tax. One way of attracting foreign investment is to have a low rate of corporation tax. So if countries are left completely free to set their tax rates, there may well be a race to the bottom, with the result that no country ends up benefiting very much from the tax revenue from foreign investors. (There will still be other benefits, such as the resulting employment.) But one can lift this “bottom” if a group of countries agrees to keep corporation taxes above a certain level. Unless that level is so high that it puts off foreign investors from investing anywhere in the group, then the countries in the group will now benefit from additional tax revenue.

Every time I hear a Leave campaigner complain about EU regulation, my first reaction is to wonder whether what they really want is to defect from an agreement that is there to deal with an instance of the Prisoner’s Dilemma. And sure enough, they often do. For example, a few days ago the farming minister George Eustice said that leaving the EU would free us from green directives. One of the directives he particularly wants to get rid of is the birds and habitat directive, which costs farmers money because it forces them to protect birds and wildlife habitats. He claims that Britain would introduce its own, better environmental legislation. But without the EU legislation, Britain would have a strong incentive to gain a competitive advantage by making its legislation less strict.

Similarly, a little while ago I heard a fisherman talking about how his livelihood suffered as a result of EU fishing quotas, and how he hoped that Britain would leave the EU and let him fish more. He didn’t put it quite that crudely, but that was basically what he was saying. And yet without quotas, the fishing stock would rapidly decline and that very same fisherman’s livelihood would vanish completely.

Do I trust our government not to succumb to these kinds of agreement-breaking temptations? Of course not. But more to the point, with a supranational body making appropriate legislation, I do not have to.

Sovereignty is often spoken of as though it is a good thing in itself. Why might that be? Well, if a country is free to do what it wants, then it is free to act in the best interests of its inhabitants, whereas if it is restricted by belonging to a supranational organization, then it loses some of that freedom, and therefore risks no longer being able to act in the best interests of its inhabitants.

However, as I have already explained, there are many situations where an agreement benefits all countries, but an individual country can gain, at least in the short term, by breaking it. In such situations, countries are better off without the freedom to act in the *immediate* best interests of their citizens, since those same citizens are better off if the agreements do not break down.

If sovereignty is what really matters, then why should it be *national* sovereignty that is important? Why should I want decisions to be taken at the level of the nation state and not at the level of, say, cities, or continents, or counties, or families? What I feel about it is something like this: I want to have as much influence as possible on the people who are making decisions that affect me, and I want those people to be well informed about my interests and to care about them. That suggests that decisions should be made at the lowest possible level. However, for the reasons rehearsed above, there are often advantages to be gained from taking decisions at a higher level, and those advantages often outweigh the resulting loss of influence I have. For example, I am happy to pay income tax, since there is no realistic more local way to finance much of the country’s infrastructure from which I greatly benefit. Unfortunately I don’t have much influence over the national government, so some of the income tax is spent in ways I disapprove of: for example, a few hundred pounds of what I contribute will probably go towards renewing Trident, which is — in my judgment anyway — a gigantic waste of money. But that loss of influence is part of the bargain: the advantages of paying income tax outweigh the disadvantages.

Thus, what really matters is *subsidiarity* rather than sovereignty. One used to hear the word “subsidiarity” constantly in the early 1990s, the last time the Conservative Party was ripping itself apart over Europe, but it has been strangely absent from the debate this time round (or if it hasn’t, then I’ve missed it). It is the principle that decisions should be taken at the lowest level that is appropriate. So, for example, measures to combat climate change should be taken at a supranational level, the decision to build a new motorway should be taken at a national level, and the decision to improve the lighting in a back street should be taken at a town-council level.

The principle of subsidiarity has been enshrined in European Union law since the Maastricht Treaty of 1992. Point 3 of Article 5 of the Lisbon Treaty of 2009 reads as follows.

Under the principle of subsidiarity, in areas which do not fall within its exclusive competence, the Union shall act only if and insofar as the objectives of the proposed action cannot be sufficiently achieved by the Member States, either at central level or at regional and local level, but can rather, by reason of the scale or effects of the proposed action, be better achieved at Union level.

The institutions of the Union shall apply the principle of subsidiarity as laid down in the Protocol on the application of the principles of subsidiarity and proportionality. National Parliaments ensure compliance with the principle of subsidiarity in accordance with the procedure set out in that Protocol.

When I hear politicians on the Leave side talk about sovereignty, I am again suspicious. What I hear is, “I want unfettered power.” But unfettered power for the Boris Johnsons of this world is not in my best interests or the best interests of the UK, which is why I shall vote for the fetters.

All other things being equal, of course the national interest matters, since what is better for my country is, well, better. But all things are not necessarily equal. I don’t for a moment believe that it would be in the UK’s best interests to leave the EU, but just suppose for a moment that it were. That still leaves us with the question of whether it would be in *Europe’s* best interests.

I am raising that question not in order to answer it (though I think the answer is pretty obvious), but to discuss whether it should be an important consideration. So let me suppose, hypothetically, that leaving the EU would be in the best interests of the UK but would be very much not in the best interests of the rest of Europe. Should I vote for the UK to leave?

If I were an extreme utilitarian, I would argue as follows: the total benefit of the UK leaving the EU is the total benefit to the UK minus the total cost to the rest of the EU; that is negative, so the UK should stay in the EU.

However, I am not an extreme utilitarian in that sense: if I were, I would sell my house and give all my money to charities that had been carefully selected (by an organization such as GiveWell) to do the maximum amount of good per pound. My family would suffer, but that suffering would be far outweighed by all the suffering I could relieve with that money. I have no plans to do that, but I am a utilitarian to this extent: such money as I *do* give to charity, I try to give to charities that are as efficient (in the amount-of-good-per-pound sense) as possible. If somebody asks me to give to a good cause, I am usually reluctant, because I feel it is my moral duty to give the money to an even better cause. (As an example, I once refused to take part in an ice bucket challenge but made a donation to one of GiveWell’s recommended charities instead.)

Thus, the principle I adopt is something like this. There are some people I care about more than others: my family, friends, and colleagues (in the broad sense of people round the world with similar interests) being the most obvious examples. Part of the reason for this is the very selfish one that my own interests are bound up with theirs: we belong to identifiable groups, and if those groups as a whole thrive, then that is very positive for me. So when I am making a decision, I will tend to give a significantly higher weight to people who are closer to me, in the sense of having interests that are aligned with mine.

But once that weighting is taken into account, I basically *am* a utilitarian. That is, if I’m faced with a choice, then I want to go for the option that maximizes total utility, except that the utility of people closer to me counts for more. Whether or not it *should* count for more is another question, but it does, and I think it does for most people. (I have oversimplified my position a bit here, but I don’t want to start writing a treatise in moral philosophy.)

So for me the question about national interest boils down to this: do I feel closer to people who are British than I do to people from other European countries?

I certainly feel closer to *some* British people, but that is not really because of their intrinsic Britishness: it’s just that I have lived in Britain almost all my life, so the people I have got close to I have mostly met here. What’s more, there are plenty of non-British Europeans I feel closer to than I do to most British people: my wife and in-laws are a particularly strong example, but I also have far more in common with a random European academic, say, than I do with a random inhabitant of the UK.

So the mere fact that someone is British does not make me care about them more. To take an example, some regions of the UK are significantly less well off than others, and have been for a long time. I would very much like to see those regions regenerated. But I do not see why that should be more important to me than the regeneration of, say, Greece. Similarly, I am no more concerned by the fact that the UK is a net contributor to the EU than I am by the fact that I am a net contributor to the welfare state. (In fact, I’m a lot less concerned by it, since the net contribution is such a small proportion of our GDP that it is almost certainly made up for by the free trade benefits that result.)

I have given three main arguments: that we need supranational organizations to deal with prisoner’s-dilemma-type situations, that subsidiarity is what matters rather than sovereignty, and that one should not make a decision that is based solely on the national interest and that ignores the wider European interest.

One could in theory agree with everything I have written but argue that the EU is not the right way of dealing with problems that have to be dealt with at an international level. I myself certainly don’t think it’s perfect, but it is utterly unrealistic to imagine that if we leave then we will end up with an organization that does the job better.

]]>

But as I’ve got a history with this problem, including posting about it on this blog in the past, I feel I can’t just not react. So in this post and a subsequent one (or ones) I want to do three things. The first is just to try to describe my own personal reaction to these events. The second is more mathematically interesting. As regular readers of this blog will know, I have a strong interest in the question of where mathematical ideas come from, and a strong conviction that they *always* result from a fairly systematic process — and that the opposite impression, that some ideas are incredible bolts from the blue that require “genius” or “sudden inspiration” to find, is an illusion that results from the way mathematicians present their proofs after they have discovered them.

From time to time an argument comes along that appears to present a stiff challenge to my view. The solution to the cap-set problem is a very good example: it’s easy to understand the proof, but the argument has a magic quality that leaves one wondering how on earth anybody thought of it. I’m referring particularly to the Croot-Lev-Pach lemma here. I don’t pretend to have a complete account of how the idea might have been discovered (if any of Ernie, Seva or Peter, or indeed anybody else, want to comment about this here, that would be extremely welcome), but I have some remarks.

The third thing I’d like to do reflects another interest of mine, which is avoiding duplication of effort. I’ve spent a little time thinking about whether there is a cheap way of getting a Behrend-type bound for Roth’s theorem out of these ideas (and I’m not the only one). Although I wasn’t expecting the answer to be yes, I think there is some value in publicizing some of the dead ends I’ve come across. Maybe it will save others from exploring them, or maybe, just maybe, it will stimulate somebody to find a way past the barriers that seem to be there.

There’s not actually all that much to say here. I just wanted to comment on a phenomenon that’s part of mathematical life: the feeling of ambivalence one has when a favourite problem is solved by someone else. The existence of such a feeling is hardly a surprise, but slightly more interesting are the conditions that make it more or less painful. For me, an extreme example where it was not at all painful was Wiles’s solution of Fermat’s Last Theorem. I was in completely the wrong area of mathematics to have a hope of solving that problem, so although I had been fascinated by it since boyhood, I could nevertheless celebrate in an uncomplicated way the fact that it had been solved in my lifetime, something that I hadn’t expected.

Towards the other end of the spectrum for me personally was Tom Sanders’s quasipolynomial version of the Bogolyubov-Ruzsa lemma (which was closely related to his bound for Roth’s theorem). That was a problem I had worked on very hard, and some of the ideas I had had were, it turned out, somewhat in the right direction. But Tom got things to work, with the help of further ideas that I had definitely not had, and by the time he solved the problem I had gone for several years without seriously working on it. So on balance, my excitement at the solution was a lot greater than the disappointment that that particular dream had died.

The cap-set problem was another of my favourite problems, and one I intended to return to. But here I feel oddly un-disappointed. The main reason is that I know that if I had started work on it again, I would have continued to try to push the Fourier methods that have been so thoroughly displaced by the Croot-Lev-Pach lemma, and would probably have got nowhere. So the discovery of this proof has saved me from wasting a lot of time at some point in the future. It’s also an incredible bonus that the proof is so short and easy to understand. I could almost feel my brain expanding as I read Jordan Ellenberg’s preprint and realized that here was a major new technique to add to the toolbox. Of course, the polynomial method is not new, but somehow this application of it, at least for me, feels like one where I can make some headway with understanding why it works, rather than just gasping in admiration at each new application and wondering how on earth anyone thought of it.

That brings me neatly on to the next theme of this post. From now on I shall assume familiarity with the argument as presented by Jordan Ellenberg, but here is a very brief recap.

The key to it is the lemma of Croot, Lev and Pach (very slightly modified), which states that if $A\subset\mathbb{F}_3^n$ and $P$ is a polynomial of degree $d$ in $n$ variables such that $P(x+y)=0$ for every pair of distinct elements $x,y\in A$, then $P(2x)$ is non-zero for at most $2m_{d/2}$ values of $x\in A$, where $m_{d/2}$ is the dimension of the space of polynomials in $x_1,\dots,x_n$ of degree at most $d/2$.

Why does this help? Well, the monomials we consider are of the form $x_1^{a_1}\cdots x_n^{a_n}$ where each $a_i\in\{0,1,2\}$. The expected degree of a random such monomial is $n$, and for large $n$ the degree is strongly concentrated about its mean. In particular, if we choose $d=4n/3$, then the probability that a random monomial has degree greater than $d$ is exponentially small, and the probability that a random monomial has degree less than $d/2=2n/3$ is also exponentially small.

Therefore, the dimension of the space of polynomials of degree at most $d$ (for this $d$) is at least $(1-c^n)3^n$, while the dimension of the space of polynomials of degree at most $d/2$ is at most $c^n3^n$. Here $c$ is some constant less than 1. It follows that if $A\subset\mathbb{F}_3^n$ is a set of density greater than $c^n$ we can find a polynomial of degree $d$ that vanishes everywhere on the complement of $A$ and doesn’t vanish on $A$. Furthermore, if $A$ has density a bit bigger than this (say $4c^n$), we can find a polynomial of degree $d$ that vanishes on the complement of $2\cdot A=\{2z:z\in A\}$ and is non-zero at more than $2m_{d/2}$ points of $2\cdot A$. Therefore, by the lemma, it cannot vanish on all $x+y$ with $x,y$ distinct elements of $A$, which implies that there exist distinct $x,y\in A$ such that $x+y=2z$ for some $z\in A$.
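The concentration of degrees is easy to check numerically. The following is a minimal Python sketch, not part of the argument: it counts monomials $x_1^{a_1}\cdots x_n^{a_n}$ with each $a_i\in\{0,1,2\}$ by total degree, then reports the proportion of degree at most $2n/3$ and the proportion of degree greater than $4n/3$. The function names and the choice $n=300$ are illustrative assumptions.

```python
def degree_counts(n):
    """counts[k] = number of monomials in n variables, exponents in {0,1,2}, of total degree k."""
    counts = [1]
    for _ in range(n):
        new = [0] * (len(counts) + 2)
        for k, c in enumerate(counts):
            for a in range(3):          # exponent of the next variable
                new[k + a] += c
        counts = new
    return counts


def m(n, k):
    """Number of such monomials of degree at most k (the dimension called m_k above)."""
    return sum(degree_counts(n)[: k + 1])


n = 300                                  # assumed value, divisible by 3
d = 4 * n // 3
total = 3 ** n
low_fraction = m(n, d // 2) / total          # share of monomials of degree <= d/2
high_fraction = (total - m(n, d)) / total    # share of monomials of degree > d
print(f"{low_fraction:.3e} {high_fraction:.3e}")
```

Both proportions come out tiny, and they shrink exponentially as $n$ grows, which is all that the argument uses.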

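Now let us think about the Croot-Lev-Pach lemma. It is proved by a linear algebra argument: we define a map $\phi:\mathbb{F}_3^n\to V$, where $V$ is a certain vector space over $\mathbb{F}_3$ of dimension $k=2m_{d/2}$, and we also define a bilinear form $\langle\cdot,\cdot\rangle$ on $V$, with the property that $P(x+y)=\langle\phi(x),\phi(y)\rangle$ for every $x,y$. Then the conditions on $P$ translate into the condition that $\langle\phi(x),\phi(y)\rangle=0$ for all distinct $x,y\in A$. But if $P(2x)$ is non-zero at more than $k$ points in $A$, that gives us $x_1,\dots,x_r$ with $r>k$ such that $\langle\phi(x_i),\phi(x_j)\rangle\ne 0$ if and only if $i=j$, which implies that $\phi(x_1),\dots,\phi(x_r)$ are linearly independent, which they can’t be as they all live in the $k$-dimensional space $V$.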
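For small cases the rank bound behind the lemma can be tested directly. The sketch below is only an illustration, with hypothetical helper names and the assumed parameters $n=4$, $d=4$: it picks a pseudo-random polynomial $P$ built from monomials with exponents in $\{0,1,2\}$ and total degree at most $d$, forms the $81\times 81$ matrix with entries $P(x+y)$ over $\mathbb{F}_3$, and compares its rank with twice the number of monomials of degree at most $d/2$.

```python
import itertools
import random


def rank_mod3(rows):
    """Rank over F_3 of a list of integer rows, by Gaussian elimination."""
    rows = [[v % 3 for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = rows[rank][col]            # in F_3 each non-zero element is its own inverse
        rows[rank] = [(v * inv) % 3 for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                c = rows[i][col]
                rows[i] = [(a - c * b) % 3 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


n, d = 4, 4                              # assumed small parameters
points = list(itertools.product(range(3), repeat=n))
monomials = [e for e in points if sum(e) <= d]           # exponent vectors
m_half = sum(1 for e in points if sum(e) <= d // 2)      # monomials of degree <= d/2

rng = random.Random(0)
coeffs = {e: rng.randrange(3) for e in monomials}


def P(x):
    """Evaluate the random polynomial at a point x of F_3^n."""
    total = 0
    for e, c in coeffs.items():
        term = c
        for xi, ei in zip(x, e):
            term *= xi ** ei
        total += term
    return total % 3


values = {x: P(x) for x in points}
F = [[values[tuple((a + b) % 3 for a, b in zip(x, y))] for y in points] for x in points]
rank = rank_mod3(F)
print(rank, "<=", 2 * m_half)
```

With these parameters there are 50 monomials of degree at most 4 but only 15 of degree at most 2, so the bound of 30 is a genuine restriction on the rank.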

The crucial thing that makes this lemma useful is that we have a huge space of functions (of almost full dimension), each of which can be represented this way with a very small $k$.

The question I want to think about is the following. Suppose somebody had realized that they could bound the size of an AP-free set by finding an almost full-dimensional space of functions, each of which had a representation of the form $f(x+y)=\langle\phi(x),\phi(y)\rangle$, where $\phi$ took values in a low-dimensional vector space $V$. How might they have come to realize that polynomials could do the job? Answering this question doesn’t solve the mystery of how the proof was discovered, since the above realization seems hard to come by: until you’ve seen it, the idea that almost all functions could be represented very efficiently like that seems somewhat implausible. But at least it’s a start.

Let’s turn the question round. Suppose we know that $f$ has the property that $f(x+y)=\langle\phi(x),\phi(y)\rangle$ for every $x,y$, with $\phi$ taking values in a $k$-dimensional space. That is telling us that if we think of $f$ as a matrix (that is, we write $F(x,y)$ for $f(x+y)$), then that matrix has rank $k$. So we can ask the following question: given a matrix $F(x,y)$ that happens to be of the special form $f(x+y)$ (where the indexing variables live in $\mathbb{F}_3^n$), under what circumstances can it possibly have low rank? That is, what about $f$ makes $F$ have low rank?

We can get some purchase on this question by thinking about how $F$ operates as a linear map on functions defined on $\mathbb{F}_3^n$. Indeed, we have that if $g$ is a function defined on $\mathbb{F}_3^n$ (I’m being a bit vague for the moment about where $g$ takes its values, though the eventual answer will be $\mathbb{F}_3$), then we have the formula $Fg(x)=\sum_y f(x+y)g(y)$. Now $F$ has rank $k$ if and only if the functions of the form $Fg$ form a $k$-dimensional subspace. Note that if $g$ is the function $\delta_u$, we have that $Fg(x)=f(x+u)$. Since every $g$ is a linear combination of delta functions, we are requiring that the translates of $f$ should span a subspace of dimension $k$. Of course, we’d settle for a lower dimension, so it’s perhaps more natural to say at most $k$. I won’t actually write that, but it should be understood that it’s what I basically mean.

What kinds of functions have the nice property that their translates span a low-dimensional subspace? And can we find a huge space of such functions?

The answer that occurs most naturally to me is that characters have this property: if $\chi$ is a character, then every translate of $\chi$ is a multiple of $\chi$, since $\chi(x+u)=\chi(u)\chi(x)$. So if $f$ is a linear combination of $k$ characters, then its translates span a $k$-dimensional space. (So now, just to be explicit about it, my functions are taking values in $\mathbb{C}$.)

Moreover, the converse is true. What we are asking for is equivalent to asking for the convolutions of $f$ with other functions to live in a $k$-dimensional subspace. If we take Fourier transforms, we now want the pointwise products of $\hat{f}$ with other functions to live in a $k$-dimensional subspace. Well, that’s exactly saying that $\hat{f}$ takes $k$ non-zero values. Transforming back, that gives us that $f$ needs to be a linear combination of $k$ characters.

But that’s a bit of a disaster. If we want an $r$-dimensional space of functions such that each one is a linear combination of at most $k$ characters, we cannot do better than to take $r=k$. The proof is the same as one of the arguments in Ellenberg’s preprint: in an $r$-dimensional space there must be at least $r$ active coordinates, and then a random element of the space is on average non-zero on at least $r$ of those.

So we have failed in our quest to make $r/3^n$ exponentially close to $1$ and $k/3^n$ exponentially close to zero.

But before we give up, shouldn’t we at least consider backtracking and trying again with a different field of scalars? The complex numbers didn’t work out for us, but there is one other choice that stands out as natural, namely $\mathbb{F}_3$.

So now we ask a question that’s exactly analogous to the question we asked earlier: what kinds of functions $f:\mathbb{F}_3^n\to\mathbb{F}_3$ have the property that they and their translates generate a subspace of dimension $k$?

Let’s see whether the characters idea works here. Are there functions $f:\mathbb{F}_3^n\to\mathbb{F}_3$ with the property that $f(x+y)=f(x)f(y)$? No there aren’t, or at least not any interesting ones, since that would give us that $f(x)^3=f(3x)=f(0)$ for every $x$, which implies that $f$ is constant (and because $f(0)=f(0)^2$, that constant has to be 0 or 1).
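This can also be confirmed by brute force in a small case. The sketch below (an illustration only, assuming $n=2$, with variable names of my choosing) runs through all $3^9$ functions from $\mathbb{F}_3^2$ to $\mathbb{F}_3$ and keeps those satisfying $f(x+y)=f(x)f(y)$ for every pair $x,y$.

```python
import itertools

n = 2                                    # assumed small dimension
points = list(itertools.product(range(3), repeat=n))
index = {p: i for i, p in enumerate(points)}


def add(x, y):
    """Coordinate-wise addition in F_3^n."""
    return tuple((a + b) % 3 for a, b in zip(x, y))


# sum_index[i][j] is the position of points[i] + points[j] in the list of points.
sum_index = [[index[add(x, y)] for y in points] for x in points]

solutions = []
for values in itertools.product(range(3), repeat=len(points)):
    # values[i] is the value of the candidate function at points[i].
    if all(values[sum_index[i][j]] == (values[i] * values[j]) % 3
           for i in range(len(points)) for j in range(len(points))):
        solutions.append(values)

print(solutions)   # only the constant functions 0 and 1 survive
```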

OK, let’s ask a slightly different question. Is there some fairly small space of functions from $\mathbb{F}_3^n$ to $\mathbb{F}_3$ that is closed under taking translates? That is, we would like that if $f$ belongs to the space, then for each $u$ the function $x\mapsto f(x+u)$ also belongs to the space.

One obvious space of functions with this property is linear maps. There aren’t that many of these: they form just an $n$-dimensional space (or $(n+1)$-dimensional if we interpret “linear” in the polynomials sense rather than the vector-spaces sense), sitting inside the $3^n$-dimensional space of *all* functions from $\mathbb{F}_3^n$ to $\mathbb{F}_3$.

It’s not much of a stretch to get from here to noticing that polynomials of degree at most $d$ form another such space. For example, we might think, “What’s the simplest function I can think of that isn’t linear?” and we might then go for something like $x_1^2$. That and its translates generate the space of all quadratic polynomials that depend on $x_1$ only. Then we’d start to spot that there are several spaces of functions that are closed under translation. Given any monomial, it and its translates generate the space generated by all smaller monomials. So for example the monomial $x_1^2x_2$ and its translates generate the space of polynomials of the form $\lambda_1x_1^2x_2+\lambda_2x_1^2+\lambda_3x_1x_2+\lambda_4x_1+\lambda_5x_2+\lambda_6$. So any down-set of monomials defines a subspace that is closed under translation.

I think, but have not carefully checked, that these are in fact the *only* subspaces that are closed under translation. Let me try to explain why. Given any function from $\mathbb{F}_3^n$ to $\mathbb{F}_3$, it must be given by a polynomial made out of cube-free monomials. That’s simply because the dimension of the space of such polynomials is $3^n$. And I think that if you take any polynomial $P$, then the subspace that it and its translates generate is generated by all the monomials that are less than a monomial that occurs in $P$ with a non-zero coefficient.

Actually no, that’s false. If I take the polynomial $x_1+x_2$, then every translate of it is of the form $x_1+x_2+\lambda$. So without thinking a bit more, I don’t have a characterization of the spaces of functions that are closed under translation. But we can at least say that polynomials give us a rich supply of them.
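Both phenomena can be seen in a two-variable computation. The following sketch is an illustration with hypothetical helper names, and the two test polynomials $x_1^2x_2$ and $x_1+x_2$ are assumed examples: it computes the dimension over $\mathbb{F}_3$ of the space spanned by all translates of a function on $\mathbb{F}_3^2$, which is the rank of the matrix whose rows are the value tables of the translates.

```python
import itertools


def rank_mod3(rows):
    """Rank over F_3 of a list of integer rows, by Gaussian elimination."""
    rows = [[v % 3 for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = rows[rank][col]            # in F_3 each non-zero element is its own inverse
        rows[rank] = [(v * inv) % 3 for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                c = rows[i][col]
                rows[i] = [(a - c * b) % 3 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def translate_span_dim(f, n):
    """Dimension of the span of the translates x -> f(x + u), u in F_3^n."""
    points = list(itertools.product(range(3), repeat=n))
    rows = [[f(tuple((xi + ui) % 3 for xi, ui in zip(x, u))) % 3 for x in points]
            for u in points]
    return rank_mod3(rows)


def monomial(x):
    return x[0] ** 2 * x[1]


def linear_form(x):
    return x[0] + x[1]


print(translate_span_dim(monomial, 2))      # 6: all six dominated monomials appear
print(translate_span_dim(linear_form, 2))   # 2: fewer than the 3 dominated monomials
```

Because every function on $\mathbb{F}_3^2$ is given by a unique polynomial in cube-free monomials, the rank of the value tables is the same as the dimension of the corresponding space of polynomials.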

I’m starting this section a day after writing the sections above, and after a good night’s sleep I have clarified in my mind something I sort of knew already, as it’s essential to the whole argument, which is that the conjectures that briefly flitted across my mind two paragraphs ago and that turned out to be false *absolutely had to be false*. Their falsity is pretty much the whole point of what is going on. So let me come to that now.

Let me call a subspace *closed* if it is closed under translation. (Just to be completely explicit about this, by “translation” I am referring to operations of the form $T_u$, which take a function $f$ to the function $x\mapsto f(x+u)$.) Note that the sum of two closed subspaces is closed. Therefore, if we want to find out what closed subspaces are like, we could do a lot worse than thinking about the closed subspaces generated by a single function, which it now seems good to think of as a polynomial.

Unfortunately, it’s not easy to illustrate what I’m about to say with a simple example, because simple examples tend to be too small for the phenomenon to manifest itself. So let us argue in full generality. Let $P$ be a polynomial of degree at most $d$. We would like to understand the rank of the matrix $F(x,y)=P(x+y)$, which is equal to the dimension of the closed subspace generated by $P$, or equivalently the subspace generated by all functions of the form $x\mapsto P(x+u)$.

At first sight it looks as though this subspace could contain pretty well all linear combinations of monomials that are dominated by monomials that occur with non-zero coefficients in . For example, consider the 2-variable polynomial . In this case we are trying to work out the dimension of the space spanned by the polynomials

.

These live in the space spanned by six monomials, so we’d like to know whether the vectors of the form span the whole of or just some proper subspace. Setting we see that we can generate the standard basis vectors and . Setting it’s not hard to see that we can also get and . And setting we see that we can get the fourth and sixth coordinates to be any pair we like. So these do indeed span the full space. Thus, in this particular case one of my false conjectures from earlier happens to be true.

Let’s see why it is false in general. The argument is basically repeating the proof of the Croot-Lev-Pach lemma, but using that proof to prove an equivalent statement (a bound for the rank of the closed subspace generated by ) rather than the precise statement they proved. (I’m not claiming that this is a radically different way of looking at things, but I find it slightly friendlier.)

Let be a polynomial. One thing that’s pretty clear, and I think this is why I got slightly confused yesterday, is that for every monomial that’s dominated by a monomial that occurs non-trivially in we can find some linear combination of translates such that occurs with a non-zero coefficient. So if we want to prove that these translates generate a low-dimensional space, we need to show that there are some heavy-duty linear dependences amongst these coefficients. And there are! Here’s how the proof goes. Suppose that has degree at most . Then we won’t worry at all about the coefficients of the monomials of degree at most : sure, these generate a subspace of dimension (that’s the definition of , by the way), but unless is very close to , that’s going to be very small.

But what about the coefficients of the monomials of degree greater than ? This is where the linear dependences come in. Let be such a monomial. What can we say about its coefficient in the polynomial ? Well, if we expand out and write it as a linear combination of monomials, then the coefficient of will work out as a gigantic polynomial in . However, and this is the key point, this “gigantic” polynomial will have degree at most . That is, for each such monomial , we have a polynomial of degree at most such that gives the coefficient of in the polynomial . But these polynomials all live in the -dimensional space of polynomials of degree at most , so we can find a spanning subset of them of size at most . In other words, we can pick out at most of the polynomials , and all the rest are linear combinations of those ones. This is the huge linear dependence we wanted, and it shows that the projection of the closed subspace generated by to the monomials of degree greater than also has dimension at most .

So in total we get that and its translates span a space of dimension at most , which for suitable is much much smaller than . This is what I am referring to when I talk about a “rank miracle”.
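For anyone who wants to experiment with small cases, here is a minimal Python sketch that computes the dimension of the space spanned by a polynomial and its translates, as the rank of the matrix whose entry at (x, y) is the value of the polynomial at x + y. It assumes that the ground field is the field with three elements (which is what "cube-free monomials" suggests) and reads the bound in the argument above as twice the number of cube-free monomials of degree at most half the degree of the polynomial. The function names and the test polynomials are mine, chosen purely for illustration.

```python
from itertools import product

P_MOD = 3  # assumption: we work over the field with three elements


def rank_mod_p(rows, p=P_MOD):
    """Rank of an integer matrix over the field Z/pZ (Gauss-Jordan elimination)."""
    rows = [[v % p for v in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                factor = rows[i][col]
                rows[i] = [(a - factor * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def translate_rank(poly, n, p=P_MOD):
    """Dimension of the span of the translates of poly, a function on n-tuples mod p."""
    points = list(product(range(p), repeat=n))
    matrix = [
        [poly(tuple((a + b) % p for a, b in zip(x, y))) % p for y in points]
        for x in points
    ]
    return rank_mod_p(matrix, p)


def low_degree_count(n, k, p=P_MOD):
    """Number of monomials in n variables, each exponent below p, of total degree at most k."""
    return sum(1 for e in product(range(p), repeat=n) if sum(e) <= k)


if __name__ == "__main__":
    poly = lambda v: v[0] * v[1]  # a hypothetical degree-2 example in 2 variables
    print(translate_rank(poly, 2), "<=", 2 * low_degree_count(2, 1))
```

In two variables everything is far too small for the phenomenon to be visible (the bound is bigger than the trivial one), which is the point made earlier about simple examples. The sketch becomes more interesting if one increases the number of variables and the degree, though the matrix has three to the power n rows, so this only works for small n.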

Note that we could have phrased the entire discussion in terms of the rank of . That is, we could have started with the thought that if is a function defined on such that whenever are distinct elements of , and for at least points , then the matrix would have rank at least , which is the same as saying that and its translates span a space of dimension at least . So then we would be on the lookout for a high-dimensional space of functions such that for each function in the class, and its translates span a much lower-dimensional space. That is what the polynomials give us, and we don’t have to mention a funny non-linear function from to a vector space .

I still haven’t answered the question of whether the rank miracle is a miracle. I actually don’t have a very good answer to this. In the abstract, it is a big surprise that there is a space of functions of dimension that’s exponentially close to the maximal dimension such that for every single function in that space, the rank of the matrix is exponentially small. (Here “exponentially small/close” means as a fraction of .) And yet, once one has seen the proof, it begins to feel like a fairly familiar concentration of measure argument: it isn’t a surprise that the polynomials of degree at most form a space of almost full dimension and the polynomials of degree at most form a space of tiny dimension. And it’s not completely surprising (again with hindsight) that because in the expansion of you can’t use more than half the degree for both and , there might be some way of arguing that the translates of live in a subspace of dimension closer to .

This post has got rather long, so this seems like a good place to cut it off. To be continued.


It has long been a conviction of mine that the effort-reducing forces we have seen so far are just the beginning. One way in which the internet might be harnessed more fully is in the creation of amazing new databases, something I once asked a Mathoverflow question about. I recently had cause (while working on a research project with a student of mine, Jason Long) to use Sloane’s database in a serious way. That is, a sequence of numbers came out of some calculations we did, we found it in the OEIS, that gave us a formula, and we could prove that the formula was right. The great thing about the OEIS was that it solved an NP-ish problem for us: once the formula was given to us, it wasn’t that hard to prove that it was correct for our sequence, but finding it in the first place would have been extremely hard without the OEIS.

I’m saying all this just to explain why I rejoice that a major new database was launched today. It’s not in my area, so I won’t be using it, but I am nevertheless very excited that it exists. It is called the L-functions and modular forms database. The thinking behind the site is that lots of number theorists have privately done lots of difficult calculations concerning L-functions, modular forms, and related objects. Presumably up to now there has been a great deal of duplication, because by no means all these calculations make it into papers, and even if they do it may be hard to find the right paper. But now there is a big database of these objects, with a large amount of information about each one, as well as a great big graph of connections between them. I will be very curious to know whether it speeds up research in number theory: I hope it will become a completely standard tool in the area and inspire people in other areas to create databases of their own.


Ten pounds bet then would have netted me 50000 pounds now, so a natural question arises: should I be kicking myself (the appropriate reaction given the sport) for not placing such a bet? In one sense the answer is obviously yes, as I’d have made a lot of money if I had. But I’m not in the habit of placing bets, and had no idea that these odds were being offered anyway, so I’m not too cut up about it.

Nevertheless, it’s still interesting to think about the question hypothetically: if I *had* been the betting type and had known about these odds, should I have gone for them? Or would regretting not doing so be as silly as regretting not choosing and betting on the particular set of numbers that just happened to win the national lottery last week?

Here’s a possible argument that the 5000-1 odds at the beginning of the season were about right, or at least not too low, and an attempted explanation of why hardly anybody bet on Leicester. If you’ve watched football for any length of time, you know that the league is dominated by the big clubs, with their vast resources to spend on top players and managers. Just occasionally a middle-ranking club has a surprisingly good season and ends up somewhere near the top. But a bottom-ranking club that hasn’t just been lavished with money doesn’t become a top club overnight, and since to win the league you have to do consistently well over an entire season, it just ain’t gonna happen that a club like Leicester will win.

And here are a few criticisms of the above argument.

1. The argument that we know how things work from following the game for years or even decades is convincing if all you want to prove is that it is very unlikely that a team like Leicester will win. But here we want to prove that the odds are not just low, but one-in-five-thousand low. What if the odds against it happening in any given season were 100 to 1? We haven’t had many more than 100 seasons ever, so we might well never have observed what we observed this season.

2. The argument that consistency is required over a whole season is a very strong one if the conclusion to be established is that a mediocre team will almost never win. Indeed, for a mediocre team to beat a very good team some significantly good luck is required. And the chances of that kind of luck happening enough times during a season for the team to win the league are given by the tail of a binomial distribution, so they are tiny.

However, in practice it is not at all true that results of different matches are independent. Once Leicester had won a few matches against far bigger and richer clubs, a simple Bayesian calculation would have shown that it was far more likely that Leicester had somehow become a much better team since last season than that it had won those matches by a series of independent flukes. I think the bookmakers probably made a big mistake by offering odds of 1000-1 three months into the season, at which point Leicester were top. Of course we all expected them to fall off, but were we 99.9% sure of that? Surely not. (I think if I’d known about those odds, I probably would have bet £20 or so. Oh well.)

3. Although it was very unlikely that Leicester would suddenly become far better, there were changes, such as a new manager and some unheralded new players who turned out to be incredibly good. How unlikely is it that a player who has caught someone’s eye will be much better than anybody expected? Pretty unlikely but not impossible, I’d have thought: it’s quite common for players to blossom when they move to a new club.

4. The fact that Leicester had a remarkable escape from relegation at the end of last season, winning seven of their last nine matches, was already fairly strong evidence that something had changed (see point 2 above). Had they accumulated their meagre points total in a more uniform manner, it would have reduced the odds of their winning this season.
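To make criticisms 2 and 4 a little more concrete, here is a rough Python sketch of the two calculations they allude to: the binomial tail for a run of lucky wins, and a Bayesian update on the hypothesis that a team has genuinely improved. Everything numerical in it is made up for illustration (the per-match win probabilities, the prior, the number of wins needed), the function names are mine, and the model is crude: it counts only wins and treats matches as independent once the quality of the team is fixed.

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def posterior_improved(prior, wins, matches, p_mediocre, p_good):
    """Posterior probability that the team is 'good' rather than 'mediocre',
    given that it won `wins` out of `matches` (Bayes' rule with two hypotheses)."""
    like_good = p_good**wins * (1 - p_good) ** (matches - wins)
    like_mediocre = p_mediocre**wins * (1 - p_mediocre) ** (matches - wins)
    evidence = prior * like_good + (1 - prior) * like_mediocre
    return prior * like_good / evidence


if __name__ == "__main__":
    # Hypothetical numbers: a mediocre team wins each match with probability 0.3
    # and would need 25 wins out of 38 to take the title.
    print(binomial_tail(38, 25, 0.3))
    # Hypothetical numbers: a 1% prior that the team has become good (win
    # probability 0.6), updated on seven wins out of nine.
    print(posterior_improved(0.01, 7, 9, 0.3, 0.6))
```

With these invented inputs the first number is tiny, as the independence argument says it should be, while the second shows a small prior being multiplied many times over by a short run of results. Different assumptions will of course give different numbers.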

The first criticism above is not itself beyond criticism, since we have more data to go on than just the English league. If nothing like the Leicester story had happened in any league anywhere in the world since the beginning of the game, then the evidence would be more convincing. But from what I’ve read in the papers the story isn’t *completely* unprecedented: that is, pretty big surprises do just occasionally happen. Though against that, the way that money has come into the game has made the big clubs more dominant in recent years, which would seem to reduce Leicester’s chances.

I’m not going to come to any firm conclusion here, but my instinct is that 5000-1 was a very good bet to take at the beginning of the season, even without hindsight, and that 1000-1 three months later was an amazing chance. I’m ignoring here the well-known question of whether it is sensible to take unlikely bets just because your expected gain is positive. I’m just wondering whether the expected gain *was* positive. Your back-of-envelope calculations on the subject are welcome …
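As a starting point for such calculations, here is a small Python sketch of the arithmetic behind "was the expected gain positive?". It assumes the usual convention that a winning bet at odds of k-1 returns k times the stake as profit and a losing bet loses the stake. The function names are mine, and the probability of 1 in 1000 in the example is an arbitrary guess rather than an estimate.

```python
def break_even_probability(odds):
    """Win probability at which a bet at `odds`-to-1 has zero expected gain."""
    return 1 / (odds + 1)


def expected_gain(stake, odds, p):
    """Expected profit of a bet of `stake` at `odds`-to-1 that wins with probability p."""
    return p * odds * stake - (1 - p) * stake


if __name__ == "__main__":
    print(break_even_probability(5000))   # the bet is favourable above this probability
    print(expected_gain(10, 5000, 1 / 1000))  # hypothetical probability of 1 in 1000
```

So the question about the start-of-season odds comes down to whether the true probability was above 1 in 5001, and the question about the odds three months later to whether it was above 1 in 1001.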


We have decided to splash out and use a publishing platform called Scholastica. Scholastica was founded in 2011 by some University of Chicago graduates who wanted to disrupt the current state of affairs in academic publishing by making it very easy to create electronic journals. I say “splash out” because they charge $10 per submission, whereas there are other ways of creating electronic journals that are free. But we have got a lot for that $10, as I shall explain later in this post, and the charge compares favourably, to put it mildly, with the article processing charges levied by more traditional publishers. (An example: if you have had an article accepted by the Elsevier journal Advances in Mathematics, the price you need to pay to make that article open access is $1500; the same amount of money would cover 100 submissions to Discrete Analysis. I didn’t say 150 because there are some small further costs we incur, such as a subscription to CrossRef, which enables us to issue DOIs to our articles.) Most importantly, we do not pass on even this $10 charge to authors, as we have a small fund that covers it.

Now that we have been handling submissions for almost six months, we have been forced to make decisions that leave us with a rather clearer idea about what the scope and standards of the journal are. As far as the scope is concerned, we want to be reasonably broad. For example, the analysis in the paper by Tuomas Hytönen, Sean Li and Assaf Naor is not really discrete in any useful sense, but we judged it to have a similar spirit to the kind of papers that fit the title of the journal more obviously by treating discrete structures using analytic tools. Our rough policy is that if a paper is good enough, then we will not be too worried about whether it has the right sort of subject matter, as long as it isn’t in an area that is completely foreign to the editorial board.

As for the quality, we have been surprised and gratified by the high standard of submissions we have received, which has allowed us to set a high bar, turn away some perfectly respectable papers, and establish Discrete Analysis as a distinctly good journal.

That is an important part of our mission, because we want to show that the cheapness of running the journal is completely compatible with high quality. And that does not just mean mathematical quality. One thing I hope you will notice is that the journal’s website is far better designed than almost any other website of a mathematics journal. This design was done by the Scholastica team for no charge (I think they see it as an investment, since they would like to attract more journals to their platform), and it satisfies various requirements I felt strongly about: for example, that it should be attractive to look at, that one should be able to explore the content of the journal without undue clicking and loading of new pages, and that it would be able to handle basic LaTeX. But it has other features that I did not think of, such as having an image associated with each article (which seems pointless until you actually look at the site and see how the image makes it easier to browse and more tempting to find out about the article) and making the site work well on your phone as well as your laptop. If you compare it with, say, the website of Forum of Mathematics, Sigma, it’s like comparing a Rolls Royce with a Trabant, except that someone has mischievously exchanged the price tags. (Let me add here that there are many good things about Forum of Mathematics. In particular, its editorial practices have been a strong influence on those of Discrete Analysis. And it is far from alone in having an unimaginative and inconvenient website.)

Since I am keen to promote the arXiv overlay model, I was also particularly concerned that Discrete Analysis should not be perceived as “just like a normal journal, but without X, Y and Z”. Rather, I wanted it to be *better* than a normal journal in important respects (and at least equal to a normal journal in all respects that anyone actually cares about). If you visit the website, you will notice that each article gives you an option to click on the words “Editorial introduction”. If you do so, then up comes a description of the article (not on a new webpage, I hasten to add), which sets it in some kind of context and helps you to judge whether you might want to go ahead and read it on the arXiv.

There are at least two reasons for doing this. One is that if the website were nothing but a list of links, then there would be a danger that it would seem a bit pointless: about the only reason to visit it would be to check that when an author claims to have been published by us, then that is actually true. But with article descriptions and a well-designed website, one can actually *browse* the journal. Browsing is something I used to enjoy doing back when print journals were all that there were, but it is quite a lot harder when everything is electronic. (Some websites try to interest you in related content, but it seems to be chosen by rather unsophisticated algorithms, and in any case is not what I am talking about — I mean the less focused kind of browsing where you stumble on an interesting paper that neither you nor an algorithm based on your browsing history would ever have thought of looking at.)

A second reason is that having these introductions goes a small way towards dealing with a serious objection to the current system of peer review, which is that a great deal of valuable information never gets made public. As an editor, I sometimes get to read very interesting information that puts a submitted article into a context that I didn’t know about. All the reader of the journal gets is one bit of information: that the article was accepted rather than rejected. (One could argue that it isn’t even one bit, since we do not learn which articles have been rejected.) Of course, under cover of privacy and anonymity, referees can also make remarks that one would not want to make public, but with article descriptions we don’t have to. We can simply write the descriptions using information from the article itself, prior knowledge, remarks made by the referees, remarks made by editors, relevant facts discovered from the internet, and so on. And how this information is selected and combined can vary from article to article, so the reader won’t know whether any particular piece of information was part of a referee’s report.

Thus, Discrete Analysis is offering services that other journals do not offer. Here’s another one. Suppose you submit an article to Discrete Analysis and we accept it. The next stage is for you to submit a revision to arXiv, taking account of the referee’s comments. Once that’s done, we make sure we have an editorial introduction and appropriate metadata in place, and publish it. But what if at some later date you suddenly realize that there is a shorter and more informative proof of Lemma 2.3? With the conventional publishing system, that’s basically just too bad: you’re stuck with the accepted version.

In a way that’s true for us too. The version that’s accepted becomes what people like to call the version of record, so that when people refer to your paper there won’t be any confusion about what exactly they are referring to. (This is important of course, though in my view the legacy publishers massively exaggerate its importance.) However, being an arXiv overlay journal allows us to reach a much more satisfactory compromise between having a fixed version of record and allowing updates. If you follow the link from the journal webpage to the article and the article has subsequently been updated, the arXiv page you link to will inform you that the version you are looking at is not the latest one. So without our having to do anything, since it happens automatically with the arXiv, readers get the best of both worlds. As an example, here is the arXiv page for a version of a preprint by Bourgain and Demeter (not submitted to Discrete Analysis). As you’ll see, the information that it is not the latest version is clearly highlighted in red.

Another feature of Discrete Analysis, but this one it shares with other purely electronic journals, is that we are not artificially constrained by the need to fill a certain number of pages per year. So you will not hear from us that we receive many more good articles than we can accept, or that your article, though excellent, is too long — we just have a standard we are aiming for and will accept all articles that we judge to reach it.

So if you have a good paper that could conceivably be within our scope, then why not submit it to us? Your paper will have some very good company (just look at the website if you don’t believe me). It will be properly promoted on a website that embraces what the internet has to offer rather than merely being a pale shadow of a paper journal. And you will be helping, in a small way, to bring about a change to the absurdly expensive and anachronistic system of academic publishing that we still have to put up with.


One question that has arisen is whether FUNC holds if the ground set is the cyclic group and is rotationally invariant. This was prompted by Alec Edgington’s example showing that we cannot always find and an injection from to that maps each set to a superset. Tom Eccles suggested a heuristic argument that if is generated by all intervals of length , then it should satisfy FUNC. I agree that this is almost certainly true, but I think nobody has yet given a rigorous proof. I don’t think it should be too hard.

One can ask similar questions about ground sets with other symmetry groups.

A nice question that I came across on Mathoverflow is whether the intersection version of FUNC is true if consists of all subgroups of a finite group . The answers to the question came very close to solving it, with suggestions about how to finish things off, but the fact that the question was non-trivial was quite a surprise to me.

In response to Alec’s counterexample, Gil Kalai asked a yet weaker question, which is whether one can find and an injection from to that increases the size of each set. It is easy to see that this is equivalent to asking that the number of sets of size at least in is always at least the number of sets of size at least in . One aspect of this question that may make it a good one is that it permits one to look at what happens for particular values of , such as (where is the size of the ground set), and also to attempt induction on . So far this conjecture still seems to be alive.

Another question that is, I think, still alive is a “ternary version” of FUNC. I put forward a conjecture that had a very simple counterexample, and my attempt to deal with that counterexample led to the following question. Let be a collection of functions from a finite set to and suppose that it is closed under the following slightly strange multivalued operation. Given two functions , let be the set where and . (Thus, there are potentially nine sets .) We now take the set of all functions that are constant on each , and either lie between and or lie between and . For example, if and then we get , , , , , and and themselves.

This definition generalizes to alphabets of all sizes. In particular, when the alphabet has size 2, it reduces to the normal FUNC, since the only functions between and that are constant on the sets where and are constant are and themselves. The conjecture is then that there exists such that (where the functions take values in ) for at least of the functions in . If true, this conjecture is best possible, since we can take to consist of all functions from to .

The reason for the somewhat complicated closure operation is that, as Thomas pointed out, one has to rule out systems of functions such as all functions that either take all values in or are the constant function that takes the value 2 everywhere. This set is closed under taking pointwise maxima, but we cannot say anything interesting about how often functions take the largest value. The closure property above stops it being possible to “jump” from small functions to a large one. I don’t think anyone has thought much about this conjecture, so it may still have a simple counterexample.

Another conjecture I put forward also had to be significantly revised after a critical mauling, but this time not because it was false (that still seems to be an open question) but because it was equivalent to a simpler question that was less interesting than I had hoped my original question might be.

I began by noting that if we think of sets in as having weight 1 and sets not in as having weight 0, then the union-closed condition is that . We had already noted problems with adopting this as a more general conjecture, but when weights are 0 or 1, then . So I wondered whether the condition would be worth considering. The conjecture would then be that there exists such that , where we sum over all subsets of the ground set . I had hoped that this question might be amenable to a variational approach.

Alec Edgington delivered two blows to this proposal, which were basically two consequences of the same observation. The observation, which I had spotted without properly appreciating its significance, was that if have non-zero weight, then , and therefore . One consequence of this is that a weighting with is usually not close to a weighting with . Indeed, suppose we can find with and . Then the moment becomes non-zero, is forced to jump from to at least .

A second consequence is that talking about geometric means is a red herring, since that condition implies, and is implied by, the simpler condition that the family of sets with non-zero weight is union closed, and whenever with .

However, this still leaves us with a strengthening of FUNC. Moreover, it is a genuine strengthening, since there are union-closed families where it is not optimal to give all the sets the same weight.

Incidentally, as was pointed out in some of the comments, and also in this recent paper, it is natural to rephrase this kind of problem so that it looks more like a standard optimization problem. Here we would like to maximize subject to the constraints that whenever and for every in the ground set. If we can achieve a maximum greater than 2, then weighted FUNC is false. If we can achieve it with constant weights, then FUNC is false.

However, this is not a linear relaxation of FUNC, since for the weighted version we have to choose the union-closed family before thinking about the optimum weights. The best that might come out of this line of enquiry (as far as I can see) is a situation like the following.

- Weighted FUNC is true.
- We manage to understand very well how the optimum weights depend on the union-closed family .
- With the help of that understanding, we arrive at a statement that is easier to prove than FUNC.

That seems pretty optimistic, but it also seems sufficiently non-ridiculous to be worth investigating. And indeed, quite a bit of investigation has already taken place in the comments on the previous post. In particular, weighted FUNC has been tested on a number of families, and so far no counterexample has emerged.

A quick remark that may have been made already is that if is a group of permutations of the ground set that give rise to automorphisms of , then we can choose the optimal weights to be -invariant. Indeed, if is an optimal weight, then the weight satisfies the same constraints as and . However, the optimal weight is not in general unique, and sometimes there are non--invariant weights that are also optimal.

I wonder whether it is time to think a bit about strategy. It seems to me that the (very interesting) discussion so far has had a “preliminary” flavour: we have made a lot of suggestions, come up with several variants, some of which are false and some of which may be true, and generally improved our intuitions about the problem. Should we continue like that for a while, or are there promising proof strategies that we should consider pursuing in more detail? As ever, there is a balance to be struck: it is usually a good idea to avoid doing hard work until the chances of a payoff are sufficiently high, but sometimes avoiding hard work means that one misses discoveries that could be extremely helpful. So what I’m asking is whether there are any proposals that would involve hard work.

One that I have in the back of my mind is connected with things that Tom Eccles has said. It seems to me at least possible that FUNC could be proved by induction, if only one could come up with a suitably convoluted inductive hypothesis. But how does one do that? One method is a kind of iterative process: you try a hypothesis and discover that it is not strong enough to imply the next case, so you then search for a strengthening, which perhaps implies the original statement but is not implied by smaller versions of itself, so a yet further strengthening is called for, and so on. This process can be quite hard work, but I wonder whether if we all focused on it at once we could make it more painless. But this is just one suggestion, and there may well be better ones.


After the failure of the average-overlap-density conjecture, I came up with a more refined conjecture along similar lines that has one or two nice properties and has not yet been shown to be false.

The basic aim is the same: to take a union-closed family and use it to construct a probability measure on the ground set in such a way that the average abundance with respect to that measure is at least 1/2. With the failed conjecture the method was very basic: pick a random non-empty set and then a random element .

The trouble with picking random elements is that it gives rise to a distribution that does not behave well when you duplicate elements. (What you would want is that the probability is shared out amongst the duplicates, but in actual fact if you duplicate an element lots of times it gives an advantage to the set of duplicates that the original element did not have.) This is not just an aesthetic concern: it was at the heart of the downfall of the conjecture. What one really wants, and this is a point that Tobias Fritz has been emphasizing, is to avoid talking about the ground set altogether, something one can do by formulating the conjecture in terms of lattices, though I’m not sure what I’m about to describe does make sense for lattices.

Let be a union-closed set system with ground set . Define a *chain* to be a collection of subsets of with the following properties.

- The inclusions are strict.
- Each is an intersection of sets in .
- is non-empty, but for every , either or .

The idea is to choose a random chain and then a random element of . That last step is harmless because the elements of are indistinguishable from the point of view of (they are all contained in the same sets). So this construction behaves itself when you duplicate elements.

What exactly is a random chain? What I suggested before was to run an algorithm like this. You start with . Having got to , let consist of all sets such that is neither empty nor , pick a random set , and let . But that is not the only possibility. Another would be to define a chain to be *maximal* if for every there is no set such that lies strictly between and , and then to pick a maximal chain uniformly at random.
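Here is a minimal Python sketch of the first of these procedures, for anyone who would like to test the conjecture on small examples. It assumes the following reading of the algorithm: start with the whole ground set, and at each step intersect the current set with a randomly chosen member of the family whose intersection with it is neither empty nor the whole of the current set, stopping when no such member exists. The function names, and the small family used in the example, are mine and purely illustrative.

```python
import random


def random_chain(family, ground, rng=random):
    """Run the step-by-step procedure: repeatedly intersect the current set with a
    random set of the family that cuts it properly, until no set does."""
    current = frozenset(ground)
    chain = [current]
    while True:
        # Sets whose intersection with the current set is neither empty nor all of it.
        splitters = sorted(
            (B for B in family if current & B and not current <= B), key=sorted
        )
        if not splitters:
            return chain
        current = current & rng.choice(splitters)
        chain.append(current)


def abundance(x, family):
    """Fraction of the sets in the family that contain the element x."""
    return sum(1 for B in family if x in B) / len(family)


def estimated_average_abundance(family, ground, trials=10000, seed=0):
    """Monte Carlo estimate of the average abundance of a random element of the
    last set of a random chain."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        last = random_chain(family, ground, rng)[-1]
        total += abundance(rng.choice(sorted(last)), family)
    return total / trials


if __name__ == "__main__":
    # A small hypothetical union-closed family on the ground set {1, 2, 3}.
    family = [frozenset(s) for s in ({1}, {1, 2}, {2, 3}, {1, 2, 3})]
    print(estimated_average_abundance(family, {1, 2, 3}))
```

The conjecture would say that the printed estimate should be at least 1/2 for every union-closed family. A sketch like this can only ever hunt for counterexamples, but given how hard the distribution is to analyse by hand, that may be the most useful thing it can do.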

At the moment I think that the first idea is more natural and therefore more likely to work. (But “more likely” does not imply “likely”.) The fact that it seems hard to disprove is not a good reason for optimism, since the definition is sufficiently complicated that it is hard to analyse. Perhaps there is a simple example for which the conjecture fails by miles, but for which it is very hard to prove that it fails by miles (other than by checking it on a computer if the example is small enough).

Another possible idea is this. Start a random walk at . The walk takes place on the set of subsets of that are non-empty intersections of sets in . Call this set system . Then join to in if is a proper subset of and there is no that lies properly between and . To be clear, I’m defining an *un*directed graph here, so if is joined to , then is joined to .

Now we do a random walk on this graph by picking a random neighbour at each stage, and we take its stationary distribution. One could then condition this distribution on the set you are at being a minimal element of . This gives a distribution on the minimal elements, and then the claim would be that on average a minimal element is contained in at least half the sets in .

I’ll finish this section with the obvious question.

**Question.** *Does an averaging argument with a probability distribution like one of these have the slightest chance of working? If so, how would one go about proving it?*

Tobias Fritz has shared with us a very nice observation that gives another way of looking at union-closed families. It is sufficiently natural that I feel there is a good chance that it will be genuinely helpful, and not just a slightly different perspective on all the same statements.

Let be a finite set, let and let be a non-empty subset of . Write as shorthand for the condition

.

If , then we can write this as a *Horn clause*

.

If is a collection of conditions of this kind, then we can define a set system to consist of all sets that satisfy all of them. That is, for each , if , then .

It is very easy to check that any set system defined this way is union closed and contains the empty set. Conversely, given a union-closed family that includes the empty set, let be a subset of that does not belong to . If for every we can find such that , then we have a contradiction, since the union of these belongs to and is equal to . So there must be some such that for every , if , then . That is, there is a condition that is satisfied by every and is not satisfied by . Taking all such conditions, we have a collection of conditions that gives rise to precisely the set system .
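The correspondence just described can be tried out on small ground sets. The following is only a sketch under my reading of the conditions: a condition is stored as a pair `(x, A)` and is taken to mean "if `x` belongs to the set, then the set meets `A`". The function names are invented for illustration, and the brute-force enumeration of all subsets is feasible only for tiny examples.

```python
from itertools import combinations

def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_union_closed(family):
    """True if the union of any two members is again a member."""
    fam = set(family)
    return all(a | b in fam for a in fam for b in fam)

def family_from_conditions(ground, conditions):
    """All S such that, for each condition (x, A): x in S implies S meets A."""
    return {s for s in subsets(ground)
            if all(x not in s or s & a for x, a in conditions)}

def conditions_from_family(ground, family):
    """Recover conditions from a union-closed family containing the empty set.

    For each non-member B and each x in B such that no member containing x
    is a subset of B, record the condition (x, ground - B).
    """
    fam = set(family)
    ground = frozenset(ground)
    conds = set()
    for b in subsets(ground):
        if b in fam:
            continue
        for x in b:
            if not any(x in s and s <= b for s in fam):
                conds.add((x, ground - b))
    return conds

# Two conditions on the ground set {1, 2, 3}:
# "1 in S implies 2 in S" and "3 in S implies S meets {1, 2}".
example_conditions = [(1, frozenset({2})), (3, frozenset({1, 2}))]
example = family_from_conditions({1, 2, 3}, example_conditions)
```

Here `example` consists of the five sets ∅, {2}, {1,2}, {2,3} and {1,2,3}, and feeding the output of `conditions_from_family` back into `family_from_conditions` returns the same family.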

As Thomas says, this is strongly reminiscent of describing a convex body not as a set of points but as an intersection of half spaces. Since that dual approach is often extremely useful, it seems very much worth bearing in mind when thinking about FUNC. At the very least, it gives us a concise way of describing some union-closed families that would be complicated to define in a more element-listing way: Tobias used it to describe one of Thomas Bloom’s examples quite concisely, for instance.

Suppose we have a Horn-clause description of a union-closed family . For each , it gives us a collection of conditions that must satisfy, each of the form . Putting all these together gives us a single condition in conjunctive normal form. This single condition is a monotone property of , and any monotone property can arise in this way. So if we want, we can forget about Horn clauses and instead think of an arbitrary union-closed family as being defined as follows. For each , there is some monotone property , and then consists of all sets such that for every , the property holds.

To illustrate this with an example (not one that has any chance of being a counterexample to FUNC, just an example of the kind of thing one can do), we could take (the integers mod a prime ) and take to be the property “contains a subset of the form ”. Note that this is a very concise definition, but the resulting criterion for a set to belong to is not simple at all. (If you think it is, then can you exhibit for me a non-empty set of density less than 1/2 that satisfies the condition when , or prove that no such set exists? *Update: I’ve now realized that this question has a fairly easy answer, given in a comment below. But describing the sets that satisfy the condition would not be simple.*)

This way of looking at union-closed families also generates many special cases of FUNC that could be interesting to tackle. For example, we can take the ground set to be some structure (above, I took a cyclic group, but one could also take, for instance, the complete graph on a set of vertices) and restrict attention to properties that are natural within that structure (where “natural” could mean something like invariant under symmetries of the structure that fix ).

Another special case that is very natural to think about is where each property is a single disjunction — that is, the Horn-clause formulation in the special case where each is on the left of exactly one Horn clause. Is FUNC true in this case? Or might this case be a good place to search for a counterexample? At the time of writing, I have no intuition at all about this question, so even heuristic arguments would be interesting.

As discussed in the last post, we already know that an optimistic conjecture of Tobias Fritz, that there is always some and a union-preserving injection from to , is false. Gil Kalai proposed a conjecture in a similar spirit: that there is always an injection from to such that each set in is a subset of its image. So far, nobody (or at least nobody here) has disproved this. I tried to check whether the counterexamples to Tobias’s conjecture worked here too, and I’m fairly sure the complement-of-Steiner-system approach doesn’t work.

While the general belief seems to be (at least if we believe Jeff Kahn) that such strengthenings are false, it would be very good to confirm this. Of course it would be even better to prove the strengthening …

*Update: Alec Edgington has now found a counterexample.*

In this comment Tom Eccles asked a question motivated by thinking about what an inductive proof of FUNC could possibly look like. The question ought to be simpler than FUNC, and asks the following. Does there exist a union-closed family and an element with the following three properties?

- has abundance less than 1/2.
- No element has abundance greater than or equal to 1/2 in both and .
- Both and contain at least one non-empty set.

It would be very nice to have such an example, because it would make an excellent test case for proposed inductive approaches.

There’s probably plenty more I could extract from the comment thread in the last post, but I think it’s time to post this, since the number of comments has exceeded 100.

While I’m saying that, let me add a general remark that if anyone thinks that a direction of discussion is being wrongly neglected, then please feel free to highlight it, even (or perhaps especially) if it is a direction that you yourself introduced. These posts are based on what happens to have caught my attention, but should not be interpreted as a careful judgment of what is interesting and what is not. I hope that everything I include is interesting, but the converse is completely false.


If is a union-closed family on a ground set , and , then we can take the family . The map is a homomorphism (in the sense that ), so it makes sense to regard as a quotient of .

If instead we take an equivalence relation on , we can define a set-system to be the set of all unions of equivalence classes that belong to .

Thus, subsets of give quotient families and quotient sets of give subfamilies.

Possibly the most obvious product construction of two families and is to make their ground sets disjoint and then to take . (This is the special case with disjoint ground sets of the construction that Tom Eccles discussed earlier.)

Note that we could define this product slightly differently by saying that it consists of all pairs with the “union” operation . This gives an algebraic system called a join semilattice, and it is isomorphic in an obvious sense to with ordinary unions. Looked at this way, it is not so obvious how one should define abundances, because does not have a ground set. Of course, we can define them via the isomorphism to but it would be nice to do so more intrinsically.

Tobias Fritz, in this comment, defines a more general “fibre bundle” construction as follows. Let be a union-closed family of sets (the “base” of the system). For each let be a union-closed family (one of the “fibres”), and let the elements of consist of pairs with . We would like to define a join operation on by

for a suitable . For that we need a bit more structure, in the form of homomorphisms whenever . These should satisfy the obvious composition rule .

With that structure in place, we can take to be , and we have something like a union-closed system. To turn it into a union-closed system one needs to find a concrete realization of this “join semilattice” as a set system with the union operation. This can be done in certain cases (see the comment thread linked to above) and quite possibly in all cases.

First, here is a simple construction that shows that Conjecture 6 from the previous post is false. That conjecture states that if you choose a random non-empty and then a random , then the average abundance of is at least 1/2. It never seemed likely to be true, but it survived for a surprisingly long time, before the following example was discovered in a comment thread that starts here.

Let be a large integer and let be disjoint sets of size and . (Many details here are unimportant — for example, all that actually matters is that the sizes of the sets should increase fairly rapidly.) Now take the set system

.

To see that this is a counterexample, let us pick our random element of a random set, and then condition on the five possibilities for what that set is. I’ll do a couple of the calculations and then just state the rest. If , then its abundance is 2/3. If it is in , then its abundance is 1/2. If it is in , then the probability that it is in is , which is very small, so its abundance is very close to 1/2 (since with high probability the only three sets it belongs to are , and ). In this kind of way we get that for large enough we can make the average abundance as close as we like to

.

One thing I would like to do (or would like someone to do) is come up with a refinement of this conjecture that isn’t so obviously false. What this example demonstrates is that, because elements can be duplicated, the conjecture could only have been true if the following apparently much stronger statement were true. For each non-empty , let be the minimum abundance of any element of . Then the average of over is at least 1/2.

How can we convert the average over into the minimum over ? The answer is simple: take the original set system and write the elements of the ground set in decreasing order of abundance. Now duplicate the first element (that is, the element with greatest abundance) once, the second element times, the third times, and so on. For very large , the effect of this is that if we choose a random element of (after the duplications have taken place) then it will have minimal abundance in .

So it seems that duplication of elements kills off this averaging argument too, but in a slightly subtler way. Could we somehow iterate this thought? For example, could we choose a random by first picking a random non-empty , then a random such that , and finally a random element ? And could we go further — e.g., picking a random chain of the form , etc., and stopping when we reach a set whose points cannot be separated further?

Tobias Fritz came up with a nice strengthening that also turned out (again as expected) to be false. The thought was that it might be nice to find a “bijective” proof of FUNC. Defining to be and to be , we would prove FUNC for if we could find an injection from to .

For such an argument to qualify as a proper bijective proof, it is not enough merely to establish the existence of an injection — that follows from FUNC on mere grounds of cardinality. Rather, one should define it in a nice way somehow. That makes it natural to think about what properties such an injection might have, and a particularly natural requirement that one might think about is that it should preserve unions.

It turns out that there are set systems for which there does not exist any with a union-preserving injection from to . After several failed attempts, I found the following example. Take a not too small pair of positive integers — it looks as though works. Then take a Steiner -system — that is, a collection of sets of size 5 such that each set of size 3 is contained in exactly one set from . (Work of Peter Keevash guarantees that such a set system exists, though this case was known before his amazing result.)

The counterexample is generated by all complements of sets in , though it is more convenient just to take and prove that there is no intersection-preserving injection from to . To establish this, one first proves that any such injection would have to take sets of size to sets of size , which is basically because you need room for all the subsets of size of a set to map to distinct subsets of the image of . Once that is established, it is fairly straightforward to show that there just isn’t room to do things. The argument can be found in the comment linked to above, and the thread below it.

Thomas Bloom came up with a simpler example, which is interesting for other reasons too. His example is generated by the sets , all -subsets of , and the 6 sets , , , , , . I asked him where this set system had come from, and the answer turned out to be very interesting. He had got it by staring at an example of Renaud and Sarvate of a union-closed set system with exactly one minimal-sized set, which has size 3, such that that minimal set contains no element of abundance at least 1/2. Thomas worked out how the Renaud-Sarvate example had been pieced together, and used similar ideas to produce his example. Tobias Fritz then went on to show that Thomas’s construction was a special case of his fibre-bundle construction.

This post is by no means a comprehensive account of all the potentially interesting ideas from the last post. For example, Gil Kalai has an interesting slant on the conjecture that I think should be pursued further, and there are a number of interesting questions that were asked in the previous comment thread that I have not repeated here, mainly because the post has taken a long time to write and I think it is time to post it.


Something I like to think about with Polymath projects is the following question: if we end up *not* solving the problem, then what can we hope to achieve? The Erdős discrepancy problem project is a good example here. An obvious answer is that we can hope that enough people have been stimulated in enough ways that the probability of somebody solving the problem in the not too distant future increases (for example because we have identified more clearly the gap in our understanding). But I was thinking of something a little more concrete than that: I would like at the very least for this project to leave behind it an online resource that will be essential reading for anybody who wants to attack the problem in future. The blog comments themselves may achieve this to some extent, but it is not practical to wade through hundreds of comments in search of ideas that may or may not be useful. With past projects, we have developed Wiki pages where we have tried to organize the ideas we have had into a more browsable form. One thing we didn’t do with EDP, which in retrospect I think we should have, is have an official “closing” of the project marked by the writing of a formal article that included what we judged to be the main ideas we had had, with complete proofs when we had them. An advantage of doing that is that if somebody later solves the problem, it is more convenient to be able to refer to an article (or preprint) than to a combination of blog comments and Wiki pages.

With an eye to this, I thought I would make FUNC1 a data-gathering exercise of the following slightly unusual kind. For somebody working on the problem in the future, it would be very useful, I would have thought, to have a list of natural strengthenings of the conjecture, together with a list of “troublesome” examples. One could then produce a table with strengthenings down the side and examples along the top, with a tick in the table entry if the example disproves the strengthening, a cross if it doesn’t, and a question mark if we don’t yet know whether it does.

A first step towards drawing up such a table is of course to come up with a good supply of strengthenings and examples, and that is what I want to do in this post. I am mainly selecting them from the comments on the previous post. I shall present the strengthenings as statements rather than questions, so they are not necessarily true.

Let be a function from the power set of a finite set to the non-negative reals. Suppose that the weights satisfy the condition for every and that at least one non-empty set has positive weight. Then there exists such that the sum of the weights of the sets containing is at least half the sum of all the weights.

~~Note that if all weights take values 0 or 1, then this becomes the original conjecture. It is possible that the above statement *follows* from the original conjecture, but we do not know this (though it may be known).~~

This is not a good question after all, as the deleted statement above is false. When is 01-valued, the statement reduces to saying that for every up-set there is an element in at least half the sets, which is trivial: all the elements are in at least half the sets. Thanks to Tobias Fritz for pointing this out.

Let be a function from the power set of a finite set to the non-negative reals. Suppose that the weights satisfy the condition for every and that at least one non-empty set has positive weight. Then there exists such that the sum of the weights of the sets containing is at least half the sum of all the weights.

Again, if all weights take values 0 or 1, then the collection of sets of weight 1 is union closed and we obtain the original conjecture. It was suggested in this comment that one might perhaps be able to attack this strengthening using tropical geometry, since the operations it uses are addition and taking the minimum.

Tom Eccles suggests (in this comment) a generalization that concerns two set systems rather than one. Given set systems and , write for the union set . A family is union closed if and only if . What can we say if and are set systems with small? There are various conjectures one can make, of which one of the cleanest is the following: if and are of size and is of size at most , then there exists such that , where denotes the set of sets in that contain . This obviously implies FUNC.

Simple examples show that can be much smaller than either or — for instance, it can consist of just one set. But in those examples there always seems to be an element contained in many more sets. So it would be interesting to find a good conjecture by choosing an appropriate function to insert into the following statement: if , , and , then there exists such that .

Let be a union-closed family of subsets of a finite set . Then the average size of is at least .

This is false, as the example shows for any .

Let be a union-closed family of subsets of a finite set and suppose that *separates points*, meaning that if , then at least one set in contains exactly one of and . (Equivalently, the sets are all distinct.) Then the average size of is at least .

This again is false: see Example 2 below.

In this comment I had a rather amusing (and typically Polymathematical) experience of formulating a conjecture that I thought was obviously false in order to think about how it might be refined, and then discovering that I couldn’t disprove it (despite temporarily thinking I had a counterexample). So here it is.

As I have just noted (and also commented in the first post), very simple examples show that if we define the “abundance” of an element to be , then the average abundance does not have to be at least . However, that still leaves open the possibility that some kind of naturally defined *weighted* average might do the job. Since we want to define the weighting in terms of and to favour elements that are contained in lots of sets, a rather crude idea is to pick a random non-empty set and then a random element , and make that the probability distribution on that we use for calculating the average abundance.

A short calculation reveals that the average abundance with this probability distribution is equal to the *average overlap density*, which we define to be

where the averages are over . So one is led to the following conjecture, which implies FUNC: if is a union-closed family of sets, at least one of which is non-empty, then its average overlap density is at least 1/2.
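For anyone who wants to check the short calculation on examples, here is a small sketch. It assumes my reading of the definitions: the first average is over non-empty sets A in the family, the second is over all sets B in the family, and the quantity averaged is |A ∩ B|/|A|. The function names are made up for illustration, and exact rational arithmetic is used so that the two quantities can be compared directly.

```python
from fractions import Fraction

def weighted_average_abundance(family):
    """Average abundance when x is chosen by picking a random non-empty
    set A of the family and then a random element x of A."""
    fam = [frozenset(s) for s in family]
    nonempty = [s for s in fam if s]
    total = Fraction(0)
    for a in nonempty:
        for x in a:
            prob = Fraction(1, len(nonempty) * len(a))  # P(choose A, then x)
            abundance = Fraction(sum(x in b for b in fam), len(fam))
            total += prob * abundance
    return total

def average_overlap_density(family):
    """Average of |A & B| / |A| over non-empty A and arbitrary B in the family."""
    fam = [frozenset(s) for s in family]
    nonempty = [s for s in fam if s]
    total = sum(Fraction(len(a & b), len(a)) for a in nonempty for b in fam)
    return total / (len(nonempty) * len(fam))

chain = [set(), {1}, {1, 2}, {1, 2, 3}]
```

For the chain above both functions return 5/8, which is consistent with the claimed identity (and with the conjecture, for this one very easy family).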

A not wholly pleasant feature of this conjecture is that the average overlap density is very far from being isomorphism invariant. (That is, if you duplicate elements of , the average overlap density changes.) Initially, I thought this would make it easy to find counterexamples, but that seems not to be the case. It also means that one can give some thought to how to put a measure on that makes the average overlap density as small as possible. Perhaps if the conjecture is true, this “worst case” would be easier to analyse. (It’s not actually clear that there is a worst case — it may be that one wants to use a measure on that gives measure zero to some non-empty set , at which point the definition of average overlap density breaks down. So one might have to look at the “near worst” case.)

This conjecture comes from a comment by Igor Balla. Let be a union-closed family and let . Define a new family by replacing each by if and leaving it alone if . Repeat this process for every and the result is an *up-set* , that is, a set-system such that and implies that .

Note that each time we perform the “add if you can” operation, we are applying a bijection to the current set system, so we can compose all these bijections to obtain a bijection from to .
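The “add if you can” operation is easy to experiment with. Below is a minimal sketch, assuming the operation is the usual up-compression: a set A is replaced by A ∪ {x} exactly when A ∪ {x} is not already in the current family. The names `add_if_you_can`, `up_compress` and `is_up_set` are hypothetical, and the ground set is taken to be the union of all the sets in the family.

```python
def add_if_you_can(family, x):
    """Replace each set s by s | {x} unless s | {x} is already in the family."""
    fam = set(family)
    return {s if (s | {x}) in fam else s | {x} for s in fam}

def up_compress(family):
    """Apply add_if_you_can once for every element of the ground set."""
    fam = {frozenset(s) for s in family}
    ground = sorted(frozenset().union(*fam))
    for x in ground:
        fam = add_if_you_can(fam, x)
    return fam

def is_up_set(family, ground):
    """True if adding any single ground-set element to a member gives a member."""
    fam = set(family)
    return all(s | {x} in fam for s in fam for x in ground)

chain = [set(), {1}, {1, 2}, {1, 2, 3}]
compressed = up_compress(chain)
```

Starting from the chain ∅, {1}, {1,2}, {1,2,3}, the result is the up-set {2,3}, {1,3}, {1,2}, {1,2,3}: the number of sets is unchanged, as it must be if each step is a bijection.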

Suppose now that are distinct sets. It can be shown that there is no set such that and . In other words, is never a subset of .

Now the fact that is an up-set means that each element is in at least half the sets (since if then ). Moreover, it seems hard for too many sets in to be “far” from their images , since then there is a strong danger that we will be able to find a pair of sets and with .

This leads to the conjecture that Balla makes. He is not at all confident that it is true, but has checked that there are no small counterexamples.

**Conjecture.** Let be a set system such that there exist an up-set and a bijection with the following properties.

- For each , .
- For no distinct do we have .

Then there is an element that belongs to at least half the sets in .

The following comment by Gil Kalai is worth quoting: “Years ago I remember that Jeff Kahn said that he bet he will find a counterexample to every meaningful strengthening of Frankl’s conjecture. And indeed he shot down many of those and a few I proposed, including weighted versions. I have to look in my old emails to see if this one too.” So it seems that even to find a conjecture that genuinely strengthens FUNC without being obviously false (at least to Jeff Kahn) would be some sort of achievement. (Apparently the final conjecture above passes the Jeff-Kahn test in the following weak sense: he believes it to be false but has not managed to find a counterexample.)

If is a finite set and is the power set of , then every element of has abundance 1/2. (Remark 1: I am using the word “abundance” for the *proportion* of sets in that contain the element in question. Remark 2: for what it’s worth, the above statement is meaningful and true even if is empty.)

Obviously this is not a counterexample to FUNC, but it was in fact a counterexample to an over-optimistic conjecture I very briefly made and then abandoned while writing it into a comment.

This example was mentioned by Alec Edgington. Let be a finite set and let be an element that does not belong to . Now let consist of together with all sets of the form such that .

If , then has abundance , while each has abundance . Therefore, only one point has abundance that is not less than 1/2.

A slightly different example, also used by Alec Edgington, is to take all subsets of together with the set . If , then the abundance of any element of is while the abundance of is . Therefore, the average abundance is

When is large, the amount by which exceeds 1/2 is exponentially small, from which it follows easily that this average is less than 1/2. In fact, it starts to be less than 1/2 when (which is the case Alec mentioned). This shows that Conjecture 5 above (that the average abundance must be at least 1/2 if the system separates points) is false.
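The claim about the average can be checked by brute force for small cases. The sketch below builds the second family described above (all subsets of an n-element set, together with the whole set plus one extra point) and computes the uniform average abundance exactly. Labelling the ground set 1, ..., n and calling the extra point `"y"` are my choices, not part of the original description.

```python
from fractions import Fraction
from itertools import combinations

def edgington_family(n):
    """All subsets of X = {1, ..., n} together with the single set X | {'y'}."""
    xs = range(1, n + 1)
    fam = {frozenset(c) for r in range(n + 1) for c in combinations(xs, r)}
    fam.add(frozenset(xs) | {"y"})
    return fam

def average_abundance(family):
    """Uniform average, over the ground set, of the proportion of sets
    that contain each element."""
    fam = set(family)
    ground = frozenset().union(*fam)
    total = sum(Fraction(sum(x in s for s in fam), len(fam)) for x in ground)
    return total / len(ground)

for n in range(1, 6):
    print(n, average_abundance(edgington_family(n)))
```

For a one-element X the average is exactly 1/2, and for a two-element X it is 7/15, so in this reading the average drops below 1/2 as soon as X has two elements.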

Let be a positive integer and take the set system that consists of the sets and . This is a simple example (or rather class of examples) of a set system for which although there is certainly an element with abundance at least 1/2 (the element has abundance 2/3), the *average* abundance is close to 1/3. Very simple variants of this example can give average abundances that are arbitrarily small — just take a few small sets and one absolutely huge set.

I will not explain these in detail, but just point you to an interesting comment by Uwe Stroinski that suggests a number-theoretic way of constructing union-closed families.

I will continue with methods of building union-closed families out of other union-closed families.

I’ll define this process formally first. Let be a set of size and let be a collection of subsets of . Now let be a collection of disjoint non-empty sets and define to be the collection of all sets of the form for some . If is union closed, then so is .

One can think of as “duplicating” the element of times. A simple example of this process is to take the set system and let and . This gives the set system 3 above.
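Here is a small sketch of the duplication process, in case it helps to see it concretely. The dictionary `blocks` plays the role of the collection of disjoint non-empty sets, one for each element of the original ground set; the function names and the particular example are mine and are only meant as an illustration.

```python
from fractions import Fraction

def duplicate(family, blocks):
    """Replace every element x by the block blocks[x]: the set A becomes
    the union of the blocks of its elements."""
    return {frozenset().union(*(blocks[x] for x in a)) for a in family}

def is_union_closed(family):
    fam = set(family)
    return all(a | b in fam for a in fam for b in fam)

def average_abundance(family):
    """Uniform average over the ground set of the proportion of sets
    containing each element."""
    fam = set(family)
    ground = frozenset().union(*fam)
    total = sum(Fraction(sum(x in s for s in fam), len(fam)) for x in ground)
    return total / len(ground)

base = {frozenset(), frozenset({1}), frozenset({1, 2})}
# Keep element 1 as a single point "a" and duplicate element 2 ten times.
blocks = {1: {"a"}, 2: {f"b{i}" for i in range(10)}}
blown_up = duplicate(base, blocks)
```

The family `base` has average abundance 1/2, while `blown_up` is still union closed with three sets but has average abundance 4/11, which illustrates how sensitive the uniform average is to duplication.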

Let us say that if for some suitable set-valued function . And let us say that two set systems are *isomorphic* if they are in the same equivalence class of the symmetric-transitive closure of the relation . Equivalently, they are isomorphic if we can find and such that .

The effect of duplication is basically that we can convert the uniform measure on the ground set into any other probability measure (at least to an arbitrary approximation). What I mean by that is that the uniform measure on the ground set of , which is of course , gives you a probability of of landing in , so has the same effect as assigning that probability to and sticking with the set system . (So the precise statement is that we can get any probability measure where all the probabilities are rational.)

If one is looking for an averaging argument, then it would seem that a nice property that such an argument might have is (as I have already commented above) that the average should be with respect to a probability measure on that is constructed from in an isomorphism-invariant way.

It is common in the literature to outlaw duplication by insisting that separates points. However, it may be genuinely useful to consider different measures on the ground set.

Tom Eccles, in his off-diagonal conjecture, considered the set system, which he denoted by , that is defined to be . This might more properly be denoted , by analogy with the notation for sumsets, but obviously one can’t write it like that because that notation already stands for something else, so I’ll stick with Tom’s notation.

It’s trivial to see that if and are union closed, then so is . Moreover, sometimes it does quite natural things: for instance, if and are any two sets, then , where is the power-set operation.

Another remark is that if and are disjoint, and and , then the abundance of in is equal to the abundance of in .
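These remarks are easy to test on small cases. In the sketch below, `union_set` is a hypothetical name for the operation that forms all unions of one set from each family, and the two small families are examples of my own choosing.

```python
from fractions import Fraction
from itertools import combinations

def union_set(fam_a, fam_b):
    """All unions a | b with a in the first family and b in the second."""
    return {a | b for a in fam_a for b in fam_b}

def power_set(ground):
    items = sorted(ground)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def abundance(family, x):
    """Proportion of sets in the family that contain x."""
    fam = set(family)
    return Fraction(sum(x in s for s in fam), len(fam))

# Families on the disjoint ground sets {1, 2} and {3, 4}.
fam_a = {frozenset(), frozenset({1}), frozenset({1, 2})}
fam_b = {frozenset({3}), frozenset({3, 4})}
combined = union_set(fam_a, fam_b)
```

With these definitions, a family `fam` is union closed exactly when `union_set(fam, fam) == fam`, the union set of the power sets of {1,2} and {2,3} is the power set of {1,2,3}, and the abundance of 1 is 2/3 both in `fam_a` and in `combined`.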

I got this from a comment by Thomas Bloom. Let and be disjoint finite sets and let and be two union-closed families living inside and , respectively, and assume that and . We then build a new family as follows. Let be some function from to . Then take all sets of one of the following four forms:

- sets with ;
- sets with ;
- sets with ;
- sets with .

It can be checked quite easily (there are six cases to consider, all straightforward) that the resulting family is union closed.

Thomas Bloom remarks that if consists of all subsets of and consists of all subsets of , then (for suitable ) the result is a union-closed family that contains no set of size less than 3, and also contains a set of size 3 with no element of abundance greater than or equal to 1/2. This is interesting because a simple argument shows that if is a set with two elements in a union-closed family then at least one of its elements has abundance at least 1/2.

Thus, this construction method can be used to create interesting union-closed families out of boring ones.

Thomas discusses what happens to abundances when you do this construction, and the rough answer is that elements of become less abundant but elements of become quite a lot more abundant. So one can’t just perform this construction a few times and end up with a counterexample to FUNC. However, as Thomas also says, there is plenty of scope for modifying this basic idea, and maybe good things could flow from that.

I feel as though there is much more I could say, but this post has got quite long, and has taken me quite a long time to write, so I think it is better if I just post it. If there are things I wish I had mentioned, I’ll put them in comments and possibly repeat them in my next post.

I’ll close by remarking that I have created a wiki page. At the time of writing it has almost nothing on it but I hope that will change before too long.


A less serious problem is what acronym one would use for the project. For the density Hales-Jewett problem we went for DHJ, and for the Erdős discrepancy problem we used EDP. That general approach runs into difficulties with Frankl’s union-closed conjecture, so I suggest FUNC. This post, if the project were to go ahead, could be FUNC0; in general I like the idea that we would be engaged in a funky line of research.

The problem, for anyone who doesn’t know, is this. Suppose you have a family $\mathcal{A}$ that consists of $n$ distinct subsets of a set $X$. Suppose also that it is *union closed*, meaning that if $A, B \in \mathcal{A}$, then $A \cup B \in \mathcal{A}$ as well. Must there be an element of $X$ that belongs to at least $n/2$ of the sets? This seems like the sort of question that ought to have an easy answer one way or the other, but it has turned out to be surprisingly difficult.
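To make the statement concrete, here is a minimal sketch for checking the two notions involved on small examples: whether a family is union closed, and in what proportion of the sets each element lies. The function names and the example family are illustrative choices of mine.

```python
from fractions import Fraction
from itertools import combinations

def is_union_closed(family):
    """True if the union of any two sets in the family is also in the family."""
    fam = {frozenset(s) for s in family}
    return all(a | b in fam for a, b in combinations(fam, 2))

def abundances(family):
    """Map each element of the ground set to the proportion of sets containing it."""
    fam = {frozenset(s) for s in family}
    ground = frozenset().union(*fam)
    return {x: Fraction(sum(x in s for s in fam), len(fam)) for x in ground}

example = [set(), {1}, {2}, {1, 2}, {1, 2, 3}]
print(is_union_closed(example))
print(abundances(example))
```

In this example the elements 1 and 2 each lie in 3 of the 5 sets, so the family has an element in at least half the sets, while the element 3 lies in only one set.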

If you are potentially interested, then one good thing to do by way of preparation is look at this survey article by Henning Bruhn and Oliver Schaudt. It is very nicely written and seems to be a pretty comprehensive account of the current state of knowledge about the problem. It includes some quite interesting reformulations (interesting because you don’t just look at them and see that they are trivially equivalent to the original problem).

For the remainder of this post, I want to discuss a couple of failures. The first is a natural idea for generalizing the problem to make it easier that completely fails, at least initially, but can perhaps be rescued, and the second is a failed attempt to produce a counterexample. I’ll present these just in case one or other of them stimulates a useful idea in somebody else.

An immediate reaction of any probabilistic combinatorialist is likely to be to wonder whether in order to prove that there *exists* a point in at least half the sets it might be easier to show that in fact an *average* point belongs to half the sets.

Unfortunately, it is very easy to see that that is false: consider, for example, the three sets , , and . The average (over ) of the number of sets containing a random element is , but there are three sets.

However, this example doesn't feel like a genuine counterexample somehow, because the set system is just a dressed up version of : we replace the singleton by the set and that's it. So for this set system it seems more natural to consider a *weighted* average, or equivalently to take not the uniform distribution on , but some other distribution that reflects more naturally the properties of the set system at hand. For example, we could give a probability 1/2 to the element 1 and to each of the remaining 12 elements of the set. If we do that, then the average number of sets containing a random element will be the same as it is for the example with the uniform distribution (not that the uniform distribution is obviously the most natural distribution for that example).

This suggests a very slightly more sophisticated version of the averaging-argument idea: does there *exist* a probability distribution on the elements of the ground set such that the expected number of sets containing a random element (drawn according to that probability distribution) is at least half the number of sets?

With this question we have in a sense the opposite problem. Instead of the answer being a trivial no, it is a trivial yes (if, that is, the union-closed conjecture holds). That’s because if the conjecture holds, then some $x$ belongs to at least half the sets, so we can assign probability 1 to that $x$ and probability zero to all the other elements.

Of course, this still doesn’t feel like a complete demolition of the approach. It just means that for it not to be a trivial reformulation we will have to put *conditions* on the probability distribution. There are two ways I can imagine getting the approach to work. The first is to insist on some property that the distribution is required to have that means that its existence does *not* follow easily from the conjecture. That is, the idea would be to prove a stronger statement. It seems paradoxical, but as any experienced mathematician knows, it can sometimes be easier to prove a stronger statement, because there is less room for manoeuvre. In extreme cases, once a statement has been suitably strengthened, you have so little choice about what to do that the proof becomes almost trivial.

A second idea is that there might be a nice way of defining the probability distribution in terms of the set system. This would be a situation rather like the one I discussed in my previous post, on entropy and Sidorenko’s conjecture. There, the basic idea was to prove that a set $A$ had cardinality at least $m$ by proving that there is a probability distribution on $A$ with entropy at least $\log m$. At first, this seems like an unhelpful idea, because if $|A|\geq m$ then the uniform distribution on $A$ will trivially do the job. But it turns out that there is a different distribution for which it is easier to *prove* that it does the job, even though it usually has lower entropy than the uniform distribution. Perhaps with the union-closed conjecture something like this works too: obviously the best distribution is supported on the set of elements that are contained in a maximal number of sets from the set system, but perhaps one can construct a different distribution out of the set system that gives a smaller average in general but about which it is easier to prove things.

I have no doubt that thoughts of the above kind have occurred to a high percentage of people who have thought about the union-closed conjecture, and can probably be found in the literature as well, but it would be odd not to mention them in this post.

To finish this section, here is a wild guess at a distribution that does the job. Like almost all wild guesses, its chances of being correct are very close to zero, but it gives the flavour of the kind of thing one might hope for.

Given a finite set $X$ and a collection $\mathcal{A}$ of subsets of $X$, we can pick a random set $A$ (uniformly from $\mathcal{A}$) and look at the events $x\in A$ for each $x\in X$. In general, these events are correlated.

Now let us define a matrix $M$ by $M_{xy}=\mathbb{P}[y\in A\mid x\in A]$. We could now try to find a probability distribution $\mu$ on $X$ that minimizes the sum $\sum_{x,y}\mu(x)\mu(y)M_{xy}$. That is, in a certain sense we would be trying to make the events $x\in A$ as uncorrelated as possible on average. (There may be much better ways of measuring this: I’m just writing down the first thing that comes into my head that I can’t immediately see is stupid.)

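What does this give in the case of the three sets $\emptyset$, $\{1\}$ and $\{1,2,\dots,13\}$? We have that $M_{xy}=1$ if $x=y$ or $x,y\geq 2$ or $x\geq 2$ and $y=1$. If $x=1$ and $y\geq 2$, then $M_{xy}=1/2$, since if $1\in A$, then $A$ is one of the two sets $\{1\}$ and $\{1,2,\dots,13\}$, with equal probability.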

So to minimize the sum we should choose $\mu$ so as to maximize the probability that $x=1$ and $y\geq 2$. If $\mu(1)=p$, then this probability is $p(1-p)$, which is maximized when $p=1/2$, so in fact we get the distribution mentioned earlier. In particular, for this distribution the average number of sets containing a random point is $3/2$, which is precisely half the total number of sets. (I find this slightly worrying, since for a successful proof of this kind I would expect equality to be achieved only in the case that you have disjoint sets and you take all their unions, including the empty set. But since this definition of a probability distribution isn’t supposed to be a serious candidate for a proof of the whole conjecture, I’m not too worried about being worried.)
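The small calculation above can be sketched numerically. The following assumes the matrix is $M_{xy}=\mathbb{P}[y\in A\mid x\in A]$ and the ground set is $\{1,\dots,13\}$, and, purely for simplicity, restricts attention to distributions that put mass $p$ on the element 1 and spread the rest evenly; the names `cond_prob` and `objective` are illustrative only.

```python
from fractions import Fraction

ground = list(range(1, 14))  # assumed ground set {1, ..., 13}
sets = [frozenset(), frozenset({1}), frozenset(ground)]

def cond_prob(y, x):
    """P[y in A | x in A] for A chosen uniformly from `sets`."""
    containing_x = [s for s in sets if x in s]
    return Fraction(sum(1 for s in containing_x if y in s), len(containing_x))

def objective(p):
    """Sum of mu(x) mu(y) M_xy when mu(1) = p and the remaining mass is spread evenly."""
    mu = {x: p if x == 1 else (1 - p) / 12 for x in ground}
    return sum(mu[x] * mu[y] * cond_prob(y, x) for x in ground for y in ground)

# Scan a coarse grid of values of p and keep the minimizer.
grid = [Fraction(k, 20) for k in range(21)]
best_p = min(grid, key=objective)
print(best_p, objective(best_p))
```

On this grid the minimum is attained at $p=1/2$, in line with the observation that the sum equals $1-\tfrac12p(1-p)$.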

Just to throw in another thought, perhaps some entropy-based distribution would be good. I wondered, for example, about defining a probability distribution as follows. Given any probability distribution, we obtain weights $w(A)$ on the sets $A\in\mathcal{A}$ by taking $w(A)$ to be the probability that a random element (chosen from the distribution) belongs to $A$. We can then form a probability distribution on $\mathcal{A}$ by taking the probabilities to be proportional to the weights. Finally, we can choose a distribution on the elements to maximize the entropy of the distribution on $\mathcal{A}$.

If we try that with the example above, and if $p$ is the probability assigned to the element 1, then the three weights are $0$, $p$ and $1$, so the probabilities we will assign will be $0$, $p/(1+p)$ and $1/(1+p)$. The entropy of this distribution will be maximized when the two non-zero probabilities are equal, which gives us $p=1$, so in this case we will pick out the element 1. It isn’t completely obvious that that is a bad thing to do for this particular example: indeed, we will do it whenever there is an element that is contained in all the non-empty sets from $\mathcal{A}$. Again, there is virtually no chance that this rather artificial construction will work, but perhaps after a lot of thought and several modifications and refinements, something like it could be got to work.

I find the non-example I’m about to present interesting because I don’t have a good conceptual understanding of why it fails — it’s just that the numbers aren’t kind to me. But I think there *is* a proper understanding to be had. Can anyone give me a simple argument that no construction that is anything like what I tried can possibly work? (I haven’t even checked properly whether the known positive results about the problem ruled out my attempt before I even started.)

The idea was as follows. Let $m$ and $p$ be parameters to be chosen later, and let $\mathcal{A}$ be a random set system obtained by choosing each subset of $\{1,2,\dots,n\}$ of size $m$ with probability $p$, the choices being independent. We then take as our attempted counterexample the set of all unions of sets in $\mathcal{A}$.

Why might one entertain even for a second the thought that this could be a counterexample? Well, if we choose $m$ to be rather close to $n/2$, but just slightly less, then a typical pair of sets of size $m$ have a union of size close to $3n/4$, and more generally a typical union of sets of size $m$ has size at least this. There are vastly fewer sets of size greater than $3n/4$ than there are of size $m$, so we could perhaps dare to hope that almost all the sets in the set system are the ones of size $m$, so the average size is close to $m$, which is less than $n/2$. And since the sets are spread around, the elements are likely to be contained in roughly the same number of sets each, so this would give a counterexample.

Of course, the problem here is that although a typical union is large, there are many atypical unions, so we need to get rid of them somehow — or at least the vast majority of them. This is where choosing a random subset comes in. The hope is that if we choose a fairly sparse random subset, then all the unions will be large rather than merely almost all.

However, this introduces a new problem, which is that if we have passed to a *sparse* random subset, then it is no longer clear that the size of that subset is bigger than the number of possible unions. So it becomes a question of balance: can we choose $p$ small enough for the unions of those sets to be typical, but still large enough for the sets of size $m$ to dominate the set system? We’re also free to choose $m$ of course.

I usually find when I’m in a situation like this, where I’m hoping for a miracle, that a miracle doesn’t occur, and that indeed seems to be the case here. Let me explain my back-of-envelope calculation.

I’ll write $\mathcal{B}$ for the set of unions of sets in $\mathcal{A}$. Let us now take $s>m$ and give an upper bound for the expected number of sets in $\mathcal{B}$ of size $s$. So fix a set $B$ of size $s$ and let us give a bound for the probability that $B\in\mathcal{B}$. We know that $B$ must contain at least two sets in $\mathcal{A}$. But the number of pairs of sets of size $m$ contained in $B$ is at most $\binom sm^2$ and each such pair has a probability $p^2$ of being a pair of sets in $\mathcal{A}$, so the probability that $B\in\mathcal{B}$ is at most $p^2\binom sm^2$. Therefore, the expected number of sets in $\mathcal{B}$ of size $s$ is at most $p^2\binom ns\binom sm^2$.

As for the expected number of sets in $\mathcal{A}$, it is $p\binom nm$, so if we want the example to work, we would very much like it to be the case that when $s>m$, we have the inequality

$$p\binom nm>p^2\binom ns\binom sm^2.$$

We can weaken this requirement by observing that the expected number of sets in $\mathcal{B}$ of size $s$ is also trivially at most $\binom ns$, so it is enough to go for

$$p\binom nm>\min\left\{p^2\binom ns\binom sm^2,\binom ns\right\}.$$

If the left-hand side is not just greater than the right-hand side, but greater by a sufficiently large factor for each $s$, then we should be in good shape: the average size of a set in $\mathcal{B}$ will be not much greater than $m$ and we’ll be done.

If $s$ is not much bigger than $m$, then things look quite promising. In this case, $\binom ns$ will be comparable in size to $\binom nm$, but $\binom sm$ will be quite small: it equals $\binom s{s-m}$, and $s-m$ is small. A crude estimate says that we’ll be OK provided that $p$ is significantly smaller than $\binom sm^{-2}$. And that looks OK, since $\binom sm^2$ is a lot smaller than $\binom nm$, so we aren’t being made to choose a ridiculously small value of $p$.

If on the other hand $s$ is quite a lot larger than $m$, then $\binom ns$ is much much smaller than $\binom nm$, so we’re in great shape as long as we haven’t chosen $p$ so tiny that $p\binom nm$ is also much much smaller than $\binom nm$.

So what goes wrong? Well, the problem is that the first argument requires smaller and smaller values of $p$ as $s$ gets further and further away from $m$, and the result seems to be that by the time the second regime takes over, $p$ has become too small for the trivial argument to work.

Let me try to be a bit more precise about this. The point at which $\binom ns$ becomes smaller than $p^2\binom ns\binom sm^2$ is of course the point at which $p=\binom sm^{-1}$. For that value of $p$, we require $p\binom nm>\binom ns$, so we need $\binom nm>\binom ns\binom sm$. However, an easy calculation reveals that

$$\binom ns\binom sm=\binom nm\binom{n-m}{s-m}$$

(or observe that both expressions are equal to the multinomial coefficient that counts the number of ways of writing an $n$-element set as a disjoint union $A\cup B\cup C$ with $|A|=m$, $|B|=s-m$ and $|C|=n-s$), and the right-hand side is at least $\binom nm$. So unfortunately we find that however we choose the value of $p$ there is a value of $s$ such that our estimate for the number of sets in $\mathcal{B}$ of size $s$ is greater than $p\binom nm$. (I should remark that the estimate for the number of sets in $\mathcal{B}$ of size $s$ can be improved somewhat, but this does not make enough of a difference to rescue the argument.)
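Here is a small numerical sanity check of the obstruction, under the assumption that the requirement at the crossover value of $p$ is $\binom nm>\binom ns\binom sm$ and that the relevant identity is $\binom ns\binom sm=\binom nm\binom{n-m}{s-m}$. The function name `crossover_check` and the values $n=20$, $m=9$ are arbitrary illustrative choices.

```python
from math import comb

def crossover_check(n, m):
    """For each s > m, test whether C(n, m) > C(n, s) * C(s, m) could hold."""
    rows = []
    for s in range(m + 1, n + 1):
        lhs = comb(n, s) * comb(s, m)
        rhs = comb(n, m) * comb(n - m, s - m)
        # Both sides count ways to split an n-set into parts of sizes m, s - m, n - s.
        assert lhs == rhs
        rows.append((s, comb(n, m) > lhs))
    return rows

rows = crossover_check(20, 9)
print(any(ok for _, ok in rows))  # False
```

Since $\binom{n-m}{s-m}\geq 1$, the required inequality fails for every $s$, which is what the check reports.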

So unfortunately it turns out that the middle of the range is worse than the two ends, and indeed worse by enough to kill off the idea. However, it seemed to me to be good to make at least some attempt to find a counterexample in order to understand the problem better.

From here there are two obvious ways to go. One is to try to modify the above idea to give it a better chance of working. The other, which I have already mentioned, is to try to generalize the failure: that is, to explain why that example, and many others like it, had no hope of working. Alternatively, somebody could propose a completely different line of enquiry.

I’ll stop there. Experience with Polymath projects so far seems to suggest that, as with individual projects, it is hard to predict how long they will continue before there is a general feeling of being stuck. So I’m thinking of this as a slightly tentative suggestion, and if it provokes a sufficiently healthy conversation and interesting new (or at least new to me) ideas, then I’ll write another post and launch a project more formally. In particular, only at that point will I call it Polymath11 (or should that be Polymath12? — I don’t know whether the almost instantly successful polynomial-identities project got round to assigning itself a number). Also, for various reasons I don’t want to get properly going on a Polymath project for at least a week, though I realize I may not be in complete control of what happens in response to this post.

Just before I finish, let me remark that Polymath10, attempting to prove Erdős’s sunflower conjecture, is still continuing on Gil Kalai’s blog. What’s more, I think it is still at a stage where a newcomer could catch up with what is going on — it might take a couple of hours to find and digest a few of the more important comments. But Gil and I agree that there may well be room to have more than one Polymath project going at the same time, since a common pattern is for the group of participants to shrink down to a smallish number of “enthusiasts”, and there are enough mathematicians to form many such groups.

And a quick reminder, as maybe some people reading this will be new to the concept of Polymath projects. The aim is to try to make the problem-solving process easier in various ways. One is to have an open discussion, in the form of blog posts and comments, so that anybody can participate, and with luck a process of self-selection will take place that results in a team of enthusiastic people with a good mixture of skills and knowledge. Another is to encourage people to express ideas that may well be half-baked or even wrong, or even completely *obviously* wrong. (It’s surprising how often a completely obviously wrong idea can stimulate a different idea that turns out to be very useful. Naturally, expressing such an idea can be embarrassing, but it shouldn’t be, as it is an important part of what we do when we think about problems privately.) Another is to provide a mechanism where people can get very quick feedback on their ideas — this too can be extremely stimulating and speed up the process of thought considerably. If you like the problem but don’t feel like pursuing either of the approaches I’ve outlined above, that’s of course fine — your ideas are still welcome and may well be more fruitful than those ones, which are there just to get the discussion started.


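The proof is very simple. For each $x\in X$, let $f_x$ be the characteristic function of the neighbourhood of $x$. That is, $f_x(y)=1$ if $xy$ is an edge and $f_x(y)=0$ otherwise. Then $\sum_{x\in X}\sum_{y\in Y}f_x(y)$ is the sum of the degrees of the $x\in X$, which is the number of edges of $G$, which is $\alpha|X||Y|$. If we set $f=\sum_{x\in X}f_x$, then this tells us that $\sum_{y\in Y}f(y)=\alpha|X||Y|$. By the Cauchy-Schwarz inequality, it follows that $\sum_{y\in Y}f(y)^2\geq\alpha^2|X|^2|Y|$.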

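But by the Cauchy-Schwarz inequality again,

$$\Bigl(\sum_{y}f(y)^2\Bigr)^2=\Bigl(\sum_{x,x'}\sum_{y}f_x(y)f_{x'}(y)\Bigr)^2\leq|X|^2\sum_{x,x'}\Bigl(\sum_{y}f_x(y)f_{x'}(y)\Bigr)^2=|X|^2\sum_{x,x'}\sum_{y,y'}f_x(y)f_{x'}(y)f_x(y')f_{x'}(y').$$

That last expression is $|X|^2$ times the number of quadruples $(x,x',y,y')$ such that all of $xy$, $x'y$, $xy'$ and $x'y'$ are edges, and our previous estimate shows that it is at least $\alpha^4|X|^4|Y|^2$. Therefore, the probability that a random such quadruple consists entirely of edges is at least $\alpha^4$, as claimed (since there are $|X|^2|Y|^2$ possible quadruples to choose from).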
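As a quick illustration, the bound can be tested on a small random bipartite graph. This is only a sketch: the graph is stored as a Boolean matrix `adj` with rows indexed by one vertex class and columns by the other, and the sizes, edge probability and seed are arbitrary choices.

```python
import random
from fractions import Fraction
from itertools import product

def edge_density(adj):
    """Fraction of pairs (x, y) that are edges; adj[x][y] is True for an edge."""
    rows, cols = len(adj), len(adj[0])
    return Fraction(sum(sum(row) for row in adj), rows * cols)

def four_cycle_density(adj):
    """Fraction of quadruples (x, x', y, y') with xy, x'y, xy', x'y' all edges."""
    rows, cols = len(adj), len(adj[0])
    total = 0
    for x, x2 in product(range(rows), repeat=2):
        common = sum(1 for y in range(cols) if adj[x][y] and adj[x2][y])
        total += common * common  # choices of (y, y') in the common neighbourhood
    return Fraction(total, rows**2 * cols**2)

rng = random.Random(0)  # fixed seed, purely for reproducibility
adj = [[rng.random() < 0.3 for _ in range(8)] for _ in range(6)]
alpha = edge_density(adj)
print(four_cycle_density(adj) >= alpha**4)
```

Exact rational arithmetic is used so that the comparison with $\alpha^4$ is not affected by rounding.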

Essentially the same proof applies to a weighted bipartite graph. That is, if you have some real weights $w(x,y)$ for each of the edges, and if the average weight is $\alpha$, then

$$\mathbb{E}_{x,x',y,y'}\,w(x,y)w(x',y)w(x,y')w(x',y')\geq\alpha^4.$$

(One can also generalize this statement to complex weights if one puts in appropriate conjugates.) One way of thinking of this is that the sum on the left-hand side goes down if you apply an averaging projection to $w$: that is, you replace all the values of $w$ by their average.

Thus, an appropriate weighted count of 4-cycles is minimized over all systems of weights with a given average when all the weights are equal.

Sidorenko’s conjecture is the statement that the same is true for any bipartite graph one might care to count. Or to be more precise, it is the statement that if you apply the averaging projection to a graph (rather than a weighted graph), then the count goes down: I haven’t checked whether the conjecture is still reasonable for weighted graphs, though standard arguments show that if it is true then it will still be true for weighted graphs if the weights are between 0 and 1, and hence (by rescaling) for all non-negative weights.

Here is a more formal statement of the conjecture.

**Conjecture** *Let $H$ be a bipartite graph with vertex sets $U$ and $V$. Let the number of edges of $H$ be $k$. Let $G$ be a bipartite graph of density $\alpha$ with vertex sets $X$ and $Y$ and let $\phi$ and $\psi$ be random functions from $U$ to $X$ and from $V$ to $Y$. Then the probability that $\phi(u)\psi(v)$ is an edge of $G$ for every pair $(u,v)$ such that $uv$ is an edge of $H$ is at least $\alpha^k$.*

This feels like the kind of statement that is either false with a simple counterexample or true with a simple proof. But this impression is incorrect.

My interest in the problem was aroused when I taught a graduate-level course in additive combinatorics and related ideas last year, and set a question that I wrongly thought I had solved, which is Sidorenko’s conjecture in the case of a path of length 3. That is, I asked for a proof or disproof of the statement that if $G$ has density $\alpha$, then the number of quadruples $(x_1,y_1,x_2,y_2)$ such that $x_1y_1$, $y_1x_2$ and $x_2y_2$ are all edges is at least $\alpha^3|X|^2|Y|^2$. I actually thought I had a counterexample, which I didn’t, as this case of Sidorenko’s conjecture is a theorem that I shall prove later in this post.

I also tried to prove the statement using the Cauchy-Schwarz inequality, but nothing I did seemed to work, and eventually I filed it away in my mind under the heading “unexpectedly hard problem”.

At some point around then I heard about a paper of Balázs Szegedy that used entropy to prove all (then) known cases of Sidorenko’s conjecture as well as a few more. I looked at the paper, but it was hard to extract from it a short easy proof for paths of length 3, because it is concerned with proving as general a result as possible, which necessitates setting up quite a lot of abstract language. If you’re like me, what you really need in order to understand a general result is (at least in complicated cases) not the actual proof of the result, but more like a proof of a special case that is general enough that once you’ve understood that you know that you could in principle think hard about what you used in order to extract from the argument as general a result as possible.

In order to get to that point, I set the paper as a mini-seminar topic for my research student Jason Long, and everything that follows is “joint work” in the sense that he gave a nice presentation of the paper, after which we had a discussion about what was actually going on in small cases, which led to a particularly simple proof for paths of length 3, which works just as well for all trees, as well as a somewhat more complicated proof for 4-cycles. (These proofs are entirely due to Szegedy, but we stripped them of all the abstract language, the result of which, to me at any rate, is to expose what is going on and what was presumably going on in Szegedy’s mind when he proved the more general result.) Actually, this doesn’t explain the whole paper, even in the sense just described, since we did not discuss a final section in which Szegedy deals with more complicated examples that don’t follow from his first approach, but it does at least show very clearly how entropy enters the picture.

I think it is possible to identify three ideas that drive the proof for paths of length 3. They are as follows.

1. Recall that the *entropy* of a probability distribution $p$ on a finite set $X$ is $\sum_{x\in X}p(x)\log(1/p(x))$. By Jensen’s inequality, this is maximized for the uniform distribution, when it takes the value $\log|X|$. It follows that one way of proving that $|X|\geq m$ is to identify a probability distribution on $X$ with entropy at least $\log m$.
2. That may look like a mad idea at first, since if anything is going to work, the uniform distribution will. However, it is not necessarily a mad idea, because there might in principle be a non-uniform distribution with entropy that was much easier to calculate than that of the uniform distribution.
3. If $G$ is a graph, then there is a probability distribution, on the set of quadruples $(x,y,z,w)$ of vertices of $G$ such that $xy$, $yz$ and $zw$ are all edges, that is not in general uniform and that has very nice properties.

The distribution mentioned in 3 is easy to describe. You first pick an edge $xy$ of $G$ uniformly at random (from all the edges of $G$). Note that the probability that $x$ is the first vertex of this edge is proportional to the degree of $x$, and the probability that $y$ is the end vertex is proportional to the degree of $y$.

Having picked the edge $xy$ you then pick $z$ uniformly at random from the neighbours of $y$, and having picked $z$ you pick $w$ uniformly at random from the neighbours of $z$. This guarantees that $xyzw$ is a path of length 3 (possibly with repeated vertices, as we need for the statement of Sidorenko’s conjecture).

The beautiful property of this distribution is that the edges $xy$, $yz$ and $zw$ are identically distributed. This follows from the fact that $y$ has the same distribution as $x$ and $z$ is a random neighbour of $y$, which means that $z$ has the same distribution as $y$ and $w$ is a random neighbour of $z$.

Now let us turn to point 2. The only conceivable advantage of using a non-uniform distribution and an entropy argument is that if we are lucky then the entropy will be easy to calculate and will give us a good enough bound to prove what we want. The reason this stands a chance of being true is that entropy has some very nice properties. Let me briefly review those.

It will be convenient to talk about entropy of random variables rather than probability distributions: if $X$ is a random variable on a finite set $A$, then its entropy is $H[X]=\sum_{a\in A}p_a\log(1/p_a)$, where I have written $p_a$ as shorthand for $\mathbb{P}[X=a]$.

We also need the notion of *conditional* entropy $H[X|Y]$. This is $\sum_y\mathbb{P}[Y=y]H[X|Y=y]$: that is, it is the expectation of the entropy of $X$ if you are told the value of $Y$.

If you think of the entropy of $X$ as the average amount of information needed to specify the value of $X$, then $H[X|Y]$ is the average amount *more* information you need to specify $X$ if you have been told the value of $Y$. A couple of extreme cases are that if $X$ and $Y$ are independent, then $H[X|Y]=H[X]$ (since the distribution of $X|Y=y$ is the same as the distribution of $X$ for each $y$), and if $X$ is a function of $Y$, then $H[X|Y]=0$ (since $X|Y=y$ is concentrated at a point for each $y$). In general, we also have the formula

$$H[X,Y]=H[Y]+H[X|Y],$$

where $H[X,Y]$ is the entropy of the joint random variable $(X,Y)$. Intuitively this says that to know the value of $(X,Y)$ you need to know the value of $Y$ and then you need to know any extra information needed to specify $X$. It can be verified easily from the formula for entropy.

Now let us calculate the entropy of the distribution we have defined on labelled paths of length 3. Let $X$, $Y$, $Z$ and $W$ be random variables that tell us where $x,y,z$ and $w$ are. We want to calculate $H[X,Y,Z,W]$. By a small generalization of the formula above (also intuitive and easy to check formally) we have

$$H[X,Y,Z,W]=H[X]+H[Y|X]+H[Z|X,Y]+H[W|X,Y,Z].$$

Now the nice properties of our distribution come into play. Note first that if you know that $Y=y$, then $X$ and $Z$ are independent (they are both random neighbours of $y$). It follows that $H[Z|X,Y]=H[Z|Y]$. Similarly, given the value of $Z$, $W$ is independent of the pair $(X,Y)$, so $H[W|X,Y,Z]=H[W|Z]$.

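Now for each $x$ let $d(x)$ be the degree of $x$ and let $m=\sum_xd(x)$. Then $\mathbb{P}[X=x]=d(x)/m$, and $H[Y|X=x]=\log d(x)$, so

$$H[Y|X]=\sum_x\frac{d(x)}m\log d(x),$$

and because the edges all have the same distribution, we get the same answer for $H[Z|Y]$ and $H[W|Z]$.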

As for the entropy $H[X]$, it is equal to

$$\sum_x\frac{d(x)}m\log\frac m{d(x)}=\log m-\sum_x\frac{d(x)}m\log d(x).$$

Putting all this together, we get that

$$H[X,Y,Z,W]=\log m+2\sum_x\frac{d(x)}m\log d(x).$$

By Jensen’s inequality, the second term is minimized if $d(x)=m/n$ for every $x$, where $n$ is the number of vertices, and in that case we obtain

$$H[X,Y,Z,W]\geq\log m+2\log(m/n).$$

If the average degree is $\alpha n$, then $m=\alpha n^2$, which gives us

$$H[X,Y,Z,W]\geq\log(\alpha n^2)+2\log(\alpha n)=\log(\alpha^3n^4).$$

The moral here is that once one uses the nice distribution and calculates its entropy, the calculations follow very easily from the standard properties of conditional entropy, and they give exactly what we need: that the entropy must be at least $\log(\alpha^3n^4)$, from which it follows that the number of labelled paths of length 3 must be at least $\alpha^3n^4$.
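To see the calculation in action, here is a minimal sketch that builds the path distribution explicitly on a small graph and compares its entropy with the quantities above. The five-vertex graph is an arbitrary example of my choosing, $m$ denotes the sum of the degrees and $n$ the number of vertices, and the formula $\log m+2\sum_x\frac{d(x)}m\log d(x)$ is the one assumed for the entropy.

```python
from math import log, isclose

# A small undirected graph given by symmetric adjacency lists (arbitrary example).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}

def path_distribution(graph):
    """Pick a uniform ordered edge xy, then z uniform among N(y), then w among N(z)."""
    m = sum(len(nbrs) for nbrs in graph.values())
    dist = {}
    for x in graph:
        for y in graph[x]:
            for z in graph[y]:
                for w in graph[z]:
                    dist[(x, y, z, w)] = 1 / (m * len(graph[y]) * len(graph[z]))
    return dist

dist = path_distribution(graph)
n = len(graph)
m = sum(len(nbrs) for nbrs in graph.values())

entropy = sum(p * log(1 / p) for p in dist.values())
formula = log(m) + 2 * sum(len(nb) / m * log(len(nb)) for nb in graph.values())
lower_bound = log(m**3 / n**2)   # log(alpha^3 n^4) with alpha = m / n^2
upper_bound = log(len(dist))     # entropy is at most log(number of labelled paths)

print(entropy, formula, lower_bound, upper_bound)
```

The entropy agrees with the formula and is sandwiched between $\log(\alpha^3n^4)$ and the logarithm of the number of labelled paths, which is the whole point of the argument.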

Now I’ll prove the conjecture for 4-cycles. In a way this is a perverse thing to do, since as we have already seen, one can prove this case easily using Cauchy-Schwarz. However, the argument just given generalizes very straightforwardly to trees (and therefore forests), which makes the 4-cycle the simplest case that we have not yet covered, since it is the simplest bipartite graph that contains a cycle. Also, it is quite interesting that there should be a genuinely different proof for the 4-cycles case that is still natural.

What we would like for the proof to work is a distribution on the 4-cycles $xyzw$ (as usual we do not insist that these vertices are distinct) such that the marginal distribution of each edge is uniform amongst all edges of $G$. A natural guess about how to do this is to pick a random edge $xy$, pick a random neighbour $z$ of $y$, and then pick a random $w$ from the intersection of the neighbourhoods of $x$ and $z$. But that gives the wrong distribution on $w$. Instead, when we pick $w$ we need to choose it according to the distribution we have on paths (that is, choose a random edge $x'w$ and then let $z'$ be a random neighbour of $w$) and then condition on $x'=x$ and $z'=z$. Note that for fixed $x$ and $z$ that is exactly the distribution we already have on $y$: once that is pointed out, it becomes obvious that this is the only reasonable thing to do, and that it will work.

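So now let us work out the entropy of this distribution. Again let $X,Y,Z,W$ be the (far from independent) random variables that tell us where the vertices go. Then we can write down a few equations using the usual rule about conditional entropy and exploiting independence where we have it.

$$H[X,Y,Z,W]=H[X,Y,Z]+H[W|X,Z],$$

$$H[X,Y,Z]=H[X,Z]+H[Y|X,Z],$$

$$H[W|X,Z]=H[Y|X,Z].$$

From the second and third equations we get that

$$H[W|X,Z]=H[X,Y,Z]-H[X,Z],$$

and substituting this into the first gives us

$$H[X,Y,Z,W]=2H[X,Y,Z]-H[X,Z].$$

As before, we have that

$$H[X,Y,Z]=H[X]+H[Y|X]+H[Z|Y]$$

and

$$H[X]=\log m-\sum_x\frac{d(x)}m\log d(x),\qquad H[Y|X]=H[Z|Y]=\sum_x\frac{d(x)}m\log d(x).$$

Therefore,

$$H[X,Y,Z,W]=2\log m+2\sum_x\frac{d(x)}m\log d(x)-H[X,Z].$$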

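As for the term $H[X,Z]$, we now use the trivial upper bound $H[X,Z]\leq2\log n$, where $n$ is the number of vertices. If the average degree is $\alpha n$, then $m=\alpha n^2$, so

$$2\log m-H[X,Z]\geq2\log(\alpha n^2)-2\log n=2\log(\alpha n).$$

The remaining part is minimized when every $d(x)$ is equal to $\alpha n$, in which case it gives us $2\log(\alpha n)$, so we end up with a lower bound for the entropy of $4\log(\alpha n)=\log(\alpha^4n^4)$, exactly as required.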

This kind of argument deals with a fairly large class of graphs — to get the argument to work it is necessary for the graph to be built up in a certain way, but that covers many cases of the conjecture.


First, I’ll just briefly say that things are going well with the new journal Discrete Analysis, and I think we’re on course to launch, as planned, early next year with a few very good accepted papers — we certainly have a number of papers in the pipeline that look promising to me. Of course, we’d love to have more.

Secondly, a very interesting initiative has recently been started by Martin Eve, called the Open Library of Humanities. The rough idea is that they provide a platform for humanities journals that are free to read online and free for authors (or, as some people like to say, are Diamond OA journals). Perhaps the most interesting aspect of this initiative is that it is funded by a consortium of libraries. Librarians are the people who feel the pain of ridiculous subscription prices, so they have great goodwill towards people who are trying to build new and cheaper publication models. I think there is no reason that the sciences couldn’t do something similar — in fact, it should be even easier to find money.

The OLH is actively encouraging existing humanities journals to move to their platform, which brings me to the third event I wanted to mention: the resignation of the editorial board of the Elsevier journal Lingua, which is in linguistics. The story in brief is that the editors made demands of Elsevier that were both reasonable and unreasonable: reasonable in the sense that they would be fine if we had a sane publication system, but unreasonable in the sense that it was quite obvious that Elsevier wouldn’t agree to them. They wanted to become an open access journal with publication fees of $400, way below the usual rate for an Elsevier journal. Since Elsevier owns the title, Lingua has now become its Greek counterpart Glossa — or, if you look at it Elsevier’s way, an entirely new journal has been founded called Glossa with an editorial board that has an entirely coincidental resemblance to what was until very recently the editorial board of Lingua, and it just happens also that the future editorial board of Lingua will be disjoint from what was recently the editorial board of Lingua. A nice term has been coined for what Lingua (that is, the Elsevier version) is about to become: a zombie journal. Maybe it will go the way of another famous zombie journal, Topology, the soul of which entered a new body called the Journal of Topology, and which staggered on for a couple of years before being put out of its misery. Here is an article about the Lingua story, which includes some priceless quotes from the managing editor. And here is Elsevier’s response, which is as facepalmish as usual. For example, at one point they say the following, which needs no comment from me.

Lingua is a hybrid open access journal which means that every author who wants to publish open access (i.e., free-of-charge for the reader), can do so. However, we have observed little uptake of the open access option in Lingua or elsewhere in linguistics at price points that would be economically viable.

The Open Library of Humanities will be helping to support Glossa.

Lastly, there is a story brewing at the LMS, which made the decision to close one of its journals, the LMS Journal of Computation and Mathematics, which has been going since 1998. Somebody with a paper submitted to the journal told me that he received an email saying the following.

Dear [TITLE LAST-NAME],

I am writing with news that may have a bearing on your consideration of publishing your article in the LMS JCM, ‘[TITLE OF THE PAPER]’, by [FIRST-NAME LAST-NAME]

As you may be aware, the LMS Journal of Computation and Mathematics has been running for some years as a ‘free’ journal and the costs of publishing the journal have been borne by the London Mathematical Society. From the outset, it was intended that the journal should progress to at least break even and, for a few years, it ran as a subscription journal but did not manage to acquire sufficient support from libraries to cover the costs of subscription management. Over the last few years, we have been considering how to best get the journal to a satisfactory and successful state and, last Friday, the LMS Council (whose members are the Officers and Trustees of the London Mathematical Society) considered the LMS Publications Committee’s proposal for the JCM, which included moving the journal to a gold open access model.

However, the LMS Council did not accept the proposal, and decided instead that the journal should be closed, one reason being that it felt the move to a gold open access model would likely lead to a slow decline that could be more damaging to its reputation. Council felt that the general area of computation and mathematics was one that the Society should, in the long run, continue to be present in, but thought that there were probably better ways to use its resources in this direction. Of course the Society will continue to make the papers already published available in perpetuity.

While we are happy to continue the process of publication of your paper, we are giving all authors yet to be published the opportunity to withdraw their papers. We will continue to publish any papers still in the pipeline providing you are willing to continue.

If you wish to withdraw your paper, please let us know and we will do this on your behalf. If you do not wish to withdraw your paper, no further action is necessary on your part.

Not too surprisingly, this has annoyed a lot of people. The following letter, with many signatures, has been sent to the LMS Council to urge them to reverse the decision.

In accordance with Statute 19 of the LMS Charter and Statutes, we, members of the LMS, make a requisition to convene a Special General Meeting of the Society; the object of the meeting shall be the reversal of the LMS Council’s decision to close down The LMS Journal of Computation and Mathematics.

The Council’s decision to close the Journal seems to conflict with the public benefit statement of the Trustees’ Annual Report. Moreover, closing The LMS Journal of Computation and Mathematics may be at odds with the charitable aims of the LMS as spelled out in its Charter. Indeed, Article 3 of the Charter says:

“The objects for which the Society is incorporated shall be: […]

(vi) To *make grants of money* or donations in aid of mathematical investigations or *the publication of mathematical works* [our emphasis] or other matters or things for the purpose of promoting invention and research in mathematical science, or its applications, or in subjects connected therewith; […]”

We trust that our requisition will be treated in line with Statute 19 of the LMS Charter and Statutes:

“19. The Council shall within twenty-eight days of the receipt of a requisition in writing of not less than twenty Members of the Society stating the objects for which the meeting is desired convene a General Meeting of the Society. If upon a requisition the Council fails to convene a Special General Meeting within twenty-eight days of a receipt of the requisition then a Special General Meeting to be held within three months of the expiration of the said period of twenty-eight days may be convened by the President or the requisitionists.”

The LMS Journal of Computation and Mathematics is an electronic journal, so very cheap to run. Perhaps the LMS feels that to run a cheap journal at a small loss sets a dangerous precedent, given that it depends so heavily on the income it gets from its journals. But some sort of line has surely been crossed when a mathematical society closes down a journal that is successful mathematically on the grounds that it is insufficiently successful economically.

]]>

The problem, as with many problems in combinatorics, is easy to state, but fascinatingly hard to solve. It is a classic extremal problem, in that it asks how big some combinatorial object needs to be before it is guaranteed to contain a subobject of some particular kind. In this case, the object is a $k$-*uniform hypergraph*, which just means a collection of sets of size $k$. The subobject one would like to find is a *sunflower* of size $r$, which means a collection of sets $A_1,\dots,A_r$ such that we can find pairwise disjoint sets $H,P_1,\dots,P_r$ with $A_i=H\cup P_i$ for each $i$. I have used the letters $H$ and $P$ to stand for “head” and “petal”: $H$ is the head of the sunflower and $P_1,\dots,P_r$ are the petals.

How many sets of size $k$ do you need to guarantee a sunflower of size $r$? A simple argument gives not too bad an upper bound of $k!(r-1)^k$. We argue as follows. Let $\mathcal{A}$ be a collection of sets, each of size $k$. If we can find $r$ disjoint sets in $\mathcal{A}$ then we are done (with an empty head and the petals being the sets themselves). Otherwise, let $D$ be the union of a maximal disjoint collection of sets in $\mathcal{A}$, and note that $D$ has cardinality at most $k(r-1)$, and that every set in $\mathcal{A}$ has a non-empty intersection with $D$.

By the pigeonhole principle, we can find $x\in D$ and a subcollection $\mathcal{A}'$ of $\mathcal{A}$ containing at least $|\mathcal{A}|/k(r-1)$ sets, such that every set $A\in\mathcal{A}'$ contains $x$.

Now remove $x$ from all the sets in $\mathcal{A}'$ to create a collection of sets of size $k-1$. By induction, this collection contains a sunflower of size $r$, and putting back the element $x$ then gives us a sunflower in $\mathcal{A}$. (The base case, when $k=1$, states that $r$ sets are enough to guarantee a sunflower of size $r$.)

How about a lower bound? An easy one is obtained as follows. Let $B_1,\dots,B_k$ be disjoint sets of size $r-1$ and let $\mathcal{A}$ consist of all sets that intersect each $B_i$ in exactly one place. There are $(r-1)^k$ such sets, and there cannot be a sunflower, since there is not enough room in each $B_i$ to make it possible to have $r$ disjoint petals.

The main question of interest is the dependence on $k$: for fixed $r$, does the correct bound grow exponentially (like the lower bound just given), or more like $k!$, or like something in between? Even when $r=3$, the first non-trivial case, the answer is not known (though there are bounds known that improve on the simple ones I’ve just given).
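For small parameters the lower-bound construction can be checked by brute force. The following is only a sketch: write k for the size of the sets and r for the size of the sunflower sought, and note that the function names (`is_sunflower`, `has_sunflower`, `lower_bound_family`) are invented here for illustration. It builds every set that meets each of k disjoint blocks of size r-1 in exactly one point and confirms that no r of them form a sunflower.

```python
from itertools import combinations, product


def is_sunflower(sets):
    """True if the sets share a common head and the leftover petals are pairwise disjoint."""
    head = frozenset.intersection(*sets)      # candidate head: what all the sets have in common
    petals = [s - head for s in sets]         # what remains of each set outside the head
    return all(p.isdisjoint(q) for p, q in combinations(petals, 2))


def has_sunflower(family, r):
    """Brute-force search over all r-subsets of the family (only feasible for tiny cases)."""
    return any(is_sunflower(c) for c in combinations(family, r))


def lower_bound_family(k, r):
    """All sets meeting each of k disjoint blocks of size r-1 in exactly one point."""
    blocks = [[(i, j) for j in range(r - 1)] for i in range(k)]
    return [frozenset(choice) for choice in product(*blocks)]


family = lower_bound_family(3, 3)             # k = 3, r = 3
print(len(family), has_sunflower(family, 3))
```

The search is exponential in the size of the family, so it is useful only as a sanity check on the definition, not as a way of probing the open question.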

For more information, as well as a discussion about how a homological approach might be useful, see Gil’s post.

At the time of writing, Gil’s post has attracted 60 comments, but it is still at what one might call a warming-up stage, so if you are interested in the problem and understand what I have written above, it should still be easy to catch up with the discussion. I strongly recommend contributing — even small remarks can be very helpful for other people, sparking off ideas that they might not have had otherwise. And there’s nothing quite like thinking about a problem, writing regular bulletins of the little ideas you’ve had, and getting feedback on them from other Polymath participants. This problem has the same kind of notoriously hard feel about it that the Erdős discrepancy problem had — it would be wonderful if a Polymath collaboration could contribute to its finally getting solved.

If you have comments specific to what I’ve written above, such as to point out typos or inaccuracies, then by all means write them here, but if you have mathematical thoughts about the problem, please write them on Gil’s blog.

]]>

This post is therefore the final post of the polymath5 project. I refer you to Terry’s posts for the mathematics. I will just make a few comments about what all this says about polymath projects in general.

After the success of the first polymath project, which found a purely combinatorial proof of the density Hales-Jewett theorem, there was an appetite to try something similar. However, the subsequent experience made it look as though the first project had been rather lucky, and not necessarily a good indication of what the polymath approach will typically achieve. I started polymath2, about a Banach-space problem, which never really got off the ground. Gil Kalai started polymath3, on the polynomial Hirsch conjecture, but the problem was not solved. Terence Tao started polymath4, about finding a deterministic algorithm to output a prime between $n$ and $2n$, which did not find such an algorithm but did prove some partial results that were interesting enough to publish in an AMS journal called Mathematics of Computation. I started polymath5, with the aim of solving the Erdős discrepancy problem (after this problem was chosen by a vote from a shortlist that I drew up), and although we had some interesting ideas, we did not solve the problem. The most obviously successful polymath project was polymath8, which aimed to bring down the size of the gap in Zhang’s prime-gaps result, but it could be argued that success for that project was guaranteed in advance: it was obvious that the gap could be reduced, and the only question was how far.

Actually, that last argument is not very convincing, since a lot more came out of polymath8 than just a tightening up of the individual steps of Zhang’s argument. But I want to concentrate on polymath5. I have always felt that that project, despite not solving the problem, was a distinct success, because by the end of it I, and I was not alone, understood the problem far better and in a very different way. So when I discussed the polymath approach with people, I described its virtues as follows: a polymath discussion tends to go at lightning speed through all the preliminary stages of solving a difficult problem — trying out ideas, reformulating, asking interesting variants of the question, finding potentially useful reductions, and so on. With some problems, once you’ve done all that, the problem is softened up and you can go on and solve it. With others, the difficulties that remain are still substantial, but at least you understand far better what they are.

In the light of what has now happened, the second case seems like a very accurate description of the polymath5 project, since Terence Tao used ideas from that project in an essential way, but also recent breakthroughs in number theory by Kaisa Matomäki and Maksim Radziwiłł that led on to work by those authors and Terry himself that led on to the averaged form of the Elliott conjecture that Terry has just proved. Thus, if the proof of the Erdős discrepancy problem in some sense requires these ideas, then there was no way we could possibly have hoped to solve the problem back in 2010, when polymath5 was running, but what we did achieve was to create a sort of penumbra around the problem, which had the effect that when these remarkable results in number theory became available, the application to the Erdős discrepancy problem was significantly easier to spot, at least for Terence Tao …

I’ll remark here that the approach to the problem that excited me most when we were thinking about it was a use of duality to reduce the problem to an existential statement: you “just” have to find a function with certain properties and you are done. Unfortunately, finding such a function proved to be extremely hard. Terry’s work proves abstractly that such a function exists, but doesn’t tell us how to construct it. So I’m left feeling that perhaps I was a bit too wedded to that duality approach, though I also think that it would still be very nice if someone managed to make it work.

There are a couple of other questions that are interesting to think about. The first is whether polymath5 really did play a significant role in the discovery of the solution. Terry refers to the work of polymath5, but one of the key polymath5 steps he uses was contributed by him, so perhaps he could have just done the whole thing on his own.

At the very least I would say that polymath5 got him interested in the problem, and took him quickly through the stage I talked about above of looking at it from many different angles. Also, the Fourier reduction argument that Terry found was a sort of response to observations and speculations that had taken place in the earlier discussion, so it seems likely that in some sense polymath5 played a role in provoking Terry to have the thoughts he did. My own experience of polymath projects is that they often provoke me to have thoughts I wouldn’t have had otherwise, even if the relationship between those thoughts and what other people have written is very hard to pin down — it can be a bit like those moments where someone says A, and then you think of B, which appears to have nothing to do with A, but then you manage to reconstruct your daydreamy thought processes to see that A made you think of C, which made you think of D, which made you think of B.

Another question is what should happen to polymath projects that don’t result in a solution of the problem that they are trying to solve, but do have useful ideas. Shouldn’t there come a time when the project “closes” and the participants (and others) are free to think about the problem individually? I feel strongly that there should, since otherwise there is a danger that a polymath project could actually delay progress on a problem by discouraging research on it. With polymath5 I tried to signal such a “closure” by writing a survey article that was partly about the work of polymath5. And Terry has now written up his work as an individual author, but been careful to say which ingredients of his proof were part of the polymath5 discussion and which were new. That seems to me to be exactly how things should work, but perhaps the lesson for the future is that the closing of a polymath project should be done more explicitly — up to now several of them have just quietly died. I had at one time intended to do rather more than what I did in the survey article, and write up, on behalf of polymath5 and published under the polymath name, a proper paper that would contain the main ideas discovered by polymath5 with full proofs. That would have been a better way of closing the project and would have led to a cleaner situation — Terry could have referred to that paper just as anyone refers to a mathematical paper. But while I regret not getting round to that, I don’t regret it too much, because I also quite like the idea that polymath5’s ideas are freely available on the internet but not in the form of a traditional journal article. (I still think that on balance it would have been better to write up the ideas though.)

Another lesson for the future is that it would be great to have some more polymath projects. We now know that Polymath5 has accelerated the solution of a famous open problem. I think we should be encouraged by this and try to do the same for several other famous open problems, but this time with the idea that as soon as the discussion stalls, the project will be declared to be finished. Gil Kalai has said on his blog that he plans to start a new project: I hope it will happen soon. And at some point when I feel slightly less busy, I would like to start one too, on another notorious problem with an elementary statement. It would be interesting to see whether a large group of people thinking together could find anything new to say about, for example, Frankl’s union-closed conjecture, or the asymptotics of Ramsey numbers, or the cap-set problem, or …

]]>

Part of the motivation for starting the journal is, of course, to challenge existing models of academic publishing and to contribute in a small way to creating an alternative and much cheaper system. However, I hope that in due course people will get used to this publication model, at which point the fact that Discrete Analysis is an arXiv overlay journal will no longer seem interesting or novel, and the main interest in the journal will be the mathematics it contains.

The members of the editorial board so far — but we may well add further people in the near future — are Ernie Croot, me, Ben Green, Gil Kalai, Nets Katz, Bryna Kra, Izabella Laba, Tom Sanders, Jozsef Solymosi, Terence Tao, Julia Wolf, and Tamar Ziegler. For the time being, I will be the managing editor. I interpret this as meaning that I will have the ultimate responsibility for the smooth running of the journal, and will have to do a bit more work than the other editors, but that decisions about journal policy and about accepting or rejecting papers will be made democratically by the whole editorial board. (For example, we had quite a lot of discussion, including a vote, about the title, and the other editors have approved this blog post after suggesting a couple of minor changes.)

I will write the rest of this post as a series of questions and answers.

The members of the editorial board all have an interest in additive combinatorics, but they also have other interests that may be only loosely related to additive combinatorics. So the scope of the journal is best thought of as a cluster of related subjects that cannot easily be pinned down with a concise definition, but that can be fairly easily recognised. (Wittgenstein refers to this kind of situation as a family resemblance.) Some of the subjects we will welcome in the journal are harmonic analysis, ergodic theory, topological dynamics, growth in groups, analytic number theory, combinatorial number theory, extremal combinatorics, probabilistic combinatorics, combinatorial geometry, convexity, metric geometry, and the more mathematical side of theoretical computer science. The phrase “discrete analysis” was coined by Ben Green when he wanted a suitable name for a seminar in Cambridge: despite its oxymoronic feel, it is in fact a good description of many parts of mathematics where the structures being studied are discrete, but the tools are analytical in character. (A particularly good example is the use of discrete Fourier analysis to solve combinatorial problems in number theory.)

We do not want the journal to be a fully general mathematical journal, but we do want it to be broad. If you are in doubt about whether the subject matter of your paper is suitable, then feel free to consult an editor. We will try to err on the side of inclusiveness.

No. This journal is what some people call a *diamond* open access journal: there are no charges for readers (obviously, since the papers are on the arXiv), and no charges for authors.

The software for managing the refereeing process will be provided by Scholastica, an outfit that was set up a few years ago by some graduates from the University of Chicago with the aim of making it very easy to create electronic journals. However, the look and feel of Discrete Analysis will be independent: the people at Scholastica are extremely helpful, and one of the services they provide is a web page designed to the specifications you want, with a URL that does not contain the word “scholastica”. Scholastica does charge for this service — a whopping $10 per submission. (This should be compared with typical article processing charges of well over 100 times this from more conventional journals.) Cambridge University has kindly agreed to provide a small grant to the journal, which means that we will be able to cover the cost of the first 500 or so submissions. I am confident that by the time we have had that many submissions, we will be able to find additional funding. The absolute worst that could happen is that in a few years’ time, we will have to ask people to pay an amount roughly equal to the cost of a couple of beers to submit a paper, but it is unlikely that we will ever have to charge anything.

Whatever happens, this journal will demonstrate the following important principle: if you trust authors to do their own typesetting and copy-editing to a satisfactory standard, with the help of suggestions from referees, then the cost of running a mathematics journal can be at least two orders of magnitude lower than the cost incurred by traditional publishers. In theory, this offers a way out of the current stranglehold that the publishers have over us: if enough universities set up enough journals at these very modest costs, then we will have an alternative and much cheaper publication system up and running, and it will look more and more pointless to submit papers to the expensive journals, which will save the universities huge amounts of money. Just to drive the point home, the cost of submitting an article from the UK to the Journal of the London Mathematical Society is, if you want to use their open-access option, £2,310. If Discrete Analysis gets 50 submissions per year (which is more than I would expect to start with), then this single article processing charge would cover our costs for well over five years.
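The comparison in the last two sentences is easy to check with a few lines of Python. This is only a back-of-envelope sketch: the fee, the submission rate and the £2,310 charge come from the text, but the exchange rate of 1.5 dollars to the pound is an assumption made for illustration, and the function name `years_covered` is invented.

```python
FEE_PER_SUBMISSION_USD = 10   # Scholastica's charge per submission, as stated above
SUBMISSIONS_PER_YEAR = 50     # the estimate used above
LMS_APC_GBP = 2310            # the open-access charge quoted above
USD_PER_GBP = 1.5             # ASSUMPTION: rough exchange rate, not taken from the text


def years_covered(apc_gbp, usd_per_gbp, fee_usd, submissions_per_year):
    """Number of years of submission fees that one article processing charge would pay for."""
    annual_cost_usd = fee_usd * submissions_per_year
    return apc_gbp * usd_per_gbp / annual_cost_usd


years = years_covered(LMS_APC_GBP, USD_PER_GBP, FEE_PER_SUBMISSION_USD, SUBMISSIONS_PER_YEAR)
print(f"One charge covers about {years:.1f} years of submissions")
```

Under that assumed rate the annual cost is $500, so a single charge covers a little under seven years.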

Furthermore, even these modest costs could have been lower. We happened to have funds that allowed us to use Scholastica’s facilities, and decided to do that, but another possibility would have been the Episciences platform, which has been specifically designed for the setting up of overlay journals, and which does not charge anything. It is still in its very early stages, but it already has two mathematics journals (which existed before and migrated to the Episciences platform), and it would be very good to see more. Another possibility that some people might find it worth considering is Open Journal Systems, though that requires a degree of technical skill that I for one do not possess, whereas setting up a journal with Scholastica has been extremely easy, and I think using the Episciences platform would be easy as well.

Could a malevolent person — let us call him or her the Evil Seer — bankrupt the journal by submitting 1000 computer-generated papers? Is it reasonable for us to be charged $10 for instantly rejecting a two-page proof of the Riemann hypothesis that uses nothing more than high-school algebra? I have taken this up with Scholastica, and they have told me that in such cases we just need to tell them and will not be charged.

Yes. As already mentioned, the articles will be peer-reviewed in the traditional way. There will also be a numbering system for the articles, so that when they are cited, they look like journal articles rather than “mere” arXiv preprints. They will be exclusive to Discrete Analysis. They will have DOIs, and the journal will have an ISSN. Whether the journal will at some point have an impact factor I do not know, but I hope that most people who consider submitting to it will in any case have a healthy contempt for impact factors. We will adhere to the “best practice” as set out in MathSciNet’s Policy on Indexing Electronic Journals, so our articles should be listed there and on Zentralblatt — we are in the process of checking whether this will definitely happen.

No. Another example is SIGMA (Symmetry, Integrability and Geometry: Methods and Applications), though as well as giving arXiv links it hosts its own copies of its articles. And another, which is a mathematically oriented computer science journal, is Logical Methods in Computer Science. I would guess that there are several others that I am unaware of. But one can at least say that Discrete Analysis is an early adopter of the arXiv overlay model.

The current plan is that people are free to submit articles immediately, via a temporary website that has been set up for the purpose. We hope that we will be able to process a few good papers quickly, which will allow us to have an official launch of the journal in early 2016 with some articles already published.

It is difficult to be precise about this, especially before we have received any submissions. However, broadly speaking, we would like to publish genuinely interesting papers in the areas described above. So if you have proved a result that you think is likely to interest the editors, then please consider Discrete Analysis for it. We would like the journal to be consistently interesting, but we do not want to set the standard so high that we do not publish anything.

It would be a pity to exclude the editors from the journal, given that their areas of research are by definition suitable for it. Our policy will be to allow editors to be authors, but to apply slightly more rigorous standards to submissions from editors. In practice, that will mean that in borderline cases a paper will be at a disadvantage if one of its authors is an editor. It goes without saying that editors will be completely excluded from the discussion of any paper that might lead to a conflict of interest. Scholastica’s software makes it very easy to do this.

We have not (yet) discussed the question of whether I as *managing* editor should be allowed to submit to the journal, but I shall probably follow the policy of many reputable journals and avoid doing so (albeit with some regret) and send any papers that would have been suitable to other journals with publication models that I want to support.

An obvious partial answer to this question is that the list of links on our journal website will be a list of certificates that certain arXiv preprints have been peer reviewed and judged to be of a suitable standard for Discrete Analysis. Thus, it will provide information that the arXiv alone does not provide.

However, we intend to do slightly more than this. For each paper, we will give not just a link, but also a short description. This will be based on the abstract and introduction, and on any further context that one of our editors or referees may be able to give us. The advantage of this is that it will be possible to browse the journal and get a good idea of what it contains, without having to keep clicking back and forth to arXiv preprints. In this way, we hope to make visiting the Discrete Analysis home page a worthwhile experience.

Another thing we will be able to do with these descriptions is post links to newer versions of the articles. If an author wishes to update an article after it has been published, we will provide two links: one to the “official” version (that is, not the first submitted version, but the “final” version that takes into account comments by the referee), and one to the new updated version, with a brief summary of what has changed.

The mathematical community is now sufficiently dependent on the arXiv that it is very unlikely that the arXiv will fold, and if it does then there will be greater problems than the fate of Discrete Analysis. However, in this hypothetical situation, we will download all the articles accepted by Discrete Analysis, as well as those still under review, and find another way of hosting them. Note that articles posted to the arXiv are automatically uploaded to HAL as well, so one possibility would be simply to change the arXiv links to HAL links. As for Scholastica, they perform regular backups of all their data, so even if their main site were to be wiped out, all the information concerning their journals would be recoverable. In short, barring a catastrophic failure of the entire internet, articles published in Discrete Analysis will be secure and permanent.

The editors have widely differing views about these sorts of ideas. For now, we are taking a cautious approach, trying to make the journal as conventional as possible so as to maximize its chances of becoming successful. If at some point in the future we decide to experiment with newer methods of peer review, we shall continue to be cautious, and will always give authors the chance to opt out of them.

First, post it on the arXiv, selecting one of the CC-BY options when it asks you which licence you want to use (this is important for ensuring that the journal complies with the open-access requirements of various funding bodies, but if you have already posted the article under a more restrictive licence, you can always use a CC-BY licence for the version that is revised in the light of comments from referees). Then go to the journal’s temporary website, click on the red “Submit Manuscript” button in the top right-hand corner, and follow the simple instructions.

Not everybody reads blogs, so one way that you can support the journal is to bring it to the attention of anybody you know who might conceivably have a suitable paper for it. The sooner we can build up an initial list of interesting papers, the sooner the journal can become established, and the sooner the cheap arXiv overlay model can start competing with the expensive traditional models of publication.

]]>

As we have always said, the party with the most votes and the most seats in this election has the first right to seek to form a Government. The British people would rightly question the legitimacy of a coalition that didn’t allow the party with the largest number of seats and votes the opportunity to attempt to form a Government first.

I’m proud that the Liberal Democrats have proved we can form a strong and stable coalition government, able to bring prosperity to Britain.

Just like we would not put UKIP in charge of Europe, we are not going to put the SNP in charge of Britain – a country they want to rip apart.

The current projections at Five Thirty-Eight put the Conservatives on 281 seats, Labour on 268, the Scottish Nationalists on 49 and the Liberal Democrats on 26. If these are correct, then Clegg is saying that he will try first to form a Government with the Conservatives. I claim that this is inconsistent with all four of the fundamental Liberal values I mentioned.

It is obviously inconsistent with wanting to promote a broadly centre-left political programme. From what Ed Miliband is saying, it seems unlikely that there will be a formal coalition between Labour and the SNP. However, there does seem to be room for a looser agreement, since the SNP would hate to be responsible for there being another Conservative government, as Nicola Sturgeon has made very clear. Furthermore, Sturgeon has also been clear that she will not press for another referendum on Scottish independence, so there is no reason to suppose that a loose alliance between Labour and the SNP would be a threat to the UK. Thus, Clegg’s choice is between supporting a right wing party or a centre-left alliance of two parties.

The right-wing party is flirting dangerously with leaving the European Union: David Cameron’s official position is that he wants to renegotiate the treaty and then campaign to stay inside a reformed Europe. He has not said what he will do if, as is almost inevitable, he fails to reform the EU, but it is hard to see how he could campaign to remain inside an EU that has humiliated him by refusing his demands for reform. Labour and the SNP, by contrast, are committed to staying in the EU (unless it changes radically) and will not hold a referendum. Yet Clegg says that he will attempt to form a government with the right-wing party, which, I might add, is also full of climate-change deniers, as right-wing parties tend to be.

What is Clegg’s rationale for this? He talks about democratic legitimacy, and here his views are utterly inconsistent with the basic principles that lie behind arguments for voting reform. One of the strongest arguments is that under the current system if you have two parties with broadly similar views, they can split the vote and be heavily penalized, giving power to a much less representative party. And yet that is exactly what Clegg, in effect, supports. The great irony of his position is that by saying that he will support the largest party, he is advocating a first-past-the-post system for forming a government. If the Conservatives get the most seats but are greatly outnumbered in the House of Commons by people of a broadly centre-left persuasion, Clegg claims that a centre-left alliance will nevertheless lack democratic legitimacy. Has he forgotten why he argued against the first-past-the-post system?

To get a full idea of how wrong his position is, let’s imagine a different scenario. Suppose that Labour and the Conservatives had a very similar number of seats and the Lib Dems held the balance of power. According to Clegg, the Lib Dems should form a coalition with whichever of Labour and the Conservatives have the most seats. But isn’t he forgetting something? What about the political preferences of Lib Dem voters? Do they count for nothing? The democratically legitimate option is to choose the major party that best represents the interests of Lib Dem voters, since then the largest number of voters get roughly what they voted for.

I have been sufficiently loyal to the party to forgive it for some pretty dreadful mistakes over the last few years, such as killing off any hope of voting reform in my lifetime, and breaking their promises about tuition fees — I put these down to naivety resulting from inexperience with coalitions. But there is no excuse this time, and Clegg’s basic principles about coalition-building are simply wrong. It may be that he will try but fail to form a government with the Conservatives and end up in just the kind of alliance I would like to see. But he will be wrong even to try.

It will be difficult not to vote for Julian Huppert, our MP for the last five years, who has been excellent and independent-minded (for instance, he voted against tuition fees). I do not want to punish him for the sins of Nick Clegg. But I care even more about the values that have led me to vote Liberal in the past, and it now seems to me that every seat that Labour can pick up from the Lib Dems increases the chances of those values being promoted in the House of Commons. If any card-carrying Lib Dems want to try to persuade me otherwise in the comments below, they will be most welcome to do so.

]]>

Of course, any change will have to be in the direction of making the deal less generous for those with pensions. Indeed, changes have already been made. Until a few years ago, the amount you got at the end was based on your final salary. More precisely, you got one 80th of your final salary per year after retirement for each year that you contributed to the scheme, up to a maximum of 40 years of contributions (and thus a maximum of half your final salary when you retire). But a few years ago they closed this final-salary scheme to new entrants, because (they said) it had become too expensive. This was partly because now a much larger proportion of academics end up as professors, so their final salaries are higher, and also of course because people live for longer.

They now propose to close the final-salary scheme even for existing participants. That of course raises the question of what happens to the contributions we have already made to the scheme. If the USS really can’t afford to keep going with the present arrangements, it is perhaps reasonable to say that we cannot continue to make contributions under those arrangements, but our past contributions were made under the very clear understanding that each year of contributions would add one 80th of our final salary to our eventual annual pension payments. Will that still be the case?

I received a letter from the USS yesterday that included the following reassuring paragraph.

As an active member of the Final Salary section of the scheme, you would be affected by the proposed changes. Under the proposals, the pension benefits provided to you in the future would be different to those that are currently provided through the scheme. It is important to note that the pension rights you have already earned are protected by law and in the scheme rules; the proposed changes will only affect the pension benefits that you will be able to build up in the future if the changes are implemented as proposed.

Does this mean, then, that the pension I have already built up is safe? No, it decidedly doesn’t. If you received a similar letter and were reassured by the above paragraph, then please unreassure yourself, since it is hiding the fact that you stand to lose a *lot* of money (the precise amount depending on your circumstances — I will discuss this later in the post).

The key to how this can be lies in a paragraph from a leaflet that I received with the letter. It says the following.

If you are a member of the current final salary section, the benefits you have built up — your accrued benefits — will be calculated using your pensionable salary and pensionable service immediately prior to the implementation date. Going forward, those accrued benefits will be revalued in line with increases in official pensions (currently the Consumer Prices Index — CPI) each April, up to the point of retirement or leaving the scheme.

In plain language, they are saying that for each year of contributions that you have made to the scheme, you will now earn one 80th of your salary *at the time that the changes to the scheme are implemented* and not at the time that you retire. So if, say, you are in mid career and your final salary ends up 25% higher than your current salary, then what you will get for your contributions so far will be reduced by 20%. (The difference between those two percentages is because if you increase a number by 25%, then to get back to the original number you have to decrease the new number by 20%.)

Let’s illustrate this USS-style with a few hypothetical examples. I will ignore inflation, but it is straightforward to adjust for it.

1. Alice is a historian. She was appointed 19 years ago, when she was in her late 20s. Since then, she has had two children, which caused a temporary drop in her academic productivity, but she has made up for it since, and her career is going well. She has just become a reader, and is told that she is very likely to become a professor in the next two or three years. Her current salary is £56,482 per year and will be £58,172 next year.

Looking into the future, she does indeed become a professor, in 2018, and starts two notches up from the bottom of the professorial salary scale, at £71,506. Looking further into the future, she ends up at the top of Band 1 of the professorial scale, with a salary of £85,354 (plus inflationary increases).

Unfortunately for her, the changes to the scheme are implemented before she is promoted, so the 20 years of contributions that she has by then amassed earn her 20/80, or a quarter, of her reader’s salary of £58,172, per year. That is, it earns her £14,543 per year. (This is not her total pension, just the part of her pension that results from the contributions she has made so far.) Had the scheme not been changed, those contributions would have instead earned her a quarter of her final salary of £85,354, which would work out as £21,338.50 per year. So she has lost nearly £7,000 per year from her pension as a result of the changes. She is destined to live for 25 years after she retires, so her loss works out as roughly £170,000.

2. Bob is also a historian and a good friend of Alice. He was appointed at the same time, is the same age, and has had a very similar career, but he has progressed slightly earlier because he did not have a period of low academic productivity. He became a reader three years ago and will become a professor later this year, starting two notches above the bottom salary level, at £71,506. He too is destined to end his career at the top of professorial Band 1 with a salary of £85,354.

Under the new scheme, his pension contributions up to the time of the change will earn him a quarter of £71,506 per year, or £17,876.50. Under the current scheme, they would have earned him a quarter of £85,354, or £21,338.50, per year, just as Alice’s would, since their final salaries are destined to be the same. So Bob too has lost out.

However, Bob was luckier than Alice because he was promoted just before the change to the system, as a result of which his salary at the time of the change will be substantially higher than that of Alice. Even though Alice will be promoted soon afterwards, she will end up much worse off than Bob, to the tune of £3,333.50 per year.

3. Carl is a mathematician. He proved some very good results in his early 30s and was promoted to professor at the age of 38. He too has put in 20 years of contributions by the time of the changes, by which time he is at the top of Band 1 with a salary of £85,354. Unfortunately, soon after he became a professor, he burnt out somewhat, never quite matching the achievements of his youth, so his salary is not going to increase any further. So for him the changes to the system make no difference: his current salary is his final salary. As with both Alice and Bob, under the current system his contributions would earn him a quarter of £85,354, or £21,338.50, per year. But for Carl they will earn him that same £21,338.50 under the new system as well.
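For anyone who wants to redo these sums with their own figures, here is a minimal Python sketch of the arithmetic behind the three examples. It assumes the simplified model used above (one 80th per year of service, 20 years of service at the time of the change, inflation ignored, 25 years of retirement); the function and variable names are mine and purely illustrative.

```python
# Sketch of the accrued-pension arithmetic in the Alice/Bob/Carl examples.
# Assumptions: 1/80 accrual per year, 20 years of service at the change,
# inflation ignored, 25 years of retirement.

ACCRUAL_DENOMINATOR = 80
YEARS_OF_SERVICE = 20
YEARS_IN_RETIREMENT = 25


def accrued_pension(reference_salary, years=YEARS_OF_SERVICE):
    """Annual pension earned by `years` of service, based on `reference_salary`."""
    return reference_salary * years / ACCRUAL_DENOMINATOR


def compare(salary_at_change, final_salary):
    """Return (pension under old rules, pension under new rules, annual loss)."""
    old = accrued_pension(final_salary)      # based on final salary
    new = accrued_pension(salary_at_change)  # frozen at the salary when the rules change
    return old, new, old - new


# (salary at the time of the change, final salary), taken from the examples
people = {
    "Alice": (58172, 85354),
    "Bob": (71506, 85354),
    "Carl": (85354, 85354),
}

for name, (at_change, final) in people.items():
    old, new, loss = compare(at_change, final)
    print(f"{name}: old rules £{old:,.2f}, new rules £{new:,.2f}, "
          f"annual loss £{loss:,.2f}, "
          f"over retirement £{loss * YEARS_IN_RETIREMENT:,.2f}")
```

Replacing the salaries in `people` with your own current and expected final salary gives a rough idea of what you stand to lose on the contributions you have already made.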

There are two general points I want to make with these examples. The first is that the changes amount to the breaking of an agreement. We were not obliged to take out a pension with USS, but were told that it was crazy not to do so because the payout was based on our final salary. I started my pension late (out of sheer stupidity, but that’s another story) and decided that at considerable expense (because there was not an accompanying employers’ contribution) I would make additional voluntary contributions. When I was deciding to do this, it was explained to me that each year I bought would add one 80th of my final salary to my pension. I am on a salary scale and have not reached the top of it, so if the USS make the proposed changes then they will be reneging on that agreement.

Is this legal? Here again is what they said.

It is important to note that the pension rights you have already earned are protected by law and in the scheme rules; the proposed changes will only affect the pension benefits that you will be able to build up in the future if the changes are implemented as proposed.

A lot depends on what is meant by “the pension rights you have already earned”. I would understand that to mean my final salary multiplied by the number of years I have contributed to the scheme divided by 80, since that is what I was told I would be getting for the money I have paid in so far. However, I think it may be that in law what I have already earned is what I could take away if I left the scheme now, which would be based on my current salary, and that part of “building up in the future” is sticking around in Cambridge while my salary increases. If anybody knows the answer to this legal question, I would be very interested. I have tried to find out by looking at the Pension Schemes Act 1993, and in particular Chapter 4, but it is pretty impenetrable. (Lawyers often claim that this impenetrability is necessary in order to avoid ambiguity, but in this instance it seems to have the opposite effect.)

But even if it turns out that it is not illegal for USS to interpret “the pension rights you have already earned” in this way, it is quite clearly immoral: it is a straightforward breaking of the terms of the agreement I had with them when I decided to take out a USS pension and make additional voluntary contributions. And of course I am far from alone in this respect. I personally don’t expect my final salary to be all that much higher than my current salary, so I probably won’t lose too much, but people whose final salaries are likely to be a lot higher than their current salaries will lose hugely.

The second point is that the way the USS has decided to share out the pain hugely exacerbates unfairnesses that are already present in the system. It is not fair that scientists are typically promoted much earlier than those in the humanities. In many cases it is not fair when men are promoted earlier than women. But at least those who were promoted more slowly could console themselves with the thought that they would probably catch up eventually, and that their pensions would therefore be comparable. If the changes come into effect, then as the examples above illustrate, if two people are in mid career at the time of the changes and are destined to reach the same final salary, but one has been promoted more than the other at the time of the changes, then the first person will end up not just with all that extra salary as at present but also with a substantially higher pension.

There is a mathematical point to make here that applies to many different policies. It is very wrong if the effect of the policy does not depend roughly continuously on somebody’s circumstances. But if you belong to the final-salary section and are up for promotion soon, you had better hope that you get promoted just before the change rather than just after it, since the accumulated difference it will make to your pension will be very large, even though the difference to your career progression will be small.

If all this bothers you, please do two things. First, alert your colleagues to what is going on and to what is wrong with it. Secondly, consider signing a petition that has been set up to oppose the changes.

**Update.** There are two further points that have come to my attention that mean that the situation is worse than I described it. The first is that I forgot to mention the lump sum that one receives on retirement. This is worth three times one’s annual pension, so for each of Alice, Bob and Carl, what they stand to lose from the lump sum under the new system is three quarters of the difference between their current salary and their final salary. Thus, Alice loses around £20,000 from her lump sum, while Carl loses nothing from his.

However, it turns out that Carl is not quite as fortunate as I claimed above, owing to a further consideration that I did not know about, which is that academic salaries tend to rise faster than inflation. I don’t mean that the salary of any one individual rises faster as a result of salary increments. I mean that if you take the salary at a fixed place in the salary scale, then that tends to rise faster than inflation. So although Carl will remain on the same point at the top of Band 1 for the rest of his career, his salary is likely to be significantly higher in real terms when he retires than it is now. I am told that it is quite usual for salaries to go up by at least 1% more than inflation, so in 20 years’ time this could make a big difference. This second consideration makes the situation worse for Alice and Bob by the same amount that it does for Carl.
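To get a feel for the size of this effect, here is the compounding arithmetic, assuming purely for illustration that salaries at a fixed point on the scale rise by exactly 1% more than inflation every year for 20 years.

```latex
(1.01)^{20} \approx 1.22,
\qquad
85{,}354 \times 1.22 \approx 104{,}000 .
```

On that assumption, the top of Band 1 would be worth about 22% more in real terms by the time Carl retires, whereas under the proposed changes his past contributions would be valued using the lower figure.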

]]>

I’ll start with the case $k=1$. I want to phrase a familiar argument in a way that will make it easy to generalize in the way I want to generalize it. Let $R$ be the rectangle consisting of all integer points $(x,y)$ such that $1\leq x\leq n$ and $1\leq y\leq n+1$. We can partition $R$ into those points for which $y\leq x$ and those points for which $y>x$. The number of points of the first kind is $1+2+\dots+n$, since for each $x$ we get $x$ possibilities for $y$. The number of points of the second kind is also $1+2+\dots+n$, since for each $y$ we get $y-1$ possibilities for $x$. Therefore, $2(1+2+\dots+n)=n(n+1)$ and we get the familiar formula.

Now let’s move to sums of squares. This time I’ll let be the set of points with , and . We can partition into three sets, the first consisting of points for which is maximal, the second of points for which is maximal and is not maximal, and the third of points for which is strictly larger than both and . The numbers of points in these sets are easily seen to be

and

,

respectively. This gives us the formula

from which we get the familiar formula for the sum of the first squares. Writing for the sum of the first th powers, we also get the relationship
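Here is a worked version of the sums-of-squares count under one consistent choice of box. The bounds $1\leq x\leq n$, $1\leq y\leq n$, $1\leq z\leq n+1$ and the notation $S_k(n)=1^k+2^k+\dots+n^k$ are assumptions made for this illustration; other choices of bounds lead to the same final formula.

```latex
\begin{align*}
\#\{x \text{ maximal}\} &= \sum_{x=1}^{n} x^2 = S_2(n),\\
\#\{y \text{ maximal},\ x \text{ not maximal}\} &= \sum_{y=1}^{n} (y-1)\,y = S_2(n)-S_1(n),\\
\#\{z \text{ strictly larger than } x \text{ and } y\} &= \sum_{z=1}^{n+1} (z-1)^2 = S_2(n),\\[4pt]
3S_2(n)-S_1(n) &= n^2(n+1),\\
S_2(n) &= \frac{n^2(n+1)+\tfrac12 n(n+1)}{3} = \frac{n(n+1)(2n+1)}{6}.
\end{align*}
```

The three counts add up to the size of the box, $n\cdot n\cdot(n+1)$, which is where the fourth line comes from.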

A striking fact about power sums is that $1^3+2^3+\dots+n^3=(1+2+\dots+n)^2$. One way of explaining this can be found in a rather nice animation that I came across as a result of a Google Plus post of Richard Green. Another comes from continuing with the approach here.

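This time I’ll let $B$ be the set of points $(x,y,z,w)$ such that $x$ and $y$ are between 1 and $n$ and $z$ and $w$ are between 1 and $n+1$. Again I’ll partition $B$ according to which of $x,y,z,w$ is the largest, taking the first one that’s largest when there is a tie. That gives me four sets. Here are their sizes.

$x$ first largest: $\sum_{x=1}^{n}x^3$.

$y$ first largest: $\sum_{y=1}^{n}(y-1)y^2$.

$z$ first largest: $\sum_{z=1}^{n+1}(z-1)^2z$.

$w$ first largest: $\sum_{w=1}^{n+1}(w-1)^3$.

Writing $S_k(n)$ for $1^k+2^k+\dots+n^k$, these sizes can be written as $S_3(n)$, $S_3(n)-S_2(n)$, $S_3(n)+S_2(n)$ and $S_3(n)$. So we get $4S_3(n)=n^2(n+1)^2$, which gives us that $S_3(n)=S_1(n)^2$. It also gives us a kind of explanation of that fact: for $k=1$ we decompose into two equal pieces of size $S_1(n)$, while for $k=3$ we decompose into four pieces that don’t quite all have size $S_3(n)$ but the two errors cancel out.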

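To see that this is a partial but not total coincidence, I’m going to jump to $k=5$ now. I’ll let $B$ be the set of points $(x_1,\dots,x_6)$ such that $x_1,x_2,x_3$ are between 1 and $n$ and $x_4,x_5,x_6$ are between 1 and $n+1$. This time the calculations are as follows, where $S_k$ is short for $S_k(n)=1^k+2^k+\dots+n^k$.

$x_1$ first largest: $\sum_{x=1}^{n}x^5=S_5$.

$x_2$ first largest: $\sum_{x=1}^{n}(x-1)x^4=S_5-S_4$.

$x_3$ first largest: $\sum_{x=1}^{n}(x-1)^2x^3=S_5-2S_4+S_3$.

$x_4$ first largest: $\sum_{x=1}^{n+1}(x-1)^3x^2=S_5+2S_4+S_3$.

$x_5$ first largest: $\sum_{x=1}^{n+1}(x-1)^4x=S_5+S_4$.

$x_6$ first largest: $\sum_{x=1}^{n+1}(x-1)^5=S_5$.

Adding all these up we find that $6S_5+2S_3=n^3(n+1)^3$. From that we get that

$$S_5(n)=\frac{n^3(n+1)^3-2S_3(n)}{6}=\frac{n^2(n+1)^2(2n^2+2n-1)}{12}.$$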

In general, if we use this method to calculate $S_k(n)=1^k+2^k+\dots+n^k$ when $k$ is odd, then (as with other methods) we obtain a relationship between $S_k$ and earlier $S_j$. But what is nice about it is that there is a lot of cancellation: all the $S_j$ for even $j$ make a contribution of zero.

Indeed, if $k=2m-1$, then we have $2m$ sets, and their sizes are $S_k$, $S_k-S_{k-1}$, $S_k-2S_{k-1}+S_{k-2}$, and so on down to

$$S_k-\binom{m-1}{1}S_{k-1}+\binom{m-1}{2}S_{k-2}-\dots+(-1)^{m-1}S_{k-m+1},$$

and then the same thing but with all minus signs in these linear combinations replaced by plus signs. Adding it all up we get a linear combination of $S_k$, $S_{k-2}$, … , $S_r$ equalling $n^m(n+1)^m$, where $r=m+1$ if $m$ is even and $r=m$ if $m$ is odd. When $k$ is small, we don’t have to take into account too many $S_j$, so the formulae remain quite nice for a while before they become disgusting.

Note that it is quite easy to work out the coefficients of the various $S_j$ in the above linear combination: they are just sums of binomial coefficients. Several other methods require one to solve simultaneous equations, though they are usually in triangular form, so not too bad.
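The counting argument is easy to check by brute force. The following Python sketch enumerates a box of integer points, assigns each point to the first coordinate that attains the maximum, and compares the resulting piece sizes with power sums. The box used (first half of the coordinates between 1 and $n$, second half between 1 and $n+1$) and the function names are choices made for this illustration.

```python
from itertools import product


def power_sum(k, n):
    """Return 1^k + 2^k + ... + n^k."""
    return sum(j ** k for j in range(1, n + 1))


def piece_sizes(bounds):
    """Partition the box {1..bounds[0]} x ... x {1..bounds[-1]} by the index of
    the first coordinate attaining the maximum, and return the piece sizes."""
    sizes = [0] * len(bounds)
    for point in product(*(range(1, b + 1) for b in bounds)):
        # tuple.index returns the first position of the maximum, which is
        # exactly the tie-breaking rule "take the first one that's largest".
        sizes[point.index(max(point))] += 1
    return sizes


def check_odd_case(m, n):
    """For k = 2m - 1, check that the pieces fill a box of size n^m (n+1)^m."""
    bounds = [n] * m + [n + 1] * m
    return sum(piece_sizes(bounds)) == n ** m * (n + 1) ** m


if __name__ == "__main__":
    n = 4
    s2, s3 = power_sum(2, n), power_sum(3, n)
    print(piece_sizes([n, n, n + 1, n + 1]))   # pieces for k = 3
    print([s3, s3 - s2, s3 + s2, s3])          # the same sizes via power sums
    print(4 * s3 == n ** 2 * (n + 1) ** 2)     # the identity 4*S_3 = n^2 (n+1)^2
```

The enumeration is exponential in the number of coordinates, so this is only a sanity check for small $n$ and $k$, not a way of computing power sums.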

A small remark is that the basic idea of this argument is to discretize a much easier continuous argument that shows that $\int_0^1x^k\,dx=\frac 1{k+1}$. That argument is to take the $(k+1)$-dimensional cube consisting of all points $(x_1,\dots,x_{k+1})$ such that each $x_i$ belongs to the interval $[0,1]$ and partition it into $k+1$ pieces according to which coordinate is largest. (I call the argument geometrical because these pieces are pyramids with $k$-cube bases.) In the continuous case, we don’t have to worry about what happens if there is more than one largest coordinate, since that set has measure zero. Each piece has measure $\int_0^1x^k\,dx$, and there are $k+1$ pieces, while the cube has measure $1$, so we are done.

A second remark is that the method I previously knew of for calculating sums of th powers is to exploit the fact that deserves to be called the natural discrete analogue of the above. Define to be . This we think of as the discrete analogue of . Then we do a discrete analogue of differentiating, which is to look at , which equals . This is the discrete analogue of the fact that the derivative of is . Next, we use the discrete analogue of the fundamental theorem of calculus, which is the statement that to deduce that . This gives us for each a polynomial of degree that we can sum easily, namely , and then to work out the sum of the first th powers, we write it as a linear combination of the , sum everything, and simplify the answer. That works fine, but the calculations are quite a bit more complicated than what I did above, and the proof is too algebraic to explain why the answers have the fairly nice forms they do. (For example, why is a factor of the sum of the first th powers whenever is odd and at least 3? From the argument I gave earlier in the post, this follows fairly easily by induction.)

]]>

As a result, the first talk I went to was Manjul Bhargava’s plenary lecture, which was another superb example of what a plenary lecture should be like. Like Jim Arthur, he began by telling us an absolutely central general problem in number theory, but interestingly it wasn’t the same problem — though it is related.

Bhargava’s central problem was this: given a function on the integers/rationals that takes integer/rational values, when does it take square values? In order to persuade us that this problem had been a central preoccupation of number theorists for a very long time, he took as his first example the function $f(x,y)=x^2+y^2$. Asking for this to take square values is asking for a Pythagorean triple, and people have been interested in those for thousands of years. To demonstrate this, he showed us a cuneiform tablet, which was probably the Babylonian tablet Plimpton 322, which contains a list of Pythagorean triples, some of which involve what in decimal notation are 5-digit numbers, and therefore not the kind of example one stumbles on without some kind of systematic procedure for generating it.

If one takes one’s function to be a cubic in one variable, then one obtains an elliptic curve, and rational points on elliptic curves are of course a huge topic in modern number theory, one to which Bhargava has made a major contribution. I won’t say much more about that, since I have already said a reasonable amount about it when discussing his laudatio. But there were a few extra details that are worth reporting.

He told us that Goldfeld and Katz and Sarnak had conjectured that 50% of elliptic curves have rank 0 and 50% have rank 1 (so the density of elliptic curves with higher rank is zero). He then told us about some work of Brumer and McGuinness in 1990 that seems to cast doubt on this (later) conjecture: they found that rank 2 curves occur quite often and their frequency increases as the coefficients get larger. More recent computational work has very strongly suggested that the conjecture is false: if you draw a graph of the average rank of elliptic curves as the size goes from to , it increases quickly from 0.7 before tailing off and appearing to tend to about 0.87. Apparently the reaction of Katz and Sarnak was a cheerful, “Well, it will go down eventually.”

Bhargava was pretty sceptical about this, but became properly interested in the problem when he learnt about work of Brumer, who showed assuming the generalized Riemann hypothesis and the Birch–Swinnerton-Dyer conjecture that the average rank was bounded above by 2.3. As Bhargava put it, this was a result that depends on two million dollars worth of conjectures. But that meant that if one could prove that the average rank of elliptic curves was greater than 2.3, then one would have shown that at least one of the generalized Riemann hypothesis and the Birch–Swinnerton-Dyer conjecture was false.

Still using the two million dollars worth of conjecture, Heath-Brown got the bound down to 2 in 2004, and Young got it to 1.79 in 2009. Bhargava and Shankar managed to improve that by 0.9 and two million dollars: that is, they obtained an unconditional bound of 0.89, amusingly close to the apparent asymptote of the graph that comes from the computations. As Bhargava pointed out, if one could extend those computations and find that the density eventually surpassed 0.89, this would, paradoxically, be very good news for the conjecture of Katz and Sarnak, because it would prove that the graph did eventually have to start coming down.

More recently, with Chris Skinner, Bhargava got an unconditional lower bound of 0.2.

One thing I understood a bit better by the end of Bhargava’s lecture was the result that the Birch–Swinnerton-Dyer conjecture holds for a positive proportion of elliptic curves. Although this is a remarkable result, there is a sense in which it is a slight cheat. What I mean by that is that Bhargava and his collaborators have a clever way of proving that a positive proportion of elliptic curves have rank 1. Then of those curves, they have a clever way of showing that for a positive proportion of those curves the order of the L-function at s=1 is also 1. What this argument doesn’t do, if my understanding is correct, is show something like this (except perhaps in some trivial sense):

- Every elliptic curve that satisfies a certain criterion also satisfies the Birch–Swinnerton-Dyer conjecture.
- A positive proportion of elliptic curves satisfy that criterion.

So in some sense, it doesn’t really get us any closer to establishing a connection between the rank of an elliptic curve and the order of the associated L-function at s=1. Perhaps in that respect it is a bit like the various results that say that a positive proportion of the zeros of the zeta function lie on the critical line, though I’m not sure whether that is a good analogy. Nevertheless, it is a remarkable result, in the sense that it proves something that looked out of reach.

Perhaps my favourite moment in Bhargava’s talk came when he gave us a hint about how he proved things. By this time he was talking about hyperelliptic curves (that is, curves $y^2=f(x)$ where $f$ is a polynomial of degree at least 5), where his main result is that most of them don’t have any rational solutions. How does he show that? The following slide, which I photographed, gives us a huge clue.

He looked at polynomials of degree 6. If the hyperelliptic curve has a rational solution , then by applying the change of variable , we can assume without loss of generality that the rational solution occurs at , which tells us that for some rational . But then you get the remarkable identity shown in the slide: a pair of explicit matrices and such that det. Note that to get these matrices, it was necessary to split up as a product , so we really are using the fact that there is a rational point on the curve. And apparently one can show that for most polynomials of degree 6 such a pair of matrices does not exist, so most polynomials of degree 6 do not take square values.

Just as the Babylonians didn’t find huge Pythagorean triples without some method of producing them, so Bhargava and his collaborators clearly didn’t find those matrices and without some method of producing them. He didn’t tell us what that method was, but my impression was that it belonged to the same circle of ideas as his work on generalizing Gauss’s composition law.

The lecture was rapturously received, especially by non-mathematicians in the audience (that could be interpreted as a subtly negative remark, but it isn’t meant that way), who came away from it amazed to feel that they had understood quite a bit of it. Afterwards, he was mobbed in a way that film stars might be used to, but mathematicians rather less so. I photographed that too.

If you give the photo coordinates in , then Bhargava’s head is at around and he is wearing a dark red shirt.

At 2pm there was the Gauss Prize lecture. I thought about skipping it, but then thought that that would be hypocritical of me after my views about people who left the laudationes just before the one for the Nevanlinna Prize. I shouldn’t be prejudiced against applied mathematics, and in any case Stanley Osher’s work, or at least part of it, is about image processing, something that I find very interesting.

I went to the talk thinking it would be given by Osher himself, but in fact it was given by someone else about his work. The slides were fairly dense, and there was a surprising amount of emphasis on what people call metrics — numbers of papers, H-factors and so on. The fact that the speaker said, “I realize there is more to academic output than these metrics,” somehow didn’t help. I found myself gradually zoning out of this talk and as a result, despite my initial good intentions, do not have anything more to say about Osher’s work, clearly interesting though it is.

I then did skip the first of the afternoon’s parallel sessions. I wondered about going to hear Mohammed Abouzaid, because I have heard that he is a rising star (or rather, an already risen star who probably has even further to rise), but I found his abstract too intimidating.

So the first talk I actually did go to was in the second session, when I went to hear Craig Gentry, a theoretical computer scientist famous for something called homomorphic encryption, which I had heard about without quite understanding what it was. My target for the 45 minutes was to remedy this situation.

In the end two things happened, one good and one bad. The good one was that early on in the talk Gentry explained what homomorphic encryption was in a way that was easy to understand. The bad one was that I was attacked by one of my periodic waves of tiredness, so after the early success I took in very little else: I was too absorbed in the struggle to keep my eyes open (or rather, to ensure that the brief moments when I shut them didn’t accidentally turn into stretches of several minutes).

The basic idea of homomorphic encryption is this. Suppose you have some function $E$ that encrypts data, and let’s suppose that the items one encrypts are integers. Now suppose that you are given the encryptions $E(x)$ and $E(y)$ of $x$ and $y$ and want to work out the encryption of $x+y$. For an arbitrary encryption system there’s not much you can do other than decrypt $E(x)$ and $E(y)$, add up the results, and then encrypt again. In other words, you can’t do it unless you know how to decrypt. But what if you want people to be able to do things to encrypted data (such as, say, carrying out transactions on someone’s bank account) without having access to the original data? You’d like some weird operation $\oplus$ with the property that $E(x)\oplus E(y)=E(x+y)$. I think now it is clear what the word “homomorphic” is doing here: we want $E$ to be a homomorphism from (integers, $+$) to (encrypted integers, $\oplus$).

Having said that, I think Gentry told us (but I can’t remember for sure) that just doing this for addition was already known, and his achievement has been to find a system that allows you to add and multiply. So I think his encryption may be a ring homomorphism. Something I haven’t stressed enough here is that it isn’t enough for the “funny” operations $\oplus$ and $\otimes$ to *exist*: you need to be able to compute them efficiently without being able to decrypt efficiently. The little I took in about how he actually did this made it sound as though it was very clever: it wasn’t just some little trick that makes things easy once you’ve observed it.
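To make the addition-only case concrete, here is a toy Python sketch of a scheme that is homomorphic for addition. It is not Gentry's construction: it is a bare-bones version of the Paillier cryptosystem, in which the "funny" operation on ciphertexts is multiplication modulo $n^2$. The primes 17 and 19 are hypothetical values chosen so that the numbers stay readable, and are hopelessly insecure.

```python
import math
import random

# Toy Paillier parameters. P and Q are illustrative only; real keys use
# primes that are hundreds of digits long.
P, Q = 17, 19
N = P * Q
N_SQUARED = N * N
G = N + 1                                              # standard simple choice of generator
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P - 1, Q - 1)
MU = pow(LAMBDA, -1, N)                                # modular inverse (Python 3.8+)


def encrypt(m):
    """Encrypt an integer 0 <= m < N using fresh randomness r coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQUARED) * pow(r, N, N_SQUARED)) % N_SQUARED


def decrypt(c):
    """Recover the plaintext; this needs the secret values LAMBDA and MU."""
    u = pow(c, LAMBDA, N_SQUARED)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """The operation on ciphertexts: uses only public data, no decryption."""
    return (c1 * c2) % N_SQUARED


if __name__ == "__main__":
    cx, cy = encrypt(12), encrypt(30)
    print(decrypt(add_encrypted(cx, cy)))  # the sum of the two plaintexts, mod N
```

Whoever computes `add_encrypted` sees only ciphertexts and the public modulus, which is the point of the bank-account example: the sum is formed without anybody decrypting the summands. Plaintexts are added modulo `N`, so the sum must stay below `N` for the answer to be the true integer sum.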

If you want to know more, the talk is here.

The last talk I went to, of the entire congress, was that of Tom Sanders, who was talking about the context surrounding his remarkable work on Roth’s theorem on arithmetic progressions. Sanders was the first to show that a subset of $\{1,2,\dots,N\}$ of density $(\log\log N)^{O(1)}/\log N$ must contain an arithmetic progression of length 3. This is tantalizingly close to the density of the primes in that interval, and also tantalizingly close to the density needed to prove the first non-trivial case of Erdős’s famous conjecture that a subset $A$ of $\mathbb{N}$ such that $\sum_{a\in A}\frac 1a=\infty$ contains arithmetic progressions of all lengths.

Sanders discussed the general question of which configurations can be found in the primes, but also the question of *why* they can be found. For instance, quadruples $(a,b,c,d)$ such that $a+b=c+d$ can be found in the primes, but the proof has nothing to do with the primes other than their density: the number of pairs $(a,b)$ with $a,b$ prime and less than $N$ is about $(N/\log N)^2$, and the number of possible sums is at most $2N$, so some sum can be achieved in several ways. By contrast, while there are many solutions of the equation $a+b=c$ in the primes (an example is $2+3=5$), one can easily find dense sets of integers with no solutions: for instance, the set of integers congruent to 1 mod 3 or the set of integers strictly between $N/2$ and $N$.
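Both halves of this contrast can be checked numerically. The Python sketch below counts how many ways each number can be written as a sum of two primes below $N$ (the pigeonhole argument says some sum must occur many times), and then confirms that the two dense sets just mentioned contain no solutions to $a+b=c$. The cutoff $N=1000$ and the function names are arbitrary choices for this illustration.

```python
from collections import Counter


def primes_below(n):
    """Sieve of Eratosthenes: all primes strictly less than n (n >= 2)."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, n, p):
                sieve[multiple] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def sum_representations(values):
    """Count, for each s, the unordered pairs {a, b} from values with a + b = s."""
    counts = Counter()
    for i, a in enumerate(values):
        for b in values[i:]:
            counts[a + b] += 1
    return counts


def has_sum_solution(values):
    """True if some a, b, c in values satisfy a + b = c."""
    members = set(values)
    return any(a + b in members for a in members for b in members)


if __name__ == "__main__":
    N = 1000
    primes = primes_below(N)
    counts = sum_representations(primes)
    best_sum, ways = counts.most_common(1)[0]
    print(f"{len(primes)} primes below {N}; {best_sum} is a sum of two of them in {ways} ways")
    print(has_sum_solution(primes))                   # e.g. 2 + 3 = 5
    print(has_sum_solution(range(1, N, 3)))           # integers congruent to 1 mod 3
    print(has_sum_solution(range(N // 2 + 1, N)))     # integers strictly between N/2 and N
```

The first check uses nothing about the primes except how many of them there are, which is exactly the sense in which the $a+b=c+d$ result is a density statement.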

Roth’s theorem concerns the equation $x+z=2y$, and while it has been known for many decades that there are many solutions to this equation in the primes, there is no proof known that uses only the density of the primes, and also no counterexample known that shows that that density is insufficient.

I had a conversation with Sanders after the talk, in which I asked him what he thought the lowest possible density was that guaranteed a progression of length 3. The two natural candidates, given what we know so far, are somewhere around $1/\log N$, and somewhere around $\exp(-c\sqrt{\log N})$. (The latter is the density of the densest known set with no progression of length 3.) Recent work of Schoen and Shkredov, building on Sanders’s ideas, has shown that the equation $x_1+x_2+x_3+x_4+x_5=5y$ has non-trivial solutions in any set of density at least $\exp(-c(\log N)^{1/7})$. I put it to him that the fact that Schoen and Shkredov needed the extra “smoothness” that comes from taking a fivefold sumset on the left-hand side rather than just a twofold one paradoxically casts doubt on the fact that this type of bound is correct for Roth’s theorem. Rather, it suggests that perhaps the smoothness is actually needed. Sanders replied that this was not necessarily the case: while a convolution of two characteristic functions of dense sets can have “gaps”, in the sense of points where the value is significantly less than expected, it is difficult for that value to go all the way down to zero.

That will be a bit too vague to be comprehensible if you are not an additive combinatorialist, so let me try to give a little bit more explanation. Let $A$ be a subset of $\mathbb{Z}_N$ (the integers mod $N$) of density $\delta$. We say that $A$ is $c$-*quasirandom* if the sizes of the intersections $A\cap(A+x)$, which have mean $\delta^2N$, have standard deviation at most $cN$. Now one way for the standard deviation to be small is for most of the intersections to have roughly the same size, but for a few of them to be empty. That is the kind of situation that needs to happen if you want an unexpectedly dense set with no arithmetic progression of length 3. (This exact situation doesn’t have to happen, but I’m trying to convey the general feel of what does.) But in many situations, it seems to be hard to get these empty intersections, rather than merely intersections that are quite a bit smaller than average.

After Sanders’s talk (which is here), I went back to my room. By this time, the stomach bug that I mentioned a few posts ago had struck, which wasn’t very good timing given that the conference banquet was coming up. Before that, I went up to the top of the hotel, where there was a stunning view over much of Seoul, to have a drink with Günter Ziegler and one other person whose name I have forgotten (if you’re reading this, I enjoyed meeting you and apologize for this memory lapse). Günter too had a stomach bug, but like me he had had a similar one shortly before coming to Korea, so neither of us could be sure that Korean food had anything to do with it.

The banquet was notable for an extraordinary Kung Fu performance that was put on for our entertainment. It included things like performers forming a human pyramid that other performers would run up in order to do a backwards somersault, in the middle of which they would demolish a piece of wood with a sharp blow from the foot. It was quite repetitive, but the tricks were sufficiently amazing to bear quite a bit of repetition.

My last memory of ICM2014 was of meeting Artur Avila in the lobby of the hotel at about 5:25am. I was waiting for the bus that would take me to the airport. “Are you leaving too?” I naively asked him. No, he was just getting back from a night on the town.

]]>

As I’ve already mentioned, Day 3 started with Jim Arthur’s excellent lecture on the Langlands programme. (In a comment on that post, somebody questioned my use of “Jim” rather than “James”. I’m pretty sure that’s how he likes to be known, but I can’t find any evidence of that on the web.) The next talk was by Demetrios Christodoulou, famous for some extraordinarily difficult results he has proved in general relativity. I’m not going to say anything about the talk, other than that I didn’t follow much of it, because he had a series of dense slides that he read word for word. The slides may even have been a suitably chopped up version of his article for the ICM proceedings, but I have not been able to check that. Anyhow, after a gentle introduction of about three or four minutes, I switched off.

I switched on again for János Kollár’s lecture, which was, like some of the others, what I feel a plenary lecture should be: a lecture that gives the non-expert a feel for what is important in the area being talked about. The first thing I wrote down was his brief description of the minimal model problem, one of the central questions in algebraic geometry. I think that by that time he had spent a while telling us what algebraic sets were, explaining why the picture you get if you just work over the reals is somewhat incomplete (for example, you may get a graph with two components, when if you work over the extended complex plane you have a torus), and so on.

The minimal model problem is this: given an algebraic variety $X$, find a variety $X^m$ (the “minimal model” of $X$) such that the space of meromorphic functions on $X$ is isomorphic to the space of meromorphic functions on $X^m$ and the geometry of $X^m$ is as simple as possible. The condition that the function spaces are isomorphic seems (from a glance at Wikipedia) to be another way of saying that the two varieties are birationally equivalent, which is a fundamental notion of equivalence in algebraic geometry. So one is trying to find a good representative of each equivalence class.

The problem was solved for curves by Riemann in 1851, for surfaces by Enriques in 1914 and by Kodaira in 1966 (I don’t know exactly what that means, but I suppose Enriques made major inroads into the problem and Kodaira finished it off). And for higher dimensions there was the Mori program of 1981. As I understand it, Mori made huge progress towards understanding the three-dimensional case, and Christopher Hacon and James McKernan, much more recently, made huge progress in higher dimensions.

Another major focus of research is the *moduli problem*. This, Kollár told us, asks what are the simplest families of algebraic varieties, and how can we transform any family into a simplest one? I don’t know what this means, but I would guess that when he said “families of algebraic varieties” he was talking about some kind of moduli space (partly because that seems the most likely meaning, and partly because of the word “moduli” in the name of the problem). So perhaps the problem is sort of like a “family version” of the minimal model problem: you want to find a simplest moduli space that is in some sense similar to the one you started with.

Anyhow, whatever the problem is, it was done for curves by Deligne and Mumford in 1969, for surfaces by Kollár and Shepherd-Barron in 1988 and Alexeev in 1996 (again I don’t know who did what), and apparently in higher dimensions the Kollár-Shepherd-Barron-Alexeev method works, but there are technical details. (Does that mean that Kollár is confident that the method works but that a full proof has not yet been written out? He may well have told us, but my notes don’t tell me now.)

Kollár then explained to us a third problem. A general technique for studying a variety $X$ is to find a variety $X'$ that is birationally equivalent to $X$ and study the question for $X'$ instead. Under these circumstances, there will be lower dimensional subvarieties $Z\subset X$ and $Z'\subset X'$ such that $X\setminus Z\cong X'\setminus Z'$. So one is left needing to answer a similar question for $Z$ and $Z'$, and since these are of lower dimension, one has the basis for an inductive proof. But for that to work, we want $X'$ to be adapted to the problem, so the question, “When is a variety simple?” arises.

Apparently this was not even a precisely formulated question until work of Mori and Reid (1980-2) and Kollár, Miyaoka and Mori (1992). The precise formulation involves the first Chern class.

And that’s all I have, other than a general memory that this lecture continued the generally high standard of plenary lectures at the congress.

At 2pm, Avila gave his Fields medallist’s lecture. As with Hairer, I don’t feel I have much to say that I have not already said when describing the laudatio, so I’ll move on to 3pm, or rather 3:05 — by today the conference organizers had realized that it took a non-zero amount of time to get from one talk to another — when David Conlon was speaking.

David is a former student and collaborator of mine, and quite a bit of what he talked about concerned that collaboration. I’ll very briefly describe our main result.

There are many combinatorial theorems that can be regarded as questions about arbitrary subsets of nice structures such as the complete graph on $n$ vertices or the cyclic group of order $N$. For example, Ramsey’s theorem says that if you 2-colour the edges of the complete graph on $n$ vertices, then (as long as $n$ is large enough) one of the colour classes will contain a complete graph on $k$ vertices. And Szemerédi’s theorem is equivalent to the assertion that for every $\delta>0$ and every positive integer $k$ there exists $N$ such that for every subset $A$ of the integers mod $N$ of size at least $\delta N$ there exist $a$ and $d\ne 0$ such that all of $a, a+d, \dots, a+(k-1)d$ belong to $A$.

For many such questions, one can generalize them from the “nice” structures to arbitrary structures. For instance, one can ask of a given graph $G$ whether if you colour its edges with two colours then one of those colours must contain a complete subgraph with $k$ vertices. Obviously, the answer will be yes for some $G$ and no for others, but to make it an interesting question, one can ask what happens for a *random* $G$. More precisely, how sparse can a random graph be and still have the Ramsey property?

This question was answered in full by Rödl and Ruciński, but our method gives a new proof of the upper bound (on how dense the random graph needs to be), and also gives a very general method that solves many problems of this type that were previously unsolved. For example, for Szemerédi’s theorem it tells us the following. Define a subset $A$ of $\mathbb{Z}_N$ to be $(\delta,k)$-*Szemerédi* if every subset $B\subset A$ of size at least $\delta|A|$ contains an arithmetic progression of length $k$. Then if $C$ is large enough (depending on $\delta$ and $k$ only), a random subset $A$ of $\mathbb{Z}_N$ where elements are chosen independently with probability $p\geq CN^{-1/(k-1)}$ is $(\delta,k)$-Szemerédi with high probability.

This bound is within a constant of best possible, since if the probability dips below $cN^{-1/(k-1)}$, around half the elements of the random set will not even belong to an arithmetic progression of length $k$, so those elements form a dense set that proves that $A$ is not $(\delta,k)$-Szemerédi.

The method David and I used was inspired by the “transference principle” that Green and Tao used to prove their famous result about arithmetic progressions in the primes, though it involved several additional ingredients. A completely different approach was discovered independently by Mathias Schacht. Like ours, his approach established a large number of previously open “sparse random versions” of well-known combinatorial theorems.

David always gives very nice talks, and this one was no exception.

After his talk, I went to hear Nets Katz, with some regret as it meant missing Maria Chudnovsky, who followed on from David in the combinatorics section. I must try to watch the video of her talk some time, though I’m bad at finding time to watch videos on the internet if they last for more than about three minutes.

Nets talked about work related to his famous solution with Larry Guth of the Erdős distance problem. That problem asks how many distinct distances there must be if you have $n$ points in the plane. If you put them evenly spaced along a line, you get $n-1$ distinct distances. You can do a bit better than that by putting them in a grid: because the density of numbers that can be expressed as a sum of two squares is roughly $1/\sqrt{\log n}$, one gets around $n/\sqrt{\log n}$ distinct distances this way.

Erdős asked whether this was anywhere near to being best possible. More precisely, he asked whether there was a lower bound of $n^{1-o(1)}$, and that is what Guth and Katz proved. This was a great result that answered a question that many people had worked on, but it is also notable because the proof was very interesting. One of the main tools they used was the *polynomial method*, which I will not attempt to describe here, but if you are curious, then Terence Tao has posted on it several times. Nets Katz’s talk is here.
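The two constructions mentioned above (points evenly spaced on a line, and points in a square grid) are easy to compare by brute force for small $n$. Here is a minimal sketch of such a check; the function names are my own, and it counts squared distances, which is equivalent for integer points and avoids floating-point issues. It only illustrates the upper-bound constructions and has nothing to do with the Guth-Katz proof.

```python
from itertools import combinations


def distinct_distances(points):
    """Count distinct distances between pairs of integer points.

    Squared distances are compared instead of distances: two distances
    are equal exactly when their squares are equal.
    """
    return len({(x1 - x2) ** 2 + (y1 - y2) ** 2
                for (x1, y1), (x2, y2) in combinations(points, 2)})


def line_points(n):
    """n evenly spaced points on a line."""
    return [(i, 0) for i in range(n)]


def grid_points(m):
    """The m-by-m integer grid, which has n = m * m points."""
    return [(i, j) for i in range(m) for j in range(m)]


if __name__ == "__main__":
    for m in (4, 8, 16):
        n = m * m
        print(n, distinct_distances(line_points(n)),
              distinct_distances(grid_points(m)))
```

For each $n=m^2$ the script prints $n$, the count for the line and the count for the grid, so one can watch the grid pull ahead as $n$ grows.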

Then it was back (quite some way) to the combinatorics room to hear Michael Krivelevich talking about games. (This link is quite hard to find because they’ve accidentally put his name as Michael Krivelerich.) By “games” I mean two-player positional games, which are defined as follows. You have a set $X$ (the board) and a collection $\mathcal{F}$ of subsets of $X$ (the winning positions). There are then two kinds of games that are typically studied. In both kinds, a *move* consists in choosing a point of $X$ that has not yet been chosen. In the first kind of game, the players alternate choosing points and the winner is the first player who can make a set in $\mathcal{F}$ out of his/her points. (If neither player can do this by the time the entire board is filled up, then the result is a draw.) Noughts and crosses (or tic-tac-toe) is an example of this: $X$ is a 3-by-3 grid and $\mathcal{F}$ consists of all lines of three points in that grid.

A well-known argument that goes back (at least) to John Nash when he was thinking about the game of Hex proves that the second player cannot have a winning strategy for this game. The argument, referred to as *strategy stealing*, is as follows. Suppose that the second player does have a winning strategy. Then the first player has a winning strategy as well, which works like this. First choose an arbitrary point $x$ of the board. Then ignore $x$, pretend that your opponent is the first player and play the second player’s winning strategy. If you ever find that you have already played the point that the strategy dictates, then play an arbitrary unoccupied point instead.

This contradiction (a contradiction since it is not possible for both players to have winning strategies) proves that the first player can guarantee a draw, but it is a highly inexplicit argument, so it gives no clue about *how* the first player can do that. An interesting open problem that Krivelevich mentioned relates this to the Hales-Jewett theorem. A consequence of the Hales-Jewett theorem is that if you play noughts and crosses on an $n$-dimensional board where each side has length $k$, then provided $n$ is large enough in terms of $k$, it is not possible for the outcome to be a draw, since there is no 2-colouring of the points of the grid that does not give rise to a monochromatic line. So we know that the first player has a winning strategy. However, no explicit strategy is known, even if $n$ is allowed to be ridiculously large. (I am talking here about general $k$: for small $k$ such as 3, and perhaps even 4, a winning strategy is known for fairly small $n$.)
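For the ordinary 3-by-3 game the board is small enough that one can bypass the inexplicit argument and simply search the whole game tree. The following is a minimal sketch of that brute-force search (the names `LINES`, `winner` and `value` are my own); it is not a technique from the talk, and it becomes hopeless very quickly as the board grows, which is part of why explicit strategies in higher dimensions are hard to come by.

```python
from functools import lru_cache

# The eight winning lines of the 3-by-3 board, with cells numbered 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player occupies a whole line, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, player):
    """Value of the position with `player` to move, from X's point of view.

    +1 means X (the first player) can force a win, 0 means the result is
    a draw with best play, and -1 means O can force a win.
    """
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    if "." not in board:
        return 0
    other = "O" if player == "X" else "X"
    results = [value(board[:i] + player + board[i + 1:], other)
               for i, cell in enumerate(board) if cell == "."]
    return max(results) if player == "X" else min(results)


if __name__ == "__main__":
    print(value("." * 9, "X"))
```

The value of the empty board comes out as 0, a draw, which is consistent with the strategy-stealing conclusion that the first player can at least draw.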

I asked Krivelevich about this problem, and his opinion was that it was probably very hard. The difficulty is that the first player has to devote too much attention to stopping the second player from winning, so cannot concentrate on trying to build up a line.

Another open problem is to find an explicit strategy that proves the following statement: there exist positive integers $t$ and $n_0$ such that for every $n\geq n_0$, if the game is played on the complete graph on $n$ vertices (that is, players are alternately choosing edges), then the first player can create the first clique of size 5 in at most $t$ moves.

A moment I enjoyed in the talk was when Krivelevich mentioned something called the *extra set paradox*, which is the statement that if you add to the set of winning positions, a game that was previously a win for the first player can become a draw.

At first that seemed to me obviously false. When that happens, it is always interesting to try to analyse one’s thoughts and formulate the incorrect proof that has sprung to mind. The argument I had was something like that adding an extra set only increased the options available to the first player, so could not make it harder to win. And that argument is complete garbage, because it increases the options for the second player too. So if, for example, the first player plays as though the extra winning positions didn’t exist, the second player could potentially win by reaching one of those positions. The extra effort required to stop this can potentially (and sometimes does) kill the first player’s winning strategy.

Games of the kind I’ve just been discussing seem to be very hard to analyse, so attention has turned to a different kind of game, called a *maker-breaker game*. Here, the first player’s objective is to occupy a winning position, and the second player’s objective is to stop that happening. Also, the number of moves allotted to the two players is often different: we may allow one player to take $b$ moves for each move that the other player takes.

A typical question looked at is to take a graph property such as “contains a Hamilton cycle” and to try to find the threshold at which breaker can win. That is, if breaker gets $b$ moves for each move of maker, how large does $b$ need to be in order for breaker to be able to stop maker from making a Hamilton cycle? The answer to this, discovered by Krivelevich in 2011, is that the threshold is at $n/\log n$, in the sense that if $b\leq(1-\epsilon)n/\log n$ then maker wins, while if $b\geq(1+\epsilon)n/\log n$ then breaker wins.

What makes this result particularly interesting is that the threshold occurs when the number of edges that maker gets to put down is (approximately) equal to the number of edges a random graph needs to have in order to contain a Hamilton cycle. This is the so-called *random paradigm* that allows one to guess the answers to many of these questions. (It was Erdős who first conjectured that this paradigm should hold.) It seems to be saying that if both players play optimally, then the graph formed by maker will end up looking like a random graph. It is rather remarkable that this has in some sense actually been proved.

Next up, at 6pm (this was a very long day) was the Abel lecture. This is a tradition started in 2010, where one of the last four Abel Prize winners gives a lecture at the ICM. The chosen speaker this time was John Milnor, whose title was “Topology through four centuries.” I did not take notes during this lecture, so I have to rely on my memory. Here’s what I remember. First of all, he gave us a lot of very interesting history. A moment I enjoyed was when he discussed the proof of a certain result and said that he liked it because it was the first example he knew of of the use of Morse theory. A long time ago, when I had very recently got my PhD, I thought about a problem about convex bodies that caused me to look at Milnor’s famous book on Morse theory. I can’t now remember what the problem was, but I think I was trying to think hard about what happens if you take the surface of a symmetric convex body with a sphere inside, gradually shrink it until it is inside the sphere, and look at the intersection of the two surfaces. That gives you (generically) a codimension-1 subset of the sphere that appears, moves about, and eventually vanishes again. That’s exactly the kind of situation studied by Morse theory.

Much more recently, indeed, since the talk, I have had acute personal experience of Morse theory in the outdoor unheated swimming pool where I was staying in France. Because I am worried about setting my heart out of rhythm if I give it too much of a shock, I get into cold swimming pools very slowly, rather than jumping in and getting the discomfort over all at once. This results in what my father describes as a ring of pain: the one-dimensional part of the surface of your body that is not yet used to the water and not safely outside it. Of course, the word “ring” is an oversimplification. Ignoring certain details that are inappropriate for a family post such as this, what I actually experience is initially two rings that after a while fuse to become a figure of eight, which then instantly opens out into a single large ring, to be joined by two more small rings that fuse with the large ring to make a yet larger ring that then becomes a lot smaller before increasing in size for a while and finally shrinking down to a point.

It is clear that if you are given the cross-sections of a surface with all the planes in a certain direction that intersect it, then you can reconstruct the surface. As I understand it, the basic insight of Morse theory is that what really matters if you want to know about the topology of the surface is what happens at the various singular moments such as when there is a figure of eight, or when a ring first appears, etc. The bits in between where the rings are just moving about and minding their own business don’t really affect anything. How this insight plays out in detail I don’t know.

As one would expect from Milnor, the talk was a beautiful one. In traditional fashion, he talked about surfaces, then 3-manifolds, and finally 4-manifolds. I think he may even have started in one dimension with a discussion of the bridges-of-Königsberg problem, but my memory of that is hazy. Anyhow, an indication of just how beautiful the talk was is what happened at the end. He misjudged the time, leaving himself about two minutes to discuss 4-manifolds. So he asked the chairman what he should do about it, and the chairman (who was Helge Holden) told him to take as much time as he wanted. Normally that would be the cause for hate rays to emanate towards the chairman and the speaker from the brains of almost the entire audience. But with this talk, the idea of missing out on the 4-manifold equivalent of what we had just heard for 2-manifolds and 3-manifolds was unthinkable, and there was a spontaneous burst of applause for the decision. I’ve never seen anything like it.

The one other thing I remember was a piece of superhuman modesty. When Milnor discussed examples of extraordinary facts about differentiable structures on 4-manifolds, the one he mentioned was the fact that there are uncountably many distinct such structures on $\mathbb{R}^4$, which was discovered by Cliff Taubes. The way Milnor presented it, one could have been forgiven for thinking that the fact that there can be distinct differentiable structures on a differentiable manifold was easy, and the truly remarkable thing was getting uncountably many, whereas in fact one of Milnor’s most famous results was the first example of a manifold with more than one differentiable structure. (The result of Taubes is remarkable even given what went before it: the first exotic structures on $\mathbb{R}^4$ were discovered by Freedman and Kirby.)

Just to finish off the description of the day, I’ll mention that in the evening I went to a reception hosted by the Norwegians (so attending the Abel lecture was basically compulsory, though I’d have done so anyway). Two things I remember about that are a dish that contained a high density of snails and the delightful sight of Maryam Mirzakhani’s daughter running about in a forest of adult legs. Then it was back to my hotel room to try to gather energy for one final day.

The next morning kicked off (after breakfast at the place on the corner opposite my hotel, which served decent espressos) with Jim Arthur, who gave a talk about the Langlands programme and his role in it. He told us at the beginning that he was under strict instructions to make his talk comprehensible — which is what you are supposed to do as a plenary lecturer, but this time it was taken more seriously, which resulted in a higher than average standard. Ingrid Daubechies deserves a lot of credit for that. He explained that in response to that instruction, he was going to spend about two thirds of his lecture giving a gentle introduction to the Langlands programme and about one third talking about his own work. In the event he messed up the timing and left only about five minutes for his contribution, but for everybody except him that was just fine: we all knew he was there because he had done wonderful work, and most of us stood to learn a lot more from hearing about the background than from hearing about the work itself.

I’ve made a few attempts to understand the Langlands programme — not by actually studying it, you understand, but by attending general-audience talks or reading general-audience articles. It’s a bit of a two-steps-forward (during the talk) and one-step-back (during the weeks and months after the talk) process, but this was a very good lecture and I really felt I learned things from it. Some of them I immediately forgot, but have in my notes, and perhaps I’ll fix them slightly better in my brain by writing about them here.

For example, if you had asked me what the central problem in algebraic number theory is, I would never have thought of saying this. Given a fixed polynomial $f$ with integer coefficients and a prime $p$, we can factorize $f$ into irreducibles over the field $\mathbb{F}_p$. It turns out to be inconvenient if any of these irreducible factors occurs with multiplicity greater than 1, so an initial assumption is that $f$ has distinct roots over $\mathbb{C}$ (or at least I think that’s the assumption). [Insert: looking at my notes, I realize that a better thing to write is $E$, the splitting field of $f$, rather than $\mathbb{C}$, though I presume that that gives the same answer.] But even then, it may be that over some primes there are repeated irreducible factors. The word “ramified”, which I had always been slightly scared of, comes in here. I can’t remember what ramifies over what, or which way round is ramified and which unramified, so let me quickly look that up. Hmm, that was harder than I expected, because the proper definition is to do with rings, extension fields and the like. But it appears that “ramified” refers to the case where you have multiplicity greater than 1 somewhere. For the purposes of this post, let’s say that a prime $p$ is ramified (I’ll take the polynomial $f$ as given) if $f$ has an irreducible factor over $\mathbb{F}_p$ with multiplicity greater than 1. The main point to remember is that the set of ramified primes is small. I think Arthur said that it was always finite.

So what is the fundamental problem of algebraic number theory? Well, when you decompose a polynomial into irreducible factors, those factors have degrees. If the degree of $f$ is $n$, then the degrees of the irreducible factors form a partition of $n$: that is, a collection of positive integers that add up to $n$. The question is this: which (unramified) primes give rise to which partitions of $n$?

How on earth is *that* the fundamental problem of algebraic number theory? What’s interesting about it? Aren’t number theorists supposed (after a long and circuitous route) to be solving Diophantine equations and things like that?

Arthur gave us a pretty convincing partial answer to these questions by discussing the example $f(x)=x^2+1$. The splitting field is $\mathbb{Q}(i)$ (that is, rational linear combinations of 1 and $i$) and the only ramified prime is 2. (The reason 2 is ramified is that over $\mathbb{F}_2$ we have $x^2+1=(x+1)^2$.)

Since the degree of $f$ is 2, the two partitions of the degree are $\{2\}$ and $\{1,1\}$. The first occurs if and only if $x^2+1$ cannot be factorized over $\mathbb{F}_p$, which is the same as saying that $-1$ is not a quadratic residue. So in this case, the question becomes, “For which odd primes $p$ is $-1$ a quadratic residue?” to which the answer is, famously, all primes congruent to 1 mod 4. So Arthur’s big grown-up question is a generalization of a familiar classical result of number theory.
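The classical fact just quoted is easy to check by brute force for small primes. The sketch below assumes the example polynomial is $x^2+1$, whose roots mod $p$ are exactly the square roots of $-1$; the function names are my own. It computes the partition of the degree directly by looking for roots, and compares it with Euler’s criterion and with the residue of $p$ mod 4.

```python
def is_prime(n):
    """Trial-division primality test, fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def splitting_pattern(p):
    """Degrees of the irreducible factors of x^2 + 1 over F_p, p an odd prime.

    A quadratic factorizes over F_p exactly when it has a root there, and
    for odd p the two roots of x^2 + 1 (if any) are distinct.
    """
    has_root = any((x * x + 1) % p == 0 for x in range(p))
    return (1, 1) if has_root else (2,)


def minus_one_is_residue(p):
    """Euler's criterion: -1 is a square mod an odd prime p iff (-1)^((p-1)/2) = 1 mod p."""
    return pow(p - 1, (p - 1) // 2, p) == 1


if __name__ == "__main__":
    for p in range(3, 60):
        if is_prime(p):
            print(p, p % 4, splitting_pattern(p), minus_one_is_residue(p))
```

In the output, the pattern `(1, 1)` appears exactly for the primes that are 1 mod 4, and `(2,)` for those that are 3 mod 4.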

To answer the question for quadratic polynomials, Gauss’s law of quadratic reciprocity is a massive help. I think it is correct to say that the Langlands programme is all about trying to find vast generalizations of quadratic reciprocity that will address the far more general question about the degrees of irreducible factors of arbitrary polynomials. But perhaps it is more general still — at the time of writing I’m not quite sure.

Actually, I think I am sure. One thing Arthur described was Artin L-functions, which are a way of packaging up the data I’ve just described. Here is the definition he gave. You start with a representation $r$ of the Galois group of $f$. For simplicity he assumed that the Galois group was actually $S_n$ (where $n$ is the degree of $f$). Then for each unramified prime $p$ the partition of $n$ you get can be thought of as the cycle type of a permutation and thus as a conjugacy class in $S_n$. The image of this conjugacy class under $r$ is a conjugacy class in $GL_N(\mathbb{C})$, which is denoted by $c_p(r)$. The Artin L-function is then defined to be

$$L(s,r)=\prod_{p\ \text{unramified}}\det\bigl(1-c_p(r)p^{-s}\bigr)^{-1}.$$

It is easy to see that the determinant is well-defined: it follows from the fact that conjugate linear maps have the same determinant.

If you expand out this product, you get a Dirichlet series, of which this is the Euler product. And Dirichlet series that have Euler products are basically L-functions. Just as the Riemann zeta function packages up lots of important information about the primes, so the Artin L-functions package up lots of important information about the fundamental problem of algebraic number theory discussed earlier.

One interesting thing that Arthur told us was that in order to do research in this area, you have to use results from many different areas. This makes it difficult to get started, so most young researchers start by scouring the textbooks for the key theorems and using them as black boxes, understanding them fully only much later.

For example, certain Riemannian manifolds are particularly important, because automorphic forms come from solutions to differential equations (based on the Laplacian) on those manifolds. Arthur didn’t tell us exactly what these “special Riemannian manifolds” were, but he did say that they corresponded to reductive algebraic groups. (An algebraic group is roughly speaking a group defined using polynomials. For example, $SL_n(\mathbb{R})$ is algebraic, because the condition of having determinant 1 is expressible as a polynomial in the entries of a matrix, and the group operation, matrix multiplication, is also a polynomial operation. What “reductive” means I don’t know.) He then said that many beginners memorize ten key theorems about reductive algebraic groups and don’t bother themselves with the proofs.

Where does Langlands come into all this? He defined some L-functions that have a formula very similar to the formula for Artin L-functions: in fact, all you have to do is replace the $r$ in that formula with a $\pi$. So a lot depends on what $\pi$ is. Apparently it’s an automorphic representation. I’m not sure what those are.

A big conjecture is that every arithmetic L-function is an automorphic L-function. This would give us a non-Abelian class field theory. (Classical class field theory studies Abelian field extensions, and can tell you things like which numbers are cubic residues mod $p$.)

This conjecture is a special case of Langlands’s famous principle of functoriality, which Arthur described as *the* fundamental problem. (OK, I’ve already described something else as the fundamental problem, but this is somehow the *real* fundamental problem.) I can’t resist stating the problem, because it looks as though it ought to be easy. I can imagine getting hooked on it in a parallel life, because it screams out, “Think about me in the right way and I’ll drop out.” Of course, that’s a very superficial impression, and probably once one actually does think about it, one quickly loses any feeling that it should at some sufficiently deep level be easy.

The principle says this.

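**Conjecture.** *Given two groups $G$ and $G'$, an automorphic representation $\pi$ of $G$ and an analytic homomorphism between their dual groups*

$$\rho:\hat{G}\to\hat{G'}$$

there is an automorphic representation $\pi'$ of $G'$ such that $c(\pi')=\rho(c(\pi))$; that is,

$$c_p(\pi')=\rho(c_p(\pi))$$

*as conjugacy classes in $\hat{G'}$.*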

To me it looks like the kind of trivial-but-not-trivially-trivial statement one proves in a basic algebra course, but obviously it is far more than that.

One quite nice thing that Arthur did was to draw an extended analogy with a situation that held in physics a century or so ago. It was observed that the absorption spectra of starlight had black lines where certain frequencies were absent, and these corresponded to the wavelengths emitted by familiar elements. This suggested that the chemistry of stars was similar to the chemistry on earth. Furthermore, because these absorption spectra were red-shifted to various extents, it also suggested that the stars were moving away from us, and ultimately suggested the Big Bang theory. However, exactly *why* these black lines appeared was a mystery, which was not solved until the formulation of quantum mechanics.

Something like this is how Arthur sees number theory today. Automorphic forms tell us about other number-theoretic worlds. Spectra come from differential equations that are quite similar to the Schrödinger equation — in particular, they are based on Laplacians — that come from the geometry of the special Riemannian manifolds I mentioned above. But exactly how the connection between the number theory and the spectral theory works is still a mystery.

To end on a rather different note, the one other thing I got out of this excellent talk was to see Gerhard “I was at ICM2014” Paseman, of Mathoverflow fame. Later I even got to meet him, and he gave me a Mathoverflow teeshirt. I became aware of him because there were some small technical problems during the talk, and GP offered advice from the audience.

That’s a fairly easy question, so let’s follow it up with another one: how surprised should we be about this? Is there unconscious bias towards mathematicians with this property? Of this year’s 21 plenary lecturers, the only one with the property was Mirzakhani, and out of the 20 plenary lecturers in 2010, the only one with the property was Avila. What is going on?

On to more serious matters. After Candès’s lecture I had a solitary lunch in the subterranean mall (Korean food of some description, but I’ve forgotten exactly what) and went to hear Martin Hairer deliver his Fields medal lecture, which I’m not going to report on because I don’t have much more to say about his work than I’ve already said.

By and large, the organization of the congress was notably good — for example, I almost never had to queue for anything, and never for any length of time — but there was a little lapse this afternoon, in that Hairer’s lecture was scheduled to finish at 3pm, exactly the time that the afternoon’s parallel sessions started. In some places that might have been OK, but not in the vast COEX Seoul conference centre. I had to get from the main hall to a room at the other end of the centre where theoretical computer science talks were taking place, which was probably about as far as walking from my house in Cambridge to the railway station. (OK, I live close to the station, but even so.)

Inevitably, therefore, I arrived late to Boaz Barak’s talk, but he welcomed me, and a few others in my position, with the reassuring words that everything he had said up to now was bullshit and we didn’t need to worry about it. (He was quoting a juggler he had seen in Washington Square.)

I always like it when little themes recur at ICMs in different contexts. I’ve already mentioned the theme of looking at big spaces of objects in order to understand typical objects. Another one I mentioned when describing Candès’s lecture: that one should not necessarily be afraid of NP-complete problems, a theme which was present in Barak’s talk as well. I’m particularly fond of it because I’ve spent a lot of time in the last few years thinking about the well-known NP-complete problem where the input is a mathematical statement and the task (in the decision version) is to say whether there is a proof of that statement of length at most $n$ (in some appropriate formal system). The fact that this problem is NP-complete does not deter mathematicians from spending their lives solving instances of it. What explains this apparent success? I dream that there might be a very nice answer to this question, rather than just a hand-wavy one that says that the instances studied by mathematicians are far from general.

Barak was talking about something a little different, however. He too has a dream, which is to obtain a very precise understanding of why certain problems are hard (in the complexity sense of not being soluble with efficient algorithms) and others easy. He is not satisfied with mere lists of easy and hard problems, with algorithms for the former and reductions to NP-complete or other “known hard” problems for the latter. He wants a theory that will say which problems are hard and which easy, or at least do that for large classes of problems. And the way he wants to do it is to find “meta-algorithms” — which roughly speaking means very general algorithmic approaches with the property that if they work then the problem is easy and if they fail then it’s hard.

Why is there the slightest reason to think that that can be done? Isn’t there a wide variety of algorithms, each of which requires a lot of ingenuity to find? If one approach fails, might there not be some clever alternative approach that nobody had thought of?

These are all perfectly reasonable objections, but the message, or at least *a* message, of Barak’s talk is that it is not completely outlandish to think that it really is the case that there is what one might call a “best possible” meta-algorithm, in the sense that if it fails, then nothing else can succeed. Again, I stress that this would be for large and interesting classes of algorithm problems (e.g. certain optimization problems) and not for every single describable Boolean function. One reason to hold out hope is that if you delve a little more deeply into the algorithms we know about, you find that actually many of them are based on just a few ideas, such as linear and semidefinite programming, solving simultaneous linear equations, and so on. Of course, that could just reflect our lack of imagination, but it could be an indication that something deeper is going on.

Another reason for optimism is that he has a candidate: the sum-of-squares algorithm. This is connected with Hilbert’s 17th problem, which asked whether every multivariate polynomial that takes only non-negative values can be written as a sum of squares of rational functions. (It turns out that they can’t necessarily be written as a sum of squares of polynomials: a standard counterexample is the Motzkin polynomial $x^4y^2+x^2y^4-3x^2y^2+1$.) An interesting algorithmic problem is to write a polynomial as a sum of squares when it can be so written. One of the reasons this problem interests Barak is that many other problems can be reduced to it. Another, which I don’t properly understand but I think I would understand if I watched his talk again (it is here, by the way), is that if the unique games conjecture is false, and recall that it too is sort of saying that a certain algorithm is best possible, then the sum-of-squares algorithm is waiting in the wings to take over as the new candidate that will do the job.
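
A small worked check may help here. It is not taken from the talk, and the choice of example is an assumption for illustration: the Motzkin polynomial is the standard example of a polynomial that is non-negative everywhere but is not a sum of squares of polynomials. Its non-negativity follows from the AM-GM inequality applied to the three monomials $x^4y^2$, $x^2y^4$ and $1$:

$$\frac{x^4y^2+x^2y^4+1}{3}\;\geq\;\sqrt[3]{x^4y^2\cdot x^2y^4\cdot 1}\;=\;x^2y^2,\qquad\text{so}\qquad x^4y^2+x^2y^4-3x^2y^2+1\;\geq\;0.$$

By contrast, a polynomial such as $x^2-2xy+2y^2=(x-y)^2+y^2$ carries its own certificate of non-negativity, and finding certificates of that kind is the algorithmic problem in question.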

An unfortunate aspect of going to Barak’s talk was that I missed Harald Helfgott’s. However, the sacrifice was rewarded, and I can always watch Harald on YouTube.

After another longish walk, but with a non-zero amount of time for it, I arrived at my next talk of the afternoon, given by Bob Guralnick. This was another very nice talk, just what an ICM invited lecture should be like. (By that I mean that it should be aimed principally at non-experts, while at the same time conveying what has been going on recently in the field. In other words, it should be more of a colloquium style talk than a research seminar.)

Guralnick’s title was Applications of the Classification of Finite Simple Groups. One thing he did was talk about the theorem itself, how a proof was announced in 1983 but not actually completed for another twenty years, and how there are now — or will be soon — “second-generation” proofs that are shorter, though still long, and use new ideas. He also mentioned a few statements that can be proved with the classification theorem and are seemingly completely out of reach without it. Here are a few of them.

1. Every finite simple group is generated by two elements.

2. The probability that a random pair of elements generates a finite simple group tends to 1 as the size of the group tends to infinity.

3. For every non-identity element $x$ of a finite simple group, there exists an element $y$ such that $x$ and $y$ generate the group.

4. For every finite simple group there exist conjugacy classes $C$ and $D$ such that for every $x\in C$ and every $y\in D$ the elements $x$ and $y$ generate the group.
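
Statements like these can at least be checked by brute force in a small case. The following is a minimal sketch, assuming we take the alternating group $A_5$ (the smallest non-abelian finite simple group) as the test case; the function names are hypothetical, and of course a finite check of one group says nothing about how the general theorems are proved.

```python
from itertools import permutations

N = 5  # we work inside the symmetric group on {0, 1, 2, 3, 4}

def compose(p, q):
    """Permutation obtained by applying q first and then p."""
    return tuple(p[q[i]] for i in range(N))

def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    inversions = sum(p[i] > p[j] for i in range(N) for j in range(i + 1, N))
    return inversions % 2 == 0

def generated_subgroup(generators):
    """Close the identity under left multiplication by the generators.
    In a finite group this already gives the whole generated subgroup."""
    start = tuple(range(N))
    seen = {start}
    frontier = [start]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

A5 = [p for p in permutations(range(N)) if is_even(p)]
identity = tuple(range(N))

# For each x, record the elements y such that x and y together generate A5.
partners = {
    x: [y for y in A5 if len(generated_subgroup((x, y))) == len(A5)]
    for x in A5
}

generating_pairs = sum(len(ys) for ys in partners.values())
probability = generating_pairs / len(A5) ** 2
every_element_has_partner = all(partners[x] for x in A5 if x != identity)

print(generating_pairs, probability, every_element_has_partner)
```

For $A_5$ the probability in statement 2 is still well short of 1, as one would expect for such a small group, while statement 3 can be verified directly from the `partners` table.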

Why does the classification of finite simple groups help with these problems? Because it means that instead of having to give an abstract proof that somehow uses the condition of having no proper normal subgroups, you have the option of doing a proof that involves calculations in concrete groups. Because the list of (families of) groups you have to consider is finite, this is a feasible approach. Actually, it’s not just that there are only finitely many families, but also that the families themselves are very nice, especially the various families of Lie type. As far as I can tell from the relevant Wikipedia article, there isn’t a formal definition of “group of Lie type”, but basically it means a group that’s like a Lie group but defined over a finite field instead of over $\mathbb{R}$ or $\mathbb{C}$. So things like PSL$(2,q)$ are finite simple groups of Lie type.

Just as the geometrization theorem didn’t kill off research in 3-manifolds, the classification of finite simple groups didn’t kill off group theory, even though in the past many mathematicians have thought that it would. It’s easy to see how that perception might have arisen: the project of classifying finite simple groups became such a major focus for group theorists that once it was done, a huge chunk of what they were engaged in was no longer available.

So what’s left? One answer, one might imagine, is that not all groups are simple. That is not a completely satisfactory answer, because groups can be put together from simple groups in such a way that for many problems it is enough to solve them just for simple groups (just as in number theory one can often prove a result for primes and prove that the product of two numbers that satisfy the result also satisfies the result). But it is part of the answer. For example $p$-groups (that is, groups of prime power order) are built out of copies of a cyclic group of prime order, but that doesn’t begin to answer all the questions people have about $p$-groups.

Another answer, which is closer to the reason that 3-manifold theory survived Perelman, is that proving results even for specific families of groups is often far from easy. For example, have a go at proving that a random pair of (equivalence classes of) matrices generates PSL$(2,q)$ with high probability when $q$ is large: it’s a genuine theorem rather than simply a verification.

I want to mention a very nice result that I think is due to Guralnick and his co-authors, though he didn’t quite explicitly say so. Let be a polynomial of degree , with coprime to . Then for every , either is bijective on the field or the set of values it takes has size at most .

What’s so nice about that? Well, the result is interesting, but even more interesting (at least to me) is the fact that the proof involved the classification of finite simple groups, and Guralnick described it (or more accurately, a different result just before it but I think the same remark applies) as untouchable without CFSG, even though the statement is about polynomials rather than groups.

Here is the video of Guralnick’s lecture.

The third invited lecture I went to was given by Francis Brown. Although I was expecting to understand very little, I wanted to go to it out of curiosity, because I knew Francis Brown when he was an undergraduate at Trinity (I think I taught him once or twice). After leaving Trinity he went to France, where he had been ever since, until very recently taking up a (highly prestigious) professorial fellowship at All Souls in Oxford. It was natural for him to go to France, because his mother is French and he is bilingual, another aspect that interests me since two of my children are in the same position. I heard nothing of him for a long time, but then in the last few years he suddenly popped up again as the person who has proved some important results concerning motivic zeta functions.

The word “motivic” scares me, and I’m not going to try to say what it means, because I can’t. I first heard of motives about twenty years ago, when the message I got was that they were objects that people studied even though they didn’t know how to define them. That may be a caricature, but my best guess as to the correct story is that even though people don’t know the right definition, they do know what *properties* this definition should have. In other words, there is a highly desirable theory that would do lots of nice things, if only one could find the objects that got you started.

However, what Brown was doing appeared not to be based on layers of conjecture, so I suppose it must be that “motivic versions” of certain objects have been shown to exist.

This was a talk in which I did not take notes. To do a decent job describing it, I’d need to watch it again, but rather than do that, I’ll just describe the traces it left in my memory.

One was that he mentioned the famous problem of the irrationality of $\zeta(n)$ for odd $n$, and more generally the problem of whether the vector space over the rationals generated by $\zeta(3),\zeta(5),\dots,\zeta(2n+1)$ has dimension $n$. (It has been shown by Ball and Rivoal to have dimension that tends to infinity with $n$, which was a major result when it was proved.)

Another was that he defined multiple zeta values, which are zeta-like functions of more than one integer variable, which come up naturally when one takes two zeta values, multiplies them together, and expands out the result. They were defined by Euler.
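
To see how multiple zeta values arise from products, here is the simplest case, written with the convention $\zeta(a,b)=\sum_{m>n\geq 1}m^{-a}n^{-b}$. That convention is an assumption made for this illustration, since authors differ on the order of the arguments. For integers $a,b\geq 2$, splitting the double sum according to whether $m>n$, $m<n$ or $m=n$ gives

$$\zeta(a)\zeta(b)=\sum_{m,n\geq 1}\frac{1}{m^a n^b}=\zeta(a,b)+\zeta(b,a)+\zeta(a+b).$$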

He also talked about periods, a very interesting concept defined (I think for the first time) by Kontsevich and Zagier. I highly recommend looking at their paper, available here in preprint form. At least the beginning of it is accessible to non-experts, and contains a very interesting open problem. Roughly speaking, a period is anything you can define using reasonably nice integrals. For example, $\pi$ is a period because it is the area of the unit disc, which has a nice polynomial equation $x^2+y^2\leq 1$. The nice problem is to prove that an explicit number is not a period. There are only countably many periods, so such numbers exist in abundance. If you want a specific number to try, then you can begin with $e$. Best of luck.
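
For concreteness, here are two numbers written as integrals of rational functions over regions cut out by polynomial inequalities, which is the general shape of the definition (this is a sketch of the idea, not the precise definition in the paper):

$$\pi=\iint_{x^2+y^2\leq 1}dx\,dy,\qquad \log 2=\int_1^2\frac{dx}{x}.$$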

While discussing what motivic zeta values are, he said that there were two approaches one could use, one involving Betti numbers and the other involving de Rham cohomology. He preferred the de Rham approach. “Betti” and “de Rham” became a sort of chorus throughout the talk, and even now I have ringing in my head phrases like “-Betti or -de Rham”.

If I understood correctly, linear dependences between motivic zeta values (which are much fancier objects that still depend on tuples of integers) imply the corresponding dependences between standard zeta values. (I’m talking about both single and multiple zeta values here.) That’s not much help if you are trying to prove *independence* of standard zeta values, but it does do two things for you. One is that it provides a closely related context in which the world seems to be a tidier place. As I understand it, all the conjectures one wants to be true for standard zeta values are true for their motivic cousins. But it has also enabled Brown to discover unexpected dependences between standard zeta values: for instance every multiple zeta value is a linear combination of multiple zeta values where every argument is 2 or 3. (I suppose multiple must mean “genuinely multiple” here.) Actually, looking very briefly at the relevant part of the talk, which is round about the 27th minute, I see that this was proving something called the Hoffman conjecture, so perhaps it is wrong to call it unexpected. But it is still a very interesting result, given that the proof was highly non-trivial and went via motivic zeta values.

My remaining memory trace is that the things Brown was talking about were related to a lot of other important parts of mathematics, and even theoretical physics. I’d love to understand this kind of thing better.

So although a lot of this talk (which is here) went over my head, enough of it didn’t that my attention was engaged throughout. Given the type of material, that was far from obviously going to be the case, so this was another very good talk, to round off a pretty amazing day.

One person who doesn’t lose any sleep over doubts like this is Emmanuel Candès, who gave the second plenary lecture I went to. He began by talking a little about the motivation for the kinds of problems he was going to discuss, which one could summarize as follows: his research is worthwhile because *it helps save the lives of children*. More precisely, it used to be the case that if a child had an illness that was sufficiently serious to warrant an MRI scan, then doctors faced the following dilemma. In order for the image to be useful, the child would have to keep completely still for two minutes. The only way to achieve that was to stop the child’s breathing for those two minutes. But depriving a child’s brain (or indeed any brain, I’d imagine) of oxygen for two minutes is not without risk, to put it mildly.

Now, thanks to the famous work of Candès and others on compressed sensing, one can reconstruct the image using many fewer samples, which reduces the time the child must keep still to 15 seconds. Depriving the brain of oxygen for 15 seconds is not risky at all. Candès told us about a specific boy who had something seriously wrong with his liver (I’ve forgotten the details) who benefited from this. If you want a ready answer for when people ask you about the point of doing maths, and if you’re sick of the Hardy-said-number-theory-useless-ha-ha-but-what-about-public-key-cryptography-internet-security-blah-blah example, then I recommend watching at least some of Candès’s lecture, which is available here, and using that instead. Then you’ll really have seized the moral high ground.

Actually, I recommend watching it *anyway*, because it was a fascinating lecture from start to finish. In that case, you may like to regard this post as something like a film review with spoilers: if you mind spoilers, then you’d better stop reading here.

I have to admit that as I started this post, I realized that there was something fairly crucial that I didn’t understand, that meant I couldn’t give a satisfactory account of what I wanted to describe. I didn’t take many notes during the talk, because I just wanted to sit back and enjoy it, and it felt as though I would remember everything easily, but there was one important mathematical point that I missed. I’ll come back to it in a moment.

Anyhow, the basic mathematical problem that the MRI scan leads to is this. A full scan basically presents you with the Fourier transform of the image you want, so to reconstruct the image you simply invert the Fourier transform. But if you are sampling in only two percent of directions and you take an inverse Fourier transform (it’s easy to make sense of that, but I won’t bother here), then you get a distorted image with all sorts of strange lines all over it — Candès showed us a picture — and it is useless for diagnostic purposes.

So, in a moment that Candès described as one of the luckiest in his life, a radiologist approached him and asked if there was any way of getting the right image from the much smaller set of samples. On the face of it, the answer might seem to be no, since the dimension of the space of possible outputs has become much smaller, so there must be many distinctions between inputs that are not detectable any more. However, in practice the answer is yes, for reasons that I’ll discuss after I’ve mentioned Candès’s second example.

The second example was related to things like the Netflix challenge, which was to find a good way of predicting which films somebody would like, given the preferences of other people and at least some of the preferences of the person in question. If we make the reasonable hypothesis that people’s preferences depend by and large on a fairly small number of variables (describing properties of the people and properties of the films), then we might expect that a matrix where the $(i,j)$th entry represents the strength of preference of person $i$ for film $j$ would have fairly small rank. Or more reasonably, one might expect it to be a small perturbation of a matrix with small rank.

And thus we arrive at the following problem: you are given a few scattered entries of a matrix, and you want to find a low-rank matrix that agrees pretty well with the entries you observe. Also, you want the low-rank matrix to be unique (up to a small perturbation) since otherwise you can’t use it for prediction.

As Candès pointed out, simple examples show that the uniqueness condition cannot always be obtained. For example, suppose you have 99 people with very similar preferences and one person whose preferences are completely different. Then the underlying matrix that describes their preferences has rank 2 — basically, one row for describing the preferences of the 99 and one for describing the preference of the one eccentric outlier. If all you have is a few entries for the outlier’s preferences, then there is nothing you can do to guess anything else about those preferences.

However, there is a natural assumption you can make, which I’ve now forgotten, that rules out this kind of example, and if a matrix satisfies this assumption then it can be reconstructed exactly.

Writing this, I realize that Candès was actually discussing a slight idealization of the problem I’ve described, in that he didn’t have perturbations. In other words, the problem was to reconstruct a low-rank matrix exactly from a few entries. An obvious necessary condition is that the number of samples should exceed the number of degrees of freedom of the set of low-rank matrices. But there are other conditions such as the one I’ve mentioned, and also things like that every row and every column should have a few samples. But given those conditions (or perhaps the sampling is done at random — I can’t remember) it turns out to be possible to reconstruct the matrix exactly.

The MRI problem boils down to something like this. You have a set of linear equations to solve (because you want to invert a Fourier transform) but the number of unknowns is significantly larger than the number of equations (because you have a sparse set of samples of the Fourier transform you want to invert). This is an impossible problem unless you make some assumption about the solution, and the assumption Candès makes is that it should be a *sparse vector*, meaning that it has only a few non-zero entries. This reduces the number of degrees of freedom considerably, but the resulting problem is no longer pure linear algebra.

The point that I missed was what sparse vectors have to do with MRI scans, since the image you want to reconstruct doesn’t appear to be a sparse vector. But looking back at the video I see that Candès addressed this point as follows: although the *image* is not sparse, the *gradient* of the image is sparse. Roughly speaking, you get quite a lot of patches of fairly constant colour, and if you assume that that is the case, then the number of degrees of freedom in the solution goes right down and you have a chance of reconstructing the image.

Going back to the more general problem, there is another condition that is needed in order to make it soluble, which is that the matrix of equations should not have too many sparse rows, since typically a sparse row acting on a sparse vector will give you zero, which doesn’t help you to work out what the sparse vector was.

I don’t want to say too much more, but there was one point that particularly appealed to me. If you try to solve these problems in the obvious way, then you might try to find algorithms for solving the following problems.

1. Given a system of underdetermined linear equations, find the sparsest solution.

2. Given a set of entries of a matrix, find the lowest rank matrix consistent with those entries.

Unfortunately, no efficient algorithms are known for these problems, and I think in the second case it’s even NP-complete. However, what Candès and his collaborators did was consider *convex relaxations* of these problems.

1. Given a system of underdetermined linear equations, find the solution with smallest $\ell_1$ norm.

2. Given a set of entries of a matrix, find the matrix with smallest nuclear norm consistent with those entries.
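
In symbols, the two relaxed problems can be written as follows. The notation is an assumption introduced for illustration and does not come from the talk: $A$ is the matrix of the linear system, $b$ its right-hand side, $M$ the matrix being sampled and $\Omega$ the set of positions where its entries are known.

$$\min_x\ \|x\|_1\ \ \text{subject to}\ \ Ax=b,\qquad\qquad \min_X\ \|X\|_*\ \ \text{subject to}\ \ X_{ij}=M_{ij}\ \text{for all}\ (i,j)\in\Omega.$$

Here $\|x\|_1=\sum_i|x_i|$ stands in for the number of non-zero entries of $x$, and the nuclear norm $\|X\|_*$ stands in for the rank of $X$.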

If you don’t know what the nuclear norm is, it’s simple to define. Whereas the rank of a matrix $A$ is the smallest number of rank-1 matrices such that $A$ is a linear combination of those matrices, the nuclear norm of $A$ is the minimum of $\sum_i|\lambda_i|$ such that you can write $A=\sum_i\lambda_iB_i$ with each $B_i$ a rank-1 matrix of norm 1. So it’s more like a quantitative notion of rank.
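
Here is a minimal sketch of the difference between rank and nuclear norm, restricted to $2\times 2$ matrices so that no linear algebra library is needed. It assumes the standard fact that the nuclear norm equals the sum of the singular values $\sigma_1+\sigma_2$, and uses the identities $\sigma_1^2+\sigma_2^2=\sum_{i,j}a_{ij}^2$ and $\sigma_1\sigma_2=|\det A|$. The function names are hypothetical.

```python
import math

def nuclear_norm_2x2(m):
    """Sum of the two singular values of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    frobenius_sq = a * a + b * b + c * c + d * d  # sigma1^2 + sigma2^2
    abs_det = abs(a * d - b * c)                  # sigma1 * sigma2
    # (sigma1 + sigma2)^2 = sigma1^2 + sigma2^2 + 2 * sigma1 * sigma2
    return math.sqrt(frobenius_sq + 2 * abs_det)

def rank_2x2(m):
    """Exact rank of a 2x2 matrix with integer or exactly represented entries."""
    (a, b), (c, d) = m
    if a * d - b * c != 0:
        return 2
    return 1 if any((a, b, c, d)) else 0

identity = [[1, 0], [0, 1]]
all_ones = [[1, 1], [1, 1]]
nearly_rank_one = [[1, 0], [0, 0.001]]

for name, m in [("identity", identity), ("all ones", all_ones),
                ("nearly rank one", nearly_rank_one)]:
    print(name, rank_2x2(m), nuclear_norm_2x2(m))
```

The third matrix is the interesting one: its rank is 2, exactly like the identity, but its nuclear norm is only slightly bigger than that of the rank-1 matrix obtained by deleting the tiny entry, which is the sense in which the nuclear norm is a quantitative version of rank.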

It’s a standard fact that convex relaxations of problems tend to be much easier than the problems themselves. But usually that comes at a significant cost: the solutions you get out are not solutions of the form you originally wanted, but more like convex combinations of such solutions. (For example, if you relax the graph-colouring problem, you can solve the relaxation but you get something called a fractional colouring of your graph, where the total amount of each colour at two adjacent vertices is at most 1, and that can’t easily be converted into a genuine colouring.)

However, in the cases that Candès was telling us about, it turns out that if you solve the convex relaxations, you get exactly correct solutions to the original problems. So you have the following very nice situation: a problem is NP-complete, but if you nevertheless go ahead and try to solve it using an algorithm that is doomed to fail in general, the algorithm still works in a wide range of interesting cases.

At first this seems miraculous, but Candès spent the rest of the talk explaining to us why it isn’t. It boiled down to a very geometrical picture: you have a convex body and a plane through one of its extreme points, and if the plane is tangent to the body then the algorithm will work. It is this geometrical condition that underlies the necessary conditions I mentioned earlier.
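
The geometrical picture can be seen in the smallest possible case: one linear equation $ax+by=c$ in two unknowns. The following sketch is an illustration under that assumption, not the method from the talk, and the function names are hypothetical. The $\ell_1$ ball is a diamond whose corners lie on the coordinate axes, so as it grows it first touches the line at a corner, giving a solution with a zero coordinate. The Euclidean ball, by contrast, touches the line at a point where both coordinates are typically non-zero.

```python
def min_l1_solution(a, b, c):
    """A point on the line a*x + b*y = c minimising |x| + |y|.
    Assumes (a, b) != (0, 0). The minimum value is |c| / max(|a|, |b|),
    attained on the axis of the coefficient with larger absolute value."""
    if abs(a) >= abs(b):
        return (c / a, 0.0)
    return (0.0, c / b)

def min_l2_solution(a, b, c):
    """The point on the line a*x + b*y = c closest to the origin."""
    scale = c / (a * a + b * b)
    return (a * scale, b * scale)

def l1_norm(point):
    return abs(point[0]) + abs(point[1])

sparse = min_l1_solution(1, 2, 2)  # lies on the y-axis
dense = min_l2_solution(1, 2, 2)   # both coordinates non-zero
print(sparse, l1_norm(sparse))
print(dense, l1_norm(dense))
```

When $|a|=|b|$ the line is parallel to an edge of the diamond and the $\ell_1$ minimiser is no longer unique, which is a toy version of the kind of bad case that the conditions on the equations are there to rule out.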

For me this lecture was one of the highlights of the ICM, and I met many other people who greatly enjoyed it too.

Eventually I just made it, by going back to a place that was semi-above ground (meaning that it was below ground but you entered it via a sunken area that was not covered by a roof) that I had earlier rejected on the grounds that it didn’t have a satisfactory food option, and just had an espresso. Thus fortified, I made my way to the talk and arrived just in time, which didn’t stop me getting a seat near the front. That was to be the case at all talks: if I marched to the front, I could get a seat. I think part of the reason was that there were “Reserved” stickers on several seats, which had been there for the opening ceremony and not been removed. But maybe it was also because some people like to sit some way back so that they can zone out of the talk if they want to, maybe even getting out their laptops. (However, although wireless was in theory available throughout the conference centre, in practice it was very hard to connect.)

The first talk was by Ian Agol. I was told before the talk that I would be unlikely to understand it — the comment was about Agol rather than about me — and the result of this lowering of my expectations was that I enjoyed the talk. In fact, I might even have enjoyed it without the lowering of expectations. Having said that, I did hear one criticism afterwards that I will try to explain, since it provides a good introduction to the content of the lecture.

When I first heard of Thurston’s famous geometrization conjecture, I thought of it as the ultimate aim of the study of 3-manifolds: what more could you want than a complete classification? However, this view was not correct. Although a proof of the geometrization conjecture would be (and later was) a massive step forward, it wouldn’t by itself answer all the questions that people really wanted to answer about 3-manifolds. But some very important work by Agol and others since Perelman’s breakthrough has, in some sense that I don’t understand, finished off some big programme in the subject. The criticism I heard was that Agol didn’t really explain what this programme was. I hadn’t really noticed that as a problem during the talk — I just took it on trust that the work Agol was describing was considered very important by the experts (and I was well aware of Agol’s reputation) — but perhaps he could have done a little more scene setting.

What he actually did by way of introduction was to mention two questions from a famous 1982 paper of Thurston (Three-dimensional manifolds, Kleinian groups and hyperbolic geometry) in which he asked 24 questions. The ones Agol mentioned were questions 16-18. I’ve just had a look at the Thurston paper, and it’s well worth a browse, as it’s a relatively gentle survey written for the Bulletin of the AMS. It also has lots of nice pictures. I didn’t get a sense from my skim through it that questions 16-18 were significantly more important than the others (apart from the geometrization conjecture), but perhaps the story is that when the dust had settled after Perelman’s work, it was those questions that were still hard. Maybe someone who knows what they’re talking about can give a better explanation in a comment.

One definition I learned from the lecture is this: a 3-manifold is said to have a property P *virtually* if it has a finite-sheeted cover with property P. I presume that a finite-sheeted cover is another 3-manifold and a suitable surjection to the first one such that each point in the first has $n$ preimages for some finite $n$ (that doesn’t depend on the point).

Thurston’s question 16 asks whether every aspherical 3-manifold (I presume that just means that it isn’t a 3-sphere) is virtually Haken.

A little later in the talk, Agol told us what “Haken” meant, other than being the name of a very well-known mathematician. Here’s the definition he gave, which left me with very little intuitive understanding of the concept. A compact 3-manifold with hyperbolic interior is *Haken* if it contains an embedded $\pi_1$-injective surface. An example, if my understanding of my rapidly scrawled notes is correct, is a knot complement, one of the standard ways of constructing interesting 3-manifolds. If you take the complement of a knot in $S^3$ you get a 3-manifold, and if you take a tubular neighbourhood of that knot, then its boundary will be your $\pi_1$-injective surface. (I’m only pretending to know what $\pi_1$-injective means here.)

Thurston, in the paper mentioned earlier, describes Haken manifolds in a different, and for me more helpful, way. Let me approach the concept in top-down fashion: that is, I’ll define it in terms of other mysterious concepts, then work backwards through Thurston’s paper until everything is defined (to my satisfaction at least).

Thurston writes, “A 3-manifold $M^3$ is called a Haken manifold if it is prime and it contains a 2-sided incompressible surface (whose boundary, if any, is on $\partial M$) which is not a 2-sphere.”

Incidentally, one thing I picked up during Agol’s talk is that it seems to be conventional to refer to a 3-manifold as $M^3$ the first time you mention it and as $M$ thereafter.

Now we need to know what “prime” and “incompressible” mean. The following paragraph of Thurston defines “prime” very nicely.

The decomposition referred to really has two stages. The first stage is the prime decomposition, obtained by repeatedly cutting a 3-manifold $M^3$ along 2-spheres embedded in $M^3$ so that they separate the manifold into two parts neither of which is a 3-ball, and then gluing 3-balls to the resulting boundary components, thus obtaining closed 3-manifolds which are “simpler”. Kneser proved that this process terminates after a finite number of steps. The resulting pieces, called the prime summands of $M^3$, are uniquely determined by $M^3$ up to homeomorphism.

Hmm, perhaps the rule is more general: you refer to it as $M^3$ to start with and after that it’s sort of up to you whether you want to call it $M^3$ or $M$.

The equivalent process in two dimensions could be used to simplify a two-holed torus. You first identify a circle that cuts it into two pieces and doesn’t bound a disc: basically what you get if you chop the surface into two with one hole on each side. Then you have two surfaces with circles as boundaries. You fill in those circles with discs and then you have two tori. At this point you can’t chop the surface in two in a non-trivial way, so a torus is prime. Unless my intuition is all wrong, that’s more or less telling us that the prime decomposition of an arbitrary orientable surface (without boundary) is into tori, one for each hole, except that the sphere would be prime.

What about “incompressible”? Thurston offers us this.

A surface $N^2$ embedded in a 3-manifold $M^3$ is two-sided if $N^2$ cuts a regular neighborhood of $N^2$ into two pieces, i.e., the normal bundle to $N^2$ is oriented. Since we are assuming that $M^3$ is oriented, this is equivalent to the condition that $N^2$ is oriented. A two-sided surface is incompressible if every simple curve on $N^2$ which bounds a disk in $M^3$ with interior disjoint from $N^2$ also bounds a disk on $N^2$.

I think we can forget the first part there: just assume that everything in sight is oriented. Let’s try to think what it would mean for an embedded surface not to be incompressible. Consider for example a copy of the torus embedded in the 3-sphere. Then a loop that goes round the torus bounds a disc in the 3-sphere with no problem, but it doesn’t bound a disc in the torus. So that torus fails to be incompressible. But suppose we embedded the torus into a 3-dimensional torus in a natural way, by taking the 3D torus to be the quotient of $\mathbb{R}^3$ by $\mathbb{Z}^3$ and the 2D torus to be the set of all points with $x$-coordinate an (equivalence class of an) integer. Then the loops that don’t bound discs in the 2-torus don’t bound discs in the 3-torus either, so that surface is (again if what seems likely to be true actually is true) incompressible. It seems that an incompressible surface sort of spans the 3-manifold in an essential way rather than sitting inside a boring part of the 3-manifold and pretending that it isn’t boring.

OK, that’s what Haken manifolds *are*, but for the non-expert that’s not enough. We want to know why we should care about them. Thurston gives us an answer to this too. Here is a very useful paragraph about them.

It is hard to say how general the class of Haken manifolds is. There are many closed manifolds which are Haken and many which are not. Haken manifolds can be analyzed by inductive processes, because as Haken proved, a Haken manifold can be cut successively along incompressible surfaces until one is left with a collection of 3-balls. The condition that a 3-manifold has an incompressible surface is useful in proving that it has a hyperbolic structure (when it does), but intuitively it really seems to have little to do with the question of existence of a hyperbolic structure.

To put it more vaguely, Haken manifolds are good because they can be chopped into pieces in a way that makes them easy to understand. So I’d guess that the importance of showing that every aspherical 3-manifold is virtually Haken is that finite-sheeted coverings are sufficiently nice that even knowing that a manifold is *virtually* Haken means that in some sense you understand it.

One very nice thing Agol did was give us some basic examples of 3-manifolds, by which I mean not things like the 3-sphere, but examples of the kind that one wouldn’t immediately think of and that improve one’s intuition about what a typical 3-manifold looks like.

The first one was a (solid) dodecahedron with opposite faces identified — with a twist. I meant the word “twist” literally, but I suppose you could say that the twist is that there is a twist, meaning that given two opposite faces, you don’t identify each vertex with the one opposite it, but rather you first rotate one of the faces through and *then* identify opposite vertices. (Obviously you’ll have to do that in a consistent way somehow.)

There are some questions here that I can’t answer in my head. For example, if you take a vertex of the dodecahedron, then it belongs to three faces. Each of these faces is identified in a twisty way with the opposite face, so if we want to understand what’s going on near the vertex, then we should glue three more dodecahedra to our original one at those faces, keeping track of the various identifications. Now do the identifications mean that those dodecahedra all join up nicely so that the point is at the intersection of four copies of the dodecahedron? Or do we have to do some *more* gluing before everything starts to join together? One thing we *don’t* have to worry about is that there isn’t room for all those dodecahedra, which in a certain sense would be the case if the solid angle at a vertex is greater than 1. (I’m defining, I hope standardly, the solid angle of a cone to be the size of the intersection of that cone with a unit sphere centred at the apex, or whatever one calls it. Since a unit sphere has surface area $4\pi$, the largest possible solid angle is $4\pi$.)

Anyhow, as I said, this doesn’t matter. Indeed, far from mattering, it is to be positively welcomed, since if the solid angles of the dodecahedra that meet at a point add up to more than $4\pi$, then it indicates that the geometry of the resulting manifold will be hyperbolic, which is exactly what we want. I presume that another way of defining the example is to start with a tiling of hyperbolic 3-space by regular dodecahedra and then identify neighbouring dodecahedra using little twists. I’m guessing here, but opposite faces of a dodecahedron are parallel, while not being translates of one another. So maybe as you come out of a face, you give it the smallest (anticlockwise, say) twist you can to make it a translate of the opposite face, which will be a rotation by an angle of $\pi/5$, and then re-enter the opposite face by the corresponding translated point. But it’s not clear to me that that is a consistent definition. (I haven’t said which dodecahedral tiling I’m even taking. Perhaps the one where all the pentagons have right angles at their vertices.)
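The question of room, at least for ordinary Euclidean dodecahedra, is easy to check numerically. Here is a minimal sketch, assuming the standard dihedral angle $\arccos(-1/\sqrt{5})$ of a regular dodecahedron and Girard’s theorem for the area of a spherical triangle; the names in the code are my own. It says nothing about how many dodecahedra actually meet at a vertex after the identifications, only how many Euclidean corners would fit around a point.

```python
import math

# Dihedral angle of a regular Euclidean dodecahedron (standard value).
DIHEDRAL = math.acos(-1 / math.sqrt(5))

# Three faces meet at each vertex, so a small sphere centred at the vertex
# meets the solid in a spherical triangle whose three angles are dihedral
# angles. By Girard's theorem its area on the unit sphere, which is the
# solid angle at the vertex, equals the angle sum minus pi.
SOLID_ANGLE = 3 * DIHEDRAL - math.pi

# Total solid angle available around a point.
FULL_SPHERE = 4 * math.pi


def fits_around_vertex(copies):
    """Return True if `copies` Euclidean dodecahedron corners have total
    solid angle at most that of a full sphere."""
    return copies * SOLID_ANGLE <= FULL_SPHERE


if __name__ == "__main__":
    print("solid angle at a vertex:", SOLID_ANGLE)
    for copies in (4, 5):
        print(copies, "copies fit:", fits_around_vertex(copies))
```

With these values the solid angle comes out at roughly 2.96, so four Euclidean corners leave a little room and five do not. In hyperbolic space the angles of a regular dodecahedron shrink as the dodecahedron grows, which is how a hyperbolic tiling can accommodate more copies at a vertex.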

The other example was actually a pair of examples. One was a figure-of-eight-knot complement, and the other was the complement of the Whitehead link. Agol showed us drawings of the knot and link: I’ll leave you to Google for them if you are interested.

How does a knot complement give you a 3-manifold? I’m not entirely sure. One thing that’s clear is that it gives you a 3-manifold with boundary, since you can take a tubular neighbourhood of the knot/link and take the complement of that, which will be a 3D region whose boundary is homeomorphic to a torus but sits in $S^3$ in a knotted way. I also know (from Thurston, but I’ve seen it before) that you can produce lots of 3-manifolds by defining some non-trivial homeomorphism from a torus to itself, removing a tubular neighbourhood of a knot from $S^3$ and gluing it back in again, but only after applying the homeomorphism to the boundary. That is, given your solid knot and your solid-knot-shaped hole, you identify the boundary of the knot with the boundary of the hole, but not in the obvious way. This process is called Dehn surgery, and in fact can be used to create all 3-manifolds.

But I still find myself unable to explain how a knot complement is *itself* a 3-manifold, unless it is a 3-manifold with boundary, or one compactifies it somehow, or something. So I had the illusion of understanding during the talk but am found out now.

The twisted-dodecahedron example was discovered by Seifert and Weber, and is interesting because it is a non-Haken manifold (a discovery of Burton, Rubinstein and Tillmann) that is virtually Haken.

Going back to the question of why the geometrization conjecture didn’t just finish off the subject, my guess is that it is probably possible to construct lots of complicated 3-manifolds that obviously satisfy the geometrization conjecture because they are already hyperbolic, but that are not by virtue of that fact alone easy to understand. What Agol appeared to say is that the role of the geometrization conjecture is essentially to reduce the whole problem of understanding 3-manifolds to that of understanding hyperbolic 3-manifolds. He also said something that is more or less a compulsory remark in a general lecture on 3-manifolds, namely that although they are topological objects, they are studied by geometrical means. (The corresponding compulsory remark for 4-manifolds is that 4D is the odd dimension out, where lots of weird things happen.)

As I’ve said, Agol discussed two other problems. I think the virtual Haken conjecture was the big one (after all, that was the title of his lecture), but the other two were, as he put it, stronger statements that were easier to think about. Question 17 asks whether every aspherical 3-manifold virtually has positive first Betti number, and question 18 asks whether it virtually fibres over the circle. I’ll pass straight to the second of these questions.

A 3-manifold $M$ *fibres over the circle* if there is a (suitably nice) map $f : M \to S^1$ such that the preimage of every point in $S^1$ is a surface (the fibre at that point).

Let me state Agol’s main results without saying what they mean. In 2008 he proved that if $M$ is virtually special cubulated, then it is virtually fibred. In 2012 he proved that cubulations with hyperbolic fundamental group are virtually special, answering a 2011 conjecture of Wise. A corollary is that every closed hyperbolic 3-manifold virtually fibres over the circle, which answers questions 16-18.

There appears to be a missing step there, namely to show that every closed hyperbolic 3-manifold has a cubulation with hyperbolic fundamental group. That I think must have been the main message of what he said in a fairly long discussion about cubulations that preceded the statements of these big results, and about which I did not take detailed notes.

What I remember about the discussion was a number of pictures of cube complexes made up of cubes of different dimensions. An important aspect of these complexes was a kind of avoidance of positive curvature, which worked something like this. (I’ll discuss a low-dimensional situation, but it generalizes.) Suppose you have three squares that meet at a vertex just as they do if they are faces of a cube. Then at that vertex you’ve got some positive curvature, which is what you want to avoid. So to avoid it, you’re obliged to fill in the entire cube, and now the positive curvature is rendered harmless because it’s just the surface of some bit of 3D stuff. (This feels a bit like the way we don’t pay attention to embedded surfaces unless they are incompressible.)

I haven’t given the definition because I don’t remember it. The term CAT(0) came up a lot. At the time I felt I was following what was going on reasonably well, helped by the fact that I had seen an excellent talk by my former colleague Vlad Markovic on similar topics. (Markovic was mentioned in Agol’s talk, and was himself an invited speaker at the ICM.) The main message I remember now is that there is some kind of dictionary between cube complexes and 3-manifolds, so you try to find “cubulations” with particular properties that will enable you to prove that your 3-manifolds have corresponding properties. Note that although the manifolds are three-dimensional, the cubes in the corresponding cube complexes are not limited to three dimensions.
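For anyone who wants something more precise than my recollection of the three-squares picture: the usual way of formalizing the avoidance of positive curvature is, I believe, Gromov’s link condition, which says that a cube complex is non-positively curved exactly when the link of every vertex is a flag simplicial complex (every set of vertices that are pairwise joined by edges spans a simplex). I am not claiming this is how Agol phrased it. The following is a small sketch of the check, assuming the link is handed over as a list of maximal simplices; the function names and the labels `x`, `y`, `z` are made up for illustration.

```python
from itertools import combinations


def faces_of(maximal_simplices):
    """All nonempty faces of the complex generated by the given simplices."""
    faces = set()
    for simplex in maximal_simplices:
        for k in range(1, len(simplex) + 1):
            faces.update(frozenset(c) for c in combinations(simplex, k))
    return faces


def is_flag(maximal_simplices):
    """Return True if every set of pairwise-adjacent vertices spans a face.

    Brute force over all vertex subsets, so only suitable for small links.
    """
    faces = faces_of(maximal_simplices)
    vertices = sorted({v for face in faces for v in face})
    edges = {face for face in faces if len(face) == 2}
    for k in range(3, len(vertices) + 1):
        for subset in combinations(vertices, k):
            pairwise_joined = all(
                frozenset(pair) in edges for pair in combinations(subset, 2)
            )
            if pairwise_joined and frozenset(subset) not in faces:
                return False
    return True


# Link of the corner where three squares meet as on the surface of a cube:
# one link vertex per edge at the corner, one link edge per square.
three_squares = [("x", "y"), ("y", "z"), ("x", "z")]

# After filling in the solid cube, the corner of the cube contributes a
# triangle to the link.
filled_cube = [("x", "y", "z")]

if __name__ == "__main__":
    print("three squares, link is flag:", is_flag(three_squares))
    print("filled cube, link is flag:", is_flag(filled_cube))
```

The hollow triangle fails the test and the filled one passes, which matches the picture of being obliged to fill in the cube.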

That’s about all I can remember, even with the help of notes. In case I have given the wrong impression, let me make clear that I very much enjoyed this lecture and thought it got the “working” part of the congress off to a great start. And it’s clear that the results of Agol and others are a big achievement. If you want to watch the lecture for yourself, it can be found here.

**Update.** I have found a series of three nice-looking blog posts by Danny Calegari about the virtual Haken conjecture and Agol’s proof. Here are the links: part 1, part 2 and part 3.

]]>