- Is it true that if two random elements and of are chosen, then beats with very high probability if it has a sum that is significantly larger? (Here “significantly larger” should mean larger by for some function — note that the standard deviation of the sum has order , so the idea is that this condition should be satisfied one way or the other with probability ).
- Is it true that the stronger conjecture, which is equivalent (given what we now know) to the statement that for almost all pairs of random dice, the event that beats a random die has almost no correlation with the event that beats , is false?
- Can the proof of the result obtained so far be modified to show a similar result for the multisets model?

The status of these three questions, as I see it, is that the first is basically solved — I shall try to justify this claim later in the post, for the second there is a promising approach that will I think lead to a solution — again I shall try to back up this assertion, and while the third feels as though it shouldn’t be impossibly difficult, we have so far made very little progress on it, apart from experimental evidence that suggests that all the results should be similar to those for the balanced sequences model. [Added after finishing the post: I may possibly have made significant progress on the third question as a result of writing this post, but I haven’t checked carefully.]

Let and be elements of chosen uniformly and independently at random. I shall now show that the average of

is zero, and that the probability that this quantity differs from its average by substantially more than is very small. Since typically the modulus of has order , it follows that whether or not beats is almost always determined by which has the bigger sum.

As in the proof of the main theorem, it is convenient to define the functions

and

.

Then

,

from which it follows that beats if and only if . Note also that

.

If we choose purely at random from , then the expectation of is , and Chernoff’s bounds imply that the probability that there exists with is, for suitable at most . Let us now fix some for which there is no such , but keep as a purely random element of .

Then is a sum of independent random variables, each with maximum at most . The expectation of this sum is .

But

,

so the expectation of is .

By standard probabilistic estimates for sums of independent random variables, with probability at least the difference between and its expectation is at most . Writing this out, we have

,

which works out as

.

Therefore, if , it follows that with high probability , which implies that beats , and if , then with high probability beats . But one or other of these two cases almost always happens, since the standard deviations of and are of order . So almost always the die that wins is the one with the bigger sum, as claimed. And since “has a bigger sum than” is a transitive relation, we get transitivity almost all the time.

As I mentioned, the experimental evidence seems to suggest that the strong conjecture is false. But there is also the outline of an argument that points in the same direction. I’m going to be very sketchy about it, and I don’t expect all the details to be straightforward. (In particular, it looks to me as though the argument will be harder than the argument in the previous section.)

The basic idea comes from a comment of Thomas Budzinski. It is to base a proof on the following structure.

- With probability bounded away from zero, two random dice and are “close”.
- If and are two fixed dice that are close to each other and is random, then the events “ beats ” and “ beats ” are positively correlated.

Here is how I would imagine going about defining “close”. First of all, note that the function is somewhat like a random walk that is constrained to start and end at zero. There are results that show that random walks have a positive probability of never deviating very far from the origin — at most half a standard deviation, say — so something like the following idea for proving the first step (remaining agnostic for the time being about the precise definition of “close”). We choose some fixed positive integer and let be integers evenly spread through the interval . Then we argue — and this should be very straightforward — that with probability bounded away from zero, the values of and are close to each other, where here I mean that the difference is at most some small (but fixed) fraction of a standard deviation.

If that holds, it should also be the case, since the intervals between and are short, that and are uniformly close with positive probability.

I’m not quite sure whether proving the second part would require the local central limit theorem in the paper or whether it would be an easier argument that could just use the fact that since and are close, the sums and are almost certainly close too. Thomas Budzinski sketches an argument of the first kind, and my guess is that that is indeed needed. But either way, I think it ought to be possible to prove something like this.

We haven’t thought about this too hard, but there is a very general approach that looks to me promising. However, it depends on something happening that should be either quite easy to establish or not true, and at the moment I haven’t worked out which, and as far as I know neither has anyone else.

The difficulty is that while we still know in the multisets model that beats if and only if (since this depends just on the dice and not on the model that is used to generate them randomly), it is less easy to get traction on the sum because it isn’t obvious how to express it as a sum of independent random variables.

Of course, we had that difficulty with the balanced-sequences model too, but there we got round the problem by considering purely random sequences and conditioning on their sum, having established that certain events held with sufficiently high probability for the conditioning not to stop them holding with high probability.

But with the multisets model, there isn’t an obvious way to obtain the distribution over random dice by choosing independently (according to some distribution) and conditioning on some suitable event. (A quick thought here is that it would be enough if we could *approximate* the distribution of in such a way, provided the approximation was good enough. The obvious distribution to take on each is the marginal distribution of that in the multisets model, and the obvious conditioning would then be on the sum, but it is far from clear to me whether that works.)

A somewhat different approach that I have not got far with myself is to use the standard one-to-one correspondence between increasing sequences of length taken from and subsets of of size . (Given such a sequence one takes the subset , and given a subset , where the are written in increasing order, one takes the multiset of all values , with multiplicity.) Somehow a subset of of size feels closer to a bunch of independent random variables. For example, we could model it by choosing each element with probability and conditioning on the number of elements being exactly , which will happen with non-tiny probability.

Actually, now that I’m writing this, I’m coming to think that I may have accidentally got closer to a solution. The reason is that earlier I was using a holes-and-pegs approach to defining the bijection between multisets and subsets, whereas with this approach, which I had wrongly assumed was essentially the same, there is a nice correspondence between the elements of the multiset and the elements of the set. So I suddenly feel more optimistic that the approach for balanced sequences can be adapted to the multisets model.

I’ll end this post on that optimistic note: no doubt it won’t be long before I run up against some harsh reality.

]]>

What can be done about this? There are many actions, none of which are likely to be sufficient to bring about major change on their own, but which in combination will help to get us to a tipping point. In no particular order, here are some of them.

- Create new journals that operate much more cheaply and wait for them to become established.
- Persuade libraries not to agree to Big Deals with the big publishers.
- Refuse to publish with, write for, or edit for, the big publishers.
- Make sure all your work is freely available online.
- Encourage journals that are supporting the big publishers to leave those publishers and set up in a cheaper and fairer way.

Not all of these are easy things to do, but I’m delighted to report that a small group I belong to, set up by Mark Wilson, has, after approaching a large number of maths journals, found one that was ready to “flip”: the Journal of Algebraic Combinatorics has just announced that it will be leaving Springer. Or if you want to be more pedantic about it, a new journal will be starting, called Algebraic Combinatorics and published by The Mersenne Centre for Open Scientific Publishing, and almost all the editors of the Journal of Algebraic Combinatorics will resign from that journal and become editors of the new one, which will adhere to Fair Open Access Principles.

If you want to see change, then you should from now on regard Algebraic Combinatorics as the true continuation of the Journal of Algebraic Combinatorics, and the Journal of Algebraic Combinatorics as a zombie journal that happens to have a name that coincides with a former real journal. And of course, that means that if you are an algebraic combinatorialist with a paper that would have been suitable for the Journal of Algebraic Combinatorics, you should understand that *the reputation of the Journal of Algebraic Combinatorics is being transferred, along with the editorial board, to Algebraic Combinatorics, and you should therefore submit it to Algebraic Combinatorics*. This has worked with previous flips: the zombie journal rarely thrives afterwards and in some notable cases has ceased to publish after a couple of years or so.

The words of one of the editors of the Journal of Algebraic Combinatorics, Hugh Thomas, are particularly telling, especially the first sentence: “There wasnâ€™t a particular crisis. It has been becoming more and more clear that commercial journal publishers are charging high subscription fees and high Article Processing Charges (APCs), profiting from the volunteer labour of the academic community, and adding little value. It is getting easier and easier to automate the things that they once took care of. The actual printing and distribution of paper copies is also much less important than it has been in the past; this is something which we have decided we can do without.”

I mentioned earlier that we approached many journals. Although it is very exciting that one journal is flipping, I must also admit to disappointment at how low our strike rate has been so far. However, the words “so far” are important: many members of editorial boards were very sympathetic with our aims, and some journals were adopting a wait-and-see attitude, so if the flip of JACo is successful, we hope that it will encourage other journals. I should say that we weren’t just saying, “Why don’t you flip?” but we were also offering support, including financial support. The current situation is that we can almost certainly finance journals that are ready to flip to an “ultra-cheap” model (using a platform that charges either nothing or a very small fee per submission) and help with administrative support, and are working on financial support for more expensive models, but still far cheaper than the commercial publishers, where more elaborate services are offered.

Understandably, the main editors tended to be a lot more cautious on average than the bulk of the editorial boards. I think many of them were worried that they might accidentally destroy their journals if they flipped them, and in the case of journals with long traditions, this is not something one would want to be remembered for. So again, the more we can support Algebraic Combinatorics, the more likely it is that this caution will be reduced and other journals will consider following. (If you are an editor of a journal we have not approached, please do get in touch to discuss what the possibilities are — we have put a lot of thought into it.)

Another argument put forward by some editors is that to flip a journal risks damaging the reputation of the old version of the journal, and therefore, indirectly, the reputation of the papers published in it, some of which are by early-career researchers. So they did not want to flip in order to avoid damaging the careers of young mathematicians. If you are a young mathematician and would like to comment on whether you would be bothered by a journal flipping after you had published in it, we would be very interested to hear what you have to say.

Against that background I’d like to congratulate the editors of the Journal of Algebraic Combinatorics for their courage and for the work they have put into this. (But that word “work” should not put off other editors: one of the aims of our small group was to provide support and expertise, including from Johann Rooryck, the editor of the Elsevier journal Lingua, which flipped to become Glossa, in order to make the transition as easy as possible.) I’d also like to make clear, to avoid any misunderstanding that might arise, that although I’ve been involved in a lot of discussion with Mark Wilson’s group and wrote to many editors of other journals, my role in this particular flip has been a minor one.

And finally, let me repeat the main message of this post: please support the newly flipped journal, since the more successful it is, the greater the chance that other journals will follow, and the greater the chance that we will be able to move to a more sensible academic publishing system.

]]>

**Theorem.** *Let and be random -sided dice. Then the probability that beats given that beats and beats is .*

In this post I want to give a fairly detailed sketch of the proof, which will I hope make it clearer what is going on in the write-up.

The first step is to show that the theorem is equivalent to the following statement.

**Theorem.** *Let be a random -sided die. Then with probability , the proportion of -sided dice that beats is .*

We had two proofs of this statement in earlier posts and comments on this blog. In the write-up I have used a very nice short proof supplied by Luke Pebody. There is no need to repeat it here, since there isn’t much to say that will make it any easier to understand than it already is. I will, however, mention once again an example that illustrates quite well what this statement does and doesn’t say. The example is of a tournament (that is, complete graph where every edge is given a direction) where every vertex beats half the other vertices (meaning that half the edges at the vertex go in and half go out) but the tournament does not look at all random. One just takes an odd integer and puts arrows out from to mod for every , and arrows into for every . It is not hard to check that the probability that there is an arrow from to given that there are arrows from to and to is approximately 1/2, and this turns out to be a general phenomenon.

So how do we prove that almost all -sided dice beat approximately half the other -sided dice?

The first step is to recast the problem as one about sums of independent random variables. Let stand for as usual. Given a sequence we define a function by setting to be the number of such that plus half the number of such that . We also define to be . It is not hard to verify that beats if , ties with if , and loses to if .

So our question now becomes the following. Suppose we choose a random sequence with the property that . What is the probability that ? (Of course, the answer depends on , and most of the work of the proof comes in showing that a “typical” has properties that ensure that the probability is about 1/2.)

It is convenient to rephrase the problem slightly, replacing by . We can then ask it as follows. Suppose we choose a sequence of elements of the set , where the terms of the sequence are independent and uniformly distributed. For each let . What is the probability that given that ?

This is a question about the distribution of , where the are i.i.d. random variables taking values in (at least if is odd — a small modification is needed if is even). Everything we know about probability would lead us to expect that this distribution is approximately Gaussian, and since it has mean , it ought to be the case that if we sum up the probabilities that over positive , we should get roughly the same as if we sum them up over negative . Also, it is highly plausible that the probability of getting will be a lot smaller than either of these two sums.

So there we have a heuristic argument for why the second theorem, and hence the first, ought to be true.

There are several theorems in the literature that initially seemed as though they should be helpful. And indeed they *were* helpful, but we were unable to apply them directly, and had instead to develop our own modifications of their proofs.

The obvious theorem to mention is the central limit theorem. But this is not strong enough for two reasons. The first is that it tells you about the probability that a sum of random variables will lie in some rectangular region of of size comparable to the standard deviation. It will not tell you the probability of belonging to some subset of the y-axis (even for discrete random variables). Another problem is that the central limit on its own does not give information about the rate of convergence to a Gaussian, whereas here we require one.

The second problem is dealt with for many applications by the Berry-Esseen theorem, but not the first.

The first problem is dealt with for many applications by *local* central limit theorems, about which Terence Tao has blogged in the past. These tell you not just about the probability of landing in a region, but about the probability of actually equalling some given value, with estimates that are precise enough to give, in many situations, the kind of information that we seek here.

What we did not find, however, was precisely the theorem we were looking for: a statement that would be local and 2-dimensional and would give information about the rate of convergence that was sufficiently strong that we would be able to obtain good enough convergence after only steps. (I use the word “step” here because we can think of a sum of independent copies of a 2D random variable as an -step random walk.) It was not even clear in advance what such a theorem should say, since we did not know what properties we would be able to prove about the random variables when was “typical”. That is, we knew that not every worked, so the structure of the proof (probably) had to be as follows.

1. Prove that has certain properties with probability .

2. Using these properties, deduce that the sum converges very well after steps to a Gaussian.

3. Conclude that the heuristic argument is indeed correct.

The key properties that needed to have were the following two. First, there needed to be a bound on the higher moments of . This we achieved in a slightly wasteful way — but the cost was a log factor that we could afford — by arguing that with high probability no value of has magnitude greater than . To prove this the steps were as follows.

- Let be a random element of . Then the probability that there exists with is at most (for some such as 10).
- The probability that is at least for some absolute constant .
- It follows that if is a random -sided die, then with probability we have for every .

The proofs of the first two statements are standard probabilistic estimates about sums of independent random variables.

The second property that needed to have is more difficult to obtain. There is a standard Fourier-analytic approach to proving central limit theorems, and in order to get good convergence it turns out that what one wants is for a certain Fourier transform to be sufficiently well bounded away from 1. More precisely, we define the *characteristic function* of the random variable to be

where is shorthand for , , and and range over .

I’ll come later to why it is good for not to be too close to 1. But for now I want to concentrate on how one proves a statement like this, since that is perhaps the least standard part of the argument.

To get an idea, let us first think what it would take for to be very close to 1. This condition basically tells us that is highly concentrated mod 1: indeed, if is highly concentrated, then takes approximately the same value almost all the time, so the average is roughly equal to that value, which has modulus 1; conversely, if is not highly concentrated mod 1, then there is plenty of cancellation between the different values of and the result is that the average has modulus appreciably smaller than 1.

So the task is to prove that the values of are reasonably well spread about mod 1. Note that this is saying that the values of are reasonably spread about.

The way we prove this is roughly as follows. Let , let be of order of magnitude , and consider the values of at the four points and . Then a typical order of magnitude of is around , and one can prove without too much trouble (here the Berry-Esseen theorem was helpful to keep the proof short) that the probability that

is at least , for some positive absolute constant . It follows by Markov’s inequality that with positive probability one has the above inequality for many values of .

That’s not quite good enough, since we want a probability that’s very close to 1. This we obtain by chopping up into intervals of length and applying the above argument in each interval. (While writing this I’m coming to think that I could just as easily have gone for progressions of length 3, not that it matters much.) Then in each interval there is a reasonable probability of getting the above inequality to hold many times, from which one can prove that with very high probability it holds many times.

But since is of order , is of order 1, which gives that the values are far from constant whenever the above inequality holds. So by averaging we end up with a good upper bound for .

The alert reader will have noticed that if , then the above argument doesn’t work, because we can’t choose to be bigger than . In that case, however, we just do the best we can: we choose to be of order , the logarithmic factor being there because we need to operate in many different intervals in order to get the probability to be high. We will get many quadruples where

and this translates into a lower bound for of order , basically because has order for small . This is a good bound for us as long as we can use it to prove that is bounded above by a large negative power of . For that we need to be at least (since is about ), so we are in good shape provided that .

The alert reader will also have noticed that the probabilities for different intervals are not independent: for example, if some is equal to , then beyond that depends linearly on . However, except when is very large, this is extremely unlikely, and it is basically the only thing that can go wrong. To make this rigorous we formulated a concentration inequality that states, roughly speaking, that if you have a bunch of events, and almost always (that is, always, unless some very unlikely event occurs) the probability that the th event holds given that all the previous events hold is at least , then the probability that fewer than of the events hold is exponentially small in . The proof of the concentration inequality is a standard exponential-moment argument, with a small extra step to show that the low-probability events don’t mess things up too much.

Incidentally, the idea of splitting up the interval in this way came from an answer by Serguei Popov to a Mathoverflow question I asked, when I got slightly stuck trying to prove a lower bound for the second moment of . I eventually didn’t use that bound, but the interval-splitting idea helped for the bound for the Fourier coefficient as well.

So in this way we prove that is very small if . A simpler argument of a similar flavour shows that is also very small if is smaller than this and .

Now let us return to the question of why we might like to be small. It follows from the inversion and convolution formulae in Fourier analysis. The convolution formula tells us that the characteristic function of the sum of the (which are independent and each have characteristic function ) is . And then the inversion formula tells us that

What we have proved can be used to show that the contribution to the integral on the right-hand side from those pairs that lie outside a small rectangle (of width in the direction and in the direction, up to log factors) is negligible.

All the above is true provided the random -sided die satisfies two properties (the bound on and the bound on ), which it does with probability .

We now take a die with these properties and turn our attention to what happens inside this box. First, it is a standard fact about characteristic functions that their derivatives tell us about moments. Indeed,

,

and when this is . It therefore follows from the two-dimensional version of Taylor’s theorem that

plus a remainder term that can be bounded above by a constant times .

Writing for we have that is a positive semidefinite quadratic form in and . (In fact, it turns out to be positive definite.) Provided is small enough, replacing it by zero does not have much effect on , and provided is small enough, is well approximated by .

It turns out, crucially, that the approximations just described are valid in a box that is much bigger than the box inside which has a chance of not being small. That implies that the Gaussian decays quickly (and is why we know that is positive definite).

There is a bit of back-of-envelope calculation needed to check this, but the upshot is that the probability that is very well approximated, at least when and aren’t too big, by a formula of the form

.

But this is the formula for the Fourier transform of a Gaussian (at least if we let and range over , which makes very little difference to the integral because the Gaussian decays so quickly), so it is the restriction to of a Gaussian, just as we wanted.

When we sum over infinitely many values of and , uniform estimates are not good enough, but we can deal with that very directly by using simple measure concentration estimates to prove that the probability that is very small outside a not too large box.

That completes the sketch of the main ideas that go into showing that the heuristic argument is indeed correct.

Any comments about the current draft would be very welcome, and if anyone feels like working on it directly rather than through me, that is certainly a possibility — just let me know. I will try to post soon on the following questions, since it would be very nice to be able to add answers to them.

1. Is the more general quasirandomness conjecture false, as the experimental evidence suggests? (It is equivalent to the statement that if and are two random -sided dice, then with probability , the four possibilities for whether another die beats and whether it beats each have probability .)

2. What happens in the multiset model? Can the above method of proof be adapted to this case?

3. The experimental evidence suggests that transitivity almost always occurs if we pick purely random sequences from . Can we prove this rigorously? (I think I basically have a proof of this, by showing that whether or not beats almost always depends on whether has a bigger sum than . I’ll try to find time reasonably soon to add this to the draft.)

Of course, other suggestions for follow-up questions will be very welcome, as will ideas about the first two questions above.

]]>

There is a recent paper that does this in the one-dimensional case, though it used an elementary argument, whereas I would prefer to use Fourier analysis. Here I’d like to begin the process of proving a two-dimensional result that is designed with our particular application in mind. If we are successful in doing that, then it would be natural to try to extract from the proof a more general statement, but that is not a priority just yet.

As people often do, I’ll begin with a heuristic argument, and then I’ll discuss how we might try to sharpen it up to the point where it gives us good bounds for the probabilities of individual points of . Much of this post is cut and pasted from comments on the previous post, since it should be more convenient to have it in one place.

The rough idea of the characteristic-functions approach, which I’ll specialize to the 2-dimensional case, is as follows. (Apologies to anyone who knows about this properly for anything idiotic I might accidentally write.) Let be a random variable on and write for . If we take independent copies of and add them together, then the probability of being at is

where that denotes the -fold convolution.

Now let’s define the Fourier transform of , which probabilists call the characteristic function, in the usual way by

.

Here and belong to , but I’ll sometimes think of them as belonging to too.

We have the convolution law that and the inversion formula

Putting these together, we find that if random variables are independent copies of , then the probability that their sum is is

.

The very rough reason that we should now expect a Gaussian formula is that we consider a Taylor expansion of . We can assume for our application that and have mean zero. From that one can argue that the coefficients of the linear terms in the Taylor expansion are zero. (I’ll give more details in a subsequent comment.) The constant term is 1, and the quadratic terms give us the covariance matrix of and . If we assume that we can approximate by an expression of the form for some suitable quadratic form in and , then the th power should be close to , and then, since Fourier transforms (and inverse Fourier transforms) take Gaussians to Gaussians, when we invert this one, we should get a Gaussian-type formula for . So far I’m glossing over the point that Gaussians are defined on , whereas and live in and and live in , but if most of is supported in a small region around 0, then this turns out not to be too much of a problem.

If we take the formula

and partially differentiate times with respect to and times with respect to we obtain the expression

.

Setting turns this into . Also, for every and the absolute value of the partial derivative is at most . This allows us to get a very good handle on the Taylor expansion of when and are close to the origin.

Recall that the two-dimensional Taylor expansion of about is given by the formula

where is the partial derivative operator with respect to the first coordinate, the mixed partial derivative, and so on.

In our case, , , and .

As in the one-dimensional case, the error term has an integral representation, namely

,

which has absolute value at most , which in turn is at most

.

When is the random variable (where is a fixed die and is chosen randomly from ), we have that .

With very slightly more effort we can get bounds for the moments of as well. For any particular and a purely random sequence , the probability that is bounded above by for an absolute constant . (Something like 1/8 will do.) So the probability that there exists such a conditional on (which happens with probability about ) is at most , and in particular is small when . I think that with a bit more effort we could probably prove that is at most , which would allow us to improve the bound for the error term, but I think we can afford the logarithmic factor here, so I won’t worry about this. So we get an error of .

For this error to count as small, we want it to be small compared with the second moments. For the time being I’m just going to assume that the rough size of the second-moment contribution is around . So for our error to be small, we want to be and to be .

That is giving us a rough idea of the domain in which we can say confidently that the terms up to the quadratic ones give a good approximation to , and hence that is well approximated by a Gaussian.

Outside the domain, we have to do something different, and that something is fairly simple: we shall show that is very small. This is equivalent to showing that is bounded away from 1 by significantly more than . This we do by looking more directly at the formula for the Fourier transform:

.

We would like this to have absolute value bounded away from 1 by significantly more than except when is quite a bit smaller than and is quite a bit smaller than .

Now in our case is uniformly distributed on the points . So we can write as

.

Here’s a possible way that we might try to bound that sum. Let and let us split up the sum into pairs of terms with and , for . So each pair of terms will take the form

The ratio of these two terms is

.

And if the ratio is , then the modulus of the sum of the two terms is at most .

Now let us suppose that as varies, the differences are mostly reasonably well distributed in an interval between and , as seems very likely to be the case. Then the ratios above vary in a range from about to . But that should imply that the entire sum, when divided by , has modulus at most . (This analysis obviously isn’t correct when is bigger than , since the modulus can’t be negative, but once we’re in that regime, then it really is easy to establish the bounds we want.)

If is, say , then this gives us , and raising that to the power gives us , which is tiny.

As a quick sanity check, note that for not to be tiny we need to be not much more than . This reflects the fact that a random walk of steps of typical size about will tend to be at a distance comparable to from the origin, and when you take the Fourier transform, you take the reciprocals of the distance scales.

If is quite a bit smaller than and is not too much smaller than , then the numbers are all small but the numbers vary quite a bit, so a similar argument can be used to show that in this case too the Fourier transform is not close enough to 1 for its th power to be large. I won’t give details here.

If the calculations above are not too wide of the mark, then the main thing that needs to be done is to show that for a typical die the numbers are reasonably uniform in a range of width around , and more importantly that the numbers are not too constant: basically I’d like them to be pretty uniform too.

It’s possible that we might want to try a slightly different approach, which is to take the uniform distribution on the set of points , convolve it once with itself, and argue that the resulting probability distribution is reasonably uniform in a rectangle of width around and height around . By that I mean that a significant proportion of the points are hit around times each (because there are sums and they lie in a rectangle of area ). But one way or another, I feel pretty confident that we will be able to bound this Fourier transform and get the local central limit theorem we need.

]]>

An –*sided die* in the sequence model is a sequence of elements of such that , or equivalently such that the average value of is , which is of course the average value of a random element of . A *random* -sided die in this model is simply an -sided die chosen uniformly at random from the set of all such dice.

Given -sided dice and , we say that *beats* if

If the two sets above have equal size, then we say that *ties with* .

When looking at this problem, it is natural to think about the following directed graph: the vertex set is the set of all -sided dice and we put an arrow from to if beats .

We believe (and even believe we can prove) that ties are rare. Assuming that to be the case, then the conjecture above is equivalent to the statement that if are three vertices chosen independently at random in this graph, then the probability that is a directed cycle is what you expect for a random tournament, namely 1/8.

One can also make a more general conjecture, namely that the entire (almost) tournament is quasirandom in a sense defined by Chung and Graham, which turns out to be equivalent to the statement that for almost all pairs of dice, the four possible pairs of truth values for the pair of statements

beats beats

each occur with probability approximately 1/4. If this is true, then given random dice , all the possibilities for which beat which have probability approximately . This would imply, for example, that if are independent random -sided dice, then the probability that beats given that beats for all other pairs with is still .

Several of us have done computer experiments to test these conjectures, and it looks as though the first one is true and the second one false. A further reason to be suspicious of the stronger conjecture is that a natural approach to prove it appears to be morally equivalent to a relationship between the correlations of certain random variables that doesn’t seem to have any heuristic justification or to fit with experimental evidence. So although we don’t have a disproof of the stronger conjecture (I think it would be very interesting to find one), it doesn’t seem like a good idea to spend a lot of effort trying to prove it, unless we can somehow explain away the evidence that appears to be stacking up against it.

The first conjecture turns out to be equivalent to a statement that doesn’t mention transitivity. The very quick proof I’ll give here was supplied by Luke Pebody. Suppose we have a tournament (that is, a complete graph with each edge directed in one of the two possible directions) and write for the out-degree of a vertex (that is, the number of such that there is an arrow from to ) and for the in-degree. Then let us count the number of ordered triples such that . Any directed triangle in the tournament will give rise to three such triples, namely and . And any other triangle will give rise to just one: for example, if and we get just the triple . So the number of ordered triples such that and is plus twice the number of directed triangles. Note that is approximately .

But the number of these ordered triples is also . If almost all in-degrees and almost all out-degrees are roughly , then this is approximately , which means that the number of directed triangles is approximately . That is, in this case, the probability that three dice form an intransitive triple is approximately 1/4, as we are hoping from the conjecture. If on the other hand several in-degrees fail to be roughly , then is substantially lower than and we get a noticeably smaller proportion of intransitive triples.

Thus, the weaker conjecture is equivalent to the statement that almost every die beats approximately half the other dice.

The answer to this is fairly simple, heuristically at least. Let be an arbitrary die. For define to be the number of with and define to be . Then

,

from which it follows that .

We also have that if is another die, then

If we make the simplifying assumption that sufficiently infrequently to make no real difference to what is going on (which is not problematic, as a slightly more complicated but still fairly simple function can be used instead of to avoid this problem), then we find that to a reasonable approximation beats if and only if is positive.

So what we would like to prove is that if are chosen independently at random from , then

.

We are therefore led to consider the random variable

where now is chosen uniformly at random from without any condition on the sum. To write this in a more transparent way, let be the random variable , where is chosen uniformly at random from . Then is a sum of independent copies of . What we are interested in is the distribution we obtain when we condition the random variable on .

This should mean that we are in an excellent position, since under appropriate conditions, a lot is known about sums of independent random variables, and it looks very much as though those conditions are satisfied by , at least when is “typical”. Indeed, what we would expect, by the central limit theorem, is that will approximate a bivariate normal distribution with mean 0 (since both and have mean zero). But a bivariate normal distribution is centrally symmetric, so we expect the distribution of to be approximately centrally symmetric, which would imply what we wanted above, since that is equivalent to the statement that .

How can we make the above argument rigorous? The central limit theorem on its own is not enough, for two reasons. The first is that it does not give us information about the speed of convergence to a normal distribution, whereas we need a sum of copies of to be close to normal. The second is that the notion of “close to normal” is not precise enough for our purposes: it will allow us to approximate the probability of an event such as but not of a “probability zero” event such as .

The first of these difficulties is not too worrying, since plenty of work has been done on the speed of convergence in the central limit theorem. In particular, there is a famous theorem of Berry and Esseen that is often used when this kind of information is needed.

However, the Berry-Esseen theorem still suffers from the second drawback. To get round that one needs to turn to more precise results still, known as *local* central limit theorems, often abbreviated to LCLTs. With a local central limit theorem, one can even talk about the probability that takes a specific value after a specific number of steps. Roughly speaking, it says (in its 2-dimensional version) that if is a random variable of mean zero taking values in and if satisfies suitable moment conditions and is not supported in a proper sublattice of , then writing for a sum of copies of , we have that the probability that takes a particular value differs from the “expected” probability (given by a suitable Gaussian formula) by . (I’m not 100% sure I’ve got that right: the theorem in question is Theorem 2.1.1 from this book.)

That looks very close to what we want, but it still falls short. The problem is that the implied constant depends on the random variable . A simple proof of this is that if is not supported in a sublattice but very nearly is — for example, if the probability that it takes a value outside the sublattice is — then one will have to add together an extremely large number of copies of before the sum ceases to be concentrated in the sublattice.

So the situation we appear to be in is the following. We have more precise information about the random variable than is assumed in the LCLT in the reference above, and we want to use that to obtain an explicit constant in the theorem.

It could be that out there in the literature is exactly the result we need, which would be nice, but it also seems possible that we will have to prove an appropriate version of the LCLT for ourselves. I’d prefer the first, but the second wouldn’t be too disappointing, as the problem is quite appealing and even has something of an additive-combinatorial flavour (since it is about describing an iterated convolution of a subset of under appropriate assumptions).

I said above, with no justification, that we have more precise information about the random variable . Let me now try to give the justification.

First of all, we know everything we could possibly want to know about : it is the uniform distribution on . (In particular, if is odd, then it is the uniform distribution on the set of integers in .)

How about the distribution of ? That question is equivalent to asking about the values taken by , and their multiplicities. There is quite a lot one can say about those. For example, I claim that with high probability (if is a random -sided die) is never bigger than . That is because if we choose a fully random sequence , then the expected number of such that is , and the probability that this number differs from by more than is , by standard probabilistic estimates, so if we set , then this is at most , which we can make a lot smaller than by choosing to be, say, . (I think can be taken to be 1/8 if you want me to be more explicit.) Since the probability that is proportional to , it follows that this conclusion continues to hold even after we condition on that event.

Another simple observation is that the values taken by are not contained in a sublattice (assuming, that is, that is ever non-zero). That is simply because and averages zero.

A third simple observation is that with probability 1-o(1) will take a value of at least at least somewhere. I’ll sketch a proof of this. Let be around and let be evenly spaced in , staying away from the end points 1 and . Let be a purely random sequence in . Then the standard deviation of is around , so the probability that it is less than is around . The same is true of the conditional probability that is less than conditioned on the value of (the worst case being when this value is 0). So the probability that this happens for every is at most . This is much smaller than , so the conclusion remains valid when we condition on the sum of the being . So the claim follows. Note that because of the previous simple observation, it follows that must be at least in magnitude at least times, so up to log factors we get that is at least . With a bit more effort, it should be possible to push this up to something more like , since one would expect that would have rough order of magnitude for a positive fraction of the . Maybe this would be a good subproblem to think about, and ideally not too difficult.

How about the joint distribution ? It seems highly likely that for typical this will not be concentrated in a lattice, and that elementary arguments such as the above can be used to prove this. But let me indicate the kind of situation that we would have to prove is not typical. Suppose that and . Then as runs from 1 to 15 the values taken by are and the values taken by are . For this example, all the points live in the lattice of points such that is a multiple of 5.

This wouldn’t necessarily be a disaster for us actually, since the LCLT can be restricted to a sublattice and if after conditioning on we happen to have that is always a multiple of 5, that isn’t a problem if we still have the central symmetry. But it would probably be nicer to prove that it is an atypical occurrence, so that we don’t have to worry about living inside a sublattice (or even being concentrated in one).

My guess is that if we were to pursue these kinds of thoughts, we would end up being able to prove a statement that would say something like that takes a pretty representative sample of values with being between and and being in a range of width around . I would expect, for example, that if we add three or four independent copies of , then we will have a distribution that is similar in character to the uniform distribution on a rectangle of width of order of magnitude and height of order of magnitude . And if that’s true, then adding of them should give us something very close to normal (in an appropriate discrete sense of the word “normal”).

There are two obvious tasks here. One is to try to prove as much as we can about the random variable . The other is to try to prove a suitable LCLT that is strong enough to give us that the probability that given that is approximately 1/2, under suitable assumptions about . And then we have to hope that what we achieve for the first is sufficient for the second.

It’s possible that the second task can be achieved by simply going through one of the existing proofs of the LCLT and being more careful about the details. But if that’s the case, then we should spend some time trying to find out whether anyone has done it already, since there wouldn’t be much point in duplicating that work. I hope I’ve set out what we want clearly enough for any probabilist who might stumble upon this blog post to be able to point us in the right direction if indeed the result we want is out there somewhere.

]]>

In this post I want to expand on part of the previous one, to try to understand better what would need to be true for the quasirandomness assertion to be true. I’ll repeat a few simple definitions and simple facts needed to make the post more self-contained.

By an –*sided* die I mean a sequence in (where is shorthand for ) that adds up to . Given an -sided die and , I define to be the number of such that and to be .

We can write as . Therefore, if is another die, or even just an arbitrary sequence in , we have that

.

If and no is equal to any , then the sign of this sum therefore tells us whether beats . For most , we don’t expect many ties, so the sign of the sum is a reasonable, but not perfect, proxy for which of the two dice wins. (With a slightly more complicated function we can avoid the problem of ties: I shall stick with the simpler one for ease of exposition, but would expect that if proofs could be got to work, then we would switch to the more complicated functions.)

This motivates the following question. Let and be two random dice. Is it the case that with high probability the remaining dice are split into four sets of roughly equal size according to the signs of and ? I expect the answer to this question to be the same as the answer to the original transitivity question, but I haven’t checked as carefully as I should that my cavalier approach to ties isn’t problematic.

I propose the following way of tackling this question. We fix and and then choose a purely random sequence (that is, with no constraint on the sum) and look at the 3D random variable

.

Each coordinate separately is a sum of independent random variables with mean zero, so provided not too many of the or are zero, which for random and is a reasonable assumption, we should get something that approximates a trivariate normal distribution.

Therefore, we should expect that when we condition on being zero, we will get something that approximates a bivariate normal distribution. Although that may not be completely straightforward to prove rigorously, tools such as the Berry-Esseen theorem ought to be helpful, and I’d be surprised if this was impossibly hard. But for now I’m aiming at a heuristic argument, so I want simply to assume it.

What we want is for the signs of the first two coordinates to be approximately independent, which I think is equivalent to saying (assuming normality) that the first two coordinates themselves are approximately independent.

However, what makes the question interesting is that the first two coordinates are definitely *not* independent without the conditioning: the random variables and are typically quite strongly correlated. (There are good reasons to expect this to be the case, and I’ve tested it computationally too.) Also, we expect correlations between these variables and . So what we are asking for is that all these correlations should disappear when we condition appropriately. More geometrically, there is a certain ellipsoid, and we want its intersection with a certain plane to be a circle.

The main aim of this post is to make the last paragraph more precise. That is, I want to take three standard normal random variables and that are not independent, and understand precisely the circumstances that guarantee that and become independent when we condition on .

The joint distribution of is determined by the matrix of correlations. Let this matrix be split up as , where is the covariance matrix of , is a matrix, is a matrix and is the matrix . A general result about conditioning joint normal distributions on a subset of the variables tells us, if I understand the result correctly, that the covariance matrix of when we condition on the value of is . (I got this from Wikipedia. It seems to be quite tricky to prove, so I hope it really can be used as a black box.) So in our case if we have a covariance matrix then the covariance matrix of conditioned on should be .

That looks dimensionally odd because I normalized the random variables to have variance 1. If instead I had started with the more general covariance matrix I would have ended up with .

So after the conditioning, if we want and to become independent, we appear to want to equal . That is, we want

where I am using angle brackets for covariances.

If we divide each variable by its standard deviation, that gives us that the correlation between and should be the product of the correlation between and and the correlation between and .

I wrote some code to test this, and it seemed not to be the case, anything like, but I am not confident that I didn’t make careless mistakes in the code. (However, my correlations were reasonable numbers in the range , so any mistakes there might have been didn’t jump out at me. I might just rewrite the code from scratch without looking at the old version.)

One final remark I’d like to make is that if you feel there is something familiar about the expression , then you are not completely wrong. The formula for the vector triple product is

.

Therefore, the expression can be condensed to . Now this is the scalar triple product of the three vectors , , and . For this to be zero, we need to lie in the plane generated by and . Note that is orthogonal to both and . So if is the orthogonal projection to the subspace generated by , we want to be orthogonal to . Actually, that can be read out of the original formula too, since it is . A nicer way of thinking of it (because more symmetrical) is that we want the orthogonal projections of and to the subspace orthogonal to to be orthogonal. To check that, assuming (WLOG) that ,

.

So what I’d like to see done (but I’m certainly not saying it’s the only thing worth doing) is the following.

1. Test experimentally whether for a random pair of -sided dice we find that the correlations of the random variables , and really do appear to satisfy the relationship

corr.corr corr.

Here the are chosen randomly *without* any conditioning on their sum. My experiment seemed to indicate not, but I’m hoping I made a mistake.

2. If they do satisfy that relationship, then we can start to think about why.

3. If they do not satisfy it, then we can start to think about why not. In particular, which of the heuristic assumptions used to suggest that they *should* satisfy that relationship is wrong — or is it my understanding of multivariate normals that is faulty?

If we manage to prove that they typically do satisfy that relationship, at least approximately, then we can think about whether various distributions become sufficiently normal sufficiently quickly for that to imply that intransitivity occurs with probability 1/4.

]]>

But I haven’t got to that point yet: let me see whether a second public post generates any more reaction.

I’ll start by collecting a few thoughts that have already been made in comments. And I’ll start that with some definitions. First of all, I’m going to change the definition of a die. This is because it probably makes sense to try to prove rigorous results for the simplest model for which they are true, and random multisets are a little bit frightening. But I am told that experiments suggest that the conjectured phenomenon occurs for the following model as well. We define an *-sided die* to be a sequence of integers between 1 and such that . A random -sided die is just one of those chosen uniformly from the set of all of them. We say that *beats* if

That is, beats if the probability, when you roll the two dice, that shows a higher number than is greater than the probability that shows a higher number than . If the two probabilities are equal then we say that *ties with* .

The main two conjectures are that the probability that two dice tie with each other tends to zero as tends to infinity and that the “beats” relation is pretty well random. This has a precise meaning, but one manifestation of this randomness is that if you choose three dice and uniformly at random and are given that beats and beats , then the probability that beats is, for large , approximately . In other words, transitivity doesn’t happen any more often than it does for a random tournament. (Recall that a *tournament* is a complete graph in which every edge is directed.)

Now let me define a function that helps one think about dice. Given a die , define a function on the set by

Then it follows immediately from the definitions that beats if , which is equivalent to the statement that .

If the “beats” tournament is quasirandom, then we would expect that for almost every pair of dice the remaining dice are split into four parts of roughly equal sizes, according to whether they beat and whether they beat . So for a typical pair of dice we would like to show that for roughly half of all dice , and for roughly half of all dice , and that these two events have almost no correlation.

It is critical here that the sums should be fixed. Otherwise, if we are told that beats , the most likely explanation is that the sum of is a bit bigger than the sum of , and then is significantly more likely to beat than is.

Note that for every die we have

That is, every die ties with the die .

Now let me modify the functions to make them a bit easier to think about, though not quite as directly related to the “beats” relation (though everything can be suitably translated). Define to be and to be . Note that which would normally be approximately equal to .

We are therefore interested in sums such as . I would therefore like to get a picture of what a typical sequence looks like. I’m pretty sure that has mean . I also think it is distributed approximately normally around . But I would also like to know about how and correlate, since this will help us get some idea of the variance of , which, if everything in sight is roughly normal, will pin down the distribution. I’d also like to know about the covariance of and , or similar quantities anyway, but I don’t want to walk before I can fly.

Anyhow, I had the good fortune to see Persi Diaconis a couple of days ago, and he assured me that the kind of thing I wanted to understand had been studied thoroughly by probabilists and comes under the name “constrained limit theorems”. I’ve subsequently Googled that phrase and found some fairly old papers written in the typical uncompromising style and level of generality of their day, which leaves me thinking that it may be simpler to work a few things out from scratch. The main purpose of this post is to set out some exercises that have that as their goal.

Suppose, then, that we have a random -sided die . Let’s begin by asking for a proper proof that the mean of is . It clearly is if we choose a purely random -tuple of elements of , but what happens if we constrain the average to be ?

I don’t see an easy proof. In fact, I’m not sure it’s true, and here’s why. The average will always be if and only if the probability that is always equal to , and that is true if and only if is uniformly distributed. (The distributions of the are of course identical, but — equally of course — not independent.) But do we expect to be uniformly distributed? No we don’t: if that will surely make it easier for the global average to be than if .

However, I would be surprised if it were not at least approximately true. Here is how I would suggest proving it. (I stress that I am *not* claiming that this is an unknown result, or something that would detain a professional probabilist for more than two minutes — that is why I used the word “exercise” above. But I hope these questions will be useful exercises.)

The basic problem we want to solve is this: if are chosen independently and uniformly from , then what is the conditional probability that given that the average of the is exactly ?

It’s not the aim of this post to give solutions, but I will at least say why I think that the problems aren’t too hard. In this case, we can use Bayes’s theorem. Using well-known estimates for sums of independent random variables, we can give good approximations to the probability that the sum is and of the probability of that given that (which is just the probability that the sum of the remaining s is ). We also know that the probability that is . So we have all the information we need. I haven’t done the calculation, but my guess is that the tendency for to be closer to the middle than to the extremes is not very pronounced.

In fact, here’s a rough argument for that. If we choose uniformly from , then the variance is about . So the variance of the sum of the (in the fully independent case) is about , so the standard deviation is proportional to . But if that’s the case, then the probability that the sum equals is roughly constant for .

I think it should be possible to use similar reasoning to prove that if , then are approximately independent. (Of course, this would apply to any of the , if correct.)

What is the probability that of the are at most ? Again, it seems to me that Bayes’s theorem and facts about sums of independent random variables are enough for this. We want the probability of the above event given that . By Bayes’s theorem, we can work this out if we know the probability that given that , together with the probability that and the probability that , in both cases when is chosen fully independently. The last two calculations are simple. The first one isn’t 100% simple, but it doesn’t look too bad. We have a sum of random variables that are uniform on and that are uniform on and we want to know how likely it is that they add up to . We could do this by conditioning on the possible values of the two sums, which then leaves us with sums of independent variables, and adding up all the results. It looks to me as though that calculation shouldn’t be too unpleasant. What I would recommend is to do the calculation on the assumption that the distributions are normal (in a suitable discrete sense) with whatever mean and variance they have to have, since that will yield an answer that is almost certainly correct. A rigorous proof can come later, and shouldn’t be too much harder.

The answer I expect and hope for is that is approximately normally distributed with mean and a variance that would come out of the calculations.

This can in principle be done by exactly the same technique, except that now things get one step nastier because we have to condition on the sum of the that are at most , the sum of the that are between and , and the sum of the rest. So we end up with a double sum of products of three probabilities at the end instead of a single sum of products of two probabilities. The reason I haven’t done this is that I am quite busy with other things and the calculation will need a strong stomach. I’d be very happy if someone else did it. But if not, I will attempt it at some point over the next … well, I don’t want to commit myself too strongly, but *perhaps* the next week or two. At this stage I’m just interested in the heuristic approach — assume that probabilities one knows are roughly normal are in fact given by an exact formula of the form .

For some experimental evidence about this, see a comment by Ian on the previous post, which links to some nice visualizations. Ian, if you’re reading this, it would take you about another minute, I’d have thought, to choose a few random dice and plot the graphs . It would be interesting to see such plots to get an idea of what a typical one looks like: roughly how often does it change sign, for example?

I have much less to say here — in particular, I don’t have a satisfactory answer. But I haven’t spent serious time on it, and I think it should be possible to get one.

One slight simplification is that we don’t have to think too hard about whether beats when we are thinking about the three dice and . As I commented above, the tournament will be quasirandom (I think I’m right in saying) if for *almost every* and the events “ beats ” and “ beats ” have probability roughly 1/2 each and are hardly correlated.

A good starting point would be the first part. Is it true that almost every die beats approximately half the other dice? This question was also recommended by Bogdan Grechuk in a comment on the previous post. He suggested, as a preliminary question, the question of finding a good sufficient condition on a die for this to be the case.

That I think is approachable too. Let’s fix some function without worrying too much about whether it comes from a die (but I have no objection to assuming that it is non-decreasing and that , should that be helpful). Under what conditions can we be confident that the sum is greater than with probability roughly 1/2, where *is* a random die?

Assuming it’s correct that each is roughly uniform, is going to average , which if is a die will be close to . But we need to know rather more than that in order to obtain the probability in question.

But I think the Bayes approach may still work. We’d like to nail down the distribution of given that . So we can look at , where now the are chosen uniformly and independently. Calling that , we find that it’s going to be fairly easy to estimate the probabilities of and . However, it doesn’t seem to be notably easier to calculate than it is to calculate . But we have made at least one huge gain, which is that now the are independent, so I’d be very surprised if people don’t know how to estimate this probability. Indeed, the probability we really want to know is . From that all else should follow. And I *think* that what we’d like is a nice condition on that would tell us that the two events are approximately independent.

I’d better stop here, but I hope I will have persuaded at least some people that there’s some reasonably low-hanging fruit around, at least for the time being.

]]>

Suppose you have a pair of dice with different numbers painted on their sides. Let us say that *beats* if, thinking of them as random variables, the probability that is greater than the probability that . (Here, the rolls are of course independent, and each face on each die comes up with equal probability.) It is a famous fact in elementary probability that this relation is not transitive. That is, you can have three dice such that beats , beats , and beats .

Brian Conrey, James Gabbard, Katie Grant, Andrew Liu and Kent E. Morrison became curious about this phenomenon and asked the kind of question that comes naturally to an experienced mathematician: to what extent is intransitivity “abnormal”? The way they made the question precise is also one that comes naturally to an experienced mathematician: they looked at -sided dice for large and asked about limiting probabilities. (To give another example where one might do something like this, suppose one asked “How hard is Sudoku?” Well, any Sudoku puzzle can be solved in constant time by brute force, but if one generalizes the question to arbitrarily large Sudoku boards, then one can prove that the puzzle is NP-hard to solve, which gives a genuine insight into the usual situation with a board.)

Let us see how they formulate the question. The “usual” -sided die can be thought of as a random variable that takes values in the set , each with equal probability. A general -sided die is one where different probability distributions on are allowed. There is some choice about which ones to go for, but Conrey et al go for the following natural conditions.

- For each integer , the probability that it occurs is a multiple of .
- If , then .
- The expectation is the same as it is for the usual die — namely .

Equivalently, an -sided die is a multiset of size with elements in and sum . For example, (2,2,3,3,5,6) and (1,2,3,3,6,6) are six-sided dice.

If we have two -sided dice and represented in this way as and , then beats if the number of pairs with exceeds the number of pairs with .

The question can be formulated a little over-precisely as follows.

**Question.** *Let , and be three -sided dice chosen uniformly at random. What is the probability that beats if you are given that beats and beats ?*

I say “over-precisely” because there isn’t a serious hope of finding an exact formula for this conditional probability. However, it is certainly reasonable to ask about the limiting behaviour as tends to infinity.

It’s important to be clear what “uniformly at random” means in the question above. The authors consider two -sided dice to be the same if the probability distributions are the same, so in the sequence representation a random die is a random non-decreasing sequence of integers from that add up to — the important word there being “non-decreasing”. Another way of saying this is that, as indicated above, the distribution is uniform over multisets (with the usual notion of equality) rather than sequences.

What makes the question particularly nice is that there is strong evidence for what the answer ought to be, and the apparent answer is, at least initially, quite surprising. The authors make the following conjecture.

**Conjecture.** *Let , and be three -sided dice chosen uniformly at random. Then the probability that beats if you are given that beats and beats tends to 1/2 as tends to infinity.*

This is saying that if you know that beats and that beats , you basically have no information about whether beats .

They back up this conjecture with some experimental evidence. When , there turn out to be 4417 triples of dice such that beats and beats . For 930 of these triples, and were tied, for 1756, beat , and for , beat .

It seems obvious that as tends to infinity, the probability that two random -sided dice are tied tends to zero. Somewhat surprisingly, that is not known, and is also conjectured in the paper. It might make a good first target.

The reason these problems are hard is at least in part that the uniform distribution over non-decreasing sequences of length with entries in that add up to is hard to understand. In the light of that, it is tempting to formulate the original question — just how abnormal is transitivity? — using a different, more tractable distribution. However, experimental evidence presented by the authors in their paper indicates that the problem is quite sensitive to the distribution one chooses, so it is not completely obvious that a good reformulation of this kind exists. But it might still be worth thinking about.

Assuming that the conjecture is true, I would imagine that the heuristic reason for its being true is that for large , two random dice will typically be “close” in the sense that although one beats the other, it does not do so by very much, and therefore we do not get significant information about what it looks like just from knowing that it beats the other one.

That sounds a bit vague, so let me give an analogy. Suppose we choose random unit vectors in and are given the additional information that and . What is the probability that ? This is a simple exercise, and, unless I’ve messed up, the answer is 3/4. That is, knowing that in some sense is close to and is close to makes it more likely that is close to .

But now let’s choose our random vectors from . The picture changes significantly. For fixed , the concentration of measure phenomenon tells us that for almost all the inner product is close to zero, so we can think of as the North Pole and the unit sphere as being almost all contained in a thin strip around the equator. And if happens to be just in the northern hemisphere — well, it could just as easily have landed in the southern hemisphere. After a change of basis, we can assume that and is very close to . So when we choose a third vector , we are asking whether the sign of its second coordinate is correlated with the sign of its first. And the answer is no — or rather, yes but only very weakly.

One can pursue that thought and show that the graph where one joins to if is, for large , quasirandom, which means, roughly speaking, that it has several equivalent properties that are shared by almost all random graphs. (For a more detailed description, Googling “quasirandom graphs” produces lots of hits.)

For the problem of Conrey et al, the combinatorial object being examined is not a graph but a *tournament*: that is, a complete graph with orientations on each of its edges. (The vertices are dice, and we draw an arrow from to if beats . Strictly speaking this is not a tournament, because of ties, but I am assuming that ties are rare enough for this to make no significant difference to the discussion that follows.) It is natural to speculate that the main conjecture is a consequence of a much more general statement, namely that this tournament is quasirandom in some suitable sense. In their paper, the authors do indeed make this speculation (it appears there as Conjecture 4).

It turns out that there is a theory of quasirandom tournaments, due to Fan Chung and Ron Graham. Chung and Graham showed that a number of properties that a tournament can have are asymptotically equivalent. It is possible that one of the properties they identified could be of use in proving the conjecture described in the previous paragraph, which, in the light of the Chung-Graham paper, is exactly saying that the tournament is quasirandom. I had hoped that there might be an analogue for tournaments of the spectral characterization of quasirandom graphs (which says that a graph is quasirandom if its second largest eigenvalue is small), since that could give a significantly new angle on the problem, but there is no such characterization in Chung and Graham’s list of properties. Perhaps it is worth looking for something of this kind.

Here, once again, is a link to the paper where the conjectures about dice are set out, and more detail is given. If there is enough appetite for a Polymath project on this problem, I am happy to host it on this blog. All I mean by this is that I am happy for the posts and comments to appear here — at this stage I am not sure what level of involvement I would expect to have with the project itself, but I shall certainly follow the discussion to start with and I hope I’ll be able to make useful contributions.

]]>

The problem it will tackle is Rota’s basis conjecture, which is the following statement.

**Conjecture.** *For each let be a basis of an -dimensional vector space . Then there are disjoint bases of , each containing one element from each .*

Equivalently, if you have an matrix where each row is a basis, then you can permute the entries of the rows so that each column is also a basis.

This is one of those annoying problems that comes into the how-can-that-not-be-known category. Timothy Chow has a lot of interesting thoughts to get the project going, as well as explanations of why he thinks the time might be ripe for a solution.

]]>

The ScienceDirect agreement provides access to around 1,850 full text scientific, technical and medical (STM) journals â€“ managed by renowned editors, written by respected authors and read by researchers from around the globe â€“ all available in one place: ScienceDirect. Elsevierâ€™s full text collection covers titles from the core scientific literature including high impact factor titles such as The Lancet, Cell and Tetrahedron.

Unless things have changed, this too is highly misleading, since up to now most Cell Press titles have *not* been part of the Big Deal but instead are part of a separate package. This point is worth stressing, since failure to appreciate it may cause some people to overestimate how much they rely on the Big Deal — in Cambridge at least, the Cell Press journals account for a significant percentage of our total downloads. (To be more precise, the top ten Elsevier journals accessed by Cambridge are, in order, Cell, Neuron, Current Biology, Molecular Cell, The Lancet, Developmental Cell, NeuroImage, Cell Stem Cell, Journal of Molecular Biology, and Earth and Planetary Science Letters. Of those, Cell, Neuron, Current Biology, Molecular Cell, Developmental Cell and Cell Stem Cell are Cell Press journals, and they account for over 10% of all our access to Elsevier journals.)

Jisc has also put up a Q&A, which can be found here.

Just to remind you, here is what a number of universities were paying annually for their Elsevier subscriptions during the current deal. To be precise, these are the figures for 2014, obtained using FOI requests: they are likely to be a little higher for 2016.

University |
Cost |
Enrolment |
Academic Staff |

Birmingham | ÂŁ764,553 | 31,070 | 2355 + 440 |

Bristol | ÂŁ808,840 | 19,220 | 2090 + 525 |

Cambridge | ÂŁ1,161,571 | 19,945 | 4205 + 710 |

Cardiff | ÂŁ720,533 | 30,000 | 2130 + 825 |

*Durham | ÂŁ461,020 | 16,570 | 1250 + 305 |

**Edinburgh | ÂŁ845,000 | 31,323 | 2945 + 540 |

*Exeter | ÂŁ234,126 | 18,720 | 1270 + 290 |

Glasgow | ÂŁ686,104 | 26,395 | 2000 + 650 |

Imperial College London | ÂŁ1,340,213 | 16,000 | 3295 + 535 |

King’s College London | ÂŁ655,054 | 26,460 | 2920 + 1190 |

Leeds | ÂŁ847,429 | 32,510 | 2470 + 655 |

Liverpool | ÂŁ659,796 | 21,875 | 1835 + 530 |

Â§London School of Economics | ÂŁ146,117 | 9,805 | 755 + 825 |

Manchester | ÂŁ1,257,407 | 40,860 | 3810 + 745 |

Newcastle | ÂŁ974,930 | 21,055 | 2010 + 495 |

Nottingham | ÂŁ903,076 | 35,630 | 2805 + 585 |

Oxford | ÂŁ990,775 | 25,595 | 5190 + 775 |

* ***Queen Mary U of London | ÂŁ454,422 | 14,860 | 1495 + 565 |

Queen’s U Belfast | ÂŁ584,020 | 22,990 | 1375 + 170 |

Sheffield | ÂŁ562,277 | 25,965 | 2300 + 460 |

Southampton | ÂŁ766,616 | 24,135 | 2065 + 655 |

University College London | ÂŁ1,381,380 | 25,525 | 4315 + 1185 |

Warwick | ÂŁ631,851 | 27,440 | 1535 + 305 |

*York | ÂŁ400,445 | 17,405 | 1205 + 285 |

*Joined the Russell Group two years ago.

**Information obtained by Sean Williams.

***Information obtained by Edward Hughes.

Â§LSE subscribes to a package of subject collections rather than to the full Freedom Collection.

These are figures for Russell Group universities: the total amount spent annually by all UK universities for access to ScienceDirect is around ÂŁ40 million.

An important additional factor is that since the last deal was struck with Elsevier, we have had the Finch Report, which has led to a policy of requiring publications in the UK to be open access. The big publishers (who lobbied hard when the report was being written) have responded by turning many of their journals into “hybrid” journals, that is, subscription journals where for an additional fee, usually in the region of ÂŁ2000, you can pay to make your article freely readable to everybody. This has added significantly to the total bill. Cambridge, for example, has paid over ÂŁ750,000 this year in article processing charges, from a grant provided for the purpose.

Jisc started preparing for these negotiations at least two years ago, for example going on fact-finding missions round the world to see what had happened in other countries. The negotiations began in earnest in 2016, and Jisc started out with some core aims, some of which they described as red lines and some as important aims. (I know this from a briefing meeting I attended in Cambridge — I think that similar meetings took place at other universities.) Some of these were as follows.

- No real-terms price increases.
- An offsetting agreement for article processing charges.
- No confidentiality clauses.
- A move away from basing price on “historic spend”.
- A three-year deal rather than a five-year deal.

Let me say a little about each of these.

This seemed extraordinarily unambitious as a starting point for negotiations. The whole point of universities asking an organization like Jisc to negotiate on our behalf was supposed to be that they would be able to negotiate hard and that the threat of not coming to an agreement would be one that Elsevier would have to be genuinely worried about. Journal prices have gone up far more than inflation for decades, while the costs of dissemination have (or at the very least should have) gone down substantially. In addition, there are a number of subjects, mathematics and high-energy physics being two notable examples, where it is now common practice to claim priority for a result by posting a preprint, and in those subjects it is less and less common for people to look at the journal versions of articles because repositories such as arXiv are much more convenient, and the value that the publishers claim they add to articles is small to nonexistent. So Jisc should have been pressing for a substantial cut in prices: maintenance of the status quo is not appropriate when technology and reading habits are changing so rapidly.

An offsetting agreement means a deal where if somebody pays an article processing charge in order to make an article open access in an Elsevier journal, then that charge is subtracted from the Big Deal payment. There are arguments for and against this idea. The main argument for it is that it is a way of avoiding double dipping: the phenomenon where Elsevier effectively gets paid twice for the same article, since it rakes in the article processing charges but does not reduce the subscription cost of the Big Deal.

In its defence, Elsevier makes the following two points. First, it has an explicit policy against double dipping. In answer to the obvious accusation that they are receiving a lot of APCs and we are seeing no corresponding drop in Big Deal prices, they point out that the total volume of articles they publish is going up. This highlights a huge problem with Big Deals: if universities could say that they did not want the extra content then it might be OK, but as it is, all Elsevier has to do to adhere to its policy is found enough worthless journals that nobody reads to equal the volume of articles for which APCs are paid.

But there is a second argument that carries more weight. It is that if one country has an offsetting agreement, then all other countries benefit (at least in theory) from lower subscription prices, so in total Elsevier has lost out. Or to put it another way, with an offsetting agreement, it basically becomes free for people in that country to publish an open access article with Elsevier, so they are effectively giving away that content.

Against this are two arguments: that if somebody has to lose out, why should it not be Elsevier, and that in any case it would be entirely consistent with a no-double-dipping policy for Elsevier not to reduce its Big Deal subscriptions for the other countries. In the longer term, if lots of countries had offsetting agreements, this might cease to be sustainable, since nobody would need subscriptions any more, but since most countries are not following the UK’s lead in pursuing open access with article processing charges, this is unlikely to happen any time soon.

Personally, I am not in favour of an offsetting agreement if it works on a per-article basis, since that may lead to pressure from universities for their academics to publish with Elsevier rather than with publishers that do not have offsetting agreements: that is, it gives an artificial advantage to Elsevier journals. What I would like to see is a big drop in the subscription price to allow for the fact that we are now paying a lot of APC money to Elsevier. That way, if other journals are better, they will get used, and there will be some semblance of a market.

It goes without saying that confidentiality clauses are one of the most obnoxious features of Elsevier contracts. And now that FOI requests have been successful in obtaining information about what universities pay for their subscriptions, they also seem rather pointless. In any case, Jisc was strongly against them, as they certainly should have been.

Another remark is that if contracts are kept confidential, there is no way of assessing whether Elsevier is double dipping.

When we moved from looking at print copies of journals to looking at articles online, it suddenly ceased to be obvious on what basis we should be charged. Elsevier came up with the idea of not changing anything, so even if in practice with a big deal we get access to all the journals, nominally a university subscribes to a “Core Collection”, which is based on what it used to have print subscriptions to (they are allowed to change what is in the Core Collection, but they cannot reduce its size), and then the rest goes under the Orwellian name of the Freedom Collection.

This system is manifestly unfair: for example, Cambridge, with its numerous college libraries, used to subscribe to several copies of certain journals and is now penalized for this. It also means that if a university starts to need journals less, there is no way for this to be reflected in the price it pays.

Jisc recognised the problem, and came up with a rather mealy-mouthed formula about “moving away from historic spend”. Not abolishing the system and replacing it by a fairer one (which is hard to do as there will be losers as well as winners), but “moving away” from it in ways that they did not specify when we asked about it at the briefing meeting.

At some point I was told (indirectly by Cambridge’s then head librarian) that the idea was to go for a three-year deal, so that we would not be locked in for too long. This I was very pleased to hear, as a lot can change in three years.

For reasons I’ve given in the previous section, even if Jisc had succeeded in its aims, I would have been disappointed by the deal. But as it was, something very strange happened. We had been told of considerable ill feeling, including cancelled meetings because the deals that Elsevier was offering were so insultingly bad, and then suddenly in late September we learned that a deal had been reached. And then when the deal was announced it was all smiles and talk of “landmark deals” and “value for money”.

So how did Jisc do, by their own criteria? Well, it is conceivable that they will end up achieving their first aim of not having any real-terms price increases: this will depend on whether Brexit causes enough inflation to cancel out such money-terms price increases as there may or may not be — I leave it to you to guess which. (In the interests of balance, I should also point out that the substantial drop in the pound means that what Elsevier receives has, in their terms, gone down. That said, currency fluctuations are a fact of life and over the last few years they have benefited a lot from a weak euro.)

Jisc said that an offsetting agreement was not just an aspiration but a red line — a requirement of any deal they would be prepared to strike. However, there is no offsetting agreement.

Jisc also said that they would insist on transparency, but when Elsevier insisted on confidentiality clauses, they meekly accepted this. (Their reasoning: Elsevier was not prepared to reach a deal without these clauses. But why didn’t an argument of exactly the same type apply to Jisc in the other direction?) It is for that reason that I have been a bit vague about prices above.

As far as historic spend is concerned, I see on the Jisc statement the following words: “The agreement includes the ability for the consortium to migrate from historical print spend and reallocate costs should we so wish.” I have no information about whether any “migration” has started, but my guess would be that it hasn’t, since if there were to be moves in that direction, then there would surely need to be difficult negotiations between the universities about how to divide up the total bill, and there has been no sign of any such negotiations taking place.

Finally, the deal is for five years and not for three years.

So Jisc has not won any clear victories and has had several clear defeats. Now if you were in that position more than three months before the end of the existing deal, what would you do? Perhaps you would follow the course suggested by a Jisc representative at one of the briefing meetings, who said the following.

We know from analysis of the experiences of other consortia that Elsevier really do want to reach an agreement this year. They really hate to go over into the next year …

A number of colleagues from other consortia have said they wished they had held on longer …

If we can hold firm even briefly into 2017 that should have quite a profound impact on what we can achieve in these negotiations.

Of course, all that is just common sense. But this sensible negotiating strategy was mysteriously abandoned, on the grounds that it had become clear that the deal on offer was the best that Jisc was going to get. Again there is a curious lack of symmetry here: why didn’t Jisc make it clear that a better deal (for Jisc) was the best that Elsevier was going to get? At the very least, why didn’t Jisc at least try to extract further concessions from Elsevier by letting the negotiations continue until much closer to the expiry of the current deal?

Jisc defended itself by saying that their job was simply to obtain the best deal they could to put before the universities, but no university was obliged to sign up to the deal. This is not a wholly satisfactory response, since (i) the whole point of using Jisc rather than negotiating individually was to exploit the extra bargaining power that should come from acting in concert and (ii) Elsevier have made it clear that they will not offer a better deal to any institution that opts out of the Jisc-negotiated one. (This is one of many parallels with Brexit — in this case with the fact that the EU cannot be seen to be giving the UK a better deal than it had in the EU.)

A particularly irritating aspect of the situation was that I and some others had organized for an open letter to be sent to Jisc from many academics, urging them to bargain hard. We asked Jisc whether this would be helpful and they requested that we should delay sending it until after a particular meeting with Elsevier had taken place. And then the premature deal took us by surprise and the letter never got sent.

Several universities have already accepted the deal, and the mood amongst heads of department in Cambridge appears to be that although it is not a good deal we do not have a realistic alternative to accepting it. This may be correct, but we appear to be rushing into a decision (in Cambridge it is due to be taken in a few days’ time). We are talking about a lot of money: would it not be sensible to delay signing a contract until there has been a proper assessment of the consequences of rejecting a deal?

For Cambridge, I personally would be in favour of cancelling the Big Deal and subscribing individually to a selection of the most important journals, even if this ended up costing more than what we pay at the moment. The reason is that we would have taken back control (those parallels again). At the moment the market is completely dysfunctional, since the price we pay bears virtually no relationship to demand. But if departments were given budgets and told they could choose whether to spend them on journal subscriptions or to use the money for other purposes, then they would be able to do a proper cost-benefit analysis and act on it. Then as more and more papers became freely available online, costs would start to go down. And if other universities did the same (as some notable universities such as Harvard already have), then Elsevier might start having to lower the list prices of their journals.

If the deal is accepted, it should not be the end of the story. A large part of the reason that Elsevier and the other large publishers walk all over Jisc in these negotiations is that we lack a credible Plan B. (For mathematics there is one — just cancel the deal and read papers on the arXiv, as we do already — but many other subjects have not reached this stage.) We need to think about this, so that in future negotiations any threat to cancel the deal is itself credible. We also need to think about whether Jisc is the right body to be negotiating on our behalf, given what has happened this time. What I am hearing from many people, even those who think we should accept the deal, is full agreement that it is a bad one. Even if we accept it, the very least we can do is make clear that we are not happy with what we are accepting. It may not be very polite to those at Jisc who worked hard on our behalf, but we have paid a heavy price for politeness.

If Elsevier will not give us a proper market, we can at least create mini-markets ourselves within universities: why not charge more from faculties that rely on ScienceDirect more heavily? Such is the culture of secrecy that I am not even allowed to tell you how the cost is shared out in Cambridge, but it does not appear to be based on need.

I am often asked why I focus on Elsevier, but the truth is that I no longer do: Springer, Wiley, and Taylor and Francis are in many ways just as bad, and in some respects are even worse. (For example, while Elsevier now makes mathematics papers over four years old freely available, Springer has consistently refused to make any such move.) I am very reluctant to submit papers to any of these publishers — for example, now that the London Mathematical Society has switched from OUP to Wiley I will not be sending papers to their journals. It will be depressing if we have to wait another five years to improve the situation with Elsevier, but in the meantime there are smaller, but still pretty big, Big Deals coming up with the other members of the big four. Because they are smaller, perhaps we are less reliant on their journals, and perhaps that would allow us to drive harder bargains.

In any case, if you are unhappy with the way things are, please make your feelings known. Part of the problem is that the people who negotiate on our behalf are, quite reasonably, afraid of the reaction they would get if we lost access to important journals. It’s just a pity that they are not also afraid of the reaction if the deal they strike is significantly more expensive than it need have been. (We are in a classic game-theoretic situation where there is a wide range of prices at which it is worth it for Elsevier to provide the deal and not worth it for a university to cancel it, and Elsevier is very good at pushing the price to the top of this range.) Pressure should also be put on librarians to get organized with a proper Plan B so that we can survive for a reasonable length of time without Big Deal subscriptions. Just as with nuclear weapons, it is not necessary for such a Plan B ever to be put to use, but it needs to exist and be credible so that any threat to walk away from negotiations will be taken seriously.

]]>

The Chern Medal is a relatively new prize, awarded once every four years jointly by the IMU and the Chern Medal Foundation (CMF) to an individual whose accomplishments warrant the highest level of recognition for outstanding achievements in the field of mathematics. Funded by the CMF, the Medalist receives a cash prize of US$ 250,000. In addition, each Medalist may nominate one or more organizations to receive funding totalling US$ 250,000, for the support of research, education, or other outreach programs in the field of mathematics.

Professor Chern devoted his life to mathematics, both in active research and education, and in nurturing the field whenever the opportunity arose. He obtained fundamental results in all the major aspects of modern geometry and founded the area of global differential geometry. Chern exhibited keen aesthetic tastes in his selection of problems, and the breadth of his work deepened the connections of geometry with different areas of mathematics. He was also generous during his lifetime in his personal support of the field.

Nominations should be sent to the Prize Committee Chair: Caroline Series, email: chair(at)chern18.mathunion.org by 31st December 2016. Further details and nomination guidelines for this and the other IMU prizes can be found here.

]]>

]]>

Approximately a year on from the announcement of Discrete Analysis, it seems a good moment to take stock and give a quick progress report, so here it is.

At the time of writing (5th October 2016) we have 17 articles published and are on target to reach 20 by the end of the year. (Another is accepted and waiting for the authors to produce a final version.) We are very happy with the standard of the articles. The journal has an ISSN, each article has a DOI, and articles are listed on MathSciNet. We are not yet listed on Web of Science, so we do not have an impact factor, but we will soon start the process of applying for one.

We are informed by Scholastica that between June 6th and September 27th 2016 the journal had 18,980 pageviews. (In the not too distant future we will have the analytics available to us whenever we want to look at them.) The number of views of the page for a typical article is in the low hundreds, but that probably underestimates the number of times people read the editorial introduction for a given article, since that can be done from the main journal pages. So getting published in Discrete Analysis appears to be a good way to attract attention to your article — we hope more than if you post it on the arXiv and wait for it to appear a long time later in a journal of a more conventional type.

We have had 74 submissions so far, of which 14 are still in process. Our acceptance rate is 37%, but some submissions are not serious mathematics, and if these are discounted then the rate is probably somewhere around 50%. I think the 74 includes revised versions of previously submitted articles, so the true figure is a little lower. Our average time to reject a non-serious submission is 7 days, our average to reject a more serious submission is 47 days, and our average time to accept is 121 days. There is considerable variance in these figures, so they should be interpreted cautiously.

There has been one change of policy since the launch of the journal. LĂˇszlĂł Babai, founder of the online journal Theory of Computing, which, like Discrete Analysis, is free to read and has no publication charges, very generously offered to provide for us a suitable adaptation of their style file. As a result, our articles will from now on have a uniform appearance and, more importantly, will appear with their metadata: after a while it seemed a little strange that the official version of one of our articles would not say anywhere that it was published by Discrete Analysis, but now it tells you that, and the number of the article, the date of publication, the DOI, and so on. So far, our two most recent articles have been formatted — you can see them here and here — and in due course we will reformat all the earlier ones.

If you have an article that you think might suit the journal (and now that we have several articles on our website it should be easier to judge this), we would be very pleased to receive it: 20 articles in our first year is a good start, but we hope that in due course the journal will be perceived as established and the submission rate of good articles will increase. (For comparison, Combinatorica published 31 articles in 2015, and Combinatorics, Probability and Computing publishes around 55 articles a year, to judge from a small sample of issues.)

]]>

The structure of the story is wearily familiar after what happened with USS pensions. The authorities declare that there is a financial crisis, and that painful changes are necessary. They offer a consultation. In the consultation their arguments appear to be thoroughly refuted. The refutation is then ignored and the changes go ahead.

Here is a brief summary of the painful changes that are proposed for the Leicester mathematics department. The department has 21 permanent research-active staff. Six of those are to be made redundant. There are also two members of staff who concentrate on teaching. Their number will be increased to three. How will the six be chosen? Basically, almost everyone will be sacked and then invited to reapply for their jobs in a competitive process, and the plan is to get rid of “the lowest performers” at each level of seniority. Those lowest performers will be considered for “redeployment” — which means that the university will make efforts to find them a job of a broadly comparable nature, but doesn’t guarantee to succeed. It’s not clear to me what would count as broadly comparable to doing pure mathematical research.

How is performance defined? It’s based on things like research grants, research outputs, teaching feedback, good citizenship, and “the ongoing and potential for continued career development and trajectory”, whatever that means. In other words, on the typical flawed metrics so beloved of university administrators, together with some subjective opinions that will presumably have to come from the department itself — good luck with offering those without creating enemies for life.

Oh, and another detail is that they want to reduce the number of straight maths courses and promote actuarial science and service teaching in other departments.

There is a consultation period that started in late August and ends on the 30th of September. So the lucky members of the Leicester mathematics faculty have had a whole month to marshall their to-be-ignored arguments against the changes.

It’s important to note that mathematics is not the only department that is facing cuts. But it’s equally important to note that it *is* being singled out: the university is aiming for cuts of 4.5% on average, and mathematics is being asked to make a cut of more like 20%. One reason for this seems to be that the department didn’t score all that highly in the last REF. It’s a sorry state of affairs for a university that used to boast Sir Michael Atiyah as its chancellor.

I don’t know what can be done to stop this, but at the very least there is a petition you can sign. It would be good to see a lot of signatures, so that Leicester can see how damaging a move like this will be to its reputation.

]]>

I’ll consider three questions: why we need supranational organizations, to what extent we should care about sovereignty, and whether we should focus on the national interest.

In the abstract, the case for supranational organizations is almost too obvious to be worth making: just as it often benefits individual people to form groups and agree to restrict their behaviour in certain ways, so it can benefit nations to join groups and agree to restrict their behaviour in certain ways.

To see in more detail why this should be, I’ll look at some examples, starting with an example concerning individual people. It has sometimes been suggested that a simple way of dealing with the problem of drugs in sport would be to allow people to use whatever drugs they want. Even with the help of drugs, the Ben Johnsons of this world can’t set world records and win Olympic gold medals unless they are also amazing athletes, so if we allowed drugs, there would still be a great deal of room for human achievement.

There are many arguments against this proposal. A particularly powerful one is that allowing drugs has the effect of making them compulsory: they offer enough of a boost to performance that a drug-free athlete would almost certainly be unable to compete at the highest level if a large proportion of other athletes were taking drugs. Since taking drugs has serious adverse health effects — for instance, it has led to the deaths of several cyclists — it is better if competitors agree to forswear this method of gaining a competitive advantage. But just saying, “I won’t take drugs if you don’t” isn’t enough, since for any individual there will always be a huge temptation to break such an agreement. So one also needs organizations to which athletes belong, with precise rules and elaborate systems of testing.

This example has two features that are characteristic of many cooperative agreements.

- It is better for everybody if everybody cooperates than if everybody breaks the agreement.
- Whatever everybody else does, any individual will benefit from breaking the agreement (at least in the short term — of course, others may then follow suit).

These are the classic features of the Prisoner’s Dilemma, and whenever they occur, there is a case for an enforceable agreement. Such an agreement will leave everybody better off by forcing individuals not to act in their immediate self-interest.

The “individuals” in the Prisoner’s Dilemma need not be people: they can just as easily be countries. Here are a few examples.

Many people think that a country is better off if its workers are decently paid, do not work excessively long hours, and work in a safe environment. (If you are sufficiently right wing, then you may disagree, but that just means that you will need other examples to illustrate the abstract principle.) However, treating workers decently costs money, so if you are a company that is competing with companies from other countries, it is tempting to gain a competitive advantage by paying workers less, making them work longer hours, and cutting back on health and safety measures, which will enable you to reduce the price of your product. More generally, if you are a national government, it is tempting to gain a competitive advantage for your whole country by allowing companies to treat their workers less well. And it may be that that competitive advantage is of net benefit to your country: yes, some workers suffer, but the benefit to the economy in general reduces unemployment, helps your country to build more hospitals, and so on.

In such a situation, it may benefit an individual country to become “the sweatshop of Europe”. If that is the case, then in the absence of a supranational organization that forbids this, there is a pressure on all countries to do it, after which (i) there is no competitive advantage any more and (ii) workers are worse off. Thus, with a supranational organization, all countries are better off.

Another obvious example — so obvious that I won’t dwell on it — is the need to combat climate change. (Again, this will not appeal to a certain sort of right-winger who thinks that climate change is a big socialist conspiracy, but I doubt that many of those read this blog.) The world as a whole will be much better off if we all emit less carbon, but if you hold the behaviour of other countries constant, then whatever one country does to reduce carbon emissions makes less difference to its future interests than the cost of making the reductions. So again we need enforceable supranational agreements.

A third example is corporation tax. One way of attracting foreign investment is to have a low rate of corporation tax. So if countries are left completely free to set their tax rates, there may well be a race to the bottom, with the result that no country ends up benefiting very much from the tax revenue from foreign investors. (There will still be other benefits, such as the resulting employment.) But one can lift this “bottom” if a group of countries agrees to keep corporation taxes above a certain level. Unless that level is so high that it puts off foreign investors from investing anywhere in the group, then the countries in the group will now benefit from additional tax revenue.

Every time I hear a Leave campaigner complain about EU regulation, my first reaction is to wonder whether what they really want is to defect from an agreement that is there to deal with an instance of the Prisoner’s Dilemma. And sure enough, they often do. For example, a few days ago the farming minister George Eustice said that leaving the EU would free us from green directives. One of the directives he particularly wants to get rid of is the birds and habitat directive, which costs farmers money because it forces them to protect birds and wildlife habitats. He claims that Britain would introduce its own, better environmental legislation. But without the EU legislation, Britain would have a strong incentive to gain a competitive advantage by making its legislation less strict.

Similarly, a little while ago I heard a fisherman talking about how his livelihood suffered as a result of EU fishing quotas, and how he hoped that Britain would leave the EU and let him fish more. He didn’t put it quite that crudely, but that was basically what he was saying. And yet without quotas, the fishing stock would rapidly decline and that very same fisherman’s livelihood would vanish completely.

Do I trust our government not to succumb to these kinds of agreement-breaking temptations? Of course not. But more to the point, with a supranational body making appropriate legislation, I do not have to.

Sovereignty is often spoken of as though it is a good thing in itself. Why might that be? Well, if a country is free to do what it wants, then it is free to act in the best interests of its inhabitants, whereas if it is restricted by belonging to a supranational organization, then it loses some of that freedom, and therefore risks no longer being able to act in the best interests of its inhabitants.

However, as I have already explained, there are many situations where an agreement benefits all countries, but an individual country can gain, at least in the short term, by breaking it. In such situations, countries are better off without the freedom to act in the *immediate* best interests of their citizens, since those same citizens are better off if the agreements do not break down.

If sovereignty is what really matters, then why should it be *national* sovereignty that is important? Why should I want decisions to be taken at the level of the nation state and not at the level of, say, cities, or continents, or counties, or families? What I feel about it is something like this: I want to have as much influence as possible on the people who are making decisions that affect me, and I want those people to be well informed about my interests and to care about them. That suggests that decisions should be made at the lowest possible level. However, for the reasons rehearsed above, there are often advantages to be gained from taking decisions at a higher level, and those advantages often outweigh the resulting loss of influence I have. For example, I am happy to pay income tax, since there is no realistic more local way to finance much of the country’s infrastructure from which I greatly benefit. Unfortunately I don’t have much influence over the national government, so some of the income tax is spent in ways I disapprove of: for example, a few hundred pounds of what I contribute will probably go towards renewing Trident, which is — in my judgment anyway — a gigantic waste of money. But that loss of influence is part of the bargain: the advantages of paying income tax outweigh the disadvantages.

Thus, what really matters is *subsidiarity* rather than sovereignty. One used to hear the word “subsidiarity” constantly in the early 1990s, the last time the Conservative Party was ripping itself apart over Europe, but it has been strangely absent from the debate this time round (or if it hasn’t, then I’ve missed it). It is the principle that decisions should be taken at the lowest level that is appropriate. So, for example, measures to combat climate change should be taken at a supranational level, the decision to build a new motorway should be taken at a national level, and the decision to improve the lighting in a back street should be taken at a town-council level.

The principle of subsidiarity has been enshrined in European Union law since the Maastricht Treaty of 1992. Point 3 of Article 5 of the Lisbon Treaty of 2009 reads as follows.

Under the principle of subsidiarity, in areas which do not fall within its exclusive competence, the Union shall act only if and insofar as the objectives of the proposed action cannot be sufficiently achieved by the Member States, either at central level or at regional and local level, but can rather, by reason of the scale or effects of the proposed action, be better achieved at Union level.

The institutions of the Union shall apply the principle of subsidiarity as laid down in the Protocol on the application of the principles of subsidiarity and proportionality. National Parliaments ensure compliance with the principle of subsidiarity in accordance with the procedure set out in that Protocol.

When I hear politicians on the Leave side talk about sovereignty, I am again suspicious. What I hear is, “I want unfettered power.” But unfettered power for the Boris Johnsons of this world is not in my best interests or the best interests of the UK, which is why I shall vote for the fetters.

All other things being equal, of course the national interest matters, since what is better for my country is, well, better. But all things are not necessarily equal. I don’t for a moment believe that it would be in the UK’s best interests to leave the EU, but just suppose for a moment that it were. That still leaves us with the question of whether it would be in *Europe’s* best interests.

I am raising that question not in order to answer it (though I think the answer is pretty obvious), but to discuss whether it should be an important consideration. So let me suppose, hypothetically, that leaving the EU would be in the best interests of the UK but would be very much not in the best interests of the rest of Europe. Should I vote for the UK to leave?

If I were an extreme utilitarian, I would argue as follows: the total benefit of the UK leaving the EU is the total benefit to the UK minus the total cost to the rest of the EU; that is negative, so the UK should stay in the EU.

However, I am not an extreme utilitarian in that sense: if I were, I would sell my house and give all my money to charities that had been carefully selected (by an organization such as GiveWell) to do the maximum amount of good per pound. My family would suffer, but that suffering would be far outweighed by all the suffering I could relieve with that money. I have no plans to do that, but I am a utilitarian to this extent: such money as I *do* give to charity, I try to give to charities that are as efficient (in the amount-of-good-per-pound sense) as possible. If somebody asks me to give to a good cause, I am usually reluctant, because I feel it is my moral duty to give the money to an even better cause. (As an example, I once refused to take part in an ice bucket challenge but made a donation to one of GiveWell’s recommended charities instead.)

Thus, the principle I adopt is something like this. There are some people I care about more than others: my family, friends, and colleagues (in the broad sense of people round the world with similar interests) being the most obvious examples. Part of the reason for this is the very selfish one that my own interests are bound up with theirs: we belong to identifiable groups, and if those groups as a whole thrive, then that is very positive for me. So when I am making a decision, I will tend to give a significantly higher weight to people who are closer to me, in the sense of having interests that are aligned with mine.

But once that weighting is taken into account, I basically *am* a utilitarian. That is, if I’m faced with a choice, then I want to go for the option that maximizes total utility, except that the utility of people closer to me counts for more. Whether or not it *should* count for more is another question, but it does, and I think it does for most people. (I have oversimplified my position a bit here, but I don’t want to start writing a treatise in moral philosophy.)

So for me the question about national interest boils down to this: do I feel closer to people who are British than I do to people from other European countries?

I certainly feel closer to *some* British people, but that is not really because of their intrinsic Britishness: it’s just that I have lived in Britain almost all my life, so the people I have got close to I have mostly met here. What’s more there are plenty of non-British Europeans I feel closer to than I do to most British people: my wife and in-laws are a particularly strong example, but I also have far more in common with a random European academic, say, than I do with a random inhabitant of the UK.

So the mere fact that someone is British does not make me care about them more. To take an example, some regions of the UK are significantly less well off than others, and have been for a long time. I would very much like to see those regions regenerated. But I do not see why that should be more important to me than the regeneration of, say, Greece. Similarly, I am no more concerned by the fact that the UK is a net contributor to the EU than I am by the fact that I am a net contributor to the welfare state. (In fact, I’m a lot less concerned by it, since the net contribution is such a small proportion of our GDP that it is almost certainly made up for by the free trade benefits that result.)

I have given three main arguments: that we need supranational organizations to deal with prisoner’s-dilemma-type situations, that subsidiarity is what matters rather than sovereignty, and that one should not make a decision that is based solely on the national interest and that ignores the wider European interest.

One could in theory agree with everything I have written but argue that the EU is not the right way of dealing with problems that have to be dealt with at an international level. I myself certainly don’t think it’s perfect, but it is utterly unrealistic to imagine that if we leave then we will end up with an organization that does the job better.

]]>

But as I’ve got a history with this problem, including posting about it on this blog in the past, I feel I can’t just not react. So in this post and a subsequent one (or ones) I want to do three things. The first is just to try to describe my own personal reaction to these events. The second is more mathematically interesting. As regular readers of this blog will know, I have a strong interest in the question of where mathematical ideas come from, and a strong conviction that they *always* result from a fairly systematic process — and that the opposite impression, that some ideas are incredible bolts from the blue that require “genius” or “sudden inspiration” to find, is an illusion that results from the way mathematicians present their proofs after they have discovered them.

From time to time an argument comes along that appears to present a stiff challenge to my view. The solution to the cap-set problem is a very good example: it’s easy to understand the proof, but the argument has a magic quality that leaves one wondering how on earth anybody thought of it. I’m referring particularly to the Croot-Lev-Pach lemma here. I don’t pretend to have a complete account of how the idea might have been discovered (if any of Ernie, Seva or Peter, or indeed anybody else, want to comment about this here, that would be extremely welcome), but I have some remarks.

The third thing I’d like to do reflects another interest of mine, which is avoiding duplication of effort. I’ve spent a little time thinking about whether there is a cheap way of getting a Behrend-type bound for Roth’s theorem out of these ideas (and I’m not the only one). Although I wasn’t expecting the answer to be yes, I think there is some value in publicizing some of the dead ends I’ve come across. Maybe it will save others from exploring them, or maybe, just maybe, it will stimulate somebody to find a way past the barriers that seem to be there.

There’s not actually all that much to say here. I just wanted to comment on a phenomenon that’s part of mathematical life: the feeling of ambivalence one has when a favourite problem is solved by someone else. The existence of such a feeling is hardly a surprise, but slightly more interesting are the conditions that make it more or less painful. For me, an extreme example where it was not at all painful was Wiles’s solution of Fermat’s Last Theorem. I was in completely the wrong area of mathematics to have a hope of solving that problem, so although I had been fascinated by it since boyhood, I could nevertheless celebrate in an uncomplicated way the fact that it had been solved in my lifetime, something that I hadn’t expected.

Towards the other end of the spectrum for me personally was Tom Sanders’s quasipolynomial version of the Bogolyubov-Ruzsa lemma (which was closely related to his bound for Roth’s theorem). That was a problem I had worked on very hard, and some of the ideas I had had were, it turned out, somewhat in the right direction. But Tom got things to work, with the help of further ideas that I had definitely not had, and by the time he solved the problem I had gone for several years without seriously working on it. So on balance, my excitement at the solution was a lot greater than the disappointment that that particular dream had died.

The cap-set problem was another of my favourite problems, and one I intended to return to. But here I feel oddly un-disappointed. The main reason is that I know that if I had started work on it again, I would have continued to try to push the Fourier methods that have been so thoroughly displaced by the Lev-Croot-Pach lemma, and would probably have got nowhere. So the discovery of this proof has saved me from wasting a lot of time at some point in the future. It’s also an incredible bonus that the proof is so short and easy to understand. I could almost feel my brain expanding as I read Jordan Ellenberg’s preprint and realized that here was a major new technique to add to the toolbox. Of course, the polynomial method is not new, but somehow this application of it, at least for me, feels like one where I can make some headway with understanding why it works, rather than just gasping in admiration at each new application and wondering how on earth anyone thought of it.

That brings me neatly on to the next theme of this post. From now on I shall assume familiarity with the argument as presented by Jordan Ellenberg, but here is a very brief recap.

The key to it is the lemma of Croot, Lev and Pach (very slightly modified), which states that if and is a polynomial of degree in variables such that for every pair of distinct elements , then is non-zero for at most values of , where is the dimension of the space of polynomials in of degree at most .

Why does this help? Well, the monomials we consider are of the form where each . The expected degree of a random such monomial is , and for large the degree is strongly concentrated about its mean. In particular, if we choose , then the probability that a random monomial has degree greater than is exponentially small, and the probability that a random monomial has degree less than is also exponentially small.

Therefore, the dimension of the space of polynomials of degree at most (for this ) is at least , while the dimension of the space of polynomials of degree at most is at most . Here is some constant less than 1. It follows that if is a set of density greater than we can find a polynomial of degree that vanishes everywhere on and doesn’t vanish on . Furthermore, if has density a bit bigger than this — say , we can find a polynomial of degree that vanishes on and is non-zero at more than points of . Therefore, by the lemma, it cannot vanish on all with distinct elements of , which implies that there exist distinct such that for some .

Now let us think about the Croot-Lev-Pach lemma. It is proved by a linear algebra argument: we define a map , where is a certain vector space over of dimension , and we also define a bilinear form on , with the property that for every . Then the conditions on translate into the condition that for all distinct . But if is non-zero at more than points in , that gives us such that if and only if , which implies that are linearly independent, which they can’t be as they all live in the -dimensional space .

The crucial thing that makes this lemma useful is that we have a huge space of functions — of almost full dimension — each of which can be represented this way with a very small .

The question I want to think about is the following. Suppose somebody had realized that they could bound the size of an AP-free set by finding an almost full-dimensional space of functions, each of which had a representation of the form , where took values in a low-dimensional vector space . How might they have come to realize that polynomials could do the job? Answering this question doesn’t solve the mystery of how the proof was discovered, since the above realization seems hard to come by: until you’ve seen it, the idea that almost all functions could be represented very efficiently like that seems somewhat implausible. But at least it’s a start.

Let’s turn the question round. Suppose we know that has the property that for every , with taking values in a -dimensional space. That is telling us that if we think of as a matrix — that is, we write for — then that matrix has rank . So we can ask the following question: given a matrix that happens to be of the special form (where the indexing variables live in ), under what circumstances can it possibly have low rank? That is, what about makes have low rank?

We can get some purchase on this question by thinking about how operates as a linear map on functions defined on . Indeed, we have that if is a function defined on (I’m being a bit vague for the moment about where takes its values, though the eventual answer will be ), then we have the formula . Now has rank if and only if the functions of the form form a -dimensional subspace. Note that if is the function , we have that . Since every is a linear combination of delta functions, we are requiring that the translates of should span a subspace of dimension . Of course, we’d settle for a lower dimension, so it’s perhaps more natural to say at most . I won’t actually write that, but it should be understood that it’s what I basically mean.

What kinds of functions have the nice property that their translates span a low-dimensional subspace? And can we find a huge space of such functions?

The answer that occurs most naturally to me is that characters have this property: if is a character, then every translate of is a multiple of , since . So if is a linear combination of characters, then its translates span a -dimensional space. (So now, just to be explicit about it, my functions are taking values in .)

Moreover, the converse is true. What we are asking for is equivalent to asking for the convolutions of with other functions to live in a -dimensional subspace. If we take Fourier transforms, we now want the pointwise products of with other functions to live in a -dimensional subspace. Well, that’s exactly saying that takes non-zero values. Transforming back, that gives us that needs to be a linear combination of characters.

But that’s a bit of a disaster. If we want an -dimensional space of functions such that each one is a linear combination of at most characters, we cannot do better than to take . The proof is the same as one of the arguments in Ellenberg’s preprint: in an -dimensional space there must be at least active coordinates, and then a random element of the space is on average non-zero on at least of those.

So we have failed in our quest to make exponentially close to and exponentially close to zero.

But before we give up, shouldn’t we at least consider backtracking and trying again with a different field of scalars? The complex numbers didn’t work out for us, but there is one other choice that stands out as natural, namely .

So now we ask a question that’s exactly analogous to the question we asked earlier: what kinds of functions have the property that they and their translates generate a subspace of dimension ?

Let’s see whether the characters idea works here. Are there functions with the property that ? No there aren’t, or at least not any interesting ones, since that would give us that for every , which implies that is constant (and because , that constant has to be 0 or 1).

OK, let’s ask a slightly different question. Is there some fairly small space of functions from to that is closed under taking translates? That is, we would like that if belongs to the space, then for each the function also belongs to the space.

One obvious space of functions with this property is linear maps. There aren’t that many of these — just an -dimensional space of them (or -dimensional if we interpret “linear” in the polynomials sense rather than the vector-spaces sense) — sitting inside the -dimensional space of *all* functions from to .

It’s not much of a stretch to get from here to noticing that polynomials of degree at most form another such space. For example, we might think, “What’s the simplest function I can think of that isn’t linear?” and we might then go for something like . That and its translates generate the space of all quadratic polynomials that depend on only. Then we’d start to spot that there are several spaces of functions that are closed under translation. Given any monomial, it and its translates generate the space generated by all smaller monomials. So for example the monomial and its translates generate the space of polynomials of the form . So any down-set of monomials defines a subspace that is closed under translation.

I think, but have not carefully checked, that these are in fact the *only* subspaces that are closed under translation. Let me try to explain why. Given any function from to , it must be given by a polynomial made out of cube-free monomials. That’s simply because the dimension of the space of such polynomials is . And I think that if you take any polynomial, then the subspace that it and its translates generate is generated by all the monomials that are less than a monomial that occurs in with a non-zero coefficient.

Actually no, that’s false. If I take the polynomial , then every translate of it is of the form . So without thinking a bit more, I don’t have a characterization of the spaces of functions that are closed under translation. But we can at least say that polynomials give us a rich supply of them.

I’m starting this section a day after writing the sections above, and after a good night’s sleep I have clarified in my mind something I sort of knew already, as it’s essential to the whole argument, which is that the conjectures that briefly flitted across my mind two paragraphs ago and that turned out to be false *absolutely had to be false*. Their falsity is pretty much the whole point of what is going on. So let me come to that now.

Let me call a subspace *closed* if it is closed under translation. (Just to be completely explicit about this, by “translation” I am referring to operations of the form , which take a function to the function .) Note that the sum of two closed subspaces is closed. Therefore, if we want to find out what closed subspaces are like, we could do a lot worse than thinking about the closed subspaces generated by a single function, which it now seems good to think of as a polynomial.

Unfortunately, it’s not easy to illustrate what I’m about to say with a simple example, because simple examples tend to be too small for the phenomenon to manifest itself. So let us argue in full generality. Let be a polynomial of degree at most . We would like to understand the rank of the matrix , which is equal to the dimension of the closed subspace generated by , or equivalently the subspace generated by all functions of the form .

At first sight it looks as though this subspace could contain pretty well all linear combinations of monomials that are dominated by monomials that occur with non-zero coefficients in . For example, consider the 2-variable polynomial . In this case we are trying to work out the dimension of the space spanned by the polynomials

.

These live in the space spanned by six monomials, so we’d like to know whether the vectors of the form span the whole of or just some proper subspace. Setting we see that we can generate the standard basis vectors and . Setting it’s not hard to see that we can also get and . And setting we see that we can get the fourth and sixth coordinates to be any pair we like. So these do indeed span the full space. Thus, in this particular case one of my false conjectures from earlier happens to be true.

Let’s see why it is false in general. The argument is basically repeating the proof of the Croot-Lev-Pach lemma, but using that proof to prove an equivalent statement (a bound for the rank of the closed subspace generated by ) rather than the precise statement they proved. (I’m not claiming that this is a radically different way of looking at things, but I find it slightly friendlier.)

Let be a polynomial. One thing that’s pretty clear, and I think this is why I got slightly confused yesterday, is that for every monomial that’s dominated by a monomial that occurs non-trivially in we can find some linear combination of translates such that occurs with a non-zero coefficient. So if we want to prove that these translates generate a low-dimensional space, we need to show that there are some heavy-duty linear dependences amongst these coefficients. And there are! Here’s how the proof goes. Suppose that has degree at most . Then we won’t worry at all about the coefficients of the monomials of degree at most : sure, these generate a subspace of dimension (that’s the definition of , by the way), but unless is very close to , that’s going to be very small.

But what about the coefficients of the monomials of degree greater than ? This is where the linear dependences come in. Let be such a monomial. What can we say about its coefficient in the polynomial ? Well, if we expand out and write it as a linear combination of monomials, then the coefficient of will work out as a gigantic polynomial in . However, and this is the key point, this “gigantic” polynomial will have degree at most . That is, for each such monomial , we have a polynomial of degree at most such that gives the coefficient of in the polynomial . But these polynomials all live in the -dimensional space of polynomials of degree at most , so we can find a spanning subset of them of size at most . In other words, we can pick out at most of the polynomials , and all the rest are linear combinations of those ones. This is the huge linear dependence we wanted, and it shows that the projection of the closed subspace generated by to the monomials of degree at least is also at most .

So in total we get that and its translates span a space of dimension at most , which for suitable is much much smaller than . This is what I am referring to when I talk about a “rank miracle”.

Note that we could have phrased the entire discussion in terms of the rank of . That is, we could have started with the thought that if is a function defined on such that whenever are distinct elements of , and for at least points , then the matrix would have rank at least , which is the same as saying that and its translates span a space of dimension at least . So then we would be on the lookout for a high-dimensional space of functions such that for each function in the class, and its translates span a much lower-dimensional space. That is what the polynomials give us, and we don’t have to mention a funny non-linear function from to a vector space .

I still haven’t answered the question of whether the rank miracle is a miracle. I actually don’t have a very good answer to this. In the abstract, it is a big surprise that there is a space of functions of dimension that’s exponentially close to the maximal dimension such that for every single function in that space, the rank of the matrix is exponentially small. (Here “exponentially small/close” means as a fraction of .) And yet, once one has seen the proof, it begins to feel like a fairly familiar concentration of measure argument: it isn’t a surprise that the polynomials of degree at most form a space of almost full dimension and the polynomials of degree at most form a space of tiny dimension. And it’s not completely surprising (again with hindsight) that because in the expansion of you can’t use more than half the degree for both and , there might be some way of arguing that the translates of live in a subspace of dimension closer to .

This post has got rather long, so this seems like a good place to cut it off. To be continued.

]]>

It has long been a conviction of mine that the effort-reducing forces we have seen so far are just the beginning. One way in which the internet might be harnessed more fully is in the creation of amazing new databases, something I once asked a Mathoverflow question about. I recently had cause (while working on a research project with a student of mine, Jason Long) to use Sloane’s database in a serious way. That is, a sequence of numbers came out of some calculations we did, we found it in the OEIS, that gave us a formula, and we could prove that the formula was right. The great thing about the OEIS was that it solved an NP-ish problem for us: once the formula was given to us, it wasn’t that hard to prove that it was correct for our sequence, but finding it in the first place would have been extremely hard without the OEIS.

I’m saying all this just to explain why I rejoice that a major new database was launched today. It’s not in my area, so I won’t be using it, but I am nevertheless very excited that it exists. It is called the L-functions and modular forms database. The thinking behind the site is that lots of number theorists have privately done lots of difficult calculations concerning L-functions, modular forms, and related objects. Presumably up to now there has been a great deal of duplication, because by no means all these calculations make it into papers, and even if they do it may be hard to find the right paper. But now there is a big database of these objects, with a large amount of information about each one, as well as a great big graph of connections between them. I will be very curious to know whether it speeds up research in number theory: I hope it will become a completely standard tool in the area and inspire people in other areas to create databases of their own.

]]>

Ten pounds bet then would have net me 50000 pounds now, so a natural question arises: should I be kicking myself (the appropriate reaction given the sport) for not placing such a bet? In one sense the answer is obviously yes, as I’d have made a lot of money if I had. But I’m not in the habit of placing bets, and had no idea that these odds were being offered anyway, so I’m not too cut up about it.

Nevertheless, it’s still interesting to think about the question hypothetically: if I *had* been the betting type and had known about these odds, should I have gone for them? Or would regretting not doing so be as silly as regretting not choosing and betting on the particular set of numbers that just happened to win the national lottery last week?

Here’s a possible argument that the 5000-1 odds at the beginning of the season were about right, or at least not too low, and an attempted explanation of why hardly anybody bet on Leicester. If you’ve watched football for any length of time, you know that the league is dominated by the big clubs, with their vast resources to spend on top players and managers. Just occasionally a middle-ranking club has a surprisingly good season and ends up somewhere near the top. But a bottom-ranking club that hasn’t just been lavished with money doesn’t become a top club overnight, and since to win the league you have to do consistently well over an entire season, it just ain’t gonna happen that a club like Leicester will win.

And here are a few criticisms of the above argument.

1. The argument that we know how things work from following the game for years or even decades is convincing if all you want to prove is that it is very unlikely that a team like Leicester will win. But here we want to prove that the odds are not just low, but one-in-five-thousand low. What if the probability of it happening in any given season were 100 to 1? We haven’t had many more than 100 seasons ever, so we might well never have observed what we observed this season.

2. The argument that consistency is required over a whole season is a very strong one if the conclusion to be established is that a mediocre team will almost never win. Indeed, for a mediocre team to beat a very good team some significantly good luck is required. And the chances of that kind of luck happening enough times during a season for the team to win the league are given by the tail of a binomial distribution, so they are tiny.

However, in practice it is not at all true that results of different matches are independent. Once Leicester had won a few matches against far bigger and richer clubs, a simple Bayesian calculation would have shown that it was far more likely that Leicester had somehow become a much better team since last season than that it had won those matches by a series of independent flukes. I think the bookmakers probably made a big mistake by offering odds of 1000-1 three months into the season, at which point Leicester were top. Of course we all expected them to fall off, but were we 99.9% sure of that? Surely not. (I think if I’d known about those odds, I probably would have bet ÂŁ20 or so. Oh well.)

3. Although it was very unlikely that Leicester would suddenly become far better, there were changes, such as a new manager and some unheralded new players who turned out to be incredibly good. How unlikely is it that a player who has caught someone’s eye will be much better than anybody expected? Pretty unlikely but not impossible, I’d have thought: it’s quite common for players to blossom when they move to a new club.

4. The fact that Leicester had a remarkable escape from relegation at the end of last season, winning seven of their last nine matches, was already fairly strong evidence that something had changed (see point 2 above). Had they accumulated their meagre points total in a more uniform manner, it would have reduced the odds of their winning this season.

The first criticism above is not itself beyond criticism, since we have more data to go on than just the English league. If nothing like the Leicester story had happened in any league anywhere in the world since the beginning of the game, then the evidence would be more convincing. But from what I’ve read in the papers the story isn’t *completely* unprecedented: that is, pretty big surprises do just occasionally happen. Though against that, the way that money has come into the game has made the big clubs more dominant in recent years, which would seem to reduce Leicester’s chances.

I’m not going to come to any firm conclusion here, but my instinct is that 5,000-1 was a very good bet to take at the beginning of the season, even without hindsight, and that 1000-1 three months later was an amazing chance. I’m ignoring here the well-known question of whether it is sensible to take unlikely bets just because your expected gain is positive. I’m just wondering whether the expected gain *was* positive. Your back-of-envelope calculations on the subject are welcome …

]]>

We have decided to splash out and use a publishing platform called Scholastica. Scholastica was founded in 2011 by some University of Chicago graduates who wanted to disrupt the current state of affairs in academic publishing by making it very easy to create electronic journals. I say “splash out” because they charge $10 per submission, whereas there are other ways of creating electronic journals that are free. But we have got a lot for that $10, as I shall explain later in this post, and the charge compares favourably, to put it mildly, with the article processing charges levied by more traditional publishers. (An example: if you have had an article accepted by the Elsevier journal Advances in Mathematics, the price you need to pay to make that article open access is $1500; the same amount of money would cover 100 submissions to Discrete Analysis. I didn’t say 150 because there are some small further costs we incur, such as a subscription to CrossRef, which enables us to issue DOIs to our articles.) Most importantly, we do not pass on even this $10 charge to authors, as we have a small fund that covers it.

Now that we have been handling submissions for almost six months, we have been forced to make decisions that leave us with a rather clearer idea about what the scope and standards of the journal are. As far as the scope is concerned, we want to be reasonably broad. For example, the analysis in the paper by Tuomas Hytönen, Sean Li and Assaf Naor is not really discrete in any useful sense, but we judged it to have a similar spirit to the kind of papers that fit the title of the journal more obviously by treating discrete structures using analytic tools. Our rough policy is that if a paper is good enough, then we will not be too worried about whether it has the right sort of subject matter, as long as it isn’t in an area that is completely foreign to the editorial board.

As for the quality, we have been surprised and gratified by the high standard of submissions we have received, which has allowed us to set a high bar, turn away some perfectly respectable papers, and establish Discrete Analysis as a distinctly good journal.

That is an important part of our mission, because we want to show that the cheapness of running the journal is completely compatible with high quality. And that does not just mean mathematical quality. One thing I hope you will notice is that the journal’s website is far better designed than almost any other website of a mathematics journal. This design was done by the Scholastica team for no charge (I think they see it as an investment, since they would like to attract more journals to their platform), and it satisfies various requirements I felt strongly about: for example, that it should be attractive to look at, that one should be able to explore the content of the journal without undue clicking and loading of new pages, and that it would be able to handle basic LaTeX. But it has other features that I did not think of, such as having an image associated with each article (which seems pointless until you actually look at the site and see how the image makes it easier to browse and more tempting to find out about the article) and making the site work well on your phone as well as your laptop. If you compare it with, say, the website of Forum of Mathematics, Sigma, it’s like comparing a Rolls Royce with a Trabant, except that someone has mischievously exchanged the price tags. (Let me add here that there are many good things about Forum of Mathematics. In particular, its editorial practices have been a strong influence on those of Discrete Analysis. And it is far from alone in having an unimaginative and inconvenient website.)

Since I am keen to promote the arXiv overlay model, I was also particularly concerned that Discrete Analysis should not be perceived as “just like a normal journal, but without X, Y and Z”. Rather, I wanted it to be *better* than a normal journal in important respects (and at least equal to a normal journal in all respects that anyone actually cares about). If you visit the website, you will notice that each article gives you an option to click on the words “Editorial introduction”. If you do so, then up comes a description of the article (not on a new webpage, I hasten to add), which sets it in some kind of context and helps you to judge whether you might want to go ahead and read it on the arXiv.

There are at least two reasons for doing this. One is that if the website were nothing but a list of links, then there would be a danger that it would seem a bit pointless: about the only reason to visit it would be to check that when an author claims to have been published by us, then that is actually true. But with article descriptions and a well-designed website, one can actually *browse* the journal. Browsing is something I used to enjoy doing back when print journals were all that there were, but it is quite a lot harder when everything is electronic. (Some websites try to interest you in related content, but it seems to be chosen by rather unsophisticated algorithms, and in any case is not what I am talking about — I mean the less focused kind of browsing where you stumble on an interesting paper that neither you nor an algorithm based on your browsing history would ever have thought of looking at.)

A second reason is that having these introductions goes a small way towards dealing with a serious objection to the current system of peer review, which is that a great deal of valuable information never gets made public. As an editor, I sometimes get to read very interesting information that puts a submitted article into a context that I didn’t know about. All the reader of the journal gets is one bit of information: that the article was accepted rather than rejected. (One could argue that it isn’t even one bit, since we do not learn which articles have been rejected.) Of course, under cover of privacy and anonymity, referees can also make remarks that one would not want to make public, but with article descriptions we don’t have to. We can simply write the descriptions using information from the article itself, prior knowledge, remarks made by the referees, remarks made by editors, relevant facts discovered from the internet, and so on. And how this information is selected and combined can vary from article to article, so the reader won’t know whether any particular piece of information was part of a referee’s report.

Thus, Discrete Analysis is offering services that other journals do not offer. Here’s another one. Suppose you submit an article to Discrete Analysis and we accept it. The next stage is for you to submit a revision to arXiv, taking account of the referee’s comments. Once that’s done, we make sure we have an editorial introduction and appropriate metadata in place, and publish it. But what if at some later date you suddenly realize that there is a shorter and more informative proof of Lemma 2.3? With the conventional publishing system, that’s basically just too bad: you’re stuck with the accepted version.

In a way that’s true for us too. The version that’s accepted becomes what people like to call the version of record, so that when people refer to your paper there won’t be any confusion about what exactly they are referring to. (This is important of course, though in my view the legacy publishers massively exaggerate its importance.) However, being an arXiv overlay journal allows us to reach a much more satisfactory compromise between having a fixed version of record and allowing updates. If you follow the link from the journal webpage to the article and the article has subsequently been updated, the arXiv page you link to will inform you that the version you are looking at is not the latest one. So without our having to do anything, since it happens automatically with the arXiv, readers get the best of both worlds. As an example, here is the arXiv page for a version of a preprint by Bourgain and Demeter (not submitted to Discrete Analysis). As you’ll see, the information that it is not the latest version is clearly highlighted in red.

Another feature of Discrete Analysis, but this one it shares with other purely electronic journals, is that we are not artificially constrained by the need to fill a certain number of pages per year. So you will not hear from us that we receive many more good articles than we can accept, or that your article, though excellent, is too long — we just have a standard we are aiming for and will accept all articles that we judge to reach it.

So if you have a good paper that could conceivably be within our scope, then why not submit it to us? Your paper will have some very good company (just look at the website if you don’t believe me). It will be properly promoted on a website that embraces what the internet has to offer rather than merely being a pale shadow of a paper journal. And you will be helping, in a small way, to bring about a change to the absurdly expensive and anachronistic system of academic publishing that we still have to put up with.

]]>

One question that has arisen is whether FUNC holds if the ground set is the cyclic group and is rotationally invariant. This was prompted by Alec Edgington’s example showing that we cannot always find and an injection from to that maps each set to a superset. Tom Eccles suggested a heuristic argument that if is generated by all intervals of length , then it should satisfy FUNC. I agree that this is almost certainly true, but I think nobody has yet given a rigorous proof. I don’t think it should be too hard.

One can ask similar questions about ground sets with other symmetry groups.

A nice question that I came across on Mathoverflow is whether the intersection version of FUNC is true if consists of all subgroups of a finite group . The answers to the question came very close to solving it, with suggestions about how to finish things off, but the fact that the question was non-trivial was quite a surprise to me.

In response to Alec’s counterexample, Gil Kalai asked a yet weaker question, which is whether one can find and an injection from to that increases the size of each set. It is easy to see that this is equivalent to asking that the number of sets of size at least in is always at least the number of sets of size at least in . One aspect of this question that may make it a good one is that it permits one to look at what happens for particular values of , such as (where is the size of the ground set), and also to attempt induction on . So far this conjecture still seems to be alive.

Another question that is I think alive still is a “ternary version” of FUNC. I put forward a conjecture that had a very simple counterexample, and my attempt to deal with that counterexample led to the following question. Let be a collection of functions from a finite set to and suppose that it is closed under the following slightly strange multivalued operation. Given two functions , let be the set where and . (Thus, there are potentially nine sets . We now take the set of all functions that are constant on each , and either lie between and or lie between and . For example, if and then we get , , , , , and and themselves.

This definition generalizes to alphabets of all sizes. In particular, when the alphabet has size 1, it reduces to the normal FUNC, since the only functions between and that are constant on the sets where and are constant are and themselves. The conjecture is then that there exists such that (where the functions take values in ) for at least of the functions in . If true, this conjecture is best possible, since we can take to consist of all functions from to .

The reason for the somewhat complicated closure operation is that, as Thomas pointed out, one has to rule out systems of functions such as all functions that either take all values in or are the constant function that takes the value 2 everywhere. This set is closed under taking pointwise maxima, but we cannot say anything interesting about how often functions take the largest value. The closure property above stops it being possible to “jump” from small functions to a large one. I don’t think anyone has thought much about this conjecture, so it may still have a simple counterexample.

Another conjecture I put forward also had to be significantly revised after a critical mauling, but this time not because it was false (that still seems to be an open question) but because it was equivalent to a simpler question that was less interesting than I had hoped my original question might be.

I began by noting that if we think of sets in has having weight 1 and sets not in as having weight 0, then the union-closed condition is that . We had already noted problems with adopting this as a more general conjecture, but when weights are 0 or 1, then . So I wondered whether the condition would be worth considering. The conjecture would then be that there exists such that , where we sum over all subsets of the ground set . I had hoped that this question might be amenable to a variational approach.

Alec Edgington delivered two blows to this proposal, which were basically two consequences of the same observation. The observation, which I had spotted without properly appreciating its significance, was that if have non-zero weight, then , and therefore . One consequence of this is that a weighting with is usually not close to a weighting with . Indeed, suppose we can find with and . Then the moment becomes non-zero, is forced to jump from to at least .

A second consequence is that talking about geometric means is a red herring, since that condition implies, and is implied by, the simpler condition that the family of sets with non-zero weight is union closed, and whenever with .

However, this still leaves us with a strengthening of FUNC. Moreover, it is a genuine strengthening, since there are union-closed families where it is not optimal to give all the sets the same weight.

Incidentally, as was pointed out in some of the comments, and also in this recent paper, it is natural to rephrase this kind of problem so that it looks more like a standard optimization problem. Here we would like to maximize subject to the constraints that whenever and for every in the ground set. If we can achieve a maximum greater than 2, then weighted FUNC is false. If we can achieve it with constant weights, then FUNC is false.

However, this is not a linear relaxation of FUNC, since for the weighted version we have to choose the union-closed family before thinking about the optimum weights. The best that might come out of this line of enquiry (as far as I can see) is a situation like the following.

- Weighted FUNC is true.
- We manage to understand very well how the optimum weights depend on the union-closed family .
- With the help of that understanding, we arrive at a statement that is easier to prove than FUNC.

That seems pretty optimistic, but it also seems sufficiently non-ridiculous to be worth investigating. And indeed, quite a bit of investigation has already taken place in the comments on the previous post. In particular, weighted FUNC has been tested on a number of families, and so far no counterexample has emerged.

A quick remark that may have been made already is that if is a group of permutations of the ground set that give rise to automorphisms of , then we can choose the optimal weights to be -invariant. Indeed, if is an optimal weight, then the weight satisfies the same constraints as and . However, the optimal weight is not in general unique, and sometimes there are non--invariant weights that are also optimal.

I wonder whether it is time to think a bit about strategy. It seems to me that the (very interesting) discussion so far has had a “preliminary” flavour: we have made a lot of suggestions, come up with several variants, some of which are false and some of which may be true, and generally improved our intuitions about the problem. Should we continue like that for a while, or are there promising proof strategies that we should consider pursuing in more detail? As ever, there is a balance to be struck: it is usually a good idea to avoid doing hard work until the chances of a payoff are sufficiently high, but sometimes avoiding hard work means that one misses discoveries that could be extremely helpful. So what I’m asking is whether there are any proposals that would involve hard work.

One that I have in the back of my mind is connected with things that Tom Eccles has said. It seems to me at least possible that FUNC could be proved by induction, if only one could come up with a suitably convoluted inductive hypothesis. But how does one do that? One method is a kind of iterative process: you try a hypothesis and discover that it is not strong enough to imply the next case, so you then search for a strengthening, which perhaps implies the original statement but is not implied by smaller versions of itself, so a yet further strengthening is called for, and so on. This process can be quite hard work, but I wonder whether if we all focused on it at once we could make it more painless. But this is just one suggestion, and there may well be better ones.

]]>

After the failure of the average-overlap-density conjecture, I came up with a more refined conjecture along similar lines that has one or two nice properties and has not yet been shown to be false.

The basic aim is the same: to take a union-closed family and use it to construct a probability measure on the ground set in such a way that the average abundance with respect to that measure is at least 1/2. With the failed conjecture the method was very basic: pick a random non-empty set and then a random element .

The trouble with picking random elements is that it gives rise to a distribution that does not behave well when you duplicate elements. (What you would want is that the probability is shared out amongst the duplicates, but in actual fact if you duplicate an element lots of times it gives an advantage to the set of duplicates that the original element did not have.) This is not just an aesthetic concern: it was at the heart of the downfall of the conjecture. What one really wants, and this is a point that Tobias Fritz has been emphasizing, is to avoid talking about the ground set altogether, something one can do by formulating the conjecture in terms of lattices, though I’m not sure what I’m about to describe does make sense for lattices.

Let be a union-closed set system with ground set . Define a *chain* to be a collection of subsets of with the following properties.

- The inclusions are strict.
- Each is an intersection of sets in .
- is non-empty, but for every , either or .

The idea is to choose a random chain and then a random element of . That last step is harmless because the elements of are indistinguishable from the point of view of (they are all contained in the same sets). So this construction behaves itself when you duplicate elements.

What exactly is a random chain? What I suggested before was to run an algorithm like this. You start with . Having got to , let consist of all sets such that is neither empty nor , pick a random set , and let . But that is not the only possibility. Another would be to define a chain to be *maximal* if for every there is no set such that lies strictly between and , and then to pick a maximal chain uniformly at random.

At the moment I think that the first idea is more natural and therefore more likely to work. (But “more likely” does not imply “likely”.) The fact that it seems hard to disprove is not a good reason for optimism, since the definition is sufficiently complicated that it is hard to analyse. Perhaps there is a simple example for which the conjecture fails by miles, but for which it is very hard to prove that it fails by miles (other than by checking it on a computer if the example is small enough).

Another possible idea is this. Start a random walk at . The walk takes place on the set of subsets of that are non-empty intersections of sets in . Call this set system . Then join to in if is a proper subset of and there is no that lies properly between and . To be clear, I’m defining an *un*directed graph here, so if is joined to , then is joined to .

Now we do a random walk on this graph by picking a random neighbour at each stage, and we take its stationary distribution. One could then condition this distribution on the set you are at being a minimal element of . This gives a distribution on the minimal elements, and then the claim would be that on average a minimal element is contained in at least half the sets in .

I’ll finish this section with the obvious question.

**Question.** *Does an averaging argument with a probability distribution like one of these have the slightest chance of working? If so, how would one go about proving it?*

Tobias Fritz has shared with us a very nice observation that gives another way of looking at union-closed families. It is sufficiently natural that I feel there is a good chance that it will be genuinely helpful, and not just a slightly different perspective on all the same statements.

Let be a finite set, let and let be a non-empty subset of . Write as shorthand for the condition

.

If , then we can write this as a *Horn clause*

.

If is a collection of conditions of this kind, then we can define a set system to consist of all sets that satisfy all of them. That is, for each , if , then .

It is very easy to check that any set system defined this way is union closed and contains the empty set. Conversely, given a union-closed family that includes the empty set, let be a subset of that does not belong to . If for every we can find such that , then we have a contradiction, since the union of these belongs to and is equal to . So there must be some such that for every , if , then . That is, there is a condition that is satisfied by every and is not satisfied by . Taking all such conditions, we have a collection of conditions that gives rise to precisely the set system .

As Thomas says, this is strongly reminiscent of describing a convex body not as a set of points but as an intersection of half spaces. Since that dual approach is often extremely useful, it seems very much worth bearing in mind when thinking about FUNC. At the very least, it gives us a concise way of describing some union-closed families that would be complicated to define in a more element-listing way: Tobias used it to describe one of Thomas Bloom’s examples quite concisely, for instance.

Suppose we have a Horn-clause description of a union-closed family . For each , it gives us a collection of conditions that must satisfy, each of the form . Putting all these together gives us a single condition in conjunctive normal form. This single condition is a monotone property of , and any monotone property can arise in this way. So if we want, we can forget about Horn clauses and instead think of an arbitrary union-closed family as being defined as follows. For each , there is some monotone property , and then consists of all sets such that for every , the property holds.

To illustrate this with an example (not one that has any chance of being a counterexample to FUNC — just an example of the kind of thing one can do), we could take (the integers mod a prime ) and take to be the property “contains a subset of the form “. Note that this is a very concise definition, but the resulting criterion for a set to belong to is not simple at all. (If you think it is, then can you exhibit for me a non-empty set of density less than 1/2 that satisfies the condition when , or prove that no such set exists? *Update: I’ve now realized that this question has a fairly easy answer — given in a comment below. But describing the sets that satisfy the condition would not be simple.*)

This way of looking at union-closed families also generates many special cases of FUNC that could be interesting to tackle. For example, we can take the ground set to be some structure (above, I took a cyclic group, but one could also take, for instance, the complete graph on a set of vertices) and restrict attention to properties that are natural within that structure (where “natural” could mean something like invariant under symmetries of the structure that fix ).

Another special case that is very natural to think about is where each property is a single disjunction — that is, the Horn-clause formulation in the special case where each is on the left of exactly one Horn clause. Is FUNC true in this case? Or might this case be a good place to search for a counterexample? At the time of writing, I have no intuition at all about this question, so even heuristic arguments would be interesting.

As discussed in the last post, we already know that an optimistic conjecture of Tobias Fritz, that there is always some and a union-preserving injection from to , is false. Gil Kalai proposed a conjecture in a similar spirit: that there is always an injection from to such that each set in is a subset of its image. So far, nobody (or at least nobody here) has disproved this. I tried to check whether the counterexamples to Tobias’s conjecture worked here too, and I’m fairly sure the complement-of-Steiner-system approach doesn’t work.

While the general belief seems to be (at least if we believe Jeff Kahn) that such strengthenings are false, it would be very good to confirm this. Of course it would be even better to prove the strengthening â€¦

*Update: Alec Edgington has now found a counterexample.*

In this comment Tom Eccles asked a question motivated by thinking about what an inductive proof of FUNC could possibly look like. The question ought to be simpler than FUNC, and asks the following. Does there exist a union-closed family and an element with the following three properties?

- has abundance less than 1/2.
- No element has abundance greater than or equal to 1/2 in both and .
- Both and contain at least one non-empty set.

It would be very nice to have such an example, because it would make an excellent test case for proposed inductive approaches.

There’s probably plenty more I could extract from the comment thread in the last post, but I think it’s time to post this, since the number of comments has exceeded 100.

While I’m saying that, let me add a general remark that if anyone thinks that a direction of discussion is being wrongly neglected, then please feel free to highlight it, even (or perhaps especially) if it is a direction that you yourself introduced. These posts are based on what happens to have caught my attention, but should not be interpreted as a careful judgment of what is interesting and what is not. I hope that everything I include is interesting, but the converse is completely false.

]]>

If is a union-closed family on a ground set , and , then we can take the family . The map is a homomorphism (in the sense that , so it makes sense to regard as a quotient of .

If instead we take an equivalence relation on , we can define a set-system to be the set of all unions of equivalence classes that belong to .

Thus, subsets of give quotient families and quotient sets of give subfamilies.

Possibly the most obvious product construction of two families and is to make their ground sets disjoint and then to take . (This is the special case with disjoint ground sets of the construction that Tom Eccles discussed earlier.)

Note that we could define this product slightly differently by saying that it consists of all pairs with the “union” operation . This gives an algebraic system called a join semilattice, and it is isomorphic in an obvious sense to with ordinary unions. Looked at this way, it is not so obvious how one should define abundances, because does not have a ground set. Of course, we can define them via the isomorphism to but it would be nice to do so more intrinsically.

Tobias Fritz, in this comment, defines a more general “fibre bundle” construction as follows. Let be a union-closed family of sets (the “base” of the system). For each let be a union-closed family (one of the “fibres”), and let the elements of consist of pairs with . We would like to define a join operation on by

for a suitable . For that we need a bit more structure, in the form of homomorphisms whenever . These should satisfy the obvious composition rule .

With that structure in place, we can take to be , and we have something like a union-closed system. To turn it into a union-closed system one needs to find a concrete realization of this “join semilattice” as a set system with the union operation. This can be done in certain cases (see the comment thread linked to above) and quite possibly in all cases.

First, here is a simple construction that shows that Conjecture 6 from the previous post is false. That conjecture states that if you choose a random non-empty and then a random , then the average abundance of is at least 1/2. It never seemed likely to be true, but it survived for a surprisingly long time, before the following example was discovered in a comment thread that starts here.

Let be a large integer and let be disjoint sets of size and . (Many details here are unimportant — for example, all that actually matters is that the sizes of the sets should increase fairly rapidly.) Now take the set system

.

To see that this is a counterexample, let us pick our random element of a random set, and then condition on the five possibilities for what that set is. I’ll do a couple of the calculations and then just state the rest. If , then its abundance is 2/3. If it is in , then its abundance is 1/2. If it is in , then the probability that it is in is , which is very small, so its abundance is very close to 1/2 (since with high probability the only three sets it belongs to are , and ). In this kind of way we get that for large enough we can make the average abundance as close as we like to

.

One thing I would like to do — or would like someone to do — is come up with a refinement of this conjecture that isn’t so obviously false. What this example demonstrates is that duplication shows that for the conjecture to have been true, the following apparently much stronger statement would have had to be true. For each non-empty , let be the minimum abundance of any element of . Then the average of over is at least 1/2.

How can we convert the average over into the minimum over ? The answer is simple: take the original set system and write the elements of the ground set in decreasing order of abundance. Now duplicate the first element (that is, the element with greatest abundance) once, the second element times, the third times, and so on. For very large , the effect of this is that if we choose a random element of (after the duplications have taken place) then it will have minimal abundance in .

So it seems that duplication of elements kills off this averaging argument too, but in a slightly subtler way. Could we somehow iterate this thought? For example, could we choose a random by first picking a random non-empty , then a random such that , and finally a random element ? And could we go further — e.g., picking a random chain of the form , etc., and stopping when we reach a set whose points cannot be separated further?

Tobias Fritz came up with a nice strengthening that again turned out (again as expected) to be false. The thought was that it might be nice to find a “bijective” proof of FUNC. Defining to be and to be , we would prove FUNC for if we could find an injection from to .

For such an argument to qualify as a proper bijective proof, it is not enough merely to establish the existence of an injection — that follows from FUNC on mere grounds of cardinality. Rather, one should define it in a nice way somehow. That makes it natural to think about what properties such an injection might have, and a particularly natural requirement that one might think about is that it should preserve unions.

It turns out that there are set systems for which there does not exist any with a union-preserving injection from to . After several failed attempts, I found the following example. Take a not too small pair of positive integers — it looks as though works. Then take a Steiner -system — that is, a collection of sets of size 5 such that each set of size 3 is contained in exactly one set from . (Work of Peter Keevash guarantees that such a set system exists, though this case was known before his amazing result.)

The counterexample is generated by all complements of sets in , though it is more convenient just to take and prove that there is no intersection-preserving injection from to . To establish this, one first proves that any such injection would have to take sets of size to sets of size , which is basically because you need room for all the subsets of size of a set to map to distinct subsets of the image of . Once that is established, it is fairly straightforward to show that there just isn’t room to do things. The argument can be found in the comment linked to above, and the thread below it.

Thomas Bloom came up with a simpler example, which is interesting for other reasons too. His example is generated by the sets , all -subsets of , and the 6 sets , , , , , . I asked him where this set system had come from, and the answer turned out to be very interesting. He had got it by staring at an example of Renaud and Sarvate of a union-closed set system with exactly one minimal-sized set, which has size 3, such that that minimal set contains no element of abundance at least 1/2. Thomas worked out how the Renaud-Servate example had been pieced together, and used similar ideas to produce his example. Tobias Fritz then went on to show that Thomas’s construction was a special case of his fibre-bundle construction.

This post is by no means a comprehensive account of all the potentially interesting ideas from the last post. For example, Gil Kalai has an interesting slant on the conjecture that I think should be pursued further, and there are a number of interesting questions that were asked in the previous comment thread that I have not repeated here, mainly because the post has taken a long time to write and I think it is time to post it.

]]>

Something I like to think about with Polymath projects is the following question: if we end up *not* solving the problem, then what can we hope to achieve? The Erdős discrepancy problem project is a good example here. An obvious answer is that we can hope that enough people have been stimulated in enough ways that the probability of somebody solving the problem in the not too distant future increases (for example because we have identified more clearly the gap in our understanding). But I was thinking of something a little more concrete than that: I would like at the very least for this project to leave behind it an online resource that will be essential reading for anybody who wants to attack the problem in future. The blog comments themselves may achieve this to some extent, but it is not practical to wade through hundreds of comments in search of ideas that may or may not be useful. With past projects, we have developed Wiki pages where we have tried to organize the ideas we have had into a more browsable form. One thing we didn’t do with EDP, which in retrospect I think we should have, is have an official “closing” of the project marked by the writing of a formal article that included what we judged to be the main ideas we had had, with complete proofs when we had them. An advantage of doing that is that if somebody later solves the problem, it is more convenient to be able to refer to an article (or preprint) than to a combination of blog comments and Wiki pages.

With an eye to this, I thought I would make FUNC1 a data-gathering exercise of the following slightly unusual kind. For somebody working on the problem in the future, it would be very useful, I would have thought, to have a list of natural strengthenings of the conjecture, together with a list of “troublesome” examples. One could then produce a table with strengthenings down the side and examples along the top, with a tick in the table entry if the example disproves the strengthening, a cross if it doesn’t, and a question mark if we don’t yet know whether it does.

A first step towards drawing up such a table is of course to come up with a good supply of strengthenings and examples, and that is what I want to do in this post. I am mainly selecting them from the comments on the previous post. I shall present the strengthenings as statements rather than questions, so they are not necessarily true.

Let be a function from the power set of a finite set to the non-negative reals. Suppose that the weights satisfy the condition for every and that at least one non-empty set has positive weight. Then there exists such that the sum of the weights of the sets containing is at least half the sum of all the weights.

~~Note that if all weights take values 0 or 1, then this becomes the original conjecture. It is possible that the above statement ~~*follows* from the original conjecture, but we do not know this (though it may be known).

This is not a good question after all, as the deleted statement above is false. When is 01-valued, the statement reduces to saying that for every up-set there is an element in at least half the sets, which is trivial: all the elements are in at least half the sets. Thanks to Tobias Fritz for pointing this out.

Let be a function from the power set of a finite set to the non-negative reals. Suppose that the weights satisfy the condition for every and that at least one non-empty set has positive weight. Then there exists such that the sum of the weights of the sets containing is at least half the sum of all the weights.

Again, if all weights take values 0 or 1, then the collection of sets of weight 1 is union closed and we obtain the original conjecture. It was suggested in this comment that one might perhaps be able to attack this strengthening using tropical geometry, since the operations it uses are addition and taking the minimum.

Tom Eccles suggests (in this comment) a generalization that concerns two set systems rather than one. Given set systems and , write for the union set . A family is union closed if and only if . What can we say if and are set systems with small? There are various conjectures one can make, of which one of the cleanest is the following: if and are of size and is of size at most , then there exists such that , where denotes the set of sets in that contain . This obviously implies FUNC.

Simple examples show that can be much smaller than either or — for instance, it can consist of just one set. But in those examples there always seems to be an element contained in many more sets. So it would be interesting to find a good conjecture by choosing an appropriate function to insert into the following statement: if , , and , then there exists such that .

Let be a union-closed family of subsets of a finite set . Then the average size of is at least .

This is false, as the example shows for any .

Let be a union-closed family of subsets of a finite set and suppose that *separates points*, meaning that if , then at least one set in contains exactly one of and . (Equivalently, the sets are all distinct.) Then the average size of is at least .

This again is false: see Example 2 below.

In this comment I had a rather amusing (and typically Polymathematical) experience of formulating a conjecture that I thought was obviously false in order to think about how it might be refined, and then discovering that I couldn’t disprove it (despite temporarily thinking I had a counterexample). So here it is.

As I have just noted (and also commented in the first post), very simple examples show that if we define the “abundance” of an element to be , then the average abundance does not have to be at least . However, that still leaves open the possibility that some kind of naturally defined *weighted* average might do the job. Since we want to define the weighting in terms of and to favour elements that are contained in lots of sets, a rather crude idea is to pick a random non-empty set and then a random element , and make that the probability distribution on that we use for calculating the average abundance.

A short calculation reveals that the average abundance with this probability distribution is equal to the *average overlap density*, which we define to be

where the averages are over . So one is led to the following conjecture, which implies FUNC: if is a union-closed family of sets, at least one of which is non-empty, then its average overlap density is at least 1/2.

A not wholly pleasant feature of this conjecture is that the average overlap density is very far from being isomorphism invariant. (That is, if you duplicate elements of , the average overlap density changes.) Initially, I thought this would make it easy to find counterexamples, but that seems not to be the case. It also means that one can give some thought to how to put a measure on that makes the average overlap density as small as possible. Perhaps if the conjecture is true, this “worst case” would be easier to analyse. (It’s not actually clear that there is a worst case — it may be that one wants to use a measure on that gives measure zero to some non-empty set , at which point the definition of average overlap density breaks down. So one might have to look at the “near worst” case.)

This conjecture comes from a comment by Igor Balla. Let be a union-closed family and let . Define a new family by replacing each by if and leaving it alone if . Repeat this process for every and the result is an *up-set* , that is, a set-system such that and implies that .

Note that each time we perform the “add if you can” operation, we are applying a bijection to the current set system, so we can compose all these bijections to obtain a bijection from to .

Suppose now that are distinct sets. It can be shown that there is no set such that and . In other words, is never a subset of .

Now the fact that is an up-set means that each element is in at least half the sets (since if then ). Moreover, it seems hard for too many sets in to be “far” from their images , since then there is a strong danger that we will be able to find a pair of sets and with .

This leads to the conjecture that Balla makes. He is not at all confident that it is true, but has checked that there are no small counterexamples.

**Conjecture.** Let be a set system such that there exist an up-set and a bijection with the following properties.

- For each , .
- For no distinct do we have .

Then there is an element that belongs to at least half the sets in .

The following comment by Gil Kalai is worth quoting: “Years ago I remember that Jeff Kahn said that he bet he will find a counterexample to every meaningful strengthening of Franklâ€™s conjecture. And indeed he shot down many of those and a few I proposed, including weighted versions. I have to look in my old emails to see if this one too.” So it seems that even to find a conjecture that genuinely strenghtens FUNC without being obviously false (at least to Jeff Kahn) would be some sort of achievement. (Apparently the final conjecture above passes the Jeff-Kahn test in the following weak sense: he believes it to be false but has not managed to find a counterexample.)

If is a finite set and is the power set of , then every element of has abundance 1/2. (Remark 1: I am using the word “abundance” for the *proportion* of sets in that contain the element in question. Remark 2: for what it’s worth, the above statement is meaningful and true even if is empty.)

Obviously this is not a counterexample to FUNC, but it was in fact a counterexample to an over-optimistic conjecture I very briefly made and then abandoned while writing it into a comment.

This example was mentioned by Alec Edgington. Let be a finite set and let be an element that does not belong to . Now let consist of together with all sets of the form such that .

If , then has abundance , while each has abundance . Therefore, only one point has abundance that is not less than 1/2.

A slightly different example, also used by Alec Edgington, is to take all subsets of together with the set . If , then the abundance of any element of is while the abundance of is . Therefore, the average abundance is

When is large, the amount by which exceeds 1/2 is exponentially small, from which it follows easily that this average is less than 1/2. In fact, it starts to be less than 1/2 when (which is the case Alec mentioned). This shows that Conjecture 5 above (that the average abundance must be at least 1/2 if the system separates points) is false.

Let be a positive integer and take the set system that consists of the sets and . This is a simple example (or rather class of examples) of a set system for which although there is certainly an element with abundance at least 1/2 (the element has abundance 2/3), the *average* abundance is close to 1/3. Very simple variants of this example can give average abundances that are arbitrarily small — just take a few small sets and one absolutely huge set.

I will not explain these in detail, but just point you to an interesting comment by Uwe Stroinski that suggests a number-theoretic way of constructing union-closed families.

I will continue with methods of building union-closed families out of other union-closed families.

I’ll define this process formally first. Let be a set of size and let be a collection of subsets of . Now let be a collection of disjoint non-empty sets and define to be the collection of all sets of the form for some . If is union closed, then so is .

One can think of as “duplicating” the element of times. A simple example of this process is to take the set system and let and . This gives the set system 3 above.

Let us say that if for some suitable set-valued function . And let us say that two set systems are *isomorphic* if they are in the same equivalence class of the symmetric-transitive closure of the relation . Equivalently, they are isomorphic if we can find and such that .

The effect of duplication is basically that we can convert the uniform measure on the ground set into any other probability measure (at least to an arbitrary approximation). What I mean by that is that the uniform measure on the ground set of , which is of course , gives you a probability of of landing in , so has the same effect as assigning that probability to and sticking with the set system . (So the precise statement is that we can get any probability measure where all the probabilities are rational.)

If one is looking for an averaging argument, then it would seem that a nice property that such an argument might have is (as I have already commented above) that the average should be with respect to a probability measure on that is constructed from in an isomorphism-invariant way.

It is common in the literature to outlaw duplication by insisting that separates points. However, it may be genuinely useful to consider different measures on the ground set.

Tom Eccles, in his off-diagonal conjecture, considered the set system, which he denoted by , that is defined to be . This might more properly be denoted , by analogy with the notation for sumsets, but obviously one can’t write it like that because that notation already stands for something else, so I’ll stick with Tom’s notation.

It’s trivial to see that if and are union closed, then so is . Moreover, sometimes it does quite natural things: for instance, if and are any two sets, then , where is the power-set operation.

Another remark is that if and are disjoint, and and , then the abundance of in is equal to the abundance of in .

I got this from a comment by Thomas Bloom. Let and be disjoint finite sets and let and be two union-closed families living inside and , respectively, and assume that and . We then build a new family as follows. Let be some function from to . Then take all sets of one of the following four forms:

- sets with ;
- sets with ;
- sets with ;
- sets with .

It can be checked quite easily (there are six cases to consider, all straightforward) that the resulting family is union closed.

Thomas Bloom remarks that if consists of all subsets of and consists of all subsets of , then (for suitable ) the result is a union-closed family that contains no set of size less than 3, and also contains a set of size 3 with no element of abundance greater than or equal to 1/2. This is interesting because a simple argument shows that if is a set with two elements in a union-closed family then at least one of its elements has abundance at least 1/2.

Thus, this construction method can be used to create interesting union-closed families out of boring ones.

Thomas discusses what happens to abundances when you do this construction, and the rough answer is that elements of become less abundant but elements of become quite a lot more abundant. So one can’t just perform this construction a few times and end up with a counterexample to FUNC. However, as Thomas also says, there is plenty of scope for modifying this basic idea, and maybe good things could flow from that.

I feel as though there is much more I could say, but this post has got quite long, and has taken me quite a long time to write, so I think it is better if I just post it. If there are things I wish I had mentioned, I’ll put them in comments and possibly repeat them in my next post.

I’ll close by remarking that I have created a wiki page. At the time of writing it has almost nothing on it but I hope that will change before too long.

]]>

A less serious problem is what acronym one would use for the project. For the density Hales-Jewett problem we went for DHJ, and for the Erdős discrepancy problem we used EDP. That general approach runs into difficulties with Frankl’s union-closed conjecture, so I suggest FUNC. This post, if the project were to go ahead, could be FUNC0; in general I like the idea that we would be engaged in a funky line of research.

The problem, for anyone who doesn’t know, is this. Suppose you have a family that consists of distinct subsets of a set . Suppose also that it is *union closed*, meaning that if , then as well. Must there be an element of that belongs to at least of the sets? This seems like the sort of question that ought to have an easy answer one way or the other, but it has turned out to be surprisingly difficult.

If you are potentially interested, then one good thing to do by way of preparation is look at this survey article by Henning Bruhn and Oliver Schaudt. It is very nicely written and seems to be a pretty comprehensive account of the current state of knowledge about the problem. It includes some quite interesting reformulations (interesting because you don’t just look at them and see that they are trivially equivalent to the original problem).

For the remainder of this post, I want to discuss a couple of failures. The first is a natural idea for generalizing the problem to make it easier that completely fails, at least initially, but can perhaps be rescued, and the second is a failed attempt to produce a counterexample. I’ll present these just in case one or other of them stimulates a useful idea in somebody else.

An immediate reaction of any probabilistic combinatorialist is likely to be to wonder whether in order to prove that there *exists* a point in at least half the sets it might be easier to show that in fact an *average* point belongs to half the sets.

Unfortunately, it is very easy to see that that is false: consider, for example, the three sets , , and . The average (over ) of the number of sets containing a random element is , but there are three sets.

However, this example doesn't feel like a genuine counterexample somehow, because the set system is just a dressed up version of : we replace the singleton by the set and that's it. So for this set system it seems more natural to consider a *weighted* average, or equivalently to take not the uniform distribution on , but some other distribution that reflects more naturally the properties of the set system at hand. For example, we could give a probability 1/2 to the element 1 and to each of the remaining 12 elements of the set. If we do that, then the average number of sets containing a random element will be the same as it is for the example with the uniform distribution (not that the uniform distribution is obviously the most natural distribution for that example).

This suggests a very slightly more sophisticated version of the averaging-argument idea: does there *exist* a probability distribution on the elements of the ground set such that the expected number of sets containing a random element (drawn according to that probability distribution) is at least half the number of sets?

With this question we have in a sense the opposite problem. Instead of the answer being a trivial no, it is a trivial yes — if, that is, the union-closed conjecture holds. That’s because if the conjecture holds, then some belongs to at least half the sets, so we can assign probability 1 to that and probability zero to all the other elements.

Of course, this still doesn’t feel like a complete demolition of the approach. It just means that for it not to be a trivial reformulation we will have to put *conditions* on the probability distribution. There are two ways I can imagine getting the approach to work. The first is to insist on some property that the distribution is required to have that means that its existence does *not* follow easily from the conjecture. That is, the idea would be to prove a stronger statement. It seems paradoxical, but as any experienced mathematician knows, it can sometimes be easier to prove a stronger statement, because there is less room for manoeuvre. In extreme cases, once a statement has been suitably strengthened, you have so little choice about what to do that the proof becomes almost trivial.

A second idea is that there might be a nice way of defining the probability distribution in terms of the set system. This would be a situation rather like the one I discussed in my previous post, on entropy and Sidorenko’s conjecture. There, the basic idea was to prove that a set had cardinality at least by proving that there is a probability distribution on with entropy at least . At first, this seems like an unhelpful idea, because if then the uniform distribution on will trivially do the job. But it turns out that there is a different distribution for which it is easier to *prove* that it does the job, even though it usually has lower entropy than the uniform distribution. Perhaps with the union-closed conjecture something like this works too: obviously the best distribution is supported on the set of elements that are contained in a maximal number of sets from the set system, but perhaps one can construct a different distribution out of the set system that gives a smaller average in general but about which it is easier to prove things.

I have no doubt that thoughts of the above kind have occurred to a high percentage of people who have thought about the union-closed conjecture, and can probably be found in the literature as well, but it would be odd not to mention them in this post.

To finish this section, here is a wild guess at a distribution that does the job. Like almost all wild guesses, its chances of being correct are very close to zero, but it gives the flavour of the kind of thing one might hope for.

Given a finite set and a collection of subsets of , we can pick a random set (uniformly from ) and look at the events for each . In general, these events are correlated.

Now let us define a matrix by . We could now try to find a probability distribution on that minimizes the sum . That is, in a certain sense we would be trying to make the events as uncorrelated as possible on average. (There may be much better ways of measuring this — I’m just writing down the first thing that comes into my head that I can’t immediately see is stupid.)

What does this give in the case of the three sets , and ? We have that if or or and . If and , then , since if , then is one of the two sets and , with equal probability.

So to minimize the sum we should choose so as to maximize the probability that and . If , then this probability is , which is maximized when , so in fact we get the distribution mentioned earlier. In particular, for this distribution the average number of sets containing a random point is , which is precisely half the total number of sets. (I find this slightly worrying, since for a successful proof of this kind I would expect equality to be achieved only in the case that you have disjoint sets and you take all their unions, including the empty set. But since this definition of a probability distribution isn’t supposed to be a serious candidate for a proof of the whole conjecture, I’m not too worried about being worried.)

Just to throw in another thought, perhaps some entropy-based distribution would be good. I wondered, for example, about defining a probability distribution as follows. Given any probability distribution, we obtain weights on the sets by taking to be the probability that a random element (chosen from the distribution) belongs to . We can then form a probability distribution on by taking the probabilities to be proportional to the weights. Finally, we can choose a distribution on the elements to maximize the entropy of the distribution on .

If we try that with the example above, and if is the probability assigned to the element 1, then the three weights are and , so the probabilities we will assign will be and . The entropy of this distribution will be maximized when the two non-zero probabilities are equal, which gives us , so in this case we will pick out the element 1. It isn’t completely obvious that that is a bad thing to do for this particular example — indeed, we will do it whenever there is an element that is contained in all the non-empty sets from . Again, there is virtually no chance that this rather artificial construction will work, but perhaps after a lot of thought and several modifications and refinements, something like it could be got to work.

I find the non-example I’m about to present interesting because I don’t have a good conceptual understanding of why it fails — it’s just that the numbers aren’t kind to me. But I think there *is* a proper understanding to be had. Can anyone give me a simple argument that no construction that is anything like what I tried can possibly work? (I haven’t even checked properly whether the known positive results about the problem ruled out my attempt before I even started.)

The idea was as follows. Let and be parameters to be chosen later, and let be a random set system obtained by choosing each subset of of size with probability , the choices being independent. We then take as our attempted counterexample the set of all unions of sets in .

Why might one entertain even for a second the thought that this could be a counterexample? Well, if we choose to be rather close to , but just slightly less, then a typical pair of sets of size have a union of size close to , and more generally a typical union of sets of size has size at least this. There are vastly fewer sets of size greater than than there are of size , so we could perhaps dare to hope that almost all the sets in the set system are the ones of size , so the average size is close to , which is less than . And since the sets are spread around, the elements are likely to be contained in roughly the same number of sets each, so this gives a counterexample.

Of course, the problem here is that although a typical union is large, there are many atypical unions, so we need to get rid of them somehow — or at least the vast majority of them. This is where choosing a random subset comes in. The hope is that if we choose a fairly sparse random subset, then all the unions will be large rather than merely almost all.

However, this introduces a new problem, which is that if we have passed to a *sparse* random subset, then it is no longer clear that the size of that subset is bigger than the number of possible unions. So it becomes a question of balance: can we choose small enough for the unions of those sets to be typical, but still large enough for the sets of size to dominate the set system? We’re also free to choose of course.

I usually find when I’m in a situation like this, where I’m hoping for a miracle, that a miracle doesn’t occur, and that indeed seems to be the case here. Let me explain my back-of-envelope calculation.

I’ll write for the set of unions of sets in . Let us now take and give an upper bound for the expected number of sets in of size . So fix a set of size and let us give a bound for the probability that . We know that must contain at least two sets in . But the number of pairs of sets of size contained in is at most and each such pair has a probability of being a pair of sets in , so the probability that is at most . Therefore, the expected number of sets in of size is at most .

As for the expected number of sets in , it is , so if we want the example to work, we would very much like it to be the case that when , we have the inequality

.

We can weaken this requirement by observing that the expected number of sets in of size is also trivially at most , so it is enough to go for

.

If the left-hand side is not just greater than the right-hand side, but greater by a factor of say for each , then we should be in good shape: the average size of a set in will be not much greater than and we’ll be done.

If is not much bigger than , then things look quite promising. In this case, will be comparable in size to , but will be quite small — it equals , and is small. A crude estimate says that we’ll be OK provided that is significantly smaller than . And that looks OK, since is a lot smaller than , so we aren’t being made to choose a ridiculously small value of .

If on the other hand is quite a lot larger than , then is much much smaller than , so we’re in great shape as long as we haven’t chosen so tiny that is also much much smaller than .

So what goes wrong? Well, the problem is that the first argument requires smaller and smaller values of as gets further and further away from , and the result seems to be that by the time the second regime takes over, has become too small for the trivial argument to work.

Let me try to be a bit more precise about this. The point at which becomes smaller than is of course the point at which . For that value of , we require , so we need . However, an easy calculation reveals that

,

(or observe that if you multiply both sides by , then both expressions are equal to the multinomial coefficient that counts the number of ways of writing an -element set as with and ). So unfortunately we find that however we choose the value of there is a value of such that the number of sets in of size is greater than . (I should remark that the estimate for the number of sets in of size can be improved to , but this does not make enough of a difference to rescue the argument.)

So unfortunately it turns out that the middle of the range is worse than the two ends, and indeed worse by enough to kill off the idea. However, it seemed to me to be good to make at least some attempt to find a counterexample in order to understand the problem better.

From here there are two obvious ways to go. One is to try to modify the above idea to give it a better chance of working. The other, which I have already mentioned, is to try to generalize the failure: that is, to explain why that example, and many others like it, had no hope of working. Alternatively, somebody could propose a completely different line of enquiry.

I’ll stop there. Experience with Polymath projects so far seems to suggest that, as with individual projects, it is hard to predict how long they will continue before there is a general feeling of being stuck. So I’m thinking of this as a slightly tentative suggestion, and if it provokes a sufficiently healthy conversation and interesting new (or at least new to me) ideas, then I’ll write another post and launch a project more formally. In particular, only at that point will I call it Polymath11 (or should that be Polymath12? — I don’t know whether the almost instantly successful polynomial-identities project got round to assigning itself a number). Also, for various reasons I don’t want to get properly going on a Polymath project for at least a week, though I realize I may not be in complete control of what happens in response to this post.

Just before I finish, let me remark that Polymath10, attempting to prove Erdős’s sunflower conjecture, is still continuing on Gil Kalai’s blog. What’s more, I think it is still at a stage where a newcomer could catch up with what is going on — it might take a couple of hours to find and digest a few of the more important comments. But Gil and I agree that there may well be room to have more than one Polymath project going at the same time, since a common pattern is for the group of participants to shrink down to a smallish number of “enthusiasts”, and there are enough mathematicians to form many such groups.

And a quick reminder, as maybe some people reading this will be new to the concept of Polymath projects. The aim is to try to make the problem-solving process easier in various ways. One is to have an open discussion, in the form of blog posts and comments, so that anybody can participate, and with luck a process of self-selection will take place that results in a team of enthusiastic people with a good mixture of skills and knowledge. Another is to encourage people to express ideas that may well be half-baked or even wrong, or even completely *obviously* wrong. (It’s surprising how often a completely obviously wrong idea can stimulate a different idea that turns out to be very useful. Naturally, expressing such an idea can be embarrassing, but it shouldn’t be, as it is an important part of what we do when we think about problems privately.) Another is to provide a mechanism where people can get very quick feedback on their ideas — this too can be extremely stimulating and speed up the process of thought considerably. If you like the problem but don’t feel like pursuing either of the approaches I’ve outlined above, that’s of course fine — your ideas are still welcome and may well be more fruitful than those ones, which are there just to get the discussion started.

]]>

The proof is very simple. For each , let be the characteristic function of the neighbourhood of . That is, if is an edge and otherwise. Then is the sum of the degrees of the , which is the number of edges of , which is . If we set , then this tells us that . By the Cauchy-Schwarz inequality, it follows that .

But by the Cauchy-Schwarz inequality again,

That last expression is times the number of quadruples such that all of and are edges, and our previous estimate shows that it is at least . Therefore, the probability that a random such quadruple consists entirely of edges is at least , as claimed (since there are possible quadruples to choose from).

Essentially the same proof applies to a weighted bipartite graph. That is, if you have some real weights for each of the edges, and if the average weight is , then

.

(One can also generalize this statement to complex weights if one puts in appropriate conjugates.) One way of thinking of this is that the sum on the left-hand side goes down if you apply an averaging projection to — that is, you replace all the values of by their average.

Thus, an appropriate weighted count of 4-cycles is minimized over all systems of weights with a given average when all the weights are equal.

Sidorenko’s conjecture is the statement that the same is true for any bipartite graph one might care to count. Or to be more precise, it is the statement that if you apply the averaging projection to a graph (rather than a weighted graph), then the count goes down: I haven’t checked whether the conjecture is still reasonable for weighted graphs, though standard arguments show that if it is true then it will still be true for weighted graphs if the weights are between 0 and 1, and hence (by rescaling) for all non-negative weights.

Here is a more formal statement of the conjecture.

**Conjecture** *Let be a bipartite graph with vertex sets and . Let the number of edges of be . Let be a bipartite graph of density with vertex sets and and let and be random functions from to and from to . Then the probability that is an edge of for every pair such that is an edge of is at least .*

This feels like the kind of statement that is either false with a simple counterexample or true with a simple proof. But this impression is incorrect.

My interest in the problem was aroused when I taught a graduate-level course in additive combinatorics and related ideas last year, and set a question that I wrongly thought I had solved, which is Sidorenko’s conjecture in the case of a path of length 3. That is, I asked for a proof or disproof of the statement that if has density , then the number of quadruples such that and are all edges is at least . I actually thought I had a counterexample, which I didn’t, as this case of Sidorenko’s conjecture is a theorem that I shall prove later in this post.

I also tried to prove the statement using the Cauchy-Schwarz inequality, but nothing I did seemed to work, and eventually I filed it away in my mind under the heading “unexpectedly hard problem”.

At some point around then I heard about a paper of Balázs Szegedy that used entropy to prove all (then) known cases of Sidorenko’s conjecture as well as a few more. I looked at the paper, but it was hard to extract from it a short easy proof for paths of length 3, because it is concerned with proving as general a result as possible, which necessitates setting up quite a lot of abstract language. If you’re like me, what you really need in order to understand a general result is (at least in complicated cases) not the actual proof of the result, but more like a proof of a special case that is general enough that once you’ve understood that you know that you could in principle think hard about what you used in order to extract from the argument as general a result as possible.

In order to get to that point, I set the paper as a mini-seminar topic for my research student Jason Long, and everything that follows is “joint work” in the sense that he gave a nice presentation of the paper, after which we had a discussion about what was actually going on in small cases, which led to a particularly simple proof for paths of length 3, which works just as well for all trees, as well as a somewhat more complicated proof for 4-cycles. (These proofs are entirely due to Szegedy, but we stripped them of all the abstract language, the result of which, to me at any rate, is to expose what is going on and what was presumably going on in Szegedy’s mind when he proved the more general result.) Actually, this doesn’t explain the whole paper, even in the sense just described, since we did not discuss a final section in which Szegedy deals with more complicated examples that don’t follow from his first approach, but it does at least show very clearly how entropy enters the picture.

I think it is possible to identify three ideas that drive the proof for paths of length 3. They are as follows.

- Recall that the
*entropy*of a probability distribution on a finite set is . By Jensen’s inequality, this is maximized for the uniform distribution, when it takes the value . It follows that one way of proving that is to identify a probability distribution on with entropy greater than . - That may look like a mad idea at first, since if anything is going to work, the uniform distribution will. However, it is not necessarily a mad idea, because there might in principle be a non-uniform distribution with entropy that was much easier to calculate than that of the uniform distribution.
- If is a graph, then there is a probability distribution, on the set of quadruples of vertices of such that and are all edges, that is not in general uniform and that has very nice properties.

The distribution mentioned in 3 is easy to describe. You first pick an edge of uniformly at random (from all the edges of ). Note that the probability that is the first vertex of this edge is proportional to the degree of , and the probability that is the end vertex is proportional to the degree of .

Having picked the edge you then pick uniformly at random from the neighbours of , and having picked you pick uniformly at random from the neighbours of . This guarantees that is a path of length 3 (possibly with repeated vertices, as we need for the statement of Sidorenko’s conjecture).

The beautiful property of this distribution is that the edges and are identically distributed. This follows from the fact that has the same distribution as and is a random neighbour of , which means that has the same distribution as and is a random neighbour of .

Now let us turn to point 2. The only conceivable advantage of using a non-uniform distribution and an entropy argument is that if we are lucky then the entropy will be easy to calculate and will give us a good enough bound to prove what we want. The reason this stands a chance of being true is that entropy has some very nice properties. Let me briefly review those.

It will be convenient to talk about entropy of random variables rather than probability distributions: if is a random variable on a finite set , then its entropy is , where I have written as shorthand for .

We also need the notion of *conditional* entropy . This is : that is, it is the expectation of the entropy of if you are told the value of .

If you think of the entropy of as the average amount of information needed to specify the value of , then is the average amount *more* information you need to specify if you have been told the value of . A couple of extreme cases are that if and are independent, then (since the distribution of is the same as the distribution of for each ), and if is a function of , then (since is concentrated at a point for each ). In general, we also have the formula

where is the entropy of the joint random variable . Intuitively this says that to know the value of you need to know the value of and then you need to know any extra information needed to specify . It can be verified easily from the formula for entropy.

Now let us calculate the entropy of the distribution we have defined on labelled paths of length 3. Let , , and be random variables that tell us where and are. We want to calculate . By a small generalization of the formula above — also intuitive and easy to check formally — we have

Now the nice properties of our distribution come into play. Note first that if you know that , then and are independent (they are both random neighbours of ). It follows that . Similarly, given the value of , is independent of the pair , so .

Now for each let be the degree of and let . Then , and , so

,

and because the edges all have the same distribution, we get the same answer for and .

As for the entropy , it is equal to

.

Putting all this together, we get that

.

By Jensen’s inequality, the second term is minimized if for every , and in that case we obtain

.

If the average degree is , then , which gives us

The moral here is that once one uses the nice distribution and calculates its entropy, the calculations follow very easily from the standard properties of conditional entropy, and they give exactly what we need — that the entropy must be at least , from which it follows that the number of labelled paths of length 3 must be at least .

Now I’ll prove the conjecture for 4-cycles. In a way this is a perverse thing to do, since as we have already seen, one can prove this case easily using Cauchy-Schwarz. However, the argument just given generalizes very straightforwardly to trees (and therefore forests), which makes the 4-cycle the simplest case that we have not yet covered, since it is the simplest bipartite graph that contains a cycle. Also, it is quite interesting that there should be a genuinely different proof for the 4-cycles case that is still natural.

What we would like for the proof to work is a distribution on the 4-cycles (as usual we do not insist that these vertices are distinct) such that the marginal distribution of each edge is uniform amongst all edges of . A natural guess about how to do this is to pick a random edge , pick a random neighbour of , and then pick a random from the intersection of the neighbourhoods of and . But that gives the wrong distribution on . Instead, when we pick we need to choose it according to the distribution we have on paths (that is, choose a random edge and then let be a random neighbour of ) and then condition on and . Note that for fixed and that is exactly the distribution we already have on : once that is pointed out, it becomes obvious that this is the only reasonable thing to do, and that it will work.

So now let us work out the entropy of this distribution. Again let be the (far from independent) random variables that tell us where the vertices go. Then we can write down a few equations using the usual rule about conditional entropy and exploiting independence where we have it.

From the second and third equations we get that

,

and substituting this into the first gives us

As before, we have that

and

.

Therefore,

.

As for the term , we now use the trivial upper bound . If the average degree is , then

.

The remaining part is minimized when every is equal to , in which case it gives us , so we end up with a lower bound for the entropy of , exactly as required.

This kind of argument deals with a fairly large class of graphs — to get the argument to work it is necessary for the graph to be built up in a certain way, but that covers many cases of the conjecture.

]]>