I’ll try to say more about this later, but I had a small think about steps 3 and 4, and I now think that the 2D McDonald condition I was talking about isn’t quite strong enough for our purposes here, but that it should still be possible to prove some condition of a similar kind that *would* be strong enough. But to be sure about this, I’m going to need to do some proper calculations.

*Thanks for pointing those out — fixed now.*

1. Get bounds for all the moments , where is the random variable . Here is a fixed die with typical properties and is chosen uniformly at random from .

2. Use these to obtain a Taylor expansion for the characteristic function with a very explicit bound for the error term.

3. Prove that satisfies a McDonald-type condition (see above comments for what this means).

4. Deduce that outside a small ball about the origin (which will be stretched quite a lot more in the direction than the direction, but this is not important) the characteristic function is bounded away from 1.

5. From 2 and 4 deduce that the th power of the characteristic function can be very well approximated by a Gaussian both near and far away from it.

6. Deduce the result (since the Fourier transform of a Gaussian is a Gaussian).

]]>But then we need to eliminate the case when a significant fraction of points lies in some n-4-dimensional subspace, shared by a significant fraction of hyperplanes… Yes, this is also not easy.

]]>I should add that I think proving that kind of condition is straightforward, by essentially the same technique that I’ve been using for other statements of this kind: we prove that it holds with very high probability for a fully independent sequence, and then conditioning on the sum of the sequence doesn’t mess things up.

]]>Suppose we know that with positive probability if we choose a random according to the distribution , we have that and are comparable. I claim that this implies that is not close to 1 except when is close to 0. The proof is easy. The only way that can be close to 1 is if -almost every is close to 1, which is equivalent to saying that -almost every is close to an integer. But if is not close to zero (mod 1) and is close to an integer, then is close to , which is not close to zero (all of this being mod 1. And since with non-negligible -probability is such that has comparable probability to , this implies that an -random has a non-negligible probability of being close to , and in particular not being close to 1. Therefore, is not approximately equal to 1.

So it looks to me as though if we can prove a McDonald-type condition for the random variable we are interested in (when is typical), then we’ll have a good chance of making the characteristic-functions approach sufficiently quantitative for the purposes of the transitivity question.

]]>About ties conjecture. As above, each dice V can be represented as a point in n-2-dimentional space. Let be the number of such dices of size n.

Each dice V is in tie with some dice A if this point belongs to hyperplane with equation . So, we have points and hyperplanes, and would like to bound the total number of incidences between them.

There is a multidimentional Szemerédi–Trotter theorem https://en.wikipedia.org/wiki/Szemer%C3%A9di%E2%80%93Trotter_theorem , which states that the number of incidences between k points and m hyperplanes in dimension d is bounded by . But this bound is useless for , which is our case. With $m=k$, even if each hyperplane contains each point, the number of incidences is , much less than …

But there may be other versions of the multidimentional Szemerédi–Trotter theorem, more suitable to our parameter range…

]]>In very broad terms, the idea is that we know by the CLT what the approximate probability is that the sum of copies of lies in a given region of the plane. (Of course, we would need a suitably explicit form of the Berry-Esseen theorem to get this.) So to prove that the probability at an individual point at is what it should be, it is sufficient to prove that within a small region the probabilities are roughly constant.

How do we do that? Well in the 1D case it appears to go something like this. Let be a probability distribution on . One defines a parameter . If that parameter is not too small, it implies that there is a kind of smearing effect that causes the sum of many copies to be locally constant.

What I’m about to say next isn’t from the paper but I think it’s probably roughly equivalent to it, or else wrong. Anyhow, we have the option, if the parameter is large, of decomposing the random walk as follows. First, let’s focus just on two neighbouring points and with probabilities and . We could achieve the same probabilities by the following process. We define a new probability distribution that’s the same as except that and . Then if we happen to pick , we decide to add 1 to it with probability . Note that if the ratio of and isn’t huge or tiny, then is bounded away from 1. Also, the contribution to from pairs where one of and is at most times the other is at most , so if the parameter is not too small then we get plenty of pairs where and are of reasonably comparable size.

So now let us define a new random variable with probabilities given by a function , where for many disjoint pairs we set and , and for the other values we set .

Now we can think of each step of the random walk as being “decomposed” as follows. We begin by choosing a random point using the probability distribution . If we happen to pick from one of the pairs then we take a step to the right with probability .

After we have done this times, we have taken a sum of independent copies of , and have added to that a sum of a certain number of independent Bernoulli random variables. Now when I say “independent” I mean that they are independent of each other, but the number of variables we take, and what they are, depend on which s we happened to land on when we were choosing -random values. But with high probability we will choose a representative sample of these Bernoulli random variables and will add together independent copies of them.

So the outcome should be that we get a broad Gaussian behaviour out of the sum of the variables, which is then sort of “convolved” (it isn’t exactly a convolution, but might resemble one enough for what I’m writing not to be total nonsense) with a flat enough Gaussian obtained from adding together independent Bernoullis for the broad behaviour to tell us what the individual probabilities are.

Assuming that something like that is correct, it should help us think about what an appropriate 2D version might be like. Here’s one possibility. If , then , so . This we would expect to happen a positive proportion of the time, and each time it does, it tells us that there is a pair of points such that both of them occur with probability .

If exactly once, then , which gives us a pair of points that both occur with probability . Again, I would expect this to happen a positive proportion of the time.

So it looks as though we should be able to decompose our random walk into a part that gives us a global Gaussian, and two little Bernoulli parts in independent directions that give us the flatness.

So I no longer feel so sure that characteristic functions is the way to go.

]]>It’s possibly useful to think about the highly dissociated case in Fourier terms. For simplicity I’ll discuss the one-dimensional case. Suppose we have a random variable that is uniformly distributed in a highly dissociated set . Then let’s consider the Fourier transform

.

Because the are highly dissociated, if we choose randomly from , the numbers behave like independent random variables. (This is an intuition that can be made rigorous using standard techniques.) Therefore, every so often they will all be close to 1 and our wish that that should happen only near the origin will be thwarted.

I’m now wondering whether it might be possible to prove directly that does not get close to 1 away from the origin (when is the probability distribution corresponding to the random variable .)

]]>We can avoid equality constraints by expressing and from them: , hence . Similarly, , hence .

So, now our dices are just integer points in n-2-dimensional polytope defined by inequalities , , and .

There is a lot of literature about counting integer points in a polytope, but, intuitively, the number of such point should usually be approximately equal to the volume of the polytope.

Now, fix dice A with the corresponding representation . Whether it bits depends on the sign of the . Substituting and from the expressions above we get inequality of the form .

So, in fact a given n-2-dimentional polytope intersects with a hyperplane. For ties conjecture, we need to prove that the fraction of integer points on the hyperplane is small. For the “small conjecture”, we need to show that hyperplanes arising from almost all dices A divides integer points in the polytope by about half. As a first step, we would need to prove that the volume of the polytope is divided into two almost equal parts.

]]>1. A first situation is when the distribution is concentrated in a proper sublattice. I’ve already mentioned this several times.

2. A generalization of the above is when we have a sublattice and a centrally symmetric set such that the -fold sumset does not contain any points of other than the origin. If we now take a probability distribution that is concentrated in , then its -fold convolution will be concentrated in : in other words, it will live inside an array of “blobs” rather than coalescing into a single “blob”.

3. That can be generalized further to an arbitrary multidimensional arithmetic progression that is sufficiently “proper” that its -fold sumset is still a progression of the same dimension.

4. This isn’t a genuinely distinct example from 3, but it is an extreme case of it: one can take a probability distribution that is concentrated in a highly dissociated set: so much so that no two distinct sums of elements of the set are equal.

Now some of the facts we know about the random variable appear to rule out these kinds of examples very quickly. For instance, the fact that is bounded by not much more than and is uniformly distributed in an interval of width or so means that there just isn’t enough space for highly dissociated sets, or sets whose -fold sumsets don’t swallow up everything around them. Indeed, I can’t immediately see a counterexample to the following statement (though there might easily be one, in which case the statement would need to be adjusted).

**Very tentative conjecture.** *Let be a probability distribution on with mean and suppose that is supported in . Suppose also that the sum of over any proper sublattice of is at most . Then the -fold convolution of is approximately Gaussian.*

By “approximately Gaussian” I mean approximated sufficiently well by a formula of the form that we can deduce that, writing for the -fold convolution,

.

I’ve just realized that the formulation above rules out a “bad” example that I failed to mention above. I haven’t been thinking about the fact that one kind of “atypical” die is one where for almost all . If that were the case, then we would have a distribution that is highly concentrated at the origin (which is why the formulation above rules it out), which would effectively mean that we were taking a random walk with far fewer steps, which in turn would mean that it would have far less time to mix, and the bad examples listed above would be much easier to find.

Ah, but that observation leads me straight away to another bad example that gives a counterexample to the conjecture above. Let be the set . Suppose now that we have a probability distribution that is highly concentrated in , but very occasionally takes the value or . If is significantly bigger than , then after steps we’ll get blobs around multiples of rather than a single blob.

This comment is getting a bit bloated, so I’ll stop it now and try to think about what a revised conjecture should look like.

]]>If lives in a lattice, then this is false. Suppose, for example, that only ever takes even values. Then

.

So we will want a condition that says that isn’t supported, even approximately, in a lattice.

]]>.

Setting turns this into . That justifies the statement that if then there is no linear term, and that the quadratic term is proportional to the covariance matrix.

]]>where that denotes the -fold convolution.

Now let’s define the Fourier transform of in the usual way by

.

Here and belong to , but I’ll sometimes think of them as belonging to too.

We have the convolution law that and the inversion formula

Putting these together, we find that if random variables are independent copies of , then the probability that their sum is is

.

The very rough reason that we should now expect a Gaussian formula is that we consider a Taylor expansion of . We can assume for our application that and have mean zero. From that one can argue that the coefficients of the linear terms in the Taylor expansion are zero. (I’ll give more details in a subsequent comment.) The constant term is 1, and the quadratic terms give us the covariance matrix of and . If we assume that we can approximate by an expression of the form for some suitable quadratic form in and , then the th power should be close to , and then, since Fourier transforms (and inverse Fourier transforms) take Gaussians to Gaussians, when we invert this one, we should get a Gaussian-type formula for . (I’m glossing over the point that Gaussians are defined on , whereas and live in and and live in . If most of is supported in a small region around 0, then one can hope that this isn’t too much of a problem.)

]]>Just for the record, comments with two links go to the moderation queue, but comments with just one link are fine. In any case, here is a link to the paper you mention.

It’s difficult to know what to do at this point. I imagine that if one fully understood that paper, it would be a relatively routine matter to generalize its main result to two dimensions, but that looks like quite a bit investment of time.

But perhaps we should try to achieve an understanding of local limit theorems in a polymathematical way. I can’t find it now, but I have some memory of Terence Tao organizing something like this once. That is, instead of several people going away and trying to understand the literature on this topic, perhaps we could simply think of the remaining task — to prove an appropriate LCLT — as a Polymath project with a slightly unusual character, in the sense that all the techniques needed are probably out there in the literature, so an obvious way to proceed is to try to understand those techniques.

It’s not clear to me that we necessarily want to follow the approach of the paper above, but perhaps we do. My instinct would be to use a characteristic-functions approach, but perhaps there are difficulties in making that sufficiently precise. So I think what I’ll try to do next is start writing out an argument and see where I get stuck (which should be pretty soon, as I’m not up on this kind of thing).

But I’m not trying to argue that we *should* use characteristic functions. If someone else can see how to do it in a different way, that’s just fine by me.

When I read the abstract that it proves “local limit theorem for sums of independent lattice valued random variables, with effective error term, that is with explicit parameters and universal constants”, I even hoped it would solve our problem immediately. Unfortunately, only one-dimensional case is covered there.

]]>To be more explicit, and help organize my thoughts, here is why the above model has those nice properties.

Instead of looking at a die in its sequence representation (a_1, a_2, … a_n), consider it in the multiset representation (m_1, m_2, .. m_n) where m_i is the number of dice face with value i.

Then the score function is just an antisymmetric bilinear function, and we can use linear algebra.

With the matrix:

S_{ij} = sign(j-i)

and denoting the column vector v(A) as the vector with components equal to the multiset representation of the die A,

the score function is then:

score(A,B) = v(A)^T . S . v(B)

In this representation, the standard die is a vector with all entries 1.

v(std) = [1,1,1,…,1]

Since the multiset representation already restricts the values of die face to the set of integers 1 to n, the only remaining constraint is the sum. This sum constraint is equivalent to saying:

score(std,A) = v(std)^T . S . v(A) = 0

As we are only considering multisets which meets that constraint, there is a more convenient representation, where we only consider the difference to the std die. Denoting this representation as w(A),

w(A) = v(A) – v(std)

the score function retains the same form:

score(A,B) = w(A)^T . S . w(A)

and it is obvious that the standard die ties with everything because it is just the zero vector in this representation.

If we constraint to the dice with multiplicity of values <= 2, now it becomes obvious what the "negative" relation is and why this restriction allows it.

w(-A) = – w(A)

And since the score function is bilinear, we immediately get:

score(-A,B) = – score(A,B)

The reason this doesn't work more generally, is that it doesn't make sense to have a negative number of faces with some number j.

The "one step" dice considered in the arxiv paper Gowers linked when starting this discussion, are a "basis" for the space of all dice in that these steps can be added as vectors in the w(A) representation to get to all other dice. Exploring what the score function on the space of all dice pairs looks like, using this basis as a guide, may be fruitful because the scores has already been discussed for this basis in the arxiv paper.

The "max2 multiset dice" is the maximal expansion about the origin (the standard die) with the "one step" basis, while still allowing all dice to have a natural "negative" die. I did some computational tests and this is enough to make the fraction of ties go to zero as n increases, and yet the larger "random tournament" conjecture still appears to fail. Strangely, it even appears to be failing with the same ratios for k=4 (I did not heavily test this, and at this point still likely to be a coincidence).

So this appears to be a nice small model that has all the same features as the full larger model.

]]>Consider sorted sequence proper dice, where no number is repeated more than twice. This model of dice have a nice concept of a “negative” die, an involution such that the score(A,B), which is the number of roll pairs where A beats B – pairs where B beats A, has the nice property:

score(A,B) = – score(A,-B)

So you automatically get that for all die A, it beats exactly as many dice and it loses to. The distribution is perfectly symmetric.

Now, this says nothing about the number of ties, but with your approach would this be enough to conclude the following little conjecture?

For dice (in this restricted model) A,B,C such that A > B > C, (where > is the “beats” relation), is it just as probable that A > C as C > A.

I think that that will come out in the wash. My hope is that with a strong enough LCLT, we’ll be able to say that the probability that is small enough that even when we condition on it remains small.

]]>To what extent does this approach rely on that?

]]>