I hope, but am not yet sure, that this post is a counterexample to Betteridge’s law of headlines. To back up that hope, let me sketch an argument that has arisen from the discussion so far, which appears to get us close to showing that if and are three -sided dice chosen independently at random from the sequences model (I will recap some of these definitions in a moment), then the probability that beats given that beats and beats is . Loosely speaking, the information that beats and beats tells you nothing about how likely is to beat . Having given the argument, I will try to isolate a statement that looks plausible, not impossible to prove, and sufficient to finish off the argument.

### Basic definitions

An –*sided die* in the sequence model is a sequence of elements of such that , or equivalently such that the average value of is , which is of course the average value of a random element of . A *random* -sided die in this model is simply an -sided die chosen uniformly at random from the set of all such dice.

Given -sided dice and , we say that *beats* if

If the two sets above have equal size, then we say that *ties with* .

### A first reduction

When looking at this problem, it is natural to think about the following directed graph: the vertex set is the set of all -sided dice and we put an arrow from to if beats .

We believe (and even believe we can prove) that ties are rare. Assuming that to be the case, then the conjecture above is equivalent to the statement that if are three vertices chosen independently at random in this graph, then the probability that is a directed cycle is what you expect for a random tournament, namely 1/8.

One can also make a more general conjecture, namely that the entire (almost) tournament is quasirandom in a sense defined by Chung and Graham, which turns out to be equivalent to the statement that for almost all pairs of dice, the four possible pairs of truth values for the pair of statements

beats beats

each occur with probability approximately 1/4. If this is true, then given random dice , all the possibilities for which beat which have probability approximately . This would imply, for example, that if are independent random -sided dice, then the probability that beats given that beats for all other pairs with is still .

Several of us have done computer experiments to test these conjectures, and it looks as though the first one is true and the second one false. A further reason to be suspicious of the stronger conjecture is that a natural approach to prove it appears to be morally equivalent to a relationship between the correlations of certain random variables that doesn’t seem to have any heuristic justification or to fit with experimental evidence. So although we don’t have a disproof of the stronger conjecture (I think it would be very interesting to find one), it doesn’t seem like a good idea to spend a lot of effort trying to prove it, unless we can somehow explain away the evidence that appears to be stacking up against it.

The first conjecture turns out to be equivalent to a statement that doesn’t mention transitivity. The very quick proof I’ll give here was supplied by Luke Pebody. Suppose we have a tournament (that is, a complete graph with each edge directed in one of the two possible directions) and write for the out-degree of a vertex (that is, the number of such that there is an arrow from to ) and for the in-degree. Then let us count the number of ordered triples such that . Any directed triangle in the tournament will give rise to three such triples, namely and . And any other triangle will give rise to just one: for example, if and we get just the triple . So the number of ordered triples such that and is plus twice the number of directed triangles. Note that is approximately .

But the number of these ordered triples is also . If almost all in-degrees and almost all out-degrees are roughly , then this is approximately , which means that the number of directed triangles is approximately . That is, in this case, the probability that three dice form an intransitive triple is approximately 1/4, as we are hoping from the conjecture. If on the other hand several in-degrees fail to be roughly , then is substantially lower than and we get a noticeably smaller proportion of intransitive triples.

Thus, the weaker conjecture is equivalent to the statement that almost every die beats approximately half the other dice.

### Why should a “typical” die beat approximately half the other dice?

The answer to this is fairly simple, heuristically at least. Let be an arbitrary die. For define to be the number of with and define to be . Then

,

from which it follows that .

We also have that if is another die, then

If we make the simplifying assumption that sufficiently infrequently to make no real difference to what is going on (which is not problematic, as a slightly more complicated but still fairly simple function can be used instead of to avoid this problem), then we find that to a reasonable approximation beats if and only if is positive.

So what we would like to prove is that if are chosen independently at random from , then

.

We are therefore led to consider the random variable

where now is chosen uniformly at random from without any condition on the sum. To write this in a more transparent way, let be the random variable , where is chosen uniformly at random from . Then is a sum of independent copies of . What we are interested in is the distribution we obtain when we condition the random variable on .

This should mean that we are in an excellent position, since under appropriate conditions, a lot is known about sums of independent random variables, and it looks very much as though those conditions are satisfied by , at least when is “typical”. Indeed, what we would expect, by the central limit theorem, is that will approximate a bivariate normal distribution with mean 0 (since both and have mean zero). But a bivariate normal distribution is centrally symmetric, so we expect the distribution of to be approximately centrally symmetric, which would imply what we wanted above, since that is equivalent to the statement that .

### Why are we not immediately done?

How can we make the above argument rigorous? The central limit theorem on its own is not enough, for two reasons. The first is that it does not give us information about the speed of convergence to a normal distribution, whereas we need a sum of copies of to be close to normal. The second is that the notion of “close to normal” is not precise enough for our purposes: it will allow us to approximate the probability of an event such as but not of a “probability zero” event such as .

The first of these difficulties is not too worrying, since plenty of work has been done on the speed of convergence in the central limit theorem. In particular, there is a famous theorem of Berry and Esseen that is often used when this kind of information is needed.

However, the Berry-Esseen theorem still suffers from the second drawback. To get round that one needs to turn to more precise results still, known as *local* central limit theorems, often abbreviated to LCLTs. With a local central limit theorem, one can even talk about the probability that takes a specific value after a specific number of steps. Roughly speaking, it says (in its 2-dimensional version) that if is a random variable of mean zero taking values in and if satisfies suitable moment conditions and is not supported in a proper sublattice of , then writing for a sum of copies of , we have that the probability that takes a particular value differs from the “expected” probability (given by a suitable Gaussian formula) by . (I’m not 100% sure I’ve got that right: the theorem in question is Theorem 2.1.1 from this book.)

That looks very close to what we want, but it still falls short. The problem is that the implied constant depends on the random variable . A simple proof of this is that if is not supported in a sublattice but very nearly is — for example, if the probability that it takes a value outside the sublattice is — then one will have to add together an extremely large number of copies of before the sum ceases to be concentrated in the sublattice.

So the situation we appear to be in is the following. We have more precise information about the random variable than is assumed in the LCLT in the reference above, and we want to use that to obtain an explicit constant in the theorem.

It could be that out there in the literature is exactly the result we need, which would be nice, but it also seems possible that we will have to prove an appropriate version of the LCLT for ourselves. I’d prefer the first, but the second wouldn’t be too disappointing, as the problem is quite appealing and even has something of an additive-combinatorial flavour (since it is about describing an iterated convolution of a subset of under appropriate assumptions).

### What properties can we establish about the random variable ?

I said above, with no justification, that we have more precise information about the random variable . Let me now try to give the justification.

First of all, we know everything we could possibly want to know about : it is the uniform distribution on . (In particular, if is odd, then it is the uniform distribution on the set of integers in .)

How about the distribution of ? That question is equivalent to asking about the values taken by , and their multiplicities. There is quite a lot one can say about those. For example, I claim that with high probability (if is a random -sided die) is never bigger than . That is because if we choose a fully random sequence , then the expected number of such that is , and the probability that this number differs from by more than is , by standard probabilistic estimates, so if we set , then this is at most , which we can make a lot smaller than by choosing to be, say, . (I think can be taken to be 1/8 if you want me to be more explicit.) Since the probability that is proportional to , it follows that this conclusion continues to hold even after we condition on that event.

Another simple observation is that the values taken by are not contained in a sublattice (assuming, that is, that is ever non-zero). That is simply because and averages zero.

A third simple observation is that with probability 1-o(1) will take a value of at least at least somewhere. I’ll sketch a proof of this. Let be around and let be evenly spaced in , staying away from the end points 1 and . Let be a purely random sequence in . Then the standard deviation of is around , so the probability that it is less than is around . The same is true of the conditional probability that is less than conditioned on the value of (the worst case being when this value is 0). So the probability that this happens for every is at most . This is much smaller than , so the conclusion remains valid when we condition on the sum of the being . So the claim follows. Note that because of the previous simple observation, it follows that must be at least in magnitude at least times, so up to log factors we get that is at least . With a bit more effort, it should be possible to push this up to something more like , since one would expect that would have rough order of magnitude for a positive fraction of the . Maybe this would be a good subproblem to think about, and ideally not too difficult.

How about the joint distribution ? It seems highly likely that for typical this will not be concentrated in a lattice, and that elementary arguments such as the above can be used to prove this. But let me indicate the kind of situation that we would have to prove is not typical. Suppose that and . Then as runs from 1 to 15 the values taken by are and the values taken by are . For this example, all the points live in the lattice of points such that is a multiple of 5.

This wouldn’t necessarily be a disaster for us actually, since the LCLT can be restricted to a sublattice and if after conditioning on we happen to have that is always a multiple of 5, that isn’t a problem if we still have the central symmetry. But it would probably be nicer to prove that it is an atypical occurrence, so that we don’t have to worry about living inside a sublattice (or even being concentrated in one).

My guess is that if we were to pursue these kinds of thoughts, we would end up being able to prove a statement that would say something like that takes a pretty representative sample of values with being between and and being in a range of width around . I would expect, for example, that if we add three or four independent copies of , then we will have a distribution that is similar in character to the uniform distribution on a rectangle of width of order of magnitude and height of order of magnitude . And if that’s true, then adding of them should give us something very close to normal (in an appropriate discrete sense of the word “normal”).

### What next?

There are two obvious tasks here. One is to try to prove as much as we can about the random variable . The other is to try to prove a suitable LCLT that is strong enough to give us that the probability that given that is approximately 1/2, under suitable assumptions about . And then we have to hope that what we achieve for the first is sufficient for the second.

It’s possible that the second task can be achieved by simply going through one of the existing proofs of the LCLT and being more careful about the details. But if that’s the case, then we should spend some time trying to find out whether anyone has done it already, since there wouldn’t be much point in duplicating that work. I hope I’ve set out what we want clearly enough for any probabilist who might stumble upon this blog post to be able to point us in the right direction if indeed the result we want is out there somewhere.

May 27, 2017 at 10:20 pm |

What about the other “first” conjecture, that the fraction of ties goes to zero?

To what extent does this approach rely on that?

May 27, 2017 at 11:33 pm

I think that that will come out in the wash. My hope is that with a strong enough LCLT, we’ll be able to say that the probability that is small enough that even when we condition on it remains small.

May 28, 2017 at 12:11 am |

There is a different model of dice, which may be worth playing with, and which some of these desired attributes are easier to see.

Consider sorted sequence proper dice, where no number is repeated more than twice. This model of dice have a nice concept of a “negative” die, an involution such that the score(A,B), which is the number of roll pairs where A beats B – pairs where B beats A, has the nice property:

score(A,B) = – score(A,-B)

So you automatically get that for all die A, it beats exactly as many dice and it loses to. The distribution is perfectly symmetric.

Now, this says nothing about the number of ties, but with your approach would this be enough to conclude the following little conjecture?

For dice (in this restricted model) A,B,C such that A > B > C, (where > is the “beats” relation), is it just as probable that A > C as C > A.

May 28, 2017 at 12:40 am

To be more explicit, and help organize my thoughts, here is why the above model has those nice properties.

Instead of looking at a die in its sequence representation (a_1, a_2, … a_n), consider it in the multiset representation (m_1, m_2, .. m_n) where m_i is the number of dice face with value i.

Then the score function is just an antisymmetric bilinear function, and we can use linear algebra.

With the matrix:

S_{ij} = sign(j-i)

and denoting the column vector v(A) as the vector with components equal to the multiset representation of the die A,

the score function is then:

score(A,B) = v(A)^T . S . v(B)

In this representation, the standard die is a vector with all entries 1.

v(std) = [1,1,1,…,1]

Since the multiset representation already restricts the values of die face to the set of integers 1 to n, the only remaining constraint is the sum. This sum constraint is equivalent to saying:

score(std,A) = v(std)^T . S . v(A) = 0

As we are only considering multisets which meets that constraint, there is a more convenient representation, where we only consider the difference to the std die. Denoting this representation as w(A),

w(A) = v(A) – v(std)

the score function retains the same form:

score(A,B) = w(A)^T . S . w(A)

and it is obvious that the standard die ties with everything because it is just the zero vector in this representation.

If we constraint to the dice with multiplicity of values <= 2, now it becomes obvious what the "negative" relation is and why this restriction allows it.

w(-A) = – w(A)

And since the score function is bilinear, we immediately get:

score(-A,B) = – score(A,B)

The reason this doesn't work more generally, is that it doesn't make sense to have a negative number of faces with some number j.

The "one step" dice considered in the arxiv paper Gowers linked when starting this discussion, are a "basis" for the space of all dice in that these steps can be added as vectors in the w(A) representation to get to all other dice. Exploring what the score function on the space of all dice pairs looks like, using this basis as a guide, may be fruitful because the scores has already been discussed for this basis in the arxiv paper.

The "max2 multiset dice" is the maximal expansion about the origin (the standard die) with the "one step" basis, while still allowing all dice to have a natural "negative" die. I did some computational tests and this is enough to make the fraction of ties go to zero as n increases, and yet the larger "random tournament" conjecture still appears to fail. Strangely, it even appears to be failing with the same ratios for k=4 (I did not heavily test this, and at this point still likely to be a coincidence).

So this appears to be a nice small model that has all the same features as the full larger model.

May 29, 2017 at 11:29 am |

One more reference. Recent paper “Approximate Local Limit Theorems with Effective Rate and Application to Random Walks in Random Scenery” by Rita Giuliano and Michel Weber, published in a journal 6 days ago(!), but available in arxiv since 2014 (please google it, I am afraid that if I provide a link, this message will go to the moderation queue).

When I read the abstract that it proves “local limit theorem for sums of independent lattice valued random variables, with effective error term, that is with explicit parameters and universal constants”, I even hoped it would solve our problem immediately. Unfortunately, only one-dimensional case is covered there.

May 29, 2017 at 2:08 pm

Just for the record, comments with two links go to the moderation queue, but comments with just one link are fine. In any case, here is a link to the paper you mention.

It’s difficult to know what to do at this point. I imagine that if one fully understood that paper, it would be a relatively routine matter to generalize its main result to two dimensions, but that looks like quite a bit investment of time.

But perhaps we should try to achieve an understanding of local limit theorems in a polymathematical way. I can’t find it now, but I have some memory of Terence Tao organizing something like this once. That is, instead of several people going away and trying to understand the literature on this topic, perhaps we could simply think of the remaining task — to prove an appropriate LCLT — as a Polymath project with a slightly unusual character, in the sense that all the techniques needed are probably out there in the literature, so an obvious way to proceed is to try to understand those techniques.

It’s not clear to me that we necessarily want to follow the approach of the paper above, but perhaps we do. My instinct would be to use a characteristic-functions approach, but perhaps there are difficulties in making that sufficiently precise. So I think what I’ll try to do next is start writing out an argument and see where I get stuck (which should be pretty soon, as I’m not up on this kind of thing).

But I’m not trying to argue that we

shoulduse characteristic functions. If someone else can see how to do it in a different way, that’s just fine by me.May 29, 2017 at 3:17 pm |

The rough idea of the characteristic-functions approach, which I’ll specialize to the 2-dimensional case, is as follows. (Apologies to anyone who knows about this properly for anything idiotic I might accidentally write.) Let be a random variable on and write for . If we take independent copies of and add them together, then the probability of being at is

where that denotes the -fold convolution.

Now let’s define the Fourier transform of in the usual way by

.

Here and belong to , but I’ll sometimes think of them as belonging to too.

We have the convolution law that and the inversion formula

Putting these together, we find that if random variables are independent copies of , then the probability that their sum is is

.

The very rough reason that we should now expect a Gaussian formula is that we consider a Taylor expansion of . We can assume for our application that and have mean zero. From that one can argue that the coefficients of the linear terms in the Taylor expansion are zero. (I’ll give more details in a subsequent comment.) The constant term is 1, and the quadratic terms give us the covariance matrix of and . If we assume that we can approximate by an expression of the form for some suitable quadratic form in and , then the th power should be close to , and then, since Fourier transforms (and inverse Fourier transforms) take Gaussians to Gaussians, when we invert this one, we should get a Gaussian-type formula for . (I’m glossing over the point that Gaussians are defined on , whereas and live in and and live in . If most of is supported in a small region around 0, then one can hope that this isn’t too much of a problem.)

May 29, 2017 at 3:26 pm |

Let as above. Then if we partially differentiate times with respect to and times with respect to we obtain the expression

.

Setting turns this into . That justifies the statement that if then there is no linear term, and that the quadratic term is proportional to the covariance matrix.

May 29, 2017 at 3:52 pm |

As I mentioned in parentheses at the end of the previous comment but one, we want to be supported in a small set about . Since is quite large, this is saying that we don’t want to be close to 1 except if is close to .

If lives in a lattice, then this is false. Suppose, for example, that only ever takes even values. Then

.

So we will want a condition that says that isn’t supported, even approximately, in a lattice.

May 29, 2017 at 4:31 pm |

A useful thing to do might be to come up with a conjecture for when the -fold convolution of a probability distribution on should be Gaussian. In order to make a start on that, I’ll list some situations where this does

nothappen.1. A first situation is when the distribution is concentrated in a proper sublattice. I’ve already mentioned this several times.

2. A generalization of the above is when we have a sublattice and a centrally symmetric set such that the -fold sumset does not contain any points of other than the origin. If we now take a probability distribution that is concentrated in , then its -fold convolution will be concentrated in : in other words, it will live inside an array of “blobs” rather than coalescing into a single “blob”.

3. That can be generalized further to an arbitrary multidimensional arithmetic progression that is sufficiently “proper” that its -fold sumset is still a progression of the same dimension.

4. This isn’t a genuinely distinct example from 3, but it is an extreme case of it: one can take a probability distribution that is concentrated in a highly dissociated set: so much so that no two distinct sums of elements of the set are equal.

Now some of the facts we know about the random variable appear to rule out these kinds of examples very quickly. For instance, the fact that is bounded by not much more than and is uniformly distributed in an interval of width or so means that there just isn’t enough space for highly dissociated sets, or sets whose -fold sumsets don’t swallow up everything around them. Indeed, I can’t immediately see a counterexample to the following statement (though there might easily be one, in which case the statement would need to be adjusted).

Very tentative conjecture.Let be a probability distribution on with mean and suppose that is supported in . Suppose also that the sum of over any proper sublattice of is at most . Then the -fold convolution of is approximately Gaussian.By “approximately Gaussian” I mean approximated sufficiently well by a formula of the form that we can deduce that, writing for the -fold convolution,

.

I’ve just realized that the formulation above rules out a “bad” example that I failed to mention above. I haven’t been thinking about the fact that one kind of “atypical” die is one where for almost all . If that were the case, then we would have a distribution that is highly concentrated at the origin (which is why the formulation above rules it out), which would effectively mean that we were taking a random walk with far fewer steps, which in turn would mean that it would have far less time to mix, and the bad examples listed above would be much easier to find.

Ah, but that observation leads me straight away to another bad example that gives a counterexample to the conjecture above. Let be the set . Suppose now that we have a probability distribution that is highly concentrated in , but very occasionally takes the value or . If is significantly bigger than , then after steps we’ll get blobs around multiples of rather than a single blob.

This comment is getting a bit bloated, so I’ll stop it now and try to think about what a revised conjecture should look like.

May 29, 2017 at 5:44 pm

It’s possibly useful to think about the highly dissociated case in Fourier terms. For simplicity I’ll discuss the one-dimensional case. Suppose we have a random variable that is uniformly distributed in a highly dissociated set . Then let’s consider the Fourier transform

.

Because the are highly dissociated, if we choose randomly from , the numbers behave like independent random variables. (This is an intuition that can be made rigorous using standard techniques.) Therefore, every so often they will all be close to 1 and our wish that that should happen only near the origin will be thwarted.

I’m now wondering whether it might be possible to prove directly that does not get close to 1 away from the origin (when is the probability distribution corresponding to the random variable .)

May 29, 2017 at 5:02 pm |

Post of P. Peng reminded me that the original conjectures were formulated in a different model, in which dice is a non-decreasing sequence of its values, or, equivalently, is given by a vector , where is the number of faces with value i. We have conditions for all i, and and .

We can avoid equality constraints by expressing and from them: , hence . Similarly, , hence .

So, now our dices are just integer points in n-2-dimensional polytope defined by inequalities , , and .

There is a lot of literature about counting integer points in a polytope, but, intuitively, the number of such point should usually be approximately equal to the volume of the polytope.

Now, fix dice A with the corresponding representation . Whether it bits depends on the sign of the . Substituting and from the expressions above we get inequality of the form .

So, in fact a given n-2-dimentional polytope intersects with a hyperplane. For ties conjecture, we need to prove that the fraction of integer points on the hyperplane is small. For the “small conjecture”, we need to show that hyperplanes arising from almost all dices A divides integer points in the polytope by about half. As a first step, we would need to prove that the volume of the polytope is divided into two almost equal parts.

May 29, 2017 at 11:02 pm

About ties conjecture. As above, each dice V can be represented as a point in n-2-dimentional space. Let be the number of such dices of size n.

Each dice V is in tie with some dice A if this point belongs to hyperplane with equation . So, we have points and hyperplanes, and would like to bound the total number of incidences between them.

There is a multidimentional Szemerédi–Trotter theorem https://en.wikipedia.org/wiki/Szemer%C3%A9di%E2%80%93Trotter_theorem , which states that the number of incidences between k points and m hyperplanes in dimension d is bounded by . But this bound is useless for , which is our case. With $m=k$, even if each hyperplane contains each point, the number of incidences is , much less than …

But there may be other versions of the multidimentional Szemerédi–Trotter theorem, more suitable to our parameter range…

May 29, 2017 at 11:22 pm

But then we need to eliminate the case when a significant fraction of points lies in some n-4-dimensional subspace, shared by a significant fraction of hyperplanes… Yes, this is also not easy.

May 29, 2017 at 7:02 pm |

By chasing references from the paper Bogdan found, I came across this paper by D. R. McDonald, which has a sufficiently friendly account of how the “Bernoulli decomposition” approach to proving LCLTs works that I think I get the basic idea, which is a rather nice and natural one.

In very broad terms, the idea is that we know by the CLT what the approximate probability is that the sum of copies of lies in a given region of the plane. (Of course, we would need a suitably explicit form of the Berry-Esseen theorem to get this.) So to prove that the probability at an individual point at is what it should be, it is sufficient to prove that within a small region the probabilities are roughly constant.

How do we do that? Well in the 1D case it appears to go something like this. Let be a probability distribution on . One defines a parameter . If that parameter is not too small, it implies that there is a kind of smearing effect that causes the sum of many copies to be locally constant.

What I’m about to say next isn’t from the paper but I think it’s probably roughly equivalent to it, or else wrong. Anyhow, we have the option, if the parameter is large, of decomposing the random walk as follows. First, let’s focus just on two neighbouring points and with probabilities and . We could achieve the same probabilities by the following process. We define a new probability distribution that’s the same as except that and . Then if we happen to pick , we decide to add 1 to it with probability . Note that if the ratio of and isn’t huge or tiny, then is bounded away from 1. Also, the contribution to from pairs where one of and is at most times the other is at most , so if the parameter is not too small then we get plenty of pairs where and are of reasonably comparable size.

So now let us define a new random variable with probabilities given by a function , where for many disjoint pairs we set and , and for the other values we set .

Now we can think of each step of the random walk as being “decomposed” as follows. We begin by choosing a random point using the probability distribution . If we happen to pick from one of the pairs then we take a step to the right with probability .

After we have done this times, we have taken a sum of independent copies of , and have added to that a sum of a certain number of independent Bernoulli random variables. Now when I say “independent” I mean that they are independent of each other, but the number of variables we take, and what they are, depend on which s we happened to land on when we were choosing -random values. But with high probability we will choose a representative sample of these Bernoulli random variables and will add together independent copies of them.

So the outcome should be that we get a broad Gaussian behaviour out of the sum of the variables, which is then sort of “convolved” (it isn’t exactly a convolution, but might resemble one enough for what I’m writing not to be total nonsense) with a flat enough Gaussian obtained from adding together independent Bernoullis for the broad behaviour to tell us what the individual probabilities are.

Assuming that something like that is correct, it should help us think about what an appropriate 2D version might be like. Here’s one possibility. If , then , so . This we would expect to happen a positive proportion of the time, and each time it does, it tells us that there is a pair of points such that both of them occur with probability .

If exactly once, then , which gives us a pair of points that both occur with probability . Again, I would expect this to happen a positive proportion of the time.

So it looks as though we should be able to decompose our random walk into a part that gives us a global Gaussian, and two little Bernoulli parts in independent directions that give us the flatness.

So I no longer feel so sure that characteristic functions is the way to go.

May 29, 2017 at 11:11 pm |

Ah, but here is a thought that makes me swing back in favour of characteristic functions (though not necessarily permanently). It’s that the type of condition introduced by McDonald is also very useful for that approach too. I’ll discuss the 1D case but it generalizes straightforwardly to the 2D case too.

Suppose we know that with positive probability if we choose a random according to the distribution , we have that and are comparable. I claim that this implies that is not close to 1 except when is close to 0. The proof is easy. The only way that can be close to 1 is if -almost every is close to 1, which is equivalent to saying that -almost every is close to an integer. But if is not close to zero (mod 1) and is close to an integer, then is close to , which is not close to zero (all of this being mod 1. And since with non-negligible -probability is such that has comparable probability to , this implies that an -random has a non-negligible probability of being close to , and in particular not being close to 1. Therefore, is not approximately equal to 1.

So it looks to me as though if we can prove a McDonald-type condition for the random variable we are interested in (when is typical), then we’ll have a good chance of making the characteristic-functions approach sufficiently quantitative for the purposes of the transitivity question.

May 29, 2017 at 11:13 pm

I should add that I think proving that kind of condition is straightforward, by essentially the same technique that I’ve been using for other statements of this kind: we prove that it holds with very high probability for a fully independent sequence, and then conditioning on the sum of the sequence doesn’t mess things up.

May 30, 2017 at 8:35 am |

Here then is a plan for a possible proof.

1. Get bounds for all the moments , where is the random variable . Here is a fixed die with typical properties and is chosen uniformly at random from .

2. Use these to obtain a Taylor expansion for the characteristic function with a very explicit bound for the error term.

3. Prove that satisfies a McDonald-type condition (see above comments for what this means).

4. Deduce that outside a small ball about the origin (which will be stretched quite a lot more in the direction than the direction, but this is not important) the characteristic function is bounded away from 1.

5. From 2 and 4 deduce that the th power of the characteristic function can be very well approximated by a Gaussian both near and far away from it.

6. Deduce the result (since the Fourier transform of a Gaussian is a Gaussian).

May 30, 2017 at 1:33 pm

I’ll try to say more about this later, but I had a small think about steps 3 and 4, and I now think that the 2D McDonald condition I was talking about isn’t quite strong enough for our purposes here, but that it should still be possible to prove some condition of a similar kind that

wouldbe strong enough. But to be sure about this, I’m going to need to do some proper calculations.May 30, 2017 at 12:22 pm |

Very interesting stuff! Two small typos: in the calculation of you type $\sum_i(n – a_j + 1)$. Shouldn’t be there? And 10 lines below you claim that B beats A precisely when is positive, while I believe that the variable should be there. Good luck trying to finish the proof, while I keep struggling to understand anything you wrote ;)!

Thanks for pointing those out — fixed now.