Intransitive dice VI: sketch proof of the main conjecture for the balanced-sequences model

I have now completed a draft of a write-up of a proof of the following statement. Recall that a random n-sided die (in the balanced-sequences model) is a sequence of length n of integers between 1 and n that add up to n(n+1)/2, chosen uniformly from all such sequences. A die A=(a_1,\dots,a_n) beats a die B=(b_1,\dots,b_n) if the number of pairs (i,j) such that a_i>b_j exceeds the number of pairs (i,j) such that a_i<b_j. If the two numbers are the same, we say that A ties with B.

Theorem. Let A,B and C be independent random n-sided dice. Then the probability that A beats C given that A beats B and B beats C is \frac 12+o(1).

In this post I want to give a fairly detailed sketch of the proof, which I hope will make it clearer what is going on in the write-up.

The first step is to show that the theorem is equivalent to the following statement.

Theorem. Let A be a random n-sided die. Then with probability 1-o(1), the proportion of n-sided dice that A beats is \frac 12+o(1).

We had two proofs of this statement in earlier posts and comments on this blog. In the write-up I have used a very nice short proof supplied by Luke Pebody. There is no need to repeat it here, since there isn’t much to say that will make it any easier to understand than it already is. I will, however, mention once again an example that illustrates quite well what this statement does and doesn’t say. The example is of a tournament (that is, a complete graph where every edge is given a direction) in which every vertex beats half the other vertices (meaning that half the edges at each vertex go in and half go out) but the tournament does not look at all random. One just takes an odd integer n and puts arrows out from x to x+y mod n for every y\in\{1,2,\dots,(n-1)/2\}, and arrows into x from x+y mod n for every y\in\{(n+1)/2,\dots,n-1\}. It is not hard to check that the probability that there is an arrow from x to z given that there are arrows from x to y and from y to z is approximately 1/2, and this turns out to be a general phenomenon.
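To make the rotational example concrete, here is a small sketch (my own illustration, not part of the write-up) that builds the tournament and computes the conditional probability directly:

```python
n = 101                       # any odd n
half = (n - 1) // 2

def beats(x, z):
    # arrow from x to z iff z = x + y (mod n) for some y in {1, ..., (n-1)/2}
    return x != z and (z - x) % n <= half

# conditional probability that x beats z, given x beats y and y beats z
num = den = 0
for x in range(n):
    for y in range(n):
        if beats(x, y):
            for z in range(n):
                if beats(y, z):
                    den += 1
                    num += beats(x, z)
print(num / den)  # exactly 0.49 for n = 101
```

A short calculation shows the conditional probability is exactly (n-3)/(2(n-1)), which tends to 1/2 even though every vertex beats exactly (n-1)/2 others.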

So how do we prove that almost all n-sided dice beat approximately half the other n-sided dice?

The first step is to recast the problem as one about sums of independent random variables. Let [n] stand for \{1,2,\dots,n\} as usual. Given a sequence A=(a_1,\dots,a_n)\in[n]^n we define a function f_A:[n]\to[n] by setting f_A(j) to be the number of i such that a_i<j plus half the number of i such that a_i=j. We also define g_A(j) to be f_A(j)-(j-1/2). It is not hard to verify that A beats B if \sum_jg_A(b_j)<0, ties with B if \sum_jg_A(b_j)=0, and loses to B if \sum_jg_A(b_j)>0.
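This correspondence is easy to check by computer. The sketch below (the names f_A and g_A follow the post; the implementation details and helper names are mine) compares the direct pair count with the sign of \sum_j g_A(b_j), using random permutations as a convenient source of balanced dice B:

```python
from fractions import Fraction
import random

def g_A(A, j):
    # f_A(j) = #{i : a_i < j} + (1/2)#{i : a_i = j};  g_A(j) = f_A(j) - (j - 1/2)
    f = sum(a < j for a in A) + Fraction(sum(a == j for a in A), 2)
    return f - (j - Fraction(1, 2))

def beats_direct(A, B):
    over = sum(a > b for a in A for b in B)
    under = sum(a < b for a in A for b in B)
    return (over > under) - (over < under)   # +1 = beats, 0 = tie, -1 = loses

def beats_via_g(A, B):
    s = sum(g_A(A, b) for b in B)
    return (s < 0) - (s > 0)                 # A beats B iff the sum is negative

rng = random.Random(0)
n = 10
mismatches = 0
for _ in range(100):
    A = [rng.randint(1, n) for _ in range(n)]   # A need not be balanced
    B = rng.sample(range(1, n + 1), n)          # a permutation, hence balanced
    mismatches += beats_direct(A, B) != beats_via_g(A, B)
print(mismatches)  # 0
```

Note that the identity uses the balancedness of B (that is what turns \sum_j(b_j-1/2) into n^2/2); A can be arbitrary.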

So our question now becomes the following. Suppose we choose a random sequence (b_1,\dots,b_n) with the property that \sum_jb_j=n(n+1)/2. What is the probability that \sum_jg_A(b_j)>0? (Of course, the answer depends on A, and most of the work of the proof comes in showing that a “typical” A has properties that ensure that the probability is about 1/2.)

It is convenient to rephrase the problem slightly, replacing b_j by b_j-(n+1)/2. We can then ask it as follows. Suppose we choose a sequence (V_1,\dots,V_n) of n elements of the set \{-(n-1)/2,-(n-1)/2+1,\dots,(n-1)/2\}, where the terms of the sequence are independent and uniformly distributed. For each j let U_j=g_A(V_j+(n+1)/2), that is, g_A evaluated at the corresponding uncentred value. What is the probability that \sum_jU_j>0 given that \sum_jV_j=0?

This is a question about the distribution of \sum_j(U_j,V_j), where the (U_j,V_j) are i.i.d. random variables taking values in \mathbb Z^2 (at least if n is odd — a small modification is needed if n is even). Everything we know about probability would lead us to expect that this distribution is approximately Gaussian, and since it has mean (0,0), it ought to be the case that if we sum up the probabilities that \sum_j(U_j,V_j)=(x,0) over positive x, we should get roughly the same as if we sum them up over negative x. Also, it is highly plausible that the probability of getting (0,0) will be a lot smaller than either of these two sums.

So there we have a heuristic argument for why the second theorem, and hence the first, ought to be true.
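The heuristic can be tested crudely by simulation (this is purely illustrative; the die size, seed and trial count are arbitrary choices of mine, and balanced dice are generated by simple rejection sampling):

```python
import random

def random_balanced_die(n, rng):
    # rejection sampling: uniform over [n]^n conditioned on the balanced sum
    target = n * (n + 1) // 2
    while True:
        d = [rng.randint(1, n) for _ in range(n)]
        if sum(d) == target:
            return d

def beats(A, B):
    over = sum(a > b for a in A for b in B)
    under = sum(a < b for a in A for b in B)
    return over > under

rng = random.Random(0)
n, trials = 12, 1000
A = random_balanced_die(n, rng)
wins = sum(beats(A, random_balanced_die(n, rng)) for _ in range(trials))
print(wins / trials)  # typically not far from 1/2 for a typical A
```

Rejection sampling is affordable here because the acceptance probability is of order n^{-3/2}, exactly the estimate used later in the post.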

There are several theorems in the literature that initially seemed as though they should be helpful. And indeed they were helpful, but we were unable to apply them directly, and had instead to develop our own modifications of their proofs.

The obvious theorem to mention is the central limit theorem. But this is not strong enough, for two reasons. The first is that it tells you about the probability that a sum of random variables will lie in some rectangular region of \mathbb R^2 of size comparable to the standard deviation. It will not tell you the probability of belonging to some subset of the y-axis (even for discrete random variables). The second is that the central limit theorem on its own does not give information about the rate of convergence to a Gaussian, whereas here we require one.

The second problem is dealt with for many applications by the Berry-Esseen theorem, but not the first.

The first problem is dealt with for many applications by local central limit theorems, about which Terence Tao has blogged in the past. These tell you not just about the probability of landing in a region, but about the probability of actually equalling some given value, with estimates that are precise enough to give, in many situations, the kind of information that we seek here.

What we did not find, however, was precisely the theorem we were looking for: a statement that would be local and 2-dimensional and would give information about the rate of convergence that was sufficiently strong that we would be able to obtain good enough convergence after only n steps. (I use the word “step” here because we can think of a sum of n independent copies of a 2D random variable as an n-step random walk.) It was not even clear in advance what such a theorem should say, since we did not know what properties we would be able to prove about the random variables (U_i,V_i) when A was “typical”. That is, we knew that not every A worked, so the structure of the proof (probably) had to be as follows.

1. Prove that A has certain properties with probability 1-o(1).

2. Using these properties, deduce that the sum \sum_{i=1}^n(U_i,V_i) converges very well after n steps to a Gaussian.

3. Conclude that the heuristic argument is indeed correct.

The key properties that A needed to have were the following two. First, there needed to be a bound on the higher moments of U. This we achieved in a slightly wasteful way — but the cost was a log factor that we could afford — by arguing that with high probability no value of g_A(j) has magnitude greater than 6\sqrt{n\log n}. To prove this the steps were as follows.

  1. Let A be a random element of [n]^n. Then the probability that there exists j with |g_A(j)|\geq 6\sqrt{n\log n} is at most n^{-k} (for some k such as 10).
  2. The probability that \sum_ia_i=n(n+1)/2 is at least cn^{-3/2} for some absolute constant c>0.
  3. It follows that if A is a random n-sided die, then with probability 1-o(1) we have |g_A(j)|\leq 6\sqrt{n\log n} for every j.

The proofs of the first two statements are standard probabilistic estimates about sums of independent random variables.
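A quick simulation (parameters chosen by me purely for speed, not taken from the write-up) suggests how much room the bound in step 1 leaves at moderate n:

```python
import math
import random

def max_abs_g(A, n):
    # max_j |g_A(j)|, computed in O(n) via a histogram of the faces
    hist = [0] * (n + 2)
    for a in A:
        hist[a] += 1
    acc, worst = 0, 0.0
    for j in range(1, n + 1):
        f = acc + hist[j] / 2            # f_A(j) = #{a_i < j} + (1/2)#{a_i = j}
        worst = max(worst, abs(f - (j - 0.5)))
        acc += hist[j]
    return worst

rng = random.Random(1)
n = 1000
bound = 6 * math.sqrt(n * math.log(n))
exceedances = sum(
    max_abs_g([rng.randint(1, n) for _ in range(n)], n) > bound
    for _ in range(200)
)
print(exceedances)  # 0: the typical maximum is of order sqrt(n), far below the bound
```

Indeed the typical size of \max_j|g_A(j)| is of order \sqrt n (this is essentially the Kolmogorov–Smirnov statistic), which is why the 6\sqrt{n\log n} bound is described as slightly wasteful.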

The second property that A needed to have is more difficult to obtain. There is a standard Fourier-analytic approach to proving central limit theorems, and in order to get good convergence it turns out that what one wants is for a certain Fourier transform to be sufficiently well bounded away from 1. More precisely, we define the characteristic function of the random variable (U,V) to be

\hat f(\alpha,\beta)=\mathbb E e(\alpha U+\beta V)=\sum_{x,y}f(x,y)e(\alpha x+\beta y),

where e(x) is shorthand for \exp(2\pi ix), f(x,y)=\mathbb P[(U,V)=(x,y)], and \alpha and \beta range over \mathbb T=\mathbb R/\mathbb Z.
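For a given die the characteristic function can be evaluated directly from the definition; here is a minimal sketch (function name and test values are mine):

```python
import cmath
import math

def char_fn(A, n, alpha, beta):
    # \hat f(alpha, beta) = E e(alpha*U + beta*V), where (U, V) = (g_A(J), J - (n+1)/2)
    # with J uniform in [n], and e(x) = exp(2*pi*i*x)
    total = 0j
    for j in range(1, n + 1):
        f = sum(a < j for a in A) + 0.5 * sum(a == j for a in A)
        g = f - (j - 0.5)
        total += cmath.exp(2j * math.pi * (alpha * g + beta * (j - (n + 1) / 2)))
    return total / n

A = [1, 2, 3, 4]                      # the standard 4-sided die
print(abs(char_fn(A, 4, 0.0, 0.0)))  # 1.0 at the origin; |f-hat| <= 1 everywhere
```

The whole game described next is to show that away from the origin the modulus drops appreciably below 1.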

I’ll come later to why it is good for \hat f(\alpha,\beta) not to be too close to 1. But for now I want to concentrate on how one proves a statement like this, since that is perhaps the least standard part of the argument.

To get an idea, let us first think what it would take for |\hat f(\alpha,\beta)| to be very close to 1. This condition basically tells us that \alpha U+\beta V is highly concentrated mod 1: indeed, if \alpha U+\beta V is highly concentrated, then e(\alpha U+\beta V) takes approximately the same value almost all the time, so the average is roughly equal to that value, which has modulus 1; conversely, if \alpha U+\beta V is not highly concentrated mod 1, then there is plenty of cancellation between the different values of e(\alpha U+\beta V) and the result is that the average has modulus appreciably smaller than 1.

So the task is to prove that the values of \alpha U+\beta V are reasonably well spread about mod 1. Note that this is saying that the values of \alpha g_A(j)+\beta(j-(n+1)/2), as j ranges over [n], are reasonably spread about mod 1.

The way we prove this is roughly as follows. Let \alpha>0, let m be of order of magnitude \alpha^{-2}, and consider the values of g_A at the four points j, j+m, j+2m and j+3m. Then a typical order of magnitude of g_A(j)-g_A(j+m) is around \sqrt m, and one can prove without too much trouble (here the Berry-Esseen theorem was helpful to keep the proof short) that the probability that

|g_A(j)-g_A(j+m)-g_A(j+2m)+g_A(j+3m)|\geq c\sqrt m

is at least c, for some positive absolute constant c. It follows by Markov’s inequality that with positive probability one has the above inequality for many values of j.

That’s not quite good enough, since we want a probability that’s very close to 1. This we obtain by chopping up [n] into intervals of length 4m and applying the above argument in each interval. (While writing this I’m coming to think that I could just as easily have gone for progressions of length 3, not that it matters much.) Then in each interval there is a reasonable probability of getting the above inequality to hold many times, from which one can prove that with very high probability it holds many times.

But since m is of order \alpha^{-2}, \alpha\sqrt m is of order 1, which gives that the values e(\alpha g_A(j)), e(\alpha g_A(j+m)), e(\alpha g_A(j+2m)), e(\alpha g_A(j+3m)) are far from constant whenever the above inequality holds. So by averaging we end up with a good upper bound for |\hat f(\alpha,\beta)|.

The alert reader will have noticed that if \alpha\ll n^{-1/2}, then the above argument doesn’t work, because we can’t choose m to be bigger than n. In that case, however, we just do the best we can: we choose m to be of order n/\log n, the logarithmic factor being there because we need to operate in many different intervals in order to get the probability to be high. We will get many quadruples where

\alpha|g_A(j)-g_A(j+m)-g_A(j+2m)+g_A(j+3m)|\geq c\alpha\sqrt m=c'\alpha\sqrt{n/\log n},

and this translates into a lower bound for 1-|\hat f(\alpha,\beta)| of order \alpha^2n/\log n, basically because 1-\cos\theta has order \theta^2 for small \theta. This is a good bound for us as long as we can use it to prove that |\hat f(\alpha,\beta)|^n is bounded above by a large negative power of n. For that we need \alpha^2n/\log n to be at least C\log n/n (since (1-C\log n/n)^n is about \exp(-C\log n)=n^{-C}), so we are in good shape provided that \alpha\gg\log n/n.
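The approximation (1-C\log n/n)^n\approx n^{-C} used here is easy to check numerically (the values of n and C below are arbitrary choices of mine):

```python
import math

n, C = 10 ** 6, 5.0
lhs = (1 - C * math.log(n) / n) ** n   # (1 - C log n / n)^n
rhs = n ** (-C)                        # n^{-C}
print(lhs / rhs)  # close to 1; the relative error is of order (C log n)^2 / n
```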

The alert reader will also have noticed that the probabilities for different intervals are not independent: for example, if some f_A(j) is equal to n, then beyond that g_A(j) depends linearly on j. However, except when j is very large, this is extremely unlikely, and it is basically the only thing that can go wrong. To make this rigorous we formulated a concentration inequality that states, roughly speaking, that if you have a bunch of k events, and almost always (that is, always, unless some very unlikely event occurs) the probability that the ith event holds given that all the previous events hold is at least c, then the probability that fewer than ck/2 of the events hold is exponentially small in k. The proof of the concentration inequality is a standard exponential-moment argument, with a small extra step to show that the low-probability events don’t mess things up too much.

Incidentally, the idea of splitting up the interval in this way came from an answer by Serguei Popov to a Mathoverflow question I asked, when I got slightly stuck trying to prove a lower bound for the second moment of U. I eventually didn’t use that bound, but the interval-splitting idea helped for the bound for the Fourier coefficient as well.

So in this way we prove that |\hat f(\alpha,\beta)|^n is very small if |\alpha|\gg\log n/n. A simpler argument of a similar flavour shows that |\hat f(\alpha,\beta)|^n is also very small if |\alpha| is smaller than this and |\beta|\gg n^{-3/2}.

Now let us return to the question of why we might like |\hat f(\alpha,\beta)|^n to be small. The answer lies in the inversion and convolution formulae of Fourier analysis. The convolution formula tells us that the characteristic function of the sum of the (U_i,V_i) (which are independent and each have characteristic function \hat f) is (\hat f)^n. And then the inversion formula tells us that

\mathbb P[(\sum_iU_i,\sum_iV_i)=(x,y)]=\int_{(\alpha,\beta)\in\mathbb T^2}\hat f(\alpha,\beta)^ne(-\alpha x-\beta y)\,d\alpha\,d\beta.

What we have proved can be used to show that the contribution to the integral on the right-hand side from those pairs (\alpha,\beta) that lie outside a small rectangle (of width n^{-1} in the \alpha direction and n^{-3/2} in the \beta direction, up to log factors) is negligible.

All the above is true provided the random n-sided die A satisfies two properties (the bound on \|U\|_\infty and the bound on |\hat f(\alpha,\beta)|), which it does with probability 1-o(1).

We now take a die A with these properties and turn our attention to what happens inside this box. First, it is a standard fact about characteristic functions that their derivatives tell us about moments. Indeed,

\frac{\partial^{r+s}}{\partial\alpha^r\,\partial\beta^s}\mathbb E e(\alpha U+\beta V)=(2\pi i)^{r+s}\mathbb E U^rV^s e(\alpha U+\beta V),

and when \alpha=\beta=0 this is \mathbb E U^rV^s. It therefore follows from the two-dimensional version of Taylor’s theorem that

\hat f(\alpha,\beta)=1-2\pi^2(\alpha^2\mathbb EU^2+2\alpha\beta\mathbb EUV+\beta^2\mathbb EV^2)

plus a remainder term R(\alpha,\beta) that can be bounded above by a constant times (|\alpha|\|U\|_\infty+|\beta|\|V\|_\infty)^3.

Writing Q(\alpha,\beta) for 2\pi^2(\alpha^2\mathbb EU^2+2\alpha\beta\mathbb EUV+\beta^2\mathbb EV^2), we have that Q is a positive semidefinite quadratic form in \alpha and \beta. (In fact, it turns out to be positive definite.) Provided R(\alpha,\beta) is small enough, replacing it by zero does not have much effect on \hat f(\alpha,\beta)^n, and provided nQ(\alpha,\beta)^2 is small enough, (1-Q(\alpha,\beta))^n is well approximated by \exp(-nQ(\alpha,\beta)).
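The quadratic approximation to \hat f near the origin can be checked numerically; here is a sketch for a small balanced die of my own choosing (the die, the evaluation point and the tolerance are all arbitrary):

```python
import cmath
import math

n = 6
A = [1, 1, 4, 4, 5, 6]   # balanced: the entries sum to n(n+1)/2 = 21

def g(j):
    f = sum(a < j for a in A) + 0.5 * sum(a == j for a in A)
    return f - (j - 0.5)

U = [g(j) for j in range(1, n + 1)]              # EU = 0 because A is balanced
V = [j - (n + 1) / 2 for j in range(1, n + 1)]   # EV = 0
EU2 = sum(u * u for u in U) / n
EUV = sum(u * v for u, v in zip(U, V)) / n
EV2 = sum(v * v for v in V) / n

def fhat(al, be):
    return sum(cmath.exp(2j * math.pi * (al * u + be * v))
               for u, v in zip(U, V)) / n

al = be = 1e-3
Q = 2 * math.pi ** 2 * (al * al * EU2 + 2 * al * be * EUV + be * be * EV2)
err = abs(fhat(al, be) - (1 - Q))
print(err)  # tiny: the remainder is third order in (alpha, beta)
```

The first-order term vanishes precisely because \mathbb EU=\mathbb EV=0, which is where balancedness of A enters.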

It turns out, crucially, that the approximations just described are valid in a box that is much bigger than the box inside which \hat f(\alpha,\beta) has a chance of not being small. That implies that the Gaussian decays quickly (and is why we know that Q is positive definite).

There is a bit of back-of-envelope calculation needed to check this, but the upshot is that the probability that (\sum_iU_i,\sum_iV_i)=(x,y) is very well approximated, at least when x and y aren’t too big, by a formula of the form

G(x,y)=\int\exp(-nQ(\alpha,\beta))e(-\alpha x-\beta y)\,d\alpha\,d\beta.

But this is the formula for the Fourier transform of a Gaussian (at least if we let (\alpha,\beta) range over \mathbb R^2, which makes very little difference to the integral because the Gaussian decays so quickly), so it is the restriction to \mathbb Z^2 of a Gaussian, just as we wanted.
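For the record, here is the standard closed form being invoked (a routine Gaussian integral, written out for convenience; it is not quoted from the write-up). For any positive definite quadratic form Q(\alpha,\beta)=2\pi^2(a\alpha^2+2b\alpha\beta+c\beta^2),

\int_{\mathbb R^2}\exp(-Q(\alpha,\beta))e(-\alpha x-\beta y)\,d\alpha\,d\beta=\frac{1}{2\pi\sqrt{ac-b^2}}\exp\left(-\frac{cx^2-2bxy+ay^2}{2(ac-b^2)}\right),

which is the density at (x,y) of a bivariate Gaussian with covariance matrix \begin{pmatrix}a&b\\b&c\end{pmatrix}. After n steps the relevant coefficients (a,b,c) are n times the moments (\mathbb EU^2,\mathbb EUV,\mathbb EV^2), so the covariance grows linearly in the number of steps, as it should.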

When we sum over infinitely many values of x and y, uniform estimates are not good enough, but we can deal with that very directly by using simple measure concentration estimates to prove that the probability that (\sum_iU_i,\sum_iV_i)=(x,y) is very small outside a not too large box.

That completes the sketch of the main ideas that go into showing that the heuristic argument is indeed correct.

Any comments about the current draft would be very welcome, and if anyone feels like working on it directly rather than through me, that is certainly a possibility — just let me know. I will try to post soon on the following questions, since it would be very nice to be able to add answers to them.

1. Is the more general quasirandomness conjecture false, as the experimental evidence suggests? (It is equivalent to the statement that if A and B are two random n-sided dice, then with probability 1-o(1), the four possibilities for whether another die beats A and whether it beats B each have probability \frac 14+o(1).)

2. What happens in the multiset model? Can the above method of proof be adapted to this case?

3. The experimental evidence suggests that transitivity almost always occurs if we pick purely random sequences from [n]^n. Can we prove this rigorously? (I think I basically have a proof of this, by showing that whether or not A beats B almost always depends on whether A has a bigger sum than B. I’ll try to find time reasonably soon to add this to the draft.)

Of course, other suggestions for follow-up questions will be very welcome, as will ideas about the first two questions above.


12 Responses to “Intransitive dice VI: sketch proof of the main conjecture for the balanced-sequences model”

  1. P. Peng Says:

    It occurs to me that we could follow up (in addition to #3) by looking at a different small change that breaks the intransitivity. In particular, the “improper dice” model in the paper, which allows values > n but still constrains the average value. The authors suggested that this improper dice model leads to a more transitive “beats” relation.

    To better fit the improper dice model with this proof technique, it can be viewed as looking at [n]^n sequences, and conditioning on the sum being m(m+1)/2 for some m<n. Since this is just conditioning on a different V value, maybe this is just a small step from the current results.

    Having one model where the average-value constraint isn’t sufficient, along with one showing that removing the average-value constraint breaks things (if the proof for #3 works out), it feels like the constraint is necessary but must be “balanced”. Another nugget pointing this way is that the multiset model also appears to work, and the constraints appear “balanced” here as well.

    Notice in the unconstrained [n]^n sequence, the "complementary die" involution often has a different average value. And notice for the "improper dice" model this involution does not exist (the complementary die may not even be in the set). I wonder if these two conditions together are necessary.

    Are there any models that both have a constrained average and the involution exists, yet the dice are transitive? I think someone mentioned there is a “continuous” model of the dice which is transitive, which would be an example. Has this been shown? And are there any finite models of dice that also display this?

  2. Thomas Budzinski Says:

    Here is an idea to show the tournament is not quasirandom. For two dice A and B, let d(A,B)=\sum_j (g_A(j)-g_B(j))^2. If A and B are independent typical dice, this has order of magnitude n^2. Assume that A and B are very close to each other but not too close to the “trivial” die D_0=(1,2,\dots,n), in the sense that \sum_j g_A(j)^2, \sum_j g_B(j)^2 \geq \delta n^2 but d(A,B) \leq \epsilon n^2 with \epsilon small compared to \delta. If C=(c_1, \dots, c_n) is uniform in [n]^n, then (U,V)=\left( \sum_i (g_A(c_i)-g_B(c_i)), \sum_i c_i-\frac{n+1}{2} \right) is approximately Gaussian with \mathbb E[U^2] \leq \epsilon n^2, so we should have \mathbb E[U^2 | V=0] \leq \epsilon n^2. Hence, if C is conditioned to be balanced, \sum_i (g_A(c_i)-g_B(c_i)) \lesssim \sqrt{\epsilon} n with high probability, while \sum_i g_A(c_i) \approx \sqrt{\delta} n, which is much larger. Therefore, with high probability, \sum_i g_A(c_i) and \sum_i g_B(c_i) have the same sign, so the probability that C beats both A and B is close to \frac{1}{2} rather than the \frac{1}{4}+o(1) that quasirandomness would require.

    The harder part would be to show that if A and B are uniform balanced dice, they are close to each other (and not too close to D_0) with macroscopic probability. It seems more or less equivalent to proving that the sequence of processes \left( \frac{1}{\sqrt{n}} g_A (\lceil nt \rceil) \right)_{t \in [0,1]} is tight and does not converge to 0 (it should actually converge to some Brownian motion-related process; the most natural candidate is a Brownian bridge conditioned to have integral 0 on [0,1]).

    • gowers Says:

      Sorry to be so slow to respond, but this seems like a very promising approach. Also, I feel reasonably optimistic that what you describe as the harder part may in fact be not too hard. Suppose, for instance, that we split \{1,2,\dots,n\} into k equal-sized intervals I_1,\dots,I_k for some suitably chosen absolute constant k. Then setting m=n/k, I think it is the case that, with macroscopic probability, in each of the first k/2 intervals there are approximately m+\sqrt m points from A and in each of the last k/2 intervals there are approximately m-\sqrt m points. (Here “approximately” means something like “to within \sqrt m/10”.)

    • gowers Says:

      Maybe it would work out simpler to follow the approach in this post. For this comment, I’ll assume enough familiarity with that post to know what I mean by the random variables X, Y and Z.

      The condition for the strong conjecture set out in that post is that with probability 1-o(1) we should have

      \langle X,Z\rangle\langle Y,Z\rangle = \langle X,Y\rangle\langle Z, Z\rangle.

      But it seems to me that that could easily be false: with macroscopic probability we should have \langle X,Z\rangle small, and therefore with macroscopic probability both \langle X,Z\rangle and \langle Y,Z\rangle are small. But with macroscopic probability that should not imply that \langle X,Y\rangle is small.

      Actually, it looks as though proving these assertions will be of roughly equivalent difficulty to proving the ones you wanted: perhaps we just want a general theorem that says that certain broad shapes occur with macroscopic probability.

  3. Gil Kalai Says:

    This is very nice and I really like the sequence model. Now that we know that balancedness implies (asymptotic) random outcomes for three dice but not for 4 dice we can wonder what additional condition may imply randomness for 4 dice (which if I understood correctly implies pseudorandomness for any number of dice). A natural guess would be that the dice are 1) balanced 2) have the same variance. In other words if the sum of entries of the dice is precisely the expectation (\frac{1}{2}\sum_{k=1}^n k) and the sum of squares of entries is precisely the expectation (\frac{1}{2}\sum_{k=1}^nk^2).

    We can also compare (balanced and non balanced) dice based on other Boolean functions.

    • P. Peng Says:

      Assuming by “balancedness” you mean constraining the expected value of a dice roll to be the same for all dice, then balancedness does not imply random outcomes for three dice. For example, if we use the sequence model but constrain on a sum other than n(n+1)/2, transitivity can come back. Or at least that is what is suggested by the “improper” dice in the original paper.

      So to get random outcomes for three dice, maybe there is some combination of constrained expectation + “condition X” that would be sufficient. I’m not sure what condition X is, but it still feels to me like the “complementary die” involution plays an important role. I’ve toyed with some ideas but haven’t managed to come up with anything convincing. Such a result would also apply directly to the multiset case (which also meets those conditions).

    • Gil Kalai Says:

      I meant that the value is n(n+1)/2. (Sorry, not the sum I wrote but the verbal description.) The additional condition is that the sum of squares should also be its expected value over all dice.

    • P. Peng Says:

      Hmm… maybe this should be obvious, but I don’t understand this intuitively: For some reason a constraint on the sum of squares is compatible with the existence of the “complementary die” involution.

      If we denote the values of the die faces by v_i, then the constraint on the sum is \sum_i v_i = n(n+1)/2, and the “complementary die” automatically has the same sum of squares:

      \sum_i (n+1-v_i)^2 = \sum_i [(n+1)^2 - 2(n+1)v_i + v_i^2] = n(n+1)^2 - 2(n+1)\cdot n(n+1)/2 + \sum_i v_i^2 = \sum_i v_i^2.

      I’m not sure of a good way to generate random dice with the usual average constraint along with the sum of squares constrained to that of the standard die, but for small n it isn’t too hard to just calculate all of them. With n=14, there are 3946 proper dice with the additional sum of squares constraint. Picking a random die, it beats roughly as many dice as it loses to. But if we pick two at random and check out how it separates the remaining dice into four quadrants, it does not appear to split the space evenly. There still appears to be a lot of correlation.

    • gowers Says:

      I think there may not be any nice condition that guarantees quasirandomness, though I would be happy to be wrong about this.

      The reason is connected with the ideas in the comment thread started by Thomas Budzinski just above. If those ideas are roughly correct, then what they tell us is that quasirandomness fails to occur because there is a non-negligible probability that two dice are “close” in a suitable sense. In particular, I currently believe that a proof of non-quasirandomness along the following lines should work, though clearly there will be details to be sorted out.

      1. Choose some fixed positive integer k such as 10.

      2. Prove that with probability bounded away from zero, the values of g_A(jn/k) are approximately \sqrt{n} when j\leq k/2 and -\sqrt{n} when j>k/2.

      3. Prove that if A and B both satisfy the above condition, then there is significant correlation between the events “A beats C” and “B beats C“.

      Assuming a proof like that can be made to work, I find it very hard to see what extra condition could hope to stop it working. The proof would be telling us that the set of dice is in some sense “low-dimensional”, and that seems hard to cure by passing to a naturally defined subset.

  4. Intransitive dice VII — aiming for further results | Gowers's Weblog Says:

    […] basic idea comes from a comment of Thomas Budzinski. It is to base a proof on the following […]

  5. Yuval Peres Says:

    The book by Lawler and Limic on random walks has a quite precise multidimensional local CLT with error bounds.

    • gowers Says:

      We looked at that book, and it is not impossible that it contains a result we can use. However, it wasn’t clear that there was a result we could quote directly. The problem is that (if my understanding is correct) there is a constant involved that depends on the distribution. And the dependence is necessary, since one could have a distribution that is almost entirely, but not quite, concentrated in a sublattice. What we need is an estimate after n steps of a random walk, where the random walk itself depends on n. So in the end we had to prove explicit bounds on the characteristic function outside a certain small region and go through what I imagine is essentially the standard characteristic-functions approach to the LCLT, making sure that the final bound obtained is good enough.

      However, I don’t rule out that there is in the literature, and perhaps even in the Lawler-Limic book, a result we could quote that would save us from having to prove this small modification of existing results. It would be nice to be able to do that.
