I hope, but am not yet sure, that this post is a counterexample to Betteridge’s law of headlines. To back up that hope, let me sketch an argument that has arisen from the discussion so far, which appears to get us close to showing that if $A$, $B$ and $C$ are three $n$-sided dice chosen independently at random from the sequences model (I will recap some of these definitions in a moment), then the probability that $A$ beats $C$ given that $A$ beats $B$ and $B$ beats $C$ is approximately 1/2. Loosely speaking, the information that $A$ beats $B$ and $B$ beats $C$ tells you nothing about how likely $A$ is to beat $C$. Having given the argument, I will try to isolate a statement that looks plausible, not impossible to prove, and sufficient to finish off the argument.
Basic definitions
An $n$-sided die in the sequence model is a sequence $(a_1,\dots,a_n)$ of elements of $\{1,2,\dots,n\}$ such that $\sum_i a_i = n(n+1)/2$, or equivalently such that the average value of the $a_i$ is $(n+1)/2$, which is of course the average value of a random element of $\{1,2,\dots,n\}$. A random $n$-sided die in this model is simply an $n$-sided die chosen uniformly at random from the set of all such dice.
Given $n$-sided dice $A=(a_1,\dots,a_n)$ and $B=(b_1,\dots,b_n)$, we say that $A$ beats $B$ if
$$|\{(i,j):a_i>b_j\}|>|\{(i,j):b_j>a_i\}|.$$
If the two sets above have equal size, then we say that $A$ ties with $B$.
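To make the model concrete, here is a minimal Python sketch (my own illustration, not part of the original argument): it samples a random $n$-sided die by rejection, which works because a uniform sequence hits the required sum with probability that decays only polynomially in $n$, and it implements the “beats” relation by brute force.

```python
import random
from itertools import product

def random_die(n, rng):
    """Sample an n-sided die in the sequence model: a uniform element of
    {1,...,n}^n conditioned on having sum n(n+1)/2 (rejection sampling)."""
    target = n * (n + 1) // 2
    while True:
        die = [rng.randint(1, n) for _ in range(n)]
        if sum(die) == target:
            return die

def compare(A, B):
    """Return +1 if A beats B, -1 if B beats A, and 0 if they tie."""
    a_wins = sum(1 for a, b in product(A, B) if a > b)
    b_wins = sum(1 for a, b in product(A, B) if b > a)
    return (a_wins > b_wins) - (a_wins < b_wins)

rng = random.Random(0)
A, B, C = (random_die(50, rng) for _ in range(3))
print(compare(A, B), compare(B, C), compare(A, C))
```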
A first reduction
When looking at this problem, it is natural to think about the following directed graph: the vertex set is the set of all $n$-sided dice and we put an arrow from $A$ to $B$ if $A$ beats $B$.
We believe (and even believe we can prove) that ties are rare. Assuming that to be the case, then the conjecture above is equivalent to the statement that if $A$, $B$ and $C$ are three vertices chosen independently at random in this graph, then the probability that $ABC$ is a directed cycle is what you expect for a random tournament, namely 1/8.
One can also make a more general conjecture, namely that the entire (almost) tournament is quasirandom in a sense defined by Chung and Graham, which turns out to be equivalent to the statement that for almost all pairs of dice $(A,B)$, the four possible pairs of truth values for the pair of statements “$A$ beats $C$”, “$B$ beats $C$” each occur with probability approximately 1/4. If this is true, then given $k$ random dice $A_1,\dots,A_k$, all the $2^{\binom k2}$ possibilities for which $A_i$ beat which $A_j$ have probability approximately $2^{-\binom k2}$. This would imply, for example, that if $A_1,\dots,A_k$ are independent random $n$-sided dice, then the probability that $A_1$ beats $A_k$ given that $A_i$ beats $A_j$ for all other pairs $(i,j)$ with $i<j$ is still approximately 1/2.
Several of us have done computer experiments to test these conjectures, and it looks as though the first one is true and the second one false. A further reason to be suspicious of the stronger conjecture is that a natural approach to prove it appears to be morally equivalent to a relationship between the correlations of certain random variables that doesn’t seem to have any heuristic justification or to fit with experimental evidence. So although we don’t have a disproof of the stronger conjecture (I think it would be very interesting to find one), it doesn’t seem like a good idea to spend a lot of effort trying to prove it, unless we can somehow explain away the evidence that appears to be stacking up against it.
The first conjecture turns out to be equivalent to a statement that doesn’t mention transitivity. The very quick proof I’ll give here was supplied by Luke Pebody. Suppose we have a tournament with $N$ vertices (that is, a complete graph with each edge directed in one of the two possible directions) and write $d^+(x)$ for the out-degree of a vertex $x$ (that is, the number of $y$ such that there is an arrow from $x$ to $y$) and $d^-(x)$ for the in-degree. Then let us count the number of ordered triples $(x,y,z)$ such that $x$ beats $y$ and $y$ beats $z$. Any directed triangle $xyz$ in the tournament will give rise to three such triples, namely $(x,y,z)$, $(y,z,x)$ and $(z,x,y)$. And any other triangle will give rise to just one: for example, if $x$ beats $y$ and $y$ beats $z$ and $x$ beats $z$ we get just the triple $(x,y,z)$. So the number of ordered triples $(x,y,z)$ such that $x$ beats $y$ and $y$ beats $z$ is $\binom N3$ plus twice the number of directed triangles. Note that $\binom N3$ is approximately $N^3/6$.
But the number of these ordered triples is also $\sum_yd^-(y)d^+(y)$. If almost all in-degrees and almost all out-degrees are roughly $N/2$, then this is approximately $N^3/4$, which means that the number of directed triangles is approximately $N^3/24$. That is, in this case, the probability that three dice form an intransitive triple is approximately 1/4, as we are hoping from the conjecture. If on the other hand several in-degrees fail to be roughly $N/2$, then $\sum_yd^-(y)d^+(y)$ is substantially lower than $N^3/4$ and we get a noticeably smaller proportion of intransitive triples.
Thus, the weaker conjecture is equivalent to the statement that almost every die beats approximately half the other dice.
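Luke Pebody’s double-counting argument is easy to sanity-check numerically. The sketch below (an illustration of the identity on a uniformly random tournament, not on the dice tournament itself, which is far too large to build) verifies that the number of ordered paths $x\to y\to z$ equals $\binom N3$ plus twice the number of directed triangles, and that the proportion of intransitive triples comes out close to 1/4.

```python
import random
from itertools import combinations

def check_triple_count(N, rng=random.Random(1)):
    """On a random tournament with N vertices, check that
    sum_y d^-(y) d^+(y) = C(N,3) + 2 * (number of directed 3-cycles)."""
    beats = [[False] * N for _ in range(N)]
    for x, y in combinations(range(N), 2):
        if rng.random() < 0.5:
            beats[x][y] = True
        else:
            beats[y][x] = True
    out_deg = [sum(row) for row in beats]
    in_deg = [N - 1 - d for d in out_deg]
    paths = sum(i * o for i, o in zip(in_deg, out_deg))
    cyclic = sum(
        1 for x, y, z in combinations(range(N), 3)
        if beats[x][y] == beats[y][z] == beats[z][x]   # the two cyclic orientations
    )
    total_triangles = N * (N - 1) * (N - 2) // 6
    assert paths == total_triangles + 2 * cyclic
    return cyclic / total_triangles                    # close to 1/4 for a random tournament

print(check_triple_count(200))
```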
Why should a “typical” die beat approximately half the other dice?
The answer to this is fairly simple, heuristically at least. Let $A=(a_1,\dots,a_n)$ be an arbitrary die. For $1\le j\le n$ define $g_A(j)$ to be the number of $i$ with $a_i\le j$ and define $h_A(j)$ to be $g_A(j)-j$. Then
$$\sum_jg_A(j)=\sum_j|\{i:a_i\le j\}|=\sum_i(n-a_i+1)=n(n+1)-\sum_ia_i=n(n+1)/2,$$
from which it follows that $\sum_jh_A(j)=0$.
We also have that if $B=(b_1,\dots,b_n)$ is another die, then
$$\sum_jg_A(b_j)=|\{(i,j):a_i\le b_j\}|.$$
If we make the simplifying assumption that $a_i=b_j$ occurs sufficiently infrequently to make no real difference to what is going on (which is not problematic, as a slightly more complicated but still fairly simple function can be used instead of $g_A$ to avoid this problem), then we find that to a reasonable approximation $B$ beats $A$ if and only if $\sum_jh_A(b_j)$ is positive.
So what we would like to prove is that if $b_1,\dots,b_n$ are chosen independently at random from $\{1,2,\dots,n\}$, then
$$\mathbb{P}\Bigl[\,\sum_jh_A(b_j)>0\ \Bigm|\ \sum_jb_j=n(n+1)/2\,\Bigr]\approx 1/2.$$
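Here is a small Python check of this reduction. The tie-correcting function used below, $2\tilde h_A(j)=2\,\#\{i:a_i<j\}+\#\{i:a_i=j\}-2j+1$, is my guess at the kind of “slightly more complicated but still fairly simple function” alluded to above; with that half-counting convention the equivalence “$B$ beats $A$ iff the sum is positive” becomes exact rather than approximate.

```python
import random
from itertools import product

def two_h_tilde(A, n):
    """2*h~_A(j) = 2#{i: a_i < j} + #{i: a_i = j} - 2j + 1 (kept integral);
    a tie-corrected variant of h_A whose values sum to zero over j."""
    return [2 * sum(a < j for a in A) + sum(a == j for a in A) - 2 * j + 1
            for j in range(1, n + 1)]

def random_die(n, rng):
    target = n * (n + 1) // 2
    while True:
        D = [rng.randint(1, n) for _ in range(n)]
        if sum(D) == target:
            return D

n, rng = 30, random.Random(1)
for _ in range(200):
    A, B = random_die(n, rng), random_die(n, rng)
    h2 = two_h_tilde(A, n)
    criterion = sum(h2[b - 1] for b in B) > 0
    direct = (sum(1 for a, b in product(A, B) if b > a)
              > sum(1 for a, b in product(A, B) if a > b))
    assert criterion == direct      # exact agreement, ties included
```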
We are therefore led to consider the random variable $\sum_jh_A(b_j)$, where now $(b_1,\dots,b_n)$ is chosen uniformly at random from $\{1,2,\dots,n\}^n$ without any condition on the sum. To write this in a more transparent way, let $(X,Y)$ be the random variable $(h_A(b),\,b-(n+1)/2)$, where $b$ is chosen uniformly at random from $\{1,2,\dots,n\}$. Then $\bigl(\sum_jh_A(b_j),\,\sum_jb_j-n(n+1)/2\bigr)$ is a sum of $n$ independent copies of $(X,Y)$. What we are interested in is the distribution we obtain when we condition the random variable $\sum_jh_A(b_j)$ on $\sum_jb_j=n(n+1)/2$.
This should mean that we are in an excellent position, since under appropriate conditions, a lot is known about sums of independent random variables, and it looks very much as though those conditions are satisfied by $(X,Y)$, at least when $A$ is “typical”. Indeed, what we would expect, by the central limit theorem, is that the sum of $n$ copies of $(X,Y)$ will approximate a bivariate normal distribution with mean 0 (since both $X$ and $Y$ have mean zero). But a bivariate normal distribution is centrally symmetric, so we expect the distribution of $\sum_jh_A(b_j)$ conditioned on $\sum_jb_j=n(n+1)/2$ to be approximately centrally symmetric, which would imply what we wanted above, since that is equivalent to the statement that, conditionally on the sum, the events $\sum_jh_A(b_j)>0$ and $\sum_jh_A(b_j)<0$ have approximately equal probability.
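The heuristic can at least be probed by simulation. The sketch below (for a small $n$, with the conditioning done by rejection) estimates the conditional probability that $\sum_jh_A(b_j)>0$ given $\sum_jb_j=n(n+1)/2$ for one random die $A$; for typical $A$ the estimate should come out close to 1/2, consistent with the central-symmetry prediction, though of course this proves nothing.

```python
import random

def conditional_estimate(n=20, trials=5000, seed=3):
    """Monte Carlo estimate of P(sum_j h_A(b_j) > 0 | sum_j b_j = n(n+1)/2)
    for a single randomly chosen n-sided die A."""
    rng = random.Random(seed)
    target = n * (n + 1) // 2

    def random_die():
        while True:
            D = [rng.randint(1, n) for _ in range(n)]
            if sum(D) == target:
                return D

    A = random_die()
    h = [sum(a <= j for a in A) - j for j in range(1, n + 1)]   # h_A(j) = g_A(j) - j
    positive = sum(1 for _ in range(trials)
                   if sum(h[b - 1] for b in random_die()) > 0)
    return positive / trials

print(conditional_estimate())   # typically close to 0.5
```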
Why are we not immediately done?
How can we make the above argument rigorous? The central limit theorem on its own is not enough, for two reasons. The first is that it does not give us information about the speed of convergence to a normal distribution, whereas we need a sum of $n$ copies of $(X,Y)$ to be close to normal. The second is that the notion of “close to normal” is not precise enough for our purposes: it will allow us to approximate the probability of an event such as $\sum_jb_j\ge n(n+1)/2$ but not of a “probability zero” event such as $\sum_jb_j=n(n+1)/2$.
The first of these difficulties is not too worrying, since plenty of work has been done on the speed of convergence in the central limit theorem. In particular, there is a famous theorem of Berry and Esseen that is often used when this kind of information is needed.
However, the Berry-Esseen theorem still suffers from the second drawback. To get round that one needs to turn to more precise results still, known as local central limit theorems, often abbreviated to LCLTs. With a local central limit theorem, one can even talk about the probability that a sum of independent copies of a random variable takes a specific value after a specific number of steps. Roughly speaking, it says (in its 2-dimensional version) that if $(X,Y)$ is a random variable of mean zero taking values in $\mathbb{Z}^2$ and if $(X,Y)$ satisfies suitable moment conditions and is not supported in a proper sublattice of $\mathbb{Z}^2$, then writing $(S_n,T_n)$ for a sum of $n$ copies of $(X,Y)$, we have that the probability that $(S_n,T_n)$ takes a particular value differs from the “expected” probability (given by a suitable Gaussian formula) by $O(n^{-2})$. (I’m not 100% sure I’ve got that right: the theorem in question is Theorem 2.1.1 from this book.)
That looks very close to what we want, but it still falls short. The problem is that the implied constant depends on the random variable $(X,Y)$. A simple proof of this is that if $(X,Y)$ is not supported in a sublattice but very nearly is — for example, if the probability that it takes a value outside the sublattice is tiny — then one will have to add together an extremely large number of copies of $(X,Y)$ before the sum ceases to be concentrated in the sublattice.
So the situation we appear to be in is the following. We have more precise information about the random variable $(X,Y)$ than is assumed in the LCLT in the reference above, and we want to use that to obtain an explicit constant in the theorem.
It could be that out there in the literature is exactly the result we need, which would be nice, but it also seems possible that we will have to prove an appropriate version of the LCLT for ourselves. I’d prefer the first, but the second wouldn’t be too disappointing, as the problem is quite appealing and even has something of an additive-combinatorial flavour (since it is about describing an iterated convolution of a subset of $\mathbb{Z}^2$ under appropriate assumptions).
What properties can we establish about the random variable $(X,Y)$?
I said above, with no justification, that we have more precise information about the random variable $(X,Y)$. Let me now try to give the justification.
First of all, we know everything we could possibly want to know about $Y$: it is the uniform distribution on $\{-(n-1)/2,\,-(n-3)/2,\dots,(n-1)/2\}$. (In particular, if $n$ is odd, then it is the uniform distribution on the set of integers in the interval $[-(n-1)/2,(n-1)/2]$.)
How about the distribution of $X$? That question is equivalent to asking about the values taken by $h_A(j)$, and their multiplicities. There is quite a lot one can say about those. For example, I claim that with high probability (if $A$ is a random $n$-sided die) $h_A(j)$ is never bigger than $C\sqrt{n\log n}$ for a suitable absolute constant $C$. That is because if we choose a fully random sequence $(a_1,\dots,a_n)\in\{1,2,\dots,n\}^n$, then the expected number of $i$ such that $a_i\le j$ is $j$, and the probability that this number differs from $j$ by more than $t\sqrt n$ is at most $e^{-ct^2}$, by standard probabilistic estimates, so if we set $t=C\sqrt{\log n}$, then this is at most $n^{-cC^2}$, which we can make a lot smaller than $n^{-3/2}$ by choosing $C$ to be, say, $2c^{-1/2}$. (I think $c$ can be taken to be 1/8 if you want me to be more explicit.) Since the probability that $\sum_ia_i=n(n+1)/2$ is proportional to $n^{-3/2}$, it follows that this conclusion continues to hold even after we condition on that event.
Another simple observation is that the values taken by $h_A$ are not contained in a proper sublattice of $\mathbb{Z}$ (assuming, that is, that $h_A$ is ever non-zero). That is simply because $h_A(j+1)\ge h_A(j)-1$ for every $j$ and $h_A$ averages zero: if $h_A$ is ever non-zero it must somewhere be negative, and since it can decrease by at most 1 at a time it must somewhere take the value $-1$.
A third simple observation is that with probability $1-o(1)$ the function $h_A$ will take a value of at least $\sqrt n/\log n$ in absolute value at least somewhere. I’ll sketch a proof of this. Let $m$ be around $\log n$ and let $j_1<\dots<j_m$ be evenly spaced in $\{1,2,\dots,n\}$, staying away from the end points 1 and $n$. Let $(a_1,\dots,a_n)$ be a purely random sequence in $\{1,2,\dots,n\}^n$. Then the standard deviation of $h_A(j_{r+1})-h_A(j_r)$ is around $\sqrt{n/\log n}$, so the probability that it is less than $\sqrt n/\log n$ in absolute value is around $(\log n)^{-1/2}$. The same is true of the conditional probability that $|h_A(j_{r+1})|$ is less than $\sqrt n/\log n$ conditioned on the value of $h_A(j_r)$ (the worst case being when this value is 0). So the probability that this happens for every $r$ is at most $(\log n)^{-cm}$ for some constant $c>0$. This is much smaller than $n^{-3/2}$, so the conclusion remains valid when we condition on the sum of the $a_i$ being $n(n+1)/2$. So the claim follows. Note that because of the previous simple observation, it follows that $h_A$ must be at least $\frac12\sqrt n/\log n$ in magnitude at least $\frac12\sqrt n/\log n$ times, so up to log factors we get that the variance of $X$ is at least $\sqrt n$. With a bit more effort, it should be possible to push this up to something more like $n$, since one would expect that $|h_A(j)|$ would have rough order of magnitude $\sqrt n$ for a positive fraction of the $j$. Maybe this would be a good subproblem to think about, and ideally not too difficult.
How about the joint distribution of $(X,Y)$? It seems highly likely that for typical $A$ this will not be concentrated in a lattice, and that elementary arguments such as the above can be used to prove this. But let me indicate the kind of situation that we would have to prove is not typical. Suppose that $n=15$ and $A=(3,3,3,3,3,8,8,8,8,8,13,13,13,13,13)$. Then as $j$ runs from 1 to 15 the values taken by $g_A(j)$ are
$$0,\,0,\,5,\,5,\,5,\,5,\,5,\,10,\,10,\,10,\,10,\,10,\,15,\,15,\,15$$
and the values taken by $h_A(j)$ are
$$-1,\,-2,\,2,\,1,\,0,\,-1,\,-2,\,2,\,1,\,0,\,-1,\,-2,\,2,\,1,\,0.$$
For this example, all the points $(h_A(j),j)$ live in the lattice of points $(x,y)$ such that $x+y$ is a multiple of 5.
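A quick mechanical check of the example above:

```python
n = 15
A = [3] * 5 + [8] * 5 + [13] * 5
assert sum(A) == n * (n + 1) // 2                 # A really is a die

g = [sum(a <= j for a in A) for j in range(1, n + 1)]
h = [gj - j for j, gj in enumerate(g, start=1)]
print(h)                                          # -1, -2, 2, 1, 0 repeating
# every point (h_A(j), j) has h + j = g_A(j), which is a multiple of 5
assert all((hj + j) % 5 == 0 for j, hj in enumerate(h, start=1))
```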
This wouldn’t necessarily be a disaster for us actually, since the LCLT can be restricted to a sublattice and if after conditioning on $\sum_jb_j=n(n+1)/2$ we happen to have that $\sum_jh_A(b_j)$ is always a multiple of 5, that isn’t a problem if we still have the central symmetry. But it would probably be nicer to prove that it is an atypical occurrence, so that we don’t have to worry about $(X,Y)$ living inside a sublattice (or even being concentrated in one).
My guess is that if we were to pursue these kinds of thoughts, we would end up being able to prove a statement that would say something like that $(X,Y)$ takes a pretty representative sample of values with $X$ being between $-C\sqrt{n\log n}$ and $C\sqrt{n\log n}$ and $Y$ being in a range of width around $n$. I would expect, for example, that if we add three or four independent copies of $(X,Y)$, then we will have a distribution that is similar in character to the uniform distribution on a rectangle of width of order of magnitude $\sqrt n$ (up to log factors) and height of order of magnitude $n$. And if that’s true, then adding $n$ of them should give us something very close to normal (in an appropriate discrete sense of the word “normal”).
What next?
There are two obvious tasks here. One is to try to prove as much as we can about the random variable $(X,Y)$. The other is to try to prove a suitable LCLT that is strong enough to give us that the probability that $\sum_jh_A(b_j)>0$ given that $\sum_jb_j=n(n+1)/2$ is approximately 1/2, under suitable assumptions about $(X,Y)$. And then we have to hope that what we achieve for the first is sufficient for the second.
It’s possible that the second task can be achieved by simply going through one of the existing proofs of the LCLT and being more careful about the details. But if that’s the case, then we should spend some time trying to find out whether anyone has done it already, since there wouldn’t be much point in duplicating that work. I hope I’ve set out what we want clearly enough for any probabilist who might stumble upon this blog post to be able to point us in the right direction if indeed the result we want is out there somewhere.
May 27, 2017 at 10:20 pm |
What about the other “first” conjecture, that the fraction of ties goes to zero?
To what extent does this approach rely on that?
May 27, 2017 at 11:33 pm
I think that that will come out in the wash. My hope is that with a strong enough LCLT, we’ll be able to say that the probability that $\sum_jh_A(b_j)=0$ is small enough that even when we condition on $\sum_jb_j=n(n+1)/2$ it remains small.
May 28, 2017 at 12:11 am |
There is a different model of dice, which may be worth playing with, and in which some of these desired attributes are easier to see.
Consider sorted-sequence proper dice in which no number is repeated more than twice. This model of dice has a nice concept of a “negative” die, an involution such that score(A,B), defined as the number of roll pairs where A beats B minus the number of pairs where B beats A, has the nice property:
score(A,B) = -score(A,-B)
So you automatically get that every die A beats exactly as many dice as it loses to. The distribution is perfectly symmetric.
Now, this says nothing about the number of ties, but with your approach would this be enough to conclude the following little conjecture?
For dice (in this restricted model) A, B, C such that A > B > C (where > is the “beats” relation), is it just as probable that A > C as C > A?
May 28, 2017 at 12:40 am
To be more explicit, and help organize my thoughts, here is why the above model has those nice properties.
Instead of looking at a die in its sequence representation (a_1, a_2, …, a_n), consider it in the multiset representation (m_1, m_2, …, m_n) where m_i is the number of die faces with value i.
Then the score function is just an antisymmetric bilinear function, and we can use linear algebra.
With the matrix:
S_{ij} = sign(j-i)
and denoting the column vector v(A) as the vector with components equal to the multiset representation of the die A,
the score function is then:
score(A,B) = v(A)^T . S . v(B)
In this representation, the standard die is a vector with all entries 1.
v(std) = [1,1,1,…,1]
Since the multiset representation already restricts the values of the die faces to the set of integers 1 to n, the only remaining constraint is the sum. This sum constraint is equivalent to saying:
score(std,A) = v(std)^T . S . v(A) = 0
As we are only considering multisets which meet that constraint, there is a more convenient representation, where we only consider the difference from the std die. Denoting this representation as w(A),
w(A) = v(A) – v(std)
the score function retains the same form:
score(A,B) = w(A)^T . S . w(B)
and it is obvious that the standard die ties with everything because it is just the zero vector in this representation.
If we restrict to the dice with multiplicity of values <= 2, it now becomes obvious what the "negative" relation is and why this restriction allows it.
w(-A) = – w(A)
And since the score function is bilinear, we immediately get:
score(-A,B) = – score(A,B)
The reason this doesn't work more generally is that it doesn't make sense to have a negative number of faces with some value j.
The "one step" dice considered in the arxiv paper Gowers linked when starting this discussion are a "basis" for the space of all dice, in that these steps can be added as vectors in the w(A) representation to get to all other dice. Exploring what the score function on the space of all dice pairs looks like, using this basis as a guide, may be fruitful, because the scores have already been discussed for this basis in the arxiv paper.
The "max2 multiset dice" is the maximal expansion about the origin (the standard die) with the "one step" basis, while still allowing all dice to have a natural "negative" die. I did some computational tests and this is enough to make the fraction of ties go to zero as n increases, and yet the larger "random tournament" conjecture still appears to fail. Strangely, it even appears to be failing with the same ratios for k=4 (I did not heavily test this, and at this point still likely to be a coincidence).
So this appears to be a nice small model that has all the same features as the full larger model.
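To illustrate P. Peng's setup, here is a short sketch (my own, with hypothetical helper names) of the score function in the multiset representation, together with a numerical check of the identity score(A,B) = -score(A,-B) for dice whose multiplicities are all at most 2, where -B is the die with m_i(-B) = 2 - m_i(B).

```python
import random

def sign(x):
    return (x > 0) - (x < 0)

def score(mA, mB):
    """score(A,B) = v(A)^T S v(B), with S_{ij} = sign(j - i) and
    mA[i] = number of faces of A showing the value i+1."""
    n = len(mA)
    return sum(mA[i] * mB[j] * sign(j - i) for i in range(n) for j in range(n))

def random_max2_die(n, rng):
    """Rejection-sample a die in which no value appears more than twice."""
    target = n * (n + 1) // 2
    while True:
        faces = [rng.randint(1, n) for _ in range(n)]
        m = [faces.count(v) for v in range(1, n + 1)]
        if sum(faces) == target and max(m) <= 2:
            return m

rng, n = random.Random(5), 12
std = [1] * n                                    # the standard die: one face of each value
for _ in range(50):
    mA, mB = random_max2_die(n, rng), random_max2_die(n, rng)
    neg_B = [2 - x for x in mB]                  # w(-B) = -w(B), i.e. m_i -> 2 - m_i
    assert score(mA, mB) == -score(mA, neg_B)
    assert score(std, mA) == 0                   # the standard die ties with everything
```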
May 29, 2017 at 11:29 am |
One more reference. The recent paper “Approximate Local Limit Theorems with Effective Rate and Application to Random Walks in Random Scenery” by Rita Giuliano and Michel Weber was published in a journal 6 days ago(!), but has been available on arXiv since 2014 (please google it; I am afraid that if I provide a link, this message will go to the moderation queue).
When I read in the abstract that it proves a “local limit theorem for sums of independent lattice valued random variables, with effective error term, that is with explicit parameters and universal constants”, I even hoped it would solve our problem immediately. Unfortunately, only the one-dimensional case is covered there.
May 29, 2017 at 2:08 pm
Just for the record, comments with two links go to the moderation queue, but comments with just one link are fine. In any case, here is a link to the paper you mention.
It’s difficult to know what to do at this point. I imagine that if one fully understood that paper, it would be a relatively routine matter to generalize its main result to two dimensions, but that looks like quite a big investment of time.
But perhaps we should try to achieve an understanding of local limit theorems in a polymathematical way. I can’t find it now, but I have some memory of Terence Tao organizing something like this once. That is, instead of several people going away and trying to understand the literature on this topic, perhaps we could simply think of the remaining task — to prove an appropriate LCLT — as a Polymath project with a slightly unusual character, in the sense that all the techniques needed are probably out there in the literature, so an obvious way to proceed is to try to understand those techniques.
It’s not clear to me that we necessarily want to follow the approach of the paper above, but perhaps we do. My instinct would be to use a characteristic-functions approach, but perhaps there are difficulties in making that sufficiently precise. So I think what I’ll try to do next is start writing out an argument and see where I get stuck (which should be pretty soon, as I’m not up on this kind of thing).
But I’m not trying to argue that we should use characteristic functions. If someone else can see how to do it in a different way, that’s just fine by me.
May 29, 2017 at 3:17 pm |
The rough idea of the characteristic-functions approach, which I’ll specialize to the 2-dimensional case, is as follows. (Apologies to anyone who knows about this properly for anything idiotic I might accidentally write.) Let $(X,Y)$ be a random variable on $\mathbb{Z}^2$ and write $f(x,y)$ for $\mathbb{P}[(X,Y)=(x,y)]$. If we take $n$ independent copies of $(X,Y)$ and add them together, then the probability of being at $(x,y)$ is $f^{*n}(x,y)$, where that denotes the $n$-fold convolution.
Now let’s define the Fourier transform of $f$ in the usual way by
$$\hat f(\alpha,\beta)=\sum_{x,y}f(x,y)e^{2\pi i(\alpha x+\beta y)}.$$
Here $\alpha$ and $\beta$ belong to $\mathbb{T}=\mathbb{R}/\mathbb{Z}$, but I’ll sometimes think of them as belonging to $\mathbb{R}$ too.
We have the convolution law that
$$\widehat{f*g}=\hat f\,\hat g$$
and the inversion formula
$$f(x,y)=\int_0^1\int_0^1\hat f(\alpha,\beta)e^{-2\pi i(\alpha x+\beta y)}\,d\alpha\,d\beta.$$
Putting these together, we find that if random variables $(X_1,Y_1),\dots,(X_n,Y_n)$ are $n$ independent copies of $(X,Y)$, then the probability that their sum is $(x,y)$ is
$$\int_0^1\int_0^1\hat f(\alpha,\beta)^ne^{-2\pi i(\alpha x+\beta y)}\,d\alpha\,d\beta.$$
The very rough reason that we should now expect a Gaussian formula is that we consider a Taylor expansion of $\hat f$. We can assume for our application that $X$ and $Y$ have mean zero. From that one can argue that the coefficients of the linear terms in the Taylor expansion are zero. (I’ll give more details in a subsequent comment.) The constant term is 1, and the quadratic terms give us the covariance matrix of $X$ and $Y$. If we assume that we can approximate $\hat f(\alpha,\beta)$ by an expression of the form $e^{-q(\alpha,\beta)}$ for some suitable quadratic form in $\alpha$ and $\beta$, then the $n$th power should be close to $e^{-nq(\alpha,\beta)}$, and then, since Fourier transforms (and inverse Fourier transforms) take Gaussians to Gaussians, when we invert this one, we should get a Gaussian-type formula for $f^{*n}(x,y)$. (I’m glossing over the point that Gaussians are defined on $\mathbb{R}^2$, whereas $\alpha$ and $\beta$ live in $\mathbb{T}$ and $x$ and $y$ live in $\mathbb{Z}$. If most of $\hat f^n$ is supported in a small region around 0, then one can hope that this isn’t too much of a problem.)
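The convolution and inversion identities above are easy to check numerically on a toy distribution. In the sketch below (a stand-in distribution, not the actual $(X,Y)$ of a die), the integral over $[0,1]^2$ is evaluated exactly by a finite grid sum, which is legitimate because the $n$-fold convolution has bounded support.

```python
import cmath
from collections import defaultdict
from itertools import product

# a toy mean-zero distribution f on Z^2
f = {(0, 1): 0.25, (1, -1): 0.25, (-1, 0): 0.25, (0, 0): 0.25}
n = 6                                    # number of independent copies added

# the n-fold convolution f^{*n}, computed directly
conv = {(0, 0): 1.0}
for _ in range(n):
    step = defaultdict(float)
    for (x, y), p in conv.items():
        for (dx, dy), q in f.items():
            step[(x + dx, y + dy)] += p * q
    conv = dict(step)

def f_hat(a, b):
    """Fourier transform: sum over (x,y) of f(x,y) e^{2 pi i (a x + b y)}."""
    return sum(p * cmath.exp(2j * cmath.pi * (a * x + b * y)) for (x, y), p in f.items())

# inversion: a grid of M points per axis computes the [0,1]^2 integral exactly,
# provided M exceeds the diameter (here at most 2n) of the support of f^{*n}.
M = 16
for (x, y), p in conv.items():
    inv = sum(f_hat(k / M, l / M) ** n * cmath.exp(-2j * cmath.pi * (k * x + l * y) / M)
              for k, l in product(range(M), repeat=2)) / M ** 2
    assert abs(inv - p) < 1e-9
```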
May 29, 2017 at 3:26 pm |
Let $\hat f(\alpha,\beta)=\sum_{x,y}f(x,y)e^{2\pi i(\alpha x+\beta y)}$ as above. Then if we partially differentiate $r$ times with respect to $\alpha$ and $s$ times with respect to $\beta$ we obtain the expression
$$(2\pi i)^{r+s}\sum_{x,y}x^ry^sf(x,y)e^{2\pi i(\alpha x+\beta y)}.$$
Setting $\alpha=\beta=0$ turns this into $(2\pi i)^{r+s}\,\mathbb{E}X^rY^s$. That justifies the statement that if $\mathbb{E}X=\mathbb{E}Y=0$ then there is no linear term, and that the quadratic term is proportional to the covariance matrix.
May 29, 2017 at 3:52 pm |
As I mentioned in parentheses at the end of the previous comment but one, we want $\hat f^n$ to be supported in a small set about the origin. Since $n$ is quite large, this is saying that we don’t want $\hat f(\alpha,\beta)$ to be close to 1 except if $(\alpha,\beta)$ is close to $(0,0)$.
If $(X,Y)$ lives in a lattice, then this is false. Suppose, for example, that $X$ only ever takes even values. Then
$$\hat f(1/2,0)=\mathbb{E}\,e^{\pi iX}=1.$$
So we will want a condition that says that $(X,Y)$ isn’t supported, even approximately, in a lattice.
May 29, 2017 at 4:31 pm |
A useful thing to do might be to come up with a conjecture for when the $n$-fold convolution of a probability distribution on $\mathbb{Z}^2$ should be Gaussian. In order to make a start on that, I’ll list some situations where this does not happen.
1. A first situation is when the distribution is concentrated in a proper sublattice. I’ve already mentioned this several times.
2. A generalization of the above is when we have a sublattice $L$ and a centrally symmetric set $K$ such that the $2n$-fold sumset $2nK$ does not contain any points of $L$ other than the origin. If we now take a probability distribution that is concentrated in $L+K$, then its $n$-fold convolution will be concentrated in $L+nK$: in other words, it will live inside an array of “blobs” rather than coalescing into a single “blob”.
3. That can be generalized further to an arbitrary multidimensional arithmetic progression that is sufficiently “proper” that its $n$-fold sumset is still a progression of the same dimension.
4. This isn’t a genuinely distinct example from 3, but it is an extreme case of it: one can take a probability distribution that is concentrated in a highly dissociated set: so much so that no two distinct sums of $n$ elements of the set are equal.
Now some of the facts we know about the random variable $(X,Y)$ appear to rule out these kinds of examples very quickly. For instance, the fact that $X$ is bounded by not much more than $\sqrt n$ and $Y$ is uniformly distributed in an interval of width $n$ or so means that there just isn’t enough space for highly dissociated sets, or sets whose $n$-fold sumsets don’t swallow up everything around them. Indeed, I can’t immediately see a counterexample to the following statement (though there might easily be one, in which case the statement would need to be adjusted).
Very tentative conjecture. Let $\mu$ be a probability distribution on $\mathbb{Z}^2$ with mean $(0,0)$ and suppose that $\mu$ is supported in $[-\sqrt n\log n,\sqrt n\log n]\times[-n,n]$. Suppose also that the sum of $\mu$ over any proper sublattice of $\mathbb{Z}^2$ is at most $99/100$. Then the $n$-fold convolution of $\mu$ is approximately Gaussian.
By “approximately Gaussian” I mean approximated sufficiently well by a formula of the form $ce^{-q(x,y)}$, for a suitable quadratic form $q$, that we can deduce that, writing $\mu^{*n}$ for the $n$-fold convolution,
$$\sum_{x>0}\mu^{*n}(x,0)\approx\sum_{x<0}\mu^{*n}(x,0).$$
I’ve just realized that the formulation above rules out a “bad” example that I failed to mention above. I haven’t been thinking about the fact that one kind of “atypical” die is one where $h_A(j)=0$ for almost all $j$. If that were the case, then we would have a distribution that is highly concentrated at the origin (which is why the formulation above rules it out), which would effectively mean that we were taking a random walk with far fewer steps, which in turn would mean that it would have far less time to mix, and the bad examples listed above would be much easier to find.
Ah, but that observation leads me straight away to another bad example that gives a counterexample to the conjecture above. Let $K$ be the set $\{(x,y):|x|\le 1\}$. Suppose now that we have a probability distribution that is highly concentrated in $K$, but very occasionally takes the value $(r,0)$ or $(-r,0)$. If $r$ is significantly bigger than $\sqrt n$, then after $n$ steps we’ll get blobs around multiples of $r$ rather than a single blob.
This comment is getting a bit bloated, so I’ll stop it now and try to think about what a revised conjecture should look like.
May 29, 2017 at 5:44 pm
It’s possibly useful to think about the highly dissociated case in Fourier terms. For simplicity I’ll discuss the one-dimensional case. Suppose we have a random variable that is uniformly distributed in a highly dissociated set $\{x_1,\dots,x_k\}$. Then let’s consider the Fourier transform
$$\hat f(\alpha)=\frac 1k\sum_{j=1}^ke^{2\pi i\alpha x_j}.$$
Because the $x_j$ are highly dissociated, if we choose $\alpha$ randomly from $[0,1]$, the numbers $e^{2\pi i\alpha x_j}$ behave like independent random variables. (This is an intuition that can be made rigorous using standard techniques.) Therefore, every so often they will all be close to 1 and our wish that that should happen only near the origin will be thwarted.
I’m now wondering whether it might be possible to prove directly that $\hat f$ does not get close to 1 away from the origin (when $f$ is the probability distribution corresponding to the random variable $(X,Y)$.)
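That question can at least be explored numerically. The sketch below builds the $(X,Y)$ distribution for one random die, evaluates $|\hat f(\alpha,\beta)|$ on a grid of frequencies, and prints the largest value found outside a small (mod-1) neighbourhood of the origin; the hope expressed above is that for typical dice this stays bounded well away from 1, but the computation is only a probe, not evidence of anything asymptotic.

```python
import cmath
import random

def max_away_from_origin(n=100, M=40, excl=3, seed=7):
    """For one random n-sided die, evaluate |f^(a,b)| for the distribution of
    (X, Y) = (h_A(j), j - (n+1)/2) on an M x M frequency grid and return the
    maximum over grid points outside an `excl`-cell neighbourhood of (0,0)."""
    rng = random.Random(seed)
    target = n * (n + 1) // 2
    while True:                                   # rejection-sample the die
        A = [rng.randint(1, n) for _ in range(n)]
        if sum(A) == target:
            break
    counts = [0] * (n + 1)
    for a in A:
        counts[a] += 1
    g, h = 0, []
    for j in range(1, n + 1):
        g += counts[j]
        h.append(g - j)                           # h_A(j)
    best = 0.0
    for k in range(M):
        for l in range(M):
            if min(k, M - k) <= excl and min(l, M - l) <= excl:
                continue                          # skip frequencies near the origin
            a, b = k / M, l / M
            val = sum(cmath.exp(2j * cmath.pi * (a * h[j - 1] + b * (j - (n + 1) / 2)))
                      for j in range(1, n + 1)) / n
            best = max(best, abs(val))
    return best

print(max_away_from_origin())
```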
May 29, 2017 at 5:02 pm |
The post of P. Peng reminded me that the original conjectures were formulated in a different model, in which a die is a non-decreasing sequence of its values, or, equivalently, is given by a vector $(m_1,\dots,m_n)$, where $m_i$ is the number of faces with value $i$. We have the conditions $m_i\ge 0$ for all $i$, and $\sum_im_i=n$ and $\sum_iim_i=n(n+1)/2$.
We can avoid the equality constraints by expressing $m_1$ and $m_2$ from them: subtracting $\sum_im_i=n$ from $\sum_iim_i=n(n+1)/2$ gives $m_2+\sum_{i\ge3}(i-1)m_i=n(n-1)/2$, hence $m_2=n(n-1)/2-\sum_{i\ge3}(i-1)m_i$. Similarly, $m_1=n-m_2-\sum_{i\ge3}m_i$, hence $m_1=n-n(n-1)/2+\sum_{i\ge3}(i-2)m_i$.
So, now our dice are just the integer points $(m_3,\dots,m_n)$ in the $(n-2)$-dimensional polytope defined by the inequalities $m_i\ge0$ for $i\ge3$, $\sum_{i\ge3}(i-1)m_i\le n(n-1)/2$, and $\sum_{i\ge3}(i-2)m_i\ge n(n-3)/2$.
There is a lot of literature about counting integer points in a polytope, but, intuitively, the number of such points should usually be approximately equal to the volume of the polytope.
Now, fix a die $A$ with the corresponding representation $(m_3(A),\dots,m_n(A))$. Whether it beats another die $B$ depends on the sign of the score $\mathrm{score}(A,B)$. Substituting $m_1(B)$ and $m_2(B)$ from the expressions above, we get an inequality of the form $\sum_{i\ge3}c_i(A)\,m_i(B)>c_0(A)$.
So, in fact, the question is how a given $(n-2)$-dimensional polytope intersects with a hyperplane. For the ties conjecture, we need to prove that the fraction of integer points on the hyperplane is small. For the “small conjecture”, we need to show that the hyperplanes arising from almost all dice $A$ divide the integer points in the polytope into two roughly equal halves. As a first step, we would need to prove that the volume of the polytope is divided into two almost equal parts.
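As a sanity check on the elimination of $m_1$ and $m_2$, the sketch below counts dice for small $n$ in two ways: by a straightforward dynamic programme over the multiset representation, and by enumerating the integer points $(m_3,\dots,m_n)$ of the polytope just described. (The enumeration is exponential and only meant for tiny $n$.)

```python
from functools import lru_cache
from itertools import product

def dice_count_dp(n):
    """Number of multisets (m_1,...,m_n) with m_i >= 0, sum m_i = n and
    sum i*m_i = n(n+1)/2, counted by dynamic programming."""
    @lru_cache(maxsize=None)
    def ways(i, faces_left, sum_left):
        if i > n:
            return 1 if faces_left == 0 and sum_left == 0 else 0
        return sum(ways(i + 1, faces_left - m, sum_left - m * i)
                   for m in range(faces_left + 1) if m * i <= sum_left)
    return ways(1, n, n * (n + 1) // 2)

def dice_count_polytope(n):
    """Count integer points (m_3,...,m_n) >= 0 for which the recovered
    m_1 and m_2 (from the two equality constraints) are also non-negative."""
    count = 0
    for m in product(range(n + 1), repeat=n - 2):            # m = (m_3,...,m_n)
        m2 = n * (n - 1) // 2 - sum((i - 1) * mi for i, mi in zip(range(3, n + 1), m))
        m1 = n - m2 - sum(m)
        count += (m1 >= 0 and m2 >= 0)
    return count

for n in (4, 5, 6, 7):
    assert dice_count_dp(n) == dice_count_polytope(n)
```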
May 29, 2017 at 11:02 pm
About the ties conjecture. As above, each die $V$ can be represented as a point $(m_3(V),\dots,m_n(V))$ in $(n-2)$-dimensional space. Let $k$ be the number of such dice of size $n$.
Each die $V$ is in a tie with some die $A$ if this point belongs to the hyperplane with equation $\sum_{i\ge3}c_i(A)m_i(V)=c_0(A)$. So, we have $k$ points and $k$ hyperplanes, and would like to bound the total number of incidences between them.
There is a multidimensional Szemerédi–Trotter theorem https://en.wikipedia.org/wiki/Szemer%C3%A9di%E2%80%93Trotter_theorem , which states that the number of incidences between $k$ points and $m$ hyperplanes in dimension $d$ is bounded by $O(m^{2/3}k^{d/3}+k^{d-1})$. But this bound is useless for large $d$, which is our case. With $m=k$, even if each hyperplane contains each point, the number of incidences is only $k^2$, much less than $k^{d-1}$…
But there may be other versions of the multidimensional Szemerédi–Trotter theorem, more suitable to our parameter range…
May 29, 2017 at 11:22 pm
But then we need to eliminate the case when a significant fraction of points lies in some n-4-dimensional subspace, shared by a significant fraction of hyperplanes… Yes, this is also not easy.
May 29, 2017 at 7:02 pm |
By chasing references from the paper Bogdan found, I came across this paper by D. R. McDonald, which has a sufficiently friendly account of how the “Bernoulli decomposition” approach to proving LCLTs works that I think I get the basic idea, which is a rather nice and natural one.
In very broad terms, the idea is that we know by the CLT what the approximate probability is that the sum of $n$ copies of $(X,Y)$ lies in a given region of the plane. (Of course, we would need a suitably explicit form of the Berry-Esseen theorem to get this.) So to prove that the probability at an individual point $(x,y)$ is what it should be, it is sufficient to prove that within a small region the probabilities are roughly constant.
How do we do that? Well in the 1D case it appears to go something like this. Let $p$ be a probability distribution on $\mathbb{Z}$. One defines a parameter $q=\sum_x\min\{p(x),p(x+1)\}$. If that parameter is not too small, it implies that there is a kind of smearing effect that causes the sum of many copies to be locally constant.
What I’m about to say next isn’t from the paper but I think it’s probably roughly equivalent to it, or else wrong. Anyhow, we have the option, if the parameter is large, of decomposing the random walk as follows. First, let’s focus just on two neighbouring points $x$ and $x+1$ with probabilities $p(x)$ and $p(x+1)$. We could achieve the same probabilities by the following process. We define a new probability distribution $p'$ that’s the same as $p$ except that $p'(x)=p(x)+p(x+1)$ and $p'(x+1)=0$. Then if we happen to pick $x$, we decide to add 1 to it with probability $p(x+1)/(p(x)+p(x+1))$. Note that if the ratio of $p(x)$ and $p(x+1)$ isn’t huge or tiny, then this probability is bounded away from 1. Also, the contribution to $q$ from pairs where one of $p(x)$ and $p(x+1)$ is at most $c$ times the other is at most $2c$, so if the parameter is not too small then we get plenty of pairs where $p(x)$ and $p(x+1)$ are of reasonably comparable size.
So now let us define a new random variable $Z$ with probabilities given by a function $p'$, where for many disjoint pairs $(x,x+1)$ we set $p'(x)=p(x)+p(x+1)$ and $p'(x+1)=0$, and for the other values we set $p'(x)=p(x)$. If we happen to pick $x$ from one of the pairs $(x,x+1)$ then we take a step to the right with probability $p(x+1)/(p(x)+p(x+1))$.
Now we can think of each step of the random walk as being “decomposed” as follows. We begin by choosing a random point using the probability distribution $p'$. After we have done this $n$ times, we have taken a sum of $n$ independent copies of $Z$, and have added to that a sum of a certain number of independent Bernoulli random variables. Now when I say “independent” I mean that they are independent of each other, but the number of variables we take, and what they are, depend on which $x$s we happened to land on when we were choosing the $p'$-random values. But with high probability we will choose a representative sample of these Bernoulli random variables and will add together independent copies of them.
So the outcome should be that we get a broad Gaussian behaviour out of the sum of the $Z$ variables, which is then sort of “convolved” (it isn’t exactly a convolution, but might resemble one enough for what I’m writing not to be total nonsense) with a flat enough Gaussian obtained from adding together independent Bernoullis for the broad behaviour to tell us what the individual probabilities are.
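Here is a tiny one-dimensional sketch of the decomposition just described, with a made-up step distribution: it builds $p'$ and the “add 1 with probability $p(x+1)/(p(x)+p(x+1))$” rule for two disjoint neighbouring pairs, and checks exactly that the two-stage step has the same law as the original $p$.

```python
# a toy 1D step distribution p on Z (a stand-in, not the distribution of h_A(b))
p = {-2: 0.15, -1: 0.20, 0: 0.30, 1: 0.20, 2: 0.15}

pairs = [(-2, -1), (1, 2)]              # disjoint neighbouring pairs (x, x+1)
p_prime = dict(p)
theta = {}
for x, x1 in pairs:
    theta[x] = p[x1] / (p[x] + p[x1])   # probability of adding 1 after landing on x
    p_prime[x] = p[x] + p[x1]
    p_prime[x1] = 0.0

# the two-stage step (draw from p', then maybe add 1) reproduces p exactly
for x, x1 in pairs:
    assert abs(p_prime[x] * (1 - theta[x]) - p[x]) < 1e-12
    assert abs(p_prime[x] * theta[x] - p[x1]) < 1e-12
```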
Assuming that something like that is correct, it should help us think about what an appropriate 2D version might be like. Here’s one possibility. If no $a_i$ is equal to $j+1$, then $g_A(j+1)=g_A(j)$, so $h_A(j+1)=h_A(j)-1$. This we would expect to happen a positive proportion of the time, and each time it does, it tells us that there is a pair of points, namely $\bigl(h_A(j),\,j-\frac{n+1}2\bigr)$ and $\bigl(h_A(j)-1,\,j+1-\frac{n+1}2\bigr)$, such that both of them occur with probability at least $1/n$.
If the value $j+1$ occurs among the $a_i$ exactly once, then $h_A(j+1)=h_A(j)$, which gives us a pair of points $\bigl(h_A(j),\,j-\frac{n+1}2\bigr)$ and $\bigl(h_A(j),\,j+1-\frac{n+1}2\bigr)$ that both occur with probability at least $1/n$. Again, I would expect this to happen a positive proportion of the time.
So it looks as though we should be able to decompose our random walk into a part that gives us a global Gaussian, and two little Bernoulli parts in independent directions that give us the flatness.
So I no longer feel so sure that characteristic functions is the way to go.
May 29, 2017 at 11:11 pm |
Ah, but here is a thought that makes me swing back in favour of characteristic functions (though not necessarily permanently). It’s that the type of condition introduced by McDonald is also very useful for that approach too. I’ll discuss the 1D case but it generalizes straightforwardly to the 2D case too.
Suppose we know that with positive probability, if we choose a random $x$ according to the distribution $p$, we have that $p(x)$ and $p(x+1)$ are comparable. I claim that this implies that $\hat p(\alpha)$ is not close to 1 except when $\alpha$ is close to 0. The proof is easy. The only way that $\hat p(\alpha)$ can be close to 1 is if $e^{2\pi i\alpha x}$ for $p$-almost every $x$ is close to 1, which is equivalent to saying that $\alpha x$ for $p$-almost every $x$ is close to an integer. But if $\alpha$ is not close to zero (mod 1) and $\alpha x$ is close to an integer, then $\alpha(x+1)$ is close to $\alpha$, which is not close to zero (all of this being mod 1). And since with non-negligible $p$-probability $x$ is such that $x+1$ has comparable probability to $x$, this implies that a $p$-random $e^{2\pi i\alpha x}$ has a non-negligible probability of being close to $e^{2\pi i\alpha}$, and in particular not being close to 1. Therefore, $\hat p(\alpha)$ is not approximately equal to 1.
So it looks to me as though if we can prove a McDonald-type condition for the random variable we are interested in (when $A$ is typical), then we’ll have a good chance of making the characteristic-functions approach sufficiently quantitative for the purposes of the transitivity question.
May 29, 2017 at 11:13 pm
I should add that I think proving that kind of condition is straightforward, by essentially the same technique that I’ve been using for other statements of this kind: we prove that it holds with very high probability for a fully independent sequence, and then conditioning on the sum of the sequence doesn’t mess things up.
May 30, 2017 at 8:35 am |
Here then is a plan for a possible proof.
1. Get bounds for all the moments $\mathbb{E}X^rY^s$, where $(X,Y)$ is the random variable $(h_A(j),\,j-(n+1)/2)$. Here $A$ is a fixed die with typical properties and $j$ is chosen uniformly at random from $\{1,2,\dots,n\}$. (A small empirical sketch of this step is given after the list.)
2. Use these to obtain a Taylor expansion for the characteristic function with a very explicit bound for the error term.
3. Prove that $(X,Y)$ satisfies a McDonald-type condition (see above comments for what this means).
4. Deduce that outside a small ball about the origin (which will be stretched quite a lot more in the $\alpha$ direction than the $\beta$ direction, but this is not important) the characteristic function is bounded away from 1.
5. From 2 and 4 deduce that the $n$th power of the characteristic function can be very well approximated by a Gaussian both near the origin and far away from it.
6. Deduce the result (since the Fourier transform of a Gaussian is a Gaussian).
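Step 1 is the easiest to get a feel for numerically. The sketch below computes empirical moments $\mathbb{E}X^rY^s$ for one random die; the two first moments should come out as zero, up to floating-point error.

```python
import random

def empirical_moments(n=100, max_deg=4, seed=8):
    """Moments E[X^r Y^s] for X = h_A(j), Y = j - (n+1)/2, j uniform on
    {1,...,n}, for one randomly chosen n-sided die A."""
    rng = random.Random(seed)
    target = n * (n + 1) // 2
    while True:                         # rejection-sample the die
        A = [rng.randint(1, n) for _ in range(n)]
        if sum(A) == target:
            break
    counts = [0] * (n + 1)
    for a in A:
        counts[a] += 1
    g, moments = 0, {}
    for j in range(1, n + 1):
        g += counts[j]
        x, y = g - j, j - (n + 1) / 2   # (X, Y) at this value of j
        for r in range(max_deg + 1):
            for s in range(max_deg + 1 - r):
                moments[(r, s)] = moments.get((r, s), 0.0) + x ** r * y ** s / n
    return moments

m = empirical_moments()
print(m[(1, 0)], m[(0, 1)], m[(2, 0)], m[(1, 1)], m[(0, 2)])
```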
May 30, 2017 at 1:33 pm
I’ll try to say more about this later, but I had a small think about steps 3 and 4, and I now think that the 2D McDonald condition I was talking about isn’t quite strong enough for our purposes here, but that it should still be possible to prove some condition of a similar kind that would be strong enough. But to be sure about this, I’m going to need to do some proper calculations.
May 30, 2017 at 12:22 pm |
Very interesting stuff! Two small typos: in the calculation of $\sum_jg_A(j)$ you type $\sum_i(n-a_j+1)$. Shouldn’t $a_j$ be $a_i$ there? And 10 lines below you claim that B beats A precisely when $\sum_jg_A(b_j)$ is positive, while I believe that the variable should be $h_A$ there. Good luck trying to finish the proof, while I keep struggling to understand anything you wrote ;)!
Thanks for pointing those out — fixed now.