Intransitive dice V: we want a local central limit theorem

It has become clear that what we need in order to finish off one of the problems about intransitive dice is a suitable version of the local central limit theorem. Roughly speaking, we need a version that is two-dimensional — that is, concerning a random walk on \mathbb Z^2 — and completely explicit — that is, giving precise bounds for error terms so that we can be sure that they get small fast enough.

There is a recent paper that does this in the one-dimensional case, though it uses an elementary argument, whereas I would prefer to use Fourier analysis. Here I’d like to begin the process of proving a two-dimensional result that is designed with our particular application in mind. If we are successful in doing that, then it would be natural to try to extract from the proof a more general statement, but that is not a priority just yet.

As people often do, I’ll begin with a heuristic argument, and then I’ll discuss how we might try to sharpen it up to the point where it gives us good bounds for the probabilities of individual points of \mathbb Z^2. Much of this post is cut and pasted from comments on the previous post, since it should be more convenient to have it in one place.

Using characteristic functions

The rough idea of the characteristic-functions approach, which I’ll specialize to the 2-dimensional case, is as follows. (Apologies to anyone who knows about this properly for anything idiotic I might accidentally write.) Let (X,Y) be a random variable on \mathbb Z^2 and write f(x,y) for \mathbb P[(X,Y)=(x,y)]. If we take n independent copies of (X,Y) and add them together, then the probability of being at (x,y) is

f*\dots*f(x,y)

where that denotes the n-fold convolution.

Now let’s define the Fourier transform of f, which probabilists call the characteristic function, in the usual way by

\hat f(\alpha,\beta)=\sum_{x,y}f(x,y)e^{2\pi i(\alpha x+\beta y)}.

Here \alpha and \beta belong to \mathbb R/\mathbb Z, but I’ll sometimes think of them as belonging to [0,1) too.

We have the convolution law that \widehat{f*g}=\hat f\hat g and the inversion formula

f(x,y)=\int\int \hat f(\alpha,\beta)e^{-2\pi i(\alpha x+\beta y)}\,d\alpha\,d\beta.

Putting these together, we find that if random variables (X_i,Y_i) are n independent copies of (X,Y), then the probability that their sum is (x,y) is

\int\int \hat f(\alpha,\beta)^n e^{-2\pi i(\alpha x+\beta y)}\,d\alpha\,d\beta.

The very rough reason that we should now expect a Gaussian formula is that we consider a Taylor expansion of \hat f. We can assume for our application that X and Y have mean zero. From that one can argue that the coefficients of the linear terms in the Taylor expansion are zero. (I’ll give more details in the next section.) The constant term is 1, and the quadratic terms give us the covariance matrix of X and Y. If we assume that we can approximate \hat f(\alpha,\beta) by an expression of the form 1-q(\alpha,\beta) for a suitable quadratic form q in \alpha and \beta, then the nth power should be close to \exp(-nq(\alpha,\beta)), and then, since Fourier transforms (and inverse Fourier transforms) take Gaussians to Gaussians, when we invert this one, we should get a Gaussian-type formula for f(x,y). So far I’m glossing over the point that Gaussians are defined on \mathbb R^2, whereas \alpha and \beta live in \mathbb T and (x,y) lives in \mathbb Z^2, but if most of \hat f^n is supported in a small region around 0, then this turns out not to be too much of a problem.
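
To see that this heuristic is not misleading, here is a quick numerical sketch in Python (a toy check only: the distribution below is an arbitrary small one on \mathbb Z^2 rather than the dice distribution, and all parameters are arbitrary). For integer-valued variables the inversion integral can be evaluated exactly by sampling \hat f on a sufficiently fine grid and using the discrete Fourier transform, and the result should agree with direct repeated convolution to machine precision.

import numpy as np

# A toy numerical check of the inversion heuristic (arbitrary small distribution
# on Z^2, not the dice distribution): the probability that the sum of n i.i.d.
# copies lands at a point can be computed either by direct repeated convolution
# or by sampling \hat f on a fine enough grid, raising it to the nth power and
# inverting with the FFT.

rng = np.random.default_rng(0)
f = rng.random((5, 5))
f /= f.sum()                          # toy distribution on a 5x5 block of Z^2
n = 6

def conv2(p, q):
    """Full 2D convolution of two arrays indexed from (0, 0)."""
    out = np.zeros((p.shape[0] + q.shape[0] - 1, p.shape[1] + q.shape[1] - 1))
    for (i, j), v in np.ndenumerate(q):
        out[i:i + p.shape[0], j:j + p.shape[1]] += v * p
    return out

direct = f.copy()                     # the n-fold convolution, computed directly
for _ in range(n - 1):
    direct = conv2(direct, f)

size = 5 * n                          # the n-fold sum fits inside a size x size box
padded = np.zeros((size, size))
padded[:5, :5] = f
via_fourier = np.real(np.fft.ifft2(np.fft.fft2(padded) ** n))    # invert \hat f^n

print(np.abs(direct - via_fourier[:direct.shape[0], :direct.shape[1]]).max())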

The Taylor-expansion part

If we take the formula

\hat f(\alpha,\beta)=\sum_{x,y}f(x,y)e^{2\pi i(\alpha x+\beta y)}

and partially differentiate r times with respect to \alpha and s times with respect to \beta we obtain the expression

(2\pi i)^{r+s}\sum_{x,y}x^ry^sf(x,y)e^{2\pi i(\alpha x+\beta y)}.

Setting \alpha=\beta=0 turns this into (2\pi i)^{r+s}\mathbb E(X^rY^s). Also, for every \alpha and \beta the absolute value of the partial derivative is at most (2\pi)^{r+s}\mathbb E|X^rY^s|. This allows us to get a very good handle on the Taylor expansion of \hat f when \alpha and \beta are close to the origin.
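
For concreteness, here is a tiny numerical check of that identity (again on an arbitrary toy distribution on \mathbb Z^2, with an arbitrary finite-difference step size):

import numpy as np

# A toy check: finite-difference partial derivatives of \hat f at the origin
# should match (2 pi i)^{r+s} E(X^r Y^s).  The distribution below is arbitrary,
# supported on a small box in Z^2, and is not the dice distribution.

rng = np.random.default_rng(5)
pts = rng.integers(-3, 4, size=(7, 2))            # atoms (x, y)
p = rng.random(7)
p /= p.sum()                                      # their probabilities

def fhat(alpha, beta):
    return np.sum(p * np.exp(2j * np.pi * (alpha * pts[:, 0] + beta * pts[:, 1])))

eps = 1e-4
d11 = (fhat(eps, 0) - 2 * fhat(0, 0) + fhat(-eps, 0)) / eps ** 2
d12 = (fhat(eps, eps) - fhat(eps, -eps) - fhat(-eps, eps) + fhat(-eps, -eps)) / (4 * eps ** 2)

print(d11, (2j * np.pi) ** 2 * np.sum(p * pts[:, 0] ** 2))           # D_11 vs (2 pi i)^2 E X^2
print(d12, (2j * np.pi) ** 2 * np.sum(p * pts[:, 0] * pts[:, 1]))    # D_12 vs (2 pi i)^2 E XY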

Recall that the two-dimensional Taylor expansion of \hat f about (0,0) is given by the formula

\hat f(\alpha,\beta)=\hat f(0,0)+D_1\hat f(0,0)\alpha+D_2\hat f(0,0)\beta+\frac 12D_{11}\hat f(0,0)\alpha^2
+D_{12}\hat f(0,0)\alpha\beta+\frac 12 D_{22}\hat f(0,0)\beta^2+\mbox{error term}

where D_1 is the partial derivative operator with respect to the first coordinate, D_{12} the mixed partial derivative, and so on.

In our case, \hat f(0,0)=\sum_{x,y}f(x,y)=1, D_1\hat f(0,0)=2\pi i\mathbb EX=0, and D_2\hat f(0,0)=2\pi i\mathbb EY=0.

As in the one-dimensional case, the error term has an integral representation, namely

\sum_{j=0}^3\binom 3j  \alpha^j\beta^{3-j}\int_0^1 \frac{(1-t)^2}{2} (D_1^jD_2^{3-j}\hat f)(t\alpha,t\beta)\,dt,

which has absolute value at most 8\pi^3\sum_{j=0}^3\binom 3j|\alpha|^j|\beta|^{3-j}\mathbb E|X^jY^{3-j}|, which in turn is at most

8\pi^3\sum_{j=0}^3\binom 3j|\alpha|^j|\beta|^{3-j}\|X\|_\infty^j\|Y\|_\infty^{3-j}=8\pi^3(|\alpha|\|X\|_\infty+|\beta|\|Y\|_\infty)^3.

When (X,Y) is the random variable (h_A(j),j-(n+1)/2) (where A is a fixed die and j is chosen randomly from [n]), we have that \|Y\|_\infty=(n-1)/2<n.

With very slightly more effort we can get bounds for the moments of h_A as well. For any particular j and a purely random sequence A=(a_1,\dots,a_n)\in [n]^n, the probability that |h_A(j)|\geq t\sqrt n is bounded above by e^{-ct^2} for an absolute constant c>0. (Something like 1/8 will do.) So the probability that there exists such a j conditional on \sum_ia_i=n(n+1)/2 (which happens with probability about 1/n) is at most Cn^2e^{-ct^2}, and in particular is small when t=100\sqrt{\log n}. I think that with a bit more effort we could probably prove that \mathbb E_j |h_A(j)|^3 is at most Cn^{3/2}, which would allow us to improve the bound for the error term, but I think we can afford the logarithmic factor here, so I won’t worry about this. So we get an error of C(|\alpha|\sqrt{n\log n}+|\beta| n)^3.
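
As a quick sanity check on the order of magnitude (a simulation sketch only, with n and the seed chosen arbitrarily), one can rejection-sample a few dice with the correct sum and compare \max_j|h_A(j)| with \sqrt{n\log n}.

import numpy as np

# A simulation sketch: for a few dice A sampled uniformly from [n]^n subject to
# sum n(n+1)/2, compare max_j |h_A(j)| with sqrt(n log n).
# Here h_A(j) = #{i : a_i <= j} - j.

rng = np.random.default_rng(1)
n = 1000

def sample_die():
    while True:
        a = rng.integers(1, n + 1, n)              # faces in 1..n
        if a.sum() == n * (n + 1) // 2:
            return a

def h(a):
    counts = np.bincount(a, minlength=n + 1)[1:]   # multiplicity of each value 1..n
    return np.cumsum(counts) - np.arange(1, n + 1)

for _ in range(3):
    print(np.abs(h(sample_die())).max(), "vs", round(float(np.sqrt(n * np.log(n))), 1))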

For this error to count as small, we want it to be small compared with the second moments. For the time being I’m just going to assume that the rough size of the second-moment contribution is around c(|\alpha|\sqrt n+|\beta|n)^2. So for our error to be small, we want \alpha to be o(n^{-1/2}(\log n)^{-3/2}) and \beta to be o(1/n).

That is giving us a rough idea of the domain in which we can say confidently that the terms up to the quadratic ones give a good approximation to \hat f, and hence that \hat f^n is well approximated by a Gaussian.

The remaining part

Outside the domain, we have to do something different, and that something is fairly simple: we shall show that \hat f^n is very small. This is equivalent to showing that |\hat f| is bounded away from 1 by significantly more than 1/n. This we do by looking more directly at the formula for the Fourier transform:

\hat f(\alpha,\beta)=\sum_{x,y}f(x,y)e^{2\pi i(\alpha x+\beta y)}.

We would like this to have absolute value bounded away from 1 by significantly more than 1/n except when \alpha is quite a bit smaller than n^{-1/2}(\log n)^{-3/2} and \beta is quite a bit smaller than n^{-1}.

Now in our case f is uniformly distributed on the n points (h_A(j),j-(n+1)/2). So we can write \hat f as

\hat f(\alpha,\beta)=n^{-1}\sum_je^{2\pi i(\alpha h_A(j)+\beta(j-(n+1)/2))}.
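
Here is a small sketch of what this looks like numerically for one sampled die (with n kept modest so that the rejection sampling is quick, and with arbitrary choices of \alpha): the nth power of the modulus should collapse very quickly once \alpha is much bigger than 1/n.

import numpy as np

# Evaluate the characteristic function above for one rejection-sampled die and
# look at how |\hat f(alpha, 0)|^n falls off as alpha moves away from 0.
# The values of n, the seed and the sample alphas are arbitrary.

rng = np.random.default_rng(2)
n = 500
while True:                                   # rejection-sample A in [n]^n with sum n(n+1)/2
    a = rng.integers(1, n + 1, n)
    if a.sum() == n * (n + 1) // 2:
        break

jj = np.arange(1, n + 1)
hA = np.cumsum(np.bincount(a, minlength=n + 1)[1:]) - jj      # h_A(j) = #{i : a_i <= j} - j
Y = jj - (n + 1) / 2

def fhat(alpha, beta):
    return np.mean(np.exp(2j * np.pi * (alpha * hA + beta * Y)))

for alpha in [0.0, 0.2 / n, 1.0 / n, n ** -0.5]:
    print(alpha, abs(fhat(alpha, 0.0)) ** n)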

Here’s a possible way that we might try to bound that sum. Let m=n/2 and let us split up the sum into pairs of terms with j and j+m, for j\leq m. So each pair of terms will take the form

e^{2\pi i(\alpha h_A(j)+\beta(j-(n+1)/2))}+e^{2\pi i(\alpha h_A(j+m)+\beta(j+m-(n+1)/2))}.

The ratio of these two terms is

e^{2\pi i(\alpha(h_A(j)-h_A(j+m))-\beta m)}.

And if the ratio is e^{i\theta}, then the modulus of the sum of the two terms is at most 2(1-c\theta^2).

Now let us suppose that as j varies, the differences h_A(j)-h_A(j+m) are mostly reasonably well distributed in an interval between -\sqrt n and \sqrt n, as seems very likely to be the case. Then the phases of the ratios above vary over a range of width about \alpha\sqrt n (times 2\pi). But that should imply that the entire sum, when divided by n, has modulus at most 1-c\alpha^2n. (This analysis obviously isn’t correct when \alpha is bigger than n^{-1/2}, since the modulus can’t be negative, but once we’re in that regime, then it really is easy to establish the bounds we want.)

If \alpha is, say, n^{-2/3}, then this gives us 1-cn^{-1/3}, and raising that to the power n gives us e^{-cn^{2/3}}, which is tiny.

As a quick sanity check, note that for (1-c\alpha^2n)^n not to be tiny we need \alpha to be not much more than n^{-1}. This reflects the fact that a random walk of n steps of typical size about \sqrt n will tend to be at a distance comparable to n from the origin, and when you take the Fourier transform, you take the reciprocals of the distance scales.

If \alpha is quite a bit smaller than n^{-1/2} and \beta is not too much smaller than n^{-1}, then the numbers \alpha h_A(j) are all small but the numbers \beta(j-(n+1)/2) vary quite a bit, so a similar argument can be used to show that in this case too the Fourier transform is not close enough to 1 for its nth power to be large. I won’t give details here.

What is left to do?

If the calculations above are not too wide of the mark, then the main thing that needs to be done is to show that for a typical die A the numbers h_A(j) are reasonably uniform in a range of width around \sqrt n, and more importantly that the numbers h_A(j)-h_A(j+m) are not too constant: basically I’d like them to be pretty uniform too.

It’s possible that we might want to try a slightly different approach, which is to take the uniform distribution on the set of points (h_A(j),j-(n+1)/2), convolve it once with itself, and argue that the resulting probability distribution is reasonably uniform in a rectangle of width around n and height around \sqrt n. By that I mean that a significant proportion of the points are hit around \sqrt n times each (because there are n^2 sums and they lie in a rectangle of area n^{3/2}). But one way or another, I feel pretty confident that we will be able to bound this Fourier transform and get the local central limit theorem we need.
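
Here is a sketch of that experiment in Python (one sampled die, with an arbitrary modest value of n): form all n^2 pairwise sums of the points (h_A(j),j-(n+1)/2) and see what proportion of the points that get hit are hit within a constant factor of \sqrt n times.

import numpy as np

# A sketch of the suggested experiment for one sampled die (arbitrary n): form
# all n^2 sums of pairs of points (h_A(j), j-(n+1)/2) and look at how many times
# each point of Z^2 is hit, compared with the sqrt(n) hits per point that perfect
# spreading over a rectangle of area n^{3/2} would give.

rng = np.random.default_rng(9)
n = 1000
while True:                                           # rejection-sample a die
    a = rng.integers(1, n + 1, n)
    if a.sum() == n * (n + 1) // 2:
        break

idx = np.arange(1, n + 1)
hA = np.cumsum(np.bincount(a, minlength=n + 1)[1:]) - idx        # h_A(j)

hsum = (hA[:, None] + hA[None, :]).ravel()            # first coordinates of the n^2 sums
jsum = (idx[:, None] + idx[None, :]).ravel()          # second coordinates (up to a constant shift)
keys = (hsum - hsum.min()) * (2 * n + 1) + jsum       # encode each lattice point as one integer
counts = np.unique(keys, return_counts=True)[1]       # how often each point is hit

root_n = np.sqrt(n)
frac = np.mean((counts > root_n / 4) & (counts < 4 * root_n))
print("points hit:", len(counts), " fraction hit within a factor 4 of sqrt(n):", round(float(frac), 3))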

33 Responses to “Intransitive dice V: we want a local central limit theorem”

  1. meditationatae Says:

    For one-dimensional probability distributions, there is a refinement of the Central Limit Theorem that gives an upper bound on the error made in using the CLT to approximate a sum of n iid random variables. It’s known as the Berry-Esseen theorem, and goes back to the 1940s. The error bound depends on the absolute third moment of one of the n iid random variables, which must be finite.

    • gowers Says:

      I discussed that in the previous post: see the section entitled “Why are we not immediately done?”. The problem with the Berry-Esseen theorem for this application is that while it gives you a good approximation to the distribution function, it is not sufficiently good to tell the difference between discrete and continuous random variables. Here we have a pair of random variables (X_n,Y_n), a sum of n independent copies of (X,Y), and we are interested in events such as X\geq 0 \wedge Y=0. If X and Y were continuous random variables, this would have probability zero, but since they take values in \mathbb Z, it has positive probability, and we would like to be able to estimate it. It is for that kind of purpose that LCLTs are useful.

      One approach to LCLTs that’s in the literature is to use the Berry-Esseen theorem together with a separate argument that the distribution is “flat”, in the sense that if you change (x,y) by a small amount, then the probability of landing at (x,y) does not vary by much. Then one can estimate probabilities by using the Berry-Esseen theorem to get the probability of being inside some small region and the flatness to get the probabilities of the individual points in that region (which are roughly the same and have a known sum, so can be calculated). Although I chose to use characteristic functions above, I think that approach could work as well.

  2. gowers Says:

    I’ve noticed a subtlety that makes part of my sketch in the post above not correct as stated. I claimed that it was probably the case that for a typical A the numbers h_A(j)-h_A(j+m) should be pretty evenly distributed in a range of width about \sqrt n.

    Here’s why one might think that is true. Suppose we add 1 to j. Then to the given number we will add h_A(j+1)-h_A(j) and subtract h_A(j+m+1)-h_A(j+m). But h_A(j+1)-h_A(j) is one less than the multiplicity with which j+1 occurs in A, and we would expect these multiplicities to behave like independent Poisson random variables of mean 1. So what we have here looks like a random walk where each step is given by a difference of two independent Poisson random variables. After several steps, the distribution should be approximately Gaussian, but what we’re interested in is not the distribution of the end point, but rather the distribution of a random point of the random walk. And this is a somewhat more slippery object.

    If we want to use the technique of pretending variables are independent, proving that something holds with very high probability, and then conditioning on an event that isn’t too unlikely, that limits what we can hope to prove here. For example, the probability that the normal random walk stays positive for n steps is of order n^{-1/2}, so if we then condition on an event of probability n^{-1} we cannot rule out that this will happen. And in fact, I think that these kinds of things actually will happen: it seems that conditioning the sum of A to be n(n+1)/2 makes it quite likely that h_A will be broadly positive for the first half and broadly negative for the second, in which case h_A(j)-h_A(j+m) would be broadly positive.

    However, for the purposes of bounding the Fourier coefficient, we don’t need to nail down the distribution too precisely: we just need it to avoid having “silly” properties such as h_A(j)-h_A(j+m) always being even. I’d also quite like to know that it doesn’t take the same value too many times.

    To that end, here’s a simple question to which the answer is probably known, but a quick Google search didn’t reveal it to me. Suppose we take a standard one-dimensional random walk for n steps. Let Z be the number of times the origin is visited. For what value of t does \mathbb P[Z\geq t] become smaller than n^{-3}? (The 3 there is just some arbitrary constant that’s safely bigger than 1.) I would expect the answer to be something like \sqrt n times a logarithmic factor, since we would expect the walk to wander around in an interval of width of order \sqrt n, not necessarily visiting all of it, but not being too concentrated on any one part of it.
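
    A quick simulation sketch of this question (for a simple \pm 1 walk, with arbitrary n and number of trials), just to get a feel for the size of Z:

    import numpy as np

    # A simulation sketch: Z is the number of returns to the origin of a simple
    # +-1 random walk of n steps.  The guess is that the tail of Z becomes
    # negligible at around sqrt(n) times a logarithmic factor.  The values of n
    # and the number of trials are arbitrary.
    rng = np.random.default_rng(3)
    n, trials = 10000, 2000
    Z = np.empty(trials, dtype=int)
    for t in range(trials):
        walk = np.cumsum(rng.choice([-1, 1], size=n))
        Z[t] = np.count_nonzero(walk == 0)

    print("sqrt(n):", int(np.sqrt(n)))
    print("median, 99th percentile, max of Z:",
          int(np.median(Z)), int(np.percentile(Z, 99)), int(Z.max()))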

  3. gowers Says:

    There’s probably a much nicer way of doing things, but I think for the question in the last paragraph of the comment above, the following fairly crude approach should give a bound that’s just about OK for the purposes we need. It’s to get some kind of estimate for the kth moment of Z, where k is a positive integer that one optimizes. The random variable Z^k counts ordered k-tuples of points at which the origin is visited. There are n^k possibilities for their x-coordinates. If we take one of these k-tuples and call it (x_1,\dots,x_k) (putting them in increasing order), then the probability that we’re zero at x_1 (ignoring parity, which in any case doesn’t apply to the random walk we’re ultimately interested in) is about x_1^{-1/2}. Then the probability that we’re zero at x_2 given that we’re zero at x_1 is about (x_2-x_1)^{-1/2}, and so on. For a typical k-tuple, this should give us something like (n/k)^{-k/2} (though it will often be somewhat bigger, so some care would be needed for this part of the argument). So ignoring a constant factor that depends on k (which will be important when we optimize), it looks as though the kth moment should be something like n^{k/2}.

    Therefore, by Chebyshev, the probability that Z\geq t should be bounded above by something like n^{k/2}/t^k. If we want that to be at most n^{-3}, then we need t to be at least n^{1/2+3/k}. This will be multiplied by a constant factor that depends on k, but it should give us an upper bound of n^{1/2+o(1)}.

  4. Bogdan Grechuk Says:

    Maybe we should do one more “random walk iteration”? Fix some j, and let (X,Y) be a random vector taking the n values (I(i,j), i-(n+1)/2), \, i=1,2,\dots,n, with equal probability, where I(i,j) is an indicator function: I(i,j)=1 if i \leq j and I(i,j)=0 if i>j. Assume that we can prove some sort of local central limit theorem for this very simple random walk. Then we can control probabilities like P(U_n = a, V_n = b), where (U_n, V_n) is the sum of n i.i.d. copies of (X,Y). But P(U_n = k+j, V_n = 0) / P(V_n = 0) is exactly the probability that h_A(j) = k for a random die A.

    With indicator function I'(i,j)=1 if j<i \leq j+m but I'(i,j)=0 otherwise we should be able to control h_A(j+m)- h_A(j) in a similar way.
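
    As a rough illustration of this suggestion (a sketch with small, arbitrary values of n and j), the ratio can be computed exactly by dynamic programming and checked against rejection sampling of dice:

    import numpy as np

    # A sketch: for fixed j, compute P(U_n = k+j, V_n = 0)/P(V_n = 0) exactly by
    # dynamic programming over the n steps, and compare it with a Monte Carlo
    # estimate of P(h_A(j) = k) for a random die A.  The values of n and j are
    # small and arbitrary.
    n, j = 20, 7
    target = n * (n + 1) // 2

    # dp[u, s] = P(#{i : a_i <= j} = u and sum_i a_i = s) after all n steps
    dp = np.zeros((n + 1, n * n + 1))
    dp[0, 0] = 1.0
    for _ in range(n):
        new = np.zeros_like(dp)
        for face in range(1, n + 1):
            shift = 1 if face <= j else 0
            new[shift:, face:] += dp[: n + 1 - shift, : n * n + 1 - face] / n
        dp = new

    cond = dp[:, target] / dp[:, target].sum()       # law of U_n given the sum condition
    exact = {k: cond[k + j] for k in range(-3, 4)}   # P(h_A(j) = k), since h_A(j) = U_n - j

    # Monte Carlo cross-check by rejection-sampling dice
    rng = np.random.default_rng(4)
    hits, kept = {k: 0 for k in exact}, 0
    while kept < 2000:
        a = rng.integers(1, n + 1, n)
        if a.sum() == target:
            kept += 1
            k = int(np.count_nonzero(a <= j)) - j
            if k in hits:
                hits[k] += 1
    for k in exact:
        print(k, float(exact[k]), hits[k] / kept)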

    • gowers Says:

      That could be useful, but would it give us information about the typical behaviour of the distribution of h_A(j)? We want to know not just what the probability is that it takes a given value, but quite a lot about how those probabilities for different j depend on each other.

  5. Oliver Johnson Says:

    If you have weak dependence of your variables, particularly if there’s symmetry between them, it might be worth trying Stein’s method https://statweb.stanford.edu/~souravc/beam-icm-trans.pdf as an alternative to controlling the characteristic function. For example, you can get the Berry-Esseen rate for independent variables fairly easily.

    Unfortunately, I don’t know how easy it is to get a local limit theorem using Stein – it tends to be used to control total variation.

  6. Josh Burdick Says:

    I came up with a possible definition of “sampling a probability density defined on [0,1]” (or at least, sampling from countably-many points in that interval). Such dice seem to be transitive, _if_ you center their expected value.

    I’m not quite sure this is relevant; I think it might be, though, because some of the discussion seems to involve the difference between continuous and discrete probability variables. Even if it doesn’t help settle the question, I think it’s an interesting sidelight.

    It’s on Github at

    fuzzyDice.pdf

    (Actually, my main motivation for doing this was that I like the description “fuzzy dice” 🙂

    • P. Peng Says:

      This looks interesting, but I’m not sure I’m understanding. Can you explain a bit more about your model?

      I’m not familiar with R, so I don’t quite follow the details of this generation (or the comparison function). But it roughly appears to generate a continuous function on [0,1] with length_scale scaling the derivatives so as to affect the length scale over which the function values are roughly equal. Is that understanding at least close?

      Unless I misunderstood your model, it sounds like the only difference is that in the large n limit, this model yields a continuous distribution whereas the discrete dice would represent a discontinuous distribution. If that is the only difference, then somehow the intransitivity _depends_ on the discontinuity!? Somehow normalized continuous functions on [0,1] are almost totally ordered (ties and exceptions to ordering are of probability 0), whereas normalized discontinuous functions are almost perfectly “not ordered”? Is there some simple reason this should be obvious? If true, this seems fascinating and non-intuitive to me.

      Would this mean that in the large n limit, if an idea doesn’t somehow capture the essential “discontinuities”, and instead approximates it as something smooth, it is bound to fail to describe the discrete dice?

      So I hope you continue poking at this continuous model of dice to pull out some more information. It sounds interesting and could also be telling us something important.

  7. Bogdan Grechuk Says:

    A short numerical experiment about the last paragraph of the post. I selected n=10,000, took the uniform distribution on (h_A(j), j-(n+1)/2) as suggested, convolved it once with itself, and plotted the graph: the x axis is k/\sqrt{n} for -\sqrt{n} \leq k \leq \sqrt{n}, and the y axis is the number of times h_A(i) + h_A(j) = k subject to i+j=n+1. The graph is http://imgur.com/Gpf2MiJ (not sure how to upload it here). Does it look “close” to being about 100 for every k in this interval?

    • gowers Says:

      It looks close in the sense I intended, yes. We know it has to be supported in a rectangle of width of order n and height of order \sqrt n. If almost no points occur with probability greater than 100 n^{-3/2}, that forces the distribution after two steps of the random walk to be pretty spread out, and then a local CLT should be achievable — though there’s still the problem of making sure that the distribution isn’t concentrated in a sublattice.

  8. Josh Burdick Says:

    A clarification about the continuous model: when I sampled (a sort of) “continuous” dice, with no restriction on their expected value, they didn’t seem transitive. However, when I forced their expected value to be 0.5, they did seem transitive.

    This seems vaguely analogous to forcing the sum of discrete dice to be n(n+1)/2, but other than that, this may not be relevant to whether discrete dice are intransitive, or not.

  9. Helge Dietert Says:

    (Hopefully the Latex code works…)

    Maybe some other approach to bound the Fourier transform focusing on
    the direct differences.

    Consider the Fourier transform
    \mathcal{F} = \frac{1}{n} \sum_{j=1}^{n}   e^{2\pi i\alpha h_A(j) + 2 \pi i \beta j}

    (I drop the constant shift in the second component, because it only
    adds a constant factor with modulus one)

    Let

    \Delta_k = (h_A(k) - h_A(k-1),1)

    with the convention h_A(0)=0. Then

    (h_A(j),j) = \sum_{k=1}^{j} \Delta_k.

    With this

    \mathcal{F} = \frac{1}{n} \sum_{j=1}^{n}                 e^{2\pi i (\alpha,\beta) \cdot \sum_{k=1}^j \Delta_k}

    = \frac{1}{n} \sum_{j=1}^{n} \prod_{k=1}^{j}                 e^{2\pi i (\alpha,\beta) \cdot \Delta_k}

    Let

    g_k = e^{2\pi i (\alpha,\beta) \cdot \Delta_k}.

    Then we can rewrite the Fourier transform as

    \mathcal{F} = \frac{1}{n} [g_1 + g_1 g_2 + \dots + g_1 g_2 \cdots g_n]

    = \frac{1}{n} \Bigg[\frac{g_1}{2} + g_1 \frac{(1+g_2)}{2}     + g_1 g_2 \frac{(1+g_3)}{2}

    + g_1 g_2 g_3 \frac{(1+g_4)}{2}     + \dots     + g_1 g_2 g_3 \dots g_{n-1} \frac{(1+g_n)}{2}     + \frac{g_1 g_2 \dots g_{n}}{2}\Bigg]

    As |g_k| \le 1, the absolute value can be bounded as

    |\mathcal{F}| \le   \frac{1}{n}   \Bigg[   1 + \frac{|1+g_2|}{2}   + \frac{|1+g_3|}{2}   + \dots   + \frac{|1+g_n|}{2}   \Bigg].

    The bracket consists of n summands, each of which is at most 1. Hence
    if we can bound a positive fraction of the summands |1+g_k|/2 strictly
    away from 1, then |\mathcal{F}| is strictly bounded away from 1.

    As an illustration, assume that the die A is such that there exist
    positive constants \delta_{-1}, \delta_0 and \delta_1 such that

    | \{ k \in [n] : \Delta_k = (-1,1) \} | \ge \delta_{-1} n

    and

    | \{ k \in [n] : \Delta_k = (0,1) \} | \ge \delta_0 n

    and

    | \{ k \in [n] : \Delta_k = (1,1) \} | \ge \delta_1 n

    (I guess that could already be true for a typical die A, but
    otherwise the argument could easily be refined. Moreover, looking at
    the distribution of h_A(k) - h_A(k-1) for a typical die seems approachable.)

    Then for \gamma around 0 (in the torus topology),
    there exists a constant such that

    \frac{|1+e^{2\pi i \gamma}|}{2} \le 1 - c | \gamma |

    (if we are not close to 0, the modulus of the Fourier transform is easily bounded
    away from 1).
    This shows

    \frac{|1+e^{2\pi i \beta}|}{2} \le 1 - c | \beta |

    and

    \frac{|1+e^{2\pi i (\pm \alpha+\beta)}|}{2}   \le 1 - c\, | {\pm} \alpha {+} \beta |

    With the constants \delta_{-1},\delta_0,\delta_1, we can then easily
    bound the Fourier transform as

    | \mathcal{F} |   \le 1 - c \delta_{-1} |{-}\alpha{+}\beta|   - c \delta_{0} |\beta|   - c \delta_{1} |\alpha{+}\beta|.

    This seems pretty good to me, because we only need a bound when
    \alpha is relatively large (in particular I guess significantly
    larger than 1/n) and this shows that

    | \mathcal{F} | \le 1 - \tilde{c} |\alpha|

    for an appropriate constant \tilde{c}.

    N.B.: We can recover the approach sketched in the post, in a
    generalised setting, by comparing each term with its mth neighbouring
    term, i.e. writing the sum as

    \mathcal{F} = \frac{1}{n}   \Bigg[   g_1 \frac{(1+g_2g_3\dots g_{2+m-1})}{2}
    + g_1 g_2 \frac{(1+g_3g_4\dots g_{3+m-1})}{2}   + \dots   + \text{boundary terms}   \Bigg]

    so that we need to bound terms of the form

    \frac{|1+g_{k+1}\dots g_{k+m}|}{2}   = \frac{\Big|1 + e^{2\pi i (\alpha,\beta) \cdot \Delta_{k+m,k}}\Big|}{2},

    where

    \Delta_{k+m,k} = (h_A(k+m),k+m) - (h_A(k),k).

    In the approach above, m was taken to be n/2, but for refined
    estimates maybe m=\sqrt{n} could be more helpful.
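
    For what it is worth, here is a small numerical sketch of the inequality |\mathcal{F}| \le \frac{1}{n}\big[1 + \sum_{k\ge 2}|1+g_k|/2\big] for one rejection-sampled die (the values of n, \alpha and \beta are arbitrary, and the constant shift in the second coordinate is dropped as above):

    import numpy as np

    # A numerical sketch of the bound |F| <= (1/n)[1 + sum_{k>=2} |1+g_k|/2]
    # for one rejection-sampled die.  The constant modulus-one factor coming from
    # the shift j -> j-(n+1)/2 is dropped, as above; n, alpha, beta are arbitrary.
    rng = np.random.default_rng(7)
    n = 500
    while True:
        a = rng.integers(1, n + 1, n)
        if a.sum() == n * (n + 1) // 2:
            break

    ks = np.arange(1, n + 1)
    hA = np.cumsum(np.bincount(a, minlength=n + 1)[1:]) - ks     # h_A(k)
    delta1 = np.diff(np.concatenate(([0], hA)))                  # h_A(k)-h_A(k-1), with h_A(0)=0
    alpha, beta = 0.01, 0.003
    g = np.exp(2j * np.pi * (alpha * delta1 + beta))             # g_k = e^{2 pi i (alpha,beta).Delta_k}

    F = np.mean(np.cumprod(g))                   # (1/n) sum_j g_1 g_2 ... g_j
    bound = (1 + np.sum(np.abs(1 + g[1:]) / 2)) / n
    print(abs(F), "<=", bound)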

    • Helge Dietert Says:

      I just tried to code up some rejection sampling, and unless I have made a coding error, it seems to me that h_A(k)-h_A(k-1) nearly always takes the values -1, 0 or 1, and only rarely takes slightly larger values (typically well below 10). So the plotting suggests that the approach with the \delta_{-1}, \delta_0, \delta_1 should work.

      The Python code I used is below:

      import numpy as np
      import numpy.random
      import matplotlib.pylab as plt

      def sample(n):
          """Rejection-sample a die: an element of [n]^n with sum n(n+1)/2."""
          while True:
              # randint excludes the upper bound, so this gives faces in 1..n
              dist = numpy.random.randint(1, n + 1, n)
              if np.sum(dist) == n * (n + 1) // 2:
                  return dist

      def delta(dist, k):
          """Return the first component of Delta_k, i.e. h_A(k) - h_A(k-1)."""
          return np.sum(dist == k) - 1

      n = 10000
      dist = sample(n)
      kList = np.arange(n) + 1
      deltaList = np.array([delta(dist, k) for k in kList])

      # Plot Delta_k directly against k
      plt.figure()
      plt.plot(kList, deltaList)
      plt.xlabel('k')
      plt.ylabel(r'$\Delta_k$')

      # Histogram of the values of h_A(k) - h_A(k-1)
      deltaListP = deltaList - deltaList.min()       # shift so that bincount sees non-negative values
      binData = np.bincount(deltaListP)
      binLabels = np.arange(len(binData)) + deltaList.min()
      plt.figure()
      plt.bar(binLabels, binData / n)
      plt.xlabel(r'$h_A(k)-h_A(k-1)$')
      plt.ylabel('Relative frequency')
      plt.show()

    • gowers Says:

      That’s interesting: I expected the distribution to be roughly Poisson, or rather that the probability that h_A(k)-h_A(k-1)=t should be roughly e^{-1}/(t+1)! for t=-1,0,1,2,\dots. Is that consistent with your data?

    • gowers Says:

      I’ve realized while trying to write up the proof that you make a false statement above, when you say

      “Then for \gamma around 0 (in the torus topology),
      there exists a constant such that

      \frac{|1+e^{2\pi i \gamma}|}{2} \le 1 - c | \gamma |”

      That final bound should be 1-c\gamma^2, and unfortunately that weakens the bound to the point where it is no longer good enough. However, that just means that looking at successive differences is insufficient, and I’m pretty sure that I am on track to complete the proof with a more complicated argument that looks at longer-range differences. I’m travelling at the moment, so it will probably take me a week or two to have a complete draft of the write-up.

  10. Helge Dietert Says:

    The guess that the probability that h_A(k) - h_A(k-1)=t is roughly e^{-1}/(t+1)! seems pretty consistent with the samples (I took n=5000 and n=10000). If you sample from that distribution, it is very unlikely to get anything bigger than, say, 10. That at least suggests to me that we are likely to get a positive fraction for each of \delta_{-1},\delta_0,\delta_1.

    Just for the record: By the definition one finds immediately that h_A(k) - h_A(k-1) = |{ j : a_j = k}| -1 (I guess that was the intuition behind the Poisson distribution). Basically I guess the only question is whether the conditioning destroys anything, but that seems unlikely to me.
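
    A short sketch of that comparison (with a smaller, arbitrary n so that the rejection sampling is quick): the observed frequencies of h_A(k)-h_A(k-1)=t against the guessed values e^{-1}/(t+1)!.

    import numpy as np
    from math import exp, factorial

    # Compare observed frequencies of h_A(k)-h_A(k-1) = t with the guess
    # e^{-1}/(t+1)!.  The value of n and the seed are arbitrary.
    rng = np.random.default_rng(6)
    n = 1000
    while True:                                    # rejection-sample a die
        a = rng.integers(1, n + 1, n)
        if a.sum() == n * (n + 1) // 2:
            break

    counts = np.bincount(a, minlength=n + 1)[1:]   # multiplicity of each face value 1..n
    diffs = counts - 1                             # h_A(k) - h_A(k-1)
    for t in range(-1, 5):
        print(t, float(np.mean(diffs == t)), exp(-1) / factorial(t + 1))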

  11. gowers Says:

    I’ve just read your earlier comment a bit more carefully and it looks interesting and useful. I’m quite busy at the moment, and it’s that rather than any diminishing enthusiasm that has caused me not to comment much for a few days. I definitely intend to make a serious effort to finish off this argument, unless someone else has already done it by the time I’m ready.

  12. gowers Says:

    After the recent comments of Helge Dietert above, I think we’ve got to the point where it is worth trying to write something up. I will have a go at this, and when I either have a draft or get stuck, I’ll share the resulting file and see whether anyone else would like to rewrite parts of it, add to it, etc. And if we end up with a complete proof, we can think about whether we want to push on and try to solve the problem for the multiset model, and also try to disprove the stronger conjecture that the tournament is quasirandom.

  13. gowers Says:

    I have made a start with a write-up. I can’t quite claim that we have a proof, though I think we probably do. I had intended to post a complete draft, but it went more slowly than I had hoped, and now I have a very busy couple of weeks ahead and will be unable to spend time on this for a while. If anyone feels like doing some writing — either improving what’s there already or adding to it — I’ll be happy to send the source file. Just let me know.

    This is what I have done so far.

    • Simon Krzysiak Says:

      Noticed a small typo: on page 5, I believe you meant to write that the mean of (g_A(j),j-(n+1)/2) is (0,0), instead of the mean of (g_A(j),j).

    • gowers Says:

      Thanks for that and indeed that is what I meant to write — I’ve now corrected it.

  14. gowers Says:

    One step that was missing in the write-up has now, I think, essentially been filled. I asked a closely related question on Mathoverflow and have received a useful answer that will I think easily translate over into our context (to give a lower bound for \mathbb EU^2 — see draft for definition of the random variable U).

  15. gowers Says:

    A thought has occurred to me that might make a nice addition to the write-up. I think it ought to be possible to generalize the proof to show that if A and B are random sequences, with A conditioned to have sum n(n+1)/2+a and B conditioned to have sum n(n+1)/2+b, and if b-a\geq\alpha n^{3/2} for some absolute constant \alpha>0 (which is the typical order of magnitude of the difference of two sums of n random elements of [n]), then B beats A with probability 1-o(1). If this is true, it will prove rigorously something that the experimental evidence already shows, namely that intransitivity is extremely rare for purely random elements of [n]^n. That’s because whether or not A beats B will depend almost entirely on which has the larger sum, which is of course a transitive relation.
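
    A Monte Carlo sketch of this claim (with illustrative values of n and of the gap b-a, nothing tuned): sample A and B with the two prescribed sums by rejection and count how often B beats A.

    import numpy as np

    # A Monte Carlo sketch: A has face-sum n(n+1)/2 and B has face-sum
    # n(n+1)/2 + b with b comparable to n^{3/2}; count how often B beats A,
    # i.e. there are more pairs (i,j) with b_j > a_i than with a_i > b_j.
    # All parameter choices here are illustrative.
    rng = np.random.default_rng(8)
    n = 100
    b = int(0.5 * n ** 1.5)

    def sample_with_sum(target):
        while True:
            d = rng.integers(1, n + 1, n)
            if d.sum() == target:
                return d

    wins, trials = 0, 30
    for _ in range(trials):
        A = sample_with_sum(n * (n + 1) // 2)
        B = sample_with_sum(n * (n + 1) // 2 + b)
        wins += np.sign(B[:, None] - A[None, :]).sum() > 0
    print(wins, "out of", trials)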

  16. P. Peng Says:

    Because uniform random “constrained sum sequence dice” can be viewed as just a non-uniform random selection from “proper/multiset dice”, is there a way to carry the sequence results over to say something about the original conjecture for proper dice?
    Maybe, for instance, by pulling on the thread that the non-uniform weighting changes the weights equally on (die A) and (inverse die A)?

    • gowers Says:

      I’m keen to think about this once the write-up is finished, which I hope will be soon. I certainly hope that the multiset version will turn out not to require a substantial new idea.

  17. gowers Says:

    At last I have finished a first draft of the write-up. There are a lot of slightly fiddly estimates, so there is a high chance of minor errors, but I’m fairly sure the argument is robust and basically correct. (But it’s definitely necessary to check this, and I’d rather that checking was done by a fresh pair of eyes.) Anyhow, it can be found here.

    It still needs a bibliography, some comments about the experimental evidence for the stronger quasirandomness conjecture, and I also think we should try to solve two further problems. The first is the question of whether we can prove rigorously that the stronger quasirandomness conjecture is false, and the second is whether we can use similar techniques to prove the weaker conjecture for the multisets model. And one other thing that I very much want to add is a rigorous proof that if you just take random elements of [n]^n, then which die wins is with very high probability determined by the sum of the faces of the dice: that is, the die with the bigger sum almost certainly wins. I think that should in fact be quite a lot easier than what is proved in the draft so far — it just needs doing, as it explains why in the unconstrained model the experimental evidence suggests that one gets transitivity almost all the time.

    • gowers Says:

      I forgot to add that in the write-up there is a proof of a lower bound for \mathbb EU^2 which is then not used. That is because I erroneously thought I was going to use it, and haven’t quite taken the step of deleting it (which involves a small amount of reorganization).

    • Bogdan Grechuk Says:

      Thank you for the update. May I ask for a small clarification? Theorem 7.2 is formulated for A being a random die (that is, a random element of [n]^n with sum n(n+1)/2). The proof starts by assuming that A is a random element of [n]^n (that is, without any condition on the sum), but then the condition is used in the proof (for example, when we say that G is an even function, we assume that the mean of G is 0). At which point of the proof does A stop being a random element of [n]^n and become a random die?

    • gowers Says:

      Hmm, that may be a slip. I’ll have to think about it.

    • gowers Says:

      OK, let me think aloud.

      The idea of the proof was to prove that A has certain properties with sufficiently high probability that when we condition on the sum being zero it still has those properties. The main properties are the following two.

      1. \|U\|_\infty=O(\sqrt{n\log n}).

      2. The characteristic function of (U,V)^{*n} is very small when (\alpha,\beta) are not both small.

      I think that in the rest of the argument I don’t use probabilistic arguments, but just these properties. So what I have written is not expressed clearly but I think it is correct.

      What I think I should have done is got to the end of Section 5 and then fixed an A that had the desired properties, together with a sum that is equal to n(n+1)/2, noting that almost all random n-sided dice have this property (which is the main content of Section 5). After that, probabilistic arguments would be banned and the rest of the paper should be deterministic. I need to check, but I think that the only reason certain statements after Section 5 are true only with high probability is that they make use of statements in Section 5 that themselves hold with high probability.

      So, to summarize, thank you for picking that up, and at the moment I am fairly confident that it will be easy to correct.

    • gowers Says:

      I’ve now updated the link in the comment that starts this thread, so that it links to an updated file. I hope it is now OK.

    • gowers Says:

      I now have a sketch proof for the result that in the “purely random” model of n-sided dice — that is, the model where a random n-sided die is just a random element of [n]^n — transitivity almost always occurs. I’ll be travelling tomorrow, but will try to add this result to the write-up in the next day or two after that. As I suspected, the proof is short, and does not need a local central limit theorem — just easy tail bounds for sums of independent random variables.
