Intransitive dice III

I now feel more optimistic about the prospects for this project. I don’t know whether we’ll solve the problem, but I think there’s a chance. But it seems that there is after all enough appetite to make it an “official” Polymath project. Perhaps we could also have an understanding that the pace of the project will be a little slower than it has been for most other projects. I myself have various other mathematical projects on the boil, so can’t spend too much time on this one, but quite like the idea of giving it an occasional go when the mood takes me, and trying to make slow but steady progress. So I’ve created a polymath13 category, into which this post now fits. I’ve also retrospectively changed the category for the previous two posts. I don’t think we’ve got to the stage where a wiki will be particularly useful, but I don’t rule that out at some point in the future.

In this post I want to expand on part of the previous one, to try to understand better what would need to be true for the quasirandomness assertion to be true. I’ll repeat a few simple definitions and simple facts needed to make the post more self-contained.

By an n-sided die I mean a sequence in [n]^n (where [n] is shorthand for \{1,2,\dots,n\}) that adds up to n(n+1)/2. Given an n-sided die A=(a_1,\dots,a_n) and j\in[n], I define g_A(j) to be the number of i such that a_i\leq j and h_A(j) to be g_A(j)-j.

We can write h_A(j) as \sum_i\mathbf 1_{[a_i\leq j]}-j. Therefore, if C=(c_1,\dots,c_n) is another die, or even just an arbitrary sequence in [n]^n, we have that
\sum_jh_A(c_j)=\sum_j(\sum_i\mathbf 1_{[a_i\leq c_j]}-c_j)=\sum_{i,j}\mathbf 1_{[a_i\leq c_j]} - \sum_jc_j.
If \sum_jc_j=n(n+1)/2 and no a_i is equal to any c_j, then the sign of this sum therefore tells us whether A beats C. For most A, we don’t expect many ties, so the sign of the sum is a reasonable, but not perfect, proxy for which of the two dice wins. (With a slightly more complicated function we can avoid the problem of ties: I shall stick with the simpler one for ease of exposition, but would expect that if proofs could be got to work, then we would switch to the more complicated functions.)
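
To fix ideas, here is a minimal Python sketch of these definitions (the function names are my own, and the die sampler is plain rejection sampling, which is fine for moderate n but slow for large n):

```python
import random

def random_die(n):
    """A random n-sided die: a sequence in [n]^n with sum n(n+1)/2, by rejection."""
    target = n * (n + 1) // 2
    while True:
        A = [random.randint(1, n) for _ in range(n)]
        if sum(A) == target:
            return A

def h(A, j):
    """h_A(j) = #{i : a_i <= j} - j."""
    return sum(1 for a in A if a <= j) - j

def score(A, C):
    """sum_j h_A(c_j): a positive value is a proxy for 'A beats C' (ignoring ties)."""
    return sum(h(A, c) for c in C)
```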

This motivates the following question. Let A and B be two random dice. Is it the case that with high probability the remaining dice C are split into four sets of roughly equal size according to the signs of \sum_jh_A(c_j) and \sum_jh_B(c_j)? I expect the answer to this question to be the same as the answer to the original transitivity question, but I haven’t checked as carefully as I should that my cavalier approach to ties isn’t problematic.

I propose the following way of tackling this question. We fix A and B and then choose a purely random sequence C=(c_1,\dots,c_n) (that is, with no constraint on the sum) and look at the 3D random variable
(\sum_jh_A(c_j),\sum_jh_B(c_j),\sum_j(c_j-(n+1)/2)).
Each coordinate separately is a sum of independent random variables with mean zero, so provided not too many of the h_A or h_B are zero, which for random A and B is a reasonable assumption, we should get something that approximates a trivariate normal distribution.

Therefore, we should expect that when we condition on \sum_j(c_j-(n+1)/2) being zero, we will get something that approximates a bivariate normal distribution. Although that may not be completely straightforward to prove rigorously, tools such as the Berry-Esseen theorem ought to be helpful, and I’d be surprised if this was impossibly hard. But for now I’m aiming at a heuristic argument, so I want simply to assume it.

What we want is for the signs of the first two coordinates to be approximately independent, which I think is equivalent to saying (assuming normality) that the first two coordinates themselves are approximately independent.

However, what makes the question interesting is that the first two coordinates are definitely not independent without the conditioning: the random variables \sum_jh_A(c_j) and \sum_jh_B(c_j) are typically quite strongly correlated. (There are good reasons to expect this to be the case, and I’ve tested it computationally too.) Also, we expect correlations between these variables and \sum_j(c_j-(n+1)/2). So what we are asking for is that all these correlations should disappear when we condition appropriately. More geometrically, there is a certain ellipsoid, and we want its intersection with a certain plane to be a circle.

The main aim of this post is to make the last paragraph more precise. That is, I want to take three standard normal random variables X, Y and Z that are not independent, and understand precisely the circumstances that guarantee that X and Y become independent when we condition on Z.

The joint distribution of (X,Y,Z) is determined by the matrix of correlations. Let this matrix be split up as \begin{pmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\\ \end{pmatrix}, where \Sigma_{11} is the 2\times 2 covariance matrix of (X,Y), \Sigma_{12} is a 2\times 1 matrix, \Sigma_{21} is a 1\times 2 matrix and \Sigma_{22} is the matrix (1). A general result about conditioning joint normal distributions on a subset of the variables tells us, if I understand the result correctly, that the covariance matrix of (X,Y) when we condition on the value of Z is \Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. (I got this from Wikipedia. It seems to be quite tricky to prove, so I hope it really can be used as a black box.) So in our case if we have a covariance matrix \begin{pmatrix}1&\lambda&\mu\\ \lambda&1&\nu\\ \mu&\nu&1\\ \end{pmatrix} then the covariance matrix of (X,Y) conditioned on Z should be \begin{pmatrix}1-\mu^2&\lambda-\mu\nu\\ \lambda-\mu\nu&1-\nu^2\\ \end{pmatrix}.

That looks dimensionally odd because I normalized the random variables to have variance 1. If instead I had started with the more general covariance matrix \begin{pmatrix}a&\lambda&\mu\\ \lambda&b&\nu\\ \mu&\nu&c\\ \end{pmatrix} I would have ended up with \begin{pmatrix}a-c^{-1}\mu^2&\lambda-c^{-1}\mu\nu\\ \lambda-c^{-1}\mu\nu&b-c^{-1}\nu^2\\ \end{pmatrix}.
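
For what it’s worth, here is a small numpy sketch that one could use to sanity-check this conditional-covariance formula numerically; the particular values a=b=c=1, \lambda=0.5, \mu=0.6, \nu=0.3 are arbitrary, and conditioning on Z=0 is approximated crudely by keeping samples with |Z| small.

```python
import numpy as np

a, b, c = 1.0, 1.0, 1.0
lam, mu, nu = 0.5, 0.6, 0.3
Sigma = np.array([[a, lam, mu],
                  [lam, b, nu],
                  [mu, nu, c]])

# predicted conditional covariance of (X, Y) given Z:
# Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
predicted = Sigma[:2, :2] - np.outer(Sigma[:2, 2], Sigma[2, :2]) / Sigma[2, 2]

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(3), Sigma, size=2_000_000)
kept = samples[np.abs(samples[:, 2]) < 0.01]      # crude surrogate for conditioning on Z = 0
empirical = np.cov(kept[:, 0], kept[:, 1])

print(predicted)
print(empirical)
```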

So after the conditioning, if we want X and Y to become independent, we appear to want \lambda c to equal \mu\nu. That is, we want
\langle X,Z\rangle\langle Y,Z\rangle=\langle X,Y\rangle\langle Z,Z\rangle
where I am using angle brackets for covariances.

If we divide each variable by its standard deviation, that gives us that the correlation between X and Y should be the product of the correlation between X and Z and the correlation between Y and Z.

I wrote some code to test this, and the relationship seemed to fail, and not just by a small margin, but I am not confident that I didn’t make careless mistakes in the code. (However, my correlations were reasonable numbers in the range [-1,1], so any mistakes there might have been didn’t jump out at me. I might just rewrite the code from scratch without looking at the old version.)

One final remark I’d like to make is that if you feel there is something familiar about the expression \langle x,z\rangle\langle y,z\rangle-\langle x,y\rangle\langle z,z\rangle, then you are not completely wrong. The formula for the vector triple product is

x\wedge(y\wedge z)=\langle x,z\rangle y - \langle x,y\rangle z.

Therefore, the expression can be condensed to \langle x\wedge(y\wedge z),z\rangle. Now this is the scalar triple product of the three vectors x, y\wedge z, and z. For this to be zero, we need x to lie in the plane generated by y\wedge z and z. Note that y\wedge z is orthogonal to both y and z. So if P is the orthogonal projection to the subspace generated by z, we want x-Px to be orthogonal to y. Actually, that can be read out of the original formula too, since it is \langle\langle x,z\rangle z - \langle z,z\rangle x,y\rangle. A nicer way of thinking of it (because more symmetrical) is that we want the orthogonal projections of x and y to the subspace orthogonal to z to be orthogonal. To check that, assuming (WLOG) that \|z\|_2=1,
\langle x-\langle x,z\rangle z,y-\langle y,z\rangle z\rangle=\langle x,y\rangle-\langle x,z\rangle\langle y,z\rangle.

So what I’d like to see done (but I’m certainly not saying it’s the only thing worth doing) is the following.

1. Test experimentally whether for a random pair (A,B) of n-sided dice we find that the correlations of the random variables X=\sum_jh_A(c_j), Y=\sum_jh_B(c_j) and Z=\sum_j(c_j-(n+1)/2) really do appear to satisfy the relationship

\mathrm{corr}(X,Z)\cdot\mathrm{corr}(Y,Z)=\mathrm{corr}(X,Y).

Here the c_j are chosen randomly without any conditioning on their sum (a sketch of such a test is given at the end of the post). My experiment seemed to indicate not, but I’m hoping I made a mistake.

2. If they do satisfy that relationship, then we can start to think about why.

3. If they do not satisfy it, then we can start to think about why not. In particular, which of the heuristic assumptions used to suggest that they should satisfy that relationship is wrong — or is it my understanding of multivariate normals that is faulty?

If we manage to prove that they typically do satisfy that relationship, at least approximately, then we can think about whether various distributions become sufficiently normal sufficiently quickly for that to imply that intransitivity occurs with probability 1/4.
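
Returning to item 1, here is a rough sketch of the kind of test I have in mind (my own quick version, so treat it with the same suspicion as my earlier code; n=100 and 20000 samples are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_die(n):
    # uniform over sequences in [n]^n with sum n(n+1)/2, by rejection
    target = n * (n + 1) // 2
    while True:
        A = rng.integers(1, n + 1, size=n)
        if A.sum() == target:
            return A

def h_values(A, n):
    # the vector (h_A(1), ..., h_A(n)) where h_A(j) = #{i : a_i <= j} - j
    return np.cumsum(np.bincount(A, minlength=n + 1)[1:]) - np.arange(1, n + 1)

n, trials = 100, 20000
A, B = random_die(n), random_die(n)
hA, hB = h_values(A, n), h_values(B, n)

C = rng.integers(1, n + 1, size=(trials, n))       # purely random C, sum unconstrained
X = hA[C - 1].sum(axis=1)
Y = hB[C - 1].sum(axis=1)
Z = (C - (n + 1) / 2).sum(axis=1)

corr = np.corrcoef(np.vstack([X, Y, Z]))
print("corr(X,Y)           =", corr[0, 1])
print("corr(X,Z)*corr(Y,Z) =", corr[0, 2] * corr[1, 2])
```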

66 Responses to “Intransitive dice III”

  1. Luke Pebody Says:

    Just a quick change-of-basis argument that \langle X,Z\rangle\langle Y,Z\rangle=\langle X,Y\rangle is indeed equivalent to X and Y being independent conditional on Z=0.

    We can write jointly gaussian variables X,Z (for now with mean 0 and variance 1) as X=\sin t\, Z + \cos t\, Q for some value t, where Z and Q are independent N(0,1).

    Then we can write a third jointly gaussian variable Y as
    Y=\sin u\, Z + \cos u\sin p\, Q + \cos u\cos p\, R, where Z, Q and R are independent N(0,1).

    Then the covariances are
    \langle X,Y\rangle = \sin t\sin u + \cos t\cos u\sin p,
    \langle X,Z\rangle = \sin t and
    \langle Y,Z\rangle = \sin u.

    As such it follows that \langle X,Z\rangle\langle Y,Z\rangle=\langle X,Y\rangle if and only if
    \langle X,Z\rangle=\pm 1, \langle Y,Z\rangle=\pm 1 or \sin p=0. Ignoring the first two cases for now, \sin p=0 if and only if the distributions of X and Y conditional on Z=0 (which are
    X=\cos t\, Q and Y=\cos u\sin p\, Q + \cos u\cos p\, R) are independent.

    • Luke Pebody Says:

      This would be a lot clearer if I had used \langle X,Y\rangle rather than plain angle brackets, which just disappear in the latex.

    • gowers Says:

      I’m afraid the system completely swallows things up when you do that, so I’m unable to edit your comment. If you have a moment to write it again, that would be great, and then I’ll delete this one.

  2. Gil Kalai Says:

    Given n and m, it is interesting to consider n-sided dice, that is, sequences of length n of integers between 1 and m that sum to n(m+1)/2. (We can assume that either n or m+1 is even.) When we consider sequences rather than multisets, the random model based on the uniform probability distribution is slightly different. (I had the impression that the first post talked about random multisets while this post talks about random sequences.)

    When m=2, without any constraint on the sum, this is essentially the model of approval voting. Then every three dice are transitive. Looking at m=3 could be interesting, and also the case where n is fixed and m tends to infinity. (There the multiset model is asymptotically the same as the sequence model, and the probability of ties tends to 0.)

    • Luke Pebody Says:

      Quick note that if n variables c_j are independently drawn from some distribution, then the
      covariance of \sum_{j=1}^n f(c_j) and
      \sum_{j=1}^n g(c_j) is simply n times the covariance of f(c_1) and g(c_1).

      It follows that the correlation coefficient of \sum_{j=1}^n f(c_j) and \sum_{j=1}^n g(c_j) is the same as the correlation coefficient of f(c_1) and g(c_1).

      So the condition that \sum_{j=1}^n h_A(c_j) and
      \sum_{j=1}^n h_B(c_j) are independent subject to
      \sum_{j=1}^n c_j=n(n+1)/2 should be equivalent to the condition that h_A(c_1) and h_B(c_1) are
      independent once you remove the components of each that are
      dependent on c_1.

      To be more clear on that, for each dice A, define a number p_A such that h_A(j)-p_Aj is statistically independent
      from j when j is chosen uniformly from
      \{1,2,\ldots,n\}. Then I think the condition stated should be
      equivalent to h_A(j)-p_Aj and h_B(j)-p_Bj being
      statistically independent.

      As a second note: if X and Z are jointly Gaussian with mean 0, variance 1 and correlation \rho, then we can write Z=\rho X+\sqrt{1-\rho^2}Y where X and Y are independent N(0,1).

      Then the event that X and Z are both positive depends only on
      the vector (X/\sqrt{X^2+Y^2},Y/\sqrt{X^2+Y^2}) which is
      uniform on the unit circle. In particular, it happens if that point is between (0,1) and (\sqrt{1-\rho^2},-\rho), so
      the probability is \arccos(-\rho)/2\pi.

      As such if the signs of X and Z are independent, this probability is
      \frac14, which means \arccos(-\rho)=\frac{\pi}2, so
      \rho=0.
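
      A quick numerical check of that last formula (with \rho=0.4 as an arbitrary test value) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.4
X = rng.standard_normal(1_000_000)
Z = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal(1_000_000)

print("empirical P[X>0, Z>0]:", np.mean((X > 0) & (Z > 0)))
print("arccos(-rho)/(2 pi)  :", np.arccos(-rho) / (2 * np.pi))
```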

  3. P. Peng Says:

    Alright!
    We have some good data to dig through.

    First, this post contains data calculated using four different procedures for obtaining random dice. All dice are proper (values from 1 to n, sum constrained to n(n+1)/2), just different random selections. I will refer to the different random selection methods as follows:
    sequence – uniform in unsorted sequence representation
    multiset – uniform in multiset representation
    sequence_walk – a random walk method, hopefully converges to same distribution as sequence method
    multiset_walk – a random walk method, hopefully converges to same distribution as multiset method

    ==================================
    How well do the random walk methods compare to their counterparts?

    I had the number of steps scale as n log(n).
    The fraction of intransitives and ties felt too summarized to get a good comparison, so I looked at the distribution of two measurements.

    Comparison score:
    for a given die A, randomly choose 100000 dice B and for each note the comparison score = number of roll pairs where A wins – number of roll pairs where B wins

    graph for n=100 is here: http://imgur.com/a/UiKU3
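
    (A direct implementation of this comparison score, as a sketch of the definition above rather than the code actually used, looks like:

```python
def comparison_score(A, B):
    """Number of roll pairs (i, j) with a_i > b_j minus number with a_i < b_j."""
    wins_A = sum(1 for a in A for b in B if a > b)
    wins_B = sum(1 for a in A for b in B if a < b)
    return wins_A - wins_B
```

    )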

    L_1 distance from standard die:
    calculated for 100000 randomly chosen dice

    graph for n=100 is here: http://imgur.com/a/YykOl

    While previously we saw the fraction of k=4 intransitives was quite close comparing the sequence and multiset random model, these measurements both clearly distinguish between the two models.

    The random walk distributions fall on top of their counterparts so closely that only statistical noise allows the colors of the covered curve to peek out now and then. So I’d say the random walk methods pass these tests nicely.

    The sequence random method is fast enough that its random walk method is actually slower for the ranges I’ve looked at. However the multiset random walk is a huge improvement in speed. If we trust the multiset_walk method, we can now push to larger n sided dice.

    ==================================
    Data on k=4 tournaments and fraction of ties

    Calculated with multiset distribution, 100000 random dice each test.

    n=50, tie=0.063590 intransitive=0.373510 (of non-tie = 0.398874)
    n=100, tie=0.030920 intransitive=0.385000 (of non-tie = 0.397284)
    n=150, tie=0.020690 intransitive=0.386690 (of non-tie = 0.394860)
    n=200, tie=0.015290 intransitive=0.387660 (of non-tie = 0.393679)

    So approaching towards ~ 0.39 is still visible.

    Looking with the multiset random walk method, I also recorded the fraction of other tournament scores to try to see where the fraction is going.
    ———————-
    n=100, tie=0.030870 intransitive=0.382700 (of non-tie = 0.394890)
    [0, 1, 2, 3] 0.386940 (of non-tie 0.399265)
    [0, 2, 2, 2] 0.101210 (of non-tie 0.104434)
    [1, 1, 1, 3] 0.098280 (of non-tie 0.101411)
    [1, 1, 2, 2] 0.382700 (of non-tie 0.394890)
    ———————-
    n=200, tie=0.015500 intransitive=0.386350 (of non-tie = 0.392433)
    [0, 1, 2, 3] 0.391660 (of non-tie 0.397826)
    [0, 2, 2, 2] 0.102710 (of non-tie 0.104327)
    [1, 1, 1, 3] 0.103780 (of non-tie 0.105414)
    [1, 1, 2, 2] 0.386350 (of non-tie 0.392433)
    ———————-
    n=500, tie=0.006130 intransitive=0.389120 (of non-tie = 0.391520)
    [0, 1, 2, 3] 0.395030 (of non-tie 0.397466)
    [0, 2, 2, 2] 0.105260 (of non-tie 0.105909)
    [1, 1, 1, 3] 0.104460 (of non-tie 0.105104)
    [1, 1, 2, 2] 0.389120 (of non-tie 0.391520)
    ———————-
    n=1000, tie=0.002850 intransitive=0.391930 (of non-tie = 0.393050)
    [0, 1, 2, 3] 0.394400 (of non-tie 0.395527)
    [0, 2, 2, 2] 0.106880 (of non-tie 0.107185)
    [1, 1, 1, 3] 0.103940 (of non-tie 0.104237)
    [1, 1, 2, 2] 0.391930 (of non-tie 0.393050)

    Just like in the previous calculations with sequence random dice, the fraction of intransitives increases to ~ 0.39

    ==================================

    The conjecture that the fraction of ties goes to zero is intuitively clear, and the data is quite convincing. However for the other conjecture, on the randomness of tournaments, I am currently leaning towards this not being correct.

    I think too much is being summarized in these fractions measurements, and enough details are being missed that it looks “almost” true, but not quite. So I explored for ways to show how the space of dice “looked” as viewed from each die.

    The rough picture of these dice as suggested by the random tournament conjecture, is that except for rare dice that are washed out as n increases, the “space” as seen by comparisons from each die is practically equivalent and uncorrelated.

    While not a disproof of the conjecture, we now know at least that this rough picture of equivalent dice is not correct. Consider the following graph:
    http://imgur.com/a/mGL3Q

    For a given die, the distribution of comparison scores to the other dice appears as a bell shaped distribution about 0. Tying is actually the peak of the distribution, the fraction of ties just goes down because it is only a sliver of the distribution for most dice.

    This comparison “view” differs strongly across the dice, as the standard deviation of the bell curve seems strongly related to the L_1 distance from the standard die. In this sense, the standard die is just the limit of L_1 = 0, with the comparison scores fully peaked on ties.

    If this picture is correct, the fraction of ties then goes to zero because the average distance from the standard die increases as n increases.

    Now, it is still possible these score distributions become uncorrelated as n increases, but the result that the distributions for individual dice are already distinguishable with a distance parameter, makes it feel like the distributions would have to cleverly conspire to be uncorrelated. The k=4 data doesn’t seem to support such a conspiracy occurring, and I struggle for an intuitive reason to expect no correlation.

    I will probably be busy for about the next week, but I should be able to run more tests in June.

    • P. Peng Says:

      Oops, correction on:

      “Data on k=4 tournaments and fraction of ties
      Calculated with multiset distribution, 100000 random dice each test.”

      I meant each calculation involved 100000 tests (4 random dice each test)

    • P. Peng Says:

      I put the code on github in case someone wants to run similar calculations (or just wants to use the random dice generation methods mentioned above).
      https://github.com/perfluxion/dicetools

  4. Luke Pebody Says:

    So I ran some quick tests for 100-sided, 500-sided and 1000-sided dice, and the conditional correlations do not seem to be small.

    I’ve put the code at http://ideone.com/5XmwVl so it can be verified.

    • gowers Says:

      I’m glad someone else has independently come to this conclusion, as I wasn’t confident that my code was correct (partly, it has to be said, because I was particularly keen for the answer to come out differently).

      If P. Peng’s experiment provides unanswerable evidence that the “beats” tournament is not after all quasirandom, then everything fits and I think we’ll have to conclude that for the sequences model the general conjecture is false. It’s still possible that the more specific one about the probability of transitivity for three dice is correct, but the natural way of proving it would be ruled out. And of course, there would still the multiset model to think about, and various questions one could try to answer about the sequences model in order to understand the tournament, even if it isn’t quasirandom.

    • Gil Kalai Says:

      Just to be sure: If A beats B and B beats C then the probability for A to beat C is larger than 1/2 ? (What is this probability?)

    • Luke Pebody Says:

      What this indicates is that the events “A beats C” and “B beats C” are unlikely to be independent for a randomly chosen (but fixed) A and B.

      It may still be the case that for randomly chosen A, B and C, the events “A beats B”, “B beats C” and “C beats A” behave correctly.

      But I think, if I understand correctly, that if “A beats C” and “B beats C” are not independent for randomly chosen fixed A and B, then given “A beats C_i” for 1\leq i\leq n and “B beats C_i” for 1\leq i<n, it is likely that the correlation for A and B is positive, so it is more likely than not that B beats C_n; and the big conjecture (as I understand it) is that basically any edge of the domination graph is independent of all of the other edges.

    • Gil Kalai Says:

      Dear Luke, I still don’t understand. Did you choose A and B at random and then compute the correlation for many choices of C? Or did you choose A, B and C at random (again and again) and compute the correlation between A>C and B>C over all these triples?

    • gowers Says:

      This may be repeating what Luke said — I’m not quite sure — but if for almost all pairs A,B the events “A beats C” and “B beats C” are approximately independent (and each hold with probability approximately 1/2), then the whole tournament is quasirandom, since this condition is equivalent to the statement that the number of even (labelled) 4-cycles is roughly n^4/2. But the condition that a triple (A,B,C) is intransitive with probability approximately 1/4 is weaker than full quasirandomness, so even if, as is looking distinctly possible for the sequences model, full quasirandomness fails, the transitivity statement could conceivably still hold.

    • Luke Pebody Says:

      Was not aware of that fact (if for almost all pairs A, B the events “A beats C” and “B beats C” are approximately independent and each hold with probability approximately 1/2 then the whole tournament is quasirandom).

      To be very clear about what the computations I have done are, for each die A there is a function h_A:\{1,2,\ldots,n\}\to\mathbb Z (defined in the previous post on this blog) such that for all dice C, the amount by which A beats C is \Sigma_ih_A(c_i).

      If for uniformly chosen c_1, c_2, \ldots, c_n from \{1,2,\ldots,n\}, the variables \Sigma_ih_A(c_i) and
      \Sigma_i(c_i-(n+1)/2) are approximately jointly Gaussian, then the conditional distribution of the first given the second should be roughly the same as the first minus the linear regression of the first on the second. For each dice A, my program computes h_A(c_i) and this linear regression \alpha c_i+\beta, and then computes the difference h'_A(c_i).

      Now, for random dice A, B, suppose we assume that all three variables \Sigma_ih_A(c_i), \Sigma_ih_B(c_i) and \Sigma_i(c_i-(n+1)/2) are approximately jointly Gaussian. Then we wish for the first two distributions to have independent signs subject to the third being zero. Jointly Gaussian distributions (with mean 0) have independent signs iff they are independent, so we want the two conditional distributions to be independent. By the previous paragraph, this is the same as wanting h'_A(c_i) and h'_B(c_i) to be independent.

      This is what I calculate. I generate random dice A and B, and compute the correlation of h'_A and h'_B.

      I possibly even do this correctly.
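
      In code, one way to carry out the computation just described (a sketch of the procedure, not the program that produced the results) is:

```python
import numpy as np

def h_values(A, n):
    # (h_A(1), ..., h_A(n)) with h_A(j) = #{i : a_i <= j} - j
    return np.cumsum(np.bincount(np.asarray(A), minlength=n + 1)[1:]) - np.arange(1, n + 1)

def residual(hA, n):
    # subtract the least-squares regression of h_A(j) on j, giving h'_A
    j = np.arange(1, n + 1)
    alpha, beta = np.polyfit(j, hA, 1)
    return hA - (alpha * j + beta)

def conditional_correlation(A, B, n):
    # correlation of h'_A(j) and h'_B(j) over a uniformly random j in {1, ..., n}
    rA = residual(h_values(A, n), n)
    rB = residual(h_values(B, n), n)
    return np.corrcoef(rA, rB)[0, 1]
```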

      Interesting follow-up questions that I haven’t had time to investigate:

      For a fixed random dice A, how similar are the distributions of \Sigma_ih_A(c_i) subject to \Sigma_ic_i=n(n+1)/2 and of
      \Sigma_ih'_A(c_i) with no conditions.

      The same for two fixed random dice A, B. Does the correlation over random dice C seem to be close to what is predicted by this logic?

  5. Brian Conrey Says:

    I don’t think the numerical evidence against the conjecture is very convincing. There are very possibly lower order main terms which could unduly sway the evidence for small n. Is there any way to test million-sided dice?

    • Luke Pebody Says:

      Assuming my code is right:

      Testcase #1 of 1000000-sided die: Var1=83666677520.9489, Var2=58164194958.0453, Cov=-5142588342.6654, Corr=-0.0737

      Testcase #2 of 1000000-sided die: Var1=170652385885.1353, Var2=383837502837.2842, Cov=-44340175731.3077, Corr=-0.1732

      Testcase #3 of 1000000-sided die: Var1=95411567879.0051, Var2=411651017261.1688, Cov=23514160690.4223, Corr=0.1186

      Testcase #4 of 1000000-sided die: Var1=129337519658.2799, Var2=482870174382.8414, Cov=103327042561.5577, Corr=0.4135

      Testcase #5 of 1000000-sided die: Var1=63469236644.0365, Var2=190459144796.9994, Cov=36445058013.0572, Corr=0.3315

      Testcase #6 of 1000000-sided die: Var1=247959398175.2720, Var2=374571837411.0212, Cov=-168707342617.3397, Corr=-0.5536

      Testcase #7 of 1000000-sided die: Var1=118527199199.1557, Var2=235795826705.5231, Cov=81219466247.5660, Corr=0.4858

      Testcase #8 of 1000000-sided die: Var1=156142669182.8310, Var2=151112944593.1989, Cov=17976062335.7008, Corr=0.1170

      Testcase #9 of 1000000-sided die: Var1=155578825797.1025, Var2=186257137173.6792, Cov=-85601110430.8644, Corr=-0.5029

      Testcase #10 of 1000000-sided die: Var1=250084013516.8111, Var2=176570151724.8132, Cov=-37752750469.6730, Corr=-0.1797

      Testcase #11 of 1000000-sided die: Var1=83525126083.2328, Var2=188512345761.9844, Cov=-3293101901.2540, Corr=-0.0262

      Testcase #12 of 1000000-sided die: Var1=243148285063.8932, Var2=128526622114.4252, Cov=-109666564179.3167, Corr=-0.6204

      Testcase #13 of 1000000-sided die: Var1=111797132874.2193, Var2=359613721671.0355, Cov=10839894699.5140, Corr=0.0541

      Testcase #14 of 1000000-sided die: Var1=103272231129.6406, Var2=191318190359.1727, Cov=-8302571778.2218, Corr=-0.0591

      Testcase #15 of 1000000-sided die: Var1=86787771777.1460, Var2=200071063872.5374, Cov=42895672072.9774, Corr=0.3255

      Testcase #16 of 1000000-sided die: Var1=266654706016.3156, Var2=88389047630.7891, Cov=84162110644.4755, Corr=0.5482

      Testcase #17 of 1000000-sided die: Var1=139457028923.0280, Var2=107994124793.1600, Cov=57593784095.0311, Corr=0.4693

      Testcase #18 of 1000000-sided die: Var1=251366425435.1886, Var2=157545398985.3911, Cov=37697388546.0484, Corr=0.1894

      Testcase #19 of 1000000-sided die: Var1=153786028139.5456, Var2=263080371691.1945, Cov=-67579189182.8782, Corr=-0.3360

      Testcase #20 of 1000000-sided die: Var1=311433278205.4565, Var2=248353888692.7109, Cov=-5039875172.9270, Corr=-0.0181

    • gowers Says:

      Just to check, are these the correlations between the random variables h_A(j)-p_Aj and h_B(j)-p_Bj, where p_A and p_B are as in your earlier comment? In other words, are they the correlations that ought to be zero in order for \sum_ih_A(c_i) and \sum_ih_B(c_i) to become independent when we condition on \sum_ic_i=n(n+1)/2?

    • Luke Pebody Says:

      Yes.

    • Gil Kalai Says:

      Dear Luke, again, I don’t understand these outcomes. You chose A and B at random (in these 20 test cases) and computed the correlation between “A beats C” and “B beats C” for many choices of C. And then we see sometimes positive correlation and sometimes negative correlation. It is still possible that if we take r random dice (starting with 3 and 4) and compute the tournament of which beats which, this will converge to a random tournament. (I don’t refer to low order terms. I just don’t see the connection between this experiment and the statement – I think you need to take many random triples A, B and C and compute the probabilities for A beats B; B beats C; and C beats A.)

    • Luke Pebody Says:

      My contention is that “A beats C” and “B beats C” not being approximately independent for randomly chosen A and B doesn’t contradict r random dice converging to a random tournament for any fixed r, but that it makes it very unlikely that this will be the case for all r.

      Although, given what Tim said earlier, I think that if 4 random dice converge to a random tournament then the whole thing is quasirandom, which I assume would mean that “A beats C” and “B beats C” are approximately independent for almost all pairs A, B. (This paragraph is not something I am very comfortable with).

    • Gil Kalai Says:

      I agree that a crucial property would be

      “For a randomly chosen A with probability close to 1/2 A beats a randomly chosen B”

      If this fails (and the more delicate tests based on correlation for randomly chosen A and B) the conjecture is very implausible. (But checking directly random triples could be useful). It would be interesting to understand (in words) what property of A makes A more likely to beat many other dice and what properties of A and B leads to positive correlation or to negative correlation. Since all symmetric dice are tied it may be related to measures of asymmetry.

  6. P. Peng Says:

    nudge … my post from yesterday appears stuck in moderation (maybe because it had several links to images and got marked as spam or something?)

  7. Brian Conrey Says:

    Let’s look at the case of 6-sided proper dice. There are 32 of them. Let’s toss out the standard die since it ties with everything and consider

    {D1,…,D31} = {{6, 6, 6, 1, 1, 1}, {6, 6, 5, 2, 1, 1}, {6, 6, 4, 3, 1, 1}, {6, 6, 4, 2, 2, 1},
    {6, 6, 3, 3, 2, 1}, {6, 6, 3, 2, 2, 2}, {6, 5, 5, 3, 1, 1}, {6, 5, 5, 2, 2, 1}, {6, 5, 4, 4, 1, 1},
    {6, 5, 4, 2, 2, 2}, {6, 5, 3, 3, 3, 1}, {6, 5, 3, 3, 2, 2}, {6, 4, 4, 4, 2, 1}, {6, 4, 4, 3, 3, 1},
    {6, 4, 4, 3, 2, 2}, {6, 4, 3, 3, 3, 2}, {6, 3, 3, 3, 3, 3}, {5, 5, 5, 4, 1, 1}, {5, 5, 5, 3, 2, 1},
    {5, 5, 5, 2, 2, 2}, {5, 5, 4, 4, 2, 1}, {5, 5, 4, 3, 3, 1}, {5, 5, 4, 3, 2, 2}, {5, 5, 3, 3, 3, 2},
    {5, 4, 4, 4, 3, 1}, {5, 4, 4, 4, 2, 2}, {5, 4, 4, 3, 3, 2}, {5, 4, 3, 3, 3, 3}, {4, 4, 4, 4, 4, 1},
    {4, 4, 4, 4, 3, 2}, {4, 4, 4, 3, 3, 3}}

    There are 31C3=4495 triples of these dice. Of these,
    26 are intransitive triples, 2307 are transitive, and
    2162 have a tie. Clearly this is a long way from the
    “expected” 25% of intransitive triples, but the number
    of sides is very small. Nevertheless we can look at
    the intransitive triples and what we notice is that
    certain dice and certain pairs of dice occur
    inordinately often. For example D10 appears in 9 of
    these 26 triples whereas D1, D16, D17, D20, D25,
    D29, D30, and D31 don’t appear in any of them.

    The list of the 26 triples is

    {{14, 15, 19}, {13, 23, 24}, {13, 15, 23}, {12, 15, 19}, {11, 24, 27}, {11, 23, 28},
    {11, 18, 19}, {10, 13, 23}, {9, 15, 19}, {8, 15, 19}, {8, 10, 22}, {7, 12, 13},
    {7, 10, 23}, {7, 10, 14}, {6, 10, 23}, {5, 10, 22}, {4, 14, 19}, {4, 11, 21},
    {4, 8, 10}, {3, 15, 24}, {3, 15, 19}, {3, 11, 26}, {3, 10, 22}, {3, 7, 10},
    {3, 4, 11}, {2, 11, 26}}

    where, for example, {14,15,19} means the triple
    {D14, D15, D19}.

    I don’t know if this is helpful or not but it may
    have some bearing on the correlation calculations
    being performed. I have a feeling that this lopsidedness
    will persist for dice with many sides. I suspect that
    there will be a smallish number of dice and pairs of
    dice that contribute a lot to the intransitive total.

    • Tristram Bogart Says:

      Thanks for the data! It occurs to me that we could try to learn a little more from all of those ties. If there’s at least one tie among dice A, B, and C then there are various distinct cases:

      1) A ties B and both beat C
      2) A ties B and both lose to C
      3) A ties B, C beats A, and B beats C.
      4) A ties B, A ties C, and B beats C
      5) A, B, and C all tie pairwise.

      Cases (1), (2) and (5) are weakly transitive in the sense that we could implement a tiebreaking scheme to create a linear order, though in case (5) the tiebreaker would be doing all the work. Cases (3) and (4) would be intransitive in this sense.

      If it’s easy to extract from your data, could you tell us how many times each case arises?

    • Brian Conrey Says:

      Oops! What I wrote is wrong. There are actually 577 intransitive triples…

    • Brian Conrey Says:

      It should be 577 intransitive triples; 1385 transitive triples; 400 of type (1) above; 306 of type (2); 930 of type (3); 671 of type (4) which is two ties; and 226 of type (5) where all three dice tie.

    • Brian Conrey Says:

      The frequency of 6-sided dice in intransitive triples is
      23, {5, 5, 4, 3, 2, 2}
      24, {6, 4, 4, 3, 3, 1}
      24, {5, 4, 4, 3, 3, 2}
      24, {4, 4, 4, 3, 3, 3}
      29, {6, 6, 6, 1, 1, 1}
      33, {6, 6, 4, 3, 1, 1}
      45, {6, 4, 3, 3, 3, 2}
      45, {5, 4, 4, 4, 3, 1}
      49, {6, 4, 4, 3, 2, 2}
      49, {5, 5, 4, 3, 3, 1}
      50, {5, 5, 3, 3, 3, 2}
      50, {5, 4, 4, 4, 2, 2}
      54, {5, 5, 5, 2, 2, 2}
      58, {6, 6, 4, 2, 2, 1}
      58, {6, 5, 5, 3, 1, 1}
      59, {6, 6, 3, 3, 2, 1}
      59, {6, 5, 4, 4, 1, 1}
      59, {6, 5, 3, 3, 3, 1}
      59, {6, 5, 3, 3, 2, 2}
      59, {6, 4, 4, 4, 2, 1}
      59, {5, 5, 4, 4, 2, 1}
      62, {5, 4, 3, 3, 3, 3}
      62, {4, 4, 4, 4, 3, 2}
      65, {6, 3, 3, 3, 3, 3}
      65, {4, 4, 4, 4, 4, 1}
      78, {6, 5, 4, 2, 2, 2}
      78, {5, 5, 5, 3, 2, 1}
      79, {6, 6, 5, 2, 1, 1}
      79, {6, 5, 5, 2, 2, 1}
      97, {6, 6, 3, 2, 2, 2}
      97, {5, 5, 5, 4, 1, 1}

    • Luke Pebody Says:

      Tristram, 4) is not intransitive.

  8. Gil Kalai Says:

    Perhaps to test how good a die is, we need to compare the die with its symmetric die. If A is much better than (n+1-A) then it will be “charismatic” and win against more than 50% of symmetric dice; if A and B are both “charismatic” or both “anticharismatic” then they will have positive correlation compared to a random C, etc.

    • Gil Kalai Says:

      I think what is going on is the following: we can expect that if a random die A beats a random die B then the gap between the number of pairs for which A beats B and the number of pairs for which B beats A is linear in n (say \alpha n, which is of the order of the square root of the total number of pairs). Moreover, near the expected number the distribution is roughly uniform.

      If we start with a random die A then the gap between the number of pairs for which A beats another random die B and the number of pairs for which B beats A is also linear in n in expectation (say the expectation is \beta(A) n).

      So the values of \beta (A) will tilt the tournament described by several dice toward transitivity.

      The number of pairs for which A beats its symmetric pair (n+1-A) is also linear in expectation and perhaps it is well-correlated with \beta (A).

  9. gowers Says:

    After the experimental results that suggest that full quasirandomness is probably false, I’ve written a quick piece of code to check this directly. That is, I pick a random pair of dice A,B and then take a whole lot of random dice C and see how many of them fall into the four possible classes, according to whether they beat A and whether they beat B. If the tournament is quasirandom, then we’d expect these four numbers to be roughly equal, but in my tests so far they don’t seem to be. For instance, with 100-sided dice and 10000 tests I’ve just obtained the numbers (3376,1523,1523,3330). The equality of the second and third numbers isn’t a bug in my program as it doesn’t always happen when I run it …

    Length 100 is probably not long enough to be a convincing indication of anything, but one can at least say that a number of bits of not wholly convincing evidence are stacking up. Of course, these bits of evidence are not independent of each other, so it would still be very good to run this simple test with longer sequences. I’ll do a few more runs with my primitive program and report on the result, but it would be very interesting to look at longer dice and more tests.
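
    For concreteness, the test just described can be coded along the following lines (a sketch rather than the primitive program referred to above; the rejection-sampling die generator is slow in pure Python for large n):

```python
import random
from collections import Counter

def random_die(n):
    target = n * (n + 1) // 2
    while True:
        A = [random.randint(1, n) for _ in range(n)]
        if sum(A) == target:
            return A

def beats(A, B):
    """A beats B if more pairs (i, j) have a_i > b_j than a_i < b_j."""
    wins = sum(1 for a in A for b in B if a > b)
    losses = sum(1 for a in A for b in B if a < b)
    return wins > losses

def four_classes(n, tests):
    A, B = random_die(n), random_die(n)
    counts = Counter()
    for _ in range(tests):
        C = random_die(n)
        counts[(beats(C, A), beats(C, B))] += 1
    return counts   # quasirandomness predicts four roughly equal counts

# e.g. four_classes(100, 2000); vectorising with numpy helps for longer dice
```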

  10. gowers Says:

    A few results with sequences of length 500 and 500 tests for each (A,B) (so the standard deviation for each number should be about 10).

    (128, 121, 136, 113)
    (147, 103, 112, 136)
    (121, 133, 118, 123)

    These don’t seem terribly conclusive in either direction.

    OK, here are a few goes with 200 sides and 2000 tests (so standard deviation up to more like 20).

    (514, 511, 457, 495)
    (428, 568, 589, 397)
    (506, 523, 454, 494)
    (401, 615, 600, 362)

    Some of these are so far from what quasirandomness would predict that my belief in full quasirandomness for the sequence model is almost extinguished — I don’t quite rule out that what we’re seeing is a small-numbers phenomenon, but I would have expected sequences of length 200 to be a bit closer to quasirandom than this if the quasirandomness conjecture were true.

    Another thing it would obviously be good to do is the same simple test for the multiset model. Maybe I’ll try that later, but I have other things to do for a while.

  11. gowers Says:

    I thought it would be helpful to have an example of a tournament that is not quasirandom but where intransitivity occurs with the expected probability. I can actually exhibit a large class of such tournaments (and I think it may be possible to enlarge it quite a bit further).

    Let n be a positive integer and let A\subset\mathbb Z_n be an antisymmetric set, meaning that x\in A if and only if -x\notin A. (I don’t care what happens to zero, since that makes a negligible difference to what follows.)

    Now define a tournament by putting an arrow from x to y if y-x\in A. In this case I’ll say that y beats x.

    Writing A for its own characteristic function, we can express the number of triples (x,y,z) such that y beats x, z beats y and x beats z as

    \sum_{x,y,z}A(y-x)A(z-y)A(x-z)

    which is equal to

    n\sum_{u+v+w=0}A(u)A(v)A(w).

    If we change variables by replacing each variable by its negative, we see that this is also equal to

    n\sum_{u+v+w=0}A(-u)A(-v)A(-w),

    which by the antisymmetry of A is equal (to make this work exactly, we could set A(0)=1/2 just for convenience) to

    n\sum_{u+v+w=0}(1-A(u))(1-A(v))(1-A(w)).

    This last sum splits up into eight terms. The main term is simply n^2. A term that involves just one of A(u), A(v) or A(w) gives -n^2/2. A term that involves two of them gives n^2/4 (since any two of the variables u,v,w vary completely independently). And finally, the term that involves all three of A(u),A(v) and A(w) can be taken over to the left-hand side. So we get that

    2\sum_{u+v+w=0}A(u)A(v)A(w)=n^2-3n^2/2+3n^2/4=n^2/4

    and hence that the number of triples we were counting is n^3/8. Therefore, the number of intransitive triples is n^3/4 (since we can cycle round the other way too).

    If we take a set A such as \{1,2,\dots,n/2\}, then this tournament is very far from quasirandom, since vertices that are close to each other have very similar neighbourhoods.

    It might be an idea to try to think of a more general way of constructing tournaments like this, so that we have some kind of model for what might be going on with the dice: obviously we don’t expect the dice tournament to be based on a Cayley graph.
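
    For anyone who wants to see this numerically, here is a small sketch (taking n odd so that the antisymmetry condition is clean; the count should come out close to n^3/8, up to lower-order terms):

```python
import numpy as np

n = 61
A = set(range(1, n // 2 + 1))        # antisymmetric: x in A iff -x (mod n) not in A

# M[x, y] = 1 if y beats x, i.e. (y - x) mod n lies in A
M = np.array([[(y - x) % n in A for y in range(n)] for x in range(n)], dtype=int)

# ordered triples (x, y, z) with y beats x, z beats y, x beats z
cycles = np.einsum('xy,yz,zx->', M, M, M)
print(cycles, "vs n^3/8 =", n**3 / 8)
```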

    • gowers Says:

      A small remark is that the little result in the previous comment can also be proved easily using Fourier analysis, the key fact, for those familiar with this technique, being that for each non-zero r we have from antisymmetry that \hat{A}(-r)^3=-\hat{A}(r)^3 so \sum_r\hat A(r)^3=\hat A(0)^3=1/8.

  12. gowers Says:

    As I was writing the previous comment I had a suspicion that it was completely unnecessary to restrict attention to Cayley graphs, and indeed the same proof works for general antisymmetric matrices.

    To keep things consistent with what I did before, I’ll take the “adjacency matrix” of a tournament to be given by A(x,y)=1 if y beats x and 0 otherwise (rather than -1 otherwise).

    So now the number of triples (x,y,z) such that y beats x, z beats y and x beats z is

    \sum_{x,y,z}A(x,y)A(y,z)A(z,x)

    By swapping the names of y and z we get that this is equal to

    \sum_{x,y,z}A(x,z)A(z,y)A(y,x)

    which by antisymmetry is equal to

    \sum_{x,y,z}(1-A(x,y))(1-A(y,z))(1-A(z,x)).

    The main term is n^3. With just one “A” we sum n times over all 1s in A, getting -n^3/2. With a sum such as \sum_{x,y,z}A(x,y)A(x,z) we are summing over all x the square of its outdegree. So if all outdegrees are n/2, then this gives us n^3/4.

    So with that assumption we get that

    2\sum_{x,y,z}A(x,z)A(z,y)A(y,x)=n^3-3n^3/2+3n^3/4=n^3/4

    and with the cycles in the opposite direction we get that the probability of intransitivity is 1/4.

    So it seems that it should be enough to prove that almost all dice beat roughly half of all the other dice.

    • Luke Pebody Says:

      Just a quick note that a simple double-counting proof has just occurred to me (based on a proof of the lower bound for the number of monochromatic triangles in a 2-coloured graph on n vertices) that not only is the correct proportion of transitive triangles implied by all the outdegrees being about a half, in fact these two statements are equivalent.

      Count the number of triples (x, y, z) of distinct dice where x beats y and y beats z. Each transitive triangle contributes one such triple and each cyclic triangle contributes three, so this count is C(n,3) plus twice the number of intransitive (cyclic) triangles, and it is also equal to the sum over y of d^-(y)d^+(y), where these denote the indegree and outdegree of y.

      Since these two degrees sum to n-1, the count is at most n(n-1)^2/4, so the number of intransitive triangles is at most n(n-1)(n+1)/24 (very close to the expected number in a random tournament), with near-equality exactly when every degree is about half.
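
      A brute-force check of this double count on a random tournament (with one particular sign convention, and counting each 3-cycle once) might look like:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
n = 40
U = np.triu(rng.integers(0, 2, size=(n, n)), k=1)    # orient each pair {x, y}, x < y, at random
M = U + np.tril(1 - U.T, k=-1)                        # M[x, y] = 1 iff y beats x

paths = int(np.sum(M.sum(axis=1) * M.sum(axis=0)))    # sum over y of indeg(y) * outdeg(y)
cycles = int(np.einsum('xy,yz,zx->', M, M, M)) // 3   # each cyclic triangle counted 3 times
print(paths, "=", comb(n, 3) + 2 * cycles)
```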

    • gowers Says:

      Ah, I had wondered whether that might be the case. In a way it makes the original conjecture slightly less interesting — it’s no longer “really” about intransitivity — but I would still like it if we could prove it!

  13. gowers Says:

    Going back to the experimental evidence mentioned above, the numbers given are the number of C such that the dice beating C are

    (A and B, A, B, neither).

    So for the intransitivity conjecture to be true, we would like the first and second numbers to add up to roughly half the total, and also the first and third numbers. So we want the outer two numbers to be roughly equal and the inner two numbers to be roughly equal. Apart from one slightly troubling run (which can probably be dismissed as a fluke) this seems to be happening reasonably well.

    • gowers Says:

      A few more runs picking a random die and counting how many times it wins and how many times it loses. Sequences of length 500, 2000 trials. Each pair is (number of times A wins, number of times A loses).

      (1018, 980), (960, 1038), (996,996), (990, 1003)

      This looks pretty good to me. The second pair isn’t great, but that could be simply because the die in that case was slightly untypical and that the true probability of its beating a further die was a little bit less than 1/2.

  14. gowers Says:

    It feels to me as though we ought to be able to prove the “small” conjecture now, given the observation that it is sufficient for almost all dice to beat roughly half the other dice.

    We already have a heuristic argument for this, namely the following. Let A be a typical die. (I think this should mean something like that h_A is not zero too much.) Then A beats C if \sum_jh_A(c_j)>0. But if C is chosen purely randomly (with no condition on the sum), then we expect the random variable

    (\sum_jh_A(c_j), \sum_j(c_j-(n+1)/2))

    to approximate a bivariate normal (X,Y) with mean (0,0). And if this is the case, then when we condition on Y=0 we should get a normal distribution, which will have a probability 1/2 of being greater than zero.

    So if we can make this argument rigorous using Berry-Esseen type results, then we should be done.

    To be clear, “done” means that we would have proved the conjecture about the frequency of intransitivity for the sequences model. If we can do that, it would be nice to push on and try to do the multiset model too.

    • gowers Says:

      And let me add that Bogdan Grechuk has already made a start on proving this.

    • gowers Says:

      One of the things that Bogdan needed to get the argument to work is some control on how large h_A gets.

      I think it should be possible to say something about this quite easily — though whether what I say is strong enough I haven’t yet worked out.

      Suppose we choose a random die A by choosing a purely random sequence and conditioning on its having the right total. For the purely random sequence, g_A(j) is a sum of n Bernoulli random variables with probability j/n, so the probability that it differs from its mean j by more than t\sqrt n is going to be bounded above by \exp(-ct^2) for some positive c. So if t\gg\sqrt{\log n}, the probability starts to be at most n^{-C}.

      But when we condition on the sum taking its correct value, this probability will multiply by at most about n^{3/2}, since the probability that the sum is correct is of order n^{-3/2}. So I think we can safely say that for a typical die A, no h_A(j) will be bigger than, say, 100\sqrt{n\log n}.

      With another elementary argument I think we should be able to show that the sum of the squares of the h_A(j) will not be too small. So maybe the ingredients are almost all in place.
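
      As a quick empirical sanity check of these claims (not a proof; the choices n=200 and five trials are arbitrary), one can run something like:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_die(n):
    target = n * (n + 1) // 2
    while True:
        A = rng.integers(1, n + 1, size=n)
        if A.sum() == target:
            return A

n = 200
for _ in range(5):
    A = random_die(n)
    h = np.cumsum(np.bincount(A, minlength=n + 1)[1:]) - np.arange(1, n + 1)
    print("max|h_A| =", int(np.abs(h).max()),
          "  sqrt(n log n) =", round(float(np.sqrt(n * np.log(n))), 1),
          "  sum h_A^2 =", int(np.sum(h.astype(np.int64) ** 2)))
```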

    • gowers Says:

      Here’s the kind of elementary argument I’m talking about that should prove that the sum of the squares of the h_A(j) won’t usually be too small. Fix some m\leq n and let us consider just the values of h_A at m, 2m, 3m, \dots. The probability that |h_A(m)|\leq s is at most Cs/\sqrt m. More generally, the probability that |h_A(km)|\leq s given the value of h_A((k-1)m) will be at most Cs/\sqrt m. So for lots of the h_A(km) to be small we need a lot of unlikely events to happen, the probabilities of which are multiplied together.

      It seems at least possible that for suitably optimized choices of m and s we’ll get a strong enough bound from this kind of argument. Indeed, if s is something like \sqrt m/100, then I think the above sketch should show that with very high probability |h_A(km)|\geq\sqrt m/100 for at least half the values of k, which would give a lower bound for \sum_k|h_A(km)|^2 of (n/2m)(m/10000).

      That looks unfortunate, but I now see that we can probably improve it quite substantially by observing that each time h_A(km) is large, it is probably surrounded by quite a lot of other large values (since the only way this could fail to be the case is if there are unexpected clusters in A).

    • gowers Says:

      I’m getting slightly confused, which is partly because I think that in Bogdan’s comment in the place where he defined the h_j he forgot a power 1/2 in the denominator. That’s slightly better, and means that the lower bound I sketched for this denominator would be proportional to \sqrt n. So we need to gain a further factor of \sqrt{\log n}.

      But suppose that we have some j with h_A(j)\geq c\sqrt n. Ah, there’s a useful property here, which is that g_A is increasing, so h_A decreases by at most 1 when j increases by 1. So we get c\sqrt n/2 values of j with h_A(j)\geq c\sqrt n/2, which gives a lower bound for \sum_jh_A(j)^2 of c^3n^{3/2}/8 and therefore a lower bound for the square root of this of c^{3/2}n^{3/4}/4, which is a lot bigger than \sqrt{n\log n}.

      So it’s possible that we’re done, but I’ve only half understood Bogdan’s comment, so I’ll need to await confirmation. Also, my calculation above could turn out to be nonsense — it’s not carefully checked.

    • Bogdan Grechuk Says:

      I am sorry about “a power 1/2 in the denominator” (can you correct this?). As I noted in the comments to my post, control of the maximal h_j for a typical die seems to be easy (thank you for adding the details; the observation that h_j decreases by at most 1 indeed proves the statement), but, for the argument to work, the Berry–Esseen theorem should remain correct for random vectors, while the Wikipedia formulation is for the one-dimensional case only. I have asked for a reference with no response yet. So, I would not say we are done. But very close, one step away.

    • gowers Says:

      Thanks for this clarification. I’ve added the 1/2 now.

      This mathoverflow question points us to a multidimensional Berry-Esseen theorem. In particular, it looks as though Theorem 1.1 in this paper is probably what we want. Does it look strong enough to you? (I could probably work this out, but I think you’ll be able to do so a lot more quickly.)

    • gowers Says:

      It looks to me as though we wouldn’t be quite there, because the 2D Berry-Esseen theorem we need has to be a discrete (local) one, and that raises certain considerations that may turn out to be non-trivial.

      The theorem in the paper measures closeness to a Gaussian as the maximum discrepancy between the Gaussian probability of being in a convex set and the probability of the given distribution being in that set. However, the convex set that we want to restrict to will be a one-dimensional set, and therefore very small.

      The kind of problem this could cause is illustrated by the following simple example. Let X be a random variable that has mean zero and is supported on a finite subset of \mathbb Z. Let X_1 and X_2 be independent copies of X. Then a sum of n independent copies of (X_1,X_2) will approximate a normal distribution with mean (0,0) and a certain variance.

      Now let’s suppose that \mathbb P[X=1] is non-zero. Then the normal distribution we converge to will assign non-zero probabilities to every point in \mathbb Z^2.

      But if instead we start with a different random variable with the same mean and variance as X but supported on the even numbers, then the normal distribution we converge to will assign zero probability to numbers with an odd coordinate and will make up for it by assigning more weight to points in 2\mathbb Z^2.

      This is no problem when it comes to the probability of landing in a convex set, but it is more worrying when we condition on one of the coordinates being zero, which will lead to the probabilities being doubled, in a certain sense.

      In our case, we should be OK, because the variable whose value we are conditioning on will not be supported in a residue class, but I think it means we may have to delve into the proof of the multidimensional Berry-Esseen theorem rather than just quoting it — unless of course someone else has proved a version that is better adapted to our purpose.

    • gowers Says:

      I think the kind of statement we might want is that the distribution tends, in a suitable sense, to a distribution with a Gaussian type formula (that is, the probability of being at (x,y) is something like c\exp(-q(x)) for some positive definite quadratic form q) on some sublattice of \mathbb Z^2 (or more generally \mathbb Z^d).

    • Bogdan Grechuk Says:

      Thank you for the reference to the 2D Berry-Esseen theorem. Indeed, it does not seem applicable, and the problem is much trickier than I thought. I do not think that the problem is because we have a discrete distribution. In the continuous case, the fact that the random vector (X,Y) is “close” to Gaussian in the sense that “the maximum discrepancy on convex sets” (as defined in the reference) is small tells us nothing about the conditional distribution of X given Y=0, because P(Y=0)=0…

    • Bogdan Grechuk Says:

      An example. Fix a generic solution x_1,x_2,x_3,y_1,y_2,y_3 to the system of 5 equations: \sum_i x_i=0, \sum_i y_i=0, \sum_i x_i^2=3, \sum_i y_i^2=3, \sum_i x_iy_i =0. Now, let random vector (X,Y) takes values (x_1,y_1), (x_2,y_2), (x_3,y_3) each with probability 1/3. Then EX=EY=0, \sigma(X)=\sigma(Y)=1, \mathrm{cov}(X,Y)=0. Let (X_i,Y_i) be i.i.d copies of this random vector, U_n=\sum_{i=1}^n X_i, V_n=\sum_{i=1}^n Y_i. Then the joint distribution of (U_n,V_n) converges to standard 2D normal. For example, P[U_n>0, V_n>0] converges to 1/4, etc.

      However, in fact V_n=k_1y_1+k_2y_2+k_3y_3, where k_i are frequencies for each value, and V_n=0 implies k_1=k_2=k_3, hence U_n=0. So, P[U_n=0 | V_n=0]=1.

      On the other hand, most of proof strategies we had for the dice problem, if worked, could also be used to prove that P[U_n=0 | V_n=0] converges to 0, while P[U_n>0 | V_n=0] converges to 1/2…

    • Bogdan Grechuk Says:

      It seems I have finally found the correct type of theorems we need: these are discrete local limit theorems for random vectors. The original result is due to Richter (“Multi-dimensional local limit theorems for large deviations”), but the formulation there is complicated. A recent book is “Random Walk: A Modern Introduction” by Gregory F. Lawler and Vlada Limic, Section 2, Theorem 2.1.1. There are different estimates there, but, basically, the theorem states that for every single point x in the lattice, the probability that we will be exactly at x after n steps can be estimated from the corresponding normal distribution, plus some error term.

    • gowers Says:

      Fantastic! I also came across the Lawler/Limic book but had not understood it well enough to see that it had what we wanted (assuming that that is indeed the case).

      It seems to me that there will probably be some kind of technical lemma needed to argue that the probability that the entire random walk lives in a sublattice tends to zero. But probably this too follows reasonably easily from the fact that h_A(j+1)\geq h_A(j)-1 for every j.

    • Will B Says:

      Just as a reference for those following along who don’t have access to the Lawler/Limic book, Terry Tao gives a bit of related exposition in a blog post: https://terrytao.wordpress.com/2015/11/19/275a-notes-5-variants-of-the-central-limit-theorem/#more-8566. Hope it helps!

    • gowers Says:

      That blog post is definitely worth a read, but let me add that everyone has access to the Lawler/Limic book.

    • Bogdan Grechuk Says:

      Ah, the second time when I thought we are almost done, I then see a problem. For fixed A and random j we have a 2-dimensional random vector (X,Y) taking values (h_A(j), j-(n+1)/2), \, j=1,\dots,n with equal chances. Note that for odd n it takes integer values only. Let (X_i,Y_i), \, i=1,\dots,n be i.i.d copies of (X,Y), U_n=\sum X_i, V_n=\sum Y_i.

      For simplicity, let us talk about the ties conjecture, for which it suffices to prove that P[U_n=0 | V_n=0] = P[U_n = V_n = 0]/P[V_n=0] converges to 0.

      Let p_n(u,v) = P[U_n=u, V_n=v], and let p'_n(u,v) be an estimate of p_n(u,v) obtained by integrating density of standard normal distribution around point (u,v). Then p'_n(0,0)/P[V_n=0] converges to 0, so all we need is to control the ratio p_n(0,0)/p'_n(0,0).

      Now, substituting x=(0,0) into the formula (2.5) on page 25 in the book, we get |p_n(0,0) - p'_n(0,0)| \leq c/n^2, while (again according to the book) p'_n(0,0) is of order c/n. This seemed to clearly imply that p_n(0,0)/p'_n(0,0) converges to 1, and we would be done. The proof that P[U_n>0 | V_n=0] converges to 1/2 is similar.

      However, while the constant in Berry–Esseen theorem was universal (0.4748 works), the constant c in the formula (2.5) in the book seems to depend on our initial random vector (X,Y). In our case, the vector is not fixed but different for every n, so c depends on n, which makes the inequality meaningless.

      So, we need a version of Local Central Limit Theorem, as in the book, but with all constants universal, and the dependence of the initial random vector (X,Y), if any, should be explicit (in terms of moments, etc.) It seems not easy to extract this from the proof in the book.

    • gowers Says:

      I’m in the middle of a new post, which I’ll now have to tone down slightly, to make clear that there is still work to do (or a convenient result to find in the literature). My guess is that we don’t actually need a new idea, but rather we will have to go through the proof of the LCLT very carefully, extract some assumption we need about the random vector (X,Y) and then prove that for almost all A we have that assumption when (X,Y) is (h_A(j), j-(n+1)/2) for a randomly chosen j. The kinds of properties we might want are that the values of h_A are quite small (at most about \sqrt{n\log n}) and that (h_A(j),j) is not concentrated in some sublattice of \mathbb Z^2.

  15. P. Peng Says:

    Gowers,
    My comment from a couple days ago is still stuck in the moderation queue.

  16. Gil Kalai Says:

    Tim, I am a little confused about where matters stand and what the empirical evidence indicates. Are we leaning toward a YES answer or a NO answer?

    • gowers Says:

      Well, I may be wrong, but I personally am leaning towards the small conjecture being correct and the big conjecture being false, where these are the following.

      Small conjecture: if A beats B and B beats C then the probability that A beats C is approximately 1/2.

      Big conjecture: the tournament obtained by the relation “beats” is quasirandom.

      We now know that the small conjecture is equivalent to the statement that almost every die beats approximately half the other dice, and the big conjecture is equivalent to the statement that for almost every pair of dice, the remaining dice are split into four approximately equal classes according to the four possibilities for which of the two they beat.

      I’ve only really been thinking about the sequences model, but I think the experimental evidence is suggesting that the same should be true for the multisets model as well.

      Also, I think we may already essentially have a proof of the small conjecture for the sequences model, but I’m waiting to see whether Bogdan Grechuk agrees with me about this.

    • Gil Kalai Says:

      Ironically, the one thing I was rather certain about (based on the voting rules analogy) is that the big conjecture would follow from the small conjecture. Oh, well 🙂

  17. gowers Says:

    I’ve spent a little time thinking about how the argument, if it turns out to be OK, could be modified to deal with the multiset model as well. The bijection between multisets with n values from [n] and subsets of [2n-1] of size n ought to be helpful here.

    One thing it gives us quite easily is a way of proving statements about the function g_A. For convenience I’ll take a set A\subset[2n-1] of size n as my basic object. Then if we list the elements of A as a_1<a_2<\dots<a_n, the elements of the multiset are a_1, a_2-1, a_3-2,\dots, a_n-n+1 (with multiplicity equal to the number of times they appear in this list). I’ll call this multiset M_A and I’ll write g_A(j) for the number of elements of M_A, with multiplicity, that are at most j.

    It doesn’t seem to be wholly pleasant to define g_A(j) directly in terms of the set A: it is the maximal i such that a_i-i\leq j. However, we do at least have that the “inverse” is a nice function. That is, we know that the ith smallest face of the die M_A is a_i-i. And that is quite helpful. For instance, if instead of choosing a random M_A whose faces add up to n(n+1)/2 we instead choose a random subset of [2n-1] where each element is chosen independently with probability n/(2n-1)\approx 1/2, then we would normally expect a_i to be around 2i, and not to differ from 2i by much more than \sqrt i. So we get for this completely random model that the ith smallest face of the die typically has value close to i, and is well concentrated. If we now condition on the size of the set being n and the sum of the faces of the die being n(n+1)/2, this will not change too much. And now we can just invert and say the same about g_A.

    We also have that A beats B if the number of (i,j) such that a_i-i>b_j-j is greater than the number of (i,j) such that a_i-i<b_j-j. This may turn out to be reasonably easy to work with — I’m not sure.
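
    As an aside, this bijection also gives a simple, if slow, way of sampling uniformly from the multiset model (a minimal sketch; the rejection step on the sum makes it impractical for large n, where the random-walk samplers discussed above would be needed):

```python
import random

def random_multiset_die(n):
    """Uniform proper die in the multiset model, via the bijection with
    n-element subsets of [2n-1]; rejection on the sum keeps only proper dice."""
    target = n * (n + 1) // 2
    while True:
        a = sorted(random.sample(range(1, 2 * n), n))     # a_1 < ... < a_n in [2n-1]
        faces = [a_i - i for i, a_i in enumerate(a)]       # a_1, a_2 - 1, ..., a_n - n + 1
        if sum(faces) == target:
            return faces
```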

  18. Intransitive dice IV: first problem more or less solved? | Gowers's Weblog Says:

    […] to a statement that doesn’t mention transitivity. The very quick proof I’ll give here was supplied by Luke Pebody. Suppose we have a tournament (that is, a complete graph with each edge directed in one of the two […]

  19. Jean-Camille Says:

    I’m sorry, I made a mistake in my code concerning the transitivity of >>. I hope it won’t appear, and that it hasn’t yet made you lose any time…

    • Jean-Camille Says:

      Fortunately, the post I’m talking about was not posted (I had to register in between).

      After identifying A=(a_1,\dots,a_n) with A^*=(n+1-a_n,\dots,n+1-a_1), I was trying to build >>, an “almost transitive” relation between these classes (which are pairs, or singletons if A ties with A^*), in the hope that if A beats B and B beats C, then whether A beats C depends only on >>.

      I made a mistake thinking I had this relation but the two very basic facts that I like are

      1) A beats B iff B^* beats A^*.
      2) If A does not tie with B then either A beats BOTH B and B^*, or B beats BOTH A and A^*, so that we have a new relation on the quotient of all dice after identification of a die and its dual.

      Then we have something like this: X beats Y, which beats Y^*, which beats X^* (where X,Y\in\{A,A^*,B,B^*\}, and X does not tie with Y).
      Now if X beats X^* we have a transitive chain involving the four dice. I kind of feel that an investigation of this might be interesting…
