Intransitive dice VII — aiming for further results

While Polymath13 has (barring a mistake that we have not noticed) led to an interesting and clearly publishable result, there are some obvious follow-up questions that we would be wrong not to try to answer before finishing the project, especially as some of them seem to be either essentially solved or promisingly close to a solution. The ones I myself have focused on are the following.

  1. Is it true that if two random elements A and B of [n]^n are chosen, then A beats B with very high probability if it has a sum that is significantly larger? (Here “significantly larger” should mean larger by f(n) for some function f(n)=o(n^{3/2}) — note that the standard deviation of the sum has order n^{3/2}, so the idea is that this condition should be satisfied one way or the other with probability 1-o(1)).
  2. Is it true that the stronger conjecture, which is equivalent (given what we now know) to the statement that for almost all pairs (A,B) of random dice, the event that A beats a random die C has almost no correlation with the event that B beats C, is false?
  3. Can the proof of the result obtained so far be modified to show a similar result for the multisets model?

The status of these three questions, as I see it, is that the first is basically solved — I shall try to justify this claim later in the post, for the second there is a promising approach that will I think lead to a solution — again I shall try to back up this assertion, and while the third feels as though it shouldn’t be impossibly difficult, we have so far made very little progress on it, apart from experimental evidence that suggests that all the results should be similar to those for the balanced sequences model. [Added after finishing the post: I may possibly have made significant progress on the third question as a result of writing this post, but I haven’t checked carefully.]

The strength of a die depends strongly on the sum of its faces.

Let A=(a_1,\dots,a_n) and B=(b_1,\dots,b_n) be elements of [n]^n chosen uniformly and independently at random. I shall now show that the average of

|\{(i,j):a_i>b_j\}|-|\{(i,j):a_i<b_j\}|-\sum_ia_i+\sum_jb_j

is zero, and that the probability that this quantity differs from its average by substantially more than n\log n is very small. Since typically the modulus of \sum_ia_i-\sum_jb_j has order n^{3/2}, it follows that whether or not A beats B is almost always determined by which has the bigger sum.

As in the proof of the main theorem, it is convenient to define the functions

f_A(j)=|\{i:a_i<j\}|+\frac 12|\{i:a_i=j\}|

and

g_A(j)=f_A(j)-j+\frac 12.

Then

\sum_jf_A(b_j)=\sum_{i,j}\mathbf 1_{a_i<b_j}+\frac 12\sum_{i,j}\mathbf 1_{a_i=b_j},

from which it follows that B beats A if and only if \sum_jf_A(b_j)>n^2/2. Note also that

\sum_jg_A(b_j)=\sum_jf_A(b_j)-\sum_jb_j+\frac n2.

If we choose A purely at random from [n]^n, then the expectation of f_A(j) is j-1/2, and Chernoff’s bounds imply that the probability that there exists j with |g_A(j)|=|f_A(j)-j+1/2|\geq C\sqrt{n\log n} is, for suitable C at most n^{-10}. Let us now fix some A for which there is no such j, but keep B as a purely random element of [n]^n.

Then \sum_jg_A(b_j) is a sum of n independent random variables, each with maximum at most C\sqrt{n\log n}. The expectation of this sum is \sum_jg_A(j)=\sum_jf_A(j)-n^2/2.

But

\sum_jf_A(j)=\sum_{i,j}\mathbf 1_{a_i<j}+\frac 12\sum_{i,j}\mathbf 1_{a_i=j}

=\sum_i(n-a_i)+\frac n2=n^2+\frac n2-\sum_ia_i,

so the expectation of \sum_jg_A(b_j) is n(n+1)/2-\sum_ia_i.

By standard probabilistic estimates for sums of independent random variables, with probability at least 1-n^{-10} the difference between \sum_jg_A(b_j) and its expectation \sum_jf_A(j)-n^2/2 is at most Cn\log n. Writing this out, we have

|\sum_jf_A(b_j)-\sum_jb_j+\frac n2-n(n+1)/2+\sum_ia_i|\leq Cn\log n,

which works out as

|\sum_jf_A(b_j)-\frac {n^2}2-\sum_jb_j+\sum_ia_i|\leq Cn\log n.

Therefore, if \sum_ia_i>\sum_jb_j+Cn\log n, it follows that with high probability \sum_jf_A(b_j)<n^2/2, which implies that A beats B, and if \sum_jb_j>\sum_ia_i+Cn\log n, then with high probability B beats A. But one or other of these two cases almost always happens, since the standard deviations of \sum_ia_i and \sum_jb_j are of order n^{3/2}. So almost always the die that wins is the one with the bigger sum, as claimed. And since “has a bigger sum than” is a transitive relation, we get transitivity almost all the time.

Why the strong conjecture looks false

As I mentioned, the experimental evidence seems to suggest that the strong conjecture is false. But there is also the outline of an argument that points in the same direction. I’m going to be very sketchy about it, and I don’t expect all the details to be straightforward. (In particular, it looks to me as though the argument will be harder than the argument in the previous section.)

The basic idea comes from a comment of Thomas Budzinski. It is to base a proof on the following structure.

  1. With probability bounded away from zero, two random dice A and B are “close”.
  2. If A and B are two fixed dice that are close to each other and C is random, then the events “A beats C” and “B beats C” are positively correlated.

Here is how I would imagine going about defining “close”. First of all, note that the function g_A is somewhat like a random walk that is constrained to start and end at zero. There are results that show that random walks have a positive probability of never deviating very far from the origin — at most half a standard deviation, say — so something like the following idea for proving the first step (remaining agnostic for the time being about the precise definition of “close”). We choose some fixed positive integer k and let x_1<\dots<x_k be integers evenly spread through the interval \{1,2,\dots,n\}. Then we argue — and this should be very straightforward — that with probability bounded away from zero, the values of f_A(x_i) and f_B(x_i) are close to each other, where here I mean that the difference is at most some small (but fixed) fraction of a standard deviation.

If that holds, it should also be the case, since the intervals between x_{i-1} and x_i are short, that f_A and f_B are uniformly close with positive probability.

I’m not quite sure whether proving the second part would require the local central limit theorem in the paper or whether it would be an easier argument that could just use the fact that since f_A and f_B are close, the sums \sum_jf_A(c_j) and \sum_jf_B(c_j) are almost certainly close too. Thomas Budzinski sketches an argument of the first kind, and my guess is that that is indeed needed. But either way, I think it ought to be possible to prove something like this.

What about the multisets model?

We haven’t thought about this too hard, but there is a very general approach that looks to me promising. However, it depends on something happening that should be either quite easy to establish or not true, and at the moment I haven’t worked out which, and as far as I know neither has anyone else.

The difficulty is that while we still know in the multisets model that A beats B if and only if \sum_jf_A(b_j)<n^2/2 (since this depends just on the dice and not on the model that is used to generate them randomly), it is less easy to get traction on the sum because it isn’t obvious how to express it as a sum of independent random variables.

Of course, we had that difficulty with the balanced-sequences model too, but there we got round the problem by considering purely random sequences B and conditioning on their sum, having established that certain events held with sufficiently high probability for the conditioning not to stop them holding with high probability.

But with the multisets model, there isn’t an obvious way to obtain the distribution over random dice B by choosing b_1,\dots,b_n independently (according to some distribution) and conditioning on some suitable event. (A quick thought here is that it would be enough if we could approximate the distribution of B in such a way, provided the approximation was good enough. The obvious distribution to take on each b_i is the marginal distribution of that b_i in the multisets model, and the obvious conditioning would then be on the sum, but it is far from clear to me whether that works.)

A somewhat different approach that I have not got far with myself is to use the standard one-to-one correspondence between increasing sequences of length n taken from [n] and subsets of [2n-1] of size n. (Given such a sequence (a_1,\dots,a_n) one takes the subset \{a_1,a_2+1,\dots,a_n+n-1\}, and given a subset S=\{s_1,\dots,s_n\}\subset[2n-1], where the s_i are written in increasing order, one takes the multiset of all values s_i-i+1, with multiplicity.) Somehow a subset of [2n-1] of size n feels closer to a bunch of independent random variables. For example, we could model it by choosing each element with probability n/(2n-1) and conditioning on the number of elements being exactly n, which will happen with non-tiny probability.

Actually, now that I’m writing this, I’m coming to think that I may have accidentally got closer to a solution. The reason is that earlier I was using a holes-and-pegs approach to defining the bijection between multisets and subsets, whereas with this approach, which I had wrongly assumed was essentially the same, there is a nice correspondence between the elements of the multiset and the elements of the set. So I suddenly feel more optimistic that the approach for balanced sequences can be adapted to the multisets model.

I’ll end this post on that optimistic note: no doubt it won’t be long before I run up against some harsh reality.

Advertisements

24 Responses to “Intransitive dice VII — aiming for further results”

  1. Bruce Smith Says:

    Related to question 3: is it obvious whether or not there exists *any* predicate of dice which is negligible (i.e. o(n)) in the subsets model, but not negligible in the multiset model?

    I haven’t been following this project closely, but my impression is that your existing results can be characterized as “all dice behave ‘reasonably’ except for a negligible fraction, and among the ‘reasonable’ ones, our theorems hold, and from this it follows they hold in general”.

    So if we take as that predicate that a die is ‘unreasonable’, then if switching to the multiset model (and thus changing the distribution over dice) makes any of the analogous theorem statements false (and if my general understanding is correct), that predicate has to be one which is negligible in the subsets model but not in the multisets model. (Let’s call that a “contrasting predicate”.)

    (I’m not conjecturing these “contrasting predicates” don’t exist — in fact, I’m guessing that someone here might be immediately able to give an example of one — maybe it’s enough for the predicate to require that the distribution of element frequencies in the multiset has a certain property. But I’m wondering if thinking about the requirements on such a predicate might be illuminating.)

    • Bruce Smith Says:

      (In that comment, I should have defined “negligible” as o(N) rather than o(n), if there are N dice of size n in the model the predicate is about.)

    • Bruce Smith Says:

      (A second correction: when I said “subsets model” I should have said “balanced sequences model”.)

    • gowers Says:

      This does seem like a potentially good thing to think about. As you suggest, it probably isn’t hard to come up with distinguishing properties, but it may well be that in some precise sense they are all “irrelevant” to anything one might be interested in when discussing matters such as the probability that one die beats another. (I don’t know how to formulate such a conjecture, but it feels as though something like that might exist.)

      If one wants to come up with at least some distinguishing property, it seems good to focus on things like the number of repeated elements, or more generally how the numbers of the different elements are distributed. If we define a map from sequences of length n to multisets by writing the sequences in increasing order, then the number of preimages of a multiset depends very strongly on how many repeated elements it has, with extremes ranging from 1 (for the multiset (1,1,\dots,1)) to n! (for the multiset (1,2,\dots,n)). Since multisets with many repeats give rise to far fewer sequences, one would expect that repeats are favoured in the multisets model compared with the sequences model. I would guess that from this it is possible to come up with some statistic to do with the number of repeats that holds with probability almost 1 in the multisets model and almost zero in the sequences model.

  2. P. Peng Says:

    Another possible route to the multi-set result.

    Because the random distribution weights between sequence and multiset change so drastically (as you mention it can be as extreme as n! : 1), it feels like either something very special is being exploited for the conjectures to still hold in both models, or this should just happen fairly often with a change of weights. But we’ve already seen that the intransitivity is fairly fragile when changing the dice model.

    I think this “something special” is that with the sequences model, not only is the score distribution for a random die very similar to a gaussian, but I conjecture this is true with high probability even when looking at the score distribution for the subset of dice constrained to have some particular multiplicity of values (ie. 12 numbers are unique, 3 are repeated twice, 5 are repeated three times, etc.).

    Given the already completed sequence proof, the stricter conjecture is equivalent to saying the U variable is not correlated with the multiplicity of values. Looking at how U is defined, that sounds plausible to me, and may be provable.

    If this stricter conjecture is true, then any change of weights for the random distribution will be fine if each “multiplicity class” are changed by the same factor. And this is the case for the shift from sequences -> multiset.

    • gowers Says:

      That’s a very interesting idea. It seems plausible that as long as a sequence takes enough different values, then conditioning on the distribution of the numbers of times the values are taken shouldn’t affect things too much. It’s not quite so obvious how to prove anything: I don’t see a simple way of using independent random variables and conditioning on an event of not too low probability. But it isn’t obviously impossible, and it would be an interesting generalization if we could get it.

  3. Bruce Smith Says:

    > I don’t see a simple way of using independent random variables and conditioning on an event of not too low probability.

    (a) I guess the following won’t work, but I’d like to confirm that understanding (and that my reasoning makes sense about the other parts):

    If we fix a “multiplicity class”, then a balanced sequence is just a sequence that (1) obeys certain equalities between elements (to make certain subsets of them equal), (2) obeys inequalities between the elements that are supposed to be distinct, (3) has the right sum (so it’s balanced). If the value of each subset of sequence elements which are required equal by (1) is given by an independent random variable, then is the probability of ((2) and (3)) too low? (I guess (2) and (3) are nearly independent.) For (3) I’d guess the probability is similar to the balanced sequence model (the condition still says that some linear sum of the variables has its expected value, I think); for (2) we’re saying that k choices of random elements of [n] fail to have overlaps, where k depends on the multiplicity class but could be nearly as large as n. I guess the probability of (2) is then roughly exponentially low in n, which is why this doesn’t work. Is that right?

    (b) thinking out loud:

    But what if we just omit condition (2)? Then we have some kind of generalization of a “multiplicity class” (except we want to think of it as a random distribution over dice, not just as a class of dice). It’s no longer true that all the dice in this distribution have the same preimage-size in the map from the balanced sequence model to the multiset model… but (in a typical die chosen from this distribution) most of the k random variables have no overlaps with other ones, so only a few of the k subsets of forced-equal sequence elements merge together to increase that preimage size. Can we conclude anything useful from this?

    (We would want to first choose and fix one of these distributions, then show that using it to choose dice preserves the desired theorems, then show that choosing the original distribution properly (i.e. according to the right probability for each one) ends up approximating choosing a die using our desired distribution. In other words, we’d want some sum of these sort-of-like-multiplicity-class distributions to approximate our desired overall distribution.)

    • gowers Says:

      Yes, the problem you identify is indeed the problem: I think that typically the multiplicities for a random multiset will have something close to a Poisson distribution with mean 1 (they are given by the lengths of runs of consecutive elements of a random subset of \{1,2,\dots,2n-1\} of length n). So they will almost all be of constant size, and therefore the number of distinct values taken will be proportional to n, which implies that the probability that they are distinct is exponentially small.

      The difficulty with just not worrying about such coincidences as occur is that the weights are very sensitive to the numbers of coincidences. For example, if two values of multiplicity 3 are allowed to merge into one value of multiplicity 6, then the weight gets divided by \binom 63=15. And it seems that to take account of this brings us back to the problem we started with (since if we knew how to deal with these mergers then we could simply take the multiplicity class to be all singletons and deal directly with the multiset model).

      That’s just how it seems to me, but as with my previous remarks, anything that sounds pessimistic can potentially be knocked down by some observation that I have not made, or some additional factor that I have not taken into account, and I don’t rule that out here.

    • Bruce Smith Says:

      > And it seems that to take account of this brings us back to the problem we started with …

      That’s a good point, and I don’t see a way around it either.

      But now I am thinking that “being excluded from the analysis in your main theorem” is *not* uncorrelated with “having lots of repeated faces” (and thus being relatively overrepresented in the multiset model), but is *negatively* correlated with it. If that’s true, then at least in some sense the main theorem should be easier in the multiset model than in the balanced sequence model (since the excluded cases are less common in its distribution).

      It’s taking me awhile to write up my reasons for that thought (and even once written they will be vague), so I thought I’d mention that general idea first.

    • gowers Says:

      One reason that lots of repeated faces make things a bit harder is that it is slightly more complicated to say that the sum has a good chance of equalling a specified value, and probably some of the deviation estimates become worse. But I don’t think those effects would kick in in a serious way until the number of repeats is very large.

  4. Thomas Budzinski Says:

    Here are a few more remarks about the sketch of proof for the (false) strong conjecture. For an unconditioned dice, f_A has the same distribution as a random walk with Poisson(1) steps, conditioned on f_A(n)=n. Hence, g_A is a random walk with Poisson(1)-1 steps, conditioned on g_A(n)=0 and \sum_j g_A(j)=0 (I don’t know if it has already been noticed).

    Hence, showing that with macroscopic probability, g_A(x_i) and g_B(x_i) are close for every i should not be very hard. For a nonconditioned dice, the variables g_A(x_i) and \sum_j g_A(j) should be approximately Gaussian with explicit covariances, so for a conditioned dice the f_A(x_i) are still jointly Gaussian. Proving it properly would require a local limit theorem to handle the double conditioning. But this time, the step distribution is known and very simple, so the proof should be easier than the previous one (or, even better, maybe a general result could be applied).

    On the other hand, deducing from here that f_A and f_B are uniformly close does not seem obvious. We would need to show that f_A cannot vary too much in a short interval. A possible way would be to show that \left( f_A(j) \right)_{0 \leq j \leq n/2} is in some sense absolutely continuous with respect to a nonconditioned random walk, and then to use known results about the max of a random walk on a short interval. The absolute continuity also requires a local limit theorem, but this should not be too hard for the same reasons as above.

  5. Gil Kalai Says:

    For the sequence model you can base the criteria if dice A beats dice B on other rules (which can be described e.g. in terms of a function from (-1,1,0)^n to {-1,1,0}). For example dice A beats dice B if the largest “run” of wins of A vs B is larger than the largest runs of wins of B vs A. In analogy with voting rules I would expect that many rules will lead to asymptotic randomness (or pseudoandomness) even when you drop the condition on the sum of entries. (All my previous guesses based on this analogy failed but at least I figured for myself what was the difference.)

  6. P. Peng Says:

    Can you delete the previous post? It got mangled because I used less than and greater than characters.

    A partial sketch of a different approach.

    Consider the starting point: instead of dice represented as a vector of values, represent it as a multiplicity vector m_i = number of faces with value i. A scoring function f(A,B), which gives the number of rolls where A beats B – the number of rolls where B beats A, can be represented as a matrix equation A.F.B where F is an antisymmetric matrix. The constraints are now that m_i ≥ 0, the sum m_i = n, and sum i m_i = n(n+1)/2. The standard die in this representation (1,1,…,1) ties all dice due to theses constraints. Now in the realm of linear algebra, we can choose a set of changes that when starting from a valid die preserves the sum constraints, and that this set of changes spans the valid dice space. I will call this set the choice of dice “steps”. With a given choice of steps, this also provides a distance measurement: the minimum number of steps from the standard die.

    With this setup, we can handle both the sequence and multiset dice model if we allow the notion of a “random die” to involve a possibly non-uniform weight on this representation.

    Obviously not all weights will lead to intransitive dice. I believe an appropriate restriction would be to constrain possible weights to one such that it is symmetric to permutation of the multiplicity vector in the following way:
    – any multiplicity vectors which are the same up to permutation, and meet the sum constraints, will have the same weight

    We can choose steps such that the following are true:
    – the set of dice one step away from the standard die have the properties:
    – 0 ≤ m_i ≤ 2
    – each die’s multiplicity vector is the same up to permutation
    – each die beats exactly as many dice as it loses to
    – all proper dice can be reached in O(n) steps

    With such a choice of dice steps, and with the above constraints on the weights under consideration, the set of dice a distance 1 away from the standard die have the property:
    – for any die, its probability of beating a random die is exactly equal to its probability of losing to a random die

    This is a good starting point, and we can build up to any die by adding these steps together. Furthermore the scoring is linear, so knowing how the steps score against each other is sufficient. With multiple steps, due to correlations it is possible for the set of dice to no longer have every die beat exactly half the others. Since the starting point exactly has this symmetry, if the correlation is small enough combined with the weight constraint restricting the amount of asymmetry that can build up, since all dice can be reached in O(n) steps, maybe the asymmetry can’t build up “fast enough” to ruin the property we want for intransitive dice:
    – If A is a random die then with probability 1 – o(1), the probability A beats a random die is equal to the probability it loses to a random die + o(1).

    Maybe with a stronger constraint on the weights, one could also show that the probability A beats a random die is 1/2 + o(1), so that it also provides that the ties go to zero. But with such wild swings in the weights between sequence and multiset dice, I’m not sure what would likely be the appropriate strengthening which would also give the ties conjecture.

    I believe something like this approach may have been discussed before, but part of the issue with this is the m_i ≥ 0 constraint. It makes it difficult as not all step combinations are valid. However, if the weight constraint above is sufficient, we can temporarily consider them all valid, and it just happens that for sequences and multiset models, the weight for these dice is 0. This approach therefore allows smoothing out the difficulties with considering all the different representations on the same footing, if indeed that weight constraint is sufficient.

    Before I think about this approach any further, is there some simple argument or counter-example which shows steps with these properties + weight constraint is not sufficient?

    Currently it appears to fit with our understanding:
    – sequence dice model is intransitive
    – multiset dice model is intransitive
    – the improper dice model is mostly transitive (even though it is “balanced” in that every die has the same expectation value, it doesn’t have the nice choice of steps)
    – the un-balanced sequence model (sequence model without the sum constraint) is mostly transitive (again, it doesn’t have the nice choice of steps)
    – removing the weights constraint we can just choose weights to force transitivity

    I’m hoping something along these lines would work as it unites a lot of the results.

  7. Thomas Lumley Says:

    Another question that’s relevant to statistical theory: it appears that the ordering given by P(X>Y)>1/2 agrees with the ordering based on the sum and hence the mean. Is that true for other reasonable summaries (eg the median, or more generally for some class of linear combinations of order statistics)?

    The reason this is interesting is that ordering of distributions based on P(X>Y)>1/2 for *samples* is the (widely-used) Mann-Whitney/Wilcoxon test. It’s already an advance to know this is typically transitive when the means are different, even under just one generating model for distributions. It would be even more helpful to know if this is just a fact about reasonable dice behaving reasonably or something special about the mean.

    • gowers Says:

      I think the mean may be fairly special here. The argument above shows that for two random dice A and B with means that differ by a typical amount, the means almost certainly determine which one wins. But then another order statistic will not determine which one wins unless it typically agrees with the mean. My guess (though I haven’t thought about it properly, so I may be wrong) is that there is a probability bounded away from zero that the order of the medians of two random sequences is different from the order of the means. If that guess is correct, then the medians will not predict which die wins.

    • P. Peng Says:

      For dice where the mean is constrained but we allow values greater than n, this also appears to become transitive. In this case the mean is not a predictor, so some other property might give a summary.
      Who knows, maybe the median becomes important. I’m not sure anyone has looked into that yet.

    • gowers Says:

      That’s a great question and something I’d love to see the answer to. Just to clarify the question, when you say “this also appears to become transitive” do you mean that it appears to become transitive with probability 1-o(1) or just that there is a positive correlation between the events “A beats B and B beats C” and “A beats C”? If it’s the first, it should be easier to prove, and something I’d either like to have a go at or leave as a tempting open problem in the paper. I’m not sure how to go about analysing the model itself, since it doesn’t seem to be obtainable by some simple conditioning on a sum of independent random variables. On the other hand, I’m pretty sure it (and even more so its continuous counterpart) has been studied a lot, so maybe with a look at the literature we would understand how to prove things about it. Maybe someone reading this can even suggest where to look. (Just to be clear, the distribution I’m interested in is the uniform distribution over the face of the n-dimensional simplex consisting of all vectors in \mathbb R^n with non-negative coordinates that sum to 1.)

      I very much doubt that the median plays an important role here, but if transitivity holds with probability 1-o(1) it would be a very nice challenge to try to find a simple statistic that predicts which die will win.

    • gowers Says:

      Ah, I’ve just looked at the original paper and it does look likely from the evidence presented there that the probability of intransitivity tends to zero for this model (though that was for multisets — it might be interesting to see if the same holds for sequences).

    • P. Peng Says:

      I looked into the improper dice (mean still constrained to (n+1)/2, but values are now only constrained to be more than 0).

      If you take any proper die and choose an entry greater than 1, and decrease it by one, then compared to the standard die (looking at the possible roll combinations) it will win one less and lose one more. And for any die, if you choose an entry = n, and increase it by one, it will win only one more compared to the standard die. Increasing any value beyond n+1 does not give any added benefit when comparing to the standard die. Therefore, the standard die will beat any improper die with a value greater than n. It beats any “truly” improper die if you will.

      From some numerical tests on small sided dice, if we rank all the dice according to how many dice it beats – how many it loses to, the standard die is at the top or at least in the top few dice. At the bottom is the die that is all ones except a single face, as that loses to everything.

      Basically, since the “beats” relationship doesn’t care how much a die wins by in a particular roll, deviating from proper dice is only “wasting” pips on a lopsided distribution of value. For this reason, the median does roughly correlate with the order of the improper dice. As does the standard deviation. I would need to look at larger dice to understand the trend better, but I’d guess currently that these will only end up being weak correlations. There is likely a better predictor.

      In the proper die scenario the standard die tied everything and was in some sense at the ‘center’ of the dice. In the improper dice, the standard die is now near the top of the ranking, and the die that loses to all other dice is in a reasonable sense the ‘furthest’ from the standard die. So likely there is a measure of ‘distance’ from the standard die that strongly correlates with the ranking (and so strongly predicts if two die beat each other). I think the median would only weakly capture this at best as n gets larger.

      If we look at the sequence model of improper dice, what is the probability that at least one value is greater than n? Is it possible that the standard die beats 1 – o(n) fraction of the improper dice?

    • gowers Says:

      I’ve just realized that the sequence model of improper dice isn’t as hard as I thought. If we line up N=n(n+1)/2 points with N-1 gaps between them, then there’s a one-to-one correspondence between sequences of n numbers that add up to N, and ways of choosing n-1 gaps (that mark the point where one number finishes and the next number starts). So the total number of improper dice is \binom {N-1}{n-1}, which is reasonably close to N^n/n!. When the numbers are constrained to be at most n, then the number of sequences is at most n^n, but because the sum of a random sequence has standard deviation about n^{3/2}, it’s in fact more like n^{n-3/2}. A crude estimate of \binom {N-1}{n-1} is n^{2n}/2^n n!\approx n^n(e/2)^n. Since e>2, this is exponentially bigger than the number of sequences in the more constrained model. So I think the answer to your last question is yes.

      I think we can also say something about the typical shape of an improper die. Suppose that instead of selecting exactly n-1 gaps, we select each gap independently with probability (n-1)/(N-1). The distribution should be similar. But with this model, the expected gap length has a geometrical distribution with mean approximately n/2 (because N/n is about n/2). So it looks to me as though at least crudely speaking an improper die is what you get when you replace the uniform distribution on [n] by a geometric distribution with the same mean.

    • gowers Says:

      I have a guess about a statistic that I think will predict the winner in the sequence model for nonstandard dice. (That is, a random die is a random sequence of positive integers that add up to n(n+1)/2.)

      Let m=(n+1)/2, let p=m^{-1}, and for each positive integer r, let p(r) be the probability of choosing r with the geometric distribution with parameter p: that is, p(r)=p(1-p)^{r-1}. (This is sometimes called the shifted geometric distribution.)

      In the usual sequence model, the sum of a sequence (a_1,\dots,a_n) can be equivalently defined as the number of pairs (i,j) such that i\leq a_j, which is closely related to how well the die does when it is up against the standard die. And this sum is the right statistic to choose. Note that a random face of the standard die is uniformly distributed in [n].

      After the heuristic idea in my previous comment, it seems a rather plausible guess that the right statistic to choose for nonstandard dice is how well a die does against not the uniform distribution but the geometric distribution. So that statistic I propose is

      \sum_{i,j}p(i)\mathbf 1_{[i\leq a_j]}.

      Another way of thinking of this is that the sum of a sequence a_j is (up to a factor n) the sum of the values of the cumulative distribution function at the numbers a_j, where the distribution is uniform on [n]. Now I want to take the sum of the values of the cumulative distribution function of the geometric distribution.

      Since generating a random improper die is easy, it should be easy to test this hypothesis experimentally. If it checks out, then I’ll sit down and try to prove it.

    • gowers Says:

      Of course, the natural follow-up question if the conjecture in my previous comment is correct is whether if we condition on the improper dice taking the expected value for that statistic we get intransitivity with probability 1/4+o(1) again. If that turned out to be the case, then it would probably be a special case of a much more general principle, which would say that the following picture holds for a wide class of models.

      1. For each positive integer r, let p(r) be the probability that a given face of a random die from the model takes the value r. Let P be the cumulative distribution: that is, P(m)=\sum_{r\leq m}p(r), which is the probability that the face takes value at most m.

      2. Given a die A=(a_1,\dots,a_n), define P(A) to be \sum_iP(a_i). Then, with probability 1-o(1), A beats B if and only if P(A)>P(B).

      3. If we fix a value t and restrict attention to dice A for which P(A)=t (subject to some condition that ensures that the proportion of dice that satisfy this condition is not too small — for some models we might have to replace the condition by P(A)\approx t), then the probability that a random triple of dice is transitive is 1/4+o(1).

      If we could prove something like this, it would be a significant step forward in our understanding of the intransitivity phenomenon.

      Having said that, there is also a suggestion in the paper that for at least one model we get intermediate behaviour, where knowing that A beats B and B beats C makes it more likely that A beats C, but with conditional probability bounded away from 1. The model in question is where you choose n values independently and uniformly from [0,1] and then rescale so that the average becomes 1/2. For a full understanding, it would be good to understand this too.

    • gowers Says:

      Following on from the last paragraph of the previous comment, I now think it would be very nice to get a heuristic understanding (at least) of that rescaled-uniform model. A natural question is the following. Let A be a random die chosen according to that model. Let \mu be the average value of a_i before the rescaling. Does \mu correlate with the proportion of dice beaten by A?

      We might expect the answer to be yes for the following reason: if \mu is less than 1/2, then after rescaling, the values below 1/2 are slightly “squashed”, whereas the values above 1/2 are slightly “stretched”. But as P. Peng suggests above, a face does not get extra credit for beating another face by a large margin, so in some sense large values are a “waste of resources”. So one might expect (but this argument is so vague as not to be very reliable) at least a weak positive correlation between \mu and the strength of the die. This is something else that would be interesting to test experimentally.

      The dream for this model would still be to find a simple statistic that predicts which die wins. Such a statistic couldn’t take values in a totally ordered set (so some simple one-dimensional parameter wouldn’t do, for example), because that would imply transitivity with probability 1-o(1), which seems not to apply. But one could still hope for a map \phi that takes each die to a point in some tournament with a very simple structure, in such a way that the direction of the edge between \phi(A) and \phi(B) predicts which of A and B wins. And then the problem would be reduced to understanding the tournament.

      Come to think of it, that dream is one that could also be entertained for the balanced sequence model. We know that transitivity occurs with the frequency one gets in a random tournament, but we suspect that the tournament is not quasirandom. These two statements are consistent, because all we need for the transitivity statement is that almost all vertices have roughly the same out-degree as in-degree. So now we can ask what the structure of the tournament is like? Perhaps once you condition on the sum of the faces, there is some other statistic — again, I would hope for a tournament with a nice simple structure — that predicts with high accuracy which of two dice will win.

      I don’t yet have a good definition of “nice simple structure”, but an example of the kind of thing I mean is a circle where there is an arrow from x to y if y is less than half way round in a clockwise direction from x and from y to x if y is more than half way round. (If y is exactly half way round, then the direction of the arrow is chosen arbitrarily.) It is unlikely that we can associate with each die in the balanced-sequence model a point in the circle in such a way that this particular tournament predicts which die wins, but perhaps some higher-dimensional (but still low-dimensional) variant works. If we could do something like this, then we would have a wonderfully precise understanding of the Mann-Whitney/Wilcoxon test for this model.

  8. Timothy Gowers Says:

    I’ve thought a bit about the “dream” in this comment from the previous thread, and while I now feel somewhat less optimistic about it, I now have a new (to me anyway) way of thinking about the “beats” relation that I think has the potential to be helpful, and to capture the idea of “wasting resources” by winning too well for no extra reward.

    An initial thought is that if A and B are typical dice, then a_i and b_i grow approximately like i, so unless i and j are close (which usually means as a fraction of \sqrt n), if i<j, then a_i and b_i are both less than a_j and b_j. This means that usually if a_i<b_j then b_i<a_j, in which case the pairs (i,j) and (j,i) cancel out. So it makes some sense to focus on the “exceptional” pairs for which this cancellation does not happen.

    Suppose, then, that i<j and let us think about what needs to happen if we are to obtain both the inequalities a_i<b_j and a_j<b_i. A simple remark is that a_i<b_i, since we know that a_i\leq a_j (as I am writing the sequence elements in non-decreasing order). It therefore suffices to have the inequality a_j<b_i. If we now fix i, we see that the number of j that satisfy this condition is “the time it takes a_j to catch up with b_i.

    We can model the growth of a_i and b_j continuously in our minds, and we now see that if the gradient of A is small after i, then we get a big contribution to the number of pairs (i,j) with a_i<b_j, which is very helpful to B. Conversely, if the gradient is large, we get only a small contribution.

    This thought can be used to construct pairs of dice with the same sum where one beats the other quite easily. Take for example the dice

    A=(1,5,5,5,5,9,9,9,9,13,13,13)

    and

    B=(4,4,4,4,8,8,8,8,12,12,12,12).

    Most of the time, the graph of A sits just above the graph of B, but just occasionally B makes a big jump just before A does. This means that the graph of B has a tendency to be flat just after a point where A is bigger, whereas the graph of A has a tendency to be steep just after a (rare) point where B is bigger. So we expect A to win easily, and indeed it does: there are 144 pairs, and the numbers of b_j beaten by the a_i go

    (0,4,4,4,4,8,8,8,8,12,12,12),

    which adds up to 84, which is significantly bigger than 72.

    These considerations suggest that there could be a significant correlation between which die wins and the number of i such that a_i>b_i, though I would be surprised if there was agreement with probability 1-o(1).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: