An initial thought is that if A and B are typical dice, then and grow approximately like , so unless and are close (which usually means as a fraction of ), if , then and are both less than and . This means that usually if then , in which case the pairs and cancel out. So it makes some sense to focus on the “exceptional” pairs for which this cancellation does not happen.

Suppose, then, that and let us think about what needs to happen if we are to obtain both the inequalities and . A simple remark is that , since we know that (as I am writing the sequence elements in non-decreasing order). It therefore suffices to have the inequality . If we now fix , we see that the number of that satisfy this condition is “the time it takes to catch up with ”.

We can model the growth of and continuously in our minds, and we now see that if the gradient of is small after , then we get a big contribution to the number of pairs with , which is very helpful to . Conversely, if the gradient is large, we get only a small contribution.

This thought can be used to construct pairs of dice with the same sum where one beats the other quite easily. Take for example the dice

and

.

Most of the time, the graph of A sits just above the graph of B, but just occasionally B makes a big jump just before A does. This means that the graph of B has a tendency to be flat just after a point where A is bigger, whereas the graph of A has a tendency to be steep just after a (rare) point where B is bigger. So we expect A to win easily, and indeed it does: there are 144 pairs, and the numbers of beaten by the go

,

which adds up to 84, which is significantly bigger than 72.

These considerations suggest that there could be a significant correlation between which die wins and the number of such that , though I would be surprised if there was agreement with probability .

]]>We might expect the answer to be yes for the following reason: if is less than 1/2, then after rescaling, the values below 1/2 are slightly “squashed”, whereas the values above 1/2 are slightly “stretched”. But as P. Peng suggests above, a face does not get extra credit for beating another face by a large margin, so in some sense large values are a “waste of resources”. So one might expect (but this argument is so vague as not to be very reliable) at least a weak positive correlation between and the strength of the die. This is something else that would be interesting to test experimentally.

The dream for this model would still be to find a simple statistic that predicts which die wins. Such a statistic couldn’t take values in a totally ordered set (so some simple one-dimensional parameter wouldn’t do, for example), because that would imply transitivity with probability 1-o(1), which seems not to apply. But one could still hope for a map that takes each die to a point in some tournament with a very simple structure, in such a way that the direction of the edge between and predicts which of and wins. And then the problem would be reduced to understanding the tournament.

Come to think of it, that dream is one that could also be entertained for the balanced sequence model. We know that transitivity occurs with the frequency one gets in a random tournament, but we suspect that the tournament is not quasirandom. These two statements are consistent, because all we need for the transitivity statement is that almost all vertices have roughly the same out-degree as in-degree. So now we can ask what the structure of the tournament is like. Perhaps once you condition on the sum of the faces, there is some other statistic (again, I would hope for a tournament with a nice simple structure) that predicts with high accuracy which of two dice will win.

I don’t yet have a good definition of “nice simple structure”, but an example of the kind of thing I mean is a circle where there is an arrow from to if is less than half way round in a clockwise direction from and from to if is more than half way round. (If is exactly half way round, then the direction of the arrow is chosen arbitrarily.) It is unlikely that we can associate with each die in the balanced-sequence model a point in the circle in such a way that this particular tournament predicts which die wins, but perhaps some higher-dimensional (but still low-dimensional) variant works. If we could do something like this, then we would have a wonderfully precise understanding of the Mann-Whitney/Wilcoxon test for this model.
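To make the circle example concrete, here is a minimal sketch. It assumes points are stored as fractions of a full clockwise turn in [0, 1), that the arrow goes from x to y when y is less than half way round clockwise from x, and that the half-way tie is broken by comparing the two numbers. The names `arrow`, `is_transitive` and `transitive_fraction` are purely illustrative.

```python
import random


def arrow(x, y):
    """Return True if the arrow goes from x to y.

    x and y are positions on the circle, given as fractions of a full
    clockwise turn in [0, 1). Assumed convention: x -> y when y is less
    than half way round clockwise from x.
    """
    gap = (y - x) % 1.0
    if gap == 0.5:
        return x < y  # exactly half way round: arbitrary tie-break
    return 0.0 < gap < 0.5


def is_transitive(x, y, z):
    """A triple is intransitive exactly when it forms a directed 3-cycle."""
    cycle_one = arrow(x, y) and arrow(y, z) and arrow(z, x)
    cycle_two = arrow(y, x) and arrow(z, y) and arrow(x, z)
    return not (cycle_one or cycle_two)


def transitive_fraction(trials, seed=0):
    """Estimate how often a uniformly random triple of points is transitive."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y, z = rng.random(), rng.random(), rng.random()
        hits += is_transitive(x, y, z)
    return hits / trials
```

Three equally spaced points such as 0, 0.3 and 0.6 form a cycle under this convention, while three points bunched together in a short arc are ordered transitively.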

]]>1. For each positive integer , let be the probability that a given face of a random die from the model takes the value . Let be the cumulative distribution: that is, , which is the probability that the face takes value at most .

2. Given a die , define to be . Then, with probability , beats if and only if .

3. If we fix a value and restrict attention to dice for which (subject to some condition that ensures that the proportion of dice that satisfy this condition is not too small — for some models we might have to replace the condition by ), then the probability that a random triple of dice is transitive is 1/4+o(1).

If we could prove something like this, it would be a significant step forward in our understanding of the intransitivity phenomenon.

Having said that, there is also a suggestion in the paper that for at least one model we get intermediate behaviour, where knowing that beats and beats makes it more likely that beats , but with conditional probability bounded away from 1. The model in question is where you choose values independently and uniformly from and then rescale so that the average becomes . For a full understanding, it would be good to understand this too.

]]>Let , let , and for each positive integer , let be the probability of choosing with the geometric distribution with parameter : that is, . (This is sometimes called the *shifted* geometric distribution.)

In the usual sequence model, the sum of a sequence can be equivalently defined as the number of pairs such that , which is closely related to how well the die does when it is up against the standard die. And this sum is the right statistic to choose. Note that a random face of the standard die is uniformly distributed in .

After the heuristic idea in my previous comment, it seems a rather plausible guess that the right statistic to choose for nonstandard dice is how well a die does against not the uniform distribution but the geometric distribution. So the statistic I propose is

.

Another way of thinking of this is that the sum of a sequence is (up to a factor ) the sum of the values of the cumulative distribution function at the numbers , where the distribution is uniform on . Now I want to take the sum of the values of the cumulative distribution function of the geometric distribution.

Since generating a random improper die is easy, it should be easy to test this hypothesis experimentally. If it checks out, then I’ll sit down and try to prove it.
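A rough sketch of how such an experiment might be set up is below. Several things in it are assumptions rather than anything fixed above: an improper die is taken to be a sequence of n positive integers summing to n(n+1)/2, sampled as a uniformly random composition; the geometric parameter p is chosen as 2/(n+1) so that the mean is (n+1)/2; and all function names are hypothetical.

```python
import random


def geometric_cdf(k, p):
    """CDF of the shifted geometric distribution on 1, 2, 3, ...: P(X <= k)."""
    return 1.0 - (1.0 - p) ** k


def geometric_statistic(die, p):
    """Sum of the geometric CDF evaluated at the faces of the die."""
    return sum(geometric_cdf(a, p) for a in die)


def random_improper_die(n, rng):
    """Assumed model: a uniformly random composition of n(n+1)/2 into n positive parts."""
    total = n * (n + 1) // 2
    cuts = sorted(rng.sample(range(1, total), n - 1))  # n-1 distinct cut points
    bounds = [0] + cuts + [total]
    return [bounds[i + 1] - bounds[i] for i in range(n)]


def score(a, b):
    """Number of face pairs where a wins minus the number where b wins."""
    return sum((x > y) - (x < y) for x in a for y in b)


def agreement_rate(n, trials, seed=0):
    """Fraction of decided pairs where the die with the larger statistic wins."""
    rng = random.Random(seed)
    p = 2.0 / (n + 1)  # assumption: match the mean (n+1)/2 of a uniform face
    agree = decided = 0
    for _ in range(trials):
        a = random_improper_die(n, rng)
        b = random_improper_die(n, rng)
        s = score(a, b)
        diff = geometric_statistic(a, p) - geometric_statistic(b, p)
        if s != 0 and diff != 0:
            decided += 1
            agree += (s > 0) == (diff > 0)
    return agree / decided if decided else float("nan")
```

Calling `agreement_rate(n, trials)` for increasing n would show whether the agreement tends towards 1, which is what the hypothesis predicts.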

]]>I think we can also say something about the typical shape of an improper die. Suppose that instead of selecting exactly gaps, we select each gap independently with probability . The distribution should be similar. But with this model, the gap length has a geometric distribution with mean approximately (because is about ). So it looks to me as though at least crudely speaking an improper die is what you get when you replace the uniform distribution on by a geometric distribution with the same mean.

]]>If you take any proper die and choose an entry greater than 1, and decrease it by one, then compared to the standard die (looking at the possible roll combinations) it will win one fewer roll and lose one more. And for any die, if you choose an entry equal to n, and increase it by one, it will win only one more compared to the standard die. Increasing any value beyond n+1 does not give any added benefit when comparing to the standard die. Therefore, the standard die will beat any improper die with a value greater than n. It beats any “truly” improper die, if you will.

From some numerical tests on dice with a small number of sides, if we rank each die according to the number of dice it beats minus the number it loses to, the standard die is at the top or at least among the top few dice. At the bottom is the die that is all ones except a single face, as that loses to everything.
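For anyone who wants to repeat this kind of test, here is a small brute-force sketch. It assumes an improper die is a multiset of n positive integers with sum n(n+1)/2 and no upper bound on the faces; the function names are made up for illustration, and this is not necessarily the code behind the tests described above.

```python
from itertools import combinations_with_replacement


def score(a, b):
    """Number of rolls where a beats b minus the number where b beats a."""
    return sum((x > y) - (x < y) for x in a for y in b)


def improper_dice(n):
    """All multisets of n positive integers summing to n(n+1)/2 (assumed model)."""
    total = n * (n + 1) // 2
    max_face = total - (n - 1)  # the other n-1 faces are at least 1
    return [
        d
        for d in combinations_with_replacement(range(1, max_face + 1), n)
        if sum(d) == total
    ]


def ranking(n):
    """Sort dice by (number of dice beaten) minus (number of dice lost to)."""
    dice = improper_dice(n)

    def net_wins(d):
        wins = sum(score(d, e) > 0 for e in dice)
        losses = sum(score(d, e) < 0 for e in dice)
        return wins - losses

    return sorted(dice, key=net_wins, reverse=True)
```

The enumeration grows quickly with n, so this is only practical for small dice, but that is enough to see where the standard die and the all-ones-plus-one-big-face die land in `ranking(n)`.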

Basically, since the “beats” relationship doesn’t care how much a die wins by in a particular roll, deviating from proper dice is only “wasting” pips on a lopsided distribution of value. For this reason, the median does roughly correlate with the order of the improper dice, as does the standard deviation. I would need to look at larger dice to understand the trend better, but I’d guess currently that these will only end up being weak correlations. There is likely a better predictor.

In the proper die scenario the standard die tied everything and was in some sense at the ‘center’ of the dice. In the improper dice, the standard die is now near the top of the ranking, and the die that loses to all other dice is in a reasonable sense the ‘furthest’ from the standard die. So likely there is a measure of ‘distance’ from the standard die that strongly correlates with the ranking (and so strongly predicts which of two dice beats the other). I think the median would only weakly capture this at best as n gets larger.

If we look at the sequence model of improper dice, what is the probability that at least one value is greater than n? Is it possible that the standard die beats a 1 – o(1) fraction of the improper dice?

]]>I very much doubt that the median plays an important role here, but if transitivity holds with probability it would be a very nice challenge to try to find a simple statistic that predicts which die will win.

]]>Who knows, maybe the median becomes important. I’m not sure anyone has looked into that yet. ]]>

The reason this is interesting is that ordering of distributions based on for *samples* is the (widely-used) Mann-Whitney/Wilcoxon test. It’s already an advance to know this is typically transitive when the means are different, even under just one generating model for distributions. It would be even more helpful to know if this is just a fact about reasonable dice behaving reasonably or something special about the mean.

]]>A partial sketch of a different approach.

Consider the starting point: instead of representing a die as a vector of values, represent it as a multiplicity vector m_i = number of faces with value i. A scoring function f(A,B), which gives the number of rolls where A beats B minus the number of rolls where B beats A, can be represented as a matrix expression A.F.B where F is an antisymmetric matrix. The constraints are now that m_i ≥ 0, sum m_i = n, and sum i m_i = n(n+1)/2. The standard die in this representation, (1,1,…,1), ties all dice due to these constraints. Now, in the realm of linear algebra, we can choose a set of changes that preserve the sum constraints when applied to a valid die, and such that this set of changes spans the space of valid dice. I will call this set the choice of dice “steps”. A given choice of steps also provides a distance measurement: the minimum number of steps from the standard die.
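As a sanity check on this representation, here is a short sketch. It assumes F is the sign matrix, F[i][j] = sign(i − j), which is one natural antisymmetric choice reproducing the "wins minus losses" score; the helper names are illustrative only.

```python
from itertools import combinations_with_replacement


def multiplicity(die, n):
    """Multiplicity vector: m[i-1] = number of faces of the die with value i."""
    m = [0] * n
    for value in die:
        m[value - 1] += 1
    return m


def sign_matrix(n):
    """Antisymmetric matrix with F[i][j] = +1 if i > j, -1 if i < j, 0 if i == j."""
    return [[(i > j) - (i < j) for j in range(n)] for i in range(n)]


def bilinear_score(ma, mb, F):
    """The matrix expression A.F.B evaluated on two multiplicity vectors."""
    n = len(ma)
    return sum(ma[i] * F[i][j] * mb[j] for i in range(n) for j in range(n))


def direct_score(a, b):
    """Rolls where a beats b minus rolls where b beats a, counted face by face."""
    return sum((x > y) - (x < y) for x in a for y in b)


def proper_dice(n):
    """All multisets of n values in 1..n with sum n(n+1)/2."""
    total = n * (n + 1) // 2
    return [
        d
        for d in combinations_with_replacement(range(1, n + 1), n)
        if sum(d) == total
    ]
```

With this F, the standard die's score against a die with multiplicities m_j is sum m_j (n+1−2j), which vanishes exactly because of the two sum constraints.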

With this setup, we can handle both the sequence and multiset dice model if we allow the notion of a “random die” to involve a possibly non-uniform weight on this representation.

Obviously not all weights will lead to intransitive dice. I believe an appropriate restriction would be to require the weights to be symmetric under permutation of the multiplicity vector in the following way:

– any multiplicity vectors which are the same up to permutation, and meet the sum constraints, will have the same weight

We can choose steps such that the following are true:

– the set of dice one step away from the standard die have the properties:

– 0 ≤ m_i ≤ 2

– each die’s multiplicity vector is the same up to permutation

– each die beats exactly as many dice as it loses to

– all proper dice can be reached in O(n) steps

With such a choice of dice steps, and with the above constraints on the weights under consideration, the set of dice a distance 1 away from the standard die have the property:

– for any die, its probability of beating a random die is exactly equal to its probability of losing to a random die

This is a good starting point, and we can build up to any die by adding these steps together. Furthermore the scoring is linear, so knowing how the steps score against each other is sufficient. With multiple steps, correlations make it possible for the set of dice to no longer have every die beat exactly half the others. But the starting point has this symmetry exactly, the weight constraint restricts the amount of asymmetry that can build up, and all dice can be reached in O(n) steps. So if the correlation is small enough, maybe the asymmetry can’t build up “fast enough” to ruin the property we want for intransitive dice:

– If A is a random die then with probability 1 – o(1), the probability A beats a random die is equal to the probability it loses to a random die + o(1).

Maybe with a stronger constraint on the weights, one could also show that the probability A beats a random die is 1/2 + o(1), which would also show that the proportion of ties goes to zero. But with such wild swings in the weights between sequence and multiset dice, I’m not sure what the appropriate strengthening would be that would also give the ties conjecture.

I believe something like this approach may have been discussed before, but part of the issue with this is the m_i ≥ 0 constraint. It makes it difficult as not all step combinations are valid. However, if the weight constraint above is sufficient, we can temporarily consider them all valid, and it just happens that for sequences and multiset models, the weight for these dice is 0. This approach therefore allows smoothing out the difficulties with considering all the different representations on the same footing, if indeed that weight constraint is sufficient.

Before I think about this approach any further, is there some simple argument or counter-example which shows steps with these properties + weight constraint is not sufficient?

Currently it appears to fit with our understanding:

– sequence dice model is intransitive

– multiset dice model is intransitive

– the improper dice model is mostly transitive (even though it is “balanced” in that every die has the same expectation value, it doesn’t have the nice choice of steps)

– the un-balanced sequence model (sequence model without the sum constraint) is mostly transitive (again, it doesn’t have the nice choice of steps)

– removing the weights constraint we can just choose weights to force transitivity

I’m hoping something along these lines would work as it unites a lot of the results.

]]>Hence, showing that with macroscopic probability, and are close for every should not be very hard. For a nonconditioned die, the variables and should be approximately Gaussian with explicit covariances, so for a conditioned die the are still jointly Gaussian. Proving it properly would require a local limit theorem to handle the double conditioning. But this time, the step distribution is known and very simple, so the proof should be easier than the previous one (or, even better, maybe a general result could be applied).

On the other hand, deducing from here that and are uniformly close does not seem obvious. We would need to show that cannot vary too much in a short interval. A possible way would be to show that is in some sense absolutely continuous with respect to a nonconditioned random walk, and then to use known results about the max of a random walk on a short interval. The absolute continuity also requires a local limit theorem, but this should not be too hard for the same reasons as above.

]]>That’s a good point, and I don’t see a way around it either.

But now I am thinking that “being excluded from the analysis in your main theorem” is *not* uncorrelated with “having lots of repeated faces” (and thus being relatively overrepresented in the multiset model), but is *negatively* correlated with it. If that’s true, then at least in some sense the main theorem should be easier in the multiset model than in the balanced sequence model (since the excluded cases are less common in its distribution).

It’s taking me a while to write up my reasons for that thought (and even once written they will be vague), so I thought I’d mention that general idea first.

]]>The difficulty with just not worrying about such coincidences as occur is that the weights are very sensitive to the numbers of coincidences. For example, if two values of multiplicity 3 are allowed to merge into one value of multiplicity 6, then the weight gets divided by . And it seems that to take account of this brings us back to the problem we started with (since if we knew how to deal with these mergers then we could simply take the multiplicity class to be all singletons and deal directly with the multiset model).

That’s just how it seems to me, but as with my previous remarks, anything that sounds pessimistic can potentially be knocked down by some observation that I have not made, or some additional factor that I have not taken into account, and I don’t rule that out here.

]]>(a) I guess the following won’t work, but I’d like to confirm that understanding (and that my reasoning makes sense about the other parts):

If we fix a “multiplicity class”, then a balanced sequence is just a sequence that (1) obeys certain equalities between elements (to make certain subsets of them equal), (2) obeys inequalities between the elements that are supposed to be distinct, (3) has the right sum (so it’s balanced). If the value of each subset of sequence elements which are required equal by (1) is given by an independent random variable, then is the probability of ((2) and (3)) too low? (I guess (2) and (3) are nearly independent.) For (3) I’d guess the probability is similar to the balanced sequence model (the condition still says that some linear sum of the variables has its expected value, I think); for (2) we’re saying that choices of random elements of fail to have overlaps, where depends on the multiplicity class but could be nearly as large as . I guess the probability of (2) is then roughly exponentially low in , which is why this doesn’t work. Is that right?

(b) thinking out loud:

But what if we just omit condition (2)? Then we have some kind of generalization of a “multiplicity class” (except we want to think of it as a random distribution over dice, not just as a class of dice). It’s no longer true that all the dice in this distribution have the same preimage-size in the map from the balanced sequence model to the multiset model… but (in a typical die chosen from this distribution) most of the random variables have no overlaps with other ones, so only a few of the subsets of forced-equal sequence elements merge together to increase that preimage size. Can we conclude anything useful from this?

(We would want to first choose and fix one of these distributions, then show that using it to choose dice preserves the desired theorems, then show that choosing the original distribution properly (i.e. according to the right probability for each one) ends up approximating choosing a die using our desired distribution. In other words, we’d want some sum of these sort-of-like-multiplicity-class distributions to approximate our desired overall distribution.)

]]>Because the random distribution weights between sequence and multiset change so drastically (as you mention it can be as extreme as n! : 1), it feels like either something very special is being exploited for the conjectures to still hold in both models, or this should just happen fairly often with a change of weights. But we’ve already seen that the intransitivity is fairly fragile when changing the dice model.

I think this “something special” is that with the sequences model, not only is the score distribution for a random die very similar to a Gaussian, but I conjecture this is true with high probability even when looking at the score distribution for the subset of dice constrained to have some particular multiplicity of values (i.e. 12 numbers are unique, 3 are repeated twice, 5 are repeated three times, etc.).

Given the already completed sequence proof, the stricter conjecture is equivalent to saying the U variable is not correlated with the multiplicity of values. Looking at how U is defined, that sounds plausible to me, and may be provable.

If this stricter conjecture is true, then any change of weights for the random distribution will be fine if each “multiplicity class” is changed by the same factor. And this is the case for the shift from sequences to multisets.

]]>If one wants to come up with at least some distinguishing property, it seems good to focus on things like the number of repeated elements, or more generally how the numbers of the different elements are distributed. If we define a map from sequences of length to multisets by writing the sequences in increasing order, then the number of preimages of a multiset depends very strongly on how many repeated elements it has, with extremes ranging from 1 (for the multiset ) to (for the multiset ). Since multisets with many repeats give rise to far fewer sequences, one would expect that repeats are favoured in the multisets model compared with the sequences model. I would guess that from this it is possible to come up with some statistic to do with the number of repeats that holds with probability almost 1 in the multisets model and almost zero in the sequences model.
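The dependence on repeats is easy to compute: a multiset of size n whose distinct values occur with multiplicities m_1, m_2, … is the image of n!/(m_1! m_2! ⋯) sequences. A minimal sketch, with a hypothetical function name:

```python
from collections import Counter
from math import factorial


def preimage_count(multiset):
    """Number of sequences that sort to the given multiset.

    This is the multinomial coefficient n! / (m_1! * m_2! * ...), where the
    m_k are the multiplicities of the distinct values.
    """
    count = factorial(len(multiset))
    for m in Counter(multiset).values():
        count //= factorial(m)  # stays an integer at every step
    return count
```

For instance, `preimage_count((1, 2, 3))` counts all six orderings, whereas `preimage_count((2, 2, 2))` counts just one.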

]]>I haven’t been following this project closely, but my impression is that your existing results can be characterized as “all dice behave ‘reasonably’ except for a negligible fraction, and among the ‘reasonable’ ones, our theorems hold, and from this it follows they hold in general”.

So if we take as our predicate the property that a die is ‘unreasonable’, then if switching to the multiset model (and thus changing the distribution over dice) makes any of the analogous theorem statements false (and if my general understanding is correct), that predicate has to be one which is negligible in the sequence model but not in the multiset model. (Let’s call that a “contrasting predicate”.)

(I’m not conjecturing these “contrasting predicates” don’t exist — in fact, I’m guessing that someone here might be immediately able to give an example of one — maybe it’s enough for the predicate to require that the distribution of element frequencies in the multiset has a certain property. But I’m wondering if thinking about the requirements on such a predicate might be illuminating.)

]]>