Intransitive dice VII — aiming for further results

While Polymath13 has (barring a mistake that we have not noticed) led to an interesting and clearly publishable result, there are some obvious follow-up questions that we would be wrong not to try to answer before finishing the project, especially as some of them seem to be either essentially solved or promisingly close to a solution. The ones I myself have focused on are the following.

  1. Is it true that if two random elements A and B of [n]^n are chosen, then A beats B with very high probability if it has a sum that is significantly larger? (Here “significantly larger” should mean larger by f(n) for some function f(n)=o(n^{3/2}) — note that the standard deviation of the sum has order n^{3/2}, so the idea is that this condition should be satisfied one way or the other with probability 1-o(1)).
  2. Is the stronger conjecture false? Given what we now know, that conjecture is equivalent to the statement that for almost all pairs (A,B) of random dice, the event that A beats a random die C has almost no correlation with the event that B beats C.
  3. Can the proof of the result obtained so far be modified to show a similar result for the multisets model?

The status of these three questions, as I see it, is as follows. The first is basically solved; I shall try to justify this claim later in the post. For the second there is a promising approach that I think will lead to a solution; again I shall try to back up this assertion. The third feels as though it shouldn’t be impossibly difficult, but we have so far made very little progress on it, apart from experimental evidence that suggests that all the results should be similar to those for the balanced sequences model. [Added after finishing the post: I may possibly have made significant progress on the third question as a result of writing this post, but I haven’t checked carefully.]

The strength of a die depends strongly on the sum of its faces.

Let A=(a_1,\dots,a_n) and B=(b_1,\dots,b_n) be elements of [n]^n chosen uniformly and independently at random. I shall now show that the average of

|\{(i,j):a_i>b_j\}|-|\{(i,j):a_i<b_j\}|-2\sum_ia_i+2\sum_jb_j

is zero, and that the probability that this quantity differs from its average by substantially more than n\log n is very small. Since typically the modulus of \sum_ia_i-\sum_jb_j has order n^{3/2}, it follows that whether or not A beats B is almost always determined by which has the bigger sum.
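Before giving the argument, here is a quick Monte Carlo sanity check of the claim, written as a small Python sketch. The value of n, the number of trials and the function names are arbitrary choices of mine, purely for illustration.

```python
import math
import random

def sample_quantity(n):
    """For random A, B in [n]^n, return
    #{(i,j): a_i > b_j} - #{(i,j): a_i < b_j} - 2*sum(A) + 2*sum(B)."""
    A = [random.randint(1, n) for _ in range(n)]
    B = [random.randint(1, n) for _ in range(n)]
    wins = sum(1 for a in A for b in B if a > b)
    losses = sum(1 for a in A for b in B if a < b)
    return wins - losses - 2 * sum(A) + 2 * sum(B)

n, trials = 100, 200
values = [sample_quantity(n) for _ in range(trials)]
print("empirical mean:", sum(values) / trials)
print("largest deviation seen:", max(abs(v) for v in values))
print("n log n is about", round(n * math.log(n)))   # the claimed scale of the deviations
print("n^{3/2} is", round(n ** 1.5))                # the scale of sum(A) - sum(B)
```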

As in the proof of the main theorem, it is convenient to define the functions

f_A(j)=|\{i:a_i<j\}|+\frac 12|\{i:a_i=j\}|

and

g_A(j)=f_A(j)-j+\frac 12.

Then

\sum_jf_A(b_j)=\sum_{i,j}\mathbf 1_{a_i<b_j}+\frac 12\sum_{i,j}\mathbf 1_{a_i=b_j},

from which it follows that B beats A if and only if \sum_jf_A(b_j)>n^2/2. Note also that

\sum_jg_A(b_j)=\sum_jf_A(b_j)-\sum_jb_j+\frac n2.
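To keep the bookkeeping straight, here is a small Python check (the helper names are mine, and n is arbitrary) that the criterion “B beats A if and only if \sum_jf_A(b_j)>n^2/2” agrees with a direct count over the pairs (i,j).

```python
import random
from bisect import bisect_left, bisect_right

def make_f(A):
    """Return f_A, where f_A(j) = #{i: a_i < j} + (1/2) #{i: a_i = j}."""
    S = sorted(A)
    return lambda j: bisect_left(S, j) + 0.5 * (bisect_right(S, j) - bisect_left(S, j))

def beats(X, Y):
    """X beats Y iff #{(i,j): x_i > y_j} > #{(i,j): x_i < y_j}."""
    return sum((x > y) - (x < y) for x in X for y in Y) > 0

n = 50
A = [random.randint(1, n) for _ in range(n)]
B = [random.randint(1, n) for _ in range(n)]
f_A = make_f(A)
# The two sides should agree (exact equality with n^2/2 corresponds to a tie).
print((sum(f_A(b) for b in B) > n * n / 2) == beats(B, A))
```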

If we choose A purely at random from [n]^n, then the expectation of f_A(j) is j-1/2, and Chernoff’s bounds imply that the probability that there exists j with |g_A(j)|=|f_A(j)-j+1/2|\geq C\sqrt{n\log n} is, for suitable C, at most n^{-10}. Let us now fix some A for which there is no such j, but keep B as a purely random element of [n]^n.

Then \sum_jg_A(b_j) is a sum of n independent random variables, each bounded in modulus by C\sqrt{n\log n}. The expectation of this sum is \sum_jg_A(j)=\sum_jf_A(j)-n^2/2.

But

\sum_jf_A(j)=\sum_{i,j}\mathbf 1_{a_i<j}+\frac 12\sum_{i,j}\mathbf 1_{a_i=j}

=\sum_i(n-a_i)+\frac n2=n^2+\frac n2-\sum_ia_i,

so the expectation of \sum_jg_A(b_j) is n(n+1)/2-\sum_ia_i.

By standard probabilistic estimates for sums of independent random variables, with probability at least 1-n^{-10} the difference between \sum_jg_A(b_j) and its expectation \sum_jf_A(j)-n^2/2 is at most Cn\log n. Writing this out, we have

|\sum_jf_A(b_j)-\sum_jb_j+\frac n2-n(n+1)/2+\sum_ia_i|\leq Cn\log n,

which works out as

|\sum_jf_A(b_j)-\frac {n^2}2-\sum_jb_j+\sum_ia_i|\leq Cn\log n.

Therefore, if \sum_ia_i>\sum_jb_j+Cn\log n, it follows that with high probability \sum_jf_A(b_j)<n^2/2, which implies that A beats B, and if \sum_jb_j>\sum_ia_i+Cn\log n, then with high probability B beats A. But one or other of these two cases almost always happens, since the standard deviations of \sum_ia_i and \sum_jb_j are of order n^{3/2}. So almost always the die that wins is the one with the bigger sum, as claimed. And since “has a bigger sum than” is a transitive relation, we get transitivity almost all the time.
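Here is the corresponding experiment, again only a sketch in Python with arbitrarily chosen parameters: over many random pairs from [n]^n, the die with the larger face sum should win in all but a small proportion of cases.

```python
import random

def random_die(n):
    return [random.randint(1, n) for _ in range(n)]

def beats(X, Y):
    """X beats Y iff #{(i,j): x_i > y_j} > #{(i,j): x_i < y_j}."""
    return sum((x > y) - (x < y) for x in X for y in Y) > 0

n, trials = 100, 300
agreements, decided = 0, 0
for _ in range(trials):
    A, B = random_die(n), random_die(n)
    if sum(A) == sum(B):
        continue                     # ignore the rare ties in the face sums
    decided += 1
    bigger = A if sum(A) > sum(B) else B
    smaller = B if bigger is A else A
    agreements += beats(bigger, smaller)
print("proportion of pairs in which the bigger-sum die wins:", agreements / decided)
```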

Why the strong conjecture looks false

As I mentioned, the experimental evidence seems to suggest that the strong conjecture is false. But there is also the outline of an argument that points in the same direction. I’m going to be very sketchy about it, and I don’t expect all the details to be straightforward. (In particular, it looks to me as though the argument will be harder than the argument in the previous section.)

The basic idea comes from a comment of Thomas Budzinski. It is to base a proof on the following structure.

  1. With probability bounded away from zero, two random dice A and B are “close”.
  2. If A and B are two fixed dice that are close to each other and C is random, then the events “A beats C” and “B beats C” are positively correlated.

Here is how I would imagine going about defining “close”. First of all, note that the function g_A is somewhat like a random walk that is constrained to start and end at zero. There are results showing that such random walks have a positive probability of never deviating very far from the origin (at most half a standard deviation, say), which suggests something like the following idea for proving the first step (remaining agnostic for the time being about the precise definition of “close”). We choose some fixed positive integer k and let x_1<\dots<x_k be integers evenly spread through the interval \{1,2,\dots,n\}. Then we argue, and this should be very straightforward, that with probability bounded away from zero the values of f_A(x_i) and f_B(x_i) are close to each other, where by “close” I mean that the difference is at most some small (but fixed) fraction of a standard deviation.

If that holds, it should also be the case, since the intervals between x_{i-1} and x_i are short, that f_A and f_B are uniformly close with positive probability.

I’m not quite sure whether proving the second part would require the local central limit theorem from the paper, or whether an easier argument would do that just uses the fact that, since f_A and f_B are close, the sums \sum_jf_A(c_j) and \sum_jf_B(c_j) are almost certainly close too. Thomas Budzinski sketches an argument of the first kind, and my guess is that that is indeed what is needed. But either way, I think it ought to be possible to prove something like this.
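For what it is worth, here is a rough Python sketch of the kind of experiment that goes with this outline. The definition of “close” used here (agreement of f_A and f_B to within a fixed fraction of a standard deviation at a few evenly spaced points) is just a stand-in, and all the parameters are arbitrary; with these choices the rejection sampling typically needs a few hundred attempts.

```python
import random
from bisect import bisect_left, bisect_right

def random_die(n):
    return [random.randint(1, n) for _ in range(n)]

def make_f(A):
    """f_A(j) = #{i: a_i < j} + (1/2) #{i: a_i = j}."""
    S = sorted(A)
    return lambda j: bisect_left(S, j) + 0.5 * (bisect_right(S, j) - bisect_left(S, j))

def beats(X, Y):
    return sum((x > y) - (x < y) for x in X for y in Y) > 0

def close(A, B, n, k=5, frac=0.5):
    """Stand-in notion of closeness: f_A and f_B differ by at most
    frac * sqrt(n)/2 (roughly a fixed fraction of a standard deviation)
    at k evenly spread points."""
    fA, fB = make_f(A), make_f(B)
    points = [round((i + 1) * n / (k + 1)) for i in range(k)]
    return all(abs(fA(x) - fB(x)) <= frac * 0.5 * n ** 0.5 for x in points)

n = 60
while True:                          # rejection-sample one close pair (A, B)
    A, B = random_die(n), random_die(n)
    if close(A, B, n):
        break

trials = 400
xs, ys = [], []
for _ in range(trials):
    C = random_die(n)
    xs.append(beats(A, C))
    ys.append(beats(B, C))
mx, my = sum(xs) / trials, sum(ys) / trials
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / trials
print("empirical covariance of 'A beats C' and 'B beats C':", cov)
```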

What about the multisets model?

We haven’t thought about this too hard, but there is a very general approach that looks to me promising. However, it depends on something happening that should be either quite easy to establish or not true, and at the moment I haven’t worked out which, and as far as I know neither has anyone else.

The difficulty is that while we still know in the multisets model that A beats B if and only if \sum_jf_A(b_j)<n^2/2 (since this depends just on the dice and not on the model that is used to generate them randomly), it is less easy to get traction on the sum because it isn’t obvious how to express it as a sum of independent random variables.

Of course, we had that difficulty with the balanced-sequences model too, but there we got round the problem by considering purely random sequences B and conditioning on their sum, having established that certain events held with sufficiently high probability for the conditioning not to stop them holding with high probability.

But with the multisets model, there isn’t an obvious way to obtain the distribution over random dice B by choosing b_1,\dots,b_n independently (according to some distribution) and conditioning on some suitable event. (A quick thought here is that it would be enough if we could approximate the distribution of B in such a way, provided the approximation was good enough. The obvious distribution to take on each b_i is the marginal distribution of that b_i in the multisets model, and the obvious conditioning would then be on the sum, but it is far from clear to me whether that works.)

A somewhat different approach that I have not got far with myself is to use the standard one-to-one correspondence between nondecreasing sequences of length n taken from [n] and subsets of [2n-1] of size n. (Given such a sequence (a_1,\dots,a_n) one takes the subset \{a_1,a_2+1,\dots,a_n+n-1\}, and given a subset S=\{s_1,\dots,s_n\}\subset[2n-1], where the s_i are written in increasing order, one takes the multiset of all values s_i-i+1, with multiplicity.) Somehow a subset of [2n-1] of size n feels closer to a bunch of independent random variables. For example, we could model it by choosing each element with probability n/(2n-1) and conditioning on the number of elements being exactly n, which will happen with non-tiny probability.
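Here is the correspondence written out as a small Python sketch. The function names are mine, and the sampler below draws a uniform n-subset of [2n-1] directly rather than by the conditioning described above.

```python
import random

def multiset_to_subset(a):
    """Send a nondecreasing sequence (a_1,...,a_n) with entries in [n]
    to the subset {a_1, a_2 + 1, ..., a_n + n - 1} of [2n-1]."""
    return {x + i for i, x in enumerate(sorted(a))}

def subset_to_multiset(S):
    """Inverse map: with S = {s_1 < ... < s_n}, return the multiset of the s_i - i + 1."""
    return sorted(s - i for i, s in enumerate(sorted(S)))

def random_multiset_die(n):
    """Sample from the multisets model via a uniform random n-subset of [2n-1]."""
    return subset_to_multiset(random.sample(range(1, 2 * n), n))

a = [1, 1, 2, 4, 4]                        # a multiset die with n = 5
S = multiset_to_subset(a)                  # {1, 2, 4, 7, 8}
print(subset_to_multiset(S) == sorted(a))  # the round trip recovers the multiset
print(random_multiset_die(6))
```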

Actually, now that I’m writing this, I’m coming to think that I may have accidentally got closer to a solution. The reason is that earlier I was using a holes-and-pegs approach to defining the bijection between multisets and subsets, whereas with this approach, which I had wrongly assumed was essentially the same, there is a nice correspondence between the elements of the multiset and the elements of the set. So I suddenly feel more optimistic that the approach for balanced sequences can be adapted to the multisets model.

I’ll end this post on that optimistic note: no doubt it won’t be long before I run up against some harsh reality.

12 Responses to “Intransitive dice VII — aiming for further results”

  1. Bruce Smith Says:

    Related to question 3: is it obvious whether or not there exists *any* predicate of dice which is negligible (i.e. o(n)) in the subsets model, but not negligible in the multiset model?

    I haven’t been following this project closely, but my impression is that your existing results can be characterized as “all dice behave ‘reasonably’ except for a negligible fraction, and among the ‘reasonable’ ones, our theorems hold, and from this it follows they hold in general”.

    So if we take as that predicate that a die is ‘unreasonable’, then if switching to the multiset model (and thus changing the distribution over dice) makes any of the analogous theorem statements false (and if my general understanding is correct), that predicate has to be one which is negligible in the subsets model but not in the multisets model. (Let’s call that a “contrasting predicate”.)

    (I’m not conjecturing these “contrasting predicates” don’t exist — in fact, I’m guessing that someone here might be immediately able to give an example of one — maybe it’s enough for the predicate to require that the distribution of element frequencies in the multiset has a certain property. But I’m wondering if thinking about the requirements on such a predicate might be illuminating.)

    • Bruce Smith Says:

      (In that comment, I should have defined “negligible” as o(N) rather than o(n), if there are N dice of size n in the model the predicate is about.)

    • Bruce Smith Says:

      (A second correction: when I said “subsets model” I should have said “balanced sequences model”.)

    • gowers Says:

      This does seem like a potentially good thing to think about. As you suggest, it probably isn’t hard to come up with distinguishing properties, but it may well be that in some precise sense they are all “irrelevant” to anything one might be interested in when discussing matters such as the probability that one die beats another. (I don’t know how to formulate such a conjecture, but it feels as though something like that might exist.)

      If one wants to come up with at least some distinguishing property, it seems good to focus on things like the number of repeated elements, or more generally how the numbers of the different elements are distributed. If we define a map from sequences of length n to multisets by writing the sequences in increasing order, then the number of preimages of a multiset depends very strongly on how many repeated elements it has, with extremes ranging from 1 (for the multiset (1,1,\dots,1)) to n! (for the multiset (1,2,\dots,n)). Since multisets with many repeats give rise to far fewer sequences, one would expect that repeats are favoured in the multisets model compared with the sequences model. I would guess that from this it is possible to come up with some statistic to do with the number of repeats that holds with probability almost 1 in the multisets model and almost zero in the sequences model.
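      As a quick illustration of that last guess, here is a small Python comparison (parameters chosen arbitrarily) of the average number of distinct faces in the two models; the multisets model should show noticeably fewer distinct values, that is, more repeats.

      ```python
      import random

      def distinct_faces_sequences(n):
          """Number of distinct faces of a purely random element of [n]^n."""
          return len(set(random.randint(1, n) for _ in range(n)))

      def distinct_faces_multisets(n):
          """Number of distinct faces of a uniform random n-multiset from [n],
          sampled via a uniform n-subset of [2n-1]."""
          S = sorted(random.sample(range(1, 2 * n), n))
          return len(set(s - i for i, s in enumerate(S)))

      n, trials = 200, 500
      seq = sum(distinct_faces_sequences(n) for _ in range(trials)) / trials
      mul = sum(distinct_faces_multisets(n) for _ in range(trials)) / trials
      print("sequences model, average distinct faces:", seq)   # roughly (1 - 1/e) n
      print("multisets model, average distinct faces:", mul)   # roughly n/2
      ```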

  2. P. Peng Says:

    Another possible route to the multi-set result.

    Because the random distribution weights between sequence and multiset change so drastically (as you mention it can be as extreme as n! : 1), it feels like either something very special is being exploited for the conjectures to still hold in both models, or this should just happen fairly often with a change of weights. But we’ve already seen that the intransitivity is fairly fragile when changing the dice model.

    I think this “something special” is that with the sequences model, not only is the score distribution for a random die very similar to a Gaussian, but I conjecture this is true with high probability even when looking at the score distribution for the subset of dice constrained to have some particular multiplicity of values (i.e. 12 numbers are unique, 3 are repeated twice, 5 are repeated three times, etc.).

    Given the already completed sequence proof, the stricter conjecture is equivalent to saying the U variable is not correlated with the multiplicity of values. Looking at how U is defined, that sounds plausible to me, and may be provable.

    If this stricter conjecture is true, then any change of weights for the random distribution will be fine as long as each “multiplicity class” is changed by the same factor. And this is the case for the shift from sequences to multisets.

    • gowers Says:

      That’s a very interesting idea. It seems plausible that as long as a sequence takes enough different values, then conditioning on the distribution of the numbers of times the values are taken shouldn’t affect things too much. It’s not quite so obvious how to prove anything: I don’t see a simple way of using independent random variables and conditioning on an event of not too low probability. But it isn’t obviously impossible, and it would be an interesting generalization if we could get it.

  3. Bruce Smith Says:

    > I don’t see a simple way of using independent random variables and conditioning on an event of not too low probability.

    (a) I guess the following won’t work, but I’d like to confirm that understanding (and that my reasoning makes sense about the other parts):

    If we fix a “multiplicity class”, then a balanced sequence is just a sequence that (1) obeys certain equalities between elements (to make certain subsets of them equal), (2) obeys inequalities between the elements that are supposed to be distinct, (3) has the right sum (so it’s balanced). If the value of each subset of sequence elements which are required equal by (1) is given by an independent random variable, then is the probability of ((2) and (3)) too low? (I guess (2) and (3) are nearly independent.) For (3) I’d guess the probability is similar to the balanced sequence model (the condition still says that some linear sum of the variables has its expected value, I think); for (2) we’re saying that k choices of random elements of [n] fail to have overlaps, where k depends on the multiplicity class but could be nearly as large as n. I guess the probability of (2) is then roughly exponentially low in n, which is why this doesn’t work. Is that right?

    (b) thinking out loud:

    But what if we just omit condition (2)? Then we have some kind of generalization of a “multiplicity class” (except we want to think of it as a random distribution over dice, not just as a class of dice). It’s no longer true that all the dice in this distribution have the same preimage-size in the map from the balanced sequence model to the multiset model… but (in a typical die chosen from this distribution) most of the k random variables have no overlaps with other ones, so only a few of the k subsets of forced-equal sequence elements merge together to increase that preimage size. Can we conclude anything useful from this?

    (We would want to first choose and fix one of these distributions, then show that using it to choose dice preserves the desired theorems, then show that choosing the original distribution properly (i.e. according to the right probability for each one) ends up approximating choosing a die using our desired distribution. In other words, we’d want some sum of these sort-of-like-multiplicity-class distributions to approximate our desired overall distribution.)

    • gowers Says:

      Yes, the problem you identify is indeed the problem: I think that typically the multiplicities for a random multiset will have something close to a Poisson distribution with mean 1 (they are given by the lengths of runs of consecutive elements of a random subset of \{1,2,\dots,2n-1\} of size n). So they will almost all be of constant size, and therefore the number of distinct values taken will be proportional to n, which implies that the probability that the values are all distinct is exponentially small.

      The difficulty with just not worrying about such coincidences as occur is that the weights are very sensitive to the numbers of coincidences. For example, if two values of multiplicity 3 are allowed to merge into one value of multiplicity 6, then the weight gets divided by \binom 63=20. And it seems that taking account of this brings us back to the problem we started with (since if we knew how to deal with these mergers then we could simply take the multiplicity class to be all singletons and deal directly with the multiset model).

      That’s just how it seems to me, but as with my previous remarks, anything that sounds pessimistic can potentially be knocked down by some observation that I have not made, or some additional factor that I have not taken into account, and I don’t rule that out here.

    • Bruce Smith Says:

      > And it seems that to take account of this brings us back to the problem we started with …

      That’s a good point, and I don’t see a way around it either.

      But now I am thinking that “being excluded from the analysis in your main theorem” is *not* uncorrelated with “having lots of repeated faces” (and thus being relatively overrepresented in the multiset model), but is *negatively* correlated with it. If that’s true, then at least in some sense the main theorem should be easier in the multiset model than in the balanced sequence model (since the excluded cases are less common in its distribution).

      It’s taking me a while to write up my reasons for that thought (and even once written they will be vague), so I thought I’d mention that general idea first.

    • gowers Says:

      One reason that lots of repeated faces make things a bit harder is that it is slightly more complicated to say that the sum has a good chance of equalling a specified value, and probably some of the deviation estimates become worse. But I don’t think those effects would kick in in a serious way until the number of repeats is very large.

  4. Thomas Budzinski Says:

    Here are a few more remarks about the sketch of proof for the (false) strong conjecture. For an unconditioned die, f_A has the same distribution as a random walk with Poisson(1) steps, conditioned on f_A(n)=n. Hence, g_A is a random walk with Poisson(1)-1 steps, conditioned on g_A(n)=0 and \sum_j g_A(j)=0 (I don’t know whether this has already been noticed).

    Hence, showing that with macroscopic probability g_A(x_i) and g_B(x_i) are close for every i should not be very hard. For an unconditioned die, the variables g_A(x_i) and \sum_j g_A(j) should be approximately Gaussian with explicit covariances, so for a conditioned die the f_A(x_i) are still jointly Gaussian. Proving it properly would require a local limit theorem to handle the double conditioning. But this time the step distribution is known and very simple, so the proof should be easier than the previous one (or, even better, maybe a general result could be applied).

    On the other hand, deducing from here that f_A and f_B are uniformly close does not seem obvious. We would need to show that f_A cannot vary too much in a short interval. A possible way would be to show that \left( f_A(j) \right)_{0 \leq j \leq n/2} is in some sense absolutely continuous with respect to a nonconditioned random walk, and then to use known results about the max of a random walk on a short interval. The absolute continuity also requires a local limit theorem, but this should not be too hard for the same reasons as above.

  5. Gil Kalai Says:

    For the sequence model you can base the criterion for whether die A beats die B on other rules (which can be described, e.g., in terms of a function from {-1,1,0}^n to {-1,1,0}). For example, die A beats die B if the largest “run” of wins of A vs B is longer than the largest run of wins of B vs A. In analogy with voting rules, I would expect that many rules will lead to asymptotic randomness (or pseudorandomness) even when you drop the condition on the sum of entries. (All my previous guesses based on this analogy failed, but at least I figured out for myself what the difference was.)
