DHJ(3) and related results: 1050-1099

I’m not sure how many more comment threads we will have, but we are running out of the 1000-1049 thread, so it’s time for a new one. The main news to report since the last post is that progress is being made on writing up the proof of DHJ(3) and DHJ(k). At the moment it is more like preparatory sketches, but they are pretty detailed and can, as usual, be found on the wiki. It looks as though some of the more technical parts will end up very streamlined thanks to work of Ryan O’Donnell: the final proof of DHJ(k) should be surprisingly short (though I hope that we will write it up with plenty of accompanying explanation so that it is not too compressed and hard to understand).

About these ads

136 Responses to “DHJ(3) and related results: 1050-1099”

  1. gowers Says:

    1050. Shelah

    I still haven’t given up on the idea of Shelahifying the proof of DHJ(k), but a weakness in my attempts so far is that I have no good story to tell about corners. So in this comment I am going to try to guess how a proof of the corners theorem might go if the induction worked the other way round. In other words, I want an argument that solves the higher-dimensional corners problem with the one-dimensional Szemerédi theorem as its main tool (rather than Szemerédi in one dimension down). It feels rather unlikely that this is possible, but we aren’t going to know if we don’t try.

    I think the difference should show up only once we get to three dimensions. I’ve been struggling to think of some way that complexity-1 sets might come into the picture (which in this context means Cartesian products X\times Y\times Z\subset[n]^3), and the best I’ve come up with so far is this. The conventional proof takes a “dense diagonal,” meaning a plane of constant x+y+z that contains many points of the set A, and exploits the fact that this set contains many aligned equilateral triangles to obtain a structured set that A is forced to avoid. What if we try to do things the other way round? We start with the fact that A\subset[n]^3 is dense and try to deduce from that that almost all diagonal planes are sparse. The hope is that the argument works unless A correlates with a set of complexity 1.

    Let’s be more precise about this. If A is a 3D-corner-free subset of [n]^3 and (x,y,z)\in A then for any d we cannot have all three points in the aligned equilateral triangle (x+d,y,z),(x,y+d,z),(x,y,z+d). Thus, each point in A rules out a large number of triangles in a large number of diagonal planes. What we would like to be able to deduce from that is that the density of A in those planes is necessarily small.

    This suggests the following question, which I have not had time to think about properly. For what collections \mathcal{C} of corners (in [n]^2 now) is it the case that every dense subset of [n]^2 contains a corner in \mathcal{C}?

    A more specific question is this. A set of complexity 1 in [n]^2 is a set of the form K=\{(x,y):x\in U,y\in V,x+y\in W\}. Let us say that \mathcal{C} correlates with K if there are more corners with vertices in K than there should be. Suppose that \mathcal{C} does not correlate with any dense set of complexity 1. Does it follow that every dense set has a corner in \mathcal{C}? And can it be proved surprisingly easily?

    If the answer to both questions were to be yes then we might be in business, but as I said at the beginning this feels a bit over-optimistic.

    Another point to make is that the above question could need a little adjustment before it becomes sensible. For instance, it might need to talk about more local forms of correlation.

    • jozsef Says:

      Tim, could you please clarify your last question? “\mathcal{C} correlates with if there are more corners with vertices in K than there should be.” (Probably you meant correlates with K.) I can’t see any condition which guarantees that \mathcal{C} is not even empty. Probably I missed something.

    • gowers Says:

      OK. First, I’ve added the missing K — thanks for pointing that out. The idea I had in mind was to assume that \mathcal{C} had density \alpha in the set of all possible corners, that \beta was the density of corners with vertices in K, and that the density of such corners that belong to \mathcal{C} was significantly different from \alpha\beta.

    • jozsef Says:

      I see. This sounds to me like another sparse hypergraph regularity problem. The complexity 1 set is the shadow of \mathcal{A} and we check if \mathcal{A} is quasirandom on its shadow or not.
      The complexity 1 set contains many corners (it is a large Cartesian product with small sumset) so we want to conclude that \mathcal{A} contains corners or it is “very irregular”. Isn’t it your modified Ajtai-Szemeredi proof? I have to read your post again, as I lost track of how this relates to a Shelah type argument.

    • jozsef Says:

      Regarding to the first question; There is a very sparse set of corners that every dense subset of [n]^2 contains one from that set. Using DHJ(4) (I know that we don’t want to use it but anyways) one can show that there is a set of n^{2.3219...} corners that every dense set contains a corner from that set. To construct such set, consider [n] as a m=\log{n} dimensional cube and there is a 1-1 correspondence between the points of [n]^2 and [4]^m. Take the triangles defined by combinatorial lines. By DHJ(4) every dense subset contains a corner (even a square) from this set of corners. Maybe one can find an even sparser set of corners but I don’t see how. I also can’t see a reasonable lower bound for the size of such set of corners. It seems to be an interesting problem.

    • Gil Kalai Says:

      Shalahifying DHJ(k) is a noble cause indeed; What about DHJ(3)? does the proof gives inverse ackerman type function for DHJ(3) as well? Is the goal of density 1/log log log … log n just for DHJ(3) out of reach with the new proof? (for a bounded number of logs? for two logs?)

    • Gil Says:

      Another quick question: what are the best bound known for multidimensional Sperner? (The wiki argument with DHJ(2^k) gives terrible bounds). I recall we discussed it but didnt find it. (Another question: is there an easy way to do search on the union of all threads?)

  2. gowers Says:

    1051. Shelah

    I’m going to have a quick go at the question I asked in 1050 by following the modified Ajtai-Szemerédi proof and seeing what assumptions I need to make about \mathcal{C} to get the argument to work.

    Actually, I’ve got to go — will do this later.

  3. gowers Says:

    1052. Shelah

    OK here goes. The first step was to find a dense diagonal. Do we still want to do that? Well, the point of a dense diagonal was that it gave rise to a large Cartesian product full of forbidden points. But if we’ve only got a certain subset of corners, then we cannot make such a strong conclusion.

    So what can we conclude? To investigate this, we could see what happens if the set of pairs on some diagonal that belong to corners in our system \mathcal{C} is random. If there are \alpha n^2 of these pairs, then they are ruling out a … random set of size \alpha n^2, which isn’t very helpful.

    Actually, what I’m doing is nonsense because the contrapositive of what I’m trying to show is that a dense “diagonal plane” forces a complexity-1 bias in the set A, which is clearly untrue. So I still don’t have any feel for how, or whether, the proof can be given a Shelah version.

    • Randall Says:


      This may make absolutely no sense, but what if instead of looking at corners in Z^3 you look at four terms progressions in Z…only you can still look at your dense plane in 3 space, only now your dense set is a set of a special type. Namely, what we really seek is a four term progression in a dense set E of Z, but we actually look at the set E'=\{(x,y,z)\in Z^3: x+2y+3z\in E\}, which is dense in Z^3. It seems that even though you are only forbidding a sparse set with your dense plane, that sparse set may project back to a dense set in Z (though I have no idea why that would be, if so, so this is mere speculation).

  4. Randall Says:

    1052. Szemeredi’s proof of Roth’s theorem

    I just looked for the first time at the page about Szemeredi’s combinatorial proof of Roth’s theorem on the wiki. It seems to me that this proof actual resembles the polymath proof of DHJ3 as much as does the Ajtai/Szemeredi proof, if not more. After all it uses this “cube lemma” which seems to be more comparable to DHJ(2) than is Szemeredi’s theorem. Indeed, isn’t my post 1002, which was intended to be a dumbed-downed version of our DHJ3 proof that would be just strong enough to tackle corners, essentially the same proof? (Assuming of course that 1002 is even correct….)

    • gowers Says:

      I had a similar thought at some stage (though less precise): the fact that he uses combinatorial cubes is certainly suggestive. No time just now to think about your question about 1002, but we should certainly try to clarify this.

  5. Ryan O'Donnell Says:

    1053. Simplifying the DHJ(3) proof.

    I’ll be back on my normal schedule soon, but let me just say I started thinking about the part of the proof wherein it is shown that if A has a density increment on a 12-set, then we can find a subspace (with \omega_n(1) coordinates) on which it also has a density increment.

    Let’s simplify by imagining that A has a density increment on a 1-set. By the argument already sketched (based on multidim Sperner), one can show that a 1-set contains not just a combinatorial line but a combinatorial subspace of dimension \omega_n(1).

    I think then one can use the new equal-slices perspective to easily show the Varnavides statement, that a 1-set contains a randomly chosen chosen \omega_n(1)-dimensional subspace with constant probability. Perhaps this can substitute for the argument which almost partitions 1-sets into combinatorial subspaces…

    • gowers Says:

      1053.1 I think that doesn’t work, for two reasons. First, I don’t see why a 1-set containing lots of combinatorial subspaces gives you a density increase on one of those subspaces. Why couldn’t they all be disjoint from A?

      More seriously, and this was an important factor in guiding me to the partitioning-into-subspaces perspective, one can use the Hahn-Banach theorem to prove that IF it follows from the fact that A correlates with X that A correlates with a set of type T, then one MUST be able to partition X into sets of type T. (I’ve oversimplified here, and not explained where partitioning almost the whole set comes in — perhaps I’ll wikify this argument some time soon.)

    • gowers Says:

      1053.2 I didn’t quite say what I meant there. The Hahn-Banach theorem tells you that you’re basically forced to find at the very least some kind of fractional partition — that is, a covering by subspaces where almost every element is covered almost the same number of times. But I don’t see a probabilistic way of proving that, because different elements can be in very different numbers of subspaces. (I gave up after failing to find a direct probabilistic proof that a Cartesian product in {}[n]^2 can be evenly covered by grids.)

    • Gil Says:

      Is this Hahn-Banach argument essentially LP duality? A general ststament will be useful and may ring various combinatorial bells…

    • Ryan O'Donnell Says:

      Gil, I think think so, except that when Tim says “Hahn-Banach” here I’m not sure he actually means “using the Hahn-Banach theorem”. I interpret it somehow as short for “the argument wherein you say that if one can partition X into sets of type T, then it must be that A being dense in X means A is dense in some set of type T”.

      Or perhaps he does mean, “use the Hahn-Banach Theorem” (i.e., LP duality), I don’t know.

    • gowers Says:

      My fault for writing in too much haste — sorry, I’ve been rather busy with other things recently (and am just about to go abroad for a few days, so I’ll be going even quieter). I was actually referring to the Hahn-Banach separation theorem, which says that if a point doesn’t belong to a convex body then you can find a linear functional that’s large on the point and small everywhere on the convex body. (I’m oversimplifying slightly, but not too much.) In this case, if the characteristic function f of X can’t be approximated by a suitable positive combination of characteristic functions of subspaces, then you have a convex set to which f does not belong. The functional given by the Hahn-Banach theorem can be used to produce a bounded function that correlates with X but not with any subspace. I’m trying to wikify the details but have kept messing it up. I’m going to work them out on paper and then I’ll get back to the wikification.

  6. Ryan O'Donnell Says:

    1054. Simplifying.

    Hi Tim, perhaps you’re right, but here is an alternative that I think works.

    By known arguments I think we can Varnavides-ise and multidimensional-ise any DHJ-type statement fairly easily, in the equal-slices measure. (The multidimensionalising is in the wiki for uniform measure I think, but I’m pretty sure it’s easy to do under equal-slices.) In particular, we can even Varnavides+multidim-ise any such statement. Hence, e.g., starting with Sperner, it’s not to hard to get an equal-slices VM-DHJ(2). [M = multidim, V = Varnavides.]

    That’s good, because that’s a key component in pushing up from k = 2 to k = 3. In particular, with VM-DHJ(2), one can get DHJ(3) for 1-sets (“Step 1″ in the overall sketch.) Were we calling this DHJ(1,3) or something? Let’s say we were.

    I checked (I think) that again, we can easily V-ise and M-ise. (One only needs to check that restrictions of 1-sets to subspaces are still 1-sets.) So we can easily get VM-DHJ(1,3); i.e., a 1-set in [3]^n with density \delta has the property that if you pick a random equal-slices m-dimensional subspace, it’s entirely within the set with probability c(\delta, m) > 0.

    Let me now wave my hands and pretend that we can somehow get DHJ(2,3) as well, where I take “DHJ(2,3)” to mean the claim that a dense 12-set in [3]^n contains a line. I’m not quite sure how to do this currently, but the jump from 1-sets to 12-sets in the current proof outline is easy, so let’s pretend it’s easy here too. :) [More honestly, I think we should try to do it by using the argument that appears below.]

    Again, we can V-ise and M-ise to get VM-DHJ(2,3).

    Now for the main point of this post: how can we use this to conclude DHJ(3)? If I’m not mistaken (I’m probably mistaken) I think it’s actually easy:

    As usual, we note that if A \subseteq [3]^n has (equal-slices) density \delta, yet fails to contain a line, then it must completely avoid a certain 12-set we like to call U \cap V. This set U \cap V has equal-slices density \eta = \eta(\delta).

    Let S denote a randomly chosen (equal-slices) m-dimensional subspace of [3]^n. Note that if we draw x \in S at random (equal-slices), then x is overall distributed as equal-slices. Hence:

    \delta = \Pr_{S,x}[x \in A] = \Pr[x \in A \mid S \subseteq U \cap V] \Pr[S \subseteq U \cap V]
    + \Pr[x \in A \mid S \not \subseteq U \cap V] \Pr[S \not \subseteq U \cap V].

    Since A avoids U \cap V we have \Pr[x \in A \mid S \subseteq U \cap V] = 0. But also, by VM-DHJ(2,3), we know that \Pr[S \not \subseteq U \cap V] \leq 1 - c(\eta, m) for some tiny but positive c(\eta,m). Hence the above equation implies

    \delta/(1-c(\eta,m)) \leq \Pr[x \in A \mid S \not \subseteq U \cap V].

    Hence there exists a particular subspace S_0 (not in U \cap V) such \Pr_{x \in S_0}[x \in A] \geq \delta/(1-c(\eta,m)), and we have the desired density increment on a (multidimensional) subspace.

    • Ryan O'Donnell Says:

      A small point: Actually, DHJ(1,3) follows almost trivially just from Sperner (as I’m sure we noted back in the 1-200 thread). So you can get to VM-DHJ(1,3) rather quickly. About DHJ(2,3), I’m still thinking…

    • Ryan O'Donnell Says:

      1054.2. DHJ(3) for 12-sets.

      Oh, it’s already on the wiki. Hmm. Are we done?

    • Ryan O'Donnell Says:

      1054.3. Okay, so just to recap. The following statements all refer to equal-nondegenerate-slices measure:

      (a) DHJ(3) for 12-sets is an easy consequence of Sperner (see 1054.2).

      (b) I claim that it is not too hard to Varnavides-ise, Multidimensional-ise, and indeed Varna+Multidim-ise DHJ(3) for 12-sets (using known arguments of Tim and Terry I think that are already on the wiki, but which can be streamlined a bit if you work in equal-nondegenerate-slices)

      (c) Then, assuming VM-DHJ(3) for 12-sets, use the argument following “Now for the main point of this post:” in 1054.

      If someone is able to agree with (c), I will endeavour to wikify (b) a bit more clearly.

    • Ryan O'Donnell Says:

      1054.4. I’m slightly nervous about the dog-chasing-its-own-tail aspect of the parameters here; you know, getting too small an increment on too small a subspace. Let me think about why it’s okay in the original proof sketch.

    • gowers Says:

      1054.5 I see better what you’re saying now. One detail troubles me, which is that the density increase you get on the subspace depends on the dimension of the subspace, which definitely isn’t the case in the other argument, but I haven’t yet decided whether that’s a serious problem or just an interesting detail.

    • gowers Says:

      1054.6 The more I think about it, the more I think that it is a problem. E.g. if on passing to an m-dimensional subspace you get a density increase of 2^{-m} it seems to me that there’s no way of getting the density up to 1.

    • Ryan O'Donnell Says:

      Right, it seems to be a problem. But I kind of like the flavour of this angle; I wonder if there’s a way to make it work… Probably won’t get a chance to think about it today though.

    • gowers Says:

      It would be great if you could simplify the argument. I’m busy wikifying an argument that places some limits on any proof that can possibly work — I hope it will narrow down and therefore speed up the search.

  7. Ryan O'Donnell Says:

    1055. Yes, Tim, I’m starting to see what you say, that this “inner increment/partition argument” is necessary.

    Without thinking about it too carefully, let me ask the following question: in the general partitioning argument, it seems to me that the “Subspace Richness” property tends to follow in general just from the “Line-Having” property, by the standard Multidimensional-ising and Varnavides-ising (perhaps one needs some of the other properties here; for example, I’ve only bothered to convince myself for 1-sets and 12-sets).

    If so, I guess the overall argument could go (or rather, is going):

    DHJ(2) (Sperner) => 12-sets in [3]^n have Lines => 12-sets in [3]^n are Subspace Rich (VM-ise) => 12-sets in [3]^n can be partitioned => DHJ(3) (because Line-free sets correlate with 12-sets)

    Now presumably the argument continues: DHJ(3) => “lower complexity” (??) sets in [4]^n have Lines => lower complexity sets in [4]^n are Subspace Rich => lower complexity sets in [4]^n sets are partitionable => DH(4)…

    Sorry, I guess this is already more or less on the wiki already and I’m just thinking out loud / catching up. Is this right?

    • Randall Says:

      I confess I have trouble following these outlines (partly because I just think rather slowly) but I think both proofs of the part in question that we have, Tim’s via almost-partitioning a 1-set into subspaces and Terry’s via dense fibers (really, these are essentially the same proof) proceed via…well…partitioning a 1-set into subspaces. (This is of course a tautology in the case of Tim’s argument, but it’s true of Terry’s as well because the greedy algorithm he uses may in general throw away almost the entire 1-set in question before it locates a suitable set of “maximal density” or drives the density on the remaining part to 1.) More generally, I tend to think Tim is right than any argument that is going to work at all has got to be powerful enough to partition a 1-set into subspaces, so you might as well do exactly that because it’s much easier to follow the proof in that case anyway. (I at least find Tim’s argument easier to follow than Terry’s, despite the fact that Terry’s is closer in spirit to the ergodic heuristic…and also despite the fact that they are essentially the same argument, of course.)

  8. Ryan O'Donnell Says:

    1056. Wikifying.

    Just wanted to say I’ve started trying to build a wiki page devoted to giving the all-details-filled-in version of the density-increment proof of DHJ(3). I’ll report back when I’ve fleshed it out some.

    • Ryan O'Donnell Says:

      Reader, you too should feel free to flesh it out, of course!

    • gowers Says:

      Ryan, one aspect of the outline you’ve got on the wiki is different, though perhaps not truly different, from how it is at the moment. You deduce that an ij-insensitive set is subspace rich from the fact that it has subspaces, whereas what we have at the moment deduces it from the fact that a dense subset of {}[2]^n is subspace rich. In both cases the proof is that we can express the measure as a positive linear combination (plus an error if necessary, but I think you’ve managed to avoid that) of suitable measures on subspaces. The way you do it, you need to add the extra remark that the restriction of an ij-insensitive set to a subspace is still an ij-insensitive set.

      I haven’t checked, but I think it may be that there is a sort of commutative diagram that allows one to see that the two proofs are exactly the same, in the sense that there’s a one-to-one correspondence between the sets they consider. If that is the case, then I find it slightly more natural to prove the subspace richness in {}[2]^n and then lift it up to ij-insensitive sets in {}[3]^n.

    • Ryan O'Donnell Says:

      1056.3. Tim: I agree with everything you wrote 100%, and in fact at some point I independently decided that it made more sense to commute those two steps: i.e., first show that dense sets in [2]^n are subspace rich, and then pull up to ij-insensitive sets in [3]^n.

      Besides being arguably more natural, doing it this way has a small benefit: If we really really care, we can use the Gunderson-Rodl-Sidorenko argument to get that dense sets in [2]^n have large subspaces, and this paper has way better quantitative parameters than the lines->subspaces argument we use in general.

      (On this point, I should note that the alternative-to-GRS lines->subspaces deduction I put on the wiki a while ago was extremely wrong, so I deleted it. At some point we should see if we can fix it; if GRS can get civilised parameters, hopefully we can also get civilised parameters for lines-in-[3]^n -> subspaces-in-[3]^n.)

  9. Ryan O'Donnell Says:

    1057. Wikifying.

    Just wanted to report that I’ve gotten about 50% of the way through making the aforementioned stub article for the fleshed-out proof. It’s currently an orphan page on the wiki; when/if I get to 90%+ I’ll put a link to it on the main page.

  10. Gil Says:

    While following, and cheering the emerging proof on the wiki, and resting a little from the serious efforts that took place here over the last 6 weeks or so (and I hope there will be a follow up math discussion (not just meta discussion) for a couple of weeks even after the success,) can I offer to amuse ourselves with the silliest spin-off problem that was offered in these discussions: (naturally, by me): What is the size of the largest family of subsets of {1,2,…,n} without two sets A and B so that A\B is twice as large as B\A. Any slice will do and we can (as pointed out by NK) add to the middle slice also silices of size n/4, n/8 etc. Can we do better?

  11. Kristal Cantwell Says:

    1058. Hyper-optimistic conjecture

    Could the Hyper-optimistic conjecture be extended beyond 3, and if so if it is true would it imply DHJ(n) for higher values of n. For example if it is true for 4 would we have DHJ(4)?

  12. Gil Says:

    1059. hyper-optimism
    Great question. I looked at the wiki and did not find it stated beyond 3. I also do not see off hand what it will give.

  13. Kristal Cantwell Says:

    1060. Corners theorem

    Could the corner’s theorem be extended to higher dimensions? Has it been already? I think that such a result might be useful in trying to get DHJ(n)
    from the hyper-optimistic conjecture for values of n greater than three.
    Even without the corner’s lemma if we could get the grid a+b+c+d=n
    of quadruples (a,b,c,d) with density delta contains (a+r,b,c,d),(a,b+r,c,d),
    (a,b,c+r,d),(a,b,c,d+r) that and its extension into n might be useful in a possible proof.

    • gowers Says:

      The corners theorem is known in all dimensions, either by ergodic theory or as a consequence of hypergraph regularity. (It also follows from DHJ but obviously that’s no help if you want to use it to prove DHJ.) The hyperoptimistic conjecture can be formulated for any k, and I do not know of any counterexamples. The result you talk about with the grid is equivalent to the 3D corners theorem, if I interpret correctly what you are saying.

      I recently tried to come up with a lower bound for DHJ(3) that would beat the Behrend bound applied to slices, but had absolutely no luck. But I’d be amazed if the hyperoptimistic conjecture turned out to be true …

  14. gowers Says:


    What are people’s views about writing up? Ryan wanted to wait, and I know others feel that there isn’t a big hurry. I myself would very much like to see a complete write-up, and would like to be closely involved in it, but I don’t mind waiting a bit more because I’m pretty busy over the next couple of weeks. I thought I’d ask though. For example, if I were to produce a first attempt at a plan of a paper, would that be welcomed, or would the other potential writers rather I didn’t?

    • Ryan O'Donnell Says:

      Hi Tim. Yes, I did suggest waiting a bit, but it seems like we all collectively have done so, perhaps unintentionally. I too have sorely needed to catch up on some other projects, which won’t be wrapped up till April 2. I still hope to finish the wiki outline I started.

      In any case, I would be fine with the idea of writing beginning in a week or two or three, whenever we get freer. I can contribute some writing, or Tim if you want to, please do so. I’d be happy with a model wherein anybody who felt they had time to write could just notify the others that they’re doing so. Tim, if you or someone manages to put out a full draft at some point, why not?

    • jozsef Says:

      It would be great if Tim could write up the proof. I’m trying to write a toy version of the proof for corners on the Cartesian product AxA, where A is a Hilbert cube. A few weeks ago my first try didn’t go well as I followed the original Ajtai-Szemeredi proof. I knew it quite well and didn’t pay attention to Tim’s modified proof. Now I see that actually the key of the DHJ3 proof is hiding there. So, I advise the interested reader to compare Tim’s version to the original, even if one knows the Ajtai-Szemeredi proof very well. The two wiki entries are:
      “Ajtai-Szemerédi’s proof of the corners theorem”

      and “A Modification of the Ajtai-Szemerédi argument”

    • Terence Tao Says:

      I plan to revisit the writeups I have of the triangle removal proof and the Furstenberg-Katznelson proof, and clean them up – right now I get the sense that they’re not really readable by anyone other than myself. But it does seem like the optimal strategy is to move slowly, in case some simplifications crop up that can be taken advantage of.

      For instance, I found it very useful in those two proofs to play around with substitution maps such as \rho_{1 \to 2}^I: [3]^n \to [3]^n (defined as the map that takes a string x in [3]^n and changes all the 1s in positions in I to 2s) in order to build lines (e.g. (x, \rho_{1 \to 2}^I(x), \rho_{1 \to 3}^I(x) is a line whenever x has at least one 1 in I). This might help with some of the arguments in the density-increment argument, though I haven’t checked it.

  15. Kristal Cantwell Says:

    1061. Hyper-optimistic conjecture

    I think I can get the hyper-optimistic conjecture implies DHJ(k).
    I have to assume I have DHJ(k) for density is equivalent for DHJ(k)
    for equal slices density as in k=3. Then the corners theorem makes the
    density of slices a without corners arbitrarily low then the hyper-optimistic conjecture transfers the small size to the equal slices density and the logical equivalence makes the density arbitrarily low. So If I start with
    density epsilon I can find a large enough n such that the corner free space has density much less than epsilon and that means by the hyper-optimistic conjecture the equal slices density is much less than epsilon and hence by logical equivalence we have the regular density must be low and we are done.

  16. Gil Kalai Says:

    1062. Finer slice decompositions.

    There are “finer decompositions” (compared to slices) of \{0,1,2\}^n that can also plays a role. For example, we can consider vectors woth a ’0′, b ’1′ so that the sum of indices of coordinates with value ’0′ is x, and the sum of indices of coordinates with value 1 is y. If we denote these vectors by \Gamma _{a,b;x,y} we get a finer slice decomposition which can be of interest.

    • Gil Says:

      hmm, when we consider union of such finer slices \Gamma_{a,b,c;x,y,z} (or perhaps a different forms of slicing) there are several issues that seems relevant. (But they do not suggest that union of such fine slices will give better constructions.)

      One thing is this: Consider a subset of {1,2,…,m} \times {1,2,…,k} (where km=n) with out 3 terms A P. Can you get examples with higher densities than for {1,2,…,n}? (lets assume k=m for simplicity) (Probably this is known to experts.)

      Another thing is that there are various extensions of Sperner lemma which show that under more general conditions you get the same bound. One is the 2-part Sperner lemma of Katona. Another is Kleitman’s solution of Littlewood-Offord problem. So if we think its fruitful to regard DHJ(3) as extension of DHJ(2). Is there an analog for Kleitman’s theorem?

  17. Kristal Cantwell Says:

    1063. Hyper-optimistic conjecture

    Apparently this is not true for the generalization for 4 see post
    1185 in the thread


    Congratulations to Klas Markström for finding the counterexample for this.

  18. Gil Kalai Says:

    1064. more on the analogy with Sperner

    We want to find a set avoiding a configuration C inside a large set X. If we can show that for some subset Y of X the maximum density of a C-avoiding subset is at most c|Y| then usually this is good. When there is a group acting like in the cap set problem we can show the same bound for subsets of X that avoid C. But even if there is no group action this can still be useful.

    For Sperner the basic observation is that if we take Y to be a chain then a line-free subset of Y contains at most 1 element. So the density of line-free subsets of chain is very small and this can be pushed to various proofs.

    So the question is (and we discussed it more or less in these threads, i just repeat it): is there a good analog for “chains” for DHJ(3). A small configuration (and I do not know if small is in the order of 1 or of 2^n up to poly(n)) on which we can show (or at least conjecture) that the density of line-avoiding subset must be small. (Thinking about combinatorial subspaces is sort of “cheating” and in any case not the analog for the case of Sperner.)

    Simple questions to state are: Is there any subset Y of {0,1,2}^n on which we can show (more easily) that the maximum density of line-free subsets is small. Is there a subset Y where this maximum density is expected to be smaller than that for the entire {0,1,2}^n?

  19. Terence Tao Says:


    I gave a talk at UCLA on the three combinatorial proofs (or proof sketches) we have, namely the density increment-based proof, the triangle-removal proof (based on Austin’s ergodic proof), and the finitary Furstenberg-Katznelson proof, with some notes at


  20. gowers Says:

    1065. Writing up.

    Here’s a possible structure for a write-up of the density-increment argument. I’m planning to make a start on this, but I’ll try to make it as modular as possible so that if people want to then it may be possible, without a major rewrite of the whole thing, (i) to make local changes and (ii) to switch the order about.

    \address{The world} [Not, of course, intended to be taken seriously.]


    \title{A new proof of the density Hales-Jewett theorem}



    Basic definitions and statement of the theorem.

    History of and motivation for the problem.

    Discussion of related results, including the corners theorem.

    \section{A sketch proof of the corners theorem}

    A fairly detailed sketch of the modified Ajtai-Szemer\’edi argument.

    \section{Different measures on $[k]^n$.}

    Definition of equal-slices measure and the P\’olya urn measure.

    Motivation for considering these other measures.

    Proofs that dense sets in any of those two measures or the uniform
    measure can be restricted to subspaces where they retain their
    density in any other of the measures.

    \section{Three important lemmas}

    \subsection{The multidimensional Sperner theorem.}

    \subsection{A dense line-free set correlates locally with a 12-set.}

    \subsection{A 23-insensitive set can be almost entirely partitioned into
    fairly large combinatorial subspaces.}

    \section{A proof of the theorem for $k=3$.}

    \section{A proof of the general theorem}

    \subsection{Uniform DHJ(k-1) implies that an equal-slices-dense
    subset of $[k-1]^n$ contains many combinatorial subspaces.}

    \subsection{Putting everything together.}


  21. jozsef Says:

    1066. Dear Tim, I still don’t have enough time to work out the details of the proof. Next week is the last week of classes here, so I will make some progress after that, I hope. Until then let me ask a question about the second outline of a density-increment argument. I don’t understand the argument after Substep 2.2. In each step you keep a 3^m “portion” of Z and partition the remaining B into \sim 3^M classes. Elements which partitioned into different classes will never be considered together in a subspace again. What I can’t see is that why will be the intersection of the 1-set and the 2 set large in a subspace? Isn’t it possible that only a small subset of the 1-set is element of a subspace S_i? Most likely I overlooked something. I will check it tomorrow again.

  22. jozsef Says:

    1066. (contd.) I still can’t see how to finish the sequence of iterations. The question is that what happens when Substep 2.3 ends. Given this rapidly growing family of partitions. If a partition is sparse then we don’t do anything with it, if dense AND large enough, then we find a subspace in it. What happens at the end when we have many small and dense partition classes? I’m stucked. I’ll get my morning coffee now…

    • gowers Says:

      I don’t know if this fully answers your question, but when we have a dense set, we don’t just find a subspace in it — we find a whole family of subspaces whose union is dense. Therefore, the number of steps in the iteration is bounded. But I’m answering this question without looking at the proof — I’ll give another answer in a moment.

    • gowers Says:

      I’ve had another look, and I still don’t quite understand what your difficulty is, so I’ll have to hope that my first reply answers it, or at least prompts you to explain what it is in terms that I understand better.

    • jozsef Says:

      During the iterations if a B_i is sparse, the you do nothing with it, if it is dense then you choose Z_1,\ldots , Z_m. The union of them, Z, has size at most M. It works fine until B_i has size at least M. What do you do with smaller B_i-s? I don’t see that this error term – the number of elements remaining in small B_i-s is small.

    • jozsef Says:

      Maybe I didn’t follow the right algorithm; Do we have a good bound on the number of iterations? I think that now I see; For any element, there is a very high probability that after a few iterations this elements in a selected subspace. Right?

    • gowers Says:

      I still don’t understand, but largely for “local” reasons. For instance, B_i is a subset of a subspace S_i. Since the number of iterations is bounded, we can ensure that the dimension of S_i is much larger than M all the time. When you talk about the cardinality of B_i, do you actually mean the dimension of S_i, and when you say “at least” do you actually mean “less than”?

      Also, it’s not clear to me whether your problem arises when we almost partition a 1-set, or only when we look at the intersection of a 1-set with a 2-set. Assuming it’s the first, then here is the rough idea of the argument.

      We begin by choosing M coordinates and partitioning [3]^n into 3^M subspaces according to those coordinates. We then choose Z_1,\dots,Z_m inside the M chosen coordinates such that for many x the subspace x+\langle Z_1,\dots,Z_m\rangle is a subset of \mathcal{B}. Then we remove all subspaces of the form x+\langle Z_1,\dots,Z_m\rangle from \mathcal{B}. This removes a positive fraction (depending on m and the error \eta you want to end up with) of the elements of \mathcal{B}. Also, one can check that after removing these elements, the restriction of the remaining set to any of the 3^M subspaces S_i is still a 1-set. Therefore, we can iterate, except that we do nothing if we have a set of density less than \eta/2 in a given subspace. Let us call such sets discarded. The total measure of the discarded sets cannot be more than \eta/2, since they are contained in disjoint subspaces inside which they have relative density at most \eta/2.

      After a bounded number R of iterations (depending on m and \eta) we have either discarded or put into subspaces all but at most \eta/2 of \mathcal{B}, and there is always space to do this provided that RM is substantially less than n. So provided n is large enough, the argument works.

      I’ve written that in the hope that you either agree with it or can point to the bit where I seem to be going wrong. (Even if it is wrong, I have theoretical reasons for believing that some argument like this has to work, at least if Terry’s argument about correlations with 1-sets implying correlations with subspaces was correct, which it seemed to be.)

    • jozsef Says:

      Tim, first of all I agree that the original argument works even for an incomplete partitioning. I don’t think that the argument is incomplete, I’m just slow following it. In this particular case, I think that I didn’t see the good bound on the number of iterations, I though that the fast growing number of partition classes might be a problem.

  23. jozsef Says:

    So, what you proved is a strong partition in the following sense; For every c>0, \epsilon >0 and m, there is a number K, that any c-dense 1-set has a large, 1-\epsilon-dense subset which is the disjoint union of (several) translates of K m-dimensional subspaces.

  24. Ryan O'Donnell Says:

    1068. Hi Tim, re the outline in 1065:

    Looks good. One question: It looks like you’ve divided the proof into three main lemmas: multidim-Sperner (more generally, multidim-DHJ(k-1)), line-free set correlating with intersections of ij-insensitive sets, and ij-insensitive sets being partitionable.

    It seems to me that the Varnavides-version of multidim-Sperner (more generally, multidim-DHJ(k-1)) may as well be considered the basic lemma. Where will this go?

    Putting it into \subsection{The multidimensional Sperner theorem} makes sense to me, although then the actual \section{A proof of the theorem for $k=3$.} might be quite short. On the other hand, if it goes into the proof section itself, then the multidim-Sperner therem subsection will be awfully short (might as well just quote Gunderson-Rodl-Sidorenko).

    The latter seems less modular to me, so I guess what I’m ultimately suggesting is that \subsection{The multidimensional Sperner theorem} be more like \subsection{The Varnavides multidimensional Sperner theorem}.

    Except that I strongly vote for using a more generic descriptor than “Varnavides”. There’s got to be a catchy word that indicates to the reader that not only do dense sets contain lines, a random line is in there with positive probability.

    Also, I’m still of two minds as to whether “Equal-Slices” should be treated as the main distribution, with Polya as a slight variant, or vice versa.

  25. Ryan O'Donnell Says:

    1069. Notation and terminology.

    Pre-writing, it might be helpful to fix our notation and terminology. I know it’s not crucial, but I think getting just the write names and letters can really help a reader. We may also try to keep things switchable via appropriate use of macros and the xspace package in LaTeX (see the first two comments to the post here: http://terrytao.wordpress.com/2007/06/08/advice-on-mathematical-careers-and-mathematical-writing/), although I must confess I sometimes find it tough to rigorously stick to using macros.

    Anyway, here are some questions we might debate:

    . are sets called $A$ or $\mathcal{A}$?
    . what should “Equal-Slices” measure actually be called (e.g., do probabilists already call this something else?)
    . what “Polya Urn” measure should actually be called? (same question)
    . what letter to use for each of them? what letter to use for the uniform distribution?
    . how to denote drawing a random line or a random subspace from them?
    . [k] = {0, 1, …, k-1} or {1, 2, …, k}?
    . unify terminology of ij-set, (special) set of complexity t, ij-insensitive set, etc.
    . questions from the previous post: what short words/phrases can we use to indicate that not only do dense subsets of [3]^n contain lines, they actually contain large-dimensional subspaces, and in fact a random large-dimensional subspace is in with positive probability?

  26. Terence Tao Says:


    I transferred Tim’s outline (and Ryan’s questions) to the wiki at


    I figure that the wiki format might be better for the topmost level of organising the paper, and then we can switch to LaTeX once all the top-level stuff is finalised.

    Some thoughts on notation:

    I guess, on balance, that [k]={1,…,k} looks slightly nicer than [k]={0,…,k-1}; the 0-based notation is slightly more “logical”, but we don’t seem to derive any substantial benefit from it. So I’m weakly in favour of {1,…,k}.

    We can borrow from geometry and use the notation Gr( [k]^n, d ) to denote the d-dimensional subspaces of [k]^n (a “combinatorial Grassmanian”). I don’t know what to call the measures on this space though. Does every measure \mu on [k]^n canonically define a measure on Gr( [k]^n, d )? It seems to me that one needs some additional parameters to specify such a measure.

    “A” versus “{\mathcal A}” – I would prefer A, as I want to think of a subset of [k]^n as a set of points, rather than a collection of sets (which is what the {\mathcal A} notation suggests to me). The one problem with using A is that we are also likely to be using A for subsets of [n]^2 in the corners theorem, and if we are going to discuss the reduction of corners to DHJ then there might be a very slight notational clash there. But perhaps this is actually a good thing, since we want to use the corners argument to motivate the DHJ one…

    As for the last question, what about “Hales-Jewett property” for containing lines, “subspace Hales-Jewett property” for containing subspaces, and “subspace Hales-Jewett-Varnavides property” for containing subspaces with positive probability? (thus, e.g. “dense ij-insensitive sets obey the subspace Hales-Jewett-Varnavides property”)?

  27. Ryan O'Donnell Says:

    I have collected up a small amount of free time and was planning to put some time into the writing-up. I agree that it will be nice to fix the overall outline before starting. Here are some more comments:

    . I propose continuing this discussion not here but on the discussion page of the wiki page Terry made for the paper outline. I will copy this post to that page too.

    . Since I will likely never finish it, I put a link to my old outline of the proof on the wiki’s home page. I kind of like that outline too :) although of course it’s quite similar to Tim’s.

    . Changed “author” from “Polymath” to “A. Polymath” (as Terry had suggested a while ago). Reasons: a) Automatic indexing systems probably like authors to have both a “first name” and a “last name”. This will help the author list to get sorted under “P”. b) The next projects could be authored by “B. Polymath”, “C. Polymath”, “ZZ. Polymath”, etc. This would also help get all projects sorted under “P”, yet differentiate the author groups. c) Subtle allusion to A. Nilli. :)

    . Address: perhaps could change to the URL of the wiki. [Perhaps we should get a stabler one?] I suppose we’ll need a corresponding address for the paper-journal version though…

    . I agree with Terry’s votes for [k] = {1, …, k} and A rather than \mathcal{A}.

    . I’ll admit it, I always find the terminology Grassmannian “scary”. If it were me I would just try to refer to “d-dimensional subspaces”. Not sure we need a name for the set :)

    . Naming the distributions: I think the phrase “equal slices” is unbelievably catchy, and this is cute. On the other hand, it’s hard to deny the officialness of “random ordered 3-partition“. I mean, such terms are in Chapter 1 of Stanley :) Either becomes a bit of a mouthful when you get to its sister distribution: “equal nondegenerate slices” or “ordered weak 3-partition” (“weak” = “parts of size 0 allowed”, apparently). So I don’t know…

    . For the letter, who knows? I’m kind of against \mu as it sounds like the uniform distribution to me. \nu is a nice letter for measures. Or maybe \lambda (for “line”) or \pi (for “partition”)…

    . For the last question, I know Varnavides’s name gets attached to this notion, but I must say it is the opposite of evocative for a nonexpert like me. It actually took me a good few hundred blog posts before I understood that “Varnavides = contains random lines w. positive prob.” Guided by Tim’s spirit of trying to make the paper as easy-to-read as possible, how about…

    “DHJ(3) = Nonnegligible sets in [3]^n contain a line.” Then:
    “Nonnegligible sets in [3]^n contain a high-dimensional subspace.”
    “Nonnegligible sets in [3]^n contain a nonnegligible fraction of lines.”
    “Nonnegligible sets in [3]^n contain a nonnegligible fraction of high-dimensional subspaces.”

    Wordy, perhaps, but I think it gets the idea across straight away. Also, using the same word “nonnegligible” subtly gets across the key fact that we’re actually talking about the same distribution..

    Or does this just sound too weird to an arithmetic combinatorialist’s ears?

  28. gowers Says:

    Ryan, I agree with almost all of what you say. My main quibble would be that I prefer “sets of positive density” to “nonnegligible sets”. Of course, that really means “density bounded away from 0 as n tends to infinity” but that kind of shorthand is very standard. So I’d end up with something more like

    “Sets of positive density in [3]^n contain a positive proportion of all lines/subspaces.”

    I’ve never liked Varnavides getting his name attached to a very standard averaging argument.

    My vote would be for “equal-slices” with a comment about how it relates to known terminology.

    I’m slightly more in favour of \mathcal{A} than you and Terry, but I see arguments on both sides and so am perfectly willing to be outvoted on that one.

  29. Terence Tao Says:

    I modified the outline slightly to reflect the preceding discussion. I also added a sample abstract, but am not seriously committed to it, feel free to modify.

    I also added some queries, for instance: should we have a section explaining what Polymath is, the history of the project, and/or the names of contributors?

    I also added a discussion section where we can talk about quantitative bounds, relationships to other proofs of DHJ, and other sundry topics.

  30. Ryan O'Donnell Says:

    I committed to giving a talk about this at Microsoft Research in 3 weeks, so I am thinking now more about the structure. A few thoughts that occurred to me after this morning’s work:

    . It wasn’t too productive to worry much about the name of the distribution, since I invariable referred to it by its letter anyway. I was writing \nu^n_k for the distribution on [k]^n.

    . It was more convenient to think of the equal nondegenerate-slices distribution as \nu and of equal-slices as the “variant”. This was because the nondegenerate version seems to show up more in the statements of results, whereas the pure equal-slices seems to show up more in the grungy technical details of measure arguments.

    . It wasn’t too important to name the set of d-dimensional subspaces either, because the point of the distribution \nu is that lines are identified with strings in [4]^n, and more generally, d-dimensional subspaces of [k]^n are identified with strings in [k+d]^n.

    . It may be convenient (I was doing this at one point) to write things like $\latex \nu_4^n(A) \geq \delta’$ to mean, “if one chooses a random line in [3]^n, it is entirely contained within A with probability at least \delta'“.

    . I agree that “dense” is better than “nonnegligible”, after all. E.g., on my roughest sketches of slides, I was writing things like:

    “DHJ(3) = if A \subseteq [3]^n with \nu_3(A) \geq \delta, then $\latex \nu_4(A) > 0$; i.e., if A has constant \nu_3-density then it has positive \nu_4-density.”

    “DHJ(3) implies if A \subseteq [3]^n with \nu_3(A) \geq \delta, then $\latex \nu_4(A) \geq \delta’ = \delta’(\delta)$.” [the Varnavides version]

    “DHJ(3) implies if A \subseteq [3]^n with \nu_3(A) \geq \delta, then $\latex \nu_{3+d}(A) \geq \delta’ = \delta’(\delta,d)$.” [the Varnavides version for d-dimensional subspaces, etc.]

    To be continued…

  31. Ryan O'Donnell Says:

    Here is another structural idea that I was using in my slides draft — which might also be worth considering for the paper version:

    First, a section on probability distributions and the various technical lemmas.

    Second, a section showing how to deduce the contains-a-subspace, contains-a-constant-density-of-lines, and contains-a-constant-density-of-subpsaces versions of DHJ from the simple, contains-a-line version.

    Third, sections on the argument for ij-insensitive sets, partitioning these into subspaces, density increment, etc.

    The idea of this structure is that it kind of isolates the more “essential” ideas involved in pulling up DHJ(3) from DHJ(2) from some of the more generic stuff. I.e., a sophisticated reader of the paper might skim through the “second” section described above and say, “yeah, yeah, okay, I get the idea from the sketches, this all seems like straightforward tricks and things”, and feel like “nothing has happened yet” after the second section.

    This I think would be good; this feeling of “nothing has happened yet” is what I’d hope for. Then, when they settled in to read the third (and succeeding?) sections they’d feel like they were properly starting the argument — which is the feeling I’d actually like to convey.

  32. Ryan O'Donnell Says:

    As a tiny point: It occurred to me that sequential ordering of initials — A., B., C., Polymath — will never work; it was hard enough to get comments numbered consistently. How would various groups known not to step on each other’s initials? Heck, for all I know the other, small-n thread may already have taken A. Polymath. Perhaps the initials could be chosen in a problem-centric way, then; I would of course propose “D.H.J. Polymath”.

  33. Ryan O'Donnell Says:

    Also: Terry, the abstract you wrote looks good to me.

  34. gowers Says:

    Ryan, I definitely like your “nothing has happened yet” approach. At best, we could get the technical stuff down to some easily comprehensible black boxes that would make the later proofs very streamlined, and I think that ideal may be attainable.

  35. Ryan O'Donnell Says:

    I hope so too. By the way, it appears as though the blog is down.

  36. Ryan O'Donnell Says:

    I hope so too. By the way, it appears as though the blog is down. Or more precisely, Michael Nielsen’s whole site is down.

  37. Ryan O'Donnell Says:

    Seems like the blog just went back up.

    Also, I understand now a small point that Tim clearly understood a while ago; namely, it is not necessary to show an abundance of d-dimensional subspaces to execute the partitioning step. Rather, that step only uses existence of a d-dimensional subspace.

    Although one may as well prove abundance at some point, for kicks.

  38. Ryan O'Donnell Says:

    I’ve been thinking about the partitioning argument. Here are a few points:

    . Seems to me the partitioning is pretty much the center part of the argument. I mean, once everything is set up, the part about a line-free set correlating with an intersection of ij-insensitive sets is, well, no more than 2 or 3 sentences. So partitioning might deserve a section of itself, or even a billing as “the main lemma”.

    . I don’t currently see a natural way to do it under something other than the uniform product distribution. That would be a bit of a shame if true, since it would be nice if we didn’t have to do *too* much back and forth and back and forth and back with the measures. For example, I was hoping we wouldn’t have to dust this argument off, but perhaps we will. I want to think about it.

    . In Terry’s UCLA talk description of it, he describes the partitioning step as “quotient down to [2]^n, then use the greedy algorithm to pick out large-dimensional subspaces”. Picking out large-dim subspaces in [2]^n is particularly easy if you’re willing to appeal to [GRS]. Can the actual argument for the partitioning really be made this easy? (I know Terry was short on space & time and so of course could not write out a full argument.)

    . Does anybody actually know the [GRS] argument cold? (Jozsef?) From some brief glances, it looks like they’re also doing a “partitioning” style argument…

    • jozsef Says:

      Hi Ryan, I think we should partition a 1-set, {\cal{A}}\times [2]^n, instead of a set system {\cal{A}}\subset [2]^n like in the [GRS]. Probably that’s what you meant, I just want to emphasize that this partition is quite different (and as I see it is much stronger) from [GRS]. I would put [GRS] as a reference there, and would write the probabilistic argument for the almost perfect tiling. If you wish you can leave this part to me, I wanted to write more about it anyways.

  39. Kristal Cantwell Says:

    1070. Outline of a proof of DHJ(3)

    I have an outline of a proof of DHJ(3) and an outline of its extension to DHJ(n) I am going to post them here to see if anyone sees any gaps or errors. If it works I think it would be simpler then the current proof.

    Outline of proof of DHJ(3)
    We get an 12 insensitive set.
    We get a space of density epsilon
    Actually this could be a much smaller constant function of epsilon
    inside the 12 insensitive set so it inherits the 12 insensitivity.
    We get a space in which the only coordinates
    That change are 2 and 3 with density delta
    much lower than epsilon.
    We do this be getting a subspace of the original
    space which projects down to this space if the preimage
    that projects down to the space has density epsilon
    for each point then we are done otherwise.
    We remove the point that has the low density get
    A slight increase in the density of the remaining spaces
    Associated with the preimages in the then take one
    of the preimages that has the slightly higher density
    and repeat the process this will result in a density
    increment if we cannot repeat the process often
    so it terminates
    and we get a set of primages with the desired density
    from which we can get the desired space.

    Then we use DHJ(2) to get a
    combinatorial line of length two in the space.
    We use the fact the space is 12 insensitive to
    extend the line to length 3 and we are done.

  40. Kristal Cantwell Says:

    1071 Outline of proof of DHJ(n)

    We get an 12 insensitive set.
    We get a space of density epsilon
    Actually this could be a much smaller constant function of epsilon
    inside the 12 insensitive set so it inherits the 12 insensitivity.
    We get a space in which the only coordinates
    That change are two through n with density delta
    much lower than epsilon.
    We do this be getting a subspace of the original
    space which projects down to this space if the preimage
    that projects down to the space has density epsilon
    for each point then we are done otherwise.
    We remove the point that has the low density get
    A slight increase in the density of the remaining spaces
    associated with the preimages in the then take one
    of the preimages that has the slightly higher density
    and repeat the process this will result in a density
    increment if we cannot repeat the process often
    so it terminates and from the primages at the terminal step we get the desired space.

    We use DHJ(n-1) to get a Combinatorial line of length n-1 in the space
    We use the fact the space is 12 insensitive to
    extend the line to length n and we are done.

  41. gowers Says:

    1072. I’m afraid I’d need a lot more detail before I even began to understand that. Just to give a few examples, what do you mean by “We get” at the beginning? What does the second line mean?

    I’m potentially interested though, because you seem to be going for a Shelah-type approach, which I was attempting to do in some earlier comments. Do you think you could expand on what you have written? Perhaps if it would be on the long side you could do it on the wiki — perhaps just for k=3 to start with.

  42. Kristal Cantwell Says:

    1073. I can’t get the 12 insensitive sets as easily as I thought. The idea won’t work without them. I doubt I can fix this.

  43. Ryan O'Donnell Says:

    1074. Outline.

    I’ve thought for a while and I can’t really see how to carry out the partitioning argument under the equal-slices distribution. For every technical difficulty that I worked around, a new one would pop up, and I was never able to simultaneously handle all of them. (The main problem: what “progress measure” to keep track of, akin to the “total uniform-distribution mass”.) I still hold the intuition that there should be a slick way to do it, but I’m stuck. Perhaps one could turn the whole argument into a “density-increment on an ij-insensitive set” argument rather than a “density-increment on a subspace” argument…?

    In any case, I therefore reluctantly propose that in the writeup, the uniform distribution be treated as the “main” distribution, with equal-slices as a secondary distribution one needs to slip in and out of to facilitate the argument. I say reluctantly because I feel that the equal-slices distribution on lines is so natural that it may even be that there is some superb (polynomial?!?) relationship between equal-slices point-density in [3]^n and equal-slices line-density in [3]^n. It would be nice to spend some time thinking about global obstructions.

    Anyway, I guess that means the outline of the measure-changing tricks is:

    1. Start with a set of uniform density \delta.

    2. Argue that one can pass to a subspace where both the [3]^n-equal-slices and [2]^n-equal-slices density are at least \delta - o(1). (Or else, one can get a subspace where the [3]^n-equal-slices density increments — in which case, one can further pass to a subspace where the uniform density increments.)

    3. Either find a line, or get an equal-slices density increment inside an intersection of 13-insensitive and 23-insensitive sets.

    4. Use this argument to convert this into a uniform-distribution density increment inside an intersection of 13-insensitive and 23-insensitive sets. (One has to note that that argument preserves the property of being an intersection of 13-insensitive and 23-insensitive sets.)

    5. Run the partitioning argument under the uniform distribution. Note that that argument doesn’t really need “subspace richness” per se, because it implicitly proves it anyway. (It’s okay, I think, to have c to include a 3^{-M} factor.) I.e., all it needs is existence of d-dimensional subspaces for [2]^n-uniform dense sets.

    6. Thus get a uniform density-increment on a subspace.

  44. Ryan O'Donnell Says:

    1075. Hmm. Once more my mind changes. It’s really essential to be in equal-slices mode when doing the line-free-sets-correlate-with-ij-insensitive stuff (so that a random line is in with nonnegligible probability); and even more is this so in the generalization to [k]^n.

    Further, I think I see a non-horrendous way to carry out the partitioning argument under equal-slices. So, I propose going back to “equal-slices is the main measure”.

    Re partitioning: the argument is written for the uniform distribution, but it seems like it really only needs a product distribution to work. (The key need is the ability to break strings up into pieces and have the global distribution be the product of the two marginal distributions.)

    Luckily, equal-slices is a mixture of product distributions. (There’ll be a bit of a technicality here probably, where one will want to throw out the product distributions (p_1, p_2, p_3) which have a component extremely close to 0 or 1.) So if you can partition an ij-insensitive set into d-dimensional subspaces up to tiny error under any product distribution, then you get a “fractional cover” of it by \langle subspace, product distribution \rangle pairs.

    So you should be able to get a density increment for A on some d-dimensional subspace under some product distribution. Which you can convert back to a density increase under equal-slices by an easy passing-between-measures argument.

    seem plausible?

  45. Ryan O'Donnell Says:

    1076. Simplifying the first step in the proof.

    Recall that in the first step of the proof, one does a little song and dance to arrange things so that A has density \geq \delta - o(1) under \nu_3 (equal-nondegenerate-slices on 3 characters) and simultaneously density at least some f(\delta) > 0 under \nu_2. (We currently get f(\delta) = \delta - o(1), but it would be okay to get, say, f(\delta) = \Theta(\delta^{3/2}) as we’ll do here.) It requires some hacking around with total variation distance between product distributions..

    Here is a sketch of an arguably more straightforward method.

    In this sketch I will: a) ignore the distinction between equal-slices and equal-nondegenerate-slices when it’s convenient (this only changes statements by an additive o_n(1)); b) ignore the tiny technicality that when one passes to a subspace [3]^{T} where T is a random set of coordinates, one has to take care of the unlikely event that |T| is outrageously smaller than expected.

    Define a random restriction (S,\ell) as follows: choose z \sim \nu_4^n and let S be the coordinates where z has 4‘s; also choose \ell \in [3] randomly. The meaning of “restricting according to (S,\ell)” is that we think of forming a string in [3]^n by fixing the S coordinates to be all \ell‘s and leaving the remaining coordinates \overline{S} = [n] \setminus S “free”. (Note that z only plays a role insofar as it determines the cardinality and location of S.) A basic observation aboute qual-(nondegenerate)-slices is that if we draw a random restriction (S,\ell) and then draw the free coordinates from \nu_3^{\overline{S}}, we precisely get the equal-(n.d.)-slices distribution \nu_3^n.

    Hence \mathbf{E}_{(S,\ell)}[\nu_3^{\overline{S}}(A_{(S,\ell)})] = \delta, and by the usual argument we can in fact ensure that \nu_3^{\overline{S}}(A_{(S,\ell)}) \geq \delta - \delta^{10}, say, for “almost all” restrictions (S,\ell). (If the variance of \nu_3^{\overline{S}}(A_{(S,\ell)}) is tiny then this holds; otherwise, we must have a density increment for some (S,\ell).)

    Consider now drawing a random restriction (S,\ell) and filling in the free coordinates according to \nu_{\{i,j\}}^{\overline{S}}, where \{i,j\} = [3] \setminus \{\ell\}. This gives some nonstandard distribution \lambda on [3]^n.

    Claim: \lambda(A) \geq \Omega(\delta^{3/2}).

    Once we show this claim the argument concludes as follows: For an \Omega(\delta^{3/2}) fraction of (S,\ell) we have \nu_{\{i,j\}}(A_{(S,\ell)}) \geq \Omega(\delta^{3/2}). Hence there exists some (S,\ell) for which both \nu_3(A_{(S,\ell)}) \geq \delta - \delta^{10} and \nu_{\{i,j\}}(A_{(S,\ell)}) \geq \Omega(\delta^{3/2}), and our mission is accomplished. (Recall that the latter fact, by Sperner, gives us an i\ell-insensitive \cap j\ell-insensitive set of measure \Omega(\delta^3) which A completely avoids.)

    Proof of Claim: Given a small \gamma we will show that \lambda(A) \geq \gamma \delta - 2 \gamma^3. This completes the proof by taking \gamma = \Omega(\delta^{1/2}).

    To argue this, let \beta_k denote the distribution on the cardinality of the set of k‘s in a draw from \nu_k^n. The distribution \lambda draws s \sim \beta_4, picks a random set S of cardinality s, fills it with a random 1, 2, or 3, then fills the remaining coordinates with equal-slices on the other two characters. If we had instead drawn s \sim \beta_3, the resulting distribution would have exactly been \nu_3. Hence if we have d\beta_4 / d \beta_3 \geq \gamma, we could conclude d\lambda / d\nu_3^n \geq \gamma and hence \lambda(A) \geq \gamma\nu_3^n(A) = \gamma \delta. Unfortunately, this is not quite true; in fact, \beta_4(s) can be arbitrarily smaller than \beta_3(s). However this occurs only for s which are very close to n.

    Indeed, one can think of s \sim \beta_k as placing k-1 bars independently at random within a row of n dots, and taking the length of the first dot-segment thus formed (here I am conflating equal-slices and equal-n.d.-slices). Hence


    = \mathbf{Pr}[\text{min of bar1, bar2, bar3} = s]

    \geq \mathbf{Pr}[\text{min of bar1, bar2, bar3} = s \mid \text{min of bar1, bar2} = s]

    \qquad \cdot \mathbf{Pr}[\text{min of bar1, bar2} = s]

    = (1-s/n) \mathbf{Pr}_{\beta_3}[s].

    So we have d\lambda / d\nu_3^n \geq \gamma for all s with s/n \leq 1 - \gamma. But we can simply “lose” (additively) the probability contribution from s/n \geq 1 - \gamma since this occurs with probability only \gamma^{k-1} under \beta_k. A little calculation then gives the claimed \lambda(A) \geq \gamma \delta - 2 \gamma^3. (Some tiny details skipped.)

  46. Ryan O'Donnell Says:

    1077. Simplifying the partitioning argument.

    It seems to me slightly simpler if we do the partitioning argument via the older notion of “ij-insensitivity on a subset of coordinates”. (“Local ij-insensitivity”, I think Terry called it?)

    Definition: Let a, b \in [k] be distinct characters, and I \subseteq [n]. We say that A \subseteq [k]^n is “ab-insensitive on I” if the following condition holds: for all x \in A, if x' is formed by changing x_i from a to b or b to a for some i \in I, then x' \in A also. If I = [n] we simply say that A is “ab-insensitive”.

    True, this introduces an extra definition. But it has the upside of making the argument a little simpler, I think. The reason is that one doesn’t have to carry around pieces of the set B living in different [k]^n universes, and one doesn’t have to collect up pieces of the eventual error set. After removing a subspace, one can simply say the new set is ij-insensitive except on some additional m coordinates.

    As an example justification, here is the partitioning argument under any product distribution, fitting on a page.

    In fact, having written that, it’s now (finally) clear to me how to directly do the partitioning under equal-slices (if we want to). In the product distribution case, all we really used was that we could find a d-dimensional subspace in B localised to our favourite m coordinates. We can almost do this under equal-slices except that we don’t get to pick m coordinates arbitrarily; they have to be picked randomly. Still, that’s not a big problem; so long as we’ve only “used up” Tm insensitive coordinates, there are still n - Tm insensitive ones remaining, so a random m-subset will fall inside the insensitive coordinates with probability 1 - o_n(1).

    (The benefit we get from using “local ij-sensitivity” is that we can take the equal-slices distribution of B over all coordinates to be the progress measure; the probability measure doesn’t keep shifting on us.)

    • Ryan O'Donnell Says:

      Typo: “V” towards the end of the above should be “Tm“.

    • Ryan O'Donnell Says:

      What the heck? “V” should be “Tm”. But every time I wrap “Tm” in latex-dollar-signs, it changes to a V in the published version. How strange!

  47. Ryan O'Donnell Says:

    It seems a couple of the wiki pages on generalizations to DHJ(k) have been deleted. Well, not exactly deleted, but blanked out, by mystery IP addresses.

    See here and here.

    I’ve tried to revert them properly, but can we use captchas on edits or block unregistered users from editing?

  48. Ryan O'Donnell Says:

    I have started creating TeX files for the writeup. We can still discuss the outline on the page Terry created.

    But to get the ball rolling, here is a new page for the TeX. The idea was proposed by someone in an earlier thread. On this page are wiki-links to 6 pages, named “dhj.tex”, “dhj.sty”, “intro.tex”, etc.

    When you go to the wiki page, e.g., measures.tex, don’t try to read it as a wiki page. Just click “edit”, copy the source for the page, paste it into your favourite editor, and save the document as measures.tex.

    Once you’ve done this for all 6 files, you should be able to run “latex” and “bibtex” to produce a document.

    If you want to change a file, just go ahead in your TeX editor, and when you’re done, copy/paste it back into the appropriate wiki-page source.

    It’s extremely bare-bones right now; I didn’t even put in the title etc. yet. Also, more files/sections need to be created.

    The only substantive thing done so far is writing some of the “measures” section. My goal for this section is just to get down, in the shortest amount of space, the lemmas that “doing uniform on most coordinates and equal-slices on an eps-fraction is O(eps sqrt(n))-close to uniform”, “doing equal-slices on most coordinates and then either uniform or equal-slice on an eps-fraction is O(k eps sqrt(n))-close to uniform”.

    I got most of the way on this, but didn’t finish.

  49. Ryan O'Donnell Says:

    In case anyone is interested, I saved all 9 threads on general-n DHJ (i.e., comments 000–1099 minus the 200′s, 700′s, and 900′s) to one pdf file.

    It’s here: http://www.cs.cmu.edu/~odonnell/dhj-polymath-comments-all.pdf

    Warning: it’s a 10MB file (346 pages).

    Out of curiosity, I also totted up number of posts by poster. I believe there were 9 people with 5+ posts, with counts ranging from 12 up to 338.

  50. Ryan O'Donnell Says:

    Hi all — I just gave a talk about the results and the project at Microsoft Research New England (thanks Jennifer & Christian for having me). I’ll post the slides soonish — I’m giving the talk again at IAS and am not sure if I want to preempt myself. :)

    Several good questions arose from the audience, and I was hoping to get people’s opinions:

    1. [from Adam Kalai]: When the result is published under the Polymath pseudonym, how will participants put their grant acknowledgment lines in the document?

    2. [from Jim Propp (I think; I'm better with names than faces)]: When will it be done / written up?

    For the “writing up” part… Besides, “soon, we hope”, should the correct answer to this question have included, in the spirit of the project, “feel free to write some or all of it up yourself! (under the polymath banner of course)”? I guess ~30 more people have now seen the sketch of the proof and could in principle work on the writing. Is this a good or bad idea?

    As for the “done” part…

    3. One audience member, Emanuele Viola, came up to me after the talk and mentioned some promising-seeming lemmas from his papers that might possibly help with parts of the current argument. We traded a few emails about this this morning and then both asked ourselves, “wait, should we be posting these to the blog?” I guess we should, although it seems like the blog has been dormant for ~3 weeks.

    At some point fairly soon, regardless of whether we get the energy to write up the proof (and of course, it would be terrible if we didn’t do this, I think), I think we ought to declare the project “done”. I mean, we can’t block other people from working on DHJ indefinitely…

    • Ryan O'Donnell Says:

      Emanuele requested I point out that the lemmas mentioned (which I refer to again below) are not his, just that his paper has pointers to the literature (and a variant of the lemma).

  51. gowers Says:

    A few immediate thoughts — I hope others will express their views too.

    1. I’m mercifully spared from having to think about this kind of thing, so I’m not sure what is sensible and what isn’t. My preference, as I have said before, is to keep actual names off the paper. Perhaps it would be in keeping with previous suggestions on this kind of topic to have minimal information on the paper (i.e., just the pseudonym) and maximal information on the Polymath1 wiki (i.e., names of participants, grant details, links to all the blog comments, etc. etc.).

    2. I think I’d like to do some writing starting in about a week’s time. The way I imagine going about it is trying to write a skeleton paper with bits that can be filled in by whoever has the inclination to do so (which might be me). Of course, the skeleton itself would be open to discussion and change, but perhaps once it’s there the task will seem more manageable and we can get a complete draft. In principle I don’t see why it shouldn’t take more than about two or three days of collective effort.

    3. I look forward to hearing more about this.

    4. For myself, once our proof of DHJ is written up, I would be happy to say that all the material on the blog that doesn’t make it into the paper can be freely used by anyone for any purpose. I suppose if someone felt that they had used some insight from the blog in a way that was critical to their result, it might be appropriate if they acknowledged that and gave the appropriate URL. Having said that, there were some questions that would still be interesting to pursue (such as whether not having any combinatorial lines can lead to a global correlation with a 12-insensitive set if you choose the right measure) and I wouldn’t rule out thinking about them. But neither do I think it would be right to claim indefinite ownership of those questions, so I’d go for declaring the project done sooner rather than later.

  52. Ryan O'Donnell Says:

    As I mentioned, I had an email discussion with Emanuele after I gave my talk. With his permission, I reproduce it below…

  53. Emanuele Viola Says:

    There are a few formulations [of the lemma], see Section 3 in my paper “Hardness amplification proofs require majority”. [link on my homepage; including it here seems to cause WordPress to refuse to accept the post]

    The statement that appears in this paper allows you to conclude that many coordinates are close to uniform. One in fact can prove a stronger version of the lemma in which the conclusion of Lemma 3.2 in my paper holds even if you condition on the values of all the variables in G – {i_1,…,i_q}. This might be the closest to DHJ: it gives you that if you ignore a few coordinates then you have a line (in fact, lines are dense in this projection).
    The proof of (all variants) of this lemma is elementary.

    Earlier I was thinking whether this might be enough to prove DHJ. The “only” thing that can go wrong is that the variables whose indexes are in the complement of G destroy all lines that you found in the good projection, but maybe one can do a recursive argument to bypass this. Also, I would guess some type of recursion is necessary, for else one would get too strong parameters for DHJ?

    I’d be interested in thinking more about this and knowing your opinion. Recently I have been spending quite a bit of time doing proofs that use the lemma inductively.

  54. Ryan O'Donnell Says:

    1079. Hmm, looks powerful Emanuele.

    > closest to DHJ: it gives you that if you ignore a few coordinates then you
    > have a line (in fact, lines are dense in this projection).

    Hmm, let’s see, why is that… [am not used to thinking about drawing
    a random point from A]. Let me write “B” = [n] \ G (B for bad, G for
    good). Just talking out loud here…

    I mean, suppose we pretend that life is even awesomer than Lemma 3.2:
    that a random draw from A is equivalent to:
    (a) the coordinates [3]^G get chosen totally uniformly
    (b) then the coordinates in [3]^B get set adversarially based on
    what happened in step (a).

    Hmm, so I guess it’s like saying, for every string x in [3]^G, there
    is an extension y in [3]^B such that (y,x) is in A.
    So in this awesome world, I guess yeah, you can take any line you want
    over the G coordinates, and it’s possible to extend it to an
    “almost-line” in A (“almost-line” meaning that the B-coordinates might
    be screwed up).

    What about in the not awesome but still cool world of Lemma 3.2. Or
    in your extension of Lemma 3.2. The extension (if I have it right)
    means something like:
    (a) you choose your favorite q coordinates Q in Q
    (b) an adversary chooses x in [3]^{G \ Q}.
    (c) then for almost all strings y in [3]^{Q}
    (d) there exists a string z in [3]^B
    such that (x,y,z) in A ?

    So then you’re saying, you can get pretty much any line action
    happening you want inside Q, but then it gets turned into an
    “almost-line” because of stage (d) above?

    Am I getting it right? Looks kind of powerful.

  55. Emanuele Viola Says:


    > Hmm, so I guess it’s like saying, for every string x in [3]^G, there
    > is an extension y in [3]^B such that (y,x) is in A.

    Yeah I think that’s a way to look at it.

    > What about in the not awesome but still cool world of Lemma 3.2. Or
    > in your extension of Lemma 3.2. The extension (if I have it right)
    > means something like:
    > (a) you choose your favorite q coordinates Q in Q
    > (b) an adversary chooses x in [3]^{G \ Q}.
    > (c) then for almost all strings y in [3]^{Q}
    > (d) there exists a string z in [3]^B
    > such that (x,y,z) in A ?

    The statement I have in mind is: Let v denote all the variables, then
    H(v^Q | v^{G \ Q) > 1 – eps.

    So this means that

    E_{w in v^{G \ Q}[ H(v^Q | v^{G \ Q) = w ] > 1 – eps,

    and so with probability >= 1/2 over w in v^{G \ Q} it is the case that
    VariationDistance(v^Q | v^{G \ Q), uniform) So then you’re saying, you can get pretty much any line action
    > happening you want inside Q, but then it gets turned into an
    > “almost-line” because of stage (d) above?

    I think that is correct.

    Note by the way the bad set will always be |B| >> |Q|.

  56. Ryan O'Donnell Says:

    > E_{w in v^{G \ Q}[ H(v^Q | v^{G \ Q) = w ] > 1 – eps,
    > and so with probability >= 1/2 over w in v^{G \ Q} it is the case that
    > VariationDistance(v^Q | v^{G \ Q), uniform) < 2eps.

    Hmm. Could one then say that with probability 1 – sqrt(eps), the
    variation distance is at most sqrt(eps)? Because that would be kind
    of cool, no? For almost all strings x in [3]^{G \ Q}, it holds that
    for almost all y in [3]^Q, that (x,y,z) in A for some z in [3]^B?
    Wouldn’t one just get that for almost all strings w in [3]^G, there is
    a z in [3]^B such that (w,z) in A?

    I also wonder if maybe it might be better to apply this lemma to the
    set I was calling “L-bar” in the talk. You know, it had density 1 -
    delta^2, hence A had density delta + delta^3 inside it. That set
    L-bar was almost completely arbitrary, except that it was a union of
    two “12-insensitive sets”, so we worked extremely hard to
    almost-partition it into copies of [3]^{log log log n}, so that we
    could get the delta^3 density increment on one of them, and recurse.

    But perhaps instead one could somehow just apply the lemma to L-bar (a
    bit weird, because its density is very close to 1) and get it
    fractionally covered by uniform distributions on coordinate-sets of
    size q… would have to think about it…

  57. Emanuele Viola Says:


    > Hmm. Could one then say that with probability 1 – sqrt(eps), the
    > variation distance is at most sqrt(eps)?

    Sure, I just put 1/2 for example.

    > Because that would be kind of cool, no? For almost all strings x in
    > [3]^{G \ Q}, it holds that for almost all y in [3]^Q, that (x,y,z) in A for
    > some z in [3]^B?
    > Wouldn’t one just get that for almost all strings w in [3]^G, there is
    > a z in [3]^B such that (w,z) in A?

    But I think x ranges over projections of strings in A, while y is uniform. I.e. we have found a subset Q of coordinates where projections of strings in A to those coordinates are almost uniform. And this is true even for most ways of filling the coordinates in G \ Q with projections of strings
    from A.

    Re your last point about almost-partitioning L-bar, I wonder in what sense it is almost partitioned. I am not sure if the lemma would be good for that…

  58. Ryan O'Donnell Says:

    Two more updates:

    1. The slides I used for my talk at Microsoft are here:

    If you don’t have PowerPoint there is a free viewer here:


    2. I have finished most details of the measures.tex file. We can now pass from uniform to equal-slices, and we can also pass from equal-slices overall to equal-slices on a restriction. Next step for me will be to write up the partitioning argument under equal-slices, with the “losing insensitive coordinates” method.

    Sorry, this went to my moderation queue.

  59. Ryan O'Donnell Says:

    A tiny notational note: as you can see from the slides, I actually went with [k] = \{0, 1, \dots, k-1\}. Two reasons I ended up preferring this:

    a) makes the deduction on Szemeredi from DHJ plainer (“just write numbers in base k“)

    b) computer scientists like \{0,1\}.

  60. gowers Says:

    Just to demonstrate that Ryan is not completely alone in the Polymath universe, I decided to write a draft of an introduction. I’ve put it on the wiki in one of the pages that Ryan created. I’m not sure I made any controversial decisions. I too have been coming to the view that we should make $[k]$ be $\{0,1,\dots,k-1\}$ though as it stands I’ve hedged my bets. But actually I think it’s slightly strange even for DHJ(3) to think of the coordinates as belonging to $\{1,2,3\}$ rather than $\{0,1,2\}$.

    As Ryan suggested earlier, I put the introduction up in LaTeX, so if anyone wants to edit it they just paste it into a file, edit the file, and replace what was there before by their new version. Or they can edit the page directly (but then they can’t compile it to see what it looks like).

    I also decided not to use any macros, though it’s conceivable that I did so by accident.

    Incidentally, I was imagining that the first section after the introduction would contain quite a lot of technical preparation: definition of combinatorial subspaces, discussion of different measures, etc. So I left all that out of the introduction (apart from the definition of a combinatorial line, which was essential).

  61. Ryan O'Donnell Says:

    I have added some more TeX files to the document. It’s not really organized right now, but I wanted to get some math details written and (hopefully) correct.

    I ended up just writing the partitioning argument under product distributions. It was just proving too awkward to do it all under equal-slices. So I propose we go in-and-out of a product distribution when doing the partitioning.

    PS: I went back to [k] = {1, 2, …, k}; I guess we should stop switching at some point. :) My reasoning was that it was common to talk about distributions on [k] as points in the k-dimensional simplex, and one really likes to index \mathbb{R}^k by 1…k, not 0…k-1.

  62. Ryan O'Donnell Says:

    I have made some updates to the paper at http://michaelnielsen.org/polymath1/index.php?title=TeX_files_for_first_paper

  63. Ryan O'Donnell Says:

    More updates…

  64. jozsef Says:

    I just finished a write up which proves the corner theorem for Cartesian products of cubes. The problem is easier than DHJ(3) and it makes the explanation of the density increment method easier to follow. I hope that this write up will help the reader to understand the key elements of the DHJ(3) proof. It is available at
    I would be happy to place it somewhere on the wiki sites, but I don’t know how to do that.

    • jozsef Says:

      I’ve found some typos in my note which I corrected but probably there are more of them. Working out the epsilons and deltas explicitly isn’t my strong side. Please let me know if you find more typos.

  65. The Cap-Set Problem and Frankl-Rodl Theorem (C) « Combinatorics and more Says:

    [...] for error-correcting codes. For the related problem of “density Hales Jewett” discussed on Gower’s blog, it turns out that while the problem is not invariant under a group action, it is still [...]

  66. Ryan O'Donnell Says:

    I gave a talk about the work at IAS today. Afterwards, Boaz Barak and Anup Rao reminded me that DHJ actually has an application in theoretical computer science: Verbitsky used it to give the first known proof of the Parallel Repetition Theorem (albeit with parameters which are essentially useless for applications of Parallel Repetition). The paper is here.

    Raz subsequently gave an effective (combinatorial/probabilistic) proof of the Parallel Repetition Theorem (for “2 players”), which had massive implications in TCS. Raz’s Theorem was subsequently improved by Thomas Holenstein and then Anup Rao.

    Anup also pointed out to me that Verbitsky’s proof using DHJ works for any number of players, and is indeed the only known proof of Parallel Repetition for more than 2 players.

    This application motivates the following question about DHJ: Try to show that for any finite alphabet \Sigma, if A \subseteq \Sigma^n has density \delta and n is large enough, then A contains “99 percent” of a line. By this we mean that there is a “template” \lambda \in (\Sigma \cup \{\star\})^n \setminus \Sigma^n such that for at least 99 percent of the symbols a \in \Sigma we have \lambda^{\star \to a} \in A.

    Can we prove this even for 67 percent in place of 99 percent? 51 percent? 50 percent? 1 percent?

  67. gowers Says:

    I’m trying to understand that last question, but I don’t yet. If n is allowed to depend on \Sigma then the answer is obviously yes with 100 percent, since that’s just DHJ. So it would seem that you are asking for a proof that for every \delta there exists n such that for every finite alphabet \Sigma, every subset of \Sigma^n of density at least \delta contains 99 percent of a line.

    But this cannot be the correct interpretation either, since for any fixed n one can just choose \Sigma to be enormous and choose A randomly, and by trivial arguments (Chernoff estimates plus a union bound over all lines) A will have density no more than \delta(1+\epsilon) in any combinatorial line.

    I’m assuming here that I understand correctly what you mean by containing 99 percent of a line: that it means there is a combinatorial line and at least 99 percent of the points in that line belong to A.

  68. Ryan O'Donnell Says:

    Hi Tim. Er, yes, my question didn’t make sense. But I believe there is a correct question in there — I will try to write it down soon.

    In the meantime, there are some more updates to the draft here:


    including a pdf of the current (extremely messy and less-than-half-finished) state.

  69. Ryan O'Donnell Says:

    It seems that some of the observations made about equal(-nondegenerate-)slices distribution may follow from “de Finetti’s Theorem” and extensions; see e.g. http://www.stats.ox.ac.uk/~steffen/teaching/grad/definetti.pdf

  70. gowers Says:

    A few comments about notation and definition choices in the current draft of the DHJ paper. I don’t want to make any changes without some discussion taking place, but let me give some opinions and questions here.

    Ryan, I see that you’ve changed from talking about combinatorial lines to talking about line templates and nondegenerate combinatorial lines. I can see that there is some reason for this, but it strikes me as a situation where the logical convenience of this notion (which presumably comes in when one wishes to argue that a random line template has probability almost 1 of actually having some wildcards) is outweighed by the unfriendliness when you compare it with the more geometrical notion of a combinatorial line. I think I would favour defining a combinatorial line as something that has to be nondegenerate, but allowing line templates to be degenerate. That way, the statements of the theorems don’t have to have the unnecessarily distracting nondegeneracy condition (just as, when stating Szemerédi’s theorem, we don’t say that a dense set contains a nondegenerate AP of length k).

    Actually, that’s all I have to say for now, but I haven’t got very far through the draft yet.

  71. gowers Says:

    Oh yes, there was another thing — the question of how to fill in the wildcards in the sentence, “The purpose of this paper is to give the first **** proof of [DHJ].” My initial attempt was “fully combinatorial” and you suggested “finitary”. There are problems with both: one can argue that just about anything is combinatorial, and I think that some combination of Tim Austin and Terry ended up with a finitary proof of DHJ that was heavily influenced by ergodic theory. So the question is this: is there some clever choice of words to insert here, or should we rewrite the first paragraph so as to avoid the problem? Would “first elementary proof” do?

    • Ryan O'Donnell Says:

      “Elementary” sounds good to me. It seems justified, given that the most complicated notion in the proof is the distance between probability distributions.

  72. gowers Says:

    I’ve now added a short section on Sperner’s theorem to the TeX files on the wiki, with a view to motivating the concept of equal-slices measure, and because I think it would be very wrong not to have such a section, given how much Sperner’s theorem influenced our thoughts.

  73. Ryan O'Donnell Says:

    I’ve made some more updates to the TeX files.

    I think that most of the technical pieces of the paper are now more-or-less in there. The major technical thing that remains is the “putting all the pieces together” argument.

    And the major nontechnical thing that remains is — well, everything :) Aside from the introduction, which is in good shape, there is zero explanation or helpfulness in the rest of the paper, and the organisation is haphazard.

    • gowers Says:

      I’m planning at some point to go through all the technical stuff and pad it out so that the steps are motivated. Not sure when I’ll get the chance to do it though. Anyhow, it’s great to hear that at least some kind of certificate of doneness almost exists: thanks for all the work you’ve been putting in.

  74. jozsef Says:

    I extended the note about corners in Cartesian products with a proof of the k=3 case of the DHJ theorem. This write up is (very) elementary; for example it uses equal slices distributions indirectly, without even stating the definition. Here is the link:
    I will put there the tex file too in case you will find it useful for the write-up. I’m still not sure what would be the way to use our wiki page for that.

  75. jozsef Says:

    I’m planning to add a few more pages to the note proving the higher dimensional version of the “corners in Cartesian products of cubes” theorem. It will give a surprisingly short and simple proof of Szemeredi’s theorem. The more I use Tim’s partitioning argument the more I am amazed about its simplicity and effectiveness. I would be very interested to learn more about the connections/similarities to the Hahn-Banach theorem. I vaguely recall that Tim promised some explanations …

  76. Ryan O'Donnell Says:

    Hi guys. I got back on the writeup today, but didn’t make much progress (some updates to the tex, though). One issue is that the parameter-situation is a nightmare! Nothing to be done about that, unfortunately.

    The other issue was a subtle snag that I did not previously consider. The partitioning argument goes through very pleasantly under any product distribution; this is convenient, since we get our first insensitive-density-increment under equal-slices, and this easily gives the same increment under some product distribution (since equal-slices is a mixture of product distributions).

    The bummer is this, though: If you draw a string from a product distribution pi^n, and condition it on falling in a certain d-dimensional subspace, the conditional distribution on the subspace is not just the pi-product measure. This spoils the remainder of the argument.

    It does work fine if pi is uniform, and also (I’m pretty sure) if pi is equal-(nondegenerate-)slices.

    So, again, there are two choices: (1) take the density increment under equal-slices and do some wangling to get it under uniform, then partition, then go back to equal-slices, and iterate; or, (2) get the partitioning to work under equal-slices.

    I remember taking several cracks at (2) and failing; maybe I’ll take one more crack. It would really be more elegant if we could execute the whole proof under one measure.

  77. Ryan O'Donnell Says:

    Actually, there’s a choice (3): try to get the correlation-with-an-insensitive-set part to work with the uniform distribution in some way. Then we could do everything under uniform, which would be extremely attractive. I wonder if that’s possible…

  78. Ryan O'Donnell Says:

    Okay, I’ve made some more additions, and now I believe the writeup contains all the technical material needed to prove DHJ(k). The only part missing is at the end, where one “puts all the pieces together”.

    In thinking about how to write that, I tried to figure out the actual quantitative bounds we get. As expected, for DHJ(3) we get a simple tower; I believe 2 \uparrow \uparrow O(1/\delta^3).

    For DHJ(k), k \geq 4, the bounds get comically bad. I get hopelessly muddled when Ackermann functions are involved, but I believe the bound is at least as bad as A(k+1, poly(1/\delta)). Perhaps worse, I’m not a hundred per cent sure.

    The particularly painful part, quantitatively, is where we deduce the d-dimensional version from the 1-dimensional version. Iterating a bound d times is not pretty, especially when d itself needs to be huge. I’ve wondered if we can improve this. Gunderson-Rodl-Sidorenko certainly improves upon our basic iterative method.

    Note that the proof is really no more complicated for general k; it’s just the actual bounds get laughable for k bigger than 3.

    Anyway, before nailing these last details, I think it’s better to make a clarifying pass over the document. There’s actually strictly more technical stuff in there than needed, because I didn’t quite know what pieces would ultimately be necessary, so I added extra stuff in just in case. And now that we know what quantitative bounds we’re shooting for (the reasonable tower for k = 3, anything at all for higher k), we know which factors we need to save and which we don’t care about.

  79. Ryan O'Donnell Says:

    By the way, I’m not sure why we can’t bill this as the first finitary proof of DHJ. TimA’s new proof is still “infinitary” (it includes that word in the title). I suppose our paper is also infinitary in the very weak sense that it views equal-slices as an infinitary mixture of product distributions.

    Perhaps we can evade the issue by saying that our paper is the first to give a quantitative bound.

    • gowers Says:

      The problem was that, if I understand correctly, Terry had some way of discretizing Tim Austin’s argument. But perhaps that argument isn’t sufficiently written to make it a problem to call ours the first finitary proof. We should probably see what Terry thinks about it.

    • Terence Tao Says:

      I’ve been busy with other things (including the sister polymath project) but I do intend to have a read through the draft here at some point. So far it’s looking pretty good…

      The notes on the wiki on finitising Austin’s proof are not yet at the level of a rigorous proof, but I believe that it can be made into one with more effort. At any rate I consider this part of the polymath project and so arguing about precedence seems somewhat moot. Once finitised, some quantitative bound is in principle extractable, but it is likely to be far more laughable than anything Ryan is likely to get out of the combinatorial proof (in particular it relies on multiple regularity lemmas with parameters analogous to those in the hypergraph lemma). In any case it is not at the point where it needs to be cited as fact. Perhaps one can bill the current proof as the first proof which is primarily combinatorial in nature?

      In the draft there is a side note to ask on the relationship between Austin’s proof and the combinatorial one. What Austin did was to take one of the key insights from this project, namely that lack of lines implied correlation with a local 12-insensitive set (or whatever the current terminology is – I haven’t kept up), and found an ergodic-theory analogue of this fact, in a framework which Austin used previously to establish another proof of the multidimensional Szemeredi theorem and a related multiple ergodic theorem (and which, deep down, is a very heavily disguised version of the hypergraph regularity framework, but that is a whole story in itself…). The other ingredients of the proof are different, though; for instance, there is nothing resembling a density increment argument (for much the same reason as the regularity lemma approaches to these sorts of problems don’t need density increment; they have energy increment instead as a substitute). So I think the relation is that one key idea in Austin’s proof was inspired by the discovery of one key idea in the combinatorial proof.

  80. Ryan O'Donnell Says:

    Am working on some introductory material now; a new draft is up.

  81. gowers Says:

    Ryan, this is just to say thanks for all the terrific work you’ve been putting into this. Right at the moment I am busy with some other write-ups, but at some point I’d like to work through this one and pad it out with a few explanatory remarks and things like that. But I don’t see myself doing that before about August or September. I hope that delay is acceptable. In any case, once a fully detailed argument exists in written form, which it seems that it more or less does now, we will be in a position where we could in principle do what you suggested in an earlier comment and declare the project to be finished. The practical meaning of that would be that anybody could make any use they wanted of the wiki and the blog posts, whether private or public. Another thing I want to do at some point is trawl through the posts and write a document containing problems that we did not solve but that are, in my view, pretty interesting. The reason I want to do that is that I think they would make ideal PhD problems: the best ones are fully motivated (since, even now we have a proof of DHJ, they would still advance our understanding of the problem), unlikely to have been worked on all that hard, but also unlikely to be trivial either. Of course, for such a document to be useful, we would need to say to a potential PhD student that he/she is welcome to work on these problems privately and make use of any relevant blog comments.

  82. Kevin O'Bryant Says:

    In the middle of page 2, the map (a_1, \ldots ,a_n) \to \sum_i (a_i-1)k^{k-1} is used to note that finding combinatorial lines in [k]^n is at least as hard as finding k-term APs in [k^n]. The map (a_1, \ldots, a_n) \to \sum_i a_i accomplishes the same thing and is substantially more efficient. The common difference is just the number of wildcards.

    • Ryan O'Donnell Says:

      I think you need a bijection between [k]^n and [N] though; starting with a subset of [N], how would one naturally convert it to a subset of [k]^n?

    • gowers Says:

      The map Kevin suggests works fine for colourings (you just take their inverse images) but isn’t quite so straightforward for density, since the inverse image of a dense subset of [N] doesn’t have to be dense, though it’s probably OK if you are using equal-slices measure. I think you can get it to work if you restrict to central slices. But probably using the map we have is easier.

  83. Kevin O'Bryant Says:

    Somewhere, perhaps, the paper should mention the asymptotic lower bound (currently Theorem 1.3 in the second Polymath paper). For each k there is a C_k>0 such that for every n there is a subset A of [k]^n with cardinality C_k k^n (r_k(\sqrt{n})/\sqrt{n})^{k-1} and no combinatorial lines. Here, r_k(N) is the maximum possible size of a subset of [N] not containing any k-term AP.

    The proof is only 10 lines or so; it may well have a better home here than in the “small n” paper. Or two homes.

    • Ryan O'Donnell Says:

      Good point, this lower bound should be added to the “New Proof” paper. I think it might be best for it to just quote the bound in the “Low Dimensions” paper.

  84. Ryan O'Donnell Says:

    Hi all. I have finished a rough draft of the “New Proof” paper. You can see its pdf here. A version without internal “notes-to-self” is here. The files are all on the wiki here for anyone who wants to work on it.

    I’m fairly satisfied with it for now and think I will take a break from it. It’s not 100 percent in submittable form yet; there are still many tiny notes in there about editing decisions, references to be added, concluding discussion to be filled in, proofreading to be done, etc.

    Also, the ultimate draft need not actually resemble this present draft; if people think the paper can be improved with significant rewriting or reorganization, that would be great.

  85. jozsef Says:

    What do you think about starting a new thread on the writeup? Sometimes it takes too long to download this page.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Get every new post delivered to your Inbox.

Join 1,416 other followers

%d bloggers like this: