DHJ(3): 851-899

Once again there is not a huge amount to say in this post. Since the last post there have been a few additions to the polymath1 wiki that may be of some use. In particular, there is now a collection of fairly complete write-ups of related results (see the section entitled “Complete proofs or detailed sketches of potentially useful results”) to which I hope we will add soon. Also on the wiki is an account of the Ajtai-Szemerédi proof of the corners theorem, which seems to have some chance of serving as a better model for a proof of DHJ(3) than the proof via the triangle-removal lemma. Meanwhile, progress has been made in understanding and to some extent combinatorializing the ergodic-theoretic proof of DHJ(3), ideas from which have fed into the discussion. As with the last post, this one is mainly to stop the number of comments getting too large. We’re now down to 50 comments per post (except that it was 51 for the last one and will be 49 for this), since with the new threading we seem to be averaging at least one reply per comment.

113 Responses to “DHJ(3): 851-899”

  1. Terence Tao Says:

    851. Density increment

    I’ll start off (perhaps a bit awkwardly) by responding to Tim’s query from 848, which was asking about how a hypothesis that a density increment is near-maximal implies that there are also no significant density decrements. This is basically just coming from subtraction.

    Let A be a set, and suppose that B is a structured set (specifically, a local 1-set of non-trivial size) on which one has a near-maximal density \delta_B := |A \cap B|/|B|. Then any structured subset B’ of B of non-trivial size also has to have density close to \delta_B. It can’t have a much bigger density, since otherwise we could pass from B to B’ and get a density increment; and it can’t have a much smaller density, since otherwise we could pass from B to B \backslash B' and also get a density increment.

    There are three parameters that will need to be juggled here to make it all work: the density of A in B, the density of B in [3]^n, and the number of bad coordinates in B. The last parameter should not cause a difficulty; as n is so large, we can afford to turn quite a lot of coordinates bad and still retain a lot of usable structure. So it should boil down to making sure that the greedy algorithm that locates B terminates in a bounded amount of time without sending the density of B to zero. The way I think of it is like this: if B contains a small subset B’ on which A has too low of a density, then one can remove B’ from B to improve the density without significantly shrinking the size of B. I don’t think this process can be iterated indefinitely, as the density will eventually shoot up to 1.

    • Terence Tao Says:


      In ergodic theory land, what we are doing here is stating the obvious fact that if a function f has mean \delta and is non-constant, then there is a set B of non-trivial measure on which f is uniformly bounded below by \delta+c for some c>0. Here, f is the “conditional expectation of 1_A to the sigma-algebra generated by local 1-sets”. The problem in the finitary world is that the local 1-sets don’t actually form a sigma-algebra and so f only exists in some “virtual” sense rather than a tangible one, and so one has to do various greedy algorithm contortions to simulate any ergodic theory computation involving f.

    • gowers Says:

      851.2 Terry, what worries me in your explanation above is the “of non-trivial size”. When you apply the result, you apply it to the set A'' of all good words. Now it appears to me that the density of this set goes to zero as m goes to infinity (and we need m to go to infinity in order to carry on with the density-increment argument), and therefore I don’t see how …

      Ah, I think I do see what you are saying now, and I have in fact used this kind of argument myself sometimes. The point (which you yourself have already said) is that if you remove only a small set then you don’t greatly change the size of B. But this can work only if your structured sets are closed under subtraction. So for instance if you had a maximal density on a subspace, you couldn’t conclude that you had near maximal density on small subspaces of that subspace. But if you remove a local 1-set from a local 1-set you end up with a local 1-set so it’s OK.

  2. Ryan O'Donnell Says:

    852. Different measures.

    Tim, re your #849, I also found that binomial probability distribution on A \subseteq B‘s surprisingly… “awkward” to analyze. I suppose one can just really grind things out and see if, say, the Fourier expansion turns out nicely after all.

  3. Terence Tao Says:


    As I had some idle moments, I decided to spin out a timeline of the story so far for polymath1:


    It’s remarkably nonlinear, but that’s how maths research goes, I guess.

    Also: this post needs a “polymath1” tag.

  4. Terence Tao Says:

    853. Density increment

    I just realised that the “density increment” trick underlying Randall’s “dense fibres” argument that I translated into a combinatorial argument is not really a density increment, but rather a “mass” increment argument, essentially the same as that used to prove the classical Hahn decomposition in measure theory (which, by coincidence, I taught in my class a few weeks ago).

    Let’s say A \subset [3]^n has density \delta, and let f = 1_A - \delta be the balanced function. Let {\mathcal B} be a family of “structured sets” (such as local 1-sets); this family is secretly supposed to be behaving like a sigma-algebra, but let us try to avoid using that fact. Suppose we have a large set B in {\mathcal B} on which A has a density increment, and more specifically we have {\Bbb E}_{x \in [3]^n} f(x) 1_B(x) \geq c for some c>0; note that this hypothesis forces B to be somewhat dense. We now want to restrict B to the fibres of {\mathcal B} on which f is positive. One way to do this is to try to maximise the “mass” {\Bbb E}_{x \in [3]^n} (f(x)-c/2) 1_{B'}(x) as B' ranges over elements of {\mathcal B}; note that we are not dividing by the density of B’, so this is measures a mass increment rather than a density increment. Setting B’=B we know that the mass can be positive. If we find a B’ that maximises the mass, we know that any non-trivial structured subset B” of B’ will have an A-density significantly larger than \delta (e.g. larger than \delta+c/4), since otherwise we could remove this “negative mass” portion B” from B’ and increase the mass of B’ substantially. (Not coincidentally, this is exactly how one proves the Hahn decomposition theorem.)

  5. gowers Says:

    854. Ajtai-Szemerédi

    First a general remark to add further motivation to the Ajtai-Szemerédi approach. Let’s briefly think about why it has never been extended to prove a result for three-dimensional corners. If you try to do it, you soon see that what you want in place of the applications of Szemerédi’s theorem is applications of the full 2-dimensional Szemerédi theorem, and it seems that the original Ajtai-Szemerédi argument is not powerful enough to do that (not that I’ve thought hard about why).

    But if we managed to push through to a proof of DHJ(3), it seems not outrageous to hope that we might be able to get a multidimensional version, and that that might be what was needed for DHJ(4), and so on. That’s all a bit pie-in-the-sky at the moment, but that’s all it’s meant to be.

    Assuming that the Randall/Terry argument is fine (as I do, but for my own benefit I hope to write it up on the wiki soon), where have we actually got to? Here is my assessment, which gets a bit vague in the details. What it definitely gives us is what seems (and that’s a pretty confident “seems”) to be the appropriate analogue of step 2 of the Ajtai-Szemerédi argument as it is presented on the wiki.

    That tells us that for almost every U we have pretty much the expected density of (V,W) such that (U,V,W)\in\mathcal{A}. Let us call such a U good. The obvious analogue of Step 3 is then this: we can find W with a positive density of pairs (U,V) such that U is good and (U,V,W)\in\mathcal{A}.

    An unthinking analogue of the fourth step would simply be that amongst these pairs (U,V) can be found a combinatorial subspace of dimension tending to infinity. Given the experience with Step 2, I think it pretty unlikely that this will be good enough, but let’s try to understand why not.

    Fifth step: in the Ajtai-Szemerédi argument they have a long AP of points on the diagonal, which gives rise to a long AP P of vertical lines. They then partition the horizontal lines into long APs with the same common difference as P, using the fact that an AP of vertical lines intersects an AP of horizontal lines with the same common difference in a grid.

    What could the analogue of this be in the subspace world? Well, an AP of vertical lines translates to taking a combinatorial subspace of Us and then extending each U in all possible ways to a point (U,V,W) in {}[3]^n. And when would a structure like that intersect a similar one for Vs in a combinatorial subspace? If we denote by \langle Z_0,Z_1,\dots,Z_m\rangle the combinatorial subspace that consists of all finite unions of the Z_i that include Z_0, and we write U(Z_0,\dots,Z_m) for the set of all (U,V,W) with U\in\langle Z_0,Z_1,\dots,Z_m\rangle, then the answer is that U(Y_0,Y_1,\dots,Y_m) intersects V(Z_0,Z_1,\dots,Z_m) in a combinatorial subspace if Y_0 and Z_0 are disjoint, and Y_i=Z_i for i=1,2,\dots,m.

    This is almost saying is that the combinatorial subspaces are translates of each other, but there are two additional complications. First, the sets that determine the translates (that is, Y_0 and Z_0) have to be disjoint. Secondly, we cannot partition {}[2]^n with translates of a given subspace because all points in all translates will be constant on the wildcard sets, whereas a general point definitely isn’t.

    So there’s a lot of sorting out to do if an analogue of Step 5 is to be made to work. But what’s encouraging is that these difficulties are rather similar in flavour to the difficulties that we’ve already encountered and dealt with in Step 2.

  6. gowers Says:

    Metacomment. A quick remark about teaching. Terry comments that he has just taught the Hahn decomposition theorem, and implies (I think, reading between the lines) that this has helped him to have certain thoughts more quickly and easily than he would otherwise have had them. So that prompts me to say that I am lecturing probability this term, and noticed when I was thinking about Fourier analysis and equal-slices measure how it made a definite difference to be completely 100% on top of things like expectations of products of independent random variables, conditional probability, etc. etc. The odd thing in that case was that the concepts were quite basic, but nevertheless there was a gain: what was gained was some kind of clarity and fluency that replaced the normal process of muddling through first and tidying up afterwards.

    • Terence Tao Says:

      Well, when I taught Hahn decomposition a few weeks ago, I thought to myself “Ah, another density/mass increment argument! These arguments are everywhere!”. But other than reinforcing my (already quite strong) belief that energy/density/mass increment arguments are an essential tool in analysis, I don’t think it directly helped me except in the a posteriori sense, when my attempts to explain Randall’s argument more coherently begin to ring a bell and I consciously went back to look at the proof of Hahn decomposition. But perhaps there was indeed some unconscious “priming” due to the fact that I have been spending the last few weeks revisiting the foundations of soft analysis to teach to the UCLA freshman graduate students.

      (Certainly, in order to convert ergodic theory arguments to finitary ones, one does have to think really hard about deconstructing extremely basic notions in measure theory and functional analysis – for instance, deconstructing the construction of conditional expectation, or the Riesz representation theorem, or the approximation of measurable sets by open sets, or the fact that every bounded monotone sequence converges, are essential first exercises before one can do anything even remotely nontrivial, e.g. finitising the mean ergodic theorem.)

  7. gowers Says:

    855. Density increment.

    I’m trying to write up the Randall/Terry argument on the wiki, but it’s very slow going as I keep getting stuck. And each time I get stuck it takes me ages to decide whether I’ve uncovered a genuine difficulty or simply not got the point. And I’m in that position yet again.

    Here’s my problem this time. Suppose I’ve passed to a local 1-set A’ where B has maximal density. So we’ve got some coordinates belonging to an index set I, and membership of A’ depends just on what happens inside I and what the 1-set of a sequence is outside I. (I’m using “1-set” in two senses here.) Let’s call this an (I,1)-set.

    Now I’m very happy with the idea that if we pass to a sub-(I,1)-set, then we can assume that B has near-maximal density there too: if it hasn’t, then we just subtract off the subset. But that’s not what is going on in the argument here. Here we define a (I\cup I_1\cup\dots\cup I_m)-set, and it’s much less obvious to me that we can subtract it off, since if we do, then we are not left with a (I,1)-set.

    Could we use the fact that we have a (I_1\cup\dots\cup I_m)-set after we’ve done the subtraction? I don’t see how, because the amount of measure we’ve subtracted is tiny (since the probability that a word w is good goes to zero as m goes to infinity) and the size of the set of bad coordinates has gone up a lot.

    I suppose what I’m really saying is that I’d be very grateful to see this argument written out in full detail. (Another problem, but I think I can solve this one, is finding a suitable measure on the set of I_1,\dots,I_m to make it true that words are good with positive probability.)

    • Ryan O'Donnell Says:

      855.2. Hmm, perhaps our #855’s are coming to a similar difficulty, to be discussed further.

      I just wanted to say in this short reply that in your parenthesis at the end seems to match the parenthesis in the second paragraph of what I wrote below. I don’t quite know how to solve this issue, but I do like to remember that the Gunderson-Rodl-Sidorenko paper Jozsef pointed us to shows that whenever you have a subset of \{0,1\}^n of positive density, it must completely contain an \Omega(\log \log n)-dimensional subspace. (This is a quantification of the “multidimensional Sperner” argument on the wiki, I think.)

    • Randall Says:

      Terry will answer this far more fully at some point but in the meantime I will say this much: once m is fixed, the positive probability that V(w) lies in A’ is bounded below by something like \delta^{2^m} and the size of the \alpha_i can be safely bounded above by some finite amount….

  8. Ryan O'Donnell Says:

    856. I wonder if I could ask for some clarification on the Randall/Terry argument; specifically, on #837.2.

    In particular, I’m trying to follow the last paragraph. (Actually, I don’t quite understand the appeal to DHJ(2.5) in the previous paragraph either, but I think I understand my own way around that.)

    Could we clarify precisely what the definition of A'' should be? If it’s defined exactly as written in #837.2, then I wouldn’t so much say it’s the union of parallel m-dimensional subspaces so much as I would say it’s union of (|I_1|+\cdots+|I_m|)-dimensional subcubes. Which would be fine, but this doesn’t seem to square with the idea that A'' \subseteq A' as discussed in #851.

    On the other hand, perhaps we shouldn’t take A'' to be all good w, but just those w such that the part in I_1 \cup \cdots \cup I_m is actually in the large-dimensional subspace V(m). But my worry here is that in general (if the “wildcards sets” are large) that this A'' will constitute only a negligible (o_n(1)) fraction of A', so it should be impossible to conclude that B has a density increment on A''.

    • Ryan O'Donnell Says:

      856.2. Oops, sorry for the simultaneous post with Tim’s #855; please consider what I wrote #856. [Changed — Tim]

  9. gowers Says:

    857. Varnavides version of multidimensional Sperner

    Ryan, it was amusing to see just how many similarities there were between our near-simultaneous comments. While we’re waiting for a response, here’s roughly how I expected to fill in the details with the appeal to DHJ(2.5). The obvious worry, which you have expressed before in a different context, is that you have to ensure that the wildcard sets can take plenty of different sizes: it’s not enough to go for “easy” combinatorial subspaces where they all have size 1, since \mathcal{U} could be the set of all sets of even size.

    The thought I had was that one could define the measure in a way that more or less guaranteed that the proof worked. You would first find an M such that every subset A of density \delta/2 in {}[3]^M contained a subspace of dimension m where you got an element of A whenever you fixed the wildcards to be 1 or 2. Then you would choose your random wildcard sets I_1,\dots,I_m as follows. First choose a random “easy” combinatorial subspace of dimension M, with ground set F, and then pick, uniformly at random, m disjoint non-empty subsets of F. I’m not 100% sure that works but it feels as though a double-counting argument should now do the trick.

  10. gowers Says:

    858. Density increment

    I’m going to try to approach this question in small stages, proving little lemmas without being sure they are useful — though I will try to explain why I think they should be.

    Suppose that one tries to build up a subspace inductively on which \mathcal{A} has a density increment. The problem one faces is this. We can assume that for every U\in\mathcal{U} there are too many (V,W) such that (U,V,W)\in\mathcal{A}, and we can even find a combinatorial subspace of such U, but we do not have a good grip on what happens when we convert that into a combinatorial subspace in {}[3]^n.

    Let’s just look at combinatorial lines. I’d like to find a subset E\subset [n] and a combinatorial line inside {}[3]^E such that for all its three points we get too many ways of completing them with 2s and 3s to form an element of \mathcal{A}. The problem is that while I may be able to find several U\subset E with too many completions, at some point I’m going to need U_1\subset U_2 such that not only do they both have too many (V,W)s, but if I fill U_2\setminus U_1 with 2s or 3s then there are still too many (V,W)s.

    Now there might be some hope of that if I could say that for almost every U\in\mathcal{U} it was the case that for almost all small sets G, if I fill G with 2s or 3s, then there are still too many (V,W)s that complete to a combinatorial line. Why? Because then I could use the fact that there is a positive density of lines (all this depends on the appropriate measures being used) to find a combinatorial line such that U_2\setminus U_1 behaved itself.

    So can we say that about G? Well, suppose that for a positive proportion of the U\in\mathcal{U} there is a positive proportion of small sets G that have too few extensions when you fill them with 2s. Then there must be a positive proportion of small sets G such that you get too many extensions when you fill them with 2s. And then you can pick a random G and get a dense set of Us such that you get too many extensions. So if we drop down to {}[n]\setminus G then we have a strengthening of our original hypothesis in that combinatorial subspace.

    So now suppose we’ve reached the point where we can’t play this game any more. Then my hope is that you get that for almost every U\in\mathcal{U} (and therefore all once you chuck out the bad ones) it is the case that for almost all G you can fill them with 2s or 3s and still get roughly the right number of extensions. But with positive probability U\cup G also belongs to \mathcal{U} (by one of our Varnavides-versions of Sperner) and then perhaps we have something.

    That’s written quickly and carelessly, so I’m not sure it’s right. But if it is, then maybe one can run more or less the same argument with m-dimensional subspaces instead.

  11. Terence Tao Says:

    858. Dense fibres argument

    Responding to 855, 856, etc.

    Firstly, I forgot to define A” in 837.2, as pointed out by Ryan. A” is the union of the V(w) as w ranges over all good words. It has a smallish density that depends on the density of the original 1-set A, the constants in DHJ(2.5), the number m, and the size of the wildcard sets I_1,\ldots,I_m, but – crucially – the density does not depend on the size of the pre-existing bad coordinate set I’. And this is important, because the size of I’ is going to explode (but, importantly, will always be of size O(1) as far as n is concerned).

    As noted by Tim, if A’ is an (I’,1) set, then A' \backslash A'' is not going to be an (I’,1) set – it’s going to be merely an (I'',1)-set, where I'' = I' \cup I_1 \cup \ldots \cup I_k. I like to think of the coordinates I_1,\ldots,I_k as being “consumed” or “used up” by the procedure of passing from A’ to A' \backslash A'', in that the operation of interchanging 1 to 2 on these coordinates has been “spent”. But the remaining n - |I''| \approx n interchanging operations remain “unspent”, and that’s good enough for our purposes. (This is the “IP” philosophy from ergodic theory; it’s OK if each operation can only be performed once, so long as one has a huge number of essentially identical copies of that operation to spend.)

    I don’t think we can just fix I’ in advance and maximise some mass over all (I’,1) sets; one has to do some greedy algorithm of the following sort:

    1. We start with a 1-set A on which the function f := 1_B - \delta has mass {\Bbb E} f 1_A \geq c, and a desired dimension m that one wants to get a c/4-density increment on.

    2. DHJ(2.5) should tell us that any 1-set of density at least c/100 should contain a lot of m-dimensional combinatorial subspaces in which the wildcard sets I_1,\ldots,I_m have size at most r for some r = O_{m,c}(1). Indeed, if one chooses a random such subspace (using an appropriate measure), the probability that the entire subspace lies in the set should be at least \varepsilon for some \varepsilon \gg_{m,c} 1.

    3. One should be able to localise DHJ(2.5) to (I,1)-sets rather than 1-sets, so long as |I| is much smaller than n, by the first moment method.

    4. Initialise I' = \emptyset and A’ = A.

    5. A' is a (I’,1) set with {\Bbb E} (f-c/2) 1_{A'} \geq c/2. In particular, A’ has density at least c/2.

    6. Suppose that there exists disjoint I_1,\ldots,I_m in [n] \backslash I' of size at most r, and a (I'',1)-set A” of density at least \varepsilon in A’, where I'' := I' \cup I_1 \cup \ldots \cup I_m such that B has density at most \delta+c/4 on A”. Then replace I’, A’ by I”, A”; this increases {\Bbb E} (f-c/2) 1_{A'} by at least c \varepsilon/4. Now return to Step 5.

    7. We can only loop O_{c,\varepsilon}(1) times, and so the net size of I’ at the end of the day is O_{c,\varepsilon,m,r}(1), and we end up with an (I’,1) set A’ of density at least c/2 such that B has density at least \delta+c/4 on every subset A” of the form discussed in Step 6. Now invoke DHJ(2.5) to find an A” of this form which is the union of m-dimensional subspaces, and we obtain the desired density increment.

    [Amazing how much messier the finitary argument is from the ergodic theory argument, where one simply takes the set where the conditional expectation of 1_B to the sigma-algebra of 02-invariant sets is bigger than \delta + c/2 to be A’!]

    • Terence Tao Says:

      A small correction: the \varepsilon in Step 6 should actually be 3^{-mr} \varepsilon. (The set of good w has density \varepsilon, but the set A’, formed from the union of the V(w), could have a slightly smaller density of 3^{-mr} \varepsilon. But this is no big deal – as long as it doesn’t depend on n or the size of the wildcard set I’, one is OK.)

  12. Terence Tao Says:

    860. Multidimensional Sperner

    I believe one can get the right sort of multidimensional Sperner just by iterating the right sort of DHJ(2).

    If A is a subset of [2]^n of density \delta, then there should exist an r=O_\delta(1) such that if one picks a random r’ between 1 and r, then picks a random wildcard set I of size r’, then picks a random combinatorial line with that wildcard set, then with probability at least \varepsilon = \varepsilon(\delta,r) > 0, the line will lie in A. (In fact one can probably take r to be about 1/\delta and \varepsilon to be about \delta^2, using the usual chain proof of Sperner. If one uses equal-slices measure then things may be particularly favourable.)

    In particular, if we pick r’ and I randomly, then the set A' \subset [2]^{n \backslash I}, defined as the set of all words w \in [2]^{n \backslash I} whose associated combinatorial line with wildcard set I lies in A, will have density at least \varepsilon/2 with positive probability. In particular there exists r’, I for which this statement is true.

    If we iterate this m times (replacing \delta by \varepsilon/2 at each stage) we conclude that there exist disjoint wildcard sets I_1,\ldots,I_m of size O_{m,\delta}(1) such that the proportion of combinatorial m-spaces with these wildcards that completely lie in A is at least \varepsilon_m for some \varepsilon_m = \varepsilon_m(\delta) > 0.

    This is the multidimensional DHJ(2). The standard derivation of DHJ(2.5) from DHJ(2) should then give the multidimensional DHJ(2.5) needed for my argument.

    • Terence Tao Says:

      The argument here, of course, is identical to the usual proof of the Szemeredi cube lemma (that dense subsets of [n] contain high-dimensional cubes), by first iterating the fact given that a subset A of [n] of density \delta one can find a positive h such that A' := (A+h) \cap A has density \gg \delta^2.

    • jozsef Says:

      Terry, I’m not sure that I understand what are you saying here. Would it lead to a Varnavides-type result? Here are the numbers that I see; If a set of subsets of [n] has at least c_d2^n/n^{1/2^d} elements, then it contains a d-dimensional subspace. This bound is close to be sharp. Using this we see that a c-dense subset of 2^{[n]} contains at least cn^{1-1/2^d}2^n d-dimensional subspaces, which is not much. Your calculation suggests much more.

    • Terence Tao Says:

      Dear Jozsef,

      One has to count the subspaces in an appropriate equal-slices fashion to get the right Varnavides-type result. (This was discussed quite a while back, I think in the 1-199 thread, but no harm in reviving it here.)

      Let’s start with m=1: if A has density c in 2^{[n]}, then a random maximal chain \emptyset = S_0 \subset S_1 \subset \ldots \subset S_n = [n] in [2]^n will intersect A in about c \sqrt{n} positions in the middle O(\sqrt{n}) of the chain. [The calculation is a little simpler here if one uses equal-slices density for A instead of uniform density, but never mind that.] The pigeonhole principle then tells us that if we randomly choose an r between 1 and O(1/c), and randomly pick i = n/2 + O(\sqrt{n}), then S_i, S_{i+r} will both land in A with probability about c^2.

      To put it another way, if we pick r randomly between 1 and O(1/c), and then randomly select from all the 2^{n-r} \binom{n}{r} combinatorial lines with r wildcards, then the probability that this line will lie in A is \gg c^2 (there may be some logarithmic losses of \log \frac{1}{c} due to the Chernoff inequality etc.) Note that this is not the same as choosing a combinatorial line uniformly at random among all 3^n-2^n such lines, or even amongst all such lines with O(1/c) wildcards.

      This Varnavides-type theorem gives a weighted count of combinatorial lines in the set; a line with r wildcards has a weight proportional to 1 / 2^{n-r} \binom{n}{r}. This is the one which I think one should iterate, and not the theorem based on raw (unweighted) cardinality of lines.

    • jozsef Says:

      Thank you Terry!

  13. gowers Says:

    861. Density increment.

    I’m going to have yet another go at expressing the argument in terms that I understand. Or rather, I’m going to try to prove the result, using what Terry has written as a huge hint as to what kind of argument I can expect to work. However, I do not plan to give anything like full details in this comment: I just want to convince myself of the argument.

    The set-up once again: I have a 1-set \mathcal{A} (I’d better call it \mathcal{A}, following Terry, to avoid confusion, though earlier I’ve been calling it \mathcal{B} and the dense set that correlates with it \mathcal{A}) defined as the set of all (U,V,W) such that U\in\mathcal{U}; I have a set \mathcal{B} of density \delta such that the density of \mathcal{B} inside \mathcal{A} is at least \delta+\eta; I want a density increase for \mathcal{B} on a combinatorial subspace with dimension tending to infinity.

    Step 1. For every U define d(U) to be the probability that if you randomly partition {}[n]\setminus U into two sets V,W then the point (U,V,W) belongs to \mathcal{B}. Then we are assuming that the average of d(U) over all U\in\mathcal{U} is \delta+\eta. (One of the details I’m going to be very non-explicit about is what measures I am using.) By an easy averaging argument we may pass to a dense subset of \mathcal{U} such that d(U)\geq\delta+\eta/2 for every U in the dense subset. Let’s replace \mathcal{U} by this subset and \eta by \eta/2 and still call them \mathcal{U} and \eta.

    Step 1.5. The obvious thing to do next is pass to an m-dimensional combinatorial subspace (in the {}[2]^n sense) that lives inside \mathcal{U}. Let us write it as Z_0\cup FU(Z_1,\dots,Z_m), which stands for the set of all sets Z_0\cup Z, where Z is a union of some of the Z_i. (The F stands, unnecessarily, for “finite”.) But if we do that then we find ourselves wondering why we bothered, since if we treat Z_0 as a fixed set of 1s and the Z_i as wildcard sets and try to fix the remaining coordinates to get a combinatorial subspace of {}[3]^n with a density increase, then we simply don’t succeed. It could be that as soon as we fill the Z_i with constant strings of 2s or 3s, we end up with sequences that give rise to hardly any extensions that belong to \mathcal{B}. (Just to explain slightly more, our aim is to find a combinatorial subspace such that the average density of possible extensions to a point in \mathcal{B} is too high, so that then we can turn things round and say that a random extension comes from too many points in the combinatorial subspace, which gives us our density increment.)

    Step 2. At this point, we say to ourselves, “Yes, but if that disaster happened all the time, then surely it would have to be compensated for by exploitable density increases elsewhere.” So now we are trying to produce an averaging argument. At this stage, let’s forget that m is supposed to tend to infinity, and treat it as a very large fixed constant. (This is a standard move — if we can always do it for sufficiently large n, then we simply work out what n we needed in terms of m, invert that function, and we’ve got our function of n that tends to infinity. Terry has been using this trick a lot — in fact, easy though it is, it should go on the Tricki.)

    Given that we are trying to produce an averaging argument, we had better look not just for one combinatorial subspace, but for a dense set of combinatorial subspaces. So now let’s assume we’ve got a Varnavides-type result for multidimensional Sperner that tells us that with positive probability (depending on the density of \mathcal{U} and on m but not on n) if we choose our Z_0,Z_1,\dots,Z_m randomly (according to some carefully chosen probability distribution) then Z_0\cup FU(Z_1,\dots,Z_m)\subset\mathcal{U}. Here I’m thinking of Z_0 as being a fairly typical U and Z_1,\dots,Z_m as being very small sets.

    Now let’s suppose that however we do that we find that, to our annoyance, there is some way of assigning 1s, 2s and 3s to the sets Z_1,\dots,Z_m and filling Z_0 with 1s, such that the density of ways of filling the rest of {}[n] with 2s and 3s to get an element of \mathcal{B} is a bit less than \delta+\eta (by some rather small amount that depends on m). So each Z_0 is associated with a positive density (depending on m) of “bad” sequence fragments, formed by taking some small sets Z_1,\dots,Z_m and assigning 1s, 2s and 3s to them. Turning this round we can find a many fragments with a lot of “bad” Z_0s.

    Now if I generate a sequence by picking Z_0,Z_1,\dots,Z_m randomly, filling Z_0 with 1s the rest with 1s, 2s and 3s and the rest of {}[n] with 2s and 3s, then I get almost exactly the same distribution as if I had just chosen a random sequence (or at least, I believe that one can easily ensure this). And I think we can use that to argue that if many fragments give rise to several “bad” Z_0s (by which I mean that the density of extensions by 2s and 3s that give rise to elements of \mathcal{B} is too low) then there must be many fragments that give rise to several “good” Z_0s. But then by averaging we can fix one of those fragments and pass to a (|Z_1|+\dots+|Z_m|)-codimensional subspace inside which we have a 1-set consisting of sets U for which d(U) has gone up very very slightly (by an amount that depends on m).

    Step 3. And now we just iterate. If the density goes up by c=c(m,\delta,\eta) each time, and if our sets Z_i have size bounded above by r, then we need something like mr/c to be distinctly less than n. If that is the case, then the iteration will stop at some point where we can find a combinatorial subspace Z_0\cup FU(Z_1,\dots,Z_m) inside \mathcal{U} with no possible bad assignments of 1s, 2s and 3s to the Z_i with i\geq 1. And that means that for every element of that combinatorial subspace we get too many 2,3-extensions that belong to \mathcal{B}. Turning that round, we pick 2s and 3s randomly outside Z_0\cup\dots\cup Z_m and get on average a density increase in the subspace.

    Terry, I’d be interested to know whether you think that’s exactly the same as your argument, or roughly the same, or the same apart from an error that I’ve introduced, or correct and slightly different, or wrong for a reason that’s hard to correct but once you manage to correct it you get precisely your argument, or what.

    • Terence Tao Says:


      Tim, I think this is essentially the same argument, but I like how you use an additional pigeonhole principle to freeze (and then discard) all the “bad” coordinates, thus avoiding the need to distinguish between global and local 1-sets. It’s not as if there was anything I was planning to do with those bad coordinates anyway, so one may as well pigeonhole them away.

      (Incidentally, if one does some Ramsey-theoretic preprocessing of the set, it may turn out that one has enough “stationarity” that one does not need to pass to subspaces at all… but this is overkill and would in any event produce worse quantitative bounds.)

      I guess the bounds will in fact end up being reasonably civilised at this stage; the bad set I of coordinates is only increasing linearly with the number of iterations, as you point out, not exponentially or anything.

    • Terence Tao Says:

      Some minor remarks:

      – I don’t think the averaging argument in Step 1 is actually needed; one can work with the global density of B in A rather than the local densities d(U) of B associated to U. (If it later turns out that there are lots of Us in A with too low of a B-density, then this will only help us when the time comes to make our density/mass increment.)

      – In order not to have to keep balancing the density of B in A, against the density of A, it is probably better to stare at the mass {\Bbb E} (1_A - \delta - \eta/2) 1_B rather than the density, as this (a) seems to go up by a non-trivial amount at every stage of the iteration, and (b) it automatically provides a lower bound on the density of A, which is needed to stop the amount removed from A at each stage of the iteration from shrinking to zero.

  14. gowers Says:

    862. Density increment.

    I want to throw out another thought, which is that there is something we would be silly not to try to do, though it may turn out not to be feasible. Actually, it may be one of those cases where Randall and Terry will be able to give an informed guess whether it has any chance of working by looking at things from an ergodic perspective. It looks as though we now know that correlation with a 1-set implies a density increase, but what about correlation with a 12-set?

    What is a 12-set? It is what on the wiki I called a special set of complexity 1: you take set systems \mathcal{U} and \mathcal{V} and define \mathcal{A} to be the set of all x with 1-set in \mathcal{U} and 2-set in \mathcal{V}. Equivalently, it is the set of (U,V,W) with U\in\mathcal{U} and V\in\mathcal{V}. (Thus, a 1-set is a 12-set where \mathcal{V}=[3]^n.)

    If the argument on the wiki that says that line-free sets correlate with 12-sets is correct, and if correlation with a 12-set implies correlation with a subspace, then the whole problem is solved, so the motivation for this question is rather obvious.

    A few preliminary thoughts. First, I think it is easy to prove (by imitating the DHJ(2.5) argument) that a dense 12-set \mathcal{A} contains multidimensional subspaces. You just choose some random copy of {}[2]^M inside {}[3]^n (where M is small enough for a random point in this random copy to be approximately uniformly distributed). In that copy, \mathcal{A} will on average be dense, and therefore contain a multidimensional subspace in the {}[2]^n sense. And then, since \mathcal{A} is a 12-set, we can allow the wildcards to be 3 and we still live inside \mathcal{A}. (I think this argument has probably appeared several times already in our discussions — I may even be one of the people to have given it.)

    That proof should give a Varnavides-type statement too. So what would the next stage be?

    Let’s choose a random combinatorial subspace as follows. First we choose random large sets U_0 and V_0. Then we choose random small wildcard sets Z_1,\dots,Z_m. And then we … er … hope that the density of \mathcal{B} inside the resulting subspace (where we assign values 1 to U_0, 2 to V_0, anything we like to the Z_i, and 3 to the rest) is at least \delta+\eta/2.

    If that always fails to be the case, then we argue as follows. With some small but positive probability p, the random combinatorial subspace is a subset of \mathcal{A}. That means that for some random assignment of 1s, 2s and 3s to the Z_i we get too few pairs (U_0,V_0) belonging to \mathcal{B} out of the ones that belong to \mathcal{A}. (By the way, I should confess that I’m losing the thread here and don’t really know whether what I’m writing is correct. But I’ll just plough on.) So that should be balanced by some other random assignment that gives too many pairs. And then we can get a subspace with a small density increase on a 12-set.

    I must go to bed in the knowledge that that argument could be anywhere along the spectrum that goes from complete nonsense at one end to a proof of DHJ(3) at the other. (For now I’ll leave unspecified what measure I am placing on that spectrum.) Perhaps someone will be able to tell me by the time I get up tomorrow.

  15. gowers Says:

    863. Density increment

    Here’s a slightly more precise, but still far from checked, version of what I said in 862.

    1. This is a way that one could choose a random point (U,V,W)\in[3]^n. You choose a random disjoint pair (U_0,V_0) and some random small wildcard sets Z_1,\dots,Z_m. You assign values to the wildcards, fill U_0 with 1s, V_0 with 2s and put 3s everywhere else. That gives you your point, and it’s more or less uniformly distributed.

    2. Now let’s condition on Z_1,\dots,Z_m and the assignment of their values. (I’ll call this a random sequence fragment.) If the probability that (U,V,W)\in\mathcal{B} given the random sequence fragment is ever more than \delta+\eta (plus a tiny tiny amount) then we’ve got a density increase on a 12-set in a subspace, and can iterate.

    3. Therefore, this conditional probability cannot be more than \delta+\eta or we’re done.

    4. But there is a positive probability that U_0\cup FU(Z_1,\dots,Z_m)\subset\mathcal{U} and V_0\cup FU(Z_1,\dots,Z_m)\subset\mathcal{V}. Moreover, the density of \mathcal{B} inside the subspace \langle U_0,V_0;Z_1,\dots,Z_m\rangle (I hope it’s easy to guess what that means) is at most \delta+\eta/2 or we’re done. So, turning things round, there is in fact a positive probability that the conditional probability in 3 is somewhat less than \delta+\eta.

    5. Therefore, by 1 (I think) there is a positive probability that it is somewhat more than \delta+\eta and we’re done.

    Again, that sort of feels as though it could be correct, but it also feels as though it could collapse when I try to write it out properly.

    • gowers Says:

      863.2 I see now that Step 2 is not very clear because I am not mentioning \mathcal{U} and \mathcal{V}.

      If in Step 1 we discard the point and try again unless U\in\mathcal{U} and V\in\mathcal{V} then we end up with a roughly uniformly distributed point in \mathcal{A}. (That is, the restriction of the distribution to \mathcal{A} is roughly uniform.)

      Then in Step 2 I want to fix Z_1,\dots,Z_m, and the probability I’m talking about is the probability of being in \mathcal{B} given that you are in \mathcal{A}. It is this that cannot go above \delta+\eta unless it is possible to get a density increase and an iteration.

      If nobody sees an obvious flaw in this approach by tomorrow morning (my time) then I suppose I’ll try to work out some details properly. I’m slightly suspicious of Steps 4 and 5.

      One tiny remark for clarification: this is not supposed to be the Ajtai-Szemerédi approach: it’s a different argument that would combine the density increase on a 12-set (as given on the wiki) with a Randall/Terry style argument to get from that to a density increase on a subspace. My dream for the project as a whole is that we’ll end up with lots of different proofs …

  16. Randall Says:

    864. Speculative reply to 862.

    From the ergodic perspective, it looks like correlation with a dense 12-set implying a mass increment is still on the order of double recurrence. However, the picture I seem to be getting is that maybe it’s worth pursuing anyway. Here is the idea. Say we assume some fancy double recurrence result (FDR) that is not, on the face of things, as general as DHJ (3). Like for example, what are called IP Szemeredi or IP Roth on the Wiki (comments 2 and 469, respectively). And say we can use this to show that correlation with a dense 12-set implies a density increment on a subspace. Then, assuming that really was all that was needed in the first place, we will have reduced DHJ to FDR, which might not be close to finishing, but might at least be progress.

    The reason I think this might be feasible is, in the ergodic world the projection P onto the asymptotically 02 invariant sets commutes with the projection Q onto the asymptotically 01 invariant sets. So, PQ ought to project to L^2 of some larger sigma-algebra. Correlating with something measurable with respect to this sigma-algebra gets you big fibers, you just need a way to bring these fibers back to themselves along a subspace. That’s where FDR comes in. I think you can probably work in a product space with genuine corners now. The idea is you want to show that A intersect T_{01}A intersect T_{02}A is non-empty (I suppress alpha in the notation, which tells you which coordinates to flip…also you want only to flip things that are zero; strictly speaking the notation I am using probably doesn’t mean anything taken literally, it’s only meant to be suggestive.) Well, on the horizontal coordinate, 1s and 2s are interchangeable, so T_{01}=T_{02} (more or less). On the vertical coordinate, 0s and 1s are interchangeable, so T_{01}=Id (more or less). So basically, you have T_{01} acting on each coordinate, and you need the putative corners type result (FDR) to bring sets in the Cartesian product back to themselves.

    • Terence Tao Says:

      Dear Randall, I think in fact that double recurrence for 12-sets collapses to single recurrence by the argument sketched out in 862. It’s easiest to explain for the corners problem: to find corners (x,y), (x+r,y), (x,y+r) in a Cartesian product A \times B, it suffices to just find the latter two points (x+r,y), (x,y+r) of the corner inside this product, as this will automatically place the third guy inside the corner as well. So the corners theorem in this case follows trivially from the pigeonhole principle; and for similar reasons, DHJ(3) for 12-sets follows from DHJ(2).

      In ergodic language: if f = f_1 f_2, where f_1 is S-invariant and f_2 is T-invariant, then the double recurrence integral \int f S^n f T^n f collapses to the single recurrence integral \int S^n (f_1^2 f_2)  T^n(f_2^2 f_1). In the DHJ world, if f = f_1 f_2, f_1 is \rho \sigma^{-1}-invariant, and f_2 is \tau \sigma^{-1}-invariant, then

      \int \sigma(f) \rho(f) \tau(f) \approx \int \rho(f_1^2 f_2) \tau(f_1 f_2^2)

      at least for words w that are sufficiently “large”.

    • Randall Says:

      864.2 Clarification.

      The set A I am trying to bring back is not a 12-set. It is a set that is measurable with respect to the sigma-algebra of 12-sets. (Notice that, unlike i-sets, ij-sets do not form an algebra.)

      In general, the aim of my last post was to outline a method for obtaining a density increment on a subspace for a set that correlates with a 12-set. If I understand Tim correctly, that may be the last piece. What I hope is possible is to fill in that gap with a double recurrence theorem that is, on the face of things, weaker (and hopefully easier to prove directly) than DHJ (3).

    • Randall Says:

      864.3 Obviously I meant “measurable with respect to the sigma algebra generated by the 12-sets”; although what I actually typed is an amusing oxymoron.

  17. Ryan O'Donnell Says:

    865. Structure Theorem/FK-McCutcheon approach.

    I know we have been making progress lately on Tim’s Ajtai-Szemeredi approach, but I’ve been thinking also about Terry’s approach, started in #818, of finitising Randall’s proof of the Furstenberg-Katznelson argument. Call me craven, but there’s something nice about also working an angle that “in principle” is known to succeed. Perhaps we might even end up with a glorious amalgam of the two approaches.

    The crux of Terry’s wiki version seems to be a proposed structure theorem which splits any function f into a “01-almost periodic relative to 12-low influence” part and a “01-uniform relative to 12-low influence” part. It’s slightly tricky for me to keep even the definitions in mind, and I had a moment of despair contemplating actually executing such a structure theorem.

    Hence I fantasise that some kind of simple structure theorem (a la #821 and #826) might also do the job. Let me throw this fantasy out there and see if it can survive for a bit.

    Put a graph on [3]^n where vertices are connected by an edge if they have Hamming distance 1. Define an operator L^{01} on functions f : [3]^n \to \mathbb{R} by L^{01}f(x) = f(x) - \mathbb{E}[f(x')], where x' is formed from x by choosing a random coordinate and flipping it between 0 and 1 (if it is not 2). Similarly define the operator L^{12}. For t \geq 0, define also the operator H^{01}_t = e^{-tL^{01}}, and similarly H^{12}_t. Note for intuition that if we had defined H^{01}_t for functions on [2]^n, we would have gotten the operator T_{e^{-t/n}} (as used in #822). These operators form a semigroup.

    So my hope is to have the structure theorem be

    f = H^{12}_{t} H^{01}_{t'} f + (f - H^{12}_{t} H^{01}_{t'} f)

    for some carefully chosen small quantities t, t'. Let me give an extraordinarily non-rigorous argument for why this might work out.

    First, is H^{12}_{t} H^{01}_{t'} f “01-almost periodic relative to 12-low influence”? Well, in fact perhaps it’s even 01-low influence. The idea here is that if t and t' are both very small, then hopefully H^{12}_t and H^{01}_{t'} approximately commute. Then H^{12}_{t} H^{01}_{t'} f \approx H^{01}_{t'} H^{12}_{t} f, and the latter function has 01-low influence because it has an H^{01}_{t'} out front.

    And what about g := (f - H^{12}_{t} H^{01}_{t'} f)? Is it “01-uniform relative to 12-low influence”? At the grossest level, I take this to mean, is it true that \langle H^{01}_{s} g, h \rangle is small for any 12-low influence h? (Note that H^{01}_{s} should be self-adjoint so this should be the same as asking if \langle g, H^{01}_{s} h \rangle is small for 12-low influence h.) By definition, this correlation is

    \langle H^{01}_{s} f, h \rangle - \langle H^{01}_{s} H^{12}_{t} H^{01}_{t'} f, h \rangle.

    Again, in the second term here we hope to commute the H^{12} operator to the front, use self-adjointness to put it on the other side, and get

    \langle H^{01}_{s} f, h \rangle - \langle H^{01}_{s+t'} f, H^{12}_{t} h \rangle.

    Now I again fantasise that perhaps by picking s and/or t carefully (at a “plateau in the energy spectrum”) we can say that H^{01}_s f \approx H^{01}_{s+t'} f. Then we’d get

    \langle H^{01}_{s} f, h \rangle - \langle H^{01}_{s} f, H^{12}_{t} h \rangle

    which equals

    \langle H^{01}_s f, h - H^{12}_t h \rangle.

    But h is 12-low influence so h \approx H^{12}_t h and hence this is indeed small.

    I’m normally much soberer than this post would suggest.

    • Terence Tao Says:

      Dear Ryan,

      I’m hopeful on this long-term strategy too, but from experience with finitising the ergodic proof of Szemeredi’s theorem, I’m pretty sure that the proof is going to be yuckier than the Ajtai-Szemeredi-flavoured approach.

      01-uniformity of a function f relative to 12-low influence is a bit stronger than what you’re saying. It’s the assertion that

      {\Bbb E} g(\ell(0)) f(\ell(1)) h(\ell) (1)

      is small for all bounded g and all 12-low influence h, where \ell ranges over lines. The point here is that h depends not only on the fixed coordinates of h, but also on the location of the wildcards. The statement you’re saying is roughly equivalent to saying that (1) is small in the case when h doesn’t depend on the location of the wildcards.

      Perhaps I can motivate things a bit better by considering the analogous notion when counting three-term progressions. Here, the analogue of “12-low influence” is simply “constant”, and a function f: {\Bbb Z}/N{\Bbb Z} \to [-1,1] is “uniform” (the analogue of “01-uniform relative to 12-low influence”) if one has

      {\Bbb E}_{n,r} g(n) f(n+r) h(r)

      small for all bounded g and all bounded h which are independent of n.

      Unfortunately, the structure theorems that show up in ergodic theory, once one leaves the “1-step” world and moves to the “higher-step” or “relative” world, are a bit messy, especially in the finitary world. In principle the machinery from my paper will be relevant here, but I’m reluctant to unpack it all now; I’m hoping that the Ajtai-Szemeredi thread may lead to some new insights or simplifications first. (Also, Tim Austin may soon be coming out with some new work that could also assist with this; more on this once it is more firm.)

    • Terence Tao Says:

      I forgot to say that the key point in the length three progressions example was that while h did not depend on the n parameter (which roughly corresponds to the “fixed positions” of the line), it still depended on the r parameter (which correspond to the “wildcards” of the line). Because of this, I don’t see an easy structure theorem here… the cheapest way is to take Fourier transforms and extract out the large Fourier modes, but this can’t really be done just by heat operators.

    • Randall Says:


      Ryan, it might help to consider the functions

      f(U,V,W)=e(\sum_V -y_i +\sum_W y_i),

      g(U,V,W)=e(\sum_V y_i),

      h(U,V,W)=e(\sum_W -y_i).

      Then f(w(0)) g(w(1)) h(w(2)) = 1 for all variable words w, so your structure theory had better not be counting g as “01-uniform relative to 12-low-influence” or some similar category. On the other hand, g looks like it may be orthogonal to 12-low-influence.

  18. Terence Tao Says:

    866. 12-sets

    Randall raises a good point about 12-sets not being closed under union, though they are still closed under intersection. To borrow a notation I floated a while back, one should make a distinction between “basic 12-sets” (the intersection of a 1-set and a 2-set, which to continue the topological analogy might be thought of as “sub-basic 12-sets”), and “general 12-sets”, which are unions (a bounded number of) of basic 12-sets. In the iterative process that is carving out the 12-set A’ on which to locate a density increment, we are removing basic 12-sets from A’ at each iteration, and so the complexity of the 12-set A’ (i.e. the number of basic 12-sets needed to union together to make A’) increases with the iteration.

    This is going to cause a problem. Some sort of regularity lemma might be needed here (the ergodic analogue of this would be the Lebesgue differentiation theorem: every general 12-set is very dense inside a basic 12-set). But this might not be enough.

    The other alternative is to take advantage of the freedom in the finitary world to restrict to smaller subspaces in one’s hunt for a density increment; this is a trump card that the ergodic world doesn’t really have with current technology. There is a chance that by continually restricting, one may only have to work with basic 12-sets and not general 12-sets. I’ll think about it…

    • gowers Says:

      866.1 This connects in an amusing way with a terminological difficulty I had, and that you picked up, on the wiki. A basic 12-set is like a rank-1 tensor, and at one point I tried to use the word “rank” instead of “complexity” for that reason. But I was inconsistent about it.

      If we went on to DHJ(4) things would get more complicated still. For example, there’s an important distinction between a 12-set (an intersection of a 1-set and a 2-set) and a {1,2}-set (a set A such that whether or not (U,V,W,X) belongs to A depends only on the pair (U,V)). Then a {1,2}-function of rank at most k would be a sum of at most k basic {1,2}-functions, etc. etc.

    • Randall Says:


      Terry, having read Tim’s very exciting posts more carefully, I see what you meant now. Yes, I raised the point of the 12-sets not forming an algebra, but apparently I was too caught up in the ergodic paradigm to ignore it, as I should have. In the ergodic setting one develops a habit of thinking not about how one function correlates with another, but rather how a function projects to a subspace. Hence my counterproductive impulse to immediately go searching for an algebra on which to project.

  19. gowers Says:

    867. Density increment

    I’m going to give a sort of commentary on 863 (which I reproduce here) as a first step towards either making it more precise or discovering a problem with it.

    1. This is a way that one could choose a random point (U,V,W)\in[3]^n. You choose a random disjoint pair (U_0,V_0) and some random small wildcard sets Z_1,\dots,Z_m. You assign values to the wildcards, fill U_0 with 1s, V_0 with 2s and put 3s everywhere else. That gives you your point, and it’s more or less uniformly distributed.

    I don’t foresee any problems with this step. It would use the uniform measure on {}[3]^n and exploit the fact that slices near (n/3,n/3,n/3) all have roughly the same size. The size of the wildcard sets would be some large constant that depended on the density of \mathcal{B}.

    2. Now let’s condition on Z_1,\dots,Z_m and the assignment of their values. (I’ll call this a random sequence fragment.) If the probability that (U,V,W)\in\mathcal{B} given the random sequence fragment is ever more than \delta+\eta (plus a tiny tiny amount) then we’ve got a density increase on a 12-set in a subspace, and can iterate.

    Here is a more precise formulation of what I mean, taking into account comment 863.1. Let me establish my terminology. A sequence fragment is a set Z\subset[n] and a function \sigma from Z to \{1,2,3\}. An extension of a sequence fragment (Z,\sigma) is a sequence x\in[3]^n such that the restriction of x to Z is \sigma. If \mathcal{A} is any subset of {}[3]^n, I shall write \mathcal{A}(Z,\sigma) for the set of all extensions of (Z,\sigma) that belong to \mathcal{A}, which can be naturally identified with a subset of {}[3]^{[n]\setminus Z}.

    The precise statement of Step 2 is now this. If we can ever find a sequence fragment (Z,\sigma) such that |\mathcal{B}(Z,\sigma)|/|\mathcal{A}(Z,\sigma)| is a bit bigger than \delta+\eta, then we can restrict the coordinates in Z to \sigma and iterate. Or at least, we can do that as long as \mathcal{A}(Z,\sigma) isn’t too small.

    One thing that makes me slightly anxious is the fact that the densities of the sets \mathcal{A}(Z,\sigma) may vary quite widely. But it occurs to me here that we may be able to do some kind of regularization by passing to a sequence fragment that maximizes this density and observing that the relative density of \mathcal{B}(Z,\sigma) must also be preserved or we would have found a density increment somewhere along the line. I’m pretty convinced this can be done if it’s needed, so I think for now I won’t worry about this problem too much (which makes me much more confident about the double counting arguments later on).

    I have to go so will continue this later.

  20. gowers Says:

    868. Density increment.

    4. But there is a positive probability that U_0\cup FU(Z_1,\dots,Z_m)\subset\mathcal{U} and V_0\cup FU(Z_1,\dots,Z_m)\subset\mathcal{V}. Moreover, the density of \mathcal{B} inside the subspace \langle U_0,V_0;Z_1,\dots,Z_m\rangle (I hope it’s easy to guess what that means) is at most \delta+\eta/2 or we’re done. So, turning things round, there is in fact a positive probability that the conditional probability in 3 is somewhat less than \delta+\eta.

    This is a bit easier to say in the new language. If Z_1,\dots,Z_m are disjoint sets, let’s write \sum_i\eta_iZ_i for the sequence fragment that takes the value \eta_i on Z_i (so Z=Z_1\cup\dots\cup Z_m). And if x\in[3]^{[n]\setminus Z} then let’s write x+\sum_i\eta_iZ_i for the extension of this fragment that takes value x_i at each i\in[n]\setminus Z.

    For any given Z_1,\dots,Z_m let \mathcal{A}(Z_1,\dots,Z_m) be the intersection of all the sets \mathcal{A}(\sum_i\eta_iZ_i). That is, \mathcal{A}(Z_1,\dots,Z_m) consists of all x\in[3]^{[n]\setminus Z} such that x+\sum_i\eta_iZ_i\in\mathcal{A} for every \eta_1,\dots,\eta_m\in[3]. It is easy to check (and I’ve done so — I promise) that \mathcal{A}(Z_1,\dots,Z_m) is a 12-set. In fact, I think I’ll need to give a quick proof of that later on.

    At this point we use the multidimensional Sperner type result to say that if we choose (Z_1,\dots,Z_m) at random then the expected density of \mathcal{A}(Z_1,\dots,Z_m) is positive (meaning bounded below by a positive constant that depends only on \delta, \eta, m and the sizes of the sets Z_i). This is saying that if \mathcal{A} is a dense 12-set, then a positive density of all possible m-dimensional combinatorial spaces (with very small wildcard sets) is contained in \mathcal{A}. This bit is not thoroughly checked, but I don’t think there’s much doubt that it can be with a bit of effort.

    Now let’s suppose that x\in\mathcal{A}(Z_1,\dots,Z_m). Let U_0 be the 1-set of x and let V_0 be the 2-set of x. Then we must have that the union of U_0 and any subcollection of the Z_i belongs to \mathcal{U}, and similarly for V_0 and \mathcal{V}. And the converse is true too, since \mathcal{A} is a 12-set. Let \mathcal{U}(Z_1,\dots,Z_m) be the set of all U_0 such that U_0\cup FU(Z_1,\dots,Z_m)\subset\mathcal{U} (to use my notation above), and similarly for \mathcal{V}(Z_1,\dots,Z_m). Then \mathcal{A}(Z_1,\dots,Z_m) is (naturally identified with) the set of all x\in[3]^{[n]\setminus Z} with 1-set in \mathcal{U}(Z_1,\dots,Z_m) and 2-set in \mathcal{V}(Z_1,\dots,Z_m). So it really is a 12-set in {}[3]^{[n]\setminus Z}.

    Now let’s fix a choice of Z_1,\dots,Z_m such that \mathcal{A}(Z_1,\dots,Z_m) has positive density. And let’s write \mathcal{A}' for \mathcal{A}(Z_1,\dots,Z_m), and similarly for \mathcal{U}' and \mathcal{V}'.

    For every x\in\mathcal{A}', the subspace of all points x+\sum\eta_iZ_i contains at most (\delta+\eta/2)3^m points in \mathcal{B} or we have our density increase on a subspace. Therefore, if we choose a random (\eta_1,\dots,\eta_m), the proportion of x\in\mathcal{A}' such that x+\sum\eta_iZ_i\in\mathcal{B} is on average at most \delta+\eta/2. So choose (\eta_1,\dots,\eta_m) such that this proportion is at most \delta+\eta/2.

    Now let’s suppose that \mathcal{B} has near maximal density \delta+\eta in \mathcal{A}(\sum\eta_iZ_i), which I will denote by (\mathcal{U},\mathcal{V})(\sum\eta_iZ_i). This set naturally partitions into four parts, according to whether the 1-set/2-set of x belongs/does not belong to \mathcal{U}'/\mathcal{V}'. The density of \mathcal{B} in the \mathcal{U}'\mathcal{V}' part is too small, so elsewhere it must be too big. That gives us a density increase for \mathcal{B} on a 12-set and we can iterate.

    All that can go wrong is if \mathcal{B} does not have near maximal density in \mathcal{A}(\sum\eta_iZ_i). But that would have to happen with positive probability, which means that somewhere we would get a density increase.


    At the moment, as is clear, I haven’t quite got to the point of plunging in and writing things up formally, but the formality is increasing and I’m still not getting any sense that it’s on the point of collapse. This raises questions about how we should proceed if we get to the writing-up stage. The wiki has been great for that, but I think the inconvenience of it would start to bite quite hard if one were actually writing things up in complete detail. (Unfortunately, Luca hasn’t yet written a LaTeX2wiki converter …)

    What I’d ideally like is to start writing things in LaTeX but in such a way that others can edit it. I’m not sure if that is technically feasible, however. Another possibility would be to write a skeleton version on the wiki, with statements of the main lemmas and things like that, and then work on the proofs in LaTeX. Or I could just go ahead and do what Ryan did and post a link to a pdf, which anybody could comment on, and which I’d happily send to anyone if they wanted to add to it or make changes (though we would then have to know at all times who was in charge of the latest version). Or we could do something like that but split it into a number of subfiles, one for each section.

    • Jon Says:

      870. Metacomment

      Perhaps using the wiki for LaTeX may not be that cumbersome, the wiki has the great advantage of being a common repository already, plus it has the tools to compare changes between versions.

      The idea would be to use the wiki simply as a storing device for the latest common LaTeX version, not a place to edit or view the paper. Each time one user A would like to read and/or change something in the LaTeX (say one section at a time as you suggest), A would simply copy-paste the latest wiki version of that section into a blank LaTeX template file on A’s computer and continue editing and monitoring the pdf locally. When done A would simply: (1) copy-paste that new LaTeX section onto the current wiki version; (2) check whether some changes to that section by some other user B have been made during A’s local editing.

      If not, then no problem: A’s version is now the common current one and appears as such on the wiki. On the other hand if A sees that a new version by B had appeared in between, then A would quickly edit the wiki of that section and add as first line “merging in progress” (a signal preventing disciplined other users to make further changes). Then A and B would need to dicuss their respective version in the wiki discussion page or a blog thread, reach a satisfactory common one, add it to the wiki, and finally remove the “merging in progress” tag.

      Since only a few people are working on the project this situation should be fairly rare and localized, so that independent parts of the common overall LaTeX file would progress quickly. Archived versions of the whole pdf file, say one per day, might be storable on a separate blog thread for example.

  21. Gil Kalai Says:

    869. Here is a variant on Sperner that I wonder what is the situation for it, and if our discussion is relevant.

    Supposet that for every k dimensional discrete cube you have a specific “forbidden” pair of maximal distance elements. E.g. for k=5 you can exclude the pair {(10100),(010111)} How large can be a subset of {0,1}^n so that for every k whenevr you fix n-k coordinates you do not have the forbidden pair for the remaining k coordinates. (The case that for every k you forbid {(000…),(11,..,1)} is Sperner.

    we can try even to forbid pairs which depend on the identity of the n-k coordinates or on the contant of the n-k coordinates (but not both as a random large density subset shows).

  22. gowers Says:

    870. Progress report.

    No time to do anything more today. This is to say that I’ve written a skeleton proof on the wiki and will try to flesh it out in the near future.

  23. Ryan O'Donnell Says:

    871. Passing between measures.

    I noted that Tim’s skeleton proof on the wiki at one point sketches why, if you have density \delta in the equal-slices measure, you can pass to a largish combinatorial subspace on which you have density \geq \delta - o(1) in the uniform measure.

    I wrote up this proof carefully on the wiki. I tried to make it crisp; hopefully that didn’t introduce major mistakes.

    Nothing unexpected happened; the result is that you can pass to a subspace of dimension at least \epsilon n while losing only O(\epsilon \sqrt{n}) + \exp(-\Omega(\epsilon n)) additively in the density.

    If someone wants to amuse themselves, they can try to evaluate this quantity that arises: (1/k^2) \int_{p_1 + \cdots + p_k = 1} \sqrt{\sum_{i=1}^k 1/p_i - k^2}. It’s not hard to check that this is O(\sqrt{k}) (I think), but is it in fact O(1)?

    • Ryan O'Donnell Says:

      871.1. In the amusement, it should be 1/k out front, not 1/k^2.

    • Gil Kalai Says:

      871.2 2 questions: How general is the passing between measure phenomenon goes. When there is a group action say for the cup set problem, whenever you prove a bound for any measure the same bound apply to the uniform measure by averaging. What is the situation here? It looks ok for moving from equal slice to uniform (when you pass the a larger subspace) but how general measures you can start with.

      I am probably trying to get away with not reading some detailed proofs and missed some crucial postings, but let me still ask: on the very conceptual and general level, what is the main ingredient that allows fourier proofs to work after all the initial examples of various sets with irregular number of lines and no large fourier coefficients?

  24. gowers Says:

    872. Fourier etc.

    Gil, I’m not quite certain to which Fourier proofs you are referring. But if you mean how can a density-increment strategy have a chance of working if it is not the case that the wrong number of lines implies a large Fourier coefficient, then the answer is (i) that one can keep localizing to subspaces and (ii) that one can directly prove correlation with a 12-set (on a suitable subspace) rather than via some kind of expansion. If you are referring to Fourier approaches to Sperner, the answer (depending on which approach you are talking about — both are written up on the wiki) is either that some non-Fourier ingredients are included, or that equal-slices measure is used so that some of the troublesome examples no longer work.

    Basically though, if I had to choose a one-word answer to your question, it would be “localization”.

    • Gil Says:

      872.1 (just trying to catch up) I suspect this “localization” means that at the end the emerging argument has a strong (probably central) ingredient of Szemeredi-like-regularity-lemma; so we have a distinction between Roth-like-density-increment and Szemeredi-like-density-increment.

  25. gowers Says:

    873. Probable collapse.

    I was beginning to think that the argument sketched out on the wiki had the ring of truth about it. But as so often happens the effort of writing it up has thrown up a problem that seems serious. I’ve only just realized it, so I’m not sure that it can’t be got round by some sort of trick, or more elaborate argument, but at the moment it feels to me as though a new idea is still needed.

    Here, in brief, is the difficulty, which I noticed when trying to write up Step 5 in full detail. The basic idea, which I got from Terry’s 837.2, was to have two iterations going on, one in which one tries to get a density increase on a subspace, and the other in which one tries to get a density increase on a 12-set. The problem, which escaped me before, is that the density increase on the 12-set depends on the density of that 12-set. (It comes from the Varnavides density of multidimensional subspaces that you can find in the 12-set.) So there is no guarantee that this inner iteration will terminate. And unfortunately, at least as the argument goes at the moment, the density of the 12-set may drop quite a bit when one does the inner iteration.

    I’ll probably think a bit more about whether something can be done about this, but I think progress is more likely to come either from returning to the Ajtai-Szemerédi template or from a new approach to getting from increased density on a 12-set to increased density on a subspace.

    Incidentally, one might wonder whether Terry’s argument has the same problem, but it doesn’t, and the reason it doesn’t is instructive. He makes use of the principle that if you have a function f defined on a set X and can find a small subset Y where f is less dense than average, then you can remove that subset from X. If Y is small, then you will get only a small density increase on the complement of Y, but you will have removed only a small set from X so things are OK. Now contrast that with a situation where you have a function f defined on a set X_1\times X_2 and it is less dense on a subset Y_1\times Y_2. By averaging, we get a density increase on one of X_1\times(X_2\setminus Y_2), (X_1\setminus Y_1)\times Y_2 or (X_1\setminus Y_1)\times (X_2\setminus Y_2). However, if Y_1 and Y_2 both have density \epsilon and the density increase is in the last of these sets, then we get a density increase that’s proportional to \epsilon^2 but have to drop to a subset of size around (1-2\epsilon) times the size of the original set. And this is too expensive if you do it over and over again. Basically the difference is that (1-1/n)^n converges to a positive limit while (1-1/n)^{n^2} converges to zero.

    • gowers Says:

      873.1 Actually, the situation is slightly worse: I now see that the problem that I discussed in the last paragraph above is precisely the problem that Terry was talking about in 866 and Randall in 864.2.

      A question that occurred to me just now was whether one might be able to deal with low-complexity members of the sigma-algebra of 12-sets. But I now see that that still creates problems, since then we don’t get a multidimensional Sperner-type theorem that just depends on the density of the set (or rather, finding such a result doesn’t seem any easier than DHJ(3) itself).

  26. Terence Tao Says:


    Unfortunately I won’t have much time to devote to this project for the next few days, but one possible way around the problem may be to factor the density increment into two pieces. Suppose that f has a density increment on X_1 \times X_2, then roughly speaking this means that f 1_{X_1} has a density increment on X_2. One could try to hold X_1 fixed and start passing to subspaces to increment the density of f 1_{X_1} until it reaches some saturation point, and then f would have some sort of “relative density increment” on X_1, and then one could pass to subspaces again to clean up X_1. This is terribly vague and there are a large number of issues, including the fact that 1-sets and 2-sets are not independent (so the analogy with Cartesian products is slightly misleading), though perhaps some preliminary regularity lemma type arguments might deal with that problem.

    • gowers Says:

      874.1 Terry, I don’t think I understand your suggestion even vaguely. For instance, if in the \mathbb{Z}_n^2 world we choose random sets V,X_1,X_2\subset\mathbb{Z}_n and take all points (x,y)\in X_1\times X_2 such that x+y\in V, then we have a density increase on X_1\times X_2, but it doesn’t seem to be possible to do much by looking at each variable separately.

  27. gowers Says:


    Now that this has happened, I want to suggest another question, which I had been slightly suppressing. It was always a bit of a worry that we were trying to get from a density increase on a 12-set to a density increase on a subspace when we did not know an analogous argument in {}[n]^2. I had persuaded myself that this might be OK because we were passing to combinatorial subspaces instead of long APs, but in retrospect that was not a very convincing argument. So my question is this. Suppose we allow ourselves to use Szemerédi’s theorem as a black box and we have a subset A\subset[n]^2 of density \delta that correlates with a dense Cartesian product X\times Y. Can we get a density increase on a grid (defined to be a product P\times Q, where P and Q are APs with the same common difference and length tending to infinity)?

    If we use arguments that would also prove a functional version, then it’s fairly easy to prove that what we are really trying to do here is prove that the characteristic function of X\times Y can be approximated by a positive linear combination of characteristic functions of grids. Roughly, the argument goes like this. If you can’t approximate it (in L_1) then the Hahn-Banach theorem gives you a bounded function f that has average at most zero on all grids but average 1 on X\times Y. Taking g to be \delta+\eta f for some small \eta we get a function with a density increase on X\times Y but no density increases on grids. Conversely, if the approximation is possible, then an easy averaging argument gets you from a density increase on X\times Y to a density increase in one of the grids.

    In one dimension, the analogous question is easy to answer. If you have a dense subset X\subset[n] then you can apply Szemerédi’s theorem to find an arithmetic progression P\subset X. If you remove P from X you still have a dense set, so can repeat. Keep going until you’re left with just a small proportion of P and you have then approximated P by a union of arithmetic progressions.

    The problem in two dimensions is that while it is not too hard to find a single grid inside X\times Y (by averaging find s such that X\cap(Y+s) is dense and apply Szemerédi’s theorem), when you remove this grid, you no longer have a Cartesian product. And perhaps it is also worth mentioning that the trivial argument does not work: if you approximately partition both X and Y into APs and take their Cartesian products, you won’t get grids, because the common differences in the two directions will not necessarily be the same.

    It is for this reason that I think our best hope is to go back to Ajtai-Szemerédi, because they avoided this problem.

  28. gowers Says:

    876. Ajtai-Szemerédi

    My natural instinct is to go off and have a hard think about how to adapt the Ajtai-Szemerédi proof, but I want to try to do things the polymath way, which may well mean a few comments that are vague, or go nowhere, or where I can’t quite explain properly what I’m trying to do. However, here goes.

    What we appear to be able to do is choose W such that there are many pairs (U,V) with (U,V,W)\in\mathcal{A}, and with the additional property that for almost all of the U involved in those pairs we have almost the expected density of (V',W') such that (U,V',W')\in\mathcal{A}. Having fixed such a W, we can also find a combinatorial subspace in {}[2]^n consisting entirely of U with both properties.

    What we would like to do at that point is say that there are many combinatorial subspaces in {}[3]^n that project down to this combinatorial subspace in {}[2]^n.

    Temporarily forgetting that last paragraph, let’s rephrase the Ajtai-Szemerédi argument as follows. By the usual averaging argument, we may assume that almost all grids have almost the right density. But if we choose a random grid, then with positive probability we find that it has a positive proportion of horizontal lines that are forbidden, because all the vertical lines point to points in a dense diagonal, and several horizontal lines point to points in the same dense diagonal. In other words, with positive probability, we get a density increment on a 1-set rather than a 12-set.

    I have to go for a little while, but I’m going to think along these lines for a bit.

  29. Randall Says:

    877. More speculation

    I had a look on the wiki at the Ajtai-Szem proof page (I am somewhat embarrassed to admit I had never seen this proof before), and I have a few initial thoughts that may or may not lead in a positive direction.

    First, I don’t see how multidimensional DHJ (2) can be the analog for Szemeredi’s theorem, given the disparity in depth between the results. In particular, it seems to me that whatever the analog for Szem is going to be, it has got to be a multiple recurrence theorem. I can’t think of what that would be in the DHJ setup, though. Now, as it happens I have long been pushing for a reduction of the problem to something else anyway, so thinking along these lines, here is a general proof strategy.

    Step 1. Reduce DHJ (3) to IP Roth using the “density increment on a subspace for a set correlated with a 12-set” strategy I tried to outline in a speculative fashion in 864. (This step is indeed speculative; I have no idea whether or how it can work.)

    Step 2. Reduce IP Roth to “IP* Szemeredi” via the Ajtai-Szemeredi argument. (I have not thought about this at all. It might be really easy, though, if the analogs are the right ones, given how easy the Ajtai-Szem argument is.)

    Step 3. Come up with a combinatorial proof of IP* Szemeredi. (No real idea how this might go….)

    IP* Szemeredi: Let \delta>0, k\in {\bf N}. There is an n such that if A\subset [n]^{[n]} with |A|>\delta n^n then A contains a configuration of the form \{f+i\cdot 1_\alpha: 0\leq i <k\} for some \alpha\subset [n] and some f:[n]\rightarrow [n].

    • gowers Says:

      877.1 Randall, that first point of yours is one I’ve mentioned a couple of times, but I haven’t really properly justified my view that it shouldn’t be a problem. So I’m going to think about it now. I may well end up deciding that it is a problem after all.

    • Randall Says:


      Well there is certainly no reason not to uses subspaces in that way; in fact it’s part of the Furstenberg-Katznelson argument (cf. first three lines of p. 7 of my notes). Something different seems to be going on in the Ajtai-Szemeredi argument, though, where Szemeredi’s theorem seems to be doing virtually all of the work.

  30. gowers Says:

    878. Ajtai-Szemerédi

    Hmm, what I said above is actually not the Ajtai-Szemerédi argument because they don’t choose a random grid like that. Instead, they choose a single set of vertical lines and a random grid that runs across in the other direction.

    So let’s think more like that. I’ll start by ignoring the requirement that the vertical lines have to have roughly the right density. Moving over to the {}[3]^n world, I find my “dense diagonal” — that is, a W such that many (U,V,W) belong to \mathcal{A}. Now I want to look for a combinatorial subspace with a good property of some kind. What should that property be?

    Let \mathcal{U} be the set of all U such that (U,[n]\setminus(U\cup W),W)\in\mathcal{A} and let \mathcal{V} be the set of all V such that ([n]\setminus(V\cup W),V,W)\in\mathcal{A}. Then a good property would be if all the points in the combinatorial subspace had their 1-sets in \mathcal{U}. Then we would know that no point with its 2-set in \mathcal{V} could belong to \mathcal{A}, or something along those lines, which would give us correlation with a 2-set. That would give us a density increase as long as we also knew that the density of \mathcal{A} in the combinatorial subspace was almost maximal.

    I’m finding this boringly hard to do on screen. I’ll allow myself a little bit of offline time to try to clarify what I’m saying.

  31. gowers Says:

    879. Correlation with 1-sets

    Nothing conclusive to report, so instead I want to revisit the question of showing that correlation with a 1-set implies a density increase on a subspace. In particular, I want to get a feel for whether the double iteration is necessary.

    It occurs to me that a Hahn-Banach argument ought to prove that the question is equivalent to showing that the characteristic function of a 1-set can be approximated in L_1 by a positive linear combination of characteristic functions of subspaces. At some point I’ll check that, and maybe even put it on the wiki, but for now I’ll assume it. So how should we write an arbitrary dense 1-set as a positive combination of subspaces?

    I think I see a way. No time to be fully detailed, but I’m aiming for something that’s equivalent to what we did above. The first step would be to choose small wildcard sets (Z_1,Z_2,\dots,Z_m) such that a positive proportion (depending on m) of the combinatorial subspaces with those wildcard sets are subsets of the given 1-set.

    All these combinatorial subspaces are disjoint, so we can safely remove them. Let Z=Z_1\cup\dots\cup Z_m and partition the rest of {}[3]^n according to how points restrict to Z.

    For every sequence z\in[3]^Z, let \mathcal{U}(z) be the set of all sequences y\in[3]^{[n]\setminus Z} such that (y,z) belongs to the 1-set \mathcal{U}. Also, let \mathcal{U}' be the set of all y\in[3]^{[n]\setminus Z} such that (y,z)\in\mathcal{U} for every z that is constant on all the Z_i. The precise partition we shall take is this. For each z that is not constant on all the Z_i we take the set \mathcal{U}(z). And for each z that is constant on all Z_i we take the set \mathcal{U}(z)\setminus\mathcal{U}'(z).

    This has given us a 1-set inside each combinatorial subspace obtained by fixing the coordinates in Z. And the average density of those 1-sets is down by a small factor (depending on m) from the density of the original 1-set. So we can iterate the procedure.

    I think that’s a nice clean way of presenting the Randall/Terry result about 1-sets. Of course, it’s still using something similar to the double iteration.

  32. jozsef Says:

    880. Shelah’s v.s. DHJ

    It seems to be a serious difficulty to follow the double iteration in a density incremental argument. It might be helpful to check what would be a Shelah-like density proof look like for k=3. The first step is actually the same; prove that for any 2-colouring of 3^{[n]} there are (many) “flip-flop” subspaces. A d-dimensional subspace of 3^{[n]} can be represented by d classes of wildcards, the elements from same class always have the same characters. Two elements in the subspace are neighbours if they differ only in one wildcard class where one is 2 and the other is 1. The subspace is flip-flop if any pair of neighbours have the same colour. The second step would be to show that there is a monochromatic line in 2^{[d]}. In our case it would mean to show that there is a flip-flop d-subspace with at least c_d3^d elements where we allow c_d to go to 0 slowly. There are two advantages; first that we allow c_d to go to 0, second that the number of flip-flop subspaces is independent of the original density as it follows from the two colouring of 3^{[n]}.
    This looks quite promising to me, but let me check first what did I write here…
    The first part of Shelah’s proof shows that

  33. jozsef Says:

    880.2 For DHJ, we say that a subspace is flip-flop if there are no neighbours that one is in our set and the other isn’t. (One might think that we can’t gain anything from this if most of the pairs are not from the set, however we will never make any statistics on the number of neighbour pairs inside or outside of our set in a subspace.) To prove that there are many flip-flop subspaces we can follow the original colouring proof; Colour every element of our dense subset by red and the points in the complement by blue. I will try to find a link to the proof or I will write it up myself. Then, the Varnavides type argument gives many flip-flop subspaces. The number of d-dimensional flip-flop subspaces is independent of the density of our pointset, but it certainly depends on the dimension d. The second observation is that every flip-flop subspace is sparse or there is a line. I think I should write up this part.

    • jozsef Says:

      880.2 For the proof of the existence of flip-flop subspaces I have find two books on Google, “Ramsey Theory” by Ronald L. Graham, Bruce L. Rothschild, Joel H. Spencer, and Jukna’s “Extremal Combinatorics”. There is a nice paper A. Nilli, “Shelah’s proof of the Hales–Jewett theorem” , Mathematics of Ramsey theory (Algorithms Combin.) , 5 , Springer (1990) pp. 150–151, but I was unable to find it online. I think that the original name was “fliptop” for a colouring of a subspace where top neighbours received the same colour, but the top isn’t special, the bottom pair would work as well, so I’ve changed it to flip-flop as it’s more appropriate (and funny)

    • jozsef Says:

      All proofs I know for the existence of flip-flop subspaces are recursive. (see the references above) For a d-dimensional flip-flop subspace one needs the recursion r_{1+1}=r^2_i3^{r_i} with r_1=3. n should be at least r_1+r_2+...+r_d to guarantee a d-dimensional flip-flop subspace in 3^{[n]}. This was also the type of proof I knew for the Sperner subspace theorem, but checking Tim’s write up in the Wiki, I realized that his proof is somewhat different.

    • jozsef Says:

      Well, Tim’s proof in the Wiki isn’t significantly different from the “traditional” proof but it’s elegantly written. Note that the recursive proofs give very uneven subspaces; the sizes of wildecards are increasing recursively as well. It isn’t a problem when one considers HJ where every point has its colour, however this property makes it difficult the use of such subspaces for density problems.

    • jozsef Says:

      After the second reading of Tim’s proof I see now that one can choose p, q, and \delta that one can get “balanced” subspaces.

  34. gowers Says:

    881. Correlation with 12-sets

    I’ve got to go to bed, but an idea has occurred to me. Maybe I’ll see by the morning that it’s nonsense. But a 12-set is just an intersection of a 1-set with a 2-set. So maybe one can use 879 to partition (almost all of) the 1-set into subspaces, and then use 879 again to partition the intersection of the 2-set with each of those subspaces into further subspaces, thereby ending up with a partition of the 12-set into subspaces.

    For Cartesian products in grids it would work like this. Given X\times Y, you first partition X\times[n] into grids, which is easy. And then inside each of those grids you partition the intersection of that grid with {}[n]\times Y into further grids.

    If that second argument is correct then (i) I don’t know why I didn’t spot it before and (ii) it suggests that the first one has a good chance of being correct. And if the first one is correct, it seems to do DHJ(3).

    Off to bed while this still feels good …

    • Terence Tao Says:


      This looks like it would work in the {}[n]^2 world, and give an answer to your 875 (and would also formalise my 874, for that matter). The one thing to bear in mind is that Szemeredi allows one to take the spacing of the long arithmetic progressions in X or in Y to be of size O(1) rather than O(n), by working locally. (Meanwhile, the length of the progressions is something like {}\log \log \log \log \log \log \log n.) That way, you don’t lose too much when taking the GCD of two different spacings.

      Of course, in the Hales-Jewett world, we don’t have GCD, but the trick of rendering a few coordinates “bad” and working with local 1-sets, etc. rather than global ones may help. (We may eventually have to also break out the Ramsey theory to make the local statistics match the global statistics; this is related to Furstenberg-Katznelson’s “strong stationarity” which, after talking to Tim Austin a bit, I suspect we may have to exploit to finish off this problem.)

    • gowers Says:

      881.3 Terry, I think I don’t need to worry about GCDs of spacings. In the corners world I just partition a 1-set into grids, and then the restriction of the 2-set to each grid is still a 2-set, so I partition its restriction to each grid into further grids. What’s more, this can be seen as a sort of dualized version of what Ajtai and Szemerédi do themselves. I’ve woken up still feeling very good about this, and plan to get wikifying straight away.

  35. gowers Says:

    882. Progress report

    I am in the middle of wikifying the latest DHJ(3) attempt. This time I would actually be prepared to put money on the argument working (unlike last time, when there were too many slightly complicated bits that I felt I didn’t fully understand). So far, I’ve written up a new proof of the corners theorem to serve as a template for the new DHJ(3) argument. The new proof of the corners theorem is not totally new: it is more like a reorganization of the ideas that go into the Ajtai-Szemerédi argument. Nevertheless, it simplifies things in a way that is crucial for the DHJ(3) proof.

    • Randall Says:


      I have looked at your new proof of corners and it really does make less mysterious what Szemeredi is doing. When I initially read the original proof of Ajtai/Szemeredi yesterday, it struck me that Szemeredi’s theorem was being used not once but twice…what was confusing was that it was used once on the diagonal, then again on one of the coordiates. The use on the diagonal gave an impression that DHJ (2) would go proxy for it in the DHJ (3) case. This struck me (everyone else too, I gather) as odd, given the disparity in depth, etc…. Something had to explain the fact that one wasn’t doing something about “compactness relative to the diagonal” or some such, and Szemeredi was the only culprit on offer. Your proof uses Szemeredi’s theorem twice also, once on each of the coordinates; indeed, it now appears that what Szemeredi’s theorem is actually doing in that proof is going proxy for a relative compactness over the diagonal notion. And, in the DHJ (3) case, what you have in mind to fill in here is the idea of partitioning a dense 1 set into dense subspaces, if I understand correctly. (So it’s that which corresponds to the use of Szemeredi after all, not DHJ (2)). And this isn’t all that surprising anymore, given that the proof of that seemed to involve (at least at the very superficial level I understand it at) a look at the decomposition over the diagonal.

      Aesthetically, all of this seems dead on, so I will not take your bet (and will indeed be quite depressed if something else is amiss).

  36. jozsef Says:

    883. Strong Sperner

    It is very possible that by now Tim is just polishing the write up of a combinatorial DHJ, but still let me go back to the unevenness of multidimensional Sperner or flip-flop subspaces which one can get by recursive arguments. It would be better to have a control on the arithmetic structure of such subspaces. In his Wiki article Tim describes a strong version of multidimensional Sperner. Unfortunately the argument there uses DHJ what we don’t want to use. On the other hand we might get a similar result by using multidimensional Szemeredi. Given a dense subset of 2^{[n]} denoted by A. Take a random permutation of [n]. An element of A is “d-nice” if it consists of d intervals, each has length n/2d\pm \sqrt{n}, and each interval begins at position id for some 0\leq i\leq n/d. (Suppose that d divides n) Any interval like this can be represented as a point in a d-dimensional [\sqrt{n}]^d cube. If it’s dense then multidimensional Szemeredi gives us a strong Sperner.

    • gowers Says:

      Metacomment: I wouldn’t say I’d got to the polishing stage exactly, but if you want to see what’s going on it’s here. At the moment, I simply don’t see anywhere where it can go wrong, but I’ve had that feeling about wrong proofs in the past, so I won’t feel entirely happy until I’ve got a bit further.

      However, the main point of this comment is to say that I strongly support your idea of looking at flip-flop subspaces. What I would really like is for the polymath collaboration to produce a polyproof. If the Ajtai-Szemerédi approach works, then that’s just the start: I’d like a triangle-removal approach, a Shkredov approach, and an ergodic approach, and if a Shelah-influenced approach is potentially feasible too then I’m very interested.

  37. Ryan O'Donnell Says:

    884. Wikification.

    Here is a short writeup of the multidimensional Sperner stuff roughly following Terry.860, as used in Tim.879. I will wikify it soon. In fact, this was more or less already on the wiki, in Tim’s latest additions. Here are the parameters:

    Let A \subseteq [2]^n have density \delta and let Y_1, \dots, Y_d be arbitrary disjoint subsets of [n] of cardinality r \geq (1/\delta)^{O(2^d)}.

    Choose a random nondegenerate d-dimensional subspace as follows. For each i, choose a random nondegenerate combinatorial line in [2]^{Y_i}, uniformly from the 3^r - 1 possibilities. (Actually, you can choose the line from virtually any reasonable distribution.) Form the final subspace by taking the Cartesian product of these lines, and then filling in the coordinates outside the Y_i‘s uniformly at random.

    Then this entire subspace is contained in A with probability at least

    \exp(-\exp(O(1) \ln(1/\delta) 2^d)).

    The O(1) can more or less be 2.

    • jozsef Says:

      884.1. It is a nice write up! There is a gap between the upper and lower bounds in Gunderson-Rodl-Sidorenko. As I remember, the density is between n^{-1/2^d} and n^{-d/2^d}. Ryan, do you think that you can close the gap?

    • Ryan O'Donnell Says:

      884.2. Done; it’s here. I’d change Tim’s writeup to point to it, but he seems to be editing it currently 🙂

    • gowers Says:

      884.3 I’ve added a link. (I’ve kept what I wrote too because I quite like having different styles of explanation on the wiki, even of the same result.)

    • Ryan O'Donnell Says:

      884.4. Hi Joszef — not sure about closing the gap… Actually, as written it doesn’t quite even match [GDR]: it requires density approximately n^{-1/2^{d+1}} rather than their n^{-1/2^d}.

  38. gowers Says:

    885 DHJ(3)

    I’m now pretty confident that the modified-Ajtai-Szemerédi-based approach to DHJ(3) is in the bag. I have a complete informal write-up on this page of the wiki, though some of the ingredients (such as getting a density increment on a 12-set in a subspace) are on other pages. A certain amount of work will be needed to get it into an acceptable form for a journal article, but not, I hope, as much as all that.

    If everyone else shares my belief that it works, then I’d be more interested in pressing on and doing DHJ(k), or at least thinking about its feasibility, than in rushing to write it up with all the numbers put in. Also, it seems to me that the statement about line-free sets correlating locally with 12-sets should be provable using localization rather than equal-slices measure, and that would bring it into line with the rest of the proof. So that’s something else I think we should try to do before writing anything up properly.

    In my next comment I’m going to speculate a little about DHJ(4).

    • Ryan O'Donnell Says:

      885.2. Hi Tim. I agree it’s looking pretty solid. Thanks for all the wikification! I plan to check it over tonight.

    • gowers Says:

      885.3 That’s great — I’ve got to go to bed pretty soon, but I’ll look forward to checking in the morning to see whether it still looks solid to you by then. Thanks for your wikification too!

  39. gowers Says:

    886 DHJ(4)

    How might the argument generalize to DHJ(4)? Probably a good way to think about this is to try to deduce the 3D corners result from the full 2D Szemerédi theorem. This wouldn’t be much use as a proof of the corners theorem because nobody knows a proof of the full 2D Szemerédi theorem that does not also give 3D corners. (Probably one could falsify that last sentence in silly ways, but I think it’s basically true.) However, there is reason to hope that (i) the multidimensional DHJ(3) theorem can be obtained from the one-dimensional theorem by some kind of trickery and (ii) it can be used as an ingredient for proving DHJ(4) in the way that multidimensional Sperner was used for proving DHJ(3). Eventually, of course, I want to come back to (i) and (ii) but for now I’ll stick to the easier world of 3D corners.

    For 2D corners, the first step is to find a dense diagonal. The nice thing about a dense diagonal is that it gives rise to lots of forbidden points: indeed, if (x,y’) and (x’,y) both belong to the same diagonal, with x<x', then (x,y) is not allowed in the set.

    Even better, the set of all forbidden points has a nice Cartesian-product structure. (In fact, it’s the points of a Cartesian product that lie below the diagonal, but that will contain a large Cartesian product.)

    The analogue of a diagonal for 3D corners is a plane of constant x+y+z. How can a dense diagonal forbid other points? Answer: if you have a suitably aligned equilateral triangle (x+d,y,z),(x,y+d,z),(x,y,z+d) then you forbid the point (x,y,z). Now the 2D corners theorem tells us that a dense diagonal plane contains many such equilateral triangles, so we end up forbidding a good lot of points. What is rather less clear is what kind of structure that set of points has. In fact, it’s so unclear that I think I’d better stop this comment because I do not immediately have anything useful to say about it.

    Actually, perhaps I do. What would be very nice would be to get a density increase on a dense (12,23,13)-set. By that I mean a dense set B of the form \{(x,y,z):(x,y)\in R,(y,z)\in S,(x,z)\in T\}. That would be nice because it is the natural analogue of a 12-set (natural, that is, for anyone who has thought about hypergraph regularity and that kind of thing).

    If the world is a friendly place, it will turn out that the set of points that form the bottom vertex of a 3D corner with the other three vertices in the dense diagonal plane is a dense (12,23,13)-set. Is it? Yes of course it is! It consists of all points (x,y,z) such that three conditions hold. The first is that if you go in the z direction until you hit the diagonal plane, you hit it at a point in A. But that condition depends on (x,y) only. The other two conditions depend on (y,z) and (x,z).

    OK, this is looking good. So now we’ve got our dense (12,23,13)-set that’s disjoint from A. By averaging we find one with which A correlates. So now all we have to do is partition a dense (12)-set (where this does not mean the same as a 12-set, but rather it means a set that depends just on (x,y)) into large 3D grids. And that we can easily do using 2D-Szemerédi! If the (12)-set is all (x,y,z) such that (x,y)\in B, then by 2D Szemerédi we can partition almost all of B into large grids with fairly small width. For each such grid G we can then partition G\times[n] into large 3D grids, and we’re done. The rest of the argument is almost exactly as before.

    Obviously, this technique is going to work to show that Szemerédi in d dimensions implies corners in d+1 dimensions.

    So it looks very promising for DHJ(k). The first target, it seems to me, is to get a multidimensional version of DHJ(3). Somehow the whole thing feels as though it is not going to be too hard …

    • gowers Says:

      886.1 No time to write it now, but I see how to deduce multidimensional DHJ(3) from DHJ(3). And indeed it is not hard.

    • Randall Says:

      886.2 General k

      I had a look at the FK proof for general k and found some very interesting parallels between their proof and what Tim has been doing in the past 48 hours. (And what he proposes above.) Quite striking, really, right down to the trick of cutting things up in one dimension first, then the other. (I must have forgotten this trick, as I didn’t consider using it for k=3.) At any rate, the general outline suggested for k=4 (and beyond) looks terribly sound.

  40. Ryan O'Donnell Says:

    887. Wikification.

    I started going through the proof Tim has sketched from the beginning, trying to fill in a few small details.

    I thought briefly about removing the use of equal-slices density in the first part of the argument, wherein it is shown that line-free sets correlate with 12-sets. It wasn’t immediately clear to me how to do this. Therefore I decided to leave it alone, and work out the “technicality” of passing from relative density under equal-slices to relative density under uniform, discussed in the last paragraph of the proof sketch.

    Specifically, this requires the details in the “more details” section of the currently abortive original density-increment plan.

    Therefore, I worked for a bit to clean these up. As usual, no surprises; everything is fine. Indeed, one can do it passing to subspaces of \Omega(n) dimension. I added the last 1% to Tim’s sketch and put it in the passing between measures wiki article. The only minor innovation is noting that you can write equal-slices exactly as a mixture of uniform-distributions-on-subspaces.

  41. Ryan O'Donnell Says:

    888. More wikification: equal-slices.

    I added the following observation (which I assume was clear to most everyone already) to the wiki entry on equal-slices measure, which helped me understand the “Hang on” part of the proof that line-free sets correlate with 12-sets.

    Another equivalent way to draw from the equal-slices distribution is as follows. Start with a string of n “dots” \bullet. Next, place a “bar” \mid randomly in one of the n+1 “slots” between (and to the left and right of) the dots. Next, place a second bar randomly in one of the n+2 slots formed by the string of n dots and one bar. (At this point we have determined the “slice”.) Next, fill in all the dots to the left of the leftmost bar with 1‘s; fill in all the dots between the two bars with 3‘s (not 2‘s!); and, fill in all the dots to the right of the rightmost bar with 2‘s. Delete the bars. Finally, randomly permute the resulting string of 1‘s, 2‘s, and 3‘s.

    With this viewpoint, it may be easier to understand the joint distribution of the 1-set and the 2-set of a string drawn from equal-slices. Specifically, it is one that is useful for proving density-Sperner’s theorem.

    Fact: Let z be a string drawn from the equal-slices distribution, in the manner described above. Let x \in [2]^n be the string that would have been formed had we filled in all the dots to the left of the first bar with 1‘s and all the dots to its right with 2‘s. Similarly, let y \in [2]^n be the string that would have been formed had we filled in all the dots to the left of the second bar with 1‘s and all the dots to its right with 2‘s. Then the following should be easy to verify:

    (i) x and y are both distributed according to the equal-slices distribution on [2]^n (but not independently);

    (ii) x, y, z form a combinatorial line in [3]^n; in particular, x and y are “comparable” in [2]^n, i.e., either x \leq y or x \geq y;

    (iii) \Pr[\text{line is degenerate}] = \Pr[x = y] = 2/(n+2).

    From these facts we can derive the density version of Sperner’s Theorem:

    Theorem: Suppose A \subseteq [2]^n has equal-slices density \delta. Then according to the above distribution on (x,y) \in [2]^n \times [2]^n, we get a nondegenerate combinatorial line in A with probability at least \delta^2 - \frac{2}{n+2}.

  42. Ryan O'Donnell Says:

    889. Last wikification of the night.

    Okay, using the above mentality I was able to rewrite in my own words Tim’s proof that line-free sets correlate with 12-sets. I added these words to the wiki, modulo the passing from uniform density to equal-slices density (which is still in that article and also partially here). It’s pretty late at night for me, so I hope I got it right.

  43. gowers Says:

    890. Multidimensional DHJ(3)

    My main teaching days are Mondays and Tuesdays this term, and today and tomorrow are the last two such days of term. So I’ll be fairly busy, but I hope I’ll still have a bit of time for blogging and wikification. Here I want, as a pre-wikification exercise, to sketch a proof that DHJ(k) implies multidimensional DHJ(k). I’ve woken up with the feeling that DHJ(k) is going to go through almost as easily as DHJ(3). If that is the case, it will be unexpected for two reasons. First, it will give a proof of Szemerédi’s theorem that has a strong claim to be the simplest known. (The only rival I can think of is a particularly clean approach via infinitary hypergraphs, due to Elek and Szegedy, but I may be wrong.) Secondly, it would be the first proof of Szemerédi’s theorem for which “the general case is the case k=3”. By that I mean that in all other proofs you have to go at least as far as k=4 before it’s obvious how to generalize, and in some you have to go to k=5. (Perhaps a true understanding of the problem would require a proof that generalizes straightforwardly from the k=2 case …)

    Back to multidimensional DHJ(k). Here’s what I think works. Let \mathcal{A} be a density-\delta subset of {}[k]^n and let M be large enough so that every subset of {}[k]^M of density at least \theta contains a combinatorial line. Now split {}[k]^n up into {}[k]^M\times[k]^{n-M}. For a proportion at least \delta/2 of the points y in {}[k]^{n-M} the set of x\in[k]^M such that (x,y)\in\mathcal{A} has density at least \delta/2. Therefore, by DHJ(k) (with \theta=\delta/2) we have a combinatorial line. Since there are fewer than (k+1)^M to choose from, by the pigeonhole principle we can find a combinatorial line L\in[k]^M and a set \mathcal{A}_1 of density \delta/2(k+1)^M in {}[k]^{n-M} such that (x,y)\in\mathcal{A} whenever x\in L and y\in\mathcal{A}_1. And now by induction we can find an (m-1)-dimensional subspace in \mathcal{A}_1 and we’re done.

    This gives a truly horrible bound, and should mean that if DHJ(k) goes through as I expect (and Randall also expects, I’m glad to see from 886.2), the bound that comes out at the end will probably be of Ackermann type, so it will be comparable to the bounds that come out of the hypergraph approach. (A small challenge that I know some people out there would enjoy is to try to see how this approach to Szemerédi fits in with the general philosophy that all the different proofs are at some deep level manifestations of closely related ideas. There are distinct echoes of hypergraphs in this proof, and yet it is far easier than hypergraph regularity and counting — what is going on? Possibly that we are “cheating” by continually passing to subspaces, but why can’t we do that with hypergraphs? Or can we? Perhaps there’s a way of passing to subgraphs without throwing away too many degenerate simplices. Hmm … I quite like that but no time to pursue it just at the moment.)

  44. Ryan O'Donnell Says:

    891. Corrected/expanded a bit the “passing between measures” article on the wiki.

  45. Ryan O'Donnell Says:

    892. Wikification.

    I finished the 1% fleshing out required in the article proving that line-free sets correlate with 12-sets, including all the passing back and forth between uniform and equal-slices measures. I think the only bit remaining undone here is instantiating all the parameters.

  46. jozsef Says:


    Meta comment: Something happened with the wiki. It seems that it has been hacked. Be careful with the links there.

    • jozsef Says:

      Just seems to be spammers changing the page. I reversed it.

    • jozsef Says:

      It wasn’t enough, the page is wrong again. I’m not sure what to do. I changed it back again, but I don’t think it will stay like this for long.

    • Terence Tao Says:

      I blocked the offending IP. If that fails, the next step would be to protect the main page by limiting edits to signed in users, I guess.

  47. Terence Tao Says:


    I’ve been busy, so I haven’t been able to stop by much recently, but things look pretty good at this point. I agree that the DHJ(3) Ajtai-Szemeredi sketch looks pretty solid. (An amusing side note: when I had Ajtai-Szemeredi described to me, I thought that they were already doing what we were doing now, i.e. getting correlation with an unstructured Cartesian product and then partitioning that product into grids. So I was a little confused when Tim was insisting that what we were doing was not quite Ajtai-Szemeredi… but now I see the subtle difference between the two approaches.)

    It looks like Tim Austin has come up with an alternate proof that is also based on correlation with 12-sets, etc. but is based on triangle-removal type strategies rather than density-increment strategies. It also requires a preliminary use of Graham-Rothschild to regularise a large number of statistics so as to make them stable under freezing of coordinates, and so is likely to give poorer bounds. But it is closer in spirit to the original intent of Polymath1. I believe Tim will come on here himself to report on this soon (he’s working in an ergodic theory setting), and I will focus on trying to finitise it. (It’s likely to be cleaner than the finitisation of Furstenberg-Katznelson, because one does not have to deal with relative almost periodicity.)

    • gowers Says:

      893.1 Re your amusing aside, I have had a very similar experience, which can sort of be deduced from my initial blog comment on the Ajtai-Szemerédi proof. (It can be found at the end of the article on the wiki.) At that stage I only half remembered their proof, and likewise assumed that they must have started with the dense diagonal, got a global Cartesian product disjoint from A, and deduced a density increment on a grid.

  48. Tim Austin Says:

    894. I seem to have come to this at a handy moment, following Terry’s post

    Having been following progress here (albeit only in fits and starts) and talking with Terry about the ideas that have come out, it struck me late last week that in the infinitary world of stochastic processes that Furstenberg and Katznelson move to, the approach raised here for obtaining obstructions to uniformity that are built from ij-sets can actually be coupled to a lot of machinery that’s already known from other things to give a new infinitary proof of their multiple recurrence result, without anything else being required. In particular, it uses an infinitary analog of `energy increment’ to improve the structure of a stochastic process, and then an appeal to an `infinitary hypergraph removal lemma’ originally motived by some work of Terry on infinite random hypergraphs, both which I recently used to play a similar game around the multidimensional Szemeredi Theorem (arXiv 0808.2267, in case it’s of interest).

    In fact, it turned out that this could be written up completely in just a couple of days by judiciously cutting, pasting and re-notating writeups of other things, so this is now done and on the arXiv: once it becomes publicly visible it’ll be at 0903.1633. I feel I should possibly offer my assurances that I wouldn’t have rushed from a moment of realization to completing a preprint if it really hadn’t been so very quick and mechanical from that point on, without requiring any input of new ideas from me.

    For what it’s worth, I’ve thought only briefly about finitizing this approach and Terry has already said most of what I could say. As it stands it will require a preliminary heavy appeal to Graham-Rothschild (Carlson-Simpson, in the infinitary world) and then proceeding in close analogy with hypergraph removal strategies. So it is rather removed from the density-increment approach that I think is now mainly being pursued here, and would look set to give much worse bounds unless some other new idea can remove the reliance on Graham-Rothschild.

    • gowers Says:

      894.1 Tim, this is great news and very much in the spirit of polymath leading to multiple proofs and all-round improved understanding.

  49. Randall Says:

    895. Book proof of Szemeredi? Or “This is a theorem Harry Furstenberg stole from Szemeredi…we’re stealin’ it back.”

    Regarding what Tim (Gowers) said about an easy proof of Szemeredi materializing, as well as what Terry said about avoiding relative almost periodicity (which seems to be exactly what makes this proof easy as well), it seems natural whether one of us should think about writing up carefully a “book proof of Szemeredi’s theorem”. For starters, it seems to me that this might entail proving Jozsef’s comment no. 2 from a multi-dim Sperner type of hypothesis, then pushing an induction to k-dimensional corners ala Tim’s 886. I am hoping it could be easier in the details than DHJk; that should be clear by the end of the first step, though.

    • Terence Tao Says:

      It may in fact be that, paradoxically, the book proof of Szemeredi may ultimately pass through DHJ. For instance, observe that the original Ajtai-Szemeredi proof of corners had to pass through Szemeredi, whereas by lifting up from [n] to cubes, we can substitute (multi-dimensional) DHJ(2) in place of Szemeredi.

      The hypergraph regularity/removal proof of (multidimensional) Szemeredi is not too bad, actually, despite its reputation. The ergodic version of it, which Tim Austin wrote up in the arXiv a few months back, is perhaps a touch simpler than the Furstenberg-Katznelson proof based on repeated extensions by relatively almost periodic functions, being instead based on extending the entire system up to a more “pleasant” system enjoying a number of useful relative independence properties.

    • gowers Says:

      I’ve been thinking about this too, and it may not be as paradoxical as all that — just a bit unexpected at first. For instance, it is generally accepted that the van der Waerden theorem is “really” the Hales-Jewett theorem (at least if you prove it combinatorially), and that starting with a subset of [n] distracts from what is actually going on. And something like that seems to have been the case here too: for a while the fact that Ajtai and Szemerédi used Szemerédi’s theorem was a distraction, in that it made it seem as though their approach reached a dead end at the corners theorem when in fact the structure they should have been using was a cube. So Jozsef’s comment 2 was spot on.

      Randall, there may be room for disagreement over your slogan, but I can’t help liking it …

  50. gowers Says:

    Metacomment: I’m doing my best not to write newly numbered comments here, since this thread is about to run out (at 899) and then we’ll reach an important milestone — comment number 1000. I feel that we deserve a decent-length post before that one. (My main activity at the moment is over at the wiki. I’m currently working on generalizing the local correlation with 12-sets from {}[3]^n to {}[k]^n.)

  51. Terence Tao Says:

    896. Austin’s proof

    I will probably focus the (limited) time I have available for this thread on trying to explicate Austin’s proof in finitary language. I know you guys don’t actually have access to it yet, but let me try to informally describe some of the details. Take all this cum grano solis; I have not yet fully digested the argument and some of the details may be slightly or perhaps even massively incorrect.

    Let me take [3] = {0,1,2} (rather than {1,2,3}) for sake of arbitrarily fixing the conventions (we’ve been a bit inconsistent on this). Let’s define a trilinear form \Lambda( f, g, h) on functions f, g, h: [3]^n \to {\Bbb R} by the formula

    \Lambda(f,g,h) = {\Bbb E}_{\ell} f(\ell(0)) g(\ell(1)) h(\ell(2))

    where \ell: [3] \to [3]^n varies along combinatorial lines with respect to some measure that I will intentionally leave vague. DHJ(3) is equivalent to the following “triangle-removalish” type statement:

    DHJ(3)’: Let f, g, h: [3]^n \to [0,1] be such that \Lambda(f, g, h) = o(1). Then {\Bbb E}_{x \in [3]^n} f(x) g(x) h(x) = o(1).

    Roughly speaking, Austin’s strategy is to “regularise” the situation so that 12-sets, 01-sets are “relatively independent” over a common algebra of 1-sets, and similarly for the 12-sets and 02-sets, etc., with the 0-sets, 1-sets, 2-sets themselves being relatively independent over \emptyset-sets (which, for us, I think means “unions of large subspaces”, and which can be ignored by passing to a large subspace). I don’t understand this part well yet, but it is analogous to the Szemeredi regularity lemma. There is also a preliminary reduction to “strong stationarity” which means, roughly, that the statistics of various 01-sets, etc. (e.g. the density of an overlap between a relevant 01-set and a relevant 02-set) doesn’t change of we freeze a bounded number of coordinates. This reduction is obtained via Graham-Rothschild and is going to be hideously expensive as regards quantitative bounds, but never mind that for now.

    1. Once one has this regularisation, this makes DHJ(3)’ is easy when f is an (indicator of a) (basic) 12-set, g is a basic 02-set, and h is a basic 01-set, much as it is easy to find corners connecting three sets A, B, C in [n]^2 when those sets are Cartesian products in the right fashion. Indeed things seem to collapse to DHJ(2.5) in this case. (This is analogous to the triangle removal lemma for three bipartite graphs when each of the three graphs is a complete bipartite graphs between one cell in each vertex set.)

    2. Next, this implies DHJ(3)’ when f,g,h are non-basic 12-sets, 02-sets, and 01-sets, i.e. finite unions of basic sets (or more precisely finite linear combinations of indicators of basic sets), but where the lower bound depends on the complexity of the partition into basic sets. (This is analogous to the triangle removal lemma for unions of complete bipartite graphs between cells.)

    3. Next, this implies DHJ(3)’ when f (and similarly g, h) are “Borel 12-sets” (to continue the topological analogy much as Borel sets can be approximated by open sets), which means that given any \varepsilon > 0, one can approximate f to within \varepsilon by a non-basic 12-set of bounded complexity. This is because pigeonhole ensures that there are a lot of non-basic 12-sets which are 99% occupied by f, and one will be able to get this case from applying Case 2 to these “rich” sets, and using some relative independence properties. (This is analogous to triangle removal for unions of 99%-complete bipartite graphs between cells.)

    4. Next, we obtain DHJ(3)’ when f (and similarly g,h) is the sum of a Borel 12-set and something which is highly orthogonal to all basic 12-sets, including the very small basic 12-sets coming from the cells of an approximation to the Borel 12-set. This is basically because the guy which is highly orthogonal to all basic 12-sets is so uniform as to have essentially no contribution to \Lambda(f,g,h). (This is analogous to triangle removal for a triplet of bipartite graphs which have been regularised.)

    5. Finally, we have a regularity lemma that tells us that arbitrary f,g,h decompose in this fashion (possibly after localising to a large subspace). This is a “soft” energy increment argument, analogous to that in the regularity lemma. One has to keep freezing coordinates while performing this increment argument, so it is important that one has the strong stationarity property first before one sets up the regularity argument.

    Maybe I’ll try to write a more coherent version of the above on the wiki at some point.

  52. gowers Says:

    897. Progress report

    I’ve reached the first point where it isn’t almost trivial to generalize the argument we have for DHJ(3) to an argument for DHJ(k). I think it’s going to be a sort of middling difficulty — that is, you can’t do it in five minutes but you know you’ll get there in at most a few hours. (Actually, I think it may be easier than that, but I haven’t yet tried.) Over on the wiki, I have written up part of an argument that generalizes to {}[k]^n the fact that a dense line-free set (in equal-slices measure) correlates with a dense 12-set.

    The problem in the {}[k]^n case is to prove that the lower-complexity set you get is dense. The proof for {}[3]^n uses a quantitative Sperner theorem (namely that you get a positive proportion of all possible combinatorial lines if you start with a dense set — all with respect to equal-slices measure). The proof for {}[k]^n needs a similar statement about combinatorial lines in {}[k-1]^n. It should be obtainable from DHJ(k-1) by a suitable averaging argument, but it’s not one I can do in my head.

    • gowers Says:

      897.1 I’ve got a vague outline of an argument, but have to go to bed. Here’s how it goes. We’re given a dense (in equal-slices measure) subset \mathcal{B}\subset[k-1]^n. I claim that it contains a dense set of combinatorial lines (where that means that if you randomly permute {}[n] and then randomly divide it up into k subintervals, filling the first k-1 with 1 to k-1 and treating the final one as a wildcard set, and then with positive probability that line lies in your set). I think that follows by an averaging argument where you choose a large subspace randomly by first picking the sizes of the wildcard sets and then choosing the sets themselves, and then you apply DHJ(k-1) and average up. Finally, I think that if you pick an equal-slices random point x in {}[k]^n and consider all the points in {}[k-1]^n you get by replacing the js of x by 1s, 2s, …, (k-1)s, then the distribution on the resulting combinatorial lines is not radically different from the distribution described above. In particular, I think the probability that all those points lie in the dense subset of {}[k-1]^n is positive.

    • gowers Says:

      897.2 Just realized that “not radically different from” might be better changed to “identical to”.

  53. gowers Says:

    Metacomment: I’m going to think offline for a bit about the problem in 897. Before I clock off for the night I’ll say whether I think I’ve sorted it out, in case anyone else wants to think about it. (It feels as though it might be Ryan territory.) And I’ve now put up a new post in case the comment count reaches 899. [Am clocking off now — see 897.1 above.]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: