## Problem solved (probably)

Without anyone being particularly aware of it, a race has been taking place. Which would happen first: the comment count reaching 1000, or the discovery of a new proof of the density Hales-Jewett theorem for lines of length 3? Very satisfyingly, it appears that DHJ(3) has won. If this were a conventional way of producing mathematics, then it would be premature to make such an announcement — one would wait until the proof was completely written up with every single i dotted and every t crossed — but this is blog maths and we’re free to make up conventions as we go along. So I hereby state that I am basically sure that the problem is solved (though not in the way originally envisaged).

Why do I feel so confident that what we have now is right, especially given that another attempt that seemed quite convincing ended up collapsing? Partly because it’s got what you want from a correct proof: not just some calculations that magically manage not to go wrong, but higher-level explanations backed up by fairly easy calculations, a new understanding of other situations where closely analogous arguments definitely work, and so on. And it seems that all the participants share the feeling that the argument is “robust” in the right way. And another pretty persuasive piece of evidence is that Tim Austin has used some of the ideas to produce a new and simpler proof of the recurrence result of Furstenberg and Katznelson from which they deduced DHJ. His preprint is available on the arXiv.

Better still, it looks very much as though the argument here will generalize straightforwardly to give the full density Hales-Jewett theorem. We are actively working on this and I expect it to be done within a week or so. (Work in progress can be found on the polymath1 wiki.) Better even than that, it seems that the resulting proof will be the simplest known proof of Szemerédi’s theorem. (There is one other proof, via hypergraphs, that could be another candidate for that, but it’s slightly less elementary.)

I have lots of thoughts about the project as a whole, but I want to save those for a different and less mathematical post. This one is intended to be the continuation of the discussion of DHJ(3), and now DHJ(k), into the 1000s. More precisely, it is for comments 1000-1049.

### 133 Responses to “Problem solved (probably)”

1. Jason Dyer Says:

Metacomment.

Appropriate timing of note:
Cantwell.949: A Moser set for n=5 has 124 points

2. Terence Tao Says:

1000. DHJ(k)

A small comment: from experience with hypergraph regularity, I would imagine that DHJ(k) will not follow exactly from DHJ(k-1), but from something more like DHJ(k-1,k), and so the induction hypothesis may need to have an additional parameter than just k. (For instance, the simplex removal lemma for 3-uniform hypergraphs requires, at base, something resembling the $K_4$-removal lemma for graphs rather than the triangle removal lemma for graphs. While the $K_4$-removal and triangle removal lemmas have nearly identical proofs, I don’t think one can easily deduce the former from the latter if the latter is a black box.)

• gowers Says:

1000.1 An $\epsilon^2$ comment: you mean of course DHJ(k-2,k).

I think the jury is still out on whether we need to prove the full DHJ(k-2,k) before doing DHJ(k) or whether passing to subspaces the whole time makes it unnecessary. I’m hoping for the latter, but am ready to switch if we have to.

An $\epsilon^3$ comment: you are now our official powers-of-ten champion

3. Anonymous Says:

1001. Line and equal-slice distributions in $[k]^n$.

It seems to me that the right generalization of the “another useful equivalent definition” of equal-slices on $[k]^n$ is as follows:

Pick $z \in [k]^n$ from the equal-slices distribution. Form $x^i \in [k-1]^n$ by changing all the $k$‘s in $z$ into $i$‘s, for $i = 1...k-1$. Finally, let $y^1, ... y^{k-1}$ be a random permutation of the $x^i$‘s.

Certainly the set $\{y^1, ..., y^{k-1}, z\}$ is a combinatorial line, which is degenerate only if $z$ had no $k$‘s.

I believe that each $y^i$ is individually distributed as equal-slices on $[k-1]^n$, but I can’t quite seem to prove it to myself at this late hour. Anyone want to prove or disprove?

(If it’s true, there must be some slick way to see it, like “break up a necklace of $n$ beads into $k$ pieces, then pretend you did it in one the $k!$ different orders, then…” etc.)

• Ryan O'Donnell Says:

• gowers Says:

1001.2 I’m pretty sure it’s false. In fact, isn’t it obviously false, since the expected number of $i$s in $y^i$ is twice the expected number of any j for $j\ne i$?

However, I think what can be proved is that the distribution on each individual $y^i$ is not importantly different from the equal-slices distribution, in that the collection of dense sets is the same in both cases. This is analogous to what happens in the corners problem, where the number of corners containing a point is different from point to point (even if we start off with a triangular grid) but this doesn’t matter because the discrepancy between the uniform measure of a set and the point-in-random-corners measure of a set depends only on the measures and not on $n$.

• gowers Says:

1001.3 However, I think I may be misunderstanding what you have written, since I don’t quite see why you bother to permute the $x^i$s (given that they were already distributed in a way that is permutation-invariant — or at least that’s how it appears to me from what you say).

• Ryan O'Donnell Says:

1001.4. Oops, yes, I think what I meant to say was “Pick a permutation $\pi$ on $[k-1]$ and then form $y^i = \pi(x^i)$, where by $\pi(x^i)$ we mean permute the $j$-sets of $x^i$ according to the permutation $\pi$.”

I will write a new comment on this topic later today.

4. Randall Says:

1002. 895 again.

Okay, maybe the book proof of Szemeredi goes through DHJ, but I hope not…I was trying before to get someone to bite on trying something potentially simpler than DHJ (3) that would still prove the corners theorem via the Ajtai-Szem method. (And in particular didn’t use Szemeredi’s theorem as a lemma.) Of course, being the token ergodic theorist, you can probably guess that I would like someone to bite so I don’t have to try to check myself, because I am woefully slow and error prone when it comes to such things. So let me try again…I’ll just think out loud and if none of this makes any sense, everyone can feel free to ignore.

What if I take as my basic object pairs $(\alpha, \beta)$ of $[n]$. A diagonal is a set $D_\gamma= \{ (\alpha,\beta) : \alpha+\beta =\gamma\}$, where $+$ is multi-set union and $\gamma$ is some fixed multi-set (of multiplicity no greater than 2 in each element). A corner is a triple of the form $\{(\alpha,\beta), (\alpha+\eta,\beta), (\alpha,\beta+\eta)\}$, where $\eta$ can be a set plus an anti-set (an antiset is like a set where every element has a multiplicity of -1).

Okay, let $A$ be a set of pairs of density at least $\delta$, i.e. $|A|\geq \delta 4^n$. Now, for some rather small, or at least not-so-big $m$, most of the whole space is covered by diagonals whose “multiplicity 2-or-zero” (for the defining multiset $\gamma$) set of indices has size no bigger than $m$.

Diagonals of this sort have at least $2^{n-m}$ members. Take one of these diagonals $D_\gamma$ on which $A$ is dense, say of relative density at least $\delta /2$. If $A$ is corner-free, then any pair of the form $(\alpha, \beta)$, where $(\gamma -\beta, \beta)\in A$ and $(\alpha, \gamma-\alpha) \in A$ is forbidden. That should be something like $4^{n-m-1} \delta^2$ forbidden pairs, and these pairs lie in a Cartesian product. This should mean that $A$ has a density increment on some other Cartesian product, call it $X\times Y$.

The next step is presumably to use the multidimensional Sperner’s theorem to carve almost all of $B=X\times Y$ up into subgrids of some large but fixed size $r$. (Here a subgrid is a collection $(\zeta\cup FU \{\beta_1,\ldots ,\beta_r\}) \times(\alpha\cup FU \{\beta_1,\ldots ,\beta_r\})$.) I wasn’t really anticipating how hard this would be but maybe you can use the iteration method we use to carve 1-spaces up. Now that I am thinking about this finally, maybe one can do it all in one go. Namely, you can always find $\beta_1,\ldots ,\beta_r$ with combined size at most $M$ such that the probability that $(\zeta\cup FU \{\beta_1,\ldots ,\beta_r\}) \subset X$ and $(\alpha\cup FU \{\beta_1,\ldots ,\beta_r\}) \subset Y$ is above some constant $c$. So for those pairs $(\zeta, \alpha)$ where you get the containment you were shooting for, remove those grids from $B$ and keep them as part of your partition; partition the other pairs into $4^T$ classes according to the value taken on the subgrid associated with that pair. Each of these classes is the intersection of $B$ with a subgrid and we iterate…just like in the proof of DHJ. And, I guess that’s it. Assuming all of the above is correct, you get an increment on a subgrid.

It’s almost 3 here and too late to check so I will apologize in advance if something is amiss.

• gowers Says:

1002.1 Randall, is your problem the same as Jozsef’s in comment 2? In one of the very early comments I tried to translate that one into set-theoretic language and ended up with something pretty similar. This is mainly to say that I agree that it’s worth seeing whether Szemerédi can be done without going the whole way to DHJ. For now I’m going to concentrate my efforts on DHJ(k) but will follow what you have to say with interest. I suppose there are two issues: can DHJ be avoided, and is the resulting proof substantially simpler?

• Randall Says:

1002.2 Yes, it was intended to be the same (not sure whether Jozsef intended to allow inverted corners, but I am here). Oh, yes…I see your comment 7. That looks to be exactly the interpretation I am employing.

Well, perhaps Terry will wake up and say whether the alleged proof I gave is really a proof or can be made into a proof. (And whether it’s simpler than what we have already.) Since he composed and typed the breakthrough post (837?) in eighteen minutes, perhaps he’ll be able to do this despite being very busy with other things.

On a more general philosophical note, Tim, to we ergodic theorists (who have trouble with subscripts appearing on the numbers capital N appearing in proofs), the dynamical proof of van der Waerden (not that of Furstenberg and Weiss but the easy one, due to Blszczyk, Plewik and Turek) is much more pleasant than any proof I have seen of HJ. (Actually, I have long wanted to see a proof, even an unpleasant one, of HJ that made efficacious use of non-metrizable topological dynamics.)

• Randall Says:

1002.3 Obviously I meant “efficacious use of *metrizable* topological dynamics”. There exist proofs using non-metrizable topological dynamics (e.g. Furstenberg and Katznelson’s proof). It may not be possible, or it may require some non-commutative stuff like in the FK proof of DHJ. There is a Bergelson-Leibman proof that tried to use recurrence for continuous maps of compact metric spaces, but it was realized after the paper appeared that neither continuity of the transformations nor completeness of the space was actually ever used.

• Randall Says:

1002.4 No induction….

I think the idea I had to prove Szemeredi via Ajtai-Szem type argument using Jozsef’s formulation from comment 2 is dead in the water at k=3.5. The problem I don’t see a way around is that there don’t seem to be enough corners at k=3 to forbid enough points at k=4. Obviously not a problem for DHJ since the number of lines at k=3 is 4^k which is equal to the number of points at k=4.

Oh well…if the above is correct (nobody checked?) it may be the simplest proof of the corner’s theorem, at least.

• Randall Says:

Hmm…I finally had a look at the above myself, and it seems to be complete garbage for (at least) two reasons…. Perhaps now I am convinced that the DHJ proof is the right one for Szemeredi after all. (Maybe.)

5. gowers Says:

Metacomment: because of more spam problems on the wiki, you now have to be registered on it and logged in if you want to edit the main page.

6. Problem solved (probably) « OU Math Club Says:

[...] experiment in doing math research on his blog.  It sounds like it was a resounding success!  Timothy Gowers announced on his blog that he is pretty convinced that the crowd on his blog has come up with a new proof of the density [...]

7. Terence Tao Says:

Metacomment.

This might be a good time to set up a “meta” thread to deal with issues not directly related to finishing off the proof of DHJ(3)/DHJ(k) [1000-1049] or of computing DHJ-type numbers in small dimension [900-999]. For instance, this seems like a good time to face issues concerning writing up the results, how to perform proper attribution, etc. (At a more frivolous level, there was a suggestion on the wiki to come up with a logo for that wiki; I think that might be a fun topic for the meta thread. More generally, discussion of the strengths and weaknesses of the wiki, or the threaded format, etc. would be apropos at this time, I think.)

• gowers Says:

OK what I’ll do is get on and write a post that I’ve been intending to write, in which I discuss general meta-issues that have arisen out of how things have gone. I think there’s a lot to talk about, so comments on that should be a natural place for discussing these issues amongst others.

8. Terence Tao Says:

1003. Austin’s proof

I’ve managed to digest a key ingredient in Austin’s proof (Lemma 6.1 of http://arxiv.org/abs/0903.1633 ) which may be of interest here. Roughly speaking, it says that 1-sets and 2-sets (say) are always “locally independent” of each other. A more precise statement is the following. Let $f: [3]^n \to [-1,1]$ be 02-insensitive (e.g. it is the indicator function of a 1-set), and let $g: [3]^n \to [-1,1]$ be 01-insensitive (e.g. it is the indicator of a 2-set). Suppose also that ${\Bbb E}_{x \in [3]^n} f(x) g(x)$ is large. Then f has large average on large-dimensional spaces, or (equivalently) correlates with a 012-low influence function (a function which is almost invariant under any interchanges between 0, 1, and 2).

Proof. Let x be a random string $[3]^n$, let $A \subset [n]$ be a random medium-sized set of indices, and let $\pi_{A, 02 \to 0}(x) = \pi_{A, 2 \to 0}(x)$ be the string in $[3]^n$ formed by replacing all occurrences of 0 or 2 in A with 0. If A is not too large, then $\pi_{A, 02 \to 0}(x)$ is still roughly uniformly distributed in $[3]^n$, and so

${\Bbb E}_{x \in [3]^n} f( \pi_{A,02 \to 0}(x) ) g( \pi_{A, 02 \to 0}(x) )$ is large.

But as f is 02-insensitive, $f(\pi_{A,02 \to 0}(x)) = f(x)$. As g is 01-insensitive, $g(\pi_{A,02 \to 0}(x)) = g(\pi_{A,02 \to 1}(x)) = g(\pi_{A,012 \to 1}(x))$, where $\pi_{A,012 \to 1}(x)$ is the string formed by sending every digit of x in A to 1. Thus

${\Bbb E}_{x \in [3]^n} f( x ) g( \pi_{A, 012 \to 1}(x) )$ is large.

Freezing A, we thus conclude that f has large mean on many |A|-dimensional subspaces; alternatively, we see that f correlates with the 012-low influence function $G(x) := {\Bbb E}_A g( \pi_{A, 012 \to 1}(x) )$ (if we choose A in a suitably Poisson manner).

A corollary of this is that if one has a 1-set A and a 2-set B which are “strongly stationary” in the sense that their average local density on medium-sized subspaces is close to their global density, then A and B are roughly independent, thus the density of the 12-set $A \cap B$ is roughly the density of A multiplied by the density of B.

As one might imagine, this fact is extremely useful for doing any sort of “counting lemma” involving 12-sets.

To be continued…

• Terence Tao Says:

A small correction: where I said that f and g could be indicators of 1-sets and 2-sets respectively, it might be better to think of them as the balanced functions of a 1-set and 2-set, i.e. the indicator minus the density.

• Terence Tao Says:

1003.2

Incidentally, this framework also gives a short proof of “lack of lines implies correlation with a local 12-set”. With the notation as before, observe that $x, \pi_{A,01 \to 1}(x), \pi_{A,02 \to 2}(x)$ will form a combinatorial line with high probability if A is not too small. Thus if ${\mathcal A}$ is a set with no lines and density $\delta$ then

${\Bbb E} (1_{\mathcal A}-\delta)(x) 1_{\mathcal A}( \pi_{A,01 \to 1}(x)) 1_{\mathcal A}( \pi_{A,02 \to 2}(x))$ is large.

But if we freeze A and the coordinates outside of A (thus passing to an |A|-dimensional subspace), then $1_{\mathcal A}( \pi_{A,01 \to 1}(x))$, $1_{\mathcal A}( \pi_{A,02 \to 2}(x))$ are indicators of a 2-set and a 1-set respectively, and the claim follows.

• gowers Says:

1003.3 Terry, I’m trying to digest 1003.2. It seems to me that the displayed line is true only if there are many x such that $\pi_{A,01\rightarrow 1}(x)$ and $\pi_{A,02\rightarrow 2}(x)$ both belong to $\mathcal{A}.$ In the previous proof this was done with the help of Sperner and that seemed necessary. I’m not sure where the Sperner step is appearing in what you’ve written.

9. Ryan O'Donnell Says:

1004. Random lines/equal slices for DHJ(k+1).

I think what I wrote in 1001 was correct. Let me restate it here for clarity:

Let $z \in [k+1]^n$ be drawn from equal-slices. Form the string $v^j \in [k]^n$ by changing all $(k+1)$‘s in $z$ to $j$‘s, for $j = 1 \dots k$. Finally, pick a random permutation $\pi$ on $[k]$ and define strings $x^i = v^{\pi(i)}$, for $i = 1 \dots k$.

Then each string $x^i$ is distributed as equal-slices on $[k]^n$. Further, since $(v^1, \dots, v^k, z)$ is a combinatorial line (“in order”), we also have that $\{x^1, \dots, x^k, z\}$ is a combinatorial line (possibly “out of order”).

The proof is here; I’ll wikify it soon.

• Ryan O'Donnell Says:

1004.1. Wikified here. Hopefully this will help with the generalization of line-free sets correlating with 12-sets.

• gowers Says:

1004.2 Ryan, I’m still struggling with the statement of what you are proving. Suppose k=2. Then I pick $z\in[3]^n$ from equal slices and form the strings $u$ and $v$ by turning the 3s of $z$ into 1s and 2s, respectively. Then I randomly decide whether to interchange $u$ to $v$. Now let’s look at the event that the number of 1s in $u$ is between $0.49n$ and $0.51n.$ If I choose $u$ according to equal-slices that is $0.02$. But if I choose it by starting with z, then either I start with about that number of 1s and change the 3s to 2s, or I start with about that number of 2s and change the 3s to 1s. The probability seems to me to be smaller because $z$ is being asked to have rather a lot of 1s or rather a lot of 2s. Since you’ve got a wikified proof, what I’m asking here is for an explanation of where I’m going wrong.

• Ryan O'Donnell Says:

1004.3. Hmm, I’m not quite sure what I can say to help.

In your example, when you pick $u$ from equal-slices on $[2]^n$ “in the normal fashion”, then as you say the event $A$ = “number of $1$‘s in $u$ is in $[.49n, .51n]$” occurs with probability exactly $.02$.

When you pick it in the funny way, through, $z$, what is the probability? As you say, half the time $u$ is formed by changing $z$‘s $3$‘s to $2$‘s. In this case, $A$ occurs iff $z$ had between $.49n$ and $.51 n$ $1$‘s. The other half of the time, $u$ is formed by changing $z$‘s $3$‘s to $1$‘s. In this case, $A$ occurs iff $z$ had between $.49n$ and $.51 n$ $1$‘s-and-$3$‘s-together, iff $z$ had between $.49 n$ and $.51 n$ $2$‘s.

Clearly the probability $z$ has between $.49n$ and $.51n$ $1$‘s is the same as the probability $z$ has between $.49n$ and $.51n$ $2$‘s. So we are reduced to asking why this probability is indeed $.02$.

Well, I’m not sure what to say, except that it is. It comes from the fact that $\int_{.49}^{.51} 2(1-p) dp = .02$, where $2(1-p)$ is the density of the first component in a uniform draw from the $3$-simplex. Note that this integral is a bit of a “coincidence”, because it is not true that, e.g., $\int_{0}^{.49} 2(1-p) dp = .49$.

The reason this all works is a bit easier to understand in the case of $k = 2$; see this explanation in the older wiki article on equal-slices.

• gowers Says:

1004.4 Thanks — I think I’m starting to get it now.

10. Terence Tao Says:

1005. Quantitative bounds?

Is our understanding of the density-increment proof of DHJ(3) good enough that we can venture a probable quantitative bound for n in terms of $1/\delta$? Naively I am expecting tower-exponential behaviour (with the height of the tower being either polynomial or exponential in $1/\delta$).

• gowers Says:

1005.1 I’m pretty sure it’s a tower. The rough reason is that when we drop to a subspace we obtain that subspace by applying multidimensional Sperner, which I think restricts its dimension to the log of the previous dimension. But I think the density increase we can get that way is good, so I think the bound should be a tower of height polynomial in $1/\delta$ and therefore not worse than what you get out of the Ajtai-Szemerédi proof of the corners theorem.

Also, if we’ve got the stomach for it, we could think about trying to Shkredovize the argument and beat Shelah’s bound for the colouring DHJ(3). That would be a serious undertaking though, as it would require us to understand the global obstructions rather than just the local ones.

11. Ryan O'Donnell Says:

1006. Equal-slices.

I wonder: can we mostly work in the equal-slices measure for the whole proof (of DHJ(3), say)? What are the advantages of the uniform distribution?

• gowers Says:

1006.1 My take on this is that equal-slices measure is very good when you want to mix random points and random lines, but the uniform measure is better when you want to restrict to subspaces. Roughly, the reason for the latter is that if you start with equal-slices and restrict a few coordinates, then the distribution on the resulting subspace becomes much more like a uniform one, or a suitably weighted uniform one. For example, if you’re told that a point’s restriction to the first m coordinates has roughly equal numbers of 1s, 2s and 3s, then with very high probability it belongs to a slice with roughly equal numbers of 1s, 2s and 3s, so the distribution in that (n-m)-dimensional subspace is roughly uniform.

It’s possible that we could argue that we can restrict to subspaces and get an equal-slices density increase, but I think we would still have to go via uniform, so in the end I don’t see a compelling argument for doing that.

• Ryan O'Donnell Says:

1006.2. That’s a good point. I guess I was idly hoping that the fact that equal-slices is exactly equal to a (fairly natural) mixture of uniform distributions on subspaces might allow us a clever way to stay entirely in the equal-slices world.

12. gowers Says:

1007. DHJ(k)

I think I’ve now proved (though with details a bit hazy at a couple of points) the analogue for k of the fact that line-free sets in ${}[3]^n$ correlate locally with 12-sets. The proposed argument can probably be tidied up in a few places. I haven’t yet thought about whether any of the rest of DHJ(k) is going to present problems, but it feels as though we’re closing in on it pretty rapidly. That’s my lot for today though.

13. Terence Tao Says:

1008. Re: 1003.3

Yes, you’re right, one also needs Sperner. Without it, what one gets is that if ${\mathcal A}$ is uniformly distributed with respect to 12-sets, then

${\Bbb E} 1_{\mathcal A} 1_{\mathcal A}( \pi_{A,01 \to 1}(x)) 1_{\mathcal A}( \pi_{A,02 \to 2}(x)) \approx {\Bbb E} \delta 1_{\mathcal A}( \pi_{A,01 \to 1}(x)) 1_{\mathcal A}( \pi_{A,02 \to 2}(x))$

and then one has to use DHJ(2) to say that the RHS is large (which one can do, after letting A vary in a sufficiently Poisson-like manner).

I would imagine that the same thing works in higher k, perhaps this is already implicit in 1007.

14. Top Posts « WordPress.com Says:

[...] Problem solved (probably) Without anyone being particularly aware of it, a race has been taking place. Which would happen first: the comment count [...] [...]

15. Ryan O'Donnell Says:

1009. Equal-slices distribution.

I think the following is an easier explanation of what I wrote.

Claim: Pick $z \in [k]^n$ according to equal-slices. Then pick $J \in [k-1]$ uniformly and form $x$ by changing all the $k$‘s in $z$ to $J$‘s. Then $x$ is distributed according to equal-slices on $[k-1]^n$.

Note: it’s helpful to think of $k$‘s as $\ast$‘s.

Proof sketch via coupling: Draw a circle of $n$ dots. Place bar(1) uniformly at random in the $n+1$ slots. Then place bar(2) uniformly at random in the resulting $n+2$ slots. Etc., placing $k$ bars. Pick $J \in [k-1]$ uniformly. Look at the arc of dots following bar(k) clockwise, up until the next bar. Fill in these dots with $k$‘s. Look also at the arc of dots preceding bar(k) counterclockwise, up until the next bar. Fill in these dots with $J$‘s. Next, for each of the remaining $k-2$ arcs of dots, fill it all with the same digit — using up the remaining digits $[k-1] \setminus \{J\}$ in some arbitrary way. Finally, delete the bars, cut the circle to make it a segment, and then randomly permute the whole sequence of digits. We claim the resulting sequence is distributed as equal-slices on $[3]^n$, and therefore we can take it as $z$.

Now forming $x$ from $z$ is the same as doing the above procedure but using $J$‘s where we were using $k$‘s before. But really, happens under this new procedure? You place $k-1$ bars as if you were doing equal-slices on $[k-1]^n$; then you throw in a phantom extra bar but only use it insofar as deciding to call the arc it lands in the $J$-arc; then you define the other $k-2$ arcs according to the digits in $[k-1] \setminus \{J\}$, etc. Now granted, the arc chosen to be the $J$-arc is not uniformly distributed among arcs — it’s biased towards the longer arcs. But who cares, since $J$ is uniformly random on $[k-1]$?

16. Ryan O'Donnell Says:

1010. “Varnavides”-DHJ(k). (Sorry for the scare quotes, but it strikes me as funny somehow to name this concept after a person.)

Tim, I can’t quite follow what you wrote at the end of your proof sketch that line-free sets in $[k]^n$ correlate with lower-complexity sets. I was wondering if you could clarify. In particular, as you say it seems we need to show that DHJ(k) actually implies Varnavides-DHJ(k) under some appropriate measures.

As you suggest, it seems that equal-slices is what to hope for here (it works pretty well for $k = 2$ at least!). In other words, we may hope that if $A \subseteq [k]^n$ has equal-slices measure $\geq \delta$ then if you choose an “equal-slices-combinatorial-line” (i.e., draw from equal-slices on $([k] \cup \{\ast\})^n$) then there is some $c(\delta) > 0$ chance that all $k$ points are in $A$.

Looking at that statement now I worry slightly that it is a bit optimistic. Well, let’s hope it’s true, and let me ask, Tim, about what you wrote.

I guess I don’t quite understand what’s going on with the parameter $M$. On one hand, I assume it’s to be “small” (although still perhaps $\omega_{n \to \infty}(1)$), because otherwise bounding the number of lines in $[k]^M$ by $(k+1)^M$ looks worrisome.

On the other hand, this makes me worry a bit about bullet-point 3: if $\nu$ is supported on $M$-dimensional subspaces then somehow I don’t expect that a set of lines $\mathcal{L}$ being large under $\nu$ should mean it’s large under a more “global” measure on lines such as equal-slices.

Quite likely I’m mistaking your argument or missing something — would you be able to clarify? Thanks!

• gowers Says:

1010.1 Ryan, the argument is so sloppily written (little more than a declaration that my gut feeling is that it works) that it is good that you apply a little pressure of this kind. I think I have answers to your specific queries, but that’s not the same as a proof. But maybe it will tip you over to believing that it works.

First of all, my parameter M is not supposed to tend to infinity. As I said in the write-up, it’s meant to be the smallest integer such that we have DHJ(k) with density $\zeta$, where $\zeta$ depends only on the equal-slices density $\theta$ of the set inside which we’re trying to find lots of combinatorial lines.

So what about your second worry? The point here is that the M-dimensional subspaces I’m averaging over are absolutely not local. Instead, they’re supposed to be far more tailored to equal-slices. What I’m not doing is fixing almost all coordinates and taking M small wildcard sets, or something like that. Instead, I’m randomly choosing the sizes of the M wildcard sets (which I allow to add up to 1, so there are in fact no fixed coordinates at all) and then uniformly at random choosing a line from the resulting subspace. That’s why I’m hoping that the resulting measure on lines is sufficiently global in character.

17. jozsef Says:

1011. A Shelah-type argument

I’m quite optimistic that a Shelah type argument might work for the general DHJ(k) problem. It’s late evening and I had a hard day, so I’m not sure how far can I go with the argument tonight. Let me give some basic definitions first. In $k^{[n]}$ two points are neighbours if they have same coordinates everywhere but in one position where one is k-2 and the other is k-1. For a subset $S\subset k^{[n]}$ the space $k^{[n]}$ is fliptop if a point p is in S iff any neighbour of p is also in S. (I will wikify it if the argument turns out to be useful) Our goal is to find a large fliptop subspace which is dense. By induction there will be a k-1 combinatorial line which gives a full combinatorial line because of the fliptop property of the subspace. I will follow Shelah’s strategy with some extra conditions to make it (hopefully) work for the density version.
Suppose that $S\subset k^{[n]}$ is c-dense and we already know the subspace version of DHJ for k-1. Partition [n] into m consecutive intervals of lengths $l_1, l_2, \ldots, l_m.$

We will repeat two steps to get a subspace eventually. The first one should keep the density high and the second will provide the fliptop property.
Step one: For any point in $k^{[l_m]}$ there is a certain number of points in S with that tail. Select the points from $k^{[l_m]}$ which are tails of at least $(c-\epsilon_1)k^{n-l_m}$ elements of S. The set of such points is denoted by H1. Applying the induction hypothesis we know that $k^{[l_m]}$ will contain a d1-dimensional subspace where every element avoiding (running) coordinate k-1 are from H1 (so it has many extensions that are in S).
Step two: Consider the elements of the d1-dimensional subspace where every coordinate is k-1 or k-2. We say that two such points are equivalent if they are the tails of the same subset of S. There are no more than $k^{n-l_m}$ such equivalence classes. Therefore if $d1 > k^{n-l_m}$ then we have a Sperner pair, two points where the set of k-1 coordinates of a point contains the other point’s set of k-1 coordinates. It defines a combinatorial line in the subspace and we will work with this line in the following steps.
We will repeat steps one and two in $l_{m-1}$. I will continue, but I would like to read over first what I wrote.

• jozsef Says:

1011. contd.

Let us denote the k points of the combinatorial line selected previously by $P^1_0, P^1_1,\ldots ,P^1_{k-1}$ (indexed by the running coordinates.)
Step one: select set H2 from $k^{[l_{m-1}]}$.
Select a point if with any $P_i$ it is the tail of at least $(c-\epsilon_2)k^{n-l_m-l_{m-1}}$ elements of S where $0\leq i\leq k-2$. There are two cases; either there are many such points or there is a tail of at least $(c+\delta)k^{n-l_m-l_{m-1}}$ elements of S. (More careful calculations are needed, but let me sketch the argument first.) If there are enough points, the there is a d2-dimensional subspace in $k^{[l_{m-1}]}$ where every point without coordinate k-1 is from H2.
Step two: Consider the elements of the d2-dimensional subspace where every coordinate is k-1 or k-2. We say that two such points, a and b. are equivalent if for any element $t\in k^{[n-l_m-l_{m-1}]}$, $t,a,P_i$ is in S iff $t,b,P_i$ is in S. There are no more than $k^{n-l_m-l_{m_1}}+1$ such equivalence classes. If $d2 > k^{n-l_m-l_{m_1}}+1$ then there is a Sperner pair, two points where the set of k-1 coordinates contains the other point’s set of k-1 coordinates. It defines the second combinatorial line in the subspace and we will work with the two lines in the following steps. Let us denote the k points of the new combinatorial line by $P^2_0, P^2_1,\ldots ,P^2_{k-1}$

• jozsef Says:

There are a few typos in the argument. I should fix them eventually, but let me mention her one which might be quite misleading; “There are no more than $k^{n-l_m}$ such equivalence classes.” actually, there are no more than $2^{k^{n-l_m}}$ such equivalence classes. Similarly, in the second iteration There are no more than $2^{k^{n-l_m-l_{m-1}+1}}$ such equivalence classes.

18. jozsef Says:

I’ll continue tomorrow morning, but I hope that what I wrote makes sense and maybe someone will tell me if this plan is feasible or not.

• gowers Says:

1012.1 I’ve just read through it and nothing jumps out at me as wrong, or obviously not going to work. I suppose my one worry is a metaworry, which is that many people must have tried to produce a density version of Shelah’s argument, and there doesn’t seem to be any sign of an obstacle in the above sketch. On the other hand, a balancing metareassurance is that you have a record of finding dazzlingly simple arguments.

I haven’t yet thought about the part where you write “more careful calculations are needed”.

Perhaps another reason to be hopeful is that one way of explaining what Shelah did is to say that he changed an inductive argument from deducing HJ(k) from HJ(2) with the repeated help of HJ(k-1) to deducing HJ(k) from HJ(k-1) with the repeated help of HJ(2), and there is no obvious reason to think that that cannot be done for density as well.

Anyhow, if it works … amazing!

19. gowers Says:

1013. Shelah.

I took the car to be serviced today, which resulted in a long walk to work, during which I pondered Jozsef’s argument. I can’t see a problem, but neither have I thought it through fully. So I want to try to assess its feasibility by seeing if I can sketch it. (Of course, Jozsef has already sketched it, but there’s no harm in having more than one sketch, and if I manage to sketch it without looking any further at what he wrote, then it might make us feel better in the way that people feel better about computer-assisted proofs if more than one person does the computer part and different programs are used.)

What is remarkable about Jozsef’s proposal if it works is that it is so similar to what Shelah did: it would make it very mysterious that the proof had not been discovered earlier (which would add another layer of mystery to the already remarkable fact that Shelah’s proof was itself not discovered earlier).

The idea is this. Suppose that we are trying to prove DHJ(k) with density $\delta$. We begin by choosing some parameters. First, we choose $m$ so large that DHJ(k-1) is true with density $\delta/2$. That is, we assume as an inductive hypothesis that every subset of ${}[k-1]^m$ of density at least $\delta/2$ contains a combinatorial line.

Next, we choose a very rapidly increasing sequence of integers (though in the context of some proofs of this kind the rate of increase is not too bad — this proof should result in a primitive recursive bound for DHJ(k), one level beyond a tower, just like Shelah’s) $r_1<. (Jozsef called them $l_1<<\dots< but on my screen an $l$ is just a vertical line so I’ve changed the notation.)

We shall repeatedly apply the following averaging argument: if $a_1,\dots,a_M$ are integers between 0 and 1, and the average of the $a_i$ is $\delta$, then at least $\epsilon N$ of the $a_i$ are at least ${}\delta-\epsilon.$ The proof is trivial (by contradiction).

Now let $n=r_1+\dots+r_m$ and think of ${}[k]^n$ as ${}[k]^{r_1}\times\dots\times[k]^{r_m}.$ Let $\mathcal{A}$ be a subset of ${}[k]^n$ of density $\delta.$ For each $y\in[k]^{r_m},$ let $\mathcal{A}_y=\{x\in[k]^{n-r_m}:(x,y)\in\mathcal{A}\}.$ Then the average density of $\mathcal{A}_y$ over $y\in[k]^{r_m}$ is $\delta$, so the density of $y$ such that $\mathcal{A}_y$ has density at least $3\delta/4$ is at least $\delta/4$.

Let $\mathcal{B}_m$ be the set of all $y\in[k]^{r_m}$ such that $\mathcal{A}_y$ has density at least $3\delta/4$. By the pigeonhole principle, we can find a subset $\mathcal{C}_m$ of $\mathcal{B}_m$ of density at least $\gamma=\delta/4k^{n-m_r}$ such that $\mathcal{A}_y$ is the same for every $y\in\mathcal{C}_m$.

Let us define a ”binary subspace” to be a set where you fix some of the coordinates and allow the remaining ones to take the values k-1 and k. If t is small enough, and if you choose a random binary subspace of ${}[k]^{r_m}$ by randomly choosing t coordinates to be the variable ones and randomly fixing the others, and if you then choose a random point in one of these random binary subspaces, the result will be very close to the uniform distribution on ${}[k]^{r_m}$. Therefore, by an easy averaging argument we can find a binary subspace inside which the density of $\mathcal{C}_m$ is at least $\gamma/2.$ (Note that for this to work we need $r_m$ to be large enough in terms of $\gamma$ and $t$.) And then by Sperner’s theorem we can find two points in the binary subspace that form a combinatorial line (in the ${}[2]^n$ sense but with $k-1$ and $k$ replacing 1 and 2, or 0 and 1 if you prefer).

We can extend this binary combinatorial line to a full combinatorial line in ${}[k]^{m_r}$, by allowing the wildcards to take the values $1,2,\dots,k-2$ as well. The result is a …

OK, this is the first point at which I’ve got stuck, though I think Jozsef may have dealt with the difficulty I am facing. Time to end this comment and start a new one.

20. gowers Says:

1014. Shelah

The problem with what I was doing just above is this. I’ve just defined a line $L=(y_1,\dots,y_k)$ in ${}[k]^{m_r}$, and I know that the sets $\mathcal{A}_{y_{k-1}}$ and $\mathcal{A}_{y_k}$ are the same. However, I don’t know anything about $\mathcal{A}_{y_i}$ for $i\leq k-2,$ and it would be a serious problem if e.g. they turned out to be empty.

However, I’ve ignored Jozsef’s step 1, which is perhaps his main new idea. He begins by using DHJ(k-1) to pass to a subspace where $\mathcal{A}_y$ has density close to $\delta$ for all sequences $y,$ provided only that they do not have any (variable) coordinate equal to $k$. (This can be done by an averaging argument similar to the one I used above to get dense binary subspaces — this time one wants dense (k-1)-ary subspaces.)

But now I’ve reached the point where I don’t understand Jozsef’s argument. How do I know that it is not the case that $\mathcal{A}_y$ is empty whenever one of the variable coordinates of $y$ is equal to k? I don’t see it at the moment, so I think I’d better wait for Jozsef to wake up.

• jozsef Says:

1014.1 Dear Tim, I don’t think that we need that $\mathcal{A}_y$ is dense if one of the variable coordinates is equal to k. This is actually a crucial observation, that’s why we need the fliptop property. At the last step, when only $r_1$ remains, one word of $r_1$ spans a dense subset of the $\{1,2,..,k-1\}^{m-1}$ part of the subspace. By induction there is a line, and by the fliptop property there is an extension to k. Note that the fliptop property was defined by considering running coordinates k as well, however it wasn’t sensitive for density in any ways.

• gowers Says:

1014.2 I’m still mystified. In Step 1, let’s choose a subspace such that for every sequence $y$ that avoids running coordinate $k$, there is a set of density almost $\delta$ of $x$ such that $(x,y)\in\mathcal{A}$. Now we go to Step 2. Nothing we’ve done so far stops $\mathcal{A}_y$ being empty for every $y$ that does involve $k$. So we choose our Sperner pair and have two points $y$ and $y'$ (where $y'$ is obtained from $y$ by changing some of the $(k-1)$s to $k$s). How can that help? Indeed, it seems that $\mathcal{A}_y$ could be empty for every point in the combinatorial line you select.

• jozsef Says:

Tim – I have to go to the University now, where I will rethink your question again, but I think that the answer is that it is not possible that you switch a running coordinate to k from k-1 and the density drops because the two points are neighbours. I will get back to you soon.

• gowers Says:

Jozsef, that gives me time to be more precise about my question. See comment 1016 below.

21. Polymath1: Probable success « Combinatorics and more Says:

[...] northern Israel. Coming back I saw two new posts on Tim Gowers’s blog entitled “Problem solved probably” and “polymath1 and open collaborative mathematics.” It appears [...]

22. Gil Kalai Says:

1015. A bit off-topic

“If this were a conventional way of producing mathematics, then it would be premature to make such an announcement — one would wait until the proof was completely written up with every single i dotted and every t crossed — ”

hmmm, maybe we should regard it premature also in this new mode. What do the rules say?

“but this is blog maths and we’re free to make up conventions as we go along.

So I hereby state that I am basically sure that the problem is solved”

Congratulations! I must admit that even if a full k=3 proof will take 200 more comments and several more weeks the success of the project exceeded my expectations.

“(though not in the way originally envisaged).”

Actually, this is a lovely aspect and a sign of success of the open collaboration idea, isn’t it?

• gowers Says:

1015.1 Gil, I feel slightly awkward about it, and won’t feel entirely happy until it’s written up in the conventional way. But if one regards a write-up as having a tree structure, then it’s all there on the wiki down to the small sticks, and only a few leaves are missing. Somehow there don’t seem to be any points where one says “I don’t quite see it but I’m sure it works.” It’s more like “I do see it but it’s slightly tedious to write out in full.” Despite that, I wouldn’t have written what I wrote if it had not been for the fact that other participants seem to share my confidence. But for now I will keep the “(probably)”.

23. gowers Says:

1016 Shelah

Jozsef, I’m still trying to follow your argument in 1011. Let me try to be completely precise about where it seems to me to go wrong. Of course, it could be me who is going wrong.

Consider the following example. I take a random subset S of ${}[k]^{r_m}$ and then take the set $\mathcal{A}$ of all sequences in ${}[k]^{r_1+\dots+r_m}$ such that the final part lies in S. This set has density 1/2. Now I apply your step 1, and it happens that the $d_1$-dimensional subspace I choose has the following property: every point with all running coordinates less than k belong to S, and all other points belong to the complement of S. (With high probability such a subspace exists, and no steps have been taken to avoid it.)

Then the pigeohole argument gives you that all the points with running coordinates less than k can be joined to anything you like in ${}[k]^{r_1+\dots+r_{m-1}}.$ However, if you take any line given by the application of Sperner, then both its top two points must involve a k and give the empty set. Therefore, the whole line gives the empty set (meaning that if your mth part belongs to the line then you don’t belong to $\mathcal{A}$). So at the second stage of the iteration I don’t see how you can do Step 1 (because of the “with any ${}P_i$” requirement).

• jozsef Says:

Tim, what you wrote is perfectly right with the exeption that in step one I never consider running coordinates larger than k-1. (with my old notation “larger than k-2″) So, in your example fliptop means that the neighbours given by the last (m-th) k, k-1 pair are simultaneously not in S. I’m still not sure that the argument works, but I don’t see any serious problem yet.

• jozsef Says:

I have office hours now, so I will beging to write up the argument and if I won’t have too many students to show up then I will post the sketch in an hour or so.

• gowers Says:

Just to try to be more explicit still, let’s imagine that $k=3$ and $r_m=6$. Suppose that all sequences of six 0s and 1s are the tail of every preceding sequence, and that the two points $001122$ and $001222$ are the tails of precisely the same set of sequences, so you choose as your line the line $001*22$. It seems to me that there is no reason for any point in that line to be the tail of any previous sequence.

A separate question: is the aim to construct in each ${}[k]^{r_i}$ a line $L_i$ in such a way that the product of the lines is fliptop (that is, insensitive to changing 1s to 2s when k=3), and $\mathcal{A}$ is dense in that product? My difficulty is that I don’t see the relevance of a statement about $\{0,1\}^{r_m}$ when the line $L_m$ might have 2 as one of its fixed coordinates.

• jozsef Says:

I see now. One should be more careful with the selection of the fliptop lines. I still think that it is duable, but it seems that we have to consider more subspaces. I don’t have much time, but I will think about it.

• gowers Says:

I should stress here that I definitely think this is worth attempting. Even if it turns out not to work 100% straightforwardly, the following question exists and is clearly an interesting one. You have a set $\mathcal{A}\subset[k]^n$ of density $\delta$. How large a subspace $S$ can you find such that $\mathcal{A}$ has density at least $\delta/2$ in the subspace, and membership of $\mathcal{A}$ in the subspace does not change if you flip a $k$ to a $k-1$? I do not see any philosophical reason for that being significantly harder than Sperner plus induction, though such a reason might emerge I suppose.

24. Ryan O'Donnell Says:

1017. Varnavides-DHJ(k).

In 1010.1, Tim wrote “What I’m not doing is fixing almost all coordinates and taking M small wildcard sets, or something like that. Instead, I’m randomly choosing the sizes of the M wildcard sets (which I allow to add up to 1, so there are in fact no fixed coordinates at all) and then uniformly at random choosing a line from the resulting subspace.”

Tim, I think I get it now; as usual I was misinterpreting the word “subspace” and was thinking you were doing what you said you weren’t doing :) I agree now that the sketch looks relatively solid. Let me see if I can fill in some “leaves”, as you say.

25. Ryan O'Donnell Says:

1018. Is Equal-Slices of Equal-Slices equal to Equal-Slices?

Suppose we want to try to prove Varnavides-DHJ(3); as in #1010, let’s try to show that if $A \subseteq [3]^n$ has equal-slices density $\delta > 0$, and we choose a random combinatorial line by drawing its template from equal-slices on $([3] \cup \{\ast\})^n$, then with probability at least $\eps(\delta) > 0$ the whole line is in $A$ (assuming $n$ is sufficiently large).

Tim’s strategy for this involves, among other things, picking a combinatorial subspace of dimension $M$ with no free coordinates, according to some distribution $\nu$; then drawing a point from that subspace (thought of as $[k]^M$) from some other distribution, $\sigma$.

It would be particularly helpful if the overall distribution is equal-slices on $[3]^n$. Also helpful would be if $lambda \nu$ is a relatively “natural” distribution; also $\sigma$ should have full support. (It’s also not essential that $\nu$ generates subspaces with exactly $M$ dimensions, but never mind that for now.)

Here is perhaps the simplest possible proposal: Let $\nu$ be equal-slices on $[M]^n$, and let $\sigma$ be equal-slices on $[k]^M$.

The question is: might combining these two actually give equal-slices on $[3]^n$ precisely?! I will submit some evidence in a reply to this post.

• Ryan O'Donnell Says:

1018.1. Evidence.

Well, my first piece of “evidence” is that in these simple probability matters, sometimes things just turn out nicely :)

Here is slightly more rigorous evidence… (I would try to simply give a proof, but I haven’t the time to work on it this evening, so I thought I’d quickly throw it out there.)

For my poor brain, let me fix $M = 100$ and fix $k = 3$, $[k] = \{R, G, B\}$ (red, green, blue).

We can think of the “outer” equal slices as choosing $p_1 + \cdots + p_{100} = 1$ uniformly, then drawing from the resulting product distribution on $\{\ast_1, \dots, \ast_{100}\}^n$. We can think of the “inner” equal slices as choosing $q_R + q_G + q_B = 1$ uniformly, then changing each $\ast_i$ to $R$ with probability $q_R$, $G$ with probability $G$, and $B$ with probability $q_B$.

If you think about it a minute, you see that the final is string is drawn from a randomly chosen product distribution $s_R +s_G + s_B = 1$ on $\{R,G,B\}^n$. That’s good! If we could just show that that $(s_R, s_G, s_B)$ is indeed distributed as “uniform on the $3$-simplex” we’d be done.

So how is it chosen? Well, we think of $(q_R, q_G, q_R)$ as being chosen first. Then, think of $p_1, \dots, p_{100}$ as being independent Exponential$(1)$‘s. (Of course, this does not give $100$ numbers adding to $1$, but we use these numbers as proportions, rather than probabilities.)

Then $(s_R, s_G, s_B)$ gets generated by putting each $p_i$ into the “R” bucket with probability $q_R$, into the “G” bucket with probability $q_G$, and into the “B” bucket with probability $q_B$. Finally, you add up all the Exponentials in each bucket, and $(s_R, s_G, s_B)$ is proportional to these totals.

So what is the distribution on the three bucket totals? It shouldn’t be too hard to really work it out, but let me write heuristically here. Once $(q_R, q_G, q_B)$ is fixed, we expect about $100q_R$ of the independent Exponentials to go into the R bucket, and similarly for G and B. (Even moreso do we expect this if $100 = M$ is huge.) And thinking of $100q_R$ as still huge, we’re adding up a huge number of i.i.d. Exponentials, so the CLT will imply that the sum will almost surely be very close to $100q_R$. Similarly, the sum in the G bucket will almost surely be very close to $100q_G$, and again for B.

But in this case, the proportions will be really close to $q_R/q_G/q_B$ I.e., almost surely it seems $(s_R, s_G, s_B)$ will be very nearly $(q_R, q_G, q_B)$. But $(q_R, q_G, q_B)$ is distributed uniformly on the $3$-simplex, as desired!

• Ryan O'Donnell Says:

1018.2. I don’t think it’s exactly correct, but I still hope some variant is…

26. Ryan O'Donnell Says:

1019. ENDS of ENDS = ENDS.

I think I’ve got it now. Define a distribution on $k$-partitions of $n$ called $k$-Equal Non-Degenerate Slices ($k$-ENDS): To get it, take $n$ dots in a line, consider the $n-1$ gaps between them, and place $k-1$ bars in these gaps, uniformly from among the $\binom{n-1}{k-1}$ many possibilities (i.e., “without replacement”). Then the intra-bar blocks of dots correspond to $k$ positive integers $m_1, \dots, m_k$ which add up to $n$. Associating block $m_i$ to character $i \in [k]$, we get a distribution on “Non-Degenerate” slices (each character appears at least once). We can extend this to a distribution on $[k]^n$ as usual, just by randomly permuting the string.

This is actually a kind of convenient variant on equal-slices, because when you’re using equal-slices to pick combinatorial lines, it’s a bit annoying to have to worry about the unlikely case of getting $0$ wildcards.

Anyway, I think it’s now easy to see that if you do $M$-ENDS, and then you do $k$-ENDS on that (as described in #1018), then you just get $k$-ENDS, exactly. Because all that’s really happening is that you’re picking $M-1$ bar-spots uniformly, and then you’re picking $k-1$ of those $M-1$ bars to be the “final bars”, uniformly. So that’s just the same as picking $k-1$ bars uniformly to start with.

I think. Once again, it’s late.

27. Ryan O'Donnell Says:

1019.1. “Formula does not parse” above is “(n-1 choose k-1)”. [Fixed]

28. Ryan O'Donnell Says:

1019.2. Some googling reveals this paper (among others) which discusses in its Section 2 the proper nomenclature for this stuff, as well as related probability distributions.

k-ENDS -> “random composition into k parts”; the consequent distribution on strings is “random ordered partition into k parts”.

dots -> “balls”
bars -> “walls”

29. gowers Says:

1020 DHJ(k)

Ryan, that all looks like good news — it would be very nice if we could get the technicalities to run smoothly so that they don’t obscure the main ideas, and your comments above give me hope that it will be possible.

I’ve got 25 minutes to spare so I’m going to try to explain why I think that DHJ(k) is basically done too. I’m assuming for this that the proof that a line-free set correlates locally with a set of complexity k-2 is going to work out now, which seems likely, as it’s been sketched down to a pretty fine level of detail and you’ve scrutinized the parts that looked most likely to go wrong if anything was going to.

In order to explain how I think the rest of the proof will go, what I need to do is explain the difference between the proof for DHJ(k) and the proof of DHJ(3) outlined on the wiki here.

Analogue of Step 1. We need to replace 1-sets by what I call $\{1,2,\dots,k-2\}$-sets. These are sets $\latex \mathcal{B}$ such that $x\in\mathcal{B}$ if and only if $(U_1(x),\dots,U_{k-2}(x))\in\mathcal{Z}$ for some suitable collection $\mathcal{Z}$ of disjoint (k-2)-tuples of sets and $U_i(x)$ stands for the set of places where x takes the value i. To save writing, I’ll call these simple sets. So the next obvious aim is to prove that a simple set can be approximately partitioned into subspaces. Once that’s done, I think it is fair to say that the rest of the proof is exactly as in the k=3 case.

The one other thing that needs to be changed in Step 1 is that the appeal to the multidimensional Sperner theorem has to be replaced by an appeal to multidimensional DHJ(k-1), a proof of which is sketched on the wiki here. Otherwise, the changes are trivial: 3 becomes k, and $\mathcal{U}\otimes[2]^n$ becomes our simple set $\mathcal{B}.$ (I admit that I haven’t been through Step 1 line by line checking what I’m saying — that would use up more than my 25 minutes.)

I’ve looked through Step 2 and I think all the changes needed are of the trivial variety (changing 3s to ks and that kind of thing). It might be nice at some point to write an abstract version of this argument, since what we’re using seems to be very general — that we can partition ${}[k]^n$ into subspaces very conveniently, that the structure of good sets is preserved when we do so, that dense good sets contain lots of subspaces — that kind of thing.

Step 3. And now we do the same trick as we did in DHJ(3) except that we have to apply Step 2 k-1 times instead of twice. The end result is that a set of complexity k-2 can be almost partitioned into subspaces, and that’s what we’re looking for.

30. Anonymous Says:

1021. Varnavides.

Hi Tim, I think the $k$-ary proof that line-free sets correlate with simple sets is now done to about 100% satisfaction in my mind. I’ll try to wikify some details later, but to illustrate the simplicity with which it’s working out, here is a complete proof of Varnavides:

Assume we have DHJ(k).

Technicality: I want to say that a combinatorial line is (also) “degenerate” if the fixed coordinates do not use each character from $[k]$ at least once. Deducing DHJ(k) with this very slightly stronger notion of degeneracy is trivial.

Proof of Varnavides: Assume $A \subseteq [k]^n$ has density $\delta > 0$ under the “uniform $k$-composition” distribution (i.e., equal-slices conditioned on the slice having at least one of each character). Let $M = M(\delta/2)$ denote the least integer such that DHJ(k) guarantees the existence of a nondegenerate line in any subset of $[k]^M$ with uniform $k$-composition density at least $\delta/2$.

Think of choosing $x \in [k]^n$ as first drawing a random $M$-dimensional subspace $V$ from the uniform $M$-composition distribution, and then drawing $x$ randomly from $V \cong [k]^M$ according to the uniform $k$-composition distribution. Since $x \in A$ with probability at least $\delta$, we know that with probability at least $\delta/2$ over the choice of $V$, the $k$-uniform composition density of $A|_V$ is at least $\delta/2$. Call such $V$‘s “good”. By DHJ(k) we know that for any good $V$, there is a (nondegenerate) line inside $A|_V$.

Now suppose we choose a random line $L$ in $[k]^n$ according to the uniform $(k+1)$-composition distribution. We claim that it is entirely within $A$ with probability at least $\delta/[2 M^{k/2} (k+1)^M]$ . To see this, again think of $L$ as being drawn by first choosing $V$, and then drawing a random line in $V$ according to uniform $(k+1)$-composition. With probability at least $\delta/2$ $V$ is good, in which case there is at least one nondegenerate line in $A|_V$. The probability $L$ is picked to be it is at least $1/[M^{k/2} (k+1)^M]$, since the least probability under uniform $(k+1)$-composition of $M$ is at least this quantity (er, or something like that).

• Ryan O'Donnell Says:

1021.1. That was me again, not logged in. I wanted to add that we can complete the whole “line-free implies correlation with simple set” proof exclusively in this uniform $k$-composition measure rather easily, I think. The only thing to really check is that if you choose a $1 - \epsilon$ fraction of the coordinates $S$ randomly, do uniform-$k$-composition on $S$, and then independently do uniform-$k$-composition OR uniform-$(k-1)$-composition on $\overline{S}$, then the overall probability distribution has total variation distance at most $O(\epsilon \sqrt{n})$ from the usual global uniform $k$-composition distribution. Which is, I’m pretty sure, true.

31. gowers Says:

1022. Shelah

While I wait for reaction about how we should go ahead with writing things up (see this comment), I want to continue thinking a bit about whether the proof we have could be Shelah-ized. Terry said in a comment some time ago that he is a big fan of translating things back and forth between combinatorial and ergodic languages. I (in common with just about every mathematician) am a big fan of “completing the square,” by which I mean completing puzzles of the form “A is to B as C is to ??” For example, “corner is to combinatorial line as large grid is to ??” the answer to which turned out, not completely expectedly, to be “${}[2]^k$ for some large $k$“. And that was just one part of a bigger such puzzle: “the Ajtai-Szemerédi proof of the corners theorem is to the triviality that a dense subset of ${}[n]$ contains two distinct points as ?? is to the proof of Sperner’s theorem.”

So here I would like to think about the puzzle, “The colour-focusing proof of the Hales-Jewett theorem is to Shelah’s proof of the Hales-Jewett theorem as the proof we now have of DHJ(k) is to ??”

The reason I am drawn to this is that I once lectured both proofs of the Hales-Jewett theorem and came to a very clear understanding of the relationship between them, which I’ll have to reconstruct, but I remember it as being natural enough that it will be easy to reconstruct. When I have a spare moment, I’ll write something about this on the wiki, but for now let me just say that Shelah’s basic idea can be interpreted as follows: whereas the old proof of HJ(k) used HJ(k-1) to reduce the problem to the almost trivial HJ(2), Shelah’s new proof used HJ(2) to reduce the problem to HJ(k-1). As I say, I’ll justify that at some point, but here I just want to take that thought and have a stab at applying it to DHJ(k) instead.

Of course, I’m not sure that it’s possible to do so, but just having the goal in mind does, I think, suggest some questions. For instance, here’s a place in the proof where DHJ(k-1) comes into the proof of DHJ(k). One first passes to a subspace, which we’ll notate as though it is the whole space, such that the density of $\mathcal{A}$ is almost $\delta$ and the density of $\mathcal{A}$ in ${}[k-1]^n$ is still positive. Writing $\mathcal{B}$ for $\mathcal{A}\cap[k-1]^n$, we use $\mathcal{B}$ to construct a dense set $\mathcal{C}$ of complexity $k-2$ that is disjoint from $\mathcal{A}$. It doesn’t matter too much what all this means: the main point is that once we have a set of complexity $k-2$ we are able to hit it with DHJ(k-1).

So a natural question, if we are trying to Shelah-ize, is whether we can construct some set of complexity 1 (which means it’s hittable with DHJ(2), aka Sperner) and thereby, somehow, create a fliptop subspace inside which $\mathcal{A}$ is dense. Recall that a fliptop subspace is one inside which if you change any coordinate with value k-1 to a k, then you don’t change whether the point x belongs to the set $\mathcal{A}$. So if your set is dense in such a subspace (or rather, dense in the ${}[k-1]^n$ part of the subspace) then you can hit it with DHJ(k-1) and finish off the proof immediately. Because DHJ(k-1) is used just as the inductive step and not as some part of an iterative procedure for getting to the inductive step, a proof of this kind should be far better quantitatively.

In my next comment I shall think about the following question: are there any complexity-1 sets that can be derived from a set $\mathcal{A}$? For now, I don’t care whether we can actually use them.

32. gowers Says:

1023. Shelah.

Continuing on from 1022, a completely simple-minded idea would be to pass to a subspace such that $\mathcal{A}$ is dense in ${}[2]^n$ and still had density almost $\delta$. (I’m just going to assume that that’s fine.) If we write $\mathcal{B}$ for $\mathcal{A}\cap[2]^n,$ then what set $\mathcal{C}$ can we construct from $\mathcal{B}$?

In the other argument, we looked at all sequences $x$ such that, for every $j$, if you change all $k$s to $j$s then you get a sequence in $\mathcal{B}$. Here, it is clear, we are going to have to do a much more radical collapsing of the sequence. One possibility, which may be nonsense, would be to convert all non-1 coordinates into 2s. In other words, we let $\mathcal{C}$ be the set of all x such that the 1-set of x is equal to the 1-set of some sequence in $\mathcal{B}$.

In the other argument, we then argued that $\mathcal{A}$ is disjoint from $\mathcal{C}$ if it does not contain a combinatorial line. What happens here? Well, bearing in mind that (using the Shelah proof as a guide) we are not trying to go straight for a combinatorial line, but rather for a fliptop subspace, we should perhaps assume as our hypothesis that $\mathcal{A}$ does not contain such a subspace.

Turning that round, we would want to prove that if $\mathcal{A}$ intersects $\mathcal{C}$ then it contains a fliptop subspace. (This doesn’t sound true, but let’s see what’s wrong with it first.)

For $\mathcal{A}$ to intersect $\mathcal{C}$ we need a sequence $x\in\mathcal{A}$ such that if you convert all its non-1s into 2s, then you get a sequence of 1s and 2s that also belongs to $\mathcal{A}$. Why should that be interesting? I don’t know.

But suppose instead that we managed to find quite a large subspace S of such points. Then if x belonged to $\mathcal{A}\cap S$ and had a variable coordinate equal to 2, we would … no, I don’t see where this goes.

Part of the problem is that I seem to be going for the fliptop subspace to be contained in $\mathcal{A}$, whereas what I should be aiming for is merely that $\mathcal{A}$ is dense in the fliptop subspace (which actually seems to be a flipbottom subspace).

I’m going to stop this comment here — I hope I’ve demonstrated the flavour of the argument that I think might conceivably exist.

• gowers Says:

1023.1 I should have added the remark that the set $\mathcal{C}$ has complexity 1.

33. Terence Tao Says:

1024. Austin’s proof

I’ve written down an informal (and sketchy) combinatorial translation of the k=3 case of Austin’s argument at

http://michaelnielsen.org/polymath1/index.php?title=Austin%27s_proof

It’s actually very much in the style of the proof of the triangle-removal lemma, especially if one takes an analytic perspective rather than a combinatorial one and talks about decomposing functions rather than regularising graphs. In particular, it’s best to avoid thinking in the classical language of vertices and take a more “point-less” approach in the spirit of probability theory or ergodic theory (or operator algebras, for that matter), thus for instance one should think about averages of products of functions rather than counting triangles etc. One reason for shifting one’s thinking this way is that it allows one to think about strong stationarity in a civilised way, whereas this concept becomes really difficult to work with if one insists on holding on to concrete vertices and edges. (In classical combinatorial language, being strongly stationary is like saying that a certain graph is a “statistically equivalent” to the subgraph formed by restricting of its vertex set from a big space such as [3]^n to a randomly chosen subspace; but defining what “statistically equivalent” means precisely is sort of a pain unless one decides to give up on vertices altogether.)

It’s also clear that this proof will end up being messier than the pure density-increment proof (though cleaner than a finitisation of the Furstenberg-Katznelson proof), and give worse bounds, in large part because one has to apply Graham-Rothschild (with parameters given in turn from a Szemeredi-type regularity lemma!) to get the initial strong stationarity properties. It looks possible to use density increment arguments as a substitute for Graham-Rothschild, but this is still going to have quite lousy bounds.

• Terence Tao Says:

1023.1 Information-theoretic approach

Hmm, it occurs to me now that the information-theoretic approach to the regularity lemma that I laid out in

http://front.math.ucdavis.edu/math.CO/0504472

(where I “regularise” random variables using Shannon entropy) is actually well suited to this problem, and may lead to a relatively slick (albeit alien-looking) proof. What I may do at some point is begin by showing how by playing around with conditional Shannon entropies of various random variables, one can establish the triangle removal lemma, and I believe that this proof may extend relatively painlessly to the case under consideration. More on this later…

34. Ryan O'Donnell Says:

1025. Always the uniform-ordered-partition distribution.

Following up on my comment #1006, I feel somewhat sure that we can work entirely within the uniform-ordered-partition distribution (i.e., “equal-nondegenerate-slices”). I’m going to try to write it down now to be sure, but if it’s correct, it should mean that “DHJ(k)=>Varnavides”, “DHJ(k)=>multidimensional-DHJ(k)”, and hopefully also “line-free sets correlate with simple sets” can all be proved with simple arguments.

More precisely, they will be the same simple arguments we have now, but with no annoyances due to all the tedious passing back and forth between various distributions via subspaces.

• gowers Says:

1025.1 Wow — I very much hope you’re right about this. I suppose the bit that would be hardest (but not obviously impossible) would be “simple sets can be almost partitioned into subspaces”. In the uniform measure you are then done, but in other measures you can’t instantly argue that a set of density $\alpha$ inside the simple set must have density almost $\alpha$ on one of the subspaces.

But that could well be just another little task to add to your collection. Now the idea would be to prove that the measure on a simple set can be written as a positive linear combination of the measures on subspaces. If that too can be done cleanly in the UOPD/ENS then we are in fantastic shape.

35. gowers Says:

1026. Does triangle removal imply DHJ(3)?

A wild thought this one, especially as it’s close to an idea I had several years ago and couldn’t get to work. But with the benefit of a bad memory (something that is very helpful in mathematics, in my view) I now can’t see what is wrong with it.

The idea is sort of ridiculous. Let’s pass to a subspace in which, notating it as ${}[3]^n$, $\mathcal{A}$ has average density at least $\alpha$ in the slices up to $m$, where $m$ is much smaller than $n$. Sorry — by that I mean slices where the 1-set and 2-set are of size at most $m$.

The hope now is that the disjointness graph becomes dense (because two sets of size $m$ are almost always disjoint), so we can just apply the usual triangle-removal lemma. My one reason for not completely dismissing this idea is that when I tried it all those years ago I did not have the concept of equal-slices measure to play with.

Unfortunately, it doesn’t become dense. Only the UV-part of it is dense (because it is just the 1-set and the 2-set that I am assuming to be small).

Let me ask a slightly different question, which probably has a fairly easy answer, and that answer will probably kill off this little idea. Given $n$ points in ${}[3]^N$ (where $N$ can be as big as you like), how many combinatorial lines can they contain?

• jozsef Says:

Well, if you don’t care about the dimension, then the number of lines might be $cn^2$ Just take 0-1 sequences as i zeros followed by n-i ones (for every i) and 0-2 sequences where the zeros are followed by two-s. For any pair of such 0-1 sequences there is a third 0-2, giving a line. So this number itself won’t show that your plan won’t work.

• gowers Says:

I don’t understand — large should mean $cn^3$ surely? Or perhaps $Cn^2$ but with the three bipartite graphs dense.

• jozsef Says:

Probably I don’t understand the question. Isn’t it so that two points define the line uniquely. In any case, my example won’t work.

• jozsef Says:

As it happened before, I didn’t read your post carefully. It is an extremely busy period for me; heavy teaching, committees, etc. So, I’m just writing those “half baked remarks” as Boris would say. I will try to refrain from that in the future.

36. Ryan O'Donnell Says:

1027. A single distribution?

In preparation for trying to justify what I wrote in 1025, I have written this document explaining the distribution. I’ll wikify it if and only if this scheme works out and proves helpful :)

• Ryan O'Donnell Says:

1027.1 I wrote some more here but I’ve gotten too sleepy to finish tonight. I’ll try to finish tomorrow and explain it properly in a post, but I have to go out of town tomorrow too…

37. Klas Markström Says:

1028.
I will make my first visible visit on this side of the project by asking one of the stupid questions discussed in the metathread.

Regarding the question from 1026. As far as I understand any pair of points specify a unique line. if that is the case n points can specify at most Cn^2 lines.

Now if we have a set of points and a set of lines where every pair of points specify a unique line we have the beginnings of a projective plane, but a projective plane has the same number of points and lines.

So is there another known nice family of combinatorial objects which instead maximises the gap between the number of points and lines?

38. gowers Says:

1029

Klas, your “stupid” question has exposed the stupidity (without inverted commas) of mine — it’s not what I meant to ask. Let me have another go.

The way we prove Sperner using equal-slices measure can be thought of like this. We partition the measure on ${}[2]^n$ into measures where the set of combinatorial lines is dense. Each measure is obtained by taking a random permutation of ${}[n]$ and taking all intervals in that permuted set, and then any two points with non-zero measure define a combinatorial line.

So a vague version of what I am trying to ask is whether we can somehow find dense structures inside ${}[3]^n$ and play a similar game, this time exploiting the corners theorem. However, I don’t have a sensible precise version of this question, which perhaps suggests that the answer is no.

39. gowers Says:

1030. Shelah

Another idea. Instead of struggling to preserve the density of $\mathcal{A}$ by constructing just one fliptop subspace, perhaps we can partition the whole of (or almost the whole of) ${}[k]^n$ into fliptop subspaces. Then in at least one of them $\mathcal{A}$ would have density almost $\delta$ and we would be done by induction.

Actually, that’s not quite right unless we’re rather careful, because in order to apply induction we need $\mathcal{A}$ to be dense in the ${}[k-1]^m$ part of the fliptop subspace. Let me not worry about this for the time being.

Instead, I’d like to think about whether some kind of argument like the one we’ve used for partitioning 1-sets into subspaces could do the job here. I can’t see it in advance so I’ll just try to plunge in.

Let m be a medium-sized integer — much smaller than n, but much larger than 1 — and write ${}[k]^n$ as ${}[k]^{n-m}\times[k]^m.$ By the pigeonhole principle we can find a subset $\mathcal{A}_1$ of ${}[k]^{n-m}$ of density $k^{-m}$ such that for every $x\in\mathcal{A}_1$ the set of $y\in[k]^m$ such that $(x,y)\in\mathcal{A}$ is the same. Let that set be $\mathcal{B}_1$, and find a big subspace $S_1$ in ${}[k]^m$ that is fliptop with respect to $\mathcal{B}_1$. Then all the subspaces $\{x\}\times S_1$ such that $x\in\mathcal{B}_1$ are fliptop. (They could be disjoint from $\mathcal{A}$, but we don’t care about this.) Let us include them in our attempted partition and remove them from ${}[k]^n.$ Note that we have removed a positive proportion (depending on $m$) of ${}[k]^n$ as a result of this.

We now partition ${}[k]^n$ according to the part that belongs to ${}[k]^m.$ The result is a number of subspaces of the form $S_y=\{(x,y):x\in[k]^{n-m}\}.$ If $y\notin S_1,$ then we have not thrown away any of $S_y$ and we just repeat the first step. However, if $y\in S_1,$ then things are more complicated, because we have thrown away some of $S_y$ and are therefore not allowed to use that part in any new cell that we want to add to the attempted partition.

At this point I think it helps to imagine that we’ve somehow managed to deal with this problem for several iterations. What position would we expect to be in then? Well, we’d have restricted to some subspace (of codimension $rm,$ where r is the number of iterations completed), and we would want to find a big chunk (which would be a union of fliptop subspaces) to remove. Hmm, it doesn’t seem to work, since by this stage I’m needing to find a big orderly chunk of subspaces inside a set over which I have pretty well no control.

40. Ryan O'Donnell Says:

1031. New distribution.

This writeup is practically the same, but I think it sets things up the best. It’s a bit long, but that was mainly just to convince myself what I was saying is correct.

The gist is this:

The basic objects are length-$m$ lists (ordered) of nonempty sets, where the sets’ union is $[n]$. Such lists are in obvious correspondence with the set of “nondegenerate” points in $[m]^n$ (where a point is degenerate if it does not include each character at least once).

The basic operations are: 1. $\Pi$ = randomly permute the list. 2. $C$ = do a random “coagulation”: pick $i \in [m-1]$ uniformly, and merge the $m$th set into the $i$th set (shrinking the list length to $m-1$).

The basic distribution is: Start with the length-$n$ list $\langle \{1\}, \{2\}, \dots, \{n\} \rangle$. If we first apply a $\Pi$ operation, and then apply $n - k$ consecutive $C$ operations, we get a length-$k$ list, which corresponds to a point in $[k]^n$. This is our basic distribution on strings. It is (one can show) the “equal-nondegenerate-slices” distribution.

Finally, note that if we only did $n - (k+1)$ of the $C$ operations, we’d get a string $\lambda \in [k+1]^n$, which we think of as $([k] \cup \{\star\})^n$; i.e., a combinatorial line (template). If we were to further apply the $C$ operation to this, we’d get one of the $k$ points on this line, uniformly at random. And, note that $C \lambda$ is a string in $[k]^n$ distributed according to our basic distribution.

—-

I’m quite sure that at the very least, Varnavides and Multidimensional DHJ(k) become “trivial” now. The proof of line-free sets in $[k]^n$ correlating with simple sets was already pretty simple assuming these two tools, but I’m hoping it also gets a paragraph proof now.

Of course, I further hope that this helps simplify the “other half” of the proof, namely the density-increment argument. I’d like to think more about it except I’m going to visit family for a few days and will likely not get the chance.

41. gowers Says:

1032. Shelah/wikification

I’ve now put on the wiki a very slight modification of the usual proof of the (colouring) Hales-Jewett theorem, together with Shelah’s proof. The purpose of the exercise is to show just how similar the two proofs really are: one uses HJ(k-1) to get you down to HJ(2), while the other one uses HJ(2), in pretty well exactly the same way, to get you down to HJ(k-1). If I’d been feeling silly enough, I would have interpolated between the two cases and written a general argument for how to use HJ(s) to get you down to HJ(k-s+1), which can quite clearly be done. (Indeed, it’s so clear that it can be done that maybe what’s already there can be thought of as doing it by giving enough examples to make clear what the general case is.)

I bothered to do this partly because I find it historically interesting that it took such a long time for Shelah’s proof to be discovered, and I think that presenting the two proofs side by side in that way makes it really quite mysterious. But my main motive is to try to get people interested in the process of Shelahification of the proof we (probably) have of DHJ(k). If the induction can be turned upside down so easily in the colouring version, I will need a pretty convincing argument to persuade me that it cannot be done for the density version. And if it can be done, it would be really good to do it because the bounds would drop from Ackermann to primitive recursive.

The proofs are on the page about the colouring Hales-Jewett theorem.

• jozsef Says:

Isn’t it so that the iterations give you a k-(k-1) fliptop space, then a (k-1)-(k-2) fliptop space in the smaller cube so on? Then if you check any line, it is fliptop for any consecutive pairs, so it is monochromatic. I was wondering if a statistical variant would work or not; Take an almost fliptop subspace (many neighbours have the same colour) and a smaller almost fliptop subspace so on. At the end if at least one line is pairwise fliptop then we are done.

• gowers Says:

Jozsef, I’m not sure if this is what you’re asking, but on the wiki I presented Shelah’s argument in terms of flipbottom subspaces instead of fliptop ones.

• jozsef Says:

Tim, I just changed the notation there. I should have followed yours. While I think that I understand Shelah’s proof completely, I’m not sure that I see exactly the limits of the technique. For example, it seems to me that we don’t use that the points of every neighboring pair have the same colour. For example if 11111… is blue and 21111… is red in the first fliptop subspace, it won’t make any trouble later as the line defined by the two elements is outside of the next subspace. In general, every point with coordinate 1 has exactly one “semi-neighbour” which is important in the iteration, where the 1-s are replaced by 2-s. But I don’t see an easy way to consider the important pairs only.

• gowers Says:

Yes, I had a similar thought soon after writing up the wiki page. I modified the usual proof so as to ensure, unnecessarily, that it reduced HJ(k) to HJ(2). But the usual colour-focusing proof doesn’t need that full strength. I therefore suspect that it is possible to modify the Shelah proof so that instead of producing a fliptop subspace you produce a subspace with some weaker property. But I haven’t tried to work out any details, or even to convince myself that it’s definitely possible.

42. Gower’s Polymath Experiment - Problem probably solved « Healthy Algorithms Says:

[...] combinatorial proof of the density Hales-Jewitt theorem (DHJ). Big congratulations to them, because the problem is solved, probably. Summarizing why he spent his time on this particular problem, Terry Tao wrote: I guess DHJ is [...]

43. gowers Says:

1033. Shelah

I’m still just throwing out guesses here (and seeing whether I get anywhere with them in real time — no luck so far) and here’s another one.

In the argument we have now, we could present it as saying that we try to find a combinatorial line, and if we can’t then we get correlation with a set of complexity k-2. If we are going to try to Shelah-ize, then we’re very much hoping to find a fliptop subspace (or perhaps, as Jozsef suggests above, something slightly weaker but still sufficient) where $\mathcal{A}$ is still dense. So perhaps we should plunge in and try to construct such a subspace, but instead of being completely determined to succeed, we should hope that either we will succeed or we will get correlation with a set of complexity 1.

The first step of what I was trying to do in 1013, following Jozsef in 1011, was as follows. Write ${}[k]^n$ as ${}[k]^r\times[k]^s$, where $s$ is much bigger than $r$. For each $y\in[k]^s,$ define $\mathcal{A}_y$ to be $\{x\in[k]^r:(x,y)\in\mathcal{A}\}.$ We would like to find $y_1$ and $y_2$ that differ only in a few places where coordinates that were k-1 in $y_1$ are k in $y_2$, such that $\mathcal{A}_{y_1}=\mathcal{A}_{y_2}.$ So far so good, but we would also like $\mathcal{A}$ to have density almost as big as the starting density when restricted to ${}[k]^r\times L$, where $L$ is the unique combinatorial line that contains both $y_1$ and $y_2.$

Now if we can’t do this, then we potentially get an interesting set inside which $\mathcal{A}$ has reduced density. It’s formed as follows: for each pair $(y_1,y_2)$ as above, form the line $L$, and take the union $X$ of all those lines.

Ignoring for now the fact that the union is not an arithmetic operation, so we can’t actually conclude without extra work that if $\mathcal{A}$ has reduced density on every ${}[k]^r\times L$ that it has reduced density on ${}[k]\times X,$ let’s see whether $X$ has any good structure, such as low complexity.

A sequence $z\in[k]^s$ belongs to $X$ if and only if we can find some $j$ and some subset $Z$ of the j-set of $x$ such that when we overwrite $Z$ with (k-1)s or ks, we get two points $y_1$ and $y_2$ with $\mathcal{A}_{y_1}=\mathcal{A}_{y_2}.$ Hmm, that doesn’t seem to have low complexity. Back to the drawing board.

• jozsef Says:

Just a technical remark; when you cut $[k]^n$ into two parts the you can play with the densities. The density of A is c, say. If any $A_x$ or $A_y$ had density larger than $c+\epsilon$ then we can go there and repeat. That means that the density of x that $A_x$ is at least $c-\epsilon_2$ is almost 1, I think.

44. gowers Says:

1034. Shelah

Time for a very vague thought indeed. Maybe the mistake in our thoughts so far about a density version of Shelah’s proof is that we’re still using colourings too much. Is there some “density equivalent” of a fliptop subspace?

To get a handle on this question, let us try to focus on the property that’s good about fliptop subspaces: that if you ever find a line with all the points up to k-1 of the same colour, then the whole line must be monochromatic. The obvious density version of that would be that if the first k-1 points belong to $\mathcal{A}$ then so does the kth.

One immediate small observation is that for this to be the case, all we need is that if you change some (k-1)s to ks, you don’t go out of $\mathcal{A}$. We don’t mind if this operation takes you in to $\mathcal{A}$. But is there perhaps some much more general circumstance under which we can deduce that $\mathcal{A}$ contains a line from the fact that $\mathcal{A}\cap[k-1]^n$ does? Or perhaps from the fact that $\mathcal{A}\cap[k-1]^n$ contains lots of combinatorial lines? In the latter case, what is the set of points that $\mathcal{A}$ will be forced to avoid?

I have to go now, but at first glance it still doesn’t look as though there are any super-low-complexity sets floating about.

45. Terence Tao Says:

1035. Austin’s proof w/o Ramsey theory or density increment

I’ve written up a more or less complete proof of DHJ(3) using a (rather abstract) triangle-removal approach, based on Austin’s proof, at

http://michaelnielsen.org/polymath1/index.php?title=Austin%27s_proof_II

I found that stationarity is not used as much as I had previously thought, and can in fact be obtained by a simple energy increment argument and the pigeonhole principle rather than more high-powered Ramsey-theoretic tools such as Graham-Rothschild.

The argument is a bit tedious though. It follows the approach to triangle removal used for instance in my reworking of the hypergraph removal lemma, in particular, heavily using lots of conditional expectation computations, rather than the traditional language of cleaning out “bad” cells and then counting both “non-trivial” and “trivial” triangles in the surviving cells. But there are some distracting complications due to the multiplicity of scales, and the presence of some additional random choices (basically, one has to choose some random index sets I to flip from 0 to 2, etc). One has to make the whole argument “relative” or “conditioned” to these random choices, which makes the argument look stranger than it normally does.

The bounds seem to be tower-exponential in nature – not so surprising, being based on triangle removal. Amusingly, if one used uniform measure rather than equal-slices measure, one would be forced to make a double tower-exponential, because of the fact that each new scale would have to essentially be the square root of the previous. Hooray for equal slices measure!

I’m pretty sure the density-increment argument is going to end up being shorter and simpler than this one, but I’m putting the triangle removal proof here for completeness.

I’ll be travelling overseas shortly, and so I’ll probably be contributing very little for the next week or two.

46. Polymath « Maxwell’s Demon Says:

[...] even though it touches on my central topics of maths and communication.  Now with its preliminary success feels like a good time to do so. [...]

47. gowers Says:

1036. Graham-Rothschild

A quick pair of closely related questions: is the density version of Graham-Rothschild known, and do we now have the tools to prove it (whether or not it is known)?

For the benefit of anyone who can’t be bothered to look it up on the wiki, the Graham-Rothschild theorem is like the Hales-Jewett theorem except that the basic objects are combinatorial subspaces of some fixed dimension rather than points. For instance, a special case is that if you colour the lines in ${}[3]^n$ with finitely many colours, then there is a combinatorial subspace of dimension m such that all the lines in that subspace have the same colour — provided n is big enough in terms of m and the number of colours.

It may be that the question is easy. For instance, suppose we think of a combinatorial line in ${}[k]^n$ as a point in ${}[k+1]^n$ with coordinates that belong to the set $\{1,2,\dots,k,*\}.$ If we then find in our dense set $\mathcal{A}$ of such lines an m-dimensional subspace, what does that translate into? It gives us some wildcard sets $Z_1,\dots,Z_m,$ each of which can be converted into an element of this alphabet, and it also gives us some fixed coordinates. The problem is that the fixed coordinates can be equal to $*$, so we do not end up with all the lines in some combinatorial subspace.

At first this looks problematic: given a dense subset of $\{1,2,\dots,k,*\}^n,$ we can’t hope to find a subspace inside it with no fixed coordinates equal to $*$, since $\mathcal{A}$ might consist of all points with at least one coordinate equal to $*$. However, that’s not a problem, because our entire space consists of points with at least one $*$ (since they are combinatorial lines).

So here’s a DHJ(3) variant. Suppose you have a dense subset $\mathcal{A}$ of ${}[3]^n.$ Must there be an m-dimensional subspace S such that all the fixed coordinates are equal to 1 or 2, and every point in S that includes at least one 3 is an element of $\mathcal{A}$?

• gowers Says:

Just realized that the formulation of that last question was a bit careless, since the usual example of a set with roughly equal numbers of all three coordinates is a counterexample to this when m=2. So it’s not instantly obvious what a density version of Graham-Rothschild would even say, but I hope that with the help of equal-slices measure one might be able to formulate a decent statement.

• Randall Says:

Generally, these Ramsey theorems are either “projective” (Schur, Hindman, Ramsey), “affine” (van der Waerden, HJ) or “projective/affine” (Graham-Rothschild, Carlson, CST). Nothing with a projective aspect will typically have a “naive” density version, though I don’t know exactly what you are shooting for here. (Presumably you don’t mean something that the set of words having an odd number of 3s would be a counterexample to.)

• jozsef Says:

Hi Randall, It is an interesting question if these problems have density versions or not. Let’s consider Schur’s thm. Ben Green proved that if a subset of [n], S, has only a few, $o(n^2)$, solutions to x+y=z in S then one can remove o(n) elements from S to destroy all solutions. This might be viewed as a density version of Schur’s theorem. On the other hand I don’t see a simple way to deduce Schur’s theorem from this density version. I tried it as it might give a bound on the triangle removal lemma, but without success.

48. Ryan O'Donnell Says:

1037. Uniform ordered partition measure.

Here is the corrected link from 1031; in other words, add “.pdf” to the broken link. I think this measure will work well with the Graham-Rothschild problem described above; again, I’ll try to write more in a few days when I get back home.

49. jozsef Says:

1038. Subspace implies line

A simple argument shows that DHJ(k) implies d-dimensional subspaces in $[k]^n$ for any dense subset (with large enough n). I remember seeing a wiki article about this, but I was unable to find it this morning. It is also true that a subspace theorem implies DHJ. That looks fairly trivial but let me explain it; I want to show that the following statement is equivalent to DHJ(k): “There is a constant, $c < 1$ that for every d there is an n that any c-dense subset of $[k]^n$ contains a d-dimensional subspace.”
I would like to show the direction that the statement above implieas DHJ(k). The other direction is already wikified.
As before, write $[k]^n$ as $[k]^r\times[k]^s$, where s is much bigger than r. For each $y\in [k]^s$, define $\mathcal{A}_y$ to be $\{x\in[k]^r:(x,y)\in\mathcal{A}\}$.
Let Y denote the set of $y\in [k]^s$, that $\mathcal{A}_y$ is empty. Suppose that $\mathcal{A}$ is large, line-free, and its density is $\delta =\Delta-\epsilon$ where $\Delta$ is the limit of density of line-free sets and $\epsilon < (1-c)\Delta$. We can also suppose that no $\mathcal{A}_y$ has density much larger than $\Delta$ as that would guarantee a combinatorial line. But then the density of Y is at most 1-c, so there is a c-dense set Z $Z=[k]^s-Y$ such that any element is a tail of some elements of $\mathcal{A}$. For every $y \in Z$ choose an x $x\in [k]^r:(x,y)\in\mathcal{A}$. This x will be the colour of y. It gives a $[k]^r$ colouring on Z. By the initial condition Z contains arbitrary large subspaces, so by HJ(k) we get a line in $\mathcal{A}$.

50. gowers Says:

1039 Graham-Rothschild

I’ve got a new attempt at a density version of Graham-Rothschild that might have a chance of being true. I realized that equal-slices measure wouldn’t rescue the previous version, since the set of lines with wildcard set of size approximately $\alpha n$ is a counterexample.

As I write, I realize that my new attempt fails too. Basically, if you’ve got a 2D subspace then you must have lines with wildcard sets of sizes x, y and x+y, so a density version looks as though it would have to imply a density version of Schur’s theorem, which is of course false. (I realize that I am sort of repeating what people have already said in their replies to 1036.)

Maybe one could go down the route Jozsef implicitly suggests and try proving that if $\mathcal{A}$ is a set that contains few m-dimensional subspaces, then one can remove a small number of elements from $\mathcal{A}$ and end up with no m-dimensional subspaces. I’m not sure that I feel like trying this though …

51. Klas Markström Says:

1040.
Since the density of a subset of [k]^n is quite insensitive to the removal of an element, at least for large n, a set with positive density will contain many lines. How many lines can be guaranteed, as a function of the density and n?

From the equal-slices proof of Sperner it is not hard to get a lower bound for DHJ(2) so I’m curious as to what can be said for DJH(3).

52. jozsef Says:

1041. Line-free sets correlate locally with complexity-1 sets

I would like to go back to one important part of the DHJ(3) proof, to analyze it from a slightly different angle. Let us consider our set $\mathcal{A}$, a dense subset of $[3]^n$, as a subset of $({\Bbb Z}/3{\Bbb Z})^n$. Build a graph on $\mathcal{A}$ as follows; The vertex set is $\mathcal{A}$ and two vertices, a and b, are connected iff for c: a+b+c=0 c is also in $\mathcal{A}$. If $\mathcal{A}$ was dense then there are many edges in any typical subspace. Now DHJ is equivalent to the statement that there are two connected elements of $\mathcal{A}$ with the same set of 3-s and that one’s set of 1-s contains the 1-s of the other. This model leads us to a density Sperner problem.
It just turned out that I have to go somewhere right now. I will come back in a few (4+) hours.

• Ryan O'Donnell Says:

1042. The distribution.

Hmm. In fact, it may be even better to view strings in $[k]^n$ being generated in the time-opposite way from the one I described. Specifically, equal-slices and its nondegenerate variant are the same as the following Polya-Eggenberger urn process:

[Non-degenerate version:] Start with one of the $k!$ permutations of the string $123\cdots k$. Repeat the following $n-k$ times: pick a random character in $[k]$ with probability proportional to the number of times it appears already in the string. Now insert that character into a random place in the string.

[Equal-slices version, I think:] Same, except: a) start with the empty string; b) for the purposes of proportionality, pretend there is a phantom copy of each of the $k$ characters.

This yet-another viewpoint on the equal-slices distribution helps with making “subspace arguments” (which the uniform distribution was good for): the point is, if you do a $k$-color Polya urn process for $M + n$ steps and $n \gg M$, then the final distribution hardly depends at all on what happened in the first $M$ steps.

Will write more when I get back in two days.

53. Gil Kalai Says:

1042. Density increasing and an analog problem

(This is a little off topic) Let me mention a problem which I thought of as analogous to Roth/cap set where the gaps between lower and upper bounds are similarly shocking and the current density increasing arguments cannot help; (It is related to old work of mine with Kahn and Linial and also to some more recent work with Shelah that we did not publish.)

you have a subset A of $\{0,1\}^n$ of density c and you would like to find a combinatorial subcube F of dimension 0.9n so that the projection of A to F is of large density say 0.99. In other words, we want to find a set of 0.9n coordinates so that for a fraction 0.99 of the vectors supported on this set we can find a continuation on the other coordinates that is in A. (We usually talk here about restriction to a subcube/subspace and not about projections. But traditionally sections and projections are not unrelated.)

By a density increasing argument doing it one coordinate at a time it was known from the late 80s that this can be achieved if c is $n^{-\alpha}$ for $\alpha=1/5$ or so. A conjecture by Benny Chor asserts that $c=0.999^n$ is good enough!

I think it is a little analogous to Roth (or cap set)

a few points:

1) The proof is also by a slow density increasing argument (you reduce the dimension by one every time) and there are examples the such an argument cannot be improved.

2) There are some conjectures (by Friedgut and others) which may explain why we can get the density down to $n^{-\beta}$ for every $\beta$ maybe even $\beta=\gamma \log n$ but no plans beyond it.

3) There are alarming examples by Ajtai and Linial of Boolean functions descibed by certain random depth 3 circuits (that Ryan already mentioned) that may (or a varian of) give a counter example. It is complicated to check it.

I admit that the analogy with density increasing argument for Roth-type problems is not very strong: this problem is about projection to subcubes and there it is about restrictions to subspaces or similar creatures; But there may be some connecion.

In particular I would try subsets described by low depth small size circuits (with operations over {0,1,2}) as candidates for counter examples for the most ambitious conjectures regarding Roth and cap sets.

(On the positive side maybe more sophisticated density increasing arguments of the kind we talk about here can be used in this problem.)

54. Kristal Cantwell Says:

1043. Graham-Rothschild

From 1036: ” A quick pair of closely related questions: is the density version of Graham-Rothschild known, and do we now have the tools to prove it (whether or not it is known)?
For the benefit of anyone who can’t be bothered to look it up on the wiki, the Graham-Rothschild theorem is like the Hales-Jewett theorem except that the basic objects are combinatorial subspaces of some fixed dimension rather than points. For instance, a special case is that if you colour the lines in with finitely many colours, then there is a combinatorial subspace of dimension m such that all the lines in that subspace have the same colour — provided n is big enough in terms of m and the number of colours. ”

I doubt there is density version of the Graham-Rothschild theorem. If one fixes a coordinate and deletes all lines that has a constant coordinate at that point then that will only lower the number of lines by a constant factor but it will prevent the formation of any two dimensional space with all its lines monochromatic as in that case the intersection of the moving coordinates of all of the lines is the null set.

55. Kristal Cantwell Says:

1044. Density theorems and Ramsey theorems

Density theorems are stronger than Ramsey theorems. Any density theorem can be converted to a Ramsey theorem as follows. One sets the density less than 1/r and gets a large enough configuration so that the if the density is greater than 1/r there will be the desired result than for any r coloring there will be one color with density 1/r or more and we will have a monochromatic configuration in that color.

56. Kristal Cantwell Says:

1045. Density Schur Theorem

I have just remembered the counterexample for a density version of Schur’s theorem. One just takes the odd numbers, they have density 1/2 but since the sum of two odd numbers is even there are no triples a,b,c in the set such that a+b=c.

• gowers Says:

The fact that the density version of Schur’s theorem is false implies that density Graham-Rothschild is false — see 1038.

57. gowers Says:

1046. Bad news and good news.

Bad news: our proof of DHJ(3) is wrong!

Good news: it isn’t too hard to fix.

But it’s still quite amusing that nobody noticed that Substep 2.2 of the write-up on the wiki of the crucial lemma that a dense 1-set can be almost completely partitioned into subspaces was nonsense as presented. I’ve left the old version there as a cautionary tale. Fortunately, the reason for the mistake was that I had translated the corresponding part of Terry’s argument in a sloppy way — the argument itself was sound. But it did give me a bit of a scare …

• Randall Says:

1046.1 Good news and polymath….

Well, perhaps it will be somewhat reassuring that the changes you have made seem to be only adding more detail, and that, from the ergodic perspective at least, it was natural to omit said detail; loosely translated (and unless I misunderstand), what you seem not to be checking more carefully is that when you remove the fibers (over some factor) that recur under a subspace, what you have left is still measurable with respect to that subspace. So perhaps the good news is actually that polymath is like math with a net. (In which case the bad news is the danger than everyone involved will internalize that piece of good news to the point where it becomes a problem.)

• Randall Says:

I of course meant (a) “now” to be checking more carefully, and (b) measureable with respect to that “factor”.

58. gowers Says:

1047. Wikification

I have now wikified an abstract version of the iterative argument that was wrong before. From the abstract point of view the property that I was forgetting about was the all-important sigma-algebra property. This version should be fine now and is intended to be sufficiently general to deal with DHJ(k) as well. The argument could be sharpened up a bit towards the end — I ran out of energy.

59. gowers Says:

Metacomment. I’ve created a thread for comments 1050-1099, so this one should draw to a close.

60. Michael Nielsen » The Polymath project: scope of participation Says:

[...] week, Gowers announced that the problem was “probably solved”. In fact, if the results hold up, the project [...]

61. Concluding notes on the polymath project — and a challenge « What Is Research? Says:

[...] project, as of February 20, 2009. The project, which began around February 2, 2009, has now been declared successful by Gowers. While the original aim was to find a new proof of a special case of an already proved theorem, the [...]

62. An Open Discussion and Polls: Around Roth’s Theorem « Combinatorics and more Says:

[...] (as a little spin off to the polymath1 project, (look especially here and here)) if you have any thoughts about where the truth is for these problems, or about how to [...]

63. A gentle introduction to the Polymath project « The Number Warrior Says:

[...] to an online collaborative approach, then kicked things off with a blog post. Six weeks later, the main problem he proposed was declared (essentially) solved. However, the project still continues apace, especially at threads at Terry Tao’s [...]

64. Un blog concentra a cientos de matemáticos en la demostración conjunta de un teorema « Francis (th)E mule Science’s News Says:

[...] (probably),” Nature Physics, 5: 237, April 2009 . Hay una entrada en el blog de Tim con el mismo título. Más allá de la demostración la experiencia muestra a los jóvenes que se inician en la [...]

65. sunee Says:

Thank you. best information for me.

66. More polymath projects « Algorithmic Game Theory Says:

[...] polymath projects After the success of the first polymath project (see also here and here as well as the project wiki), there are now [...]

67. Michael Nielsen » Introduction to the Polymath Project and “Density Hales-Jewett and Moser Numbers” Says:

[...] subprojects of the Polymath Project progressed quickly. On March 10, Gowers announced that he was confident that the polymaths had found a new combinatorial proof of DHJ. Just 37 days [...]

68. Why do we still publish research (via) papers?Research cycle research Says:

[...] papers, but platforms like OpenWetWare clearly show that this is doable, and initiatives like the Polymath projects demonstrate that it may even be beneficial to the topic — an aspect that is especially [...]

69. Open Science | Cosmic Variance | Discover Magazine Says:

[...] to the Hales-Jewett theorem — and, several hundred comments later, announced that they had succeeded. Here’s the paper. Buoyed by this success, the group has set up a Polymath Wiki to expedite [...]

70. Drafting proposals in the open – a practical test | Research cycle research  Says:

[...] to go because, to paraphrase Michael Nielsen’s account of Tim Gowers’ synopsis of the initial Polymath project, doing science in the open is to traditional science like driving is to pushing a [...]

71. Drafting proposals in the open – a practical test Says:

[...] way to go because, to paraphrase Michael Nielsen’s account of Tim Gowers’ synopsis of the initial Polymath project, doing science in the open is to traditional science like driving is to pushing a [...]