Here then is the project that I hope it might be possible to carry out by means of a large collaboration in which no single person has to work all that hard (except perhaps when it comes to writing up). Let me begin by repeating a number of qualifications, just so that it is clear what the aim is.
1. It is not the case that the aim of the project is to find a combinatorial proof of the density Hales-Jewett theorem when . I would love it if that was the result, but the actual aim is more modest: it is either to prove that a certain approach to that theorem (which I shall soon explain) works, or to give a very convincing argument that that approach cannot work. (I shall have a few remarks later about what such a convincing argument might conceivably look like.)
2. I think that the chances of success even for this more modest aim are substantially less than 100%. So let me be less fussy still. I will regard the experiment as a success if it leads to anything that could count as genuine progress towards an understanding of the problem. However, that success has a quantitative aspect to it: what I am really interested in is progress that results from multiple collaboration. If what actually happens is that several people make intelligent remarks, but without building on each other’s remarks, then what will have been gained will still be worthwhile but it will be less interesting from the point of view of this experiment.
3. I’ve thought of one more ground rule, and also a general guideline that clarifies a couple of the existing rules. The extra rule is that there needs to be some possibility for the experiment to be declared to have finished. What I suggest here is that if a consensus emerges that no further progress is likely, I (or perhaps somebody else) will write a summary of the main conclusions that have been reached, and it will then become public property in the way that a conventional mathematics preprint would be. That is, others would be welcome to make use of it in their own publications if they wanted to. Probably the summary would be posted on the ArXiV (after people had had a chance to suggest changes) but not conventionally published. The guideline, which is designed to stop people going away and doing a lot of work in private — the whole point is that we should all display our thought processes — is that you should not write anything that requires you to go away and do some calculations on a piece of paper. As I said in the other guidelines, if all goes well there will eventually be a place for detailed working out of ideas, but people should do that only after first establishing that that would be welcomed.
In order to encourage the style I’m hoping for, I shall practise what I preach, in a sense anyway, by giving my initial thoughts on the problem in small chunks. That is, I’ll write them as though they were a lot of comments that could in theory have been made by several different people instead of just one. I’ll number them with letters of the alphabet (so that they don’t get confused with the numbered comments on this post).
A. Might it be possible to prove the density Hales-Jewett theorem for by imitating the triangle-removal proof that dense subsets of contain corners?
B. If one were going to do that, what would the natural analogue of the tripartite graph associated with a dense subset be? To put that question more precisely, if we’ve got a dense subset of , then how might we define a tripartite graph in such a way that triangles in correspond to (possibly degenerate) combinatorial lines?
C. It’s perhaps worth pointing out that there is a natural definition of a degenerate combinatorial line: it’s one where the set of coordinates that vary from 1 to 3 is empty.
D. If we just look at the vertex sets and in the case of subsets , we have a very simple definition for an edge: it’s a pair that belongs to .
E. Does split up nicely as a Cartesian product?
F. Well, you can write it as , but that doesn’t help much.
G. Why doesn’t it help?
H. Because if we represent a typical point as with and , then a Hales-Jewett line is not a triple of the form , , . That’s partly because we can’t even make sense of adding , but more fundamentally because even if we could invent a useful definition of addition in this context, we still wouldn’t get a Hales-Jewett line.
I. It was clear in advance that that was not going to work, because the splitting up as a Cartesian product was completely non-canonical.
J. By the way, there’s a nice way of deducing the corners result from density Hales-Jewett. Let me quickly explain it, omitting a few details. Suppose we take a set with the special property that whether or not a sequence belongs to depends only on the numbers of s, s and s in . That is, let be the set of all triples of non-negative integers such that , let be some subset of , and let be the set of sequences such that , where stands for the number of s in . Now suppose you have a combinatorial line . If is the number of variable coordinates in , then must contain a triple of points of the form (where , and are the numbers of s, s and s amongst the fixed coordinates of ).
Now is a large triangular subset of a triangular grid, and one can (if one really feels like it) shear it so that it becomes a right-angled triangle instead of an isosceles triangle. In this way it is not hard to show that the assertion that every dense subset of contains the vertices of an equilateral triangle of the form , , is equivalent to the corners result.
Unfortunately it’s not the case that dense subsets of correspond to dense subsets of (because almost every point in has similar numbers of s, s and s) but there are dodges for getting round that.
K. That’s very helpful, because we almost certainly want whatever definition we come up with for the graph to give rise to the definition that we used for when the subset of is of the special kind that depends only on the numbers of coordinates with the three possible values.
L. All right then. That tells us that for subsets of that just depend on numbers of different types of coordinates, we want edges of the graph to correspond in some sensible way to points in that have a particular number of s and a particular number of s.
M. That suggests that in the general case we want some data about the s and s in to determine , just as the number of s and s determines the numbers of all three types of coordinates. And of course this is trivial: instead of looking at the numbers of the three types, look at the sets where they occur.
N. Indeed, there is an obvious tripartite graph we can define. Let its vertex sets, which I’ll call , and , all be copies of the power set of : i.e., in each vertex set the vertices are subsets of . Given two disjoint sets and , there is a unique point that takes the value in , the value in , and the value everywhere else. Join to if that point belongs to the given subset . And do the obviously corresponding things for the and graphs.
O. Does that work? That is, if we have a triangle in that graph, does it correspond to a combinatorial line?
P. Yes. Here’s a quick proof. Suppose we have , and , and suppose that they form a triangle in the graph. By the way we defined the edges, the sets , and are disjoint. For let’s write for the set of such that . Then in the set we can find a point such that and , and also a point in with and , and finally a point in with and . Now let . Since for every point we have , we find that , and . In other words, the three sequences , and , which all lie in , all take values on , on and on , and is everywhere on , is everywhere on and is everywhere on . This gives us a combinatorial line except in the degenerate case .
Q. Wow. That means we’re done by triangle removal doesn’t it?
R. Unfortunately it doesn’t. The problem is that that tripartite graph is not dense. In fact, it’s not even close to dense. To see why not, just pick a random pair of subsets and . They cannot be joined in the graph unless they are disjoint, and the probability of that happening is absolutely tiny.
S. That sounds a bit bad, but aren’t there sparse versions of Szemerédi’s regularity lemma?
T. I’m not sure it’s all that likely that there’s a version that applies when the density is exponentially small.
U. That’s a misleading way of putting it when the graph itself is exponentially large. If we let , so that there are vertices in each of the three vertex sets, then the density of the graph is going to be of the form for some constant . But that’s still pretty sparse, it has to be said.
V. Here’s another problem. The existing sparse regularity lemmas tend to assume that you’re sitting inside a highly random sparse graph, but are dense relative to that graph. In our case, the natural graph that we’re sitting inside is the one where you join two sets if and only if they are disjoint and ignore whether they define a sequence that belongs to . It’s possible that the graph defined above will turn out to be dense relative to this graph. However, is not by any stretch of the imagination random, or quasirandom. For example, if you look at the part, you can define to be the subset of where all sets have size at most , and the same for , and it’s not hard to check that the density of the part of the graph from to is substantially greater. (Basically, conditioning on sets being a bit smaller than average makes them quite a lot more likely to be disjoint.)
W. Yes, but at least the graph is very concretely defined. Perhaps one could do some kind of regularity argument relative to that exploited the fact that it has a nice simple definition.
X. Here’s a small and slightly encouraging observation. What does a typical edge look like? Well, there’s a one-to-one correspondence between edges and sequences in . Since a typical sequence in has roughly equal numbers of s, s and s, a typical edge in the graph consists of two disjoint sets of size about .
Y. Unfortunately it’s still very sparse if we somehow weight our sets so that almost all of them have size about , since two sets of size have a tiny probability of being disjoint.
Z. Aren’t there tricks we could do to find dense bits of the graph ? For example, we know that if and are random small sets (of size , say) then with high probability they are disjoint. Might there be some way of restricting to “the small part of “?
AA. How could one do that when almost all sequences have roughly equal numbers of s, s and s? The dense set might consist just of well-balanced sequences.
BB. It’s not as impossible as it sounds, because we can use an averaging trick. Suppose we want to restrict , and to small sets. What we can do is this. First we randomly choose a very small subset of . Then for every we randomly choose an element of . Finally, we do everything we want inside . That is, we define a graph by joining two subsets and of if they are disjoint, and define associated sequences by letting and determine the values inside and the random choice of values outside give us the rest. This gives us a structure of the kind we are talking about, and the average density of points in in this structure will be close to the density of if is small enough, because random sequences that are unbalanced inside a random small set will not be especially unbalanced globally.
CC. That’s an interesting thought, but I think it leads to a problem. Suppose our graph consists of pairs of disjoint small subsets of , and similarly for our graph and our graph. Where are the degenerate triangles? You can’t have a sequence with very few s in , very few s in and very few s in all at the same time.
DD. That’s a serious objection, but I still think the averaging trick could be quite useful for this problem.
EE. Here’s a different idea. The big problem is not so much that is sparse as that it is not quasirandom. But perhaps we could make it quasirandom by putting weights on the edges (and perhaps if necessary the vertices as well, since the averaging trick above will enable us to get sets of positive density even with funny vertex weights). The basic idea would be to penalize edges that joined small sets together, because somehow those sets were “cheating” by being small. Is there some natural set of weights that one can just write down?
[The history of this idea is that earlier this afternoon I had got up to Y when I had to go and fetch my daughter from school. While walking there I couldn’t help thinking about the problem and this idea, which I have not explored, occurred to me. So even the effort of writing down my thoughts has been quite stimulating. Because I haven’t yet thought about this idea properly, I’m still in that delicious state of thinking it might be the key to the whole problem. But I’ve also been in this game long enough to know that the chances of that are very small. It’s tempting to investigate it, but I’m not going to as this is also an ideal opportunity to throw out an idea and wait for somebody else to destroy it (or, just perhaps, to develop it) instead.]
FF. A natural way to find such weights would be to start by trying to make every vertex have constant (weighted) degree.
GG. It’s easy to see that you can’t make quasirandom by weighting the edges. For example, suppose you let and both be the set of all subsets of that contain . These two sets have density and yet there is no edge of between them.
[The history of this idea is that although I resolved not to think about the idea put forward in EE., I was on a car journey and I just couldn’t help it. That goes for the next couple of responses to it too, which I could help even less because at that point I was starting to get worried that the whole thing was indeed hopeless.]
HH. That sounds pretty bad.
II. Is it really as bad as all that? I can see that it’s a problem if you have very dense parts of the graph, because then you don’t have a global upper bound for the density of subgraphs when you are trying to prove a regularity lemma. But if you have parts of the graph that are very sparse, or even entirely free of edges, it’s not so obviously a problem because there could still be an upper bound. In other words, perhaps it’s enough for to be “upper quasirandom” rather than quasirandom.
JJ. Another idea is that you just try to do everything relative to . That is, you define the density of an induced subgraph of to be the number of edges in that subgraph divided by the number of edges in the corresponding induced subgraph of . Could that work?
KK. Going back to vertex weights, here are two remarks. First, the averaging trick shows that for more or less any weighting on the vertices, if you can prove the theorem with that weighting (which affects which sets count as dense) then you can prove the theorem with the uniform weighting. I could write a formal lemma to this effect if anybody wanted, but for now I think the informal statement is enough.
The second remark is that a natural weighting on sets is to make the weight of a set equal to . Then the size of a typical set is , and the probability distribution on the collection of all subsets of is the same as the probability distribution of the set of places where a random sequence in takes any given value.
LL. In response to II, I don’t think it’s possible to make even upper quasirandom by introducing edge weights. The idea to do so was roughly that one should penalize edges that join small sets because those were more likely to be disjoint. But making sets small is not the only way to make them more likely to be disjoint. For example, you could make consist of all sets that are slightly biased towards the first half of (meaning that their intersection with the first integers is slightly bigger than their intersection with the second integers) and could consist of all sets that are slightly biased towards the second half of . Of course, one could legislate against that particular example, but the same trick can be played for any partition of into two equal-sized subsets. It seems to be impossible to legislate against this, since, for trivial reasons, every set will be biased with respect to some partitions.
[The history of that idea was that I was sitting in the car waiting for a bus to arrive that had my daughter in it. Once again I just couldn’t help my mind wandering in a Hales-Jewetty direction.]
How might one completely kill off the above approach?
The tripartite graph that we form out of has few triangles in the following sense: every edge in the graph is contained in just one triangle (the other vertex of which is ). If one is to prove the theorem by means of an adaptation of the triangle-removal lemma, one would want the following graph-theoretic statement to be true: let be any subgraph of such that each edge is contained in at most one triangle. Then it is possible to remove edges from to make it triangle free. A counterexample to this more general statement would almost certainly make the approach hopeless. (I say “almost certainly” because one might just be able to use some extra structural property of the graph derived from . But that would change the nature of the approach quite a bit.)
Perhaps there could be convincing arguments that described the most general possible iterative approach to proving regularity lemmas and then proved that no iterative approach could give a regularity lemma that had any chance of helping with this problem. The seeming impossibility of making quasirandom would undoubtedly be a part of such an argument.
That’s all I have to suggest at the moment.