Today I did something for the first time ever that I should have done many times before: I put a paper on the ArXiV. Since I’ve got a blog I thought I’d use it to give the paper a small plug and, more importantly, to let anybody who might already be familiar with the paper know that I have revised it quite a bit recently, for the better.
The paper itself is called “Hypergraph regularity and the multidimensional Szemerédi theorem.” At the bottom level, the basic idea of the paper is due to Ruzsa, Szemerédi and Rödl. Ruzsa and Szemerédi started the ball rolling with a short and very clever argument that showed that Szemerédi’s famous theorem on arithmetic progressions, in the case of progressions of length 3, could be deduced from Szemerédi’s almost as famous regularity lemma, a remarkable result that allows any graph to be partitioned into a bounded number of pieces, almost all of which “behave randomly.”
Rödl then took the further step of noticing that a corresponding proof would work for the general case of Szemerédi’s theorem if one could generalize the regularity lemma from graphs to hypergraphs. (Whereas a graph consists of pairs of elements from some set, a k-uniform hypergraph consists of k-tuples.) However, turning this observation into a formal conjecture was not completely straightforward because the obvious generalizations of the regularity lemma from graphs to hypergraphs are either too strong to be true or too weak to be useful. Rödl formulated a generalization that avoided these defects (and which, while less obvious, is more natural in a Gromov-type sense).
This led to a clear programme for obtaining a new proof of Szemerédi’s theorem. Unfortunately, carrying out the details of this programme was much harder than one might have expected given the simplicity of the argument of Ruzsa and Szemerédi. Nevertheless, Rödl, with the help of Frankl, managed to do it for progressions of length 4.
Interest in the problem was heightened when Jozsef Solymosi noticed that Rödl’s programme would also have the multidimensional Szemerédi theorem as a consequence. This states that, for any finite set and any , there is an such that every subset of of density at least contains a subset of the form .
A few years ago I managed to complete the Rödl programme, and so, independently, did Rödl, together with Nagle, Schacht and Skokan. Our proofs were interestingly different, however. Their generalization was more directly inspired by Szemerédi’s regularity lemma as it is usually stated, whereas mine replaced the notion of a “regular pair” by that of a quasirandom bipartite graph. For dense graphs, these two notions are equivalent, but when one generalizes to sparse hypergraphs, which necessarily occur in the argument, then they diverge. I came upon the idea of using quasirandomness as a result of thinking about generalizing to hypergraphs some of the analytic notions that had arisen in an earlier proof I had given of Szemerédi’s theorem.
The story of how the paper reached the form it is in now is an interesting one. While it can be exhilarating to obtain a result that needs a long and complicated proof, it is also pretty depressing to write such a result up. I myself find it almost impossible to do so without introducing errors of what one might call medium seriousness: they don’t cast doubt on the proof but they do necessitate a boring amount of rewriting. (Why do they not cast doubt on the proof? That is an interesting question and I can do no better than quote Gian-Carlo Rota: “There are two kinds of mistakes. There are fatal mistakes that destroy a theory, but there are also contingent ones, which are useful in testing the stability of a theory.” I would substitute the words “robustness” and “proof” for “stability” and “theory” but otherwise the second sentence captures very well the idea that small errors have an important positive side to them.)
Something like a year after I put the preprint on my home page, Yoshiyasu Ishigami emailed me to let me know that he had found a fairly serious mistake — that is, patchable, but not trivially patchable. I was very busy at the time and couldn’t face thinking about it for a while, but then Terence Tao came up with another proof (more along the lines of that of Nagle, Rödl, Schacht and Skokan but quite a lot simpler) and I realized I had better hurry up with the task.
The result was a big rewriting: when one writes a complicated paper, the last thing one wants ever to have to do is read it over a year later. But if one does, one often finds that one can improve it a lot. In this case, I realized that I could do much better than patching up the old proof: I could give a new and simpler proof. It was based on the same ideas I mentioned earlier, and the same ideas for making those ideas work, but at about the third or fourth level of the hierarchy it was very different. (But I would call it the same proof.)
Once I had done that I thought I had better submit it (something I had not done previously because I had been too lazy to do the long and tedious task of carefully checking it). I can’t remember exactly when, but quite some time ago it came back from the journal, accepted, but with large numbers of comments from two referees. Another purpose of this post is to thank them very much for the effort they put in. (If you are one of the two referees, then thank you. You have helped to make the paper far better than it would otherwise have been.)
Again I hoped I could get away with minor patches here and there, but this time the biggest problem with the paper was that my “new and simpler proof” was clearly not found simple by the referees. And, on rereading it myself after having had enough time to forget most of it once again, I too found it far from easy. Somehow I had not managed to convey the basic simplicity of the argument. So I have ended up substantially rewriting the paper yet again. I have made the notation less neat but easier to interpret. I have also added, at the suggestion of one of the referees, a new section, which turned out to be rather long, in which I illustrated the main arguments on a small example. This was a very good thing to do, not just for the reader’s sake but also for my own, as it hugely helped me to understand my own argument again and led to a substantial further simplification of the presentation.
In fact, out of the experience comes a tip that I wish I had followed. The main difficulty in writing up the proof was that I had a method that I knew how to apply in any given case, but I needed to describe it in full generality. That meant that I had to formulate a rather complicated inductive hypothesis. That is one of those tasks that “ought to be routine” and there is a very strong temptation to proceed as follows: guess the hypothesis, try to prove it, find that the hypothesis isn’t quite strong enough or general enough, modify the hypothesis, try to prove it, and so on until you finally hack out a proof. Instead, one should proceed as follows: do a small but sufficiently general case and then read off the hypothesis from that; if you’re not confident that your case is sufficiently general, then do a bigger one. There’s a kind of laziness that makes one not want to do things the second way, but my experience on this paper has taught me that a bit of effort at this stage can save a huge amount of effort at a later stage. With this paper, it is only very recently that I have gone about things the right way and the result is a small but extremely helpful reorganization of the induction that makes it work far more smoothly and comprehensibly.
The paper, by the way, can be found here. I’m not asking anyone to do my dirty work for me, but if you happen to notice an error then I’d rather know sooner (while the argument is once again fresh in my mind) than later (when the thought of looking at the paper will have gone back to being depressing — though I hope less depressing now that the paper is better organized).
One final comment: if you look it up on the ArXiV, you will see that the paper is 53 pages long. If that puts you off, bear in mind that a large proportion of that length is not strictly needed for the proof: it is just there to make the proof easier to understand than it would have been if the paper had been half the length.