Today I did something for the first time ever that I should have done many times before: I put a paper on the ArXiV. Since I’ve got a blog I thought I’d use it to give the paper a small plug and, more importantly, to let anybody who might already be familiar with the paper know that I have revised it quite a bit recently, for the better.

The paper itself is called “Hypergraph regularity and the multidimensional Szemerédi theorem.” At the bottom level, the basic idea of the paper is due to Ruzsa, Szemerédi and Rödl. Ruzsa and Szemerédi started the ball rolling with a short and very clever argument that showed that Szemerédi’s famous theorem on arithmetic progressions, in the case of progressions of length 3, could be deduced from Szemerédi’s almost as famous *regularity lemma*, a remarkable result that allows any graph to be partitioned into a bounded number of pieces, almost all of which “behave randomly.”

Rödl then took the further step of noticing that a corresponding proof would work for the general case of Szemerédi’s theorem if one could generalize the regularity lemma from graphs to hypergraphs. (Whereas a graph consists of pairs of elements from some set, a k-*uniform hypergraph* consists of k-tuples.) However, turning this observation into a formal conjecture was not completely straightforward because the obvious generalizations of the regularity lemma from graphs to hypergraphs are either too strong to be true or too weak to be useful. Rödl formulated a generalization that avoided these defects (and which, while less obvious, is more natural in a Gromov-type sense).

This led to a clear programme for obtaining a new proof of Szemerédi’s theorem. Unfortunately, carrying out the details of this programme was much harder than one might have expected given the simplicity of the argument of Ruzsa and Szemerédi. Nevertheless, Rödl, with the help of Frankl, managed to do it for progressions of length 4.

Interest in the problem was heightened when Jozsef Solymosi noticed that Rödl’s programme would also have the multidimensional Szemerédi theorem as a consequence. This states that, for any finite set and any , there is an such that every subset of of density at least contains a subset of the form .

A few years ago I managed to complete the Rödl programme, and so, independently, did Rödl, together with Nagle, Schacht and Skokan. Our proofs were interestingly different, however. Their generalization was more directly inspired by Szemerédi’s regularity lemma as it is usually stated, whereas mine replaced the notion of a “regular pair” by that of a quasirandom bipartite graph. For dense graphs, these two notions are equivalent, but when one generalizes to sparse hypergraphs, which necessarily occur in the argument, then they diverge. I came upon the idea of using quasirandomness as a result of thinking about generalizing to hypergraphs some of the analytic notions that had arisen in an earlier proof I had given of Szemerédi’s theorem.

The story of how the paper reached the form it is in now is an interesting one. While it can be exhilarating to obtain a result that needs a long and complicated proof, it is also pretty depressing to write such a result up. I myself find it almost impossible to do so without introducing errors of what one might call medium seriousness: they don’t cast doubt on the proof but they do necessitate a boring amount of rewriting. (Why do they not cast doubt on the proof? That is an interesting question and I can do no better than quote Gian-Carlo Rota: “There are two kinds of mistakes. There are fatal mistakes that destroy a theory, but there are also contingent ones, which are useful in testing the stability of a theory.” I would substitute the words “robustness” and “proof” for “stability” and “theory” but otherwise the second sentence captures very well the idea that small errors have an important positive side to them.)

Something like a year after I put the preprint on my home page, Yoshiyasu Ishigami emailed me to let me know that he had found a fairly serious mistake — that is, patchable, but not trivially patchable. I was very busy at the time and couldn’t face thinking about it for a while, but then Terence Tao came up with another proof (more along the lines of that of Nagle, Rödl, Schacht and Skokan but quite a lot simpler) and I realized I had better hurry up with the task.

The result was a big rewriting: when one writes a complicated paper, the last thing one wants ever to have to do is read it over a year later. But if one does, one often finds that one can improve it a lot. In this case, I realized that I could do much better than patching up the old proof: I could give a new and simpler proof. It was based on the same ideas I mentioned earlier, and the same ideas for making those ideas work, but at about the third or fourth level of the hierarchy it was very different. (But I would call it the same proof.)

Once I had done that I thought I had better submit it (something I had not done previously because I had been too lazy to do the long and tedious task of carefully checking it). I can’t remember exactly when, but quite some time ago it came back from the journal, accepted, but with large numbers of comments from two referees. Another purpose of this post is to thank them very much for the effort they put in. (If you are one of the two referees, then thank *you*. You have helped to make the paper far better than it would otherwise have been.)

Again I hoped I could get away with minor patches here and there, but this time the biggest problem with the paper was that my “new and simpler proof” was clearly not found simple by the referees. And, on rereading it myself after having had enough time to forget most of it once again, I too found it far from easy. Somehow I had not managed to convey the basic simplicity of the argument. So I have ended up substantially rewriting the paper yet again. I have made the notation less neat but easier to interpret. I have also added, at the suggestion of one of the referees, a new section, which turned out to be rather long, in which I illustrated the main arguments on a small example. This was a very good thing to do, not just for the reader’s sake but also for my own, as it hugely helped me to understand my own argument again and led to a substantial further simplification of the presentation.

In fact, out of the experience comes a tip that I wish I had followed. The main difficulty in writing up the proof was that I had a method that I knew how to apply in any given case, but I needed to describe it in full generality. That meant that I had to formulate a rather complicated inductive hypothesis. That is one of those tasks that “ought to be routine” and there is a very strong temptation to proceed as follows: guess the hypothesis, try to prove it, find that the hypothesis isn’t quite strong enough or general enough, modify the hypothesis, try to prove it, and so on until you finally hack out a proof. Instead, one should proceed as follows: do a small but sufficiently general case and then read off the hypothesis from that; if you’re not confident that your case is sufficiently general, then do a bigger one. There’s a kind of laziness that makes one not want to do things the second way, but my experience on this paper has taught me that a bit of effort at this stage can save a huge amount of effort at a later stage. With this paper, it is only very recently that I have gone about things the right way and the result is a small but extremely helpful reorganization of the induction that makes it work far more smoothly and comprehensibly.

The paper, by the way, can be found here. I’m not asking anyone to do my dirty work for me, but if you happen to notice an error then I’d rather know sooner (while the argument is once again fresh in my mind) than later (when the thought of looking at the paper will have gone back to being depressing — though I hope less depressing now that the paper is better organized).

One final comment: if you look it up on the ArXiV, you will see that the paper is 53 pages long. If that puts you off, bear in mind that a large proportion of that length is not strictly needed for the proof: it is just there to make the proof easier to understand than it would have been if the paper had been half the length.

October 17, 2007 at 11:55 am |

Great to see you are publishing on arxiv. Are you planning to do any work on mathematical foundations? I found your comments on Wittgenstein’s philosophy off mathematics & formalism in “very short” intriguing. I would love to see you compare this approach to Lakoff/Dehaene’s naturalism and Penrose’s Platonism. Dehaene’s “number sense” i found a marvelous book, and he is really harsh on formalism:

“If educational psychologists had paid enough attention to the primacy of intuition over formal axioms … a breakdown without precedent in the history of mathematics might have been avoided.”

“In the 1970s … a new mathematical curriculum was designed that imposed a heavy burden of obscure axioms and formalisms on pupils. “

Bourbaki reasoned: “why let pupils lose precious years solving simple, concrete arithmetic problems, when abstract group theory summarizes all such knowledge in a much more concise and rigorous way?”

Answer: “The child’s brain … is a structured organ that acquires facts only insofar as they can be integrated into its previous knowledge. It is well adapted to the representation of continuous quantities and to their mental manipulation in an analogical form. Evolution never prepared it, however, for the task of ingurgitating vast systems of axioms, nor of applying lengthy symbolic algorithms.” (p241)

October 18, 2007 at 1:53 am |

Bourbaki reasoned: “why let pupils lose precious years solving simple, concrete arithmetic problems, when abstract group theory summarizes all such knowledge in a much more concise and rigorous way?”I’m not sure whether it is you or Dehaene who is attributing this reasoning to Bourbaki. But in any case, it is an unfair attribution. Bourbaki was concerned with professional mathematics, not K-12 educational psychology.

Evolution never prepared it, however, for the task of ingurgitating vast systems of axioms, nor of applying lengthy symbolic algorithms.”Lengthy symbolic algorithms = “Old Math” par excellence.

As for “vast systems of axioms”, what is the point of an abstract axiomatic presentation unless the axioms are simple, and few in number?

October 20, 2007 at 5:09 pm |

The quote was a direct quote from Dehaene. Bourbaki was not one person, but a movement. Surely mathematicians, including Bourbaki, have an influence on mathematics education? They aren’t all ‘men who love only numbers’. Some actually love children and want to help them with their education.

If you take all the mathematics that school children encounter the axiomatic system is vast — at least from a child’s perspective. It also seems pointless (to the child) — if you are totally at ease with intuitive arithmetic why bother with dry, meaningless axioms? (Of course properly motivated adults, and some children, get to see the point — but can, or should, most children?)

You make a good point with Old Maths = Lengthy Symbolic Algorithms. But Dehaene wants to remove that aspect of Old Math. So instead of ‘long division’, or even learning the multiplication tables, he suggests — use the calculator.

November 9, 2007 at 12:23 pm |

are you an endorser?

February 13, 2008 at 1:39 pm |

Apologies for what may seem like an out-of-place comment : I think

I have spotted a mistake in one of Dr. Gowers’ papers online (see below)

and although he says that “comments on papers are most welcome”

I did not understand just how or where I should make that criticism

of mine known.

Regards,

Ewan Delanoy

Criticism on the paper “A new proof of Szemerédi’s theorem”

at http://www.dpmms.cam.ac.uk/~wtg10/papers.html :

In the proof of lemma 2.3. it is stated that “the congruence classes

can be partitioned into sets $P_j$ of consecutive elements

with every $P_j$ of cardinality between

$\lceil st/2N \rceil$ and $\lfloor st/N\rfloor$”. This fails when

the congruence class has odd cardinality and

when both those bounds are equal to 2 (when

$r=20,s=50,N=60$ for example).