Update: Nets Katz and Michael Bateman have posted a preprint to the arxiv that gives an improvement to the bounds for the cap-set problem. More on this later.
I have been extremely pleased with the response to my first post on the cap-set problem, in that various people have very generously shared their ideas about it. Indeed, Nets Katz has even posted a comment that may give an improvement to the bound by a factor of for some very small (For reasons I have already explained, this would be very interesting if it worked.) This followed a previous comment in which he outlined some arguments that he had found with Michael Bateman that go a lot further than the ones that appeared in the section entitled “Sets with very weak additive structure” in my previous post.
Also, Ernie Croot, in addition to sketching a different very interesting approach to the problem, suggested that I should get in touch with Seva Lev, which I did. Seva has an example of a cap set that is interestingly different from the usual examples. I will discuss it in a moment.
My plan, by the way, is to try to understand the ideas in Ernie’s and Nets’s comments and then explain them in my own words in a future post. But I’m not yet ready to do that, so in this post I want to discuss some other aspects of the problem, including Seva’s example.
Seva’s example of a cap set.
Let me do that first. Regardless of what it may actually turn out to do for us, the example is a nice one. It’s defined as follows. Let We shall think of as where is the field with elements. Additively, this is the same as of course, but it gives us some multiplicative structure that we can use to define our set. The definition of the set, which I’ll call is very simple. Let be an element of that isn’t a square. Then we take the set of points of the form (Note that and are all points of so these points do indeed belong to )
Why is a cap set? Well, if it contained an AP of length 3, then we would have elements such that the following three points all belonged to and formed an arithmetic progression.
Taking the sum of the first and third points and subtracting twice the second (which mod 3 is the same as adding the second) and using the fact that the third coordinates must be in arithmetic progression, we find that But that implies that which contradicts the assumption that was not a square. (More precisely, the only way not to get a contradiction is if so the only APs are degenerate ones.)
What is interesting about this example? I won’t go into much detail, but will just point out a few of its features.
The only method I knew of for producing examples of cap sets until hearing from Seva was the following very simple-minded approach: take small examples and multiply them together. For example, the subset of contains no non-trivial progression, so the set doesn’t either. One can obtain slightly more interesting examples by starting with larger sets: if you have a cap set in of size where then for arbitrarily large you can find a cap set in of size
Seva’s example has density so it does not answer the question of whether a density of is possible. However, it does provide an example of a set that does not have an obvious product structure (and Seva has a proof of a rigorous statement to the effect that it does not have a product structure at all). Moreover, and this was one of Seva’s main points, the Fourier coefficients of the set are such that this set does not have any more bias on a codimension-1 subspace than is guaranteed by the simple Roth/Meshulam argument. In fact, it looks as though something similar is true for all small-codimension subspaces, for some suitable interpretation of “small”. However, because the density is exponentially small in the dimension, the example does not contradict the conjecture (or at least hope) that one can say something interesting after passing to a subspace of codimension that is a power of the logarithm of the density.
Thoughts provoked by Ben Green’s observation.
Recall the following question from the previous post.
Question 1. Let be a cap set in of density For which values of and can one prove that there must be an affine subspace of of codimension such that the density of inside is at least
If one is feeling very optimistic, one might hope to be able to take for some absolute constant and But if that feels a bit too bold (and after Ben’s comment it does seem to be asking quite a lot to prove it, even if it is true) then one could tone down the hope a bit and ask for to be for some absolute constant such as 4, say. Even this would be enough for a very substantial improvement to the Roth/Meshulam bound.
Now I’ve just realized that the question above is different from the one I actually asked. What I asked was the “off-diagonal” version of the question, where the initial assumption is not that is a cap set but that we have three sets and such that contains no triple in arithmetic progression. Ben made the observation that any interesting bound for Question 1 can easily be converted into a bound for the codimension of a subspace contained in where is a dense set. The argument is as follows. If we have a positive answer to Question 1 for and (in the question I took to be an absolute multiple of so I am generalizing slightly here), then let be a set of density Either is all of or it misses out a point In the latter case, misses out 0, so applying Question 1 to the three sets and we find a subspace of codimension such that the density of in is at least The number of iterations we can do of this argument is at most by which time we have passed to a subspace of codimension at most In particular, if and then contains a subspace of codimension at most
Ben made the point that if is too small then this bound is better than the remarkable recent bound obtained by Sanders. He also hinted at the converse point that perhaps one might try to argue in the opposite direction, using Sanders’s ideas (which I shall come to soon) to show that Question 1 has a positive answer as long as isn’t too small.
In order to investigate that possibility, we need to understand a bit about Sanders’s argument, and in particular the lemma of Croot and Sisask on which it depends. Rather than jumping straight in, I shall describe a proof strategy and then discuss whether it can be made to work.
Just before I do that, I’ll make an obvious comment given the discussion so far. It’s that I do not know of any technique for obtaining a density increment given that contains no non-trivial AP that does not work more generally, giving a density increment in on the assumption that there are very few solutions to the linear equation with and where and are sets of comparable density and and are non-zero. I can’t even think of a technique that might work. Until I can, or until somebody else can point me in the direction of such a technique, I shall look at the more general problem, which amounts to looking at what we can say about a set if there is some convolution that is very small on a set with all of and of comparable density. (This approximates the hypothesis that has density at most where is the density of and of )
A possible approach to obtaining good cap-set bounds.
If is a set of density and is not all of then in particular has density at most Suppose now that we could find a subspace such that was approximately equal to Recall that convolving with replaces the value with the average of over the coset Thus, what we are supposing is that the convolution tends to be approximately equal to its averaging projection, and therefore approximately invariant under translations by vectors in
Since has density at most , there is a set of density on which takes the value 0. And since we can say something like that there is a union of cosets of of total density at least on each of which has small density.
All we actually need is that there should be one coset on which is small. If we now pick any non-zero then it cannot often be the case that the density of is large in both and since otherwise we would add up a large number of reasonably large contributions to in the coset contradicting the fact that was supposed to be small there.
Speaking very approximately, for each we can put into or but not both. (That would be correct if . I am claiming that it is approximately true if we only approximately have that assumption.) It follows that must have at least double the density in some cosets (since it can intersect only half of them).
This sketchy argument would be of zero interest were it not for the fact that there is now a rather exciting new technique for obtaining statements of precisely the desired form. Having provided some motivation for finding approximate translation invariance of convolutions, let me now turn to the Croot-Sisask approach to obtaining it.
The Croot-Sisask lemma.
My aim in this section is not to give a full proof of the lemma, but merely to explain enough of the method to turn it into an exercise for anyone with a little experience in additive combinatorics. (I could try to spread the net a little wider, but my guess is that anyone who does not at least find the sketch below plausible enough to be persuaded that the result is correct will have lost interest in these posts by now.)
Suppose we have two subsets and of with densities and We want to be approximately invariant under all the translates from some subspace. But let’s settle for less for the time being — we’ll merely try to get it to be approximately invariant under as many translations as we can, not worrying about whether those translations form (or contain) a low-codimensional subspace.
Why should have any translation invariance properties at all? One answer comes from Fourier analysis: the Fourier coefficients of decay reasonably fast (they are absolutely summable), so we can approximate in (or indeed in any -norm with ) by keeping only the part of the Fourier transform with large Fourier coefficients, of which, by Parseval, there are not too many. But a character on is constant on cosets of a 1-codimensional subspace, so a linear combination of characters is constant on cosets of a -codimensional subspace. The problem with this argument is that has to be rather large. I won’t go into details, but this kind of argument is nothing like strong enough to give an improvement to the cap-set bounds, or even to equal them.
Croot and Sisask have a different approach that does not involve the Fourier transform. To explain it, it will be helpful to normalize as follows: instead of considering the convolution I’ll consider the convolution We can think of this probabilistically. Its value at is the probability that if you choose uniformly at random from Equivalently, you take all the sets with and take the average of their characteristic functions.
Suppose now that we could find a small such that is approximately equal to for a positive fraction of all sets of size Then there must be some set of size such that is approximately equal to for a positive proportion of all translates of But if is approximately equal to both and then is approximately equal to (Just to spell this out, if is close to then trivially is close to so the claim follows from the triangle inequality.)
How, then, are we to find a small set such that There is an obvious answer: just pick a random small subset of And this is exactly what Croot and Sisask do. Or rather, it’s almost exactly what they do, but because independent random variables are easier to handle, what they actually do is take elements from uniformly at random (with replacement) and consider the function If are distinct (which they usually are) and we let then this is indeed
To analyse how this works, consider the value at a point of the function It is the average of independent random variables, each of which involves picking a random and setting the value to be 1 if and 0 otherwise. That is, we have independent Bernoulli variables with mean and variance at most 1.
How many of these do we need in order that their sample mean will, with high probability, be close to Well, the variance of the sample mean of of them is at most so at the very least we need this to be at most To give an important example, suppose that is a random subset of of density Then is approximately so we need or And a moment’s thought shows that this is trivially necessary: the convolution is roughly constant, and since has density there is no hope of making it roughly constant if you take the average of fewer than translates.
I haven’t actually said what kind of approximation I am talking about here, but it is sort of implicit in the fact that I looked at the value at an individual is that I am looking at an approximation for some And this is indeed what Croot and Sisask, and also Sanders, do.
Now let us think about two further aspects of the method: how large a set it gives, and whether that set has any useful structure. The answer to the first question is that there are ordered subsets of of size (where is the density of and ), so if we have chosen such that for at least half of them we have then we can find at least of those that are translates of each other. So as a rule of thumb, the density of the set of good translates you get is where is the density of and is the size of your random samples.
In the case of the example where and are random sets of density this gives us a set of density Even if that set is a subspace, it has codimension so isn’t that good.
Before I discuss that further, let me quickly mention the matter of structure. Croot and Sisask observed that if you choose so that the approximations are all very good, by which I mean that we obtain a set such that is very close to for every then by the triangle inequality we still get pretty good approximations even if all we know about is that it belongs to for some not too large number of summands. But iterated sumsets have much more structure than arbitrary sets, a fact that was heavily exploited by Sanders (using Chang’s theorem).
However, the example above suggests that one might be able to do something else. After all, in that case is roughly constant, so it is roughly invariant under all translates. Can we get anywhere by understanding why the Croot-Sisask approach gives only a few of these translates?
Here is one natural thought that sheds some light on what goes wrong in the random example. Suppose we look for plenty of small sets such that approximates not in some norm but in the norm. Since all the translates of are roughly equal in the norm, this is going to be very easy — in fact, we can take The triangle inequality will then give us plenty of values such that is small. But we know that the Fourier coefficients of are absolutely summable. If the difference has small norm on the Fourier side, then it must have small -norm, so by Parseval we get some kind of bound on This suggests that the problem with the argument might be that the intermediate step it aims for (an bound) is too strong.
I haven’t checked what the bound is that comes out of the above argument, because in the next section I want to discuss an example that messes up all the thoughts I have had along these lines.
A killer example?
In this section I want to discuss an example that, to put things rather melodramatically, has been the bane of my life for many years. And I am not alone: for example, Nets Katz briefly alluded to it in one of his comments on my previous post. I am not yet well enough up on the new methods to know how damaging it is. I will say what I can about how I think the example might be dealt with, and will be very interested to know whether others have thoughts on the matter.
The basic idea behind the example is simple: take a random union of subspaces. Of course, there are some choices to be made here. We need to decide on the dimensions of the subspaces and the number that we shall take. Broadly speaking, if we take too many subspaces then their union will start to look like a fairly random set and we will no longer see its internal structure, whereas if we take too few then their union will be highly structured.
Suppose, then, that we take subspaces of codimension Since we have chosen them randomly, their pairwise intersections have codimension (with high probability). A property that will be very handy will be if we can treat the union as though its characteristic function were just the sum of the characteristic functions of the so we would like it if the measure of the intersections was small.
Let us write both for the subspace and for its characteristic function (a practice I was adopting above with the set when I convolved it with ). Let be the density of each The density of the union of the is, by the inclusion-exclusion formula, at least which is approximately provided is substantially less than 1. So let us assume this. Note that the square of the norm of the sum of the is (since the inner product of with is when and when ). So with this condition on and we have that the union and the sum are approximately equal in
Let The next thing I want to look at is the convolution Let and assume that which, as we have seen, implies that is small. By an easy case of Young’s inequality it follows that is small. I won’t bother with the precise calculation here — I’ll just feel free to use instead of .
Of course, is very easy to calculate: it is Now whereas if then is the constant function that takes value everywhere. (This second assertion follows from the fact that and are subspaces of density and that ) Therefore, equals
I would like to choose and so that the first term is not tiny compared with the second term Recall that we were talking about an approximation. A typical non-zero value taken by is so we need to be smaller than for which in turn we need to be at most or so.
If we also want to be about so that has density then it seems that a good choice is and So let us make that choice. Then for reasonably small the function is close to the function In other words, instead of being a constant function, it is biased by having twice as much density as it should on the set and slightly less everywhere else.
The rough point I want to make now is that is roughly constant only in directions that belong to almost all the subspaces But that forces us down to a subspace of codimension which is about So it seems as though if we want a subspace such that then it has to have disappointingly large codimension.
Properties of the set A just constructed.
In this section I shall try to understand the set a bit better, and try to see as precisely as I can what proof methods it rules out. (That is very much the point here. Obviously, a union of subspaces does not threaten to be a cap set, but it does have annoying properties that show that certain lemmas one would like are false.)
To begin with, let us work out the Fourier transform of By our choices of and this is well approximated by the Fourier transform of the function Let be the annihilator of for each Then it is easy to check that the Fourier transform of is Recalling that and noting that the subspaces are disjoint except for and have cardinality this gives us Fourier coefficients of size Moreover, they span a subspace of dimension which equals the Chang bound.
Next, let us observe that since is symmetric, the 3AP density of is which, by our discussion of above, is approximately This is significantly more than the that one expects in the random case.
Next, let us think what happens when we pass to an affine subspace Our set then becomes the union which has density at most (where I am also using to denote the uniform distribution on ). Let be the annihilator of Then
If intersects in a subspace of dimension then this works out as If we fix the dimension of to be then by convexity if then we maximize this by taking one to be and the rest to be zero. (I’m assuming that the are in general position.) Thus, the best density increment we can get on an -codimensional subspace is obtained by the obvious method of making that -codimensional subspace contain one of the subspaces That multiplies the density of that subspace by and doesn’t affect the densities of the other
Next, let us think about translation invariance of Recall that this is approximately equal to Now take a point For what density of do we have that ? Before we discuss that, we ought to decide how good we want the approximation to be. Since a typical value of is around it seems reasonable to ask for the difference to be small compared with For this it is important not to choose such that and
Now the density of is around (since we chose and to make this the case). If then it is not possible for and both to belong to and otherwise it is. So let be the set of such that Then to a pretty good approximation, has density around
It follows that unless belongs to at least half the the density of points such that is small compared with is at most Since in the sketchy argument earlier I wanted to know with reasonable accuracy on a set of density and since the density of that belong to half the is extremely small, this example seems to show that there is a fundamental limitation to any argument that seeks to find a subspace and prove that is roughly constant on translates of
The above argument is a bit sketchy and I confess that I have not (yet) backed it up with precise statements and rigorous calculations. But I am reasonably confident that I am drawing the correct moral from it.
Some further observations concerning the above example.
1. There is an obvious analogue in of the set I have just constructed in Instead of taking a random union of subspaces, one takes instead a random union of random Bohr sets (which will have predictable densities because they are defined by highly dissociated frequencies). This suggests that one ought to be able to prove the sharpness of Chang’s inequality in without resorting to niveau sets. I plan to check this. (Alternatively, if anyone feels like doing some of the estimates, then perhaps there is scope for a joint paper here. The project would be to take the discussion of the previous paragraph, make it rigorous, and generalize it to all finite Abelian groups. Of course, all that would depend on its being correct in the first place.) [Added later: I've now realized that what I wrote here is wrong, since the example is weaker than the Chang bound. Chang's inequality tells us that the number of dissociated frequencies where the Fourier coefficient is at least is at most whereas this union-of-subspaces example gives only ]
2. It is interesting to note that there is an easy way of getting a density increase for the set above, but getting the density to decrease is much harder. To get the density to increase, we pick on a subspace and choose in such a way that If we want to decrease the density, then it looks as though we should choose to be disjoint from But this has a much smaller effect. This appears to demonstrate that the one-sidedness that both Nets Katz and I have run up against (and actually Seva Lev talks about it too) is a phenomenon that genuinely occurs. It also suggests that one might perhaps be able to find an interesting example by taking the complement of a union of subspaces, so that density increases become hard to obtain. However, I do not yet see how to make this thought work. (I can’t just take the complement of the set above as that has density But if I try to make it smaller, then the union and the sum are no longer roughly the same, so the example becomes harder to analyse. And now, a while later, I have made a fairly serious effort to analyse it and it seems to me to be hard to get the density to be small without the set becoming quasirandom.)
3. The main feature I would like to draw attention to in the above example is that it splits naturally into pieces, each of which has excellent translation-invariance properties. So a natural thing to try to do is find a general argument for splitting sets up into their “components” before attempting to apply an improved Croot-Sisask lemma. Alternatively, perhaps one can give an argument of the following kind: if a set contains pieces and that have low mutual additive energy, then is roughly constant, which implies that contains 3APs. If it does not contain such pieces, then ought to have some unusual structure that prevents this. Perhaps, for instance, there must be many directions such that is unusually large, and perhaps that implies that we can obtain a better density increment than Roth-Meshulam gives.
4. The annoying example has a large spectrum that splits into subspaces. We know that we can in fact assume that no subspace can contain too many elements of the large spectrum. Could it be that we can obtain useful translation invariance of under the additional assumption that the large spectrum of is “well-dispersed”?
As with my previous post, I end this one feeling as though there is much more to say. But I think my thoughts will be more organized if I leave it a few days, especially if I get some useful feedback on this post: I’ll be particularly interested in any ideas people might have for dealing with random unions of subspaces. Given that these cause difficulties for many of the ideas that occur to me, perhaps the title “What is difficult about the cap-set problem?” would have been more appropriate for this post than the last one.