## More on the cap-set problem

Update: Nets Katz and Michael Bateman have posted a preprint to the arxiv that gives an $n^\epsilon$ improvement to the bounds for the cap-set problem. More on this later.

I have been extremely pleased with the response to my first post on the cap-set problem, in that various people have very generously shared their ideas about it. Indeed, Nets Katz has even posted a comment that may give an improvement to the bound by a factor of $n^\epsilon$ for some very small $\epsilon.$ (For reasons I have already explained, this would be very interesting if it worked.) This followed a previous comment in which he outlined some arguments that he had found with Michael Bateman that go a lot further than the ones that appeared in the section entitled “Sets with very weak additive structure” in my previous post.

Also, Ernie Croot, in addition to sketching a different very interesting approach to the problem, suggested that I should get in touch with Seva Lev, which I did. Seva has an example of a cap set that is interestingly different from the usual examples. I will discuss it in a moment.

My plan, by the way, is to try to understand the ideas in Ernie’s and Nets’s comments and then explain them in my own words in a future post. But I’m not yet ready to do that, so in this post I want to discuss some other aspects of the problem, including Seva’s example.

Seva’s example of a cap set.

Let me do that first. Regardless of what it may actually turn out to do for us, the example is a nice one. It’s defined as follows. Let $n=3m.$ We shall think of $\mathbb{F}_3^n$ as $\mathbb{F}_{3^m}^3,$ where $\mathbb{F}_{3^m}$ is the field with $3^m$ elements. Additively, this is the same as $\mathbb{F}_3^n$ of course, but it gives us some multiplicative structure that we can use to define our set. The definition of the set, which I’ll call $A,$ is very simple. Let $\lambda$ be an element of $\mathbb{F}_{3^m}$ that isn’t a square. Then we take the set of points of the form $(x,y,x^2-\lambda y^2).$ (Note that $x,$ $y$ and $x^2-\lambda y^2$ are all points of $\mathbb{F}_{3^m}$ so these points do indeed belong to $\mathbb{F}_3^n.$)

Why is $A$ a cap set? Well, if it contained an AP of length 3, then we would have elements $x,y,a,b\in\mathbb{F}_{3^m}$ such that the following three points all belonged to $A$ and formed an arithmetic progression.

$(x-a,y-b,(x-a)^2-\lambda(y-b)^2),$

$(x,y,x^2-\lambda y^2),$

$(x+a,y+b,(x+a)^2+\lambda(y+b)^2.$

Taking the sum of the first and third points and subtracting twice the second (which mod 3 is the same as adding the second) and using the fact that the third coordinates must be in arithmetic progression, we find that $2a^2-2\lambda b^2=0.$ But that implies that $\lambda=(a/b)^2,$ which contradicts the assumption that $\lambda$ was not a square. (More precisely, the only way not to get a contradiction is if $a=b=0,$ so the only APs are degenerate ones.)

What is interesting about this example? I won’t go into much detail, but will just point out a few of its features.

The only method I knew of for producing examples of cap sets until hearing from Seva was the following very simple-minded approach: take small examples and multiply them together. For example, the subset $A=\{0,1\}$ of $\mathbb{F}_3^1$ contains no non-trivial progression, so the set $A^n\subset\mathbb{F}_3^n$ doesn’t either. One can obtain slightly more interesting examples by starting with larger sets: if you have a cap set in $\mathbb{F}_3^k$ of size $m,$ where $\log_3m=\alpha k,$ then for arbitrarily large $n$ you can find a cap set in $\mathbb{F}_3^n$ of size $3^{\alpha n}.$

Seva’s example has density $3^{-n/3},$ so it does not answer the question of whether a density of $3^{-o(n)}$ is possible. However, it does provide an example of a set that does not have an obvious product structure (and Seva has a proof of a rigorous statement to the effect that it does not have a product structure at all). Moreover, and this was one of Seva’s main points, the Fourier coefficients of the set are such that this set does not have any more bias on a codimension-1 subspace than is guaranteed by the simple Roth/Meshulam argument. In fact, it looks as though something similar is true for all small-codimension subspaces, for some suitable interpretation of “small”. However, because the density is exponentially small in the dimension, the example does not contradict the conjecture (or at least hope) that one can say something interesting after passing to a subspace of codimension that is a power of the logarithm of the density.

Thoughts provoked by Ben Green’s observation.

Recall the following question from the previous post.

Question 1. Let $A$ be a cap set in $\mathbb{F}_3^n$ of density $\delta.$ For which values of $k$ and $\eta$ can one prove that there must be an affine subspace $V$ of $\mathbb{F}_3^n$ of codimension $k$ such that the density of $A$ inside $V$ is at least $\delta+\eta?$

If one is feeling very optimistic, one might hope to be able to take $\eta=c\delta$ for some absolute constant $c>0$ and $k=C\log(1/\delta).$ But if that feels a bit too bold (and after Ben’s comment it does seem to be asking quite a lot to prove it, even if it is true) then one could tone down the hope a bit and ask for $k$ to be $C(\log(1/\delta))^\alpha$ for some absolute constant $\alpha$ such as 4, say. Even this would be enough for a very substantial improvement to the Roth/Meshulam bound.

Now I’ve just realized that the question above is different from the one I actually asked. What I asked was the “off-diagonal” version of the question, where the initial assumption is not that $A$ is a cap set but that we have three sets $A,$ $B$ and $C$ such that $A\times B\times C$ contains no triple in arithmetic progression. Ben made the observation that any interesting bound for Question 1 can easily be converted into a bound for the codimension of a subspace contained in $A+A+A$ where $A$ is a dense set. The argument is as follows. If we have a positive answer to Question 1 for $k=k(\delta)$ and $c=c(\delta)$ (in the question I took $c$ to be an absolute multiple of $\delta$ so I am generalizing slightly here), then let $A$ be a set of density $\delta.$ Either $A+A+A$ is all of $\mathbb{F}_3^n$ or it misses out a point $x.$ In the latter case, $A+A+(A-x)$ misses out 0, so applying Question 1 to the three sets $A,$ $A$ and $A-x$ we find a subspace $V$ of codimension $k(\delta)$ such that the density of $A$ in $V$ is at least $\delta+c(\delta).$ The number of iterations we can do of this argument is at most $(\delta/c(\delta))\log(1/\delta)$ by which time we have passed to a subspace of codimension at most $k(\delta)(\delta/c(\delta))\log(1/\delta).$ In particular, if $k(\delta)=(\log(1/\delta))^\alpha$ and $c(\delta)=c\delta,$ then $A+A+A$ contains a subspace of codimension at most $C(\log(1/\delta))^{\alpha+1}.$

Ben made the point that if $\alpha$ is too small then this bound is better than the remarkable recent bound obtained by Sanders. He also hinted at the converse point that perhaps one might try to argue in the opposite direction, using Sanders’s ideas (which I shall come to soon) to show that Question 1 has a positive answer as long as $\alpha$ isn’t too small.

In order to investigate that possibility, we need to understand a bit about Sanders’s argument, and in particular the lemma of Croot and Sisask on which it depends. Rather than jumping straight in, I shall describe a proof strategy and then discuss whether it can be made to work.

Just before I do that, I’ll make an obvious comment given the discussion so far. It’s that I do not know of any technique for obtaining a density increment given that $A$ contains no non-trivial AP that does not work more generally, giving a density increment in $A$ on the assumption that there are very few solutions to the linear equation $ra+sb+tc=0$ with $a\in A,$ $b\in B$ and $c\in C,$ where $A,$ $B$ and $C$ are sets of comparable density and $r,$ $s$ and $t$ are non-zero. I can’t even think of a technique that might work. Until I can, or until somebody else can point me in the direction of such a technique, I shall look at the more general problem, which amounts to looking at what we can say about a set $A$ if there is some convolution $A*B$ that is very small on a set $C,$ with all of $A,$ $B$ and $C$ of comparable density. (This approximates the hypothesis that $A+B$ has density at most $1-\delta,$ where $\delta$ is the density of $A$ and of $B.$)

A possible approach to obtaining good cap-set bounds.

If $A$ is a set of density $\delta$ and $A+A+A$ is not all of $\mathbb{F}_3^n,$ then in particular $A+A$ has density at most $1-\delta.$ Suppose now that we could find a subspace $V$ such that $A*A*\mu_V$ was approximately equal to $A*A.$ Recall that convolving $f$ with $\mu_V$ replaces the value $f(x)$ with the average of $f$ over the coset $x+V.$ Thus, what we are supposing is that the convolution $A*A$ tends to be approximately equal to its averaging projection, and therefore approximately invariant under translations by vectors in $V.$

Since $A+A$ has density at most $1-\delta$, there is a set of density $\delta$ on which $A*A$ takes the value 0. And since $A*A\approx A*A*\mu_V,$ we can say something like that there is a union of cosets of $V$ of total density at least $\delta$ on each of which $A*A$ has small density.

All we actually need is that there should be one coset $u+V$ on which $A*A$ is small. If we now pick any non-zero $w,$ then it cannot often be the case that the density of $A$ is large in both $u+w+V$ and $u+2w+V,$ since otherwise we would add up a large number of reasonably large contributions to $A*A$ in the coset $u+V,$ contradicting the fact that $A*A$ was supposed to be small there.

Speaking very approximately, for each $w,$ we can put $A$ into $u+w+V$ or $u+2w+V$ but not both. (That would be correct if $(A+A)\cap V=\emptyset$. I am claiming that it is approximately true if we only approximately have that assumption.) It follows that $A$ must have at least double the density in some cosets (since it can intersect only half of them).

This sketchy argument would be of zero interest were it not for the fact that there is now a rather exciting new technique for obtaining statements of precisely the desired form. Having provided some motivation for finding approximate translation invariance of convolutions, let me now turn to the Croot-Sisask approach to obtaining it.

My aim in this section is not to give a full proof of the lemma, but merely to explain enough of the method to turn it into an exercise for anyone with a little experience in additive combinatorics. (I could try to spread the net a little wider, but my guess is that anyone who does not at least find the sketch below plausible enough to be persuaded that the result is correct will have lost interest in these posts by now.)

Suppose we have two subsets $A$ and $B$ of $\mathbb{F}_3^n,$ with densities $\alpha$ and $\beta.$ We want $A*B$ to be approximately invariant under all the translates from some subspace. But let’s settle for less for the time being — we’ll merely try to get it to be approximately invariant under as many translations as we can, not worrying about whether those translations form (or contain) a low-codimensional subspace.

Why should $A*B$ have any translation invariance properties at all? One answer comes from Fourier analysis: the Fourier coefficients of $A*B$ decay reasonably fast (they are absolutely summable), so we can approximate $A*B$ in $L_2$ (or indeed in any $L_p$-norm with $p>1$) by keeping only the part of the Fourier transform with large Fourier coefficients, of which, by Parseval, there are not too many. But a character on $\mathbb{F}_3^n$ is constant on cosets of a 1-codimensional subspace, so a linear combination of $k$ characters is constant on cosets of a $k$-codimensional subspace. The problem with this argument is that $k$ has to be rather large. I won’t go into details, but this kind of argument is nothing like strong enough to give an improvement to the cap-set bounds, or even to equal them.

Croot and Sisask have a different approach that does not involve the Fourier transform. To explain it, it will be helpful to normalize as follows: instead of considering the convolution $A*B$ I’ll consider the convolution $A*\mu_B.$ We can think of this probabilistically. Its value at $x$ is the probability that $x-b\in A$ if you choose $b$ uniformly at random from $B.$ Equivalently, you take all the sets $A+b$ with $b\in B$ and take the average of their characteristic functions.

Suppose now that we could find a small $k$ such that $A*\mu_B$ is approximately equal to $A*\mu_K$ for a positive fraction $\theta$ of all sets $K$ of size $k.$ Then there must be some set $K$ of size $k$ such that $A*\mu_B$ is approximately equal to $A*\mu_L$ for a positive proportion of all translates $L$ of $K.$ But if $A*\mu_B$ is approximately equal to both $A*\mu_{K+x}$ and $A*\mu_{K+y},$ then $A*\mu_B$ is approximately equal to $A*\mu_B+(y-x).$ (Just to spell this out, if $A*\mu_B$ is close to $A*\mu_{K+x},$ then trivially $A*\mu_{B+y-x}$ is close to $A*\mu_{K+y},$ so the claim follows from the triangle inequality.)

How, then, are we to find a small set $K$ such that $A*\mu_K\approx A*\mu_B?$ There is an obvious answer: just pick a random small subset of $B.$ And this is exactly what Croot and Sisask do. Or rather, it’s almost exactly what they do, but because independent random variables are easier to handle, what they actually do is take $k$ elements $b_1,\dots,b_k$ from $B$ uniformly at random (with replacement) and consider the function $k^{-1}\sum_{i=1}^k(A+b_i).$ If $b_1,\dots,b_k$ are distinct (which they usually are) and we let $K=\{b_1,\dots,b_k\}$ then this is indeed $A*\mu_K.$

To analyse how this works, consider the value at a point $x$ of the function $k^{-1}\sum_{i=1}^k(A+b_i).$ It is the average of $k$ independent random variables, each of which involves picking a random $b\in B$ and setting the value to be 1 if $x-b\in A$ and 0 otherwise. That is, we have independent Bernoulli variables with mean $A*\mu_B(x)$ and variance at most 1.

How many of these do we need in order that their sample mean will, with high probability, be close to $A*\mu_B(x)?$ Well, the variance of the sample mean of $k$ of them is at most $k^{-1},$ so at the very least we need this to be at most $A*\mu_B(x)^2.$ To give an important example, suppose that $A=B$ is a random subset of $\mathbb{F}_3^n$ of density $\delta.$ Then $A*\mu_B(x)$ is approximately $\delta,$ so we need $k^{-1}\leq\delta,$ or $k\geq\delta^{-1}.$ And a moment’s thought shows that this is trivially necessary: the convolution $A*\mu_B$ is roughly constant, and since $A$ has density $\delta$ there is no hope of making it roughly constant if you take the average of fewer than $\delta^{-1}$ translates.

I haven’t actually said what kind of approximation I am talking about here, but it is sort of implicit in the fact that I looked at the value at an individual $x$ is that I am looking at an $L_p$ approximation for some $p.$ And this is indeed what Croot and Sisask, and also Sanders, do.

Now let us think about two further aspects of the method: how large a set it gives, and whether that set has any useful structure. The answer to the first question is that there are $(\beta N)^k$ ordered subsets of $B$ of size $k$ (where $\beta$ is the density of $B$ and $N=3^n$), so if we have chosen $k$ such that for at least half of them we have $A*\mu_K\approx A*\mu_B,$ then we can find at least $\beta^k N/2$ of those that are translates of each other. So as a rule of thumb, the density of the set of good translates you get is $\beta^k,$ where $\beta$ is the density of $B$ and $k$ is the size of your random samples.

In the case of the example where $A$ and $B$ are random sets of density $\delta,$ this gives us a set of density $\delta^{1/\delta}.$ Even if that set is a subspace, it has codimension $(1/\delta)\log(1/\delta),$ so isn’t that good.

Before I discuss that further, let me quickly mention the matter of structure. Croot and Sisask observed that if you choose $k$ so that the approximations are all very good, by which I mean that we obtain a set $X$ such that $A*\mu_B$ is very close to $A*\mu_B+x$ for every $x\in X,$ then by the triangle inequality we still get pretty good approximations even if all we know about $x$ is that it belongs to $X+X+\dots+X$ for some not too large number of summands. But iterated sumsets have much more structure than arbitrary sets, a fact that was heavily exploited by Sanders (using Chang’s theorem).

However, the example above suggests that one might be able to do something else. After all, in that case $A*\mu_B$ is roughly constant, so it is roughly invariant under all translates. Can we get anywhere by understanding why the Croot-Sisask approach gives only a few of these translates?

Here is one natural thought that sheds some light on what goes wrong in the random example. Suppose we look for plenty of small sets $K$ such that $A*\mu_K$ approximates $A*\mu_B$ not in some $L_p$ norm but in the $U_2$ norm. Since all the translates of $A$ are roughly equal in the $U_2$ norm, this is going to be very easy — in fact, we can take $k=1.$ The triangle inequality will then give us plenty of values $x$ such that $\|A*\mu_B-A*\mu_{B+x}\|_{U_2}$ is small. But we know that the Fourier coefficients of $A*\mu_B$ are absolutely summable. If the difference has small $\ell_4$ norm on the Fourier side, then it must have small $\ell_2$-norm, so by Parseval we get some kind of bound on $\|A*\mu_B-A*\mu_{B+x}\|_2.$ This suggests that the problem with the argument might be that the intermediate step it aims for (an $L_p$ bound) is too strong.

I haven’t checked what the bound is that comes out of the above argument, because in the next section I want to discuss an example that messes up all the thoughts I have had along these lines.

A killer example?

In this section I want to discuss an example that, to put things rather melodramatically, has been the bane of my life for many years. And I am not alone: for example, Nets Katz briefly alluded to it in one of his comments on my previous post. I am not yet well enough up on the new methods to know how damaging it is. I will say what I can about how I think the example might be dealt with, and will be very interested to know whether others have thoughts on the matter.

The basic idea behind the example is simple: take a random union of subspaces. Of course, there are some choices to be made here. We need to decide on the dimensions of the subspaces and the number that we shall take. Broadly speaking, if we take too many subspaces then their union will start to look like a fairly random set and we will no longer see its internal structure, whereas if we take too few then their union will be highly structured.

Suppose, then, that we take $m$ subspaces $V_1,\dots,V_m$ of codimension $k.$ Since we have chosen them randomly, their pairwise intersections have codimension $2k$ (with high probability). A property that will be very handy will be if we can treat the union as though its characteristic function were just the sum of the characteristic functions of the $V_i,$ so we would like it if the measure of the intersections was small.

Let us write $V_i$ both for the subspace and for its characteristic function (a practice I was adopting above with the set $A$ when I convolved it with $\mu_B$). Let $\eta=3^{-k}$ be the density of each $V_i.$ The density of the union of the $V_i$ is, by the inclusion-exclusion formula, at least $m\eta-\binom m2\eta^2,$ which is approximately $m\eta$ provided $\eta m$ is substantially less than 1. So let us assume this. Note that the square of the $L_2$ norm of the sum of the $V_i$ is $m\eta+m(m-1)\eta^2$ (since the inner product of $V_i$ with $V_j$ is $\eta^2$ when $i\ne j$ and $\eta$ when $i=j$). So with this condition on $\eta$ and $m$ we have that the union and the sum are approximately equal in $L_2.$

Let $A=V_1\cup\dots\cup V_m.$ The next thing I want to look at is the convolution $A*A.$ Let $f=V_1+\dots+V_m$ and assume that $\eta m<<1,$ which, as we have seen, implies that $\|A-f\|_2$ is small. By an easy case of Young’s inequality it follows that $\|A*A-f*f\|_\infty$ is small. I won’t bother with the precise calculation here — I’ll just feel free to use $f$ instead of $A$.

Of course, $f*f$ is very easy to calculate: it is $\sum_{i,j}V_i*V_j.$ Now $V_i*V_i=\eta V_i,$ whereas if $i\ne j$ then $V_i*V_j$ is the constant function that takes value $\eta^2$ everywhere. (This second assertion follows from the fact that $V_i$ and $V_j$ are subspaces of density $\eta$ and that $V_i+V_j=\mathbb{F}_3^n.$) Therefore, $f*f$ equals $\eta f+m(m-1)\eta^2.$

I would like to choose $\eta$ and $m$ so that the first term $\eta f$ is not tiny compared with the second term $m(m-1)\eta^2.$ Recall that we were talking about an $\ell_\infty$ approximation. A typical non-zero value taken by $\eta f$ is $\eta,$ so we need $m(m-1)\eta^2$ to be smaller than $\eta,$ for which in turn we need $m$ to be at most $\eta^{-1/2}$ or so.

If we also want $m\eta$ to be about $\delta,$ so that $A$ has density $\delta,$ then it seems that a good choice is $\eta=\delta^2$ and $m=\delta^{-1}.$ So let us make that choice. Then for reasonably small $\delta$ the function $A*A$ is close to the function $\delta^2A+\delta^2(1-\delta).$ In other words, instead of being a constant function, it is biased by having twice as much density as it should on the set $A$ and slightly less everywhere else.

The rough point I want to make now is that $A*A$ is roughly constant only in directions that belong to almost all the subspaces $V_i.$ But that forces us down to a subspace of codimension $m\log(\eta^{-1}),$ which is about $\delta^{-1}\log(\delta^{-1}).$ So it seems as though if we want a subspace $V$ such that $A*A\approx A*A*\mu_V,$ then it has to have disappointingly large codimension.

Properties of the set A just constructed.

In this section I shall try to understand the set $A$ a bit better, and try to see as precisely as I can what proof methods it rules out. (That is very much the point here. Obviously, a union of subspaces does not threaten to be a cap set, but it does have annoying properties that show that certain lemmas one would like are false.)

To begin with, let us work out the Fourier transform of $A.$ By our choices of $k$ and $m,$ this is well approximated by the Fourier transform of the function $f=k^{-1}\sum_{i=1}^kV_i.$ Let $W_i$ be the annihilator of $V_i$ for each $i.$ Then it is easy to check that the Fourier transform of $V_i$ is $\eta W_i=\delta^2W_i.$ Recalling that $k=\delta^{-1}$ and noting that the subspaces $W_i$ are disjoint except for $0$ and have cardinality $\eta^{-1}=\delta^{-2},$ this gives us $\delta^{-3}$ Fourier coefficients of size $\delta^2.$ Moreover, they span a subspace of dimension $mk=\delta^{-1}\log(\delta^{-2}),$ which equals the Chang bound.

Next, let us observe that since $A$ is symmetric, the 3AP density of $A$ is $\langle A*A,A\rangle$ which, by our discussion of $A*A$ above, is approximately $\langle \delta^2A+\delta^2(1-\delta),A\rangle\approx\delta^3(2-\delta).$ This is significantly more than the $\delta^3$ that one expects in the random case.

Next, let us think what happens when we pass to an affine subspace $U.$ Our set then becomes the union $\bigcup_{i=1}^k(V_i\cap U),$ which has density at most $\sum_{i=1}^k\mu_U(V_i)$ (where I am also using $\mu_W$ to denote the uniform distribution on $U$). Let $Z$ be the annihilator of $U.$ Then

$\mu_U(V_i)=\langle \mu_U,V_i\rangle=\langle Z,\mu_{W_i}\rangle=\delta^2|Z\cap W_i|.$

If $Z$ intersects $W_i$ in a subspace of dimension $k_i,$ then this works out as $\delta^2\sum_i3^{k_i}.$ If we fix the dimension of $Z$ to be $r,$ then by convexity if $r\leq k$ then we maximize this by taking one $k_i$ to be $r$ and the rest to be zero. (I’m assuming that the $W_i$ are in general position.) Thus, the best density increment we can get on an $r$-codimensional subspace is obtained by the obvious method of making that $r$-codimensional subspace contain one of the subspaces $V_i.$ That multiplies the density of that subspace by $3^r$ and doesn’t affect the densities of the other $V_j.$

Next, let us think about translation invariance of $A*A.$ Recall that this is approximately equal to $\delta^2A+\delta^2(1-\delta).$ Now take a point $x\in\mathbb{F}_3^n.$ For what density of $y$ do we have that $A*A(y)\approx A*A(x+y)$? Before we discuss that, we ought to decide how good we want the approximation to be. Since a typical value of $A*A$ is around $\delta^2$ it seems reasonable to ask for the difference to be small compared with $\delta^2.$ For this it is important not to choose $y$ such that $y\in A$ and $x+y\notin A.$

Now the density of $y\in A$ is around $\delta$ (since we chose $m$ and $k$ to make this the case). If $x\notin V_i$ then it is not possible for $y$ and $x+y$ both to belong to $V_i,$ and otherwise it is. So let $J$ be the set of $i$ such that $x\in V_i.$ Then to a pretty good approximation, $A\cap(A+x)$ has density around $(j/m)\delta=j\delta^2.$

It follows that unless $x$ belongs to at least half the $V_i,$ the density of points $y$ such that $A*A(y)-A*A(x+y)$ is small compared with $\delta^2$ is at most $1-c\delta.$ Since in the sketchy argument earlier I wanted to know $A*A$ with reasonable accuracy on a set of density $1-c\delta,$ and since the density of $x$ that belong to half the $V_i$ is extremely small, this example seems to show that there is a fundamental limitation to any argument that seeks to find a subspace $U$ and prove that $A*A$ is roughly constant on translates of $U.$

The above argument is a bit sketchy and I confess that I have not (yet) backed it up with precise statements and rigorous calculations. But I am reasonably confident that I am drawing the correct moral from it.

Some further observations concerning the above example.

1. There is an obvious analogue in $\mathbb{Z}_N$ of the set I have just constructed in $\mathbb{F}_3^n.$ Instead of taking a random union of subspaces, one takes instead a random union of random Bohr sets (which will have predictable densities because they are defined by highly dissociated frequencies). This suggests that one ought to be able to prove the sharpness of Chang’s inequality in $\mathbb{Z}_N$ without resorting to niveau sets. I plan to check this. (Alternatively, if anyone feels like doing some of the estimates, then perhaps there is scope for a joint paper here. The project would be to take the discussion of the previous paragraph, make it rigorous, and generalize it to all finite Abelian groups. Of course, all that would depend on its being correct in the first place.) [Added later: I’ve now realized that what I wrote here is wrong, since the example is weaker than the Chang bound. Chang’s inequality tells us that the number of dissociated frequencies where the Fourier coefficient is at least $\delta^2$ is at most $\delta^{-2}\log(1/\delta)$ whereas this union-of-subspaces example gives only $\delta^{-1}.$]

2. It is interesting to note that there is an easy way of getting a density increase for the set $A$ above, but getting the density to decrease is much harder. To get the density to increase, we pick on a subspace $V_i$ and choose $U$ in such a way that $V_i\subset U.$ If we want to decrease the density, then it looks as though we should choose $U$ to be disjoint from $V_i.$ But this has a much smaller effect. This appears to demonstrate that the one-sidedness that both Nets Katz and I have run up against (and actually Seva Lev talks about it too) is a phenomenon that genuinely occurs. It also suggests that one might perhaps be able to find an interesting example by taking the complement of a union of subspaces, so that density increases become hard to obtain. However, I do not yet see how to make this thought work. (I can’t just take the complement of the set $A$ above as that has density $1-\delta.$ But if I try to make it smaller, then the union and the sum are no longer roughly the same, so the example becomes harder to analyse. And now, a while later, I have made a fairly serious effort to analyse it and it seems to me to be hard to get the density to be small without the set becoming quasirandom.)

3. The main feature I would like to draw attention to in the above example is that it splits naturally into pieces, each of which has excellent translation-invariance properties. So a natural thing to try to do is find a general argument for splitting sets up into their “components” before attempting to apply an improved Croot-Sisask lemma. Alternatively, perhaps one can give an argument of the following kind: if a set $A$ contains pieces $B$ and $C$ that have low mutual additive energy, then $B*C$ is roughly constant, which implies that $A$ contains 3APs. If it does not contain such pieces, then $A$ ought to have some unusual structure that prevents this. Perhaps, for instance, there must be many directions $x$ such that $A\cap(A+x)$ is unusually large, and perhaps that implies that we can obtain a better density increment than Roth-Meshulam gives.

4. The annoying example has a large spectrum that splits into subspaces. We know that we can in fact assume that no subspace can contain too many elements of the large spectrum. Could it be that we can obtain useful translation invariance of $A*A$ under the additional assumption that the large spectrum of $A$ is “well-dispersed”?

As with my previous post, I end this one feeling as though there is much more to say. But I think my thoughts will be more organized if I leave it a few days, especially if I get some useful feedback on this post: I’ll be particularly interested in any ideas people might have for dealing with random unions of subspaces. Given that these cause difficulties for many of the ideas that occur to me, perhaps the title “What is difficult about the cap-set problem?” would have been more appropriate for this post than the last one.

### 24 Responses to “More on the cap-set problem”

1. The Cap-Set Problem and Frankl-Rodl Theorem (C) | Combinatorics and more Says:

[…] (March 2009). Here are two “open discussion” posts on the cap set problem (first, and second) from Gowers’s blog (January 2011). These posts include several interesting Fourier theoretic […]

2. gowers Says:

I have a further remark to make concerning the “killer example”, still based on the assumption that it has the properties I claimed it has. It is closely related to observation 2 above.

Suppose we want to try to prove that a set $A$ of density $\delta$ with no APs of length 3 must have a substantial density increase on some subspace of not very high codimension. It is tempting to try to prove a more general assertion, namely that if $f$ is any bounded function that averages $\delta$ and $\mathbb{E}_{x+y+z=0}f(x)f(y)f(z)$ is significantly smaller than $\delta^3,$ then $f$ averages significantly more than $\delta$ on some low-codimensional subspace.

However, I think the “killer example” shows that this statement is false. It gives us an example of a set $A$ with density $\delta$ and with AP-density roughly $2\delta^3.$ Now consider the function $2\delta-A.$ This again has density $\delta.$ Its 3AP count is $8\delta^3-3(2\delta^2)\delta+3(2\delta)\delta^2-2\delta^3=0.$ However, the fact that it is hard to obtain a density decrease for $A$ makes it hard to obtain a density increase for $2\delta-A.$

This shows that any proof that hopes to use a density-increment method to obtain a substantial improvement to the Roth-Meshulam bound must use in a crucial way the fact that the characteristic function of $A$ takes values in $[0,1]$ rather than, say, in $[-1,1].$ More suggestively, the balanced function of $A$ takes values in $[-\delta,1-\delta],$ and we must use the “one-sided” fact that its negative values cannot have large modulus.

Since this is potentially an important point, let me try to be more precise about what kinds of arguments the example rules out. To do that, I’ll bound from below the density of $A$ in an $r$-codimensional affine subspace $U.$ Let $Z$ be the annihilator of $U$ and let $W_i$ be the annihilator of $V_i.$ Then the density of $V_i$ in $U$ is $\delta^2$ unless $Z$ contains a point of $W_i.$ In the second case, $Z$ contains a vector capable of distinguishing $V_i$ from $U.$ Whether or not it does so depends on whether we choose the right translate of $Z^\perp,$ but the point is that the density of $A$ in $U$ is roughly $(m-t)\delta^2,$ where $t$ is the number of the $W_i$ that are intersected by $Z.$ Since the $W_i$ are assumed to be in general position, the density of $A$ in $U$ is at least roughly $\delta(1-\delta\dim(Z)).$ Thus, to get a density increase proportional to $\delta$ the number of dimensions we must lose is also proportional to $\delta.$

If this is correct, it gives a test that should be applied to any attempted proof that tries to use a density-increment argument: does the argument generalize in a natural way to $[-1,1]$-valued functions? If it does, then the attempt will not improve on the Roth-Meshulam bound.

As with the post, this comment comes with a health warning: I haven’t written things up carefully so it is possible that what I am saying is nonsense.

3. gowers Says:

I’d like to point out briefly that the arguments of Nets Katz and Michael Bateman (and the more simple-minded arguments from my previous post) pass the test I discussed in my previous comment. That is, they exploit positivity in a crucial way. Let me spell this out.

It’s clear that what I have just said must be true, since one of the key facts driving what Nets was writing about is that you can’t have more than $k/\delta$ Fourier coefficients of size $\delta^2$ in a subspace of dimension $k,$ whereas the function I discussed in my previous comment had $\delta^{-2}$ Fourier coefficients lying in a subspace of dimension $\log(1/\delta).$ This isn’t a contradiction, because the function does not take values in $[0,1],$ though it does take values in $[-1,1].$ But let us see in more detail why it isn’t a contradiction.

The argument for the $k/\delta$ bound goes like this. (I have already given it, and Nets has given it in a different form.) Suppose we can find a subspace $W$ of dimension $k$ such that for $m$ values of $r\in W$ the Fourier coefficient $\hat{A}(r)$ has modulus at least $\delta^2.$ Then $\sum_{r\in W}|\hat{A}(r)|^2\geq \delta^2+m\delta^4.$ (The initial $\delta^2$ comes from the fact that $\hat{A}(0)=\delta.$ Now the restriction of $\hat{A}$ to $W$ is the Fourier transform of $A*\mu_V,$ where $V$ is the subspace annihilated by $W$ (so $V$ has codimension $k$). Therefore, by Parseval, $\|A*\mu_V\|_2^2\geq\delta^2(1+m\delta^2).$ If $m\delta^2$ is substantially less than 1, then this roughly tells us that $\|A*\mu_V\|_2\geq\delta(1+m\delta^2/2),$ and therefore that there is an affine subspace of codimension $k$ (a translate of $V$) where $A*\mu_V$ takes at least the value $\delta(1+m\delta^2/2),$ which is the same as saying that the density of $A$ in that affine subspace is at least $\delta(1+m\delta^2/2).$ It is here that I have used positivity: if I didn’t know that $A*\mu_V$ was positive, then its $L_2$ norm might be accounted for by some large negative values instead.

Now if we did $k$ Roth-Meshulam iterations, we would obtain a density increment of $k\delta^2,$ whereas actually we have obtained $m\delta^3/2.$ So the assumption that we cannot improve on Roth-Meshulam implies that $m$ cannot be substantially bigger than $k/\delta,$ as claimed.

4. Ernie Croot Says:

Your construction of a union of a bunch of random subspaces made me think of a nice idea/result due to A. Blokhuis. Rather than write this all out, let me direct you to the following problem I posed at the ICM satellite conference on Analytic Number Theory:

http://www.math.gatech.edu/~ecroot/icm_question3.pdf

See section 1.4 of the note. Basically, Blokhuis’s idea is an algebraic method that gives good bounds on the “union of subspaces” problem in that section (sec. 1.4).

Perhaps there is a way to combine Tom’s Bogolyubov theorem (or my work with Olof) with this algebraic approach to give a strong solution to the problem at the beginning of that note (since Tom’s result implies that sumsets $A_g+A_g$ contain 99 percent of the elements of a large coset progression, one almost has that $\cup_g (A_g \hat + A_g + g)$ contains the union $\cup_g (V_g^* + g)$, where $V_g^*$ is some large subspace/subgroup with the $0$ element removed). If so, then, as I comment, one would get strong upper bounds on the size of subsets in $Z_4^n$ without 3APs, under a certain “uniformity assumption” (which perhaps can be removed) mentioned in the note.

Assuming this could be done, it makes me wonder whether one could say something similar in $F_3^n$ — can algebraic and combinatorial/analytic methods be combined to attack the CAP set problem?

5. JSE Says:

It is late so forgive me if what I write is not correct — but Lev’s example is a quadric surface S in A^3 over F = F_{3^m}, and I suppose what’s going on is that the non-squareness of lambda means that the two familes of lines contained in S are switched by the Galois group of F — in particular, there is no line at all defined over F and contained in S. But then, by Bezout, any line intersects S in at most two points. So I think something even stronger is true; that in Lev’s example, any F-arithmetic progression (i.e. a set of the form a + Fb with a,b in F^3) intersects S in at most two points. Any ordinary three-term arithmetic progression is contained in one of these.

6. gowers Says:

This is a bit of a non-comment, but I wanted to write some kind of acknowledgement of the comments of Ernie Croot and JSE above: I may not have thought of anything worth saying in response, but I found both of them interesting and thought-provoking.

7. Felipe Voloch Says:

A comment to follow up on Jordan’s one. First, a set in $F^3$ with no three collinear points is called a cap because it looks like a cap (=hat like thing). An elliptic quadric (i.e. a quadric whose ruling is only defined in a quadratic extension) is a cap with $q^2+1$ points (where $q = \# F$). More surprising is the following theorem of B. Segre, if $q$ is odd, then a cap in $F^3$ whose is cardinality is at least $q^2 -cq$ (where $c$ is some constant I don’t remember) is contained in an elliptic quadric.

• gowers Says:

Many thanks for this interesting comment. I have to confess that the hattishness of a set with no three collinear points doesn’t jump out at me, unless you are talking about subsets of $\mathbb{R}^3$ — is that the point?

I was about to ask why the last statement doesn’t solve the cap-set problem, but suddenly realized that not containing three collinear points in $F^3$ is much stronger than not containing a line in $\mathbb{F}_3^n.$ But that is an interesting result you mention.

In case you are wondering why your latex didn’t initially come out, it is because you put backslashes in front of “latex”.

• Felipe Voloch Says:

Thanks for fixing my latex. Yes, the terminology is supposed to evoke the geometry of $\mathbb{R}^3$ although it may not be necessarily true even there that a set with no 3 collinear points looks like a hat.

8. Ernie Croot Says:

While we are on the topic of collinear points (and incidences), I thought I would mention a problem related to 3APs:

Problem. Can one encode three-term progressions in terms of point-line incidences in $R^2$? More specifically: suppose you are given a set of pairs $(p_i,\ell_j)$, which represents the fact that the point $p_i$ is on the line $\ell_j$. Furthermore, suppose that this list of incidences may not be exhaustive in that there could be other point-line incidences in the drawing not accounted for by the list. Then, no matter how you draw the configuration of points and lines, so long as the incidences are respected, must the set of points $p_i$ ALWAYS contain a 3AP?

Take, for example, the $3 \times 3$ grid of $9$ points, with $8$ lines, each incident with three of those points. Some points are incident with $3$ of those lines; some are incident with only $2$; and one point is incident with $4$ lines. It turns out that there are many OTHER ways of drawing the points and lines (uncountably many, even after accounting for affine transformations), besides the $3 \times 3$ lattice grid, such that the $9$ points do not contain a three-term progression. (It is a nice exercise to work out some examples!)

But what about $4 \times 4$ grids, or $5 \times 5$, etc.?

….

The original motivation of this problem was to find a way to convert problems about 3APs into incidence questions that could then be approached algebraically (e.g. using elliptic curves).

• Christian Says:

I’m not sure whether I understand the problem correctly, but do you mean by three points forming an 3AP that one of them is the midpoint of the line segment joining the two other ones? If so (or also otherwise) could you, please, indicate a point/line-configuration that enforces the presence of a 3AP? At the moment I cannot think of any, mainly because there seem to be far too many projective maps that preserve incidence but tend to destroy the midpoint property. (To be more precise: Take a drawing of a given configuration, colour all points which are fourth harmonic points of collinear triples of points you have used yellow, consider a line avoing all these yellow points and send it to infinity.)

• Ernie Croot Says:

Christian,

“but do you mean by three points forming a 3AP that one of them is the midpoint…?” Yes, that is what I mean.

“Can you indicate a… ?” I don’t know of a single one! That’s my question — prove that one exists.

Assuming that the point-line incidence restrictions for all the lines associated to some $k \times k$ grid pin down the grid structure up to affine transformations, it should be possible to decide this algebraically: in order to bypass the problems of multiple configurations due to affine transformations (which preserve incidences), one can anchor three of the points to be $(0,0), (1,0)$ and $(0,1)$. Then, one can represent triples $p_1, p_2, p_3$ of collinear points by quadratic questions, which reflect the fact that the slopes of the segments $p_1 \to p_2$ and $p_2 \to p_3$ are equal. In principle, then, one can represent all the point-line incidences (using only the lines meeting three or more of the points) by a large number of these quadratic equations, which *should* have a unique solution (since the anchoring prevents multiple solutions coming from affine transformations). Perhaps one of these computer algebra packages (like MAGMA) can then verify that the equations indeed have a unique solution in the reals.

Assuming this is true (the incidence data for sufficiently large grids uniquely determines the structure of the grid up to affine transformations), a bigger and more interesting challenge would be to produce a set of point-line pairs where every line is incident to EXACTLY three points (i.e. not four or more), such that for any drawing of the incidence the points $p_i$ must ALWAYS contain a 3AP. I happen to believe it is possible to find such a configuration; in fact, I believe that projecting down a $3 \times 3 \times \cdots \times 3$ grid (and the associated lines hitting three points each) to the 2D plane (bijectively) will give such a configuration when the number of dimensions is large enough.

Why consider point configurations where every line meets exactly three points? Well, that has to do with ellitpic curves… another time, perhaps.

9. Christian Says:

Dear Ernie,

your argument involving a large number of quadratic equations sound compelling, but I still don’t see why the following rules out that, e.g. a $k\times k$ grid, can enforce 3APs. The map $(x, y)\mapsto (\frac{x){1+x\pi+ye}, \frac{y){1+x\pi+ye}$ preserves collinearity of points, so if we apply it to all points $(x, y)$ with integers $x, y$ ranging from $1$ to $k$ we get a copy of our grid. (I assume here that $e$ and $\pi$ are linearly independent over the rationals, so that we do not divide by zero. It is also useful to assume these numbers to be algebraically independent for the sake of discussion. If one does not want to rely on that, one can replace them by two other “sufficiently general” real numbers). Now in the new drawing no point seems to be the midpoint of two other points (basically because this would tell us something non-trivial about $e$ and $\pi$.)

So it may be that the quadratic equations you produce are not very independent from one another. A possible reason for that, apart from getting only $k-2$ insted of ${k \choose 3}$ equations for any $latex k$ collinear points might be hidden in results such as Pappus’ or Desargues’ theorem, which seem to tell us that sometimes one of these equations is entailed by eight or nine others in a non-trivial way. Hilbert’s Nullstellensatz suggests that these implication can be made rather explicit in single equations, which would incidentally yield – probably rather ugly – one-line-proofs of these two theorems.)

• Christian Says:

Oups, apart from that I failed to press the reply button, a crucial formula did not parse, so here is another attempt to write it down with slightly more English: The point $(x, y)$ is mapped to $(x/(1+xe+y\pi), y/(1+xe+y\pi))$.

• Ernie Croot Says:

Dear Christian,

Wow! That’s fantastic. I had thought about various kinds of maps like you had described, but had somehow overlooked this particular possibility. I suppose it sinks my hope of encoding 3APs by incidences, but at the same time it means that problems about 3APs really are fundamentally different from incidence-type problems (at least superficially).

By the way, there is an old problem posed by T. Tao (and I think also J. Solymosi and others) appearing in the following (problem 4.6)

about classifying collections of points and lines with almost the maximal number of incidences. The fact that there are maps such as you describe suggests that the answer will not simply be “dense subsets of uniform grids”, but will instead be much more complicated.

• Christian Says:

Dear Ernie,

it seems to me that this kind of map is by far to general to imply anything interesting about the inverse question associated with the Szemerédi-Trotter thereom, except for that it suffices to classify the relevant configurations up to “projective isomorphisms”. By this I mean: These maps can be applied not only to grids, but to all configurations of points and lines, so in particular to those maximizing the number of incidences. By the way, I think I can do Problem 4.13 from your list, but probably that’s not new given that the list seems to be a few years old.

Best regards,

Christian.

• Ernie Croot Says:

About that problem 4.13. Let’s continue this conversation off of this blog. Send me an email.

I guess I was being silly about the 3APs and incidences question, now that I think about it. I should have thought about the artist’s drawing of a horizon — that transformation preserves lines and incidences, but not 3APs.

But funnily enough I had asked this problem about 3APs to several people (including a geometer), and not one of them thought of using these projective transformations.

10. Seva Says:

Concerning my example — here is an idea of how one could try to develop it further, to construct even larger caps. Let ${\mathbb F}$ be the field of order $q=3^m$, and let $Q_1$ and $Q_2$ be quadratic forms over ${\mathbb F}^5$ (more precisely, over ${\mathbb F}$ in five variables). Denote by $Z$ the set of common zeroes of $Q_1$ and $Q_2$. Suppose that $A\subset{\mathbb F}^5$ contains no (non-trivial) arithmetic progression \emph{with a difference from $Z$}, and consider the set
$S := \{ (x,Q_1(x),Q_2(x)) \colon x \in A \} \subset {\mathbb F}^7.$
Any arithmetic progression in $S$ is of the form
$(x-d,Q_1(x-d),Q_2(x-d)),(x,Q_1(x),Q_2(x)),(x+d,Q_1(x+d),Q_2(x+d)),$
where $x-d,x,x+d\in A$ and $Q_i(x-d)+Q_i(x)+Q_i(x+d)=0$ for $i=1,2$. The latter condition implies $Q_1(d)=Q_2(d)=0$, whence $d\in Z$. But now the former condition along with the assumption that $A$ contains no arithmetic progressions with the difference in $Z$ show that $d=0$! Thus, $S$ is a cap, and we have $|S|=|A|$. Can we choose $Q_1,Q_2$, and $A$ so that $|A|>q^{5-o(1)}$ (as $m\to\infty$)? (What gives a hope that this is possible is that we do allow progressions in a vast majority of directions!) If so, we have constructed a cap in ${\mathbb F}^7$ of size $|{\mathbb F}^7|^{5/7+o(1)}$ which, as far as I remember (I may well be wrong about it though), beats the best known constructions.

• Michael Says:

I did a Monte Carlo with q=9, and I think found two quadratics whose zero-sets shared q points, all on a line through zero. (That is not surprising for homogeneous quadratics.)
If that happened in other fields, it would allow $q^4$ points in A, out of a total $q^7$ points, which is not as good as your other solution.
The only better result for this method would be if there are two quadratics whose only zero-point in common is the zero vector itself. In that case, the size of A would be $q^5$ and you get your density

• Michael Says:

The Chevally-Warning Theorem (1936) says that any set of polynomials over a finite field, with more variables than the total of the degrees of the polynomials, has simultaneous zeros.
In our case, that means two quadratics in five variables have simultaneous non-trivial zeros, so A can’t have $q^5$ points.
Or three quadratics in seven variables have simultaneous non-trivial zeros, and again we can’t do better than your first solution.
Perhaps two quadratics in eight variables could have just q simultaneous zeros?

• Michael Says:

Consider two quadratics in four variables whose only common zero is the zero vector. They lead to a solution (w,x,y,z,Q1,Q2) with $q^4$ points, and the same density as Seva’s original example.
I don’t know if they include new examples or not.
The only example I can think of is $Q1 = w^2-\lambda x^2$ and $Q2 = y^2-\lambda z^2$. where $\lambda$ is not a square.
Unfortunately, that is just two solutions $(w,x,w^2-\lambda x^2)$ multiplied together.

11. Polymath6 « Euclidean Ramsey Theory Says:

[…] Polymath6 has started. It is about improving the bounds for Roth’s theorem. See this post. Also there is a page on the polymath blog here. There is also a wiki here. Also see this post and this post. […]

12. Tentative Plans and Belated Updates II | Combinatorics and more Says:

[…] after many years, the Roth-Meshulam bound.  See these two posts on Gowers’s blog (I,II). Very general theorems discovered independently by David Conlon and Tim Gowers and by Matheas […]

13. Gil Kalai Says:

Regarding random union of subsets: It looks that if when we consider moving from V to a hyperplane, and then a codim 2 space and so on, the density-increase grows geometrically with the codimension, are there examples avoiding this feature?