EDP17 — are we nearly there?

Apologies for the attention-seeking title, but that really is the purpose of this post. I want to draw attention to some ideas that are buried in more comments than most people are likely to want to read, because I think there is a chance that all that stands between where we are now and a solution to EDP is a few soluble technical problems. It goes without saying that that chance is distinctly less than 100%, but I think it is high enough for it to be worth my going to some trouble to lay out as precisely as I can what the current approach is and what remains to be done. I’ll try to write it in such a way that it explains what is going on even to somebody who has not read any of the posts or comments so far. The exception to that is that I shall not repeat very basic things such as what the Erdős discrepancy problem is, what a homogeneous arithmetic progression is, etc. For that kind of information, I refer the reader to the front page of the Polymath5 wiki.

Representing the identity.

Let me now discuss an approach that doesn’t work. (If you have been keeping up with the discussion, then this will be familiar material explained in a slightly different way.) Let N be a large integer, and if P and Q are two HAPs contained in [N]=\{1,2,\dots,N\}, then write P\otimes Q for the N\times N matrix that is 1 at (m,n) if m\in P and n\in Q and 0 otherwise. In other words, it’s the characteristic function of P\times Q. Note that if x\in\mathbb{R}^N, then \langle x,(P\otimes Q)x\rangle=(\sum_{i\in P}x_i)(\sum_{j\in Q}x_j). Let us write \sum_{i\in P}x_i as \langle x,P\rangle, etc.
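Concretely, here is a small numerical sketch of that identity for two HAPs in [N] (purely illustrative; the names in the code are ad hoc):

```python
import numpy as np

N = 30

def hap(d, length):
    """The HAP {d, 2d, ..., length*d}, truncated to [N]."""
    return [d * i for i in range(1, length + 1) if d * i <= N]

P = hap(3, 7)   # {3, 6, ..., 21}
Q = hap(5, 4)   # {5, 10, 15, 20}

# P (x) Q: the 0/1 matrix that is 1 at (m, n) exactly when m is in P and n is in Q.
PQ = np.zeros((N, N))
for m in P:
    for n in Q:
        PQ[m - 1, n - 1] = 1.0

x = np.random.randn(N)
lhs = x @ PQ @ x                                            # <x, (P (x) Q) x>
rhs = sum(x[i - 1] for i in P) * sum(x[j - 1] for j in Q)   # <x, P> <x, Q>
assert abs(lhs - rhs) < 1e-9
```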

Suppose that we could find for every pair (P,Q) of HAPs contained in [N] a coefficient \lambda_{PQ} in such a way that I_N=\sum_{P,Q}\lambda_{PQ}(P\otimes Q) and \sum_{PQ}|\lambda_{PQ}|\leq cN. Then for every real sequence x_1,\dots,x_N we have

\sum_i x_i^2=\sum_{PQ}\lambda_{PQ}\langle x,P\rangle\langle x,Q\rangle.

It follows by averaging that there exist P and Q such that

|\langle x,P\rangle \langle x,Q\rangle|\geq(cN)^{-1}\sum_ix_i^2.

In particular, if each x_i=\pm 1, then there must exist P and Q such that |\langle x,P\rangle\langle x,Q\rangle|\geq c^{-1}, from which it follows that either |\langle x,P\rangle| or |\langle x,Q\rangle| is at least c^{-1/2}. So if we can get c to tend to zero as N tends to infinity then we are done.

Unfortunately, it is impossible to get c to tend to zero. The reason is that the above argument would imply that if x_i=1,-1,0 when i=1,2,0 mod 3, then for every C there exists a HAP P such that |\sum_{i\in P}x_i|\geq C. But that is not true: it can never be greater than 1.
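This is easy to check directly; here is a quick sketch (illustrative only) which confirms that for this sequence every HAP sum within a reasonable range has absolute value at most 1:

```python
N = 2000
# x_i = 1, -1, 0 according as i is 1, 2, 0 mod 3
x = {i: (1 if i % 3 == 1 else -1 if i % 3 == 2 else 0) for i in range(1, N + 1)}

worst = 0
for d in range(1, N + 1):            # common difference of the HAP {d, 2d, 3d, ...}
    partial = 0
    for k in range(1, N // d + 1):
        partial += x[k * d]          # partial sum along the HAP
        worst = max(worst, abs(partial))

print(worst)   # prints 1: no HAP sum of this sequence ever exceeds 1 in absolute value
```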

Because of this example, we have been trying a different approach, which is to look for a more general diagonal matrix D and write that as a linear combination of matrices of the form P\otimes Q. If one generalizes the approach in this way, then it is no longer clear that it cannot work — indeed, it seems likely that it can. However, it also seems to be hard to find a suitable diagonal matrix, and hard to think how one might decompose it once it is found.

Working over the rationals instead.

The single main point of this post is to suggest a way of overcoming this last difficulty. And that is to resurrect an idea that was first raised right near the beginning of this project, which is to look at the problem for functions defined on the positive rationals rather than the positive integers. (It is a straightforward exercise to show that the two problems are equivalent. For details go to the Polymath5 wiki and look at the section on the first page with simple observations about EDP.)

The point is that the counterexamples that show that the approach cannot work for the integers all make crucial use of the fact that some numbers are more divisible by small factors than others. But over the positive rationals all numbers are equally divisible. Or to put it another way, multiplying by a positive rational is an automorphism of \mathbb{Q}. This suggests that perhaps over the rationals it would be possible to use the identity matrix.

Dealing with infinite sets.

Now a problem arises if we try to do this, which is that the rationals are infinite. So what are we supposed to say about the sum of coefficients when we decompose the identity into a linear combination \sum_{P,Q}\lambda_{PQ}P\otimes Q?

Let me answer this question in two stages. First I’ll say what happens if we decompose the identity when the ground set is \mathbb{N}, which shows a way of dealing with infinite sets, and then I’ll move on to \mathbb{Q}, where some additional problems arise.

Suppose, then, that the infinite identity matrix (that is, the function \delta_{mn} where m and n range over \mathbb{N}) has been expressed as a linear combination \sum_{P,Q}\lambda_{PQ}P\otimes Q. Now let (x_n)_1^\infty be a \pm 1 sequence. We’d like to show that it has unbounded discrepancy: that is, we’d like to show that for every C there exists a HAP P such that |\sum_{n\in P}x_n|>C.

Our problem is that \sum_{P,Q}|\lambda_{PQ}| is going to be infinite. We’d somehow like to show that it has “density” at most c, where c is an arbitrary positive constant, or perhaps show that it has density zero. One way we might do this is as follows. For each positive integer N, let S_N be the sum of the |\lambda_{PQ}| over all (P,Q) such that both P and Q have non-empty intersection with [N]. Then define the upper density of the coefficients to be \limsup S_N/N. If this is zero we can say that the coefficients have density zero. And if it is at most c then we can say that they have density at most c. (In fact, even the \liminf would be OK — we just want the density to be small infinitely often.)

Let’s suppose that S_N/N is at most c. Then if we truncate the sequence (x_n) at N, by changing all values after N to zero, we find that

N=\sum_{n=1}^Nx_n^2=\sum_{P,Q\in H_N}\lambda_{PQ}\langle x,P\rangle\langle x,Q\rangle,

where I have written H_N for the set of all HAPs that have non-empty intersection with [N]. Since \sum_{P,Q\in H_N}|\lambda_{PQ}|\leq cN, the same averaging argument as before gives us a HAP (either P or Q — WLOG P) such that |\sum_{n\in P}x_n|\geq c^{-1/2}.

I fully admit that this is not very infinitary, but it is simple, and I’m not sure it matters too much that it is not infinitary. I’ll just briefly mention that one can use it to express the proof of Roth’s theorem (about AP discrepancy rather than HAP discrepancy). One expresses the infinite identity matrix as the following integral:

I=\int_0^1 e_\alpha\otimes \overline{e_\alpha} d\alpha,

where e_\alpha(n)=e(\alpha n)=\exp(2\pi i\alpha n). One then expresses each function e_\alpha as a linear combination of HAPs of length m (an arbitrary positive integer) and common difference at most m. One then obtains some cancellation in the coefficients, and proves that the density of the coefficients is at most m^{-1/2} (up to a constant factor). For details of how this calculation works, see this write-up, and in particular the third section.

The importance of the restriction on the length and common difference is that the edge effects (that is, APs that intersect [N] without being contained in [N]) are negligible for large N. It is this feature that is slightly trickier to obtain for the rationals, to which I now turn.

Transferring to HAPs and rationals.

One useful feature of the set of APs of length m and common difference at most m is that each number greater than or equal to m^2 is contained in precisely m^2 such APs. A first question to ask ourselves is whether we can find a set of HAPs that covers the rationals in a similarly nice way. To start with, observe that if x is a rational, then we can easily describe every HAP of length m that contains x. Indeed, for every r between 1 and m, we have the HAP consisting of the first m multiples of x/r, and that is all of them (since x must be in the rth place of the HAP for some r, and that and the length determine the HAP). So we have the extremely undeep result that every x is contained in precisely m HAPs of length m. (Note, however, how untrue this is if we work in the positive integers rather than the positive rationals.)
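Here is a tiny sketch of that count over the rationals, using exact fractions (the code is only illustrative):

```python
from fractions import Fraction

m = 6
x = Fraction(10, 21)

# The HAPs of length m containing x: for each r = 1, ..., m, the first m
# multiples of x/r (so that x sits in the r-th place).
haps = []
for r in range(1, m + 1):
    d = x / r
    haps.append(tuple(d * j for j in range(1, m + 1)))

assert all(x in P for P in haps)
assert len(set(haps)) == m   # they are pairwise distinct, so x lies in exactly m of them
```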

This looks promising, but we now need an analogue of the “increasing system of neighbourhoods” that was provided for us by the sets [N]. (It might have been more natural to work in \mathbb{Z} and take the sets [-N,N].) What is a sensible collection of finite sets with union equal to \mathbb{Q}?

One way of thinking about the sets [N] is as follows. Using our system of APs, we can define a graph: we join x to y if there is an AP of length m and common difference at most m that contains both x and y. The sets [N] are quite close to increasing neighbourhoods in this graph: start with the number 1 and then take all points of distance at most r from it. If we work with \mathbb{Z} rather than \mathbb{N}, then this graph is a Cayley graph, and after a while the neighbourhoods grow linearly with r, which is why the boundary effects are small.

What happens if we define a similar graph using HAPs in \mathbb{Q}? Now we are joining x to y if there exist r and p with 1\leq r\leq m and 1\leq p\leq m such that px/r=y. That is precisely the condition that there exists a HAP of length m that contains both x and y. In other words, we take the multiplicative group of \mathbb{Q} and inside it we take the Cayley graph with generators all numbers p/q with 1\leq p,q\leq m.

This feels like the right graph to take, but it has the slight drawback that it is not connected: it is impossible to get from x to y if x/y is a rational such that in its lowest terms either its numerator or denominator is divisible by a prime greater than m. The connected component of 1 in the graph is the set of all rationals r/s, where both r and s are products of primes less than or equal to m. But this is not really a problem: we’ll just work in that component of the graph.

Let’s write \mathbb{Q}_m for the component of \mathbb{Q} we are working in, and N(r) for the set of all points at distance at most r from 1 in the graph. Now we can say how it would in principle be possible to prove EDP. We would like to find a way of writing the identity (this time thought of as the function \delta_{rs} where r and s range over \mathbb{Q}_m) in the form

\sum_{P,Q}\lambda_{P,Q}P\otimes Q,

where P and Q are HAPs of length at most m that are contained in \mathbb{Q}_m. For each r, let H_r be the set of all such HAPs that have non-empty intersection with the neighbourhood N(r). Then we can define S(r) to be \sum_{P,Q\in H_r}|\lambda_{PQ}|, and we can define the upper density of the coefficients to be \lim\sup S(r)/|N(r)|.

Now let me show that if S(r)/|N(r)| is at most c for some sufficiently large r, then the HAP-discrepancy of every \pm 1 function f on \mathbb{Q} is at least c^{-1/2}. This is by almost exactly the same argument that worked in \mathbb{N}. The first step is to consider the restriction of f to N(r). Then we know that

|N(r)|=\sum_xf(x)^2=\sum_{P,Q\in H_r}\lambda_{PQ}\langle f,P\rangle\langle f,Q\rangle.

Now any HAP of length at most m that intersects N(r-1) is contained in N(r), from which it follows that any HAP of length at most m that intersects N(r) but is not contained in N(r) must intersect N(r)\setminus N(r-1). Since \mathbb{Q}_m is a finitely generated Abelian group, the sets N(r) grow polynomially, which implies that the ratio |N(r)\setminus N(r-1)|/|N(r)| tends to zero with r.
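To get a feel for this growth, here is a rough sketch (illustrative only) that builds the neighbourhoods N(r) by breadth-first search from 1, using the generators p/q with 1\leq p,q\leq m, and prints the boundary ratio |N(r)\setminus N(r-1)|/|N(r)|, which does indeed shrink:

```python
from fractions import Fraction

m = 3   # generators of the Cayley graph: the ratios p/q with 1 <= p, q <= m
gens = {Fraction(p, q) for p in range(1, m + 1) for q in range(1, m + 1)}

ball = {Fraction(1)}                                         # N(0)
for r in range(1, 8):
    new_ball = ball | {x * g for x in ball for g in gens}    # N(r)
    boundary = new_ball - ball                               # N(r) \ N(r-1)
    print(r, len(new_ball), len(boundary) / len(new_ball))
    ball = new_ball
```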

I’ll now be very slightly sketchy. We are supposing that S(r)/|N(r)| is at most c. It follows that either S(r)\approx S(r-1) or S(r-1)/|N(r-1)| is noticeably smaller than S(r)/|N(r)|. In the second case we can change r to r-1 and start again — we won’t be able to do this too many times so eventually we’ll reach the first case, where S(r)\approx S(r-1).

In that case we have that

|N(r)|\approx\sum_{P,Q\subset N(r)}\lambda_{PQ}\langle f,P\rangle\langle f,Q\rangle

and that

\sum_{P,Q\subset N(r)}|\lambda_{PQ}|\approx c|N(r)|.

After that, the argument really is the same as before (give or take the small approximations). [Remark: I have not checked the details of the above sketch, but I’m confident that something along these lines can be done if this doesn’t quite work. It’s slightly more difficult than in \mathbb{N} because it isn’t obvious that the intersection of a HAP with N(r) is a HAP, whereas the intersection of an AP with \{1,2,\dots,N\} is an AP.]

Characters on the rationals.

Now we must think about how to express the identity as a linear combination of HAP products with coefficients of density at most c, where c is some arbitrary positive constant. Taking our cue from the \mathbb{N} case, it would be natural to express the identity on \mathbb{Q}_m in terms of characters, and then to decompose the characters. So a preliminary task is to work out what the characters are.

This is (not surprisingly) a piece of known mathematics. I shall discuss it in a completely bare-hands way, but readers who don’t like that kind of discussion may prefer to look at the comment by BCnrd to this Mathoverflow question.

First I’ll work out what the characters on \mathbb{Q} are, and then I’ll look at \mathbb{Q}_m. Recall that a character is a function \chi from \mathbb{Q} to the unit circle in \mathbb{C} such that \chi(x+y)=\chi(x)\chi(y) for every x,y\in\mathbb{Q}. That is, it is a homomorphism from (\mathbb{Q},+) to \mathbb{T}.

Suppose we know the value of \chi at a rational x. That tells us what \chi(nx) is for every integer n. However, it does not tell us what \chi(x/2) is, say. All we know is that \chi(x/2)^2=\chi(x), which gives us two possibilities for \chi(x/2). So in order to specify \chi, we need to specify its values at enough rationals that we can write every other rational as a multiple of one of them. And the choices we make at those rationals have to be compatible with each other.

A simple way of doing that is to choose the value of \chi at 1/n! for every positive integer n, making sure that \chi(1/(n+1)!)^{n+1}=\chi(1/n!). That is, we choose \chi(1) to be any point in \mathbb{T}, and then for each n we have n choices for \chi(1/n!) given our choices up to that point.

An equivalent way of thinking about this is that we choose a sequence of real numbers \alpha_1,\alpha_2,\dots satisfying the conditions that 0\leq \alpha_n < n! and \alpha_{n+1}-\alpha_n is a multiple of n!. Then the corresponding character \chi is defined at x=p/q to be e(\alpha_n x) for any n\geq q. Note that this is well-defined, since if m and n are both at least q, then q|(\alpha_n-\alpha_m), so e(\alpha_np/q)=e(\alpha_mp/q).

Now because we chose an element of [0,1] and then made a sequence of finite choices, it is easy to put a probability measure on the set of characters. We can therefore make sense of the expression

\mathbb{E}_\chi \chi(x)\overline{\chi(y)}=\mathbb{E}_\chi \chi(x-y)

and prove that it is \delta_{xy}. Let me briefly sketch this. Suppose that x=y. Then \chi(x-y)=\chi(0)=1 for every \chi, so we get 1. If x\ne y, then let x-y=p/q when written in its lowest terms. If \alpha_1,\alpha_2,\dots is the (random) sequence that determines \chi, then each \alpha_n is uniformly distributed in the interval [0,n!), and \chi(p/q)=e(\alpha_qp/q). But \alpha_q/q is uniformly distributed in the interval [0,(q-1)!), so this expectation is zero (as p\ne 0).

We have therefore shown that

I=\mathbb{E}_\chi\chi\otimes\overline{\chi},

where I is the identity indexed by \mathbb{Q}.
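Here is a rough Monte Carlo sketch of this (illustrative only, and truncated to denominators up to a fixed bound): it samples characters via the \alpha_n description above and estimates \mathbb{E}_\chi\chi(x)\overline{\chi(y)}:

```python
import cmath, math, random
from fractions import Fraction

def e(t):
    return cmath.exp(2j * math.pi * t)

def random_character(max_q=6):
    """Sample alpha_1, ..., alpha_{max_q}: alpha_1 uniform in [0,1) and
    alpha_{n+1} = alpha_n + k * n! for a uniform k in {0, ..., n}."""
    alphas = [random.random()]
    fact = 1
    for n in range(1, max_q):
        fact *= n                                   # n!
        alphas.append(alphas[-1] + random.randrange(n + 1) * fact)
    def chi(x):
        q = x.denominator                           # only valid for q <= max_q
        return e(alphas[q - 1] * x)                 # e(alpha_q * x)
    return chi

x, y = Fraction(1, 3), Fraction(5, 6)
trials = 20000
acc_same, acc_diff = 0.0, 0.0
for _ in range(trials):
    chi = random_character()
    acc_same += (chi(x) * chi(x).conjugate()).real
    acc_diff += chi(x) * chi(y).conjugate()
print(acc_same / trials)        # essentially 1
print(abs(acc_diff) / trials)   # small (of order 1/sqrt(trials)), consistent with delta_{xy}
```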

What do we do if we want to modify this to work for \mathbb{Q}_m? Well, an initial complication that (I hope) turns out not to be a serious complication is that \mathbb{Q}_m is not an additive group: it contains 1 and it does not contain any prime greater than m. However, it generates a subgroup \Gamma_m of \mathbb{Q}, which consists of all rationals with denominators that are products of primes less than or equal to m. When I refer to “characters on \mathbb{Q}_m” I will really mean characters on \Gamma_m.

To describe these, we no longer need a sequence of reciprocals such that every rational is a multiple of one of them: we just want to capture all rationals in \Gamma_m. But that is straightforward: instead of taking the sequence (1/n!) we could take the sequence (1/(m!)^n)_{n=1}^\infty, or we could replace m! by the product of all the primes up to m. There are any number of things that we could do.

Decomposing a character into “non-local” HAPs.

The thing that seems to me to make this approach very promising is that for any character \chi on \mathbb{Q} it is possible to partition \mathbb{Q} into long HAPs on each of which \chi is approximately constant. As this result suggests, it is possible to decompose \chi in an efficient way as a linear combination of HAPs, which is very much the kind of thing we need to do in order to imitate the Roth proof.

I should warn in advance that it is not quite good enough for our purposes, because the HAPs we use are not “local” enough: they are sets of the form \{(a+1)x,(a+2)x,\dots,bx\} such that b-a is small, but we do not also know that a and b are small. Without that, each number is contained in infinitely many HAPs, so we no longer have the condition that enabled us to define the “density” of a set of coefficients. Later I shall present a different idea that does use “local” HAPs, but fails for a different reason. My gut instinct is that these difficulties are not fundamental to the approach, but whether that is mathematical intuition or wishful thinking is hard to say.

Before I go any further, here is an easy lemma.

Lemma. Let \chi be a character on \mathbb{Q} and let \epsilon>0. Then there exists M such that for every x there exists a positive integer n\leq M such that |\chi(x/n)-1|\leq\epsilon.

Proof. Without loss of generality x=1. Let t=\lceil 2\pi/\epsilon\rceil and let M be the lowest common multiple of the numbers from 1 to t. [Thanks to David Speyer for pointing out at Mathoverflow that the l.c.m. is around e^t, a non-negligible improvement over t!.] Then by the usual pigeonhole argument we can find a positive integer s\leq t such that |\chi(s/M)-1|\leq 2\pi t^{-1}\leq\epsilon, so we can take n=M/s (a positive integer at most M, since s divides M).
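To make the pigeonhole step concrete, here is a tiny numerical sketch (illustrative only) in which the value \chi(1/M) is just an arbitrary point on the unit circle:

```python
import cmath, math, random

eps = 1.0
t = math.ceil(2 * math.pi / eps)
M = math.lcm(*range(1, t + 1))      # lowest common multiple of 1, ..., t (Python 3.9+)

z = cmath.exp(2j * math.pi * random.random())   # plays the role of chi(1/M)

# Pigeonhole: some power z^s with 1 <= s <= t lies within 2*pi/t <= eps of 1.
s = min(range(1, t + 1), key=lambda j: abs(z ** j - 1))
assert abs(z ** s - 1) <= 2 * math.pi / t <= eps

n = M // s      # exact division, since s <= t divides lcm(1, ..., t)
# chi(1/n) = chi(s/M) = z^s, so |chi(1/n) - 1| <= eps with n <= M.
print(t, M, s, n, abs(z ** s - 1))
```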

For the next result we define a HAP to be a set of the form \{ax,(a+1)x,\dots,bx\}.

Corollary. Let \chi be a character on \mathbb{Q}, let \epsilon>0 and let m be a positive integer. Then we can partition \mathbb{Q} into HAPs of lengths between m/2 and m on each of which \chi varies by at most \epsilon.

Proof. We begin by covering the integers. Find a positive integer n such that |\chi(1/n)-1|\leq \epsilon/m. Then on any HAP with common difference 1/n and length at most m, \chi varies by at most \epsilon. We can partition the multiples of 1/n into HAPs of length m, and they will cover all the integers. (Indeed, they will cover all the multiples of 1/n.)

We now want to fill in some gaps. Let us write n_1=n and let us pick an integer n_2, a multiple of n_1 and greater than mn_1, such that |\chi(1/n_2)-1|\leq\epsilon/m. Between any two multiples of 1/n_1 there are at least n_2/n_1 multiples of 1/n_2 forming a HAP. This HAP can be partitioned into HAPs of common difference 1/n_2 and lengths between m/2 and m. If we continue this process and make sure that every positive integer divides at least one n_i (which is easy to do), then we are done.

Now a character restricted to a HAP is just a trigonometric function. If the character varies very slowly as you progress along the HAP, then convolving it with an interval (as in the Roth argument, but now the interval is a chunk with the same common difference as the HAP) gives us a multiple of the character itself, where the multiple is very close to 1. With the help of this observation, we can actually decompose the character efficiently as a linear combination of HAPs. Since this does not obviously help us, I will leave the details as an exercise.

What can we do with local HAPs?

Let us fix a positive integer m and define a HAP to be local if it is of the form \{ax,(a+1)x,\dots,bx\}, where 1\leq a\leq b\leq m. (Thus, the definition of “local” depends on m.) What happens if we have a character \chi and try to decompose its restriction to \mathbb{Q}_m as a linear combination of local HAPs (of reasonable length) on each of which \chi varies very little?

The short answer is that it is easy to cover \mathbb{Q}_m with HAPs of this kind, but it doesn’t seem to be easy to partition \mathbb{Q}_m into them. In order to achieve the partition into non-local HAPs in \mathbb{Q}, we helped ourselves to smaller and smaller common differences, and correspondingly larger and larger values of a and b. Another problem with that method was that what we are really searching for is a very nice and uniform way of decomposing characters, such as we had in the Roth proof. There the niceness was absolutely essential to getting enough cancellation for the proof to work, but it wasn’t essential to represent the identity — we could allow ourselves a bit extra as long as that extra was positive semidefinite.

So let’s not even try to partition \mathbb{Q}_m. Instead, we could simply take our character and use it as a guide to the coefficients that we will give to our HAPs.

The rough idea would be something like this. Given a character \chi and a HAP P=\{u,2u,\dots,mu\}, we choose a coefficient \lambda_{\chi,u} in some nice way that makes it large when |\chi(u)-1| is small. It could for example be \max\{1-\epsilon^{-1}|\chi(u)-1|,0\} for some suitable \epsilon>0. And then we could use convolution to represent \lambda_{\chi,u} times the restriction of \chi to P as a linear combination of sub-HAPs of P of length smaller than m but not too much smaller, and common difference u.

We would know from the lemma above that every x would be contained in at least some HAPs with large coefficients, so every x would in some sense be catered for. I think there would be some x that were catered for too much (the precise relationship between those x and \chi remains to be worked out, but I think this will be straightforward), but I hope that the whole decomposition can be defined in a nice enough way for the function \psi_\chi that results to be the pointwise product of the original character \chi with a “nice” non-negative real function that’s bounded away from zero. More speculatively, one can hope that the coefficients have small density and that it is possible to subtract a not too small multiple of the identity from the matrix \mathbb{E}_\chi \psi_\chi\otimes\overline{\psi_\chi} and still be left with something that’s non-negative definite.

An attempt to be more precise.

In this final section I want simply to guess at a matrix decomposition that might potentially prove EDP. As I write this sentence I do not know what the result will be, but the two most likely outcomes are that it will fail for some easily identifiable reason or that the calculations will be such that I cannot tell whether it fails or succeeds.

Actually, to make the guess just a little bit more systematic, let’s suppose that for each HAP P in the class of HAPs we are considering and for each character \chi we have a coefficient \lambda_{P,\chi}. That is, corresponding to \chi we are taking the function \psi_\chi=\sum_P\lambda_{P,\chi}P. Given two HAPs P and Q, what will the coefficient of P\otimes Q be in the decomposition \mathbb{E}_\chi\psi_\chi\otimes\overline{\psi_\chi}?

Expanding this out gives us

\mathbb{E}_\chi(\sum_P\lambda_{P,\chi}P)\otimes(\sum_Q\overline{\lambda_{Q,\chi}}Q).

If we reverse the order of expectation and summation then we see that the coefficient of P\otimes Q is \mathbb{E}_\chi\lambda_{P,\chi}\overline{\lambda_{Q,\chi}}.

Now let us think what we are trying to achieve with the HAPs P and Q. Given a HAP of the form \{u,2u,\dots,mu\}, we want to use it to contribute to the representation of characters \chi for which |\chi(u)-1| is small. If \chi is such a character, then we know that the numbers \chi(u),\chi(2u),\dots,\chi(mu) vary only slowly. Therefore, if we take short subprogressions of the form \{au,(a+1)u,\dots,bu\}, then on each one \chi will be roughly constant. If we fix a length 2s+1 (which may have to be logarithmic in m or something like that), then we can represent the restriction of \chi to \{u,2u,\dots,mu\} as a linear combination of the HAPs \{(a-s)u,\dots,(a+s)u\} of length 2s+1, give or take some problems at the two ends.

Roughly speaking, the coefficient of \{(a-s)u,\dots,(a+s)u\} will be (2s+1)^{-1}\chi(au). That is, if we write P_{u,a} for the HAP \{(a-s)u,\dots,(a+s)u\}, then it is approximately true to say that the restriction of \chi to \{u,2u,\dots,mu\} is (2s+1)^{-1}\sum_a\chi(au)P_{u,a}.
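Here is a tiny numerical sketch of that approximation (illustrative only): along the progression \{u,2u,\dots,mu\} the character is just a\mapsto e(a\theta) for some \theta, which is small when \chi(u) is close to 1, and averaging over windows of length 2s+1 nearly reproduces it away from the two ends:

```python
import cmath, math

def e(t):
    return cmath.exp(2j * math.pi * t)

m, s = 1000, 10
theta = 0.0003   # stands in for the argument of chi(u), assumed close to 0

chi_on_P = [e(a * theta) for a in range(1, m + 1)]   # chi(au) for a = 1, ..., m

# Value at ju of (2s+1)^{-1} sum_a chi(au) P_{u,a}: the average of the 2s+1
# values chi(au) with |a - j| <= s.
j = 500
window = sum(chi_on_P[a - 1] for a in range(j - s, j + s + 1)) / (2 * s + 1)
print(abs(window - chi_on_P[j - 1]))   # small: the windowed average is close to chi(ju)
```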

Now we want to do this only if \chi(u) is close to 1. So the coefficient of P_{u,a} in the decomposition of \chi will in general be (2s+1)^{-1}\mu(\chi(u))\chi(au), where \mu is some nice function (which, if the Roth proof is anything to go by, means that it has non-negative and nicely summable Fourier coefficients) that is zero except at points that are close to 1.

What would we expect (2s+1)^{-1}\sum_{a,u}\mu(\chi(u))\chi(au)P_{u,a} to be like? Is there any chance that it is close to a multiple of \chi?

Since \chi is roughly constant on P_{u,a} whenever \mu(\chi(u)) is non-zero, the value of this function at x is roughly the same as it is if we take just singletons — that is, if we set s=0 and define P_{u,a} to be \{au\}. So the function we get should have a value at x that is close to \sum_{au=x}\mu(\chi(u))\chi(x)=\chi(x)\sum_{r=1}^m\mu(\chi(x/r)).

In other words, we get something that has the same argument as \chi but that has big modulus at a rational x if it so happens that \chi(x/r) is close to 1 for unexpectedly many positive integers r\leq m. Now this can happen, but I would think that for nearly all \chi and x, if m is not too small, then \sum_{r=1}^m\mu(\chi(x/r)) would be about its expected value of m\|\mu\|_1. So I am hoping that we will have some kind of rough proportionality statement.

Going back to the coefficient of P\otimes Q in the decomposition, let us take the two HAPs P_{u,a} and P_{v,b}. We have decided to define \lambda_{P_{u,a},\chi} to be (2s+1)^{-1}\mu(\chi(u))\chi(au), so \mathbb{E}_\chi\lambda_{P_{u,a},\chi}\overline{\lambda_{P_{v,b},\chi}} is equal to

(2s+1)^{-2}\mathbb{E}_\chi\mu(\chi(u))\mu(\chi(v))\chi(au-bv).

It is a significant problem that we are forced to consider the case where u\ne v: in the Roth proof, we did not have to look at products of APs with different common differences. But if we are to have any chance of decomposing \chi into “local” HAPs, then necessarily we must use completely different common differences to deal with rational numbers that are a long way apart in the graph. This is such a significant difference from the Roth proof that it may be a reason for this approach not working at all. However, it does look as though there is plenty of cancellation.

I think the expression we would be trying to bound is

(2s+1)^{-2}\sum_{u,v\in N(r)}\sum_{a,b\leq m}|\mathbb{E}_\chi\mu(\chi(u))\mu(\chi(v))\chi(au-bv)|,

which is perhaps better thought of as

(2s+1)^{-2}\sum_{u,v\in N(r)}\sum_{a,b\leq m}|\mathbb{E}_\chi\mu(\chi(u))\chi(au)\mu(\chi(v))\overline{\chi(bv)}|.

And our main strength would be that we would be free to choose \mu as it suited us. There would be two challenges: to obtain a good bound on the above expression, and to prove that we could subtract a reasonable-sized multiple of the identity and still be left with a positive semi-definite matrix.

I think the above counts as a calculation for which I cannot tell whether it fails or succeeds. But I hope it may be enough to provoke a few thoughts.

21 Responses to “EDP17 — are we nearly there?”

  1. Polymath5 « Euclidean Ramsey Theory Says:

    […] By kristalcantwell There is a new thread for Polymath5 here. The main theme of the post is that the problem may be closed to being […]

  2. gowers Says:

    I have realized that the Roth proof can be simplified, and that the simplification has an analogue for rationals and HAPs that makes it much easier to think about.

    In the Roth proof, a (small) technical complication came from the need to choose non-negative coefficients \alpha_{r,d}. The three criteria that lay behind the choice of these coefficients were as follows.

    (i) For each d the function \alpha_d(r)=\alpha_{r,d} should be “nice”, meaning that it has non-negative real Fourier coefficients that sum to 1.

    (ii) \alpha_{r,d} should be non-zero only if |1-\exp(2\pi i rd/N)| is small.

    (iii) For each r there should exist d\leq m such that \alpha_{r,d}\geq 1/2.

    I realized last night (and checked this morning) that we can dispense with (ii). All we actually use in the proof is (i) and (iii). Intuitively, (ii) feels necessary, because every time \alpha_{r,d} is large, it is contributing to the sum of the coefficients that we are trying to make small. However, that intuitive argument is wrong: it is (iii) that controls the cancellation in the coefficients, and as long as they cancel, we don’t care how big the sum is that cancels.

    The upshot of all that is that we can make the sort of simplification that usually sets alarm bells ringing: it is enough to take \alpha_{r,d}=1 for every r and d.

    Let me say what happens to the Roth write-up if you do that. Up to the end of page 8, the only change needed is to cross out \alpha_{r,d} wherever it appears. Also, Lemma 3.1 is no longer needed. Then at the top of page 9, the calculations now look like this. (Recall that d is being summed from 1 to m.)

    \frac1{N|P|^2}\sum_d\sum_{x,y}|\sum_r\omega^{r(x-y)}|

    =\frac1{N|P|^2}\sum_d\sum_{x,y}N\delta_{xy}

    = mN/|P|^2\leq N/4m.

    In my next comment, I’ll say what the picture now looks like for rationals and HAPs.

  3. gowers Says:

    At the end of the post, I wrote that the expression we are trying to bound is

    (2s+1)^{-2}\sum_{u,v\in N(r)}\sum_{a,b\leq m}|\mathbb{E}_\chi\mu(\chi(u))\mu(\chi(v))\chi(au-bv)|.

    In this argument, \mu was playing the role of \alpha_{r,d}. So let’s set it to 1. Then we get

    (2s+1)^{-2}\sum_{u,v\in N(r)}\sum_{a,b\leq m}|\mathbb{E}_\chi\chi(au-bv)|.

    But \mathbb{E}_\chi\chi(au-bv)=\delta_{au,bv}, so this gives us (2s+1)^{-2}|N(r)|m^2. (That is because for each a,b,u there is a unique v such that \delta_{au,bv}=1, at least if we ignore edge effects, as we are allowed to do.)
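    Here is a quick sanity check of that count on a small neighbourhood (illustrative code only, with N(r) built by breadth-first search from the generators p/q, 1\leq p,q\leq m, as earlier in the post):

```python
from fractions import Fraction

m, r = 3, 4
gens = {Fraction(p, q) for p in range(1, m + 1) for q in range(1, m + 1)}

ball = {Fraction(1)}                       # a small N(r)
for _ in range(r):
    ball |= {x * g for x in ball for g in gens}

# Count quadruples (u, a, v, b) with u, v in N(r), a, b <= m and au = bv.
count = sum(1 for u in ball for a in range(1, m + 1)
              for v in ball for b in range(1, m + 1) if a * u == b * v)
print(count, len(ball) * m * m)   # comparable: roughly one v for each (u, a, b), up to edge effects
```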

    This looks a bit disastrous, given that we were hoping for a sum of coefficients that was small compared with |N(r)|. However, I am not quite sure that it is disastrous. The first piece of potentially good news is that the decomposition we create could well be of a decent-sized multiple of the identity. Let me try to explain why.

    We are in some sense approximating each character \chi by the function

    (2s+1)^{-1}\sum_{u,a}\mu(\chi(u))\chi(au)P_{u,a}.

    Setting \mu to be identically 1, we get a function whose value at x is … actually, for technical reasons this doesn’t work — we need to do a double convolution as in the Roth proof. So let me try to reason informally. We are interested in the value of this function at x. The only u for which x belongs to some P_{u,a} are x, x/2, x/3, \dots, x/m. If \chi(u) is not within about 1/s of 1, then the sum of \chi(au) over all a such that x\in P_{u,a} is small. Otherwise, it is about s.

    We would ordinarily expect \chi(u) to be within 1/s of 1 about 1/s of the time. So we might expect to get a contribution of s for about m/s values of u. And this is then multiplied by (2s+1)^{-1}, so the total contribution at x ought to be around m\chi(x)/s, up to a constant. But that means that we ought to be approximating (m/s)^2 times the identity rather than the identity itself. Therefore, we want our sum of coefficients to be small compared with m^2|N(r)|/s^2.

    Tantalizingly, what we actually have is equal to this, up to a constant.

    What I hope this means is that any observation that gives even a tiny improvement to what is going on will solve EDP.

    But what possible room is there for improvement? The one place I can see is this: I’ve been talking about representing the identity — indeed, the fact that one could talk about the identity was the whole reason for getting excited about the rationals — but we still have considerable flexibility in our choice of matrix. Indeed, it seems that we have not in fact represented the identity, and my impression is that what we have actually done is represent something slightly “larger” than the identity (because for some \chi and some x we have “too many” u=x/r such that \chi(u) is close to 1, but I think we never have “too few”). Can we somehow squeeze out just the tiniest bit more from the argument?

    I have another idea for getting an improvement, but it is a bit harder to explain. I’ll save it for another comment.

  4. gowers Says:

    I think I have an explanation for why the Roth argument gives an improvement on the trivial bound, whereas the argument for HAPs and the rationals so far appears not to. If this explanation is correct, then it suggests a way of trying to improve the HAPs argument.

    In the Roth argument, there are two things one might think of doing in order to approximate the rank-1 matrix e_\alpha\otimes\overline{e_\alpha}, where e_\alpha is a character on \mathbb{Z}. At our disposal is the fact that for each d\leq m the function e_\alpha*\pi_d*\pi_d is a non-negative real multiple of e_\alpha, and it is a linear combination of HAPs. Moreover, for some values of d it is quite a large multiple. So either we could take the function \sum_{d\leq m}e_\alpha*\pi_d*\pi_d (which will be a non-negative multiple of e_\alpha) and then tensor it with itself, or we could look at the function \sum_{d\leq m}(e_\alpha*\pi_d*\pi_d)\otimes(e_\alpha*\pi_d*\pi_d). It turns out to be very important to do the latter. The reason it helps is that in the decomposition we only ever have products P\otimes Q of APs with the same common difference.

    I haven’t checked, but I’m pretty sure that if we went down the first route, so that we would have to consider arbitrary pairs of common differences up to m, then we would lose a factor m in the sum of the absolute values of the coefficients. This factor of m is precisely what gives us a non-trivial bound at the end of the proof.

    Now let’s switch to HAPs and the rationals. Here it is essential to consider products of HAPs with different common differences, or at least it is essential if we want a “local” argument. The reason is that if x and y are far apart in the graph, then there just isn’t a common difference u such that both x and y are contained in local HAPs of common difference u. The result of this is that in the expression near the beginning of the previous comment, we ended up having to sum over all pairs a,b\leq m.

    But what if we could somehow get away with looking only at o(m^2) pairs? This is still a slightly vague thought, so let me say what I can and then stop. In the Roth case, we exploit the fact that if e_\alpha is roughly constant on a progression of common difference d, then it is roughly constant on any translate of that progression. That allows us to restrict attention to pairs of progressions with the same common difference.

    What I would like to have is something similar for HAPs and characters. It seems to be far too much to hope that if \chi is roughly constant on a HAP that contains x then we can deduce that it is roughly constant on some other HAP that contains y. However, perhaps we can at least find a class of HAPs of size o(m) and say that on at least one HAP in that class \chi must be roughly constant, or perhaps something weaker such as that that is true for almost all \chi.

    The trouble is that if x and y are far apart in the graph, then it feels as though we don’t have any correlation at all. So I’m not altogether sure what it is that I am trying to say here. But the general point remains that we seem to have a non-trivial decomposition of something like the identity into HAPs, and if we can make it just the teeniest bit more efficient then we should be in extremely good shape.

  5. Kristal Cantwell Says:

    “What I would like to have is something similar for HAPs and characters. It seems to be far too much to hope that if \chi is roughly constant on a HAP that contains x then we can deduce that it is roughly constant on some other HAP that contains y. ”

    Could \chi be chosen so it satisfies \chi (x)\chi (y)=\chi (xy)? Then the above problem looks like it could be solved by multiplying by y/x.

    • gowers Says:

      That’s certainly an interesting thought — to use characters on the multiplicative group of non-negative rationals instead. It wouldn’t instantly solve the problem, because writing a multiplicative character in terms of APs (let alone HAPs) is likely to be much more challenging than decomposing an additive character. But it may be that there is some way of doing it, and then the fact that the set of HAPs of length m naturally gives rise to a multiplicative structure would fit in well with the decomposition.

      In a way, this illustrates the fundamental difficulty of EDP: that it mixes multiplication and addition in a strange way. Somehow we have to get from one to the other, and what you are suggesting could be thought of as trying to deal with the difficulty in a different place. Definitely worth thinking about.

    • Klas Markström Says:

      Are there any theorems saying that a +1,-1-valued function on the positive integers must have large discrepancy either along HAPs or geometric progressions? That is certainly true for multiplicative functions.

      Given that the multiplicative and additive groups are connected by the log-mapping we know what the characters of the multiplicative group are. But using that right seems to require quite a bit of untangling.

    • gowers Says:

      I’ve thought about Kristal’s suggestion a bit more and I now think that it cannot work. Or perhaps it is more accurate to say that it can work if and only if we manage to solve some problems that we are already stuck on.

      Let me try to explain. The characters on the multiplicative group of \mathbb{Q}_+ are just the completely multiplicative functions from \mathbb{Q}_+ to \mathbb{T} (each of which is determined by its behaviour on \mathbb{Z}). To represent a completely multiplicative function \mu as a linear combination of HAPs, we would need at least some kind of correlation between \mu and some HAPs. But that requires one to solve EDP (positively) for completely multiplicative functions.

      That does, however, suggest that if we knew a sufficiently strong statement about completely multiplicative functions, then we could use the Roth method to deduce EDP for general functions. It seems that for this to work we would need the partial sums of a completely multiplicative complex-valued function to be large on average. Does this ring any bells anyone? It seems that we are being inexorably drawn to Terry’s observation that EDP follows from a suitable statement about completely multiplicative functions taking values in \mathbb{T}. My guess is that if we pushed this idea, we would end up with a proof that is essentially the same as the argument Terry gave.

    • Kristal Cantwell Says:

      One idea I have is to assume the character function is completely multiplicative, use that to get a result that implies that any completely multiplicative function is unbounded, and then prove that any bounded example for EDP implies the existence of a bounded completely multiplicative function, and thus get the desired result. I don’t know if assuming the character function is completely multiplicative improves the situation.

  6. gowers Says:

    I had a thought in the negative direction concerning this approach — not negative enough to kill the approach but certainly negative enough to place quite strict limits on how well it could work.

    The thought is that whereas Roth’s theorem gives you an AP of length m on which the discrepancy is cm^{1/2}, we know that we cannot achieve that for HAPs and functions on the rationals. The reason is that we have our old friends, the completely multiplicative functions \lambda_3 and \mu_3, which extend easily to the rationals. As a reminder of how this is done, let me define the simpler of the two (I can’t remember whether this was \mu_3 or \lambda_3). Given a rational p/q, we can write it in the form 3^kr/s, where k\in\mathbb{Z} and neither r nor s is divisible by 3. We then define f(p/q) to be 1 if rs^{-1}\equiv 1 mod 3 and -1 if rs^{-1}\equiv 2 mod 3.

    This function is completely multiplicative, and as we have worked out many times, the sum along any HAP of length m is at most \log_3m or thereabouts.
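    Since f is completely multiplicative, its sum along the HAP \{x,2x,\dots,kx\} is f(x)\sum_{j\leq k}f(j), so everything comes down to partial sums over the positive integers; here is a quick check (illustrative code only):

```python
import math

def f(n):
    """The function above restricted to the positive integers: strip out the
    powers of 3, then +1 if what is left is 1 mod 3 and -1 if it is 2 mod 3."""
    while n % 3 == 0:
        n //= 3
    return 1 if n % 3 == 1 else -1

m = 10 ** 6
partial, worst = 0, 0
for j in range(1, m + 1):
    partial += f(j)
    worst = max(worst, abs(partial))

print(worst, math.log(m, 3) + 1)   # 13 versus about 13.58: the partial sums stay below log_3 m + 1
```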

    It follows that the best improvement we can hope for over the trivial bound is logarithmic in the lengths of the HAPs that are used in the decomposition.

  7. Uwe Stroinski Says:

    Formally, in the Roth case we get the group acting on \mathbb{Z} by considering T(t)=\sum_{n\in\mathbb{Z}} e^{\frac{2 \pi i n t}{d}}P_n with P_n being the rank one projections from the ROI approach and d the period (the common difference of the AP). This is indeed a group homomorphism since T(t)T(s)=\sum_{n\in\mathbb{Z}} e^{\frac{2 \pi i n(t+s)}{d}}P_n=T(t+s) by orthogonality and T(0)=I. Thus we have represented a (periodic) group acting on \mathbb{Z} (the domain of the Fourier coefficients) as a sum of rank one operators. The same holds true for the corresponding Fourier series, at least if one interprets the representation correctly. Is it just a gut feeling or does ROI do more than it claims? We represent the whole group and not just the identity. If this is true we might get some information if we try to construct a group from I = \mathbb{E}_\chi \chi \otimes \bar{\chi}. The following guess T(t) = \mathbb{E}_\chi \chi\left(\frac{t}{m}\right) \chi \otimes \bar{\chi} with m as in your introduction and t from \mathbb{Q}_m cannot be correct, since we do not have an m-periodic action. At least I do not see it.

  8. gowers Says:

    I hoped I could have a precise thought here, but at the moment it’s a bit vague. If it is correct to say that we want to improve on a “trivial” bound (actually, not so trivial but that’s another story) and that there is no hope of improving on it by more than a logarithmic factor, then searching for any place where logs might come in seems to be a good idea. And there is one that I can think of. But at the moment I don’t see how to work it in.

    The little observation is this. Suppose you have a rational x and a character \chi and you know, for some integer a\leq m, that \chi(x/a) is close to 1. What can you say about \chi(x/2a)? You can’t say that it is close to 1, but you can say that it is close to 1 with probability 1/2 (if \chi is chosen randomly), since it has to be close to a square root of 1. More generally, \chi(x/ra) is close to 1 with probability r^{-1}. So the expected number of r\leq m/a such that \chi(x/ra) is close to 1 is about \log(m/a).

    The problem we were having earlier was that whereas we had “long-range correlations” in the Roth proof (if e_\alpha is roughly constant on P then it is roughly constant on all translates of P) there seemed to be nothing comparable here. But perhaps the observation above is saying that there are some tiny correlations after all, and perhaps they can be exploited to get a correspondingly (and necessarily) tiny improvement on the trivial bound.

  9. gowers Says:

    Here’s a different but related thought. What I talked about in the previous comment is not really “long-range” correlations after all, since we are talking about correlations between values of \chi at points that are close to each other in the graph. (And it is fairly easy to see that those are the only pairwise correlations that exist, so to make use of them one would perhaps have to consider correlations between values at more points.)

    But if we were dealing with completely multiplicative functions, then the functions themselves would have long-range correlations all over the place. So couldn’t we get rid of our difficulties by looking at the multiplicative case? And since that is morally equivalent to the whole problem (in the sense that any proof is likely to give not just a single HAP with big discrepancy, but unbounded discrepancy on average, and that would imply EDP), isn’t it worth thinking rather hard about that case, and in particular about whether it is possible to get the ROI (on the rationals) approach to work for it?

    The question is, how would one exploit the fact that we only cared about completely multiplicative functions? I think the answer would be something like this. When we express the identity as \mathbb{E}_\chi \chi\otimes\overline{\chi}, we do so because we need a matrix M for which we can argue that \langle f,Mf\rangle is large for every \pm 1 function f. But what if we just wanted a statement like that for completely multiplicative functions f? It seems that we could then afford to throw away a lot of (additive) characters \chi, because we would know that they couldn’t correlate with a completely multiplicative function.

    Let me be a bit speculative. Suppose we identify a class K of characters \chi that have “multiplicative structure” and that we can show that any \chi that does not belong to K has only a tiny correlation with any completely multiplicative function. Then the matrix \mathbb{E}_\chi 1_{\chi\in K}\chi\otimes\overline{\chi} would appear to be almost indistinguishable from the identity as long as one applied it only to completely multiplicative functions.

    Now we do have some examples of characters that correlate with completely multiplicative functions. For instance, the …

    … actually, I’m not sure I like what I was about to write there. The example I had in mind (\mu_3 and the function x\mapsto e(x/3)) is too much of an integers example as opposed to a rationals example. Also, I’m a bit worried that when we express a function as an integral combination of characters, we may have to use lots of characters that don’t correlate well with the function (just as the point (n^{-1/2},...,n^{-1/2}) has norm 1 despite having only small correlation with any vector from the standard basis). It seems that we would have to show that the correlation between a non-special additive character and a completely multiplicative function was not just small but very small.

    Suppose we could somehow do that. The hope would be that special additive characters had long-range correlations for HAPs — not as strong as the perfect correlations you get with completely multiplicative functions, but good enough to be exploited.

    Here’s a very brief thought about what would count as “very small” correlation. Suppose we have a function f that takes values of modulus 1 everywhere on N(r). Then its \ell_2 norm (on that set) is |N(r)|^{1/2}. So if we are representing f as \mathbb{E}_\chi \langle f,\chi\rangle\chi, then we can afford to disregard all \chi for which \langle f,\chi\rangle is substantially smaller than |N(r)|^{1/2}.

    So a preliminary question is this. Suppose you have an additive character \chi on the rationals and you know that its correlation with some completely multiplicative function is at least |N(r)|^{1/2} on the set N(r). What can you say about \chi? I have an unpleasant feeling that the answer may be nothing at all: if you look at the mean square over all completely multiplicative functions, then I think you will get |N(r)| (by representing the identity in terms of multiplicative characters).

    I’m not sure this line of thought is going anywhere.

  10. The plan « Random Thoughts Says:

    […] finished and with bearable temperatures returning I plan to spend some free time to think about the Erdös Discrepancy Conjecture. Is it wise to post this plan? I think so. It puts some pressure on me and this is […]

  11. Alec Edgington Says:

    Can we learn anything useful by considering EDP for the cyclic group \mathbb{Z}_M instead of \mathbb{Z}?

    For f : \mathbb{Z}_M \to \{\pm 1\}, define the (zero-based) discrepancy

    \delta (f) = \sup \{ \lvert \sum_{0 \leq r < k} f(rd) \rvert : d \in \mathbb{Z}_M \setminus \{0\}, k \in \mathbb{N}\},

    and let

    \delta_M = \inf \{ \delta(f) : f : \mathbb{Z}_M \to \{\pm 1\}\}.

    A couple of observations:

    (1) If \delta_M < \infty, then M is a power of 2.

    (Suppose M is not a power of 2. Write M = 2^k m where m > 1 is odd. Then \sum_{0 \leq r < m} f(r 2^k) has an odd number of terms, so is non-zero, so the discrepancy with respect to the difference 2^k is unbounded.)

    (2) \delta_{2^k} \leq \lfloor \frac{k}{2} \rfloor + 1.

    To show this, first define the function q : \mathbb{N} \to \{\pm 1\} by q(0)=1, q(4n+1)=1, q(4n-1)=-1, and q(2n)=-q(n) (for n \geq 1). Then define q_k : \mathbb{Z}_{2^k} \to \{\pm 1\} by q_k(n) = q(n \mod 2^k) if k is even, and q_k(n) = q(2n \mod 2^{k+1}) if k is odd. Then \delta (q_k) = \lfloor \frac{k}{2} \rfloor + 1. This follows from the observations: that q is completely multiplicative (on the positive integers); that q(4^k a + m) is equal to q(m) except when m = 0 (when it equals q(a)) or m  = 4^k/2 (when it equals (-1)^{a+1}); and that \sum_{0 \leq r < 4^k} q(r) = 0.
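    Here is a short brute-force check of the first few of these values against the zero-based discrepancy defined above (illustrative code only; the helper names are ad hoc):

```python
from math import gcd, inf

def q(n):
    # q(0) = 1, q(4n+1) = 1, q(4n-1) = -1, q(2n) = -q(n)
    if n == 0:
        return 1
    if n % 2 == 0:
        return -q(n // 2)
    return 1 if n % 4 == 1 else -1

def delta(f, M):
    """Zero-based discrepancy of f : Z_M -> {+-1}.  If the sum of r -> f(rd) over a
    full period is nonzero the discrepancy is infinite; otherwise the sup over all
    lengths is attained within one period."""
    worst = 0
    for d in range(1, M):
        p = M // gcd(d, M)
        seq = [f((r * d) % M) for r in range(p)]
        if sum(seq) != 0:
            return inf
        partial = 0
        for v in seq:
            partial += v
            worst = max(worst, abs(partial))
    return worst

for k in range(1, 8):
    M = 2 ** k
    if k % 2 == 0:
        f = lambda n, M=M: q(n % M)
    else:
        f = lambda n, M=M: q((2 * n) % (2 * M))
    print(k, delta(f, M), k // 2 + 1)   # compare the computed value with the claimed floor(k/2) + 1
```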

    It would be interesting to know whether this bound is best possible. In other words, is it true that for every function f : \mathbb{Z}_{2^k} \to \{\pm 1\}, \delta (f) > \frac{k}{2}? If it is true, this seems like a simple enough problem to be tractable, and I wonder if the proof could shed some light on EDP for \mathbb{Z} when we let k \to \infty.

    In particular, I wonder if a ‘representation of the diagonal’ proof might work over \mathbb{Z}_M, which is isomorphic to its own character group. I haven’t yet thought about this in any detail.

    • gowers Says:

      Alec, I find this thought interesting (despite taking ages to respond to it). But at the moment I’m being slow to understand your point (1). Would it be possible to make both the statement and its proof more precise?

      A slight variant of what you suggest could also fit in rather well with what I was trying to do over \mathbb{Q}, while avoiding some of the technical problems. That would be to work in \mathbb{Z}_p for some very large prime p, but to restrict the lengths of the HAPs to at most m, for some much smaller, but unbounded, m. Basically, m would be thought of as a fixed, but large, constant, while p tended to infinity. I don’t instantly see how a lower bound for this problem would give a lower bound for EDP, but the converse clearly holds, and it’s hard to believe that a non-trivial lower bound for this problem wouldn’t shed light on EDP.

    • Alec Edgington Says:

      Here’s another way to think about it which may be clearer. Consider functions f : \mathbb{N} \to \{\pm 1\} that are periodic with period M. Obviously the discrepancy of such a function with respect to HAPs of common difference d is unbounded if d is a multiple of M. But we can ask whether it is possible for the discrepancy to be bounded with respect to all other d.

      This is only possible if M is a power of 2. For if M = 2^k m where m > 1 is odd, then f has unbounded discrepancy with respect to HAPs with common difference 2^k (since \sum_{r=0}^{Nm-1} f(2^k r) = N \sum_{r=0}^{m-1} f(2^k r), which tends to \pm \infty as N \to \infty because the factored-out sum is the sum of an odd number of \pm 1 terms and so non-zero).

      The second observation is that it is possible when M is a power of 2 — and that we can get the discrepancy as low as \lfloor \frac{k}{2} \rfloor + 1 if M = 2^k, by simply repeating the first 2^k terms of the function q(n) (if k is even) or of the function q(2n) (if k is odd).

      It may be more interesting to consider functions f : \mathbb{N} \to \mathbb{T}. Then of course the first observation is no longer valid.

    • gowers Says:

      I see. It didn’t occur to me that you were allowing your HAPs to have length greater than M, so that they could actually repeat themselves. It seems to me that asking exactly the same question but restricting the length of the HAP to be at most M leads to a more natural question.

      Actually, I take that back. In \mathbb{Z}_M we can express any AP with common difference coprime to M as a difference of two HAPs, so if e.g. M is prime then the question becomes equivalent to Roth’s theorem.

  12. gowers Says:

    I’d like to make it clear that even though this project is moving at a snail’s pace compared with how it was moving before, I do not regard it as closed. I am not thinking about it continuously, but I do intend to return to it from time to time and I hope others will too. I’m writing this now because I had a little session with the problem a couple of hours ago and I’d like to report on my (far from conclusive) thoughts.

    A quick reminder of what I was trying to do in some of the comments above. I wanted to find a representation-of-diagonal approach, and had noted an idea that potentially made the task much easier. That idea was that working over the rationals could eliminate one of the main difficulties of the approach — understanding what the diagonal matrix should be. The point is that if one works with the integers up to N, then there is a hugely important distinction between smooth integers and non-smooth ones. But in \mathbb{Q}_+ all numbers are equal, so it becomes reasonable to search for a representation of the identity.

    I was trying to do this by using what we had learned from the representation-of-identity proof of Roth’s theorem about AP-discrepancy. The basic idea there is to start by decomposing the identity using characters and then decomposing the characters. This seemed reasonably promising, but eventually I got a bit stuck. But it now occurs to me that we could attack the problem from the other end. We could ask ourselves a question of the following kind: if the approach works, what properties will the resulting decomposition have, and does there actually exist a decomposition with those properties?

    The real question there is the second one. We could use the previous ideas as a guide to tell us, non-rigorously, what sort of decomposition we might expect, and then we could search directly for a decomposition that is much less general than an arbitrary decomposition, or else try to prove that the much more specific decomposition cannot exist (which would then inform all subsequent efforts to find a decomposition).

    In order to keep my comments bite-sized, I’ll stop this one here, and make a more technical comment on this theme separately.

  13. gowers Says:

    Right, here is a more technical expression of what I was saying in my previous comment.

    Let’s suppose we want to write the identity on \mathbb{Q} as a linear combination of products of HAPs. What I want to think about here is whether there is some “beautiful formula” that “magically works”. I’ll ignore, for the moment, any concerns about the fact that \mathbb{Q} is infinite and that certain sums might not converge.

    First of all, here is a very quick argument that we cannot get away with HAPs that are all of the form \{x,2x,\dots,mx\} for some fixed m. To see this, note that we could find a multiplicative function that happens to have a small partial sum at m, and it will not correlate with any such HAP. But a representation-of-diagonals proof shows that every \pm 1 function correlates with an HAP that is used in the decomposition.

    The next simplest thing one could hope for is to use HAPs that are of length bounded above by m. Since one can subtract one HAP from another, it seems nice to allow HAPs of the form \{ax,(a+1)x,\dots,bx\}, where both a and b are at most m. However, choosing coefficients that depend on a, b and m seems nasty.

    It can be made a bit nicer if we consider functions of the form f_{\alpha,d}, which I define to take x to e(\alpha x/d) if x\in\{d,2d,\dots,md\} and to 0 otherwise. That is, f_{\alpha,d} is the pointwise product of the HAP \{d,2d,\dots,md\} with a trigonometric function. If we do this, it is important to restrict \alpha in such a way that this function can be efficiently decomposed into HAPs. A sufficient condition for this is that \alpha should be bounded above by a constant that tends to zero (so that e(\alpha s) is approximately constant on intervals of unbounded length).

    So let’s ask the following question. Can we find a decomposition of the form

    I=\sum_{d\in Q}\int_\alpha \lambda_{\alpha,d}f_{\alpha,d}\otimes\overline{f_{\alpha,d}}d\alpha?

    The value of the right-hand side at (x,y) then works out to be

    \sum_{d\in Q_{x,y}}\int_\alpha\lambda_{\alpha,d} e(\alpha(x-y)/d)d\alpha,

    where Q_{x,y} is the set of all d such that both x and y are of the form rd for some r\leq m. So we would like this to be 1 if x=y and 0 otherwise.

    I’ll say more about this later, but have to go now.

  14. gowers Says:

    I haven’t yet explained the main point of the calculations in the previous comment. They are that if we are searching for a pretty formula, then it makes sense to give some symmetry to the coefficients \lambda_{\alpha,d}. And the most obvious way of doing that is to make them independent of d. That is, we could search for a decomposition of the more specific kind

    \delta_{xy}=\sum_{d\in Q_{x,y}}\int_\alpha\lambda_\alpha e(\alpha(x-y)/d)d\alpha.

    Also, we might, in order to give ourselves a bit more flexibility, the opportunity for smooth cutoffs, etc., like to replace the sum over d\in Q_{x,y} by a sum with weights. That is, we might like to consider an expression of the form

    \sum_d \mu_{x,y,d}\int_\alpha\lambda_\alpha e(\alpha(x-y)/d)d\alpha.

    Again, if we are searching for coefficients that are independent of d, we might like to place restrictions on \mu. Note that we are already hoping that \mu will be concentrated in the set of (x,y,d) such that d\in Q_{x,y}, which is the set of (x,y,d) such that x and y are small multiples of d. So it feels natural to make \mu_{x,y,d} a function of x/d and y/d. A nice aspect of that is that the integral is also a function of x/d and y/d.

    If we do this, then I think the entire expression depends on x/y only. Let me just check that. Yes, it’s clear, since if we multiply x and y by r, we can just substitute rd for d and end up with the original expression.

    I now need to do a little bit of paper calculation in order to work out a nice way of expressing the matrix as a function of x/y. Then one can think about how to make that function 1 at 1 and 0 everywhere else.
