EDP16 — from AP-discrepancy to HAP-discrepancy?

In this post I want to elaborate on a strategy for proving EDP that I discussed in EDP15. Briefly, the idea is to take a representation-of-identity proof of Roth’s AP-discrepancy result and modify it so that it becomes a representation-of-diagonal proof of unbounded HAP-discrepancy.

The first step of this programme was an obvious one: obtain a clean and fully detailed proof in the APs case. That has now been completed, and a write-up can be found here. For the benefit of anyone who is interested in thinking about the next stage but doesn’t feel like reading a piece of formal mathematics, let me give a sketch of the argument here. That way, this post will be self-contained. Once I’ve given the sketch, I’ll say what I can about where we might go from here. It is just possible that we are in sight of the finishing line, but that is unlikely as it would depend on various guesses being correct, various potential technical problems not being actual, various calculations giving strong enough bounds, and so on. Thus, the final part of this post will be somewhat speculative, but I will make it as precise as I can in the hope that it will give rise to a fruitful line of enquiry.

Roth’s theorem on AP-discrepancy.

Recall that Roth proved that for every function f:\{1,2,\dots,n\}\rightarrow\{-1,1\} there is an arithmetic progression P\subset\{1,2,\dots,n\} such that |\sum_{x\in P}f(x)|\geq cn^{1/4}, and that this result was shown to be sharp by Matousek and Spencer (who removed a log factor from an upper bound of Beck). Roth actually proved a more precise result: the discrepancy is attained on a progression of length at most \sqrt{n}. In this post I shall sketch an argument that shows that for any m\leq c\sqrt{n} we can find an AP P of length at most m and common difference at most m such that |\sum_{x\in P}f(x)|\geq c\sqrt{m}. (I haven’t actually checked, but I’m 100% confident that this slightly stronger result would also follow from Roth’s proof, and also from a proof due to Lovász that uses the semidefinite programming method that we’ve discussed at length in the past.)

Recall first some notation. If A and B are subsets of [n]=\{1,2,\dots,n\}, then I shall also write A and B for their characteristic functions. In general, given two functions u and v defined on [n], u\otimes v will stand for the rank-1 n\times n matrix (u\otimes v)(x,y)=u(x)v(y).

Suppose now that we can find a diagonal matrix D, arithmetic progressions P_1,\dots,P_M and Q_1,\dots,Q_M and coefficients \lambda_1,\dots,\lambda_M such that D=\sum_i\lambda_i P_i\otimes Q_i. Let the diagonal entries of D be real numbers b(1),\dots,b(n). Then for any function f defined on [n] we have

\sum_xb(x)|f(x)|^2=\langle f, Df\rangle=\sum_i\lambda_i\sum_{x\in P_i}f(x)\sum_{y\in Q_i}f(y).

Let R be the ratio \sum_xb(x)/\sum_i|\lambda_i|. It follows from the above identity that there exists i such that

|\sum_{x\in P_i}f(x)\sum_{y\in Q_i}f(y)|\geq R,

and hence that there exists an arithmetic progression P such that |\sum_{x\in P}f(x)|\geq R^{1/2}.
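As a sanity check, the identity \langle f,Df\rangle=\sum_i\lambda_i\sum_{x\in P_i}f(x)\sum_{y\in Q_i}f(y) is easy to verify numerically. Here is a minimal sketch (the progressions, coefficients and the value n=30 are arbitrary illustrative choices of mine; the identity holds whether or not the resulting matrix happens to be diagonal):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# A few arbitrary arithmetic progressions in {0,...,n-1}, as index arrays.
aps = [np.arange(a, n, d)[:l] for a, d, l in [(0, 3, 7), (1, 2, 10), (4, 5, 5), (2, 7, 4)]]

def indicator(P):
    v = np.zeros(n)
    v[P] = 1.0
    return v

# M = sum_i lambda_i P_i (x) Q_i for random coefficients lambda_i.
pairs = [(P, Q) for P in aps for Q in aps]
lams = rng.normal(size=len(pairs))
M = sum(l * np.outer(indicator(P), indicator(Q)) for l, (P, Q) in zip(lams, pairs))

f = rng.choice([-1.0, 1.0], size=n)
lhs = f @ M @ f
rhs = sum(l * f[P].sum() * f[Q].sum() for l, (P, Q) in zip(lams, pairs))
assert np.isclose(lhs, rhs)
# When M is diagonal with entries b(x), the left-hand side equals sum_x b(x),
# so some term on the right has absolute value at least
# R = sum_x b(x) / sum_i |lambda_i|, giving an AP sum of size at least sqrt(R).
```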

That is the representation-of-diagonal method in its purest form. However, as Moses pointed out (I haven’t actually found the precise comment), it is enough to obtain an identity of the form D+S=\sum_i\lambda_i P_i\otimes Q_i, where S is positive semidefinite, and everything else is as before. That is because \langle f,(D+S)f\rangle\geq\langle f,Df\rangle, so including S does not make things worse for us. And in fact, it turns out to introduce a useful extra flexibility.

The decomposition.

We start with the following simple lemma. Up to now, the inner product has been the obvious one \langle f,g\rangle=\sum_xf(x)\overline{g(x)}. But from now on it is more convenient to take the inner product \mathbb{E}_xf(x)\overline{g(x)} (where \mathbb{E}_x is, as usual, shorthand for n^{-1}\sum_x).

Lemma 1. Let u_1,\dots,u_n be an orthonormal basis of \mathbb{C}^n. Then I_n=\mathbb{E}_ru_r\otimes\overline{u_r}.

Proof. Let U be the unitary matrix with rows u_1,\dots,u_n. The entry at (x,y) of \mathbb{E}_ru_r\otimes\overline{u_r} is \mathbb{E}_ru_r(x)\overline{u_r(y)}, which is the inner product of the xth and yth columns of U. Since the columns of a unitary matrix also form an orthonormal basis, this gives us \delta_{xy}, as required.

Corollary 2. For each 0\leq r\leq n-1 let \omega_r be the function defined by \omega_r:x\mapsto\exp(2\pi irx/n). Then I_n=\mathbb{E}_r\omega_r\otimes\overline{\omega_r}.

Proof. The functions \omega_r form an orthonormal basis.
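Corollary 2 is also easy to check numerically; a minimal sketch using the expectation \mathbb{E}_r (the value n=16 is an arbitrary choice):

```python
import numpy as np

n = 16
x = np.arange(n)
M = np.zeros((n, n), dtype=complex)
for r in range(n):
    w = np.exp(2j * np.pi * r * x / n)   # omega_r
    M += np.outer(w, w.conj())           # omega_r (x) conj(omega_r)
M /= n                                   # the expectation E_r
assert np.allclose(M, np.eye(n))         # recovers the identity matrix
```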

The plan now is to decompose each trigonometric function \omega_r into a linear combination of characteristic functions of suitably chosen APs. This automatically gives us a decomposition of each matrix \omega_r\otimes\overline{\omega_r} into a linear combination of products P\otimes Q. We then find that each individual P\otimes Q is used for several different rs, and when we add up the coefficients there is a considerable amount of cancellation. It is that cancellation that gives us a good upper bound on \sum_i|\lambda_i| and leads to the results claimed above.

At this point, let us fix m. We would like to decompose \omega_r into a linear combination of arithmetic progressions mod n with common difference at most m. It will turn out that their cardinalities will be at most m as well. Provided that m is at most \sqrt{n}, this will mean that each AP wraps around (that is, passes zero) at most once, so we can split it into two genuine APs.

Given r, how do we choose a suitable common difference? Well, in an earlier and less successful version of this argument we chose for each r a d such that 0<d\leq m and \|rd/n\|\leq m^{-1}. Here \|\alpha\| stands for the distance from \alpha to the nearest integer. This estimate implies that |1-\exp(2\pi ird/n)|=|1-\omega_r(d)|\leq 2\pi m^{-1}, and that (together with the triangle inequality) implies that |1-\omega_r(kd)|\leq 2\pi km^{-1}, and more generally that |\omega_r(x)-\omega_r(x+kd)|\leq 2\pi km^{-1}. In more qualitative terms, adding a small multiple of d to x does not change the value of \omega_r(x) very much.
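Both the pigeonhole step and the continuity estimate can be checked by brute force; a small sketch (the values n=2000 and m=40 are arbitrary choices of mine, with m at most \sqrt{n}):

```python
import numpy as np

n, m = 2000, 40   # m is at most sqrt(n)

def dist_to_int(a):
    # ||a||: distance from a to the nearest integer
    return abs(a - round(a))

for r in range(n):
    # Dirichlet's pigeonhole: some 0 < d <= m has ||rd/n|| <= 1/(m+1) < 1/m.
    d = min(range(1, m + 1), key=lambda d: dist_to_int(r * d / n))
    assert dist_to_int(r * d / n) <= 1 / (m + 1) + 1e-12
    # Continuity: |1 - omega_r(kd)| <= 2*pi*k/m for small k.
    for k in range(1, 5):
        assert abs(1 - np.exp(2j * np.pi * r * k * d / n)) <= 2 * np.pi * k / m + 1e-9
```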

Let P be the interval [-t,t] inside \mathbb{Z}_n, where t=cm for some smallish absolute constant c. Let P_d=dP and let \pi_d be the characteristic measure of P_d — that is, the function that takes the value n/|P_d|=n/(2t+1) on P_d and 0 outside P_d. From the observation just made it is straightforward to prove the following lemma. The fact that \lambda_{r,d} is real follows from the symmetry of P_d.

Lemma 3. Let r\in\mathbb{Z}_n. Suppose that 0<d\leq m and \|rd/n\|\leq m^{-1}. Then there is a real constant \lambda_{r,d} between 1/2 and 1 such that \omega_r*\pi_d=\lambda_{r,d}\omega_r.

The next lemma is also easy to prove. Note that we have dropped the hypothesis that \|rd/n\| is small and weakened the conclusion accordingly.

Lemma 4. Let r\in\mathbb{Z}_n and suppose that 0<d\leq m. Then there is a real constant \lambda_{r,d} such that \omega_r*\pi_d=\lambda_{r,d}\omega_r.

It will be problematic for us if \lambda_{r,d} is negative, but a simple trick makes sure that that never happens: we simply convolve with \pi_d twice. By the two lemmas above, we may conclude that \omega_r*\pi_d*\pi_d=\mu_{r,d}\omega_r, where \mu_{r,d} is always non-negative, and is at least 1/4 when \|rd/n\|\leq m^{-1}. (That is because \mu_{r,d}=\lambda_{r,d}^2.)
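Here is a numerical check of Lemma 3 together with the squaring trick (the choices n=997, m=30 and radius m/8 for P, i.e. c=1/8, are mine; any smallish constant c would do):

```python
import numpy as np

n, m = 997, 30
tP = m // 8                # P = [-tP, tP]; this is the t = cm with c = 1/8
x = np.arange(n)

def char_measure(d):
    # characteristic measure of P_d = d*P: value n/|P_d| on P_d, 0 elsewhere
    pi = np.zeros(n)
    pi[(np.arange(-tP, tP + 1) * d) % n] = n / (2 * tP + 1)
    return pi

def conv(u, v):
    # normalized convolution (u*v)(x) = E_y u(x-y) v(y) on Z_n, via the FFT
    return np.fft.ifft(np.fft.fft(u) * np.fft.fft(v)) / n

for r in range(1, n):
    for d in range(1, m + 1):
        if abs(r * d / n - round(r * d / n)) <= 1 / m:
            w = np.exp(2j * np.pi * r * x / n)
            wc = conv(w, char_measure(d))
            lam = (wc[1] / w[1]).real            # lambda_{r,d}
            assert np.allclose(wc, lam * w)      # omega_r is an eigenvector
            assert 0.5 <= lam <= 1.0             # Lemma 3
            # Convolving twice squares the eigenvalue: mu_{r,d} = lam^2 >= 0.
            assert np.allclose(conv(wc, char_measure(d)), lam**2 * w)
            break
```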

A discrepancy bound of n^{1/8} can be obtained if for each r we choose 0<d\leq m such that \|rd/n\|\leq m^{-1} (which is possible by the usual pigeonhole principle argument), and replace \omega_r\otimes\overline{\omega_r} by (\omega_r*\pi_d*\pi_d)\otimes(\overline{\omega_r}*\pi_d*\pi_d), which can be seen to be a linear combination of functions of the form P\otimes Q.

The main ingredient that improves the bound from n^{1/8} to n^{1/4} is instead to choose rather carefully some coefficients \alpha_{r,d} and to replace \omega_r\otimes\overline{\omega_r} by \sum_d\alpha_{r,d}(\omega_r*\pi_d*\pi_d)\otimes(\overline{\omega_r}*\pi_d*\pi_d). For each r, the earlier proof took \alpha_{r,d} to be 1 for precisely one d and 0 otherwise. What we do instead is approximate this in such a way that for each d with 0<d\leq m the function r\mapsto\alpha_{r,d} is "sufficiently smooth".

The next lemma, which I shall state without proof (the proof is straightforward), gives a good idea of what we mean by "sufficiently smooth". Given a function g:\mathbb{Z}_n\rightarrow\mathbb{C}, we define its Fourier transform \hat{g} by the formula \hat{g}(r)=\mathbb{E}_xg(x)\exp(2\pi i rx/n).

Lemma 5. Let t\leq n/2 and define a function g:\mathbb{Z}_n\rightarrow\mathbb{R} by g(x)=\max\{1-|x|/t,0\} whenever -n/2\leq x\leq n/2. Then \hat{g}(r) is non-negative and real for every r, and \sum_r\hat{g}(r)=1.
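Lemma 5 can be verified directly from the definition of the Fourier transform; a quick sketch (n=101 and t=20 are arbitrary choices):

```python
import numpy as np

n, t = 101, 20
x = np.arange(n)
xs = np.where(x <= n // 2, x, x - n)        # representatives in [-n/2, n/2]
g = np.maximum(1 - np.abs(xs) / t, 0.0)

# hat{g}(r) = E_x g(x) exp(2*pi*i*r*x/n)
ghat = np.array([np.mean(g * np.exp(2j * np.pi * r * x / n)) for r in range(n)])

assert np.allclose(ghat.imag, 0)            # hat{g} is real (g is symmetric)
assert np.all(ghat.real >= -1e-9)           # and non-negative (Fejer-type kernel)
assert np.isclose(ghat.real.sum(), 1.0)     # the coefficients sum to g(0) = 1
```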

From this and easy Fourier manipulations, we know that the function g_d(x)=g(dx) also has these two properties: that is, \hat{g}_d(r)\geq 0 for every r and \sum_r\hat{g}_d(r)=1.

We now take the width parameter t in the definition of g to be an integer approximately equal to n/m (note that this t is different from the t=cm used to define P), and we define \alpha_{r,d} to be g(rd).

We note here that for a typical r there will be O(1) values of d such that \alpha_{r,d}\geq 1/2, and that the usual pigeonhole argument gives us at least one. (There are some r for which there are more values of d with rd small, but this does not matter.) On the other hand, since \alpha_{r,d} is nothing other than g_d(r), we know that for each fixed 0<d\leq m the function r\mapsto\alpha_{r,d} has non-negative Fourier coefficients that sum to 1. These properties are what are needed for the rest of the calculations to work.

We then take the matrix

M=\mathbb{E}_r\sum_{0<d\leq m}\alpha_{r,d}(\omega_r*\pi_d*\pi_d)\otimes(\overline{\omega_r}*\pi_d*\pi_d).

Since for each r there is at least one d with \alpha_{r,d}\geq 1/2, the remark following Lemmas 3 and 4 tells us that this matrix has the form \mathbb{E}_r\gamma_r\omega_r\otimes\overline{\omega_r}, with each \gamma_r at least 1/8. It follows that M-I_n/8 is positive semidefinite, since it equals \mathbb{E}_r(\gamma_r-1/8)\omega_r\otimes\overline{\omega_r}.
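This claim about the \gamma_r can be tested numerically without building M itself, since \omega_r*\pi_d*\pi_d=\lambda_{r,d}^2\omega_r exactly, so \gamma_r=\sum_{0<d\leq m}\alpha_{r,d}\lambda_{r,d}^2. A sketch (the specific constants, P of radius m/8 and t=2n/m, are my choices; with them the pigeonholed d contributes more than 1/8 on its own):

```python
import numpy as np

n, m = 499, 20
tP = m // 8           # radius of P, i.e. the t = cm with c = 1/8
t = 2 * n // m        # width of g, chosen so the pigeonholed d gives alpha >= 1/2
j = np.arange(-tP, tP + 1)

def dist(a):
    # ||a||: distance from a to 0 in Z_n
    a %= n
    return min(a, n - a)

for r in range(n):
    gamma = 0.0
    for d in range(1, m + 1):
        alpha = max(1 - dist(r * d) / t, 0.0)              # alpha_{r,d} = g(rd)
        lam = np.mean(np.cos(2 * np.pi * r * j * d / n))   # lambda_{r,d}
        gamma += alpha * lam**2                            # each term is >= 0
    assert gamma >= 1 / 8
```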

It remains to bound from above the sum of the absolute values of the coefficients of the various products of progressions used in the above decomposition. I won’t reproduce those calculations here, but the Fourier estimates for the functions r\mapsto\alpha_{r,d} turn out to come in and allow us to calculate the sum exactly. It equals nm/|P|^2. Since |P| is proportional to m, this is proportional to n/m, a gain of a factor proportional to m on the trace n/8 of I_n/8. It follows that the discrepancy bound we get is proportional to \sqrt{m}. When m=\sqrt{n} this gives us n^{1/4}.

From APs to HAPs.

Is there any possibility of adapting the above proof to give a proof of EDP? I think there is, and in this final section I shall try to explain why.

To begin with, let us set D to be the diagonal matrix with f down the diagonal. (By f I mean the function discussed in the previous post and set of comments, defined by the properties f(1)=1 and f(m)=\sum_{d|m, d<m}f(d).) Now let us suppose that we could find an efficient HAP-decomposition of a matrix of the form
D^{1/2}(\mathbb{E}_r\gamma_r\omega_r\otimes\overline{\omega_r})D^{1/2} with every \gamma_r\geq 1. The inner matrix is of the form I_n+S with S positive semidefinite, so this would give us an efficient decomposition of D+D^{1/2}SD^{1/2}=D+T, where T is also positive semidefinite. Now the matrix we would like to decompose can also be written as \mathbb{E}_r\gamma_r(D^{1/2}\omega_r)\otimes(\overline{D^{1/2}\omega_r}).

It is therefore tempting to look for a proof that is similar to the proof of Roth’s discrepancy theorem above: we try to decompose each function D^{1/2}\omega_r (the value of which at x is f(x)^{1/2}\exp(2\pi irx/n)) in an efficient and unified way as a linear combination of characteristic functions of homogeneous arithmetic progressions rather than just arithmetic progressions. And then we hope that there will be plenty of cancellation in the coefficients.

Now the decomposition of \omega_r into APs very definitely used non-homogeneous ones. So why is there any hope of decomposing \sqrt{f}\omega_r into HAPs? Well, the kind of thing one might try is this. Choose a small m, and for each r choose some 0<d\leq m such that \|dr/n\|\leq m^{-1}. Let P be (as before) the interval [-cm,cm] and let \pi_d be (as before) the characteristic measure of dP. Then \omega_r=\lambda_{r,d}\omega_r*\pi_d, which tells us that \omega_r can be written as a (very nice) linear combination of translates of P_d. The trouble is that most of these translates are non-homogeneous.

However, if we choose m to grow to infinity so slowly that it can be thought of as a constant (this is a standard trick — treat m as a large constant, and as long as for sufficiently large n we can get a discrepancy lower bound that tends to infinity with m, we are done), then the fraction of translates of P_d that are non-homogeneous is “only” 1-1/d\leq 1-1/m. Furthermore, and much more importantly, \sqrt{f} is on average very small on the complement of the HAP \{d,2d,3d,\dots,\lfloor n/d\rfloor d\}. So perhaps we can afford to ignore the decomposition outside this HAP.

How would we decompose the part inside the HAP? Well, let’s suppose we were trying to decompose f\omega_r rather than \sqrt{f}\omega_r. (We could do this, and the result would be a decomposition of a matrix D^2+U with U positive semidefinite. I don’t know whether changing the coefficients from f(x) to f(x)^2 would be a problem.) Since f is naturally expressed as a Dirichlet convolution (except at 1) we have a natural decomposition of f into HAPs. I think we should also have a natural decomposition into HAPs of the restriction of f to \{d,2d,3d,\dots,\lfloor n/d\rfloor d\}. So if we use that, and then take the coefficients \alpha_{r,d} from the Roth argument (or perhaps something else tailored to the new situation), then … just perhaps … we will have an efficient decomposition that gives a good approximation to D^2+U. And if we want more than just a good approximation, then we could decompose the part of \omega_r that lies outside \{d,2d,3d,\dots,\lfloor n/d\rfloor d\} into functions supported on singletons. This would be inefficient, but (after cancellation of the contributions from different r) the total weight of the inefficiency should be small.

50 Responses to “EDP16 — from AP-discrepancy to HAP-discrepancy?”

  1. Alec Edgington Says:

    How can we decompose f efficiently into HAPs? Let T_{kd} be the characteristic function of \{d, 2d, \ldots, kd\}. One idea is to fix k, and try to decompose f as \sum_d a(d) T_{kd}. The coefficients a(n) will be given by the Dirichlet convolution of f with the Dirichlet inverse of the characteristic function of \{1, 2, \ldots, k\}. If we want the growth of a(n) to be essentially slower than that of f(n), then in terms of the Dirichlet series, I think there would have to be a zero of \sum_{n \leq k} n^{-s} lying in the strip 1 < \Re s < \alpha (where \zeta(\alpha) = 2).

    This disposes of the case k = \infty: we know that the Riemann zeta function has no zeros to the right of 1. Of course k = 1 is no good (a(n) = f(n)); and k = 2 is also eliminated (since the only zeros of 1 + 2^{-s} are pure-imaginary). Is anything known in general about the zeros of \sum_{n \leq k} n^{-s}? I suspect the decomposition will have to be cleverer than this…
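    The algebra of this proposal (leaving aside the growth question) is easy to test numerically: compute the Dirichlet inverse by the standard recursion and check that f = \sum_d a(d) T_{kd}. A sketch of mine (N = 200 and k = 3 are arbitrary choices):

```python
N, k = 200, 3

# f(1) = 1, f(m) = sum of f(d) over proper divisors d of m (1-indexed lists)
f = [0, 1] + [0] * (N - 1)
for x in range(2, N + 1):
    f[x] = sum(f[d] for d in range(1, x) if x % d == 0)

g = [0] + [1 if x <= k else 0 for x in range(1, N + 1)]  # char. fn of {1,...,k}

# Dirichlet inverse of g, via the standard recursion (valid since g(1) = 1).
ginv = [0, 1] + [0] * (N - 1)
for x in range(2, N + 1):
    ginv[x] = -sum(g[x // d] * ginv[d] for d in range(1, x) if x % d == 0)

def dconv(u, v, x):   # Dirichlet convolution (u*v)(x)
    return sum(u[d] * v[x // d] for d in range(1, x + 1) if x % d == 0)

a = [0] + [dconv(f, ginv, x) for x in range(1, N + 1)]

# x lies in T_{kd} exactly when d | x and x/d <= k, so
# (sum_d a(d) T_{kd})(x) = (a*g)(x), which should recover f(x).
for x in range(1, N + 1):
    assert dconv(a, g, x) == f[x]
```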

  2. Uwe Stroinski Says:

As a side note: even if we cannot exploit the above to prove EDP, we might be able to generalize Roth’s result. The intuition is as follows. Consider \displaystyle\mathbb{E}_y\omega^{ry}\pi_{d,y}=\omega_r *\pi_d=\lambda_{r,d}\omega_r and read it (with some hand-waving) as an equation with the Laplace transform of a translation group on the left and an eigenvector of the generator of this group on the right. In the situation of EDP we have no group, no generator, no eigenvectors, and hence the trouble with finding a suitable orthonormal basis, and thus Tim’s approximative approach with some f. In the AP situation we have a translation group (which I now let act on an infinite-dimensional space with not more than intuition as justification), a first derivative as generator and exponentials as eigenvectors (together with estimates on the eigenvalues), and we get Roth’s result. In the non-AP case, we would want \displaystyle\mathcal{A} (from the draft) to be a set of subsets with some (my hands wave through the air) symmetry group that can be represented as acting on some Hilbert space, giving us a basis and eigenvalues to work with. To make a long and shaky story short: the representation-of-diagonal-matrices argument can probably be made independent of the translation-group/exponentials-as-basis part and go through for other couples of a group and the eigenvectors of its generator, thus giving us a chance of discrepancy results for more general \displaystyle\mathcal{A}.

    • gowers Says:

      I think at some point Moses suggested a related project, which was to go through various other known discrepancy results that have been proved using the SDP method and see whether it is possible to find a representation-of-diagonals approach too. I don’t know enough about the area to know what results fall into this category, but I think it could be quite interesting, and perhaps a good way of beefing up the Roth note if we can’t get EDP.

  3. gowers Says:

    I think I have some slightly more precise thoughts about how to decompose the function f(x)\omega_r(x) into a linear combination of HAPs.

    First, let me look at the simple case where r=0. Here, we have the formula f(x)=\sum_{d|x, d<x}f(d)=\sum_df(d)\sum_{1<y\leq n/d}\mathbf{1}_{dy=x}. If (as we may) we define a HAP to be any arithmetic progression of the form \{rd,(r+1)d,\dots,sd\}, then this gives us a decomposition of f into a linear combination \{1\}+\sum_{d\geq 1}f(d)Q_d, where Q_d is the HAP \{2d,3d,\dots,\lfloor n/d\rfloor d\}.
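    A quick numerical check of this decomposition (my own sketch; n = 300 is an arbitrary choice):

```python
n = 300

# f(1) = 1, f(x) = sum of f(d) over proper divisors d of x
f = [0, 1] + [0] * (n - 1)
for x in range(2, n + 1):
    f[x] = sum(f[d] for d in range(1, x) if x % d == 0)

def Q(d, x):
    # Q_d is the HAP {2d, 3d, ..., floor(n/d)*d}
    return 1 if x % d == 0 and 2 * d <= x <= (n // d) * d else 0

# Check f = {1} + sum_{d >= 1} f(d) Q_d pointwise on {1, ..., n}.
for x in range(1, n + 1):
    total = (1 if x == 1 else 0) + sum(f[d] * Q(d, x) for d in range(1, n + 1))
    assert total == f[x]
```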

    For general r, this gives us a decomposition into functions of the form \omega_r(x)Q_d(x). Suppose now that we have found k\leq m such that \|rk/n\|\leq m^{-1}. This gives us a splitting of \{1,2,\dots,n\} into APs of length about m and common difference k, on each of which the function \omega_r is roughly constant. However, what we would really like is to decompose functions of the form \omega_r(x)Q_d(x) and make those roughly constant.

    On the face of it, that looks possible to me. For each such function (which is supported on a HAP of common difference d) we find 0<k\leq m such that \|krd/n\|\leq m^{-1}. Then we use that to split up the HAP Q_d into HAPs of length roughly m and common difference kd on which \omega_r is roughly constant.

    That defines a decomposition (though instead of choosing one k for each r and d, one would choose coefficients, as in the Roth proof). Of course, we are by no means finished, since most of the APs used in the decomposition are not HAPs. The hope is that the non-homogeneous APs make a small-weight contribution to the whole. The next step is perhaps to be a bit more precise about what exactly is needed. In the best of all possible worlds, we would end up reducing EDP to a technical question about properties of the function f, but that is being quite optimistic at this stage.

    • Alec Edgington Says:

      That sounds reasonable. I think you mean Q_d = \{2d, 3d, \ldots, \lfloor n/d \rfloor d\} and f = \{1\} + \sum_{d \geq 1} f(d) Q_d?

      Incidentally, I’ve just found an interesting paper by Borwein, Fee, Ferguson and van der Waall discussing the zeros of the truncated zeta function \zeta_k(s) = \sum_{n \leq k} n^{-s}. It turns out to be a result of Montgomery that for any 0 < c < \frac{4}{\pi} - 1 and for all sufficiently large k, \zeta_k has zeros to the right of 1 + c \frac{\log \log k}{\log k}. Moreover, they are always to the left of \alpha = \zeta^{-1}(2). This implies that we can write f = \sum_d a(d) T_{kd}, with \sum_{m \leq n} a(m) / \sum_{m \leq n} f(m) = O(n^{-\beta}), for some fixed HAP-length k and \beta > 0. Whether this is useful or not I don’t know.

    • gowers Says:

      Thanks for pointing out the typos, which I have now corrected. I’m still not quite sure what the genuine difficulties will turn out to be. What I’m expecting is that if we pursue this basic idea by just continually guessing what to do next, then eventually we’ll reach something that doesn’t work, at which point we’ll want to try to adjust it to deal with whatever the difficulties are. Then the more we know about how f behaves, including the kinds of things you are talking about, the better. I’m about to try to do some calculations offline to see if I can reach a clearer understanding of this kind.

  4. Polymath5 « Euclidean Ramsey Theory Says:

    […] Let me update this there is now a new thread for Polymath5 here. […]

  5. Kristal Cantwell Says:

    Polymath4 is using Subversion in writing up results. I wonder if it could be used here. As I recall from the last thread there was a problem with ScribTeX in that there was a limit on the number of collaborators per project. To avoid this problem perhaps Subversion could be used instead.

    • gowers Says:

      I think Subversion is similar in that it is somewhat limited unless you pay for a better version. I don’t mind doing that (as I will probably need it for other purposes), but if someone knows somewhere where one can download a Subversion client (if that’s the right word) that’s free, good, and Mac-compatible, then that would be great. Failing that, good and Mac-compatible. Oh, and downloadable by an idiot would be a nice bonus too.

  6. Robin Houston Says:

    Versions (http://versionsapp.com) is a good Subversion client for the Mac, though it is not free.

  7. gowers Says:

    I’ve had a first attempt to pursue the suggestion in this comment and it got a bit complicated, so I’m giving up for the day.

    Basically, it feels at the moment as though I’m hoping for a miracle. If you just take a function Q_d\omega_r and decompose it into APs, then, as I mentioned before, you have the problem that most of those APs are not HAPs. I then waved my hands and said that I hoped that the non-homogeneous parts would make a small-weight contribution. But they definitely don’t do so if you fix d and sum over r, since then the calculation is essentially the Roth calculation, and we know that getting the identity matrix is hopeless. So if there is to be significant cancellation, it will have to be inter-d cancellation, so to speak. What I’m hoping is that when you add everything up with the coefficients f, some reason will emerge for there being far more cancellation in the coefficients of the non-homogeneous parts than there is in the coefficients of the homogeneous parts, so much so that we can break up the non-homogeneous parts in a trivial way. But this would be the miracle: I don’t have a serious reason to expect it actually to happen.

    Anyhow, when I’ve got the stomach for it, I’ll get a nice big piece of paper and write everything out very carefully and see what happens, in particular keeping an eagle eye out for any hint that the coefficients of non-homogeneous progressions might cancel out nicely.

  8. gowers Says:

    I still haven’t got very far, but one quite interesting question is rather persistently raising itself, so I thought I’d mention it.

    If we are trying to decompose the function x\mapsto f(x)\exp(2\pi i\alpha x) into a linear combination of HAPs (up to some good approximation), then the obvious thing to try is to decompose f into HAPs with common differences d such that \|\alpha d\| is small. If we can do that, then we can chop each of those into pieces on which the function \exp(2\pi i\alpha x) is roughly constant and then multiply each piece by its roughly constant value.

    Is there any reason to suppose that this can be done? Well, there is a small piece of evidence in its favour, which is that it can be done if \alpha is a rational with small denominator. For instance, suppose that \alpha=2/5. Then we can take the obvious decomposition of f (mentioned in this comment) and split each HAP that we use into five parts if its common difference is not a multiple of 5, discarding the four non-homogeneous ones. The result will be a decomposition of the restriction of f to multiples of 5, but that is almost all of f, so we’re OK.

    If \alpha is strongly irrational, this argument breaks down completely, because we do not have long arithmetic progressions with small common differences on which \exp(2\pi i\alpha x) is constant, or even approximately constant. The effect of this is that when we decompose a HAP into pieces on which the function is roughly constant, they don’t line up nicely in the way they did in the rational case (when we just restricted to multiples of the denominator).

    So here is a rather concrete question. Suppose we take a smallish m and for each d we choose a k with d|k and k\leq md (that is, k=jd for some 0<j\leq m) such that \|\alpha k\| is small. (Or perhaps better, suppose we define some nice coefficients \theta_k in a similar way to the Roth proof.) Now suppose we replace each Q_d in the decomposition of f by its restriction R_d to multiples of k (which will give us all multiples of k unless k=d, in which case we miss out d). Is the resulting sum \sum_{d\geq 1}f(d)R_d a good approximation to f?

    The trouble is, the minute I ask that question the answer no seems to come back loud and clear. It looks as though we are removing a lot of stuff from each Q_d, and if we are, then we cannot possibly hope to get almost all of f.

    But why doesn’t that argument apply to the rational case? The difference there is that if \alpha=p/q for some small q, then f-almost every x is a multiple of q, so f-almost every Q_d is restricted to itself, so f-almost nothing is thrown away.

    This is beginning to suggest to me that we don’t actually want to start with the trigonometric functions in this case. Recall that for the method to work, we can start with any orthonormal basis (u_r) we like and the decomposition I_n=\mathbb{E}_ru_r\otimes u_r. The trigonometric function \omega_r seems to give us trouble (though I don’t know for sure that it’s real trouble) when r/n is not very close to a rational with small denominator. Perhaps we could start with an orthonormal basis that is tailored more to the symmetries of this problem, such as dilations by small primes and their reciprocals. (Indeed, perhaps we should go back to the rationals and an infinitary approach, or at least get some inspiration from that direction.)

    • gowers Says:

      Continuing this thought a little, I realize that I could express it slightly differently. What I observed in the comment just above is that if \alpha is a rational with small denominator, then the function f(x)e^{2\pi i\alpha x} is very close indeed to f(x) itself. (This follows from the fact that they are equal on the HAP defined by the denominator of \alpha and that almost all of f is supported on this HAP.)

      This seems to me to confirm the view that starting with the trigonometric decomposition of the identity and multiplying it pointwise by f is not the way to go: all the functions you get are either almost equal to f, in which case it’s hard to see where any cancellation is going to come from, or hard to decompose into HAPs.

      Hmm, writing that has given me another idea. (Funny how that can happen — a sort of scepticism about scepticism.) I haven’t time to pursue it right now, so let me ask it instead as a question. Let \alpha be a real number. We know that we can cover \{1,2,\dots,n\} with APs on which the function e^{2\pi i\alpha x} is almost constant. We also know that we can’t cover it with HAPs with that property (for instance, if x is prime then there are only two possible common differences, and it may well be that neither of them works). But what if x is f-typical? Then there will be many possible common differences of HAPs containing x, so it is rather less obvious that we couldn’t find a common difference d|x such that \|\alpha d\| is small. It’s by no means obvious that we could put those HAPs together in a nice way to approximate the function f(x)e^{2\pi i\alpha x}, but it seems worth thinking about.

      I’m not quite sure what the question is there: basically it’s to see whether any of those speculations can be justified. A good starting point is the assertion that an f-typical x is divisible by some d for which \|\alpha d\| is small — I think that could be a useful lemma if it’s true.

    • Moses Charikar Says:

      You can get a weak version of your assertion that an f-typical x is divisible by some d for which \|\alpha d\| is small from the (handwavy) arguments we had earlier about the prime factorization of f-typical x. But this seems almost trivial and the conclusion is quite weak (i.e. \|\alpha d\| is only bounded by something like 1/\log n), so I doubt that this is what you are looking for.

      We argued earlier that the exponent of prime p in the factorization of f-typical x \leq n should be at least c(\log n)/p^{\alpha_p}. (Here \alpha_p < \alpha.) In particular, this means that an f-typical x is divisible by all primes \leq (c\log n)^{1/\alpha}, and hence by all d \leq (c\log n)^{1/\alpha}. Of these, there must be some d such that \|\alpha d\| \leq O(1/(\log n)^{1/\alpha}).

    • Moses Charikar Says:

      One more thought about your question. I don’t think it is possible to achieve \|\alpha d\| smaller than something like 1/\log n if you only allow d that divide an f-typical x.

      Let’s say we allow any d that contains only small primes in its factorization, i.e. d = \prod_{i \leq t} p_i^{a_i}. Consider \alpha = 1/p_{t+1}. Then \|\alpha d\| \geq 1/p_{t+1}.
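      This can be confirmed by exact arithmetic on a small example (primes up to 7, p_{t+1} = 11 and exponents up to 3 are all arbitrary choices of mine):

```python
import math
from fractions import Fraction
from itertools import product

primes = [2, 3, 5, 7]   # the "small" primes p_1, ..., p_t
p_next = 11             # p_{t+1}
alpha = Fraction(1, p_next)

def dist_to_int(q):
    # ||q||: distance from q to the nearest integer, computed exactly
    fr = q - math.floor(q)
    return min(fr, 1 - fr)

# Every d built only from the small primes is coprime to p_{t+1}, so
# alpha*d has fractional part k/p_{t+1} with k != 0.
for exps in product(range(4), repeat=len(primes)):
    d = math.prod(p**e for p, e in zip(primes, exps))
    assert dist_to_int(alpha * d) >= Fraction(1, p_next)
```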

    • gowers Says:

      I can’t quite decide whether a bound of “only” 1/\log n is too weak or whether any bound that tends to zero is good enough. Let’s be optimistic and imagine that the latter is the case. What would happen here? We’d be trying to approximate the function f(x)e^{2\pi i\alpha x} as a linear combination of HAPs. We have a decomposition f(x)=\sum_df(d)Q_d(x) of f into HAPs. One thing we could do if we were feeling daring is simply get rid of all d for which \|\alpha d\| is not small and split the remaining Q_d into (not very long, but with lengths tending to infinity) chunks on which e^{2\pi i\alpha x} is roughly constant.

      Let D_\alpha be the set of all d such that \|\alpha d\| is small. Is there any hope that the function \sum_{d\in D_\alpha}f(d)Q_d is approximately proportional to f? Its value at x is \sum_{d\in D_\alpha, d|x, d<x}f(d). So the question seems to be asking this: for an f-typical x, will we always get roughly the same f-proportion of factors d belonging to D_\alpha? I don’t see how to prove such a statement, but it doesn’t look completely ridiculous either (though it does look as though it might be one small observation away from ridiculousness).

    • gowers Says:

      Let me try to explain why I think that the statement discussed at the end of the previous comment might not be ridiculous.

      I want to argue that it might be the case that if \alpha is a real number, and D_\alpha is the set of all d such that \|\alpha d\| is small (I don’t want to be precise at this stage about what “small” means), then there is some constant c(\alpha) such that for f-almost every x we have

      \sum_{d|x, d<x, d\in D_\alpha}f(d)\approx c(\alpha)\sum_{d|x, d<x}f(d)=f(x).

      My rough heuristic justification for this statement is as follows. For simplicity I’ll discuss the case where \alpha is a badly approximable real number. The hypothesis I have in mind is that if we choose a random factor of a typical x, where each factor d of x is chosen with probability proportional to f(d), then the prime factorization of d will have a “typical profile”. (I’m talking here about measure concentration. A different example might be that if you choose a random partition of [n] into three sets A, B and C such that |A|+|B|+2|C|=3n/2, then almost always you would have |A|\approx n/4, |B|\approx n/4 and |C|\approx n/2. Proof: for each C you would almost always have A and B of about the same size, and that then forces the rest to hold.) And it would usually be the case in these situations that a small change to a typical element produces another typical element. So for instance, multiplying a typical factor of x by 2 ought to give us another typical factor of x. But this means that multiplying by a small integer gives us a map that is approximately an f-measure-preserving bijection from the set of factors of x to itself. Now if \alpha is badly approximable, then for all but a few factors d we would expect that the proportion of small multiples rd of d such that \|\alpha rd\|\leq\theta is about the expected 2\theta. (Occasionally we would hit a d such that \alpha d is very close to an integer, and then this would not be true.) Therefore, it might be that we can approximately uniformly cover the set of factors of x (with the appropriate measure) using HAPs on which e^{2\pi i\alpha x} is approximately constant.

      One other thing to say is that if it turns out to be very hard to obtain a good enough understanding of f to get something like the above to work, there would always be the option of simply concocting an f that has the properties we want. For example, perhaps we could define a fairly simple-minded function, such as raising some constant C to the power of the number of prime factors of x, and arguing that a typical factor of a typical x has a typical profile. Or we could even choose primes to go into the factorization according to some nice distribution like the Poisson distribution (which seems to show up) and make everything independent so that it’s easier to analyse. That might allow a few numbers bigger than n, but with sufficiently small total weight for it not to be a problem.
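      The “C to the power of the number of prime factors” idea can be prototyped in a few lines. In this sketch the values of n and C, and the helper Omega, are all illustrative guesses rather than a proposed final choice of f:

```python
# Prototype of the simple-minded weight f(x) = C**Omega(x), where
# Omega(x) is the number of prime factors of x counted with
# multiplicity.  For C > 1 this tilts the measure towards numbers with
# many (hence small) prime factors, i.e. towards smooth numbers.

def Omega(x):
    """Prime factors of x, counted with multiplicity (trial division)."""
    count, d = 0, 2
    while d * d <= x:
        while x % d == 0:
            x //= d
            count += 1
        d += 1
    return count + (1 if x > 1 else 0)

n, C = 10**4, 2.0                       # illustrative choices
om = [Omega(x) for x in range(2, n + 1)]
weights = [C ** o for o in om]
total = sum(weights)

# Average number of prime factors under f versus under uniform measure.
avg_f = sum(o * w for o, w in zip(om, weights)) / total
avg_uniform = sum(om) / len(om)
print(avg_f, avg_uniform)   # the f-average is markedly larger
```

      Even with C=2 the weighted average number of prime factors is far above the uniform average of roughly \log\log n, so the measure is indeed concentrated on very smooth numbers.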

      I’d like to make very clear why I’m interested in this. The point is that if we restrict to a set of smooth numbers — or a measure that’s concentrated on smooth numbers — then there is much more chance of building things using HAPs, and therefore much more prospect of imitating the Roth proof. Somehow, it makes us work in an area where the HAP-constraint really bites.

  9. gowers Says:

    I’ve got another question that should be reasonably easy to sort out. I’m wondering whether we could get away with a system of weights that’s a lot simpler than the function f that we’ve been discussing. We sort of know that any system of weights that has a chance of working needs to be concentrated on highly smooth numbers. Up to now, I’ve been imagining that that property has to be “hereditary”, in the sense that if you restrict to any HAP, then the restriction of the weights to that HAP would need to be concentrated on smooth multiples of the common difference. But I now see that that is not the case. For instance, let p be a large prime (for the sake of example one could take it to be around n^{1/2}). Is it a problem if the weight of the multiples of p that are not multiples of 3 is comparable to the weight of the multiples of 3p? I think not. The example that would show that it was a problem would be the function that takes the values 1 and -1 at numbers of the form (3m+1)p and (3m+2)p respectively, and 0 everywhere else. This sequence has discrepancy 1, but its weight is tiny, because most of the weight is on non-multiples of p.

    Loosely speaking, if our system of weights is given by the function f, we need to worry only about sequences that are “spread out relative to f”. This point has probably been obvious to many people, but somehow it had eluded me.

    What that might allow us to do is simply to define a measure of smoothness in some fairly simple-minded way and use just that. It might even be something as simple as insisting on a certain number of prime factors up to multiplicity. That may well be too simple, but let me ask the question anyway. Let k=k(n) be some parameter that tends to infinity slowly, and let S be the set of all numbers up to n with at least k primes in their prime factorization. Does there exist a sequence (x_m) of HAP-discrepancy at most 1 such that (\mathbb{E}_{m\in S}x_m^2)^{1/2}\geq c for some absolute constant c>0? In looser terms, can we find a sequence of bounded discrepancy that has mean square value at least 1 on the smooth numbers?

    A small remark about that: a sequence of bounded discrepancy must in particular be bounded. So we are asking for the mean square value on S to be within a constant of the maximum possible, which is equivalent to asking for the mean absolute value on S to be within a constant of the maximum possible.

    So here is another formulation. Does there exist a sequence (x_m) of bounded discrepancy such that \mathbb{E}_{m\in S}|x_m|\geq 1? If the answer is no, then EDP follows (unless I’ve made a mistake in the above reasoning, which is possible). So really I’m asking this as a way of testing whether using the characteristic function of S as a system of weights is a reasonable thing to try.

    Another remark is that k would have to be bigger than \log\log n, since otherwise we know that almost all numbers have k prime factors, even if we insist that they are not multiples of 3, and the usual counterexample works. But that is not a problem: nobody would want to call a number around n smooth if it had only \log\log n prime factors, given that that is typical behaviour.
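    To put rough numbers on that last remark, a sieve shows how the density of S=\{m\leq n:\Omega(m)\geq k\} collapses once k grows past \log\log n. The parameter choices below are mine, for illustration only:

```python
# Density of S = { m <= n : Omega(m) >= k } for various k.  Since a
# typical m <= n has about log log n prime factors, S contains almost
# everything for small k and thins out rapidly once k is much larger.
import math

def omega_list(n):
    """Omega(m) (prime factors with multiplicity) for all m <= n, by sieve."""
    om = [0] * (n + 1)
    for m in range(2, n + 1):
        if om[m] == 0:                  # m is prime: mark its prime powers
            pk = m
            while pk <= n:
                for j in range(pk, n + 1, pk):
                    om[j] += 1
                pk *= m
    return om

n = 10**5
om = omega_list(n)[2:]                  # Omega(m) for m = 2, ..., n
print("log log n =", math.log(math.log(n)))     # about 2.44 here
fracs = {}
for k in (2, 3, 6, 10):
    fracs[k] = sum(1 for o in om if o >= k) / len(om)
    print(k, fracs[k])
```

    For k=2 or 3 almost all numbers lie in S, while for k around 10 (already several times \log\log n at this range) the density is tiny, consistent with the point that k must grow faster than \log\log n for S to single out genuinely smooth numbers.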

    • Moses Charikar Says:

      What is the measure of the set of non-multiples of 3 in the set S (defined for some suitable k > \log\log n)?

    • gowers Says:

      Good question. Here’s a heuristic argument that it might be o(1). Suppose, as seems plausible, that a typical k-smooth integer less than n is divisible by 3^{\Omega(1)}. Then disallowing multiples of 3 places quite a big restriction on the integer. To put it slightly differently, if you believe the (to me plausible) assertion that there will be a certain degree of measure concentration in the powers of small primes that divide a typical k-smooth integer, then the probability that the power of 3 is zero should be o(1). I need to think a bit about whether that measure-concentration assertion is a reasonable one though.

      Sorry, I realize that some of what I wrote above is a bit daft: if a typical k-smooth integer is divisible by 3^{\Omega(1)} then it is divisible by 3. So let me try to say it differently. If you want to build up a smooth number without using 3, then you’re forced to use either powers of 2 or powers of primes larger than 3. The latter cause your number to get bigger faster, and the former give you less flexibility than you would have had if you were allowed to use 3 as well.

      I’m trying to get something useful out of thinking about the simplex

      x_1\log 2 + x_2\log 3 + x_3\log 5 + \dots + x_m\log p_m\leq \log n

      and x_i\geq 0 for every i. We are interested in integer points in this simplex with coordinates adding up to at least k. Can we say what a typical such point should look like? And does restricting to the plane x_2=0 cut down the number of points substantially?

    • gowers Says:

      It now seems to me that measure concentration is not really what’s going on. The plane x_2=0 ought to be the biggest of all the planes x_2=C, but small simply because we have lost a dimension.

    • Moses Charikar Says:

      Your heuristic argument sounds believable. I was trying to think about your simplex argument. I think one can show by induction that the number of points in the simplex you define (without the k restriction) is something like (\log n)^m/(m!\prod_{i=1}^m \log p_i). However, I don’t have a good way to estimate the number of points with the lower bound on the sum of coordinates.
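      That estimate is just the volume of the simplex, and a brute-force count for three primes matches it to within the expected lower-order boundary terms. This check, with its particular n and primes (which I chose so it runs instantly), is only an illustration:

```python
# Count lattice points (x_1, x_2, x_3) >= 0 with
#   x_1 log 2 + x_2 log 3 + x_3 log 5 <= log n,
# i.e. numbers 2^a 3^b 5^c <= n, and compare with the volume estimate
# (log n)^m / (m! * prod_i log p_i).
import math

n = 10**8
primes = [2, 3, 5]
L = math.log(n)

points = 0
a = 0
while 2**a <= n:
    b = 0
    while 2**a * 3**b <= n:
        c = 0
        while 2**a * 3**b * 5**c <= n:
            points += 1
            c += 1
        b += 1
    a += 1

m = len(primes)
volume = L**m / (math.factorial(m) * math.prod(math.log(p) for p in primes))
print(points, round(volume))   # same order of magnitude; count a bit larger
```

      The exact count exceeds the volume (the simplex is a down-set, so each point of the region lies in the unit cube of a counted lattice point) but only by boundary terms of lower order in \log n.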

  10. Gil Kalai Says:

    I just wanted to say that the renewed efforts and the idea to go from AP insights to HAP insights look very exciting to me.

  11. Alec Edgington Says:

    Going back to the question of what conditions on f make it suitable for proving EDP: obviously, a necessary condition is that for every sequence \epsilon_n \in \{+1, -1, 0\} of bounded discrepancy, f is concentrated on \{n : \epsilon_n = 0\}.

    It seems to me that this is also a sufficient condition. Here’s a rough argument. Let \Delta_n be the set of sequences x_1, \ldots, x_n with discrepancy less than or equal to 1. Now the maximum value of \sum_m f(m) x_m^2 for \mathbf{x} \in \Delta_n will be attained at an extreme point of \Delta_n. But I think that the extreme points of \Delta_n are all \{+1, -1, 0\}-sequences. (I need to check this, but I think it can be shown by induction on n.) Therefore it is sufficient to consider these. The step from finite to infinite sequences also needs checking.

    If this is correct, it suggests that it would be worth looking for classes of \{+1, -1, 0\} sequences of bounded discrepancy to see what they can tell us about f (and whether a proof along these lines is possible). Do we have some nice examples of these, in addition to the characters that we have already considered?

    • gowers Says:

      That’s a very nice observation, one that makes me less worried about just guessing a function f, since it suggests that the problem is not too sensitive to the choice of f. In that connection, it would be interesting to see whether it can be extended. For example, does the condition characterize sequences for which the SDP proof can be made to work (in the sense that there is a quadratic form built out of HAPs with coefficients not too large such that subtracting off the f-diagonal matrix results in something positive definite)? Or the representation-of-diagonal proof?

      Largely for my own benefit, let me see if I can spell out all the details of your argument. First, it is trivial that the set of sequences of discrepancy at most 1 is convex. Secondly, they trivially take values in [-1,1] (since we can look at singleton HAPs).

      The less trivial claim is that the extreme points take values in \{-1,0,1\}. You suggest proving this by induction. So let n be minimal such that there is an extreme point x_1,\dots,x_n with a value x_i that does not belong to \{-1,0,1\}. Let’s consider first the case where x_1,\dots,x_{n-1} is also an extreme point. In that case it takes values in \{-1,0,1\}, which implies that the sum up to but not including x_n along any HAP that contains n is 1, 0 or -1. If there are sums of 1 and -1 then we are forced to take x_n=0, and otherwise we can (and therefore, by extremeness, must) take x_n=\pm 1. So in this case we are done.

      Now suppose that the sequence x_1,\dots,x_{n-1} is not extreme. It occurs to me that the argument above allows us to conclude that x_n\in\{-1,0,1\} unless there is a HAP that contains n and also some j such that x_j\notin\{-1,0,1\}.

      What is the precise constraint on x_n, once we know the values of x_1,\dots,x_{n-1}? Well, we take each HAP that includes n and sum along it. This gives us a bunch of numbers t_d\in[-1,1], one for each factor d of n. We then know that x_n+\max t_d\leq 1 and x_n+\min t_d\geq -1. So x_n is confined to some interval [-S,T] that contains zero. If our sequence is to be extreme, then x_n will have to equal either -S or T. WLOG it equals T. If the sequence up to x_{n-1} is not extreme, then let it be a non-trivial convex combination of two other sequences y_1,\dots,y_{n-1} and z_1,\dots,z_{n-1} of discrepancy 1. Then they …

      I’m getting a bit stuck. Let me ask a general question at this point. Does the proof use the fact that we’re talking about HAPs, or does it apply to the discrepancy relative to any system of sets that contains all singletons?

      Having asked that second question, let me try to tackle it by a different induction. I’ll start with the cube (that is, taking all singletons) and add sets one at a time. And I’ll attempt to argue that all the new extreme points I create have coordinates in \{-1,0,1\}.

      Just as a warm-up, let’s think about the very first set. That is, I want to know what the extreme points are of the set of all x such that \|x\|_\infty\leq 1 and |\sum_{i\in A}x_i|\leq 1. Without loss of generality, A=\{1,2,\dots,m\}. Clearly if x is extreme, then x_i\in\{-1,1\} for every i>m. Also, if there is exactly one i\leq m such that x_i is not an integer, then the A-constraint is not biting, so we do not have an extreme point. Finally, if x_i and x_j are both non-integral, then we can replace them by x_i+\delta and x_j-\delta for any small \delta (either positive or negative) so we don’t have an extreme point.

      As a matter of interest, what are the new extreme points? Well, they must be defined by the intersection of one of the hyperplanes \sum_{i\in A}x_i=\pm 1 with n-1 of the coordinate hyperplanes x_i=\pm 1. So we choose n-1 values of the sequence to be \pm 1 and let the final value be determined by the sum in A. If A has even cardinality that does not add any new extreme points, but if A has odd cardinality then we get points that are \pm 1 everywhere except in one place in A where they are zero (and of course, the sum in A has to be \pm 1).

      Now let’s suppose I’ve already got a set system \mathcal{A}. Actually, I think I have a counterexample, at least at this level of generality. Let’s consider the set system consisting of all subsets of \{1,2,3\} apart from \{1,2,3\} itself. Then the point (1/2,1/2,1/2) is an extreme point.

      So this shows that the proof must use in some important way the fact that the sets are HAPs, whereas my previous attempt wasn’t engaging with that condition in a serious way. So I am now stuck properly.

    • Alec Edgington Says:

      You’re right, it’s not as clear as I thought. I’ll have a think about what the extreme points for the HAP system actually are…

    • Moses Charikar Says:

      Simple observation: you can implement Tim’s counterexample in terms of HAPs:
      Pick primes p_1,p_2,p_3 and consider the numbers p_1 p_2, p_1 p_3, p_2 p_3. Two of them appear in the HAP for p_1, two in the HAP for p_2 and two in the HAP for p_3. Assign 1/2 to each of these three numbers and zero to every other number.

    • gowers Says:

      That doesn’t yet contradict Alec’s assertion since you’ve not included all HAPs — in particular you haven’t included the HAP with common difference 1 — and as a result the sequence has discrepancy 3/2. But my guess is that it should be possible to find a counterexample with all HAPs included.
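      This is quick to confirm by machine: scanning every HAP in [1,30] for the weight-1/2 assignment at 6, 10 and 15 (taking the three primes to be 2, 3 and 5) gives discrepancy exactly 3/2, attained along the HAP of common difference 1.

```python
# HAP discrepancy of Moses's example with primes 2, 3, 5:
# value 1/2 at 6 = 2*3, 10 = 2*5, 15 = 3*5, and 0 elsewhere.
from fractions import Fraction

n = 30
x = {m: Fraction(0) for m in range(1, n + 1)}
for m in (6, 10, 15):
    x[m] = Fraction(1, 2)

disc = Fraction(0)
for d in range(1, n + 1):               # every common difference
    partial = Fraction(0)
    for multiple in range(d, n + 1, d): # HAP {d, 2d, ..., kd}
        partial += x[multiple]
        disc = max(disc, abs(partial))

print(disc)   # 3/2: partial sums 1/2, 1, 3/2 at 6, 10, 15 along d = 1
```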

    • Moses Charikar Says:

      Ah … good point. So a HAP counterexample must use values of different signs, else the sum of absolute values would be at most 1 (because of the HAP of common difference 1) and it would be easy to express this as a convex combination.

    • gowers Says:

      Maybe there’s some easy fix actually. Suppose we set the sequence equal to 1/2 at pq, qr and pr, where p, q and r are three primes as in your example, and then we stick a value of -1 in between qr and pr at a number s that’s coprime to p, q and r. I’m assuming here that the order of the four numbers is pq, qr, s, pr. I haven’t checked carefully, but I think this may work.

    • Alec Edgington Says:

      I think I have a counterexample: I claim that the sequence (1, -1, -\frac12, \frac12, 1, -\frac12) is an extreme point of \Delta_6.

      Suppose not. Then we can find \mathbf{\delta} = (\delta_1, \ldots, \delta_6) \neq \mathbf{0} such that (1, -1, -\frac12, \frac12, 1, -\frac12) \pm \mathbf{\delta} \in \Delta_6. By considering the singleton HAPs \{1\}, \{2\} and \{5\}, we immediately deduce \delta_1 = \delta_2 = \delta_5 = 0.

      Then the HAP \{3, 6\} tells us that \delta_3 = -\delta_6; the HAP \{1, 2, 3, 4, 5\} tells us that \delta_3 = -\delta_4; and the HAP \{2, 4, 6\} tells us that \delta_4 = -\delta_6. It follows that \delta_3 = \delta_4 = \delta_6 = 0. So \mathbf{\delta} = \mathbf{0}, a contradiction.
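      For what it’s worth, this can also be checked mechanically: every HAP sum of the sequence lies in [-1,1], and the HAPs whose sums are exactly \pm 1 have full rank 6, which pins the point down. The rank computation over the rationals is my own scaffolding:

```python
# Verify Alec's claim: x = (1, -1, -1/2, 1/2, 1, -1/2) has every HAP
# sum in [-1, 1], and the HAPs on which the sum is exactly +-1 have
# rank 6, so they determine x -- i.e. x is an extreme point of Delta_6.
from fractions import Fraction as F

n = 6
x = [F(1), F(-1), F(-1, 2), F(1, 2), F(1), F(-1, 2)]

haps = []                               # all HAPs {d, 2d, ..., Ld} in [1, n]
for d in range(1, n + 1):
    pts = list(range(d, n + 1, d))
    haps.extend(pts[:L] for L in range(1, len(pts) + 1))

tight = []
for hap in haps:
    s = sum(x[m - 1] for m in hap)
    assert abs(s) <= 1                  # x lies in Delta_6
    if abs(s) == 1:                     # tight constraint
        tight.append([F(int(m in hap)) for m in range(1, n + 1)])

def rank(rows):
    """Rank over Q by Gaussian elimination on exact fractions."""
    rows = [r[:] for r in rows]
    rk = 0
    for col in range(n):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col]:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

print(rank(tight))   # 6: six independent tight HAPs, so x is extreme
```

      The six tight HAPs found are exactly the ones used in the argument above: \{1\}, \{2\}, \{5\}, \{1,\dots,5\}, \{3,6\} and \{2,4,6\}.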

    • gowers Says:

      I realize after thinking about that that the approach in my earlier comment (following Moses’s yet earlier one) doesn’t quite work. The smallest example you get from the method I was suggesting is


      But there’s no reason to suppose that you can’t mess around with the zeros. Indeed, in this case you certainly can, but even if one makes the whole sequence add up to 1, it’s still hard to stop it being possible to add a small c to one zero and subtract c from another. So it’s nice that you’ve come up with a genuine example.

    • Alec Edgington Says:

      Yes, in general to show that x \in \Delta_n is an extreme point, we need to exhibit n linearly independent HAPs on which x sums to \pm 1.

      This still leaves open the question whether we can form such a sequence ‘uniformly’: that is, whether there is an infinite sequence with elements not all in \{\pm 1, 0\} whose restriction to each interval [1,n] is an extreme point of \Delta_n. I’m pretty sure it must be possible.

  12. gowers Says:

    This will be another rather speculative post. It seems to me that a property that could be very useful to have if we want a measure on \{1,2,\dots,n\} to work for us is that it should be approximately invariant under multiplication by small numbers. That is, if we call the measure \mu, then \mu(A) should be approximately equal to \mu(tA) for all small t. This is certainly achievable: for example, one could pick an integer m and multiplicatively convolve the characteristic measure of \{1,2,\dots,m\} with itself a few times. This measure would be concentrated on smooth numbers and would have a useful approximate invariance property.

    The nice thing about the approximate invariance is that it gives us plenty of HAPs to play with. For example, writing \nu for the characteristic measure of \{1,2,\dots,m\}, I think we have that \mu\approx\mu*\nu (where that is supposed to be a multiplicative convolution) in a strong enough sense that we can say that \mu is roughly constant along HAPs of length m. More concretely, if \mu(x) is large, then I think it approximately equals \mu(ax/b) whenever a,b\leq m, which gives us m HAPs containing x along each of which \mu is roughly constant.

    Now let’s suppose we want to approximate the function e(\alpha x)=e^{2\pi i\alpha x} on \mu. One way we might try to do it is this. Out of the m HAPs of length m that contain x, we pick those with common difference d such that \|\alpha d\| is small. And those we approximate by a suitable linear combination of chunks.

    Hmm … I have to go now, but may come back to this. I’ve only just thought of the invariance idea, so its apparent promisingness may be illusory.

    • gowers Says:

      A brief further thought is that the invariance property is easy to achieve and doesn’t really need multiplicative convolutions. One can simply define a non-trivial and non-negative Lipschitz function \phi on the simplex mentioned earlier — that is, the set of all points (x_1,x_2,\dots,x_r) such that each x_i\geq 0 and

      x_1\log 2+x_2\log 3 + \dots + x_r\log p_r\leq\log n,

      with the distance between x and y being \sum_{i\leq r}|x_i-y_i|\log p_i — that vanishes on the boundary, and let \mu(2^{x_1}\cdots p_r^{x_r})=\phi(x_1,\dots,x_r). By “Lipschitz” I mean Lipschitz with constant o(1) or something like that.
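      Here is a toy version of this construction with r=3; the specific choice of \phi as the distance to the boundary, and all the parameters, are mine. Since \phi is 1-Lipschitz in the stated metric, multiplying the underlying number by 2 (the step x\mapsto x+e_1, of length \log 2) changes \mu by at most \log 2, which is small compared with the maximum of \phi, about \frac14\log n:

```python
# Toy Lipschitz construction on the simplex for primes 2, 3, 5:
# phi(x) = min(x_1 log 2, x_2 log 3, x_3 log 5, log n - <x, log p>),
# the distance to the boundary in the metric
# d(x, y) = sum_i |x_i - y_i| log p_i, so phi is 1-Lipschitz and
# vanishes on the boundary.  Set mu(2^{x_1} 3^{x_2} 5^{x_3}) = phi(x).
import math

n = 10**12
logs = [math.log(p) for p in (2, 3, 5)]
L = math.log(n)

def phi(x):
    slack = L - sum(xi * li for xi, li in zip(x, logs))
    return max(0.0, min(min(xi * li for xi, li in zip(x, logs)), slack))

# Multiplying the underlying number by 2 is the step x -> x + e_1.
worst, peak = 0.0, 0.0
for a in range(41):
    for b in range(27):
        for c in range(18):
            v = phi((a, b, c))
            peak = max(peak, v)
            if v > 0:
                worst = max(worst, abs(phi((a + 1, b, c)) - v))

print(worst, math.log(2), peak)   # worst <= log 2, far below the peak
```

      In a genuine construction one would rescale \phi so that its Lipschitz constant is o(1) relative to its maximum, but the toy run already shows the ratio \log 2 : \frac14\log n at work.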

    • gowers Says:

      Now let’s suppose we have a measure \mu that is approximately invariant in this sense. I would like to \mu-approximate the function e(\alpha x) by a linear combination of HAPs. Let’s go back to the idea from the main comment above. How many HAPs containing x do I expect to have common difference d with \|\alpha d\|\leq\gamma? The possible common differences are x/s for s=1,2,\dots,m. So I am asking for how many s\leq m we have \|\alpha x/s\| small. (Note that if x is \mu-typical, then x/s is an integer for every s\leq m.)

      Let me make a big assumption here, namely that the set A=\{1,1/2,\dots,1/m\} is nicely quasirandom, in the sense that for \mu-almost all dilates tA (by integers t divisible by all of 1,2,…,m), the set \{e(\alpha y): y\in tA\} is uniformly spread around the circle, or something along those lines. That would tell us that for almost all x, the number of HAPs containing x along which w\mapsto e(\alpha w) varied slowly would be (for almost every \alpha?) approximately the expected number, which would tell us that the function x\mapsto\sum_{r: \|\alpha x/r\|\leq\gamma}e(\alpha x) was \mu-approximately proportional to e(\alpha x), which would approximately express (a multiple of) e(\alpha x) as a linear combination of HAPs.

      There are quite a lot of potentially dodgy steps there, but if it could be got to work, then we might have a hope of showing that there was cancellation between the coefficients resulting from different \alpha.

      I’m not sure how much I believe this hazy sketch, but I think a further feasibility study might be called for.

    • gowers Says:

      Another way of looking at the function discussed above is as follows. For each y (which is playing the role of x/r above) we look at \|\alpha y\|, and if it is small then we add in the HAP \{y,2y,\dots,my\}, weighted by \mu(y). Or more likely we take some nice function \pi(\alpha,y) that’s concentrated on the set of y such that \|\alpha y\| is small and has nice Fourier behaviour for each fixed y (to get good cancellation when we look at all \alphas at once). Then we could consider the function F_\alpha=\sum_y\mu_y\pi(\alpha,y)\{y,2y,\dots,my\}. We would be hoping that F_\alpha was roughly proportional to \mu.

      I’m aware that I’m probably becoming less and less comprehensible, with more and more statements whose motivations are likely to be difficult to grasp since I’m not really explaining them. At some point I might try to do some proper calculations to see whether I can be more precise and formal.

  13. Uwe Stroinski Says:

    Let me elaborate on my last comment and try to free the ROI approach to Roth’s theorem from the explicit numbers. The goal is to get a ‘qualitative’ (infinitary) discrepancy result which hopefully is easier to adapt to EDP. The program looks as follows:

    1) Choose an appropriate Hilbert space

    2) Choose a group of linear operators acting on this space

    3) Get a representation of identity by eigenvectors of these operators

    4) Make sense of \displaystyle\mathbb{E}_y\omega^{ry}\pi_{d,y}=\omega_r *\pi_d=\lambda_{r,d}\omega_r, with the omegas as eigenvectors, the pis as characteristic functions of the sets, the lambdas as eigenvalues, and with convergence of the involved sums

    5) Solve the above for the eigenvector and plug it into 3)

    6) Get some condition in the eigenvalues

    7) Derive a qualitative discrepancy result on the sets used in 4)

    Maybe I am too optimistic, but that seems to be doable. Since APs with common difference d are invariant under translations we get a periodic symmetry group and it is natural to consider: 1) \displaystyle L^2 \left(\frac{d}{2\pi}\Gamma\right) and 2) the rotation group on it. We then get 3) a representation of the group action in terms of rank one operators, at least on a dense subspace. Since the identity is part of the group, we are on the right track.

    What would these first three steps look like in the EDP situation? ROI is ‘essentially’ equivalent to a periodic symmetry. We do not have that. Hence we just try to get ROD or even less. Maybe not with trigonometric functions, since they are tied to periodic symmetries. ROD ‘means’ that there is no group, which is reasonable. However, each step (except the estimates) in the ROI approach heavily uses groups. That would indicate that the hoped-for transfer from AP to HAP will rest more upon the estimates than upon the ‘abstract’ content.

    • gowers Says:

      This definitely seems worth thinking about. I’d like to make one small remark, which is that if we work over the rationals then there are some important symmetries of the set of HAPs — we can dilate them. Having said that, I’m not sure how important it is, given that the eigenvectors corresponding to the group of dilations are likely to be more like GPs than APs and therefore not obviously decomposable in nice ways.

    • Uwe Stroinski Says:

      It looks as if we have to work on an ultrapower (along a non-principal ultrafilter). Such constructions usually are space extensions turning approximate point spectrum of our operators into point spectrum. That means, we get eigenvectors. Then, translations, eigenvectors and ROI might yield a periodic group action on this larger space. Vague, but indeed a small hint that we should reconsider the problem over the rationals. This time with ROI in mind.

  14. Klas Markström Says:

    I have been away from this problem for a while now, going to conferences and other things, so I am still catching up with the recent developments.

    While traveling I did try one thing which I have mentioned here before, namely taking a look at the longest discrepancy-2 examples, truncated to the first 1024 values, in the Haar wavelet basis. I didn’t see any obvious structure, but as expected the coefficients are nicely balanced: 536 of them are 0, 268 are 2 and 268 are -2.
    One of the vague reasons for my interest in this basis is that these wavelets behave nicely under scalings, thereby giving a nice connection between HAPs with large and small differences with a common divisor.

    Regarding decomposition of ON-bases in terms of APs, one might note that the Haar wavelets are very simple to decompose exactly using APs. This can be done for any difference d using 3k/d APs for a basis element of period 2k. So a nice proof of Roth should work in this basis as well.

    Now back to reading.

  15. gowers Says:

    I want to continue the discussion begun in this comment of Uwe’s, but am starting a new comment so that I have a full column’s width to work with.

    The main thing I wanted to say was that I’ve become quite interested in the idea of trying to apply a representation-of-diagonals approach to the problem over the positive rationals. The main reason for that is that I think that there is some hope that we might be able to take the diagonal matrix to be the identity in this case, and if so then the problem ought to be a whole lot easier. Of course, we introduce a new difficulty, which is to make sense of various concepts over the rationals — in particular measures, probabilities, averages, etc. But there are plenty of techniques one can try to use to get round the problems that arise.

    Why might the identity be OK? Well, the example that shows that it isn’t OK over the natural numbers is our old favourite 1 -1 0 1 -1 0 … and it doesn’t seem to extend to the rationals. Let me try to give a better justification for that than merely saying that I don’t see how to extend it. What we would want is a simple function that is large on a positive proportion of the rationals (whatever that means). That statement should be scale invariant … hmm … let me try again. The thing that makes the proof work is that every HAP intersects the multiples of 3 in a set of relative density at least 1/3. Can we hope to find a set of rationals that intersects every HAP densely (with a uniform lower bound on the density)? We certainly can’t do it just by making the function vanish on one HAP.

    I’m not getting very far with that, so I’ll just leave it dangling: if someone can think of a convincing example of a function defined on the rationals that is large on a “positive proportion” of rationals in some sense, and has bounded discrepancy, then I’ll stop feeling optimistic about this idea. And if someone can formulate this question in a nice way then I’ll be quite grateful.

    Anyhow, back to the rationals. The general strategy would be to work out what the characters are on the rationals (I think we’d want all rationals and not just the positive ones, so we can talk about the additive group, but that’s OK) and then to try to decompose those into HAPs.

    Let’s just think what a homomorphism from the additive group of \mathbb{Q} to \mathbb{T} would be. An obvious example is to pick a real number \alpha and map p/q to \exp(2\pi i \alpha p/q). Are there any others? I think there are (and this will either be well-known stuff or wrong). Let me try to explain why.

    Suppose \chi is a character such that \chi(1)=1. It follows that \chi(n)=1 for every integer n. But this does not determine \chi. For instance, \chi(p/q) could be \exp(4\pi i p/q). So far that is just taking \alpha=2. But what if we went further and ensured that \chi(2^{-k})=1 for every positive integer k. Does that imply that \chi is identically 1? It certainly implies that \chi is 1 on the dyadic rationals. But what if we look at the subgroup generated by the dyadic rationals and 1/3? Then we seem to have the possibility of sending r + m/3 to \exp(2\pi i m/3), where r is any dyadic rational and m is a positive integer. Earlier I thought I had a complete description of the characters, but I didn’t check it and now I’m not sure it was correct, so I won’t risk a botch job here.

    What I will say is that I think it is worth sorting this question out and then seeing whether there are nice ways of making characters out of HAPs. The character that takes p/q to \exp(2\pi i \alpha p/q) can, I think, be split up into HAPs. One simply picks a quickly growing sequence of positive integers n_1 < n_2 < n_3 < \dots such that \alpha/n_1 is small and each integer in the sequence divides the next. Then the function \exp(2\pi i\alpha p/q) varies very slowly as you go through multiples of 1/n_1, and then between each of those you can get all the multiples of 1/n_2 that you haven’t yet included (in a long HAP, since n_2 is so much bigger than n_1), and so on. I forgot to add that every positive integer should divide some n_k so that all rationals are eventually included.

    One other reason for all this is that the talk of a measure on smooth numbers that is almost invariant under multiplying by small integers felt like an attempt to give a finite approximation to a much tidier world of rationals, where there is a counting measure that is genuinely invariant.

    • gowers Says:

      I’ve now sorted out to my own satisfaction what the characters on \mathbb{Q} are. Whether the description to follow is the nicest one I don’t know, but it works.

      Observe that a character \chi on \mathbb{Q} is determined by what it does to 1/n! for every n. Given any character \chi, we can define a sequence \alpha_1,\alpha_2,\dots of real numbers such that 0\leq \alpha_n < n! and \chi(1/n!)=\exp(2\pi i \alpha_n/n!).

      Since \chi is a character, it follows that \chi(x/n!)=\exp(2\pi i\alpha_n x/n!) for every integer x. Therefore, the numbers \alpha_n must satisfy the compatibility condition that \alpha_{n+1}/n! and \alpha_n/n! differ by an integer (so that \exp(2\pi i\alpha_{n+1}(n+1)/(n+1)!)=\exp(2\pi i\alpha_n/n!)). In other words, \alpha_{n+1} can be any non-negative real number less than (n+1)! that differs from \alpha_n by a multiple of n!.

      Conversely, if we are given a sequence \alpha_1,\alpha_2,\dots of real numbers satisfying this condition, then we can define \chi(p/q) to be \exp(2\pi i\alpha_n p/q) for any n such that q|n!. Equivalently, we can define it to be the limit of \exp(2\pi i\alpha_n p/q) as n tends to infinity, the compatibility condition ensuring that the sequence will be constant once q|n!.

      The characters of the form r\mapsto\exp(2\pi i\alpha r) are the particular case where the sequence (\alpha_n) is itself eventually constant.

      One nice aspect of this is that we can define a character by choosing an element of \mathbb{T} followed by an infinite sequence of finite choices. It is therefore possible to put a product measure on the dual of \mathbb{Q} and make sense of expressions such as \mathbb{E}_\chi \chi(r)\overline\chi(s). Therefore, we have a representation of the identity using characters. (Note that for each pair of rationals r and s, we need only worry about an initial segment of the infinitely many choices we have to make in order to determine a character, since after that the choices will have no effect on \chi(r) or \chi(s). So the fact that we are talking about an infinite product measure is particularly simple to deal with.)
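      The product measure is easy to sample from, and a quick Monte Carlo (my own illustrative sketch, with an arbitrary truncation depth) confirms the orthogonality \mathbb{E}_\chi \chi(r)\overline\chi(s)\approx 0 for a pair r\neq s:

```python
# Sample random characters of Q: alpha_1 uniform in [0, 1!), and
# alpha_{n+1} = alpha_n + j * n!  with j uniform in {0, 1, ..., n},
# so 0 <= alpha_{n+1} < (n+1)! and alpha_{n+1} = alpha_n (mod n!).
# Then chi(p/q) = exp(2 pi i alpha_n p / q) for any n with q | n!.
import cmath
import math
import random

def sample_alphas(depth):
    alphas = [random.random()]                # alpha_1 in [0, 1!)
    for n in range(1, depth):
        alphas.append(alphas[-1] + random.randrange(n + 1) * math.factorial(n))
    return alphas

def chi(alphas, p, q):
    """chi(p/q), using the first alpha_n with q | n!."""
    n = next(k for k in range(1, len(alphas) + 1) if math.factorial(k) % q == 0)
    return cmath.exp(2j * math.pi * alphas[n - 1] * p / q)

random.seed(0)
trials = 20000
samples = [sample_alphas(4) for _ in range(trials)]

# E_chi chi(r) conj(chi(s)) for r = 1/2, s = 1/3: nearly 0 since r != s
# (for r = s the product is identically 1).
acc = sum(chi(a, 1, 2) * chi(a, 1, 3).conjugate() for a in samples) / trials
print(abs(acc))   # small: Monte Carlo noise of order 1/sqrt(trials)
```

      The point the simulation illustrates is exactly the one in the parenthetical remark above: the value of \chi(1/2)\overline\chi(1/3) depends only on the first few choices, and averaging over those choices kills it.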

      The next thing to think about is whether there is a nice canonical way of decomposing characters into HAPs.

    • gowers Says:

      I have a simple lemma that may help with this last task, but may not be strong enough. The lemma says that if \chi is a character on \mathbb{Q}, then for every \epsilon>0 there exists a positive integer n such that |\chi(n^{-1})-1|\leq\epsilon.

      The proof is simple. Let r be an integer greater than 2\pi\epsilon^{-1}, let N=r! and let \chi(1/N)=e(\alpha), where as usual e(\alpha) means \exp(2\pi i\alpha). Now choose a positive integer n\leq r such that the distance from n\alpha to the nearest integer is at most r^{-1}\leq\epsilon/2\pi. It follows that |\chi(n/N)-1|\leq\epsilon. Since N/n is an integer, we are done.
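      The pigeonhole step here is just Dirichlet’s approximation theorem, and it is easy to verify by brute force; the particular values of \alpha and r below are arbitrary test cases of mine:

```python
# The pigeonhole step of the lemma: for any real alpha and integer r
# there is some n <= r with ||n * alpha|| <= 1/r, whence
# |e(n * alpha) - 1| <= 2 * pi * ||n * alpha|| <= 2 * pi / r.
import math

def nearest_int_dist(x):
    """||x||: distance from x to the nearest integer."""
    return abs(x - round(x))

def best_n(alpha, r):
    """The n in {1, ..., r} minimising ||n * alpha||."""
    return min(range(1, r + 1), key=lambda n: nearest_int_dist(n * alpha))

for alpha in (math.pi, math.sqrt(2), (1 + math.sqrt(5)) / 2, 0.1234567):
    for r in (10, 100, 1000):
        n = best_n(alpha, r)
        assert nearest_int_dist(n * alpha) <= 1 / r   # Dirichlet's theorem
print("pigeonhole step checked for all test values")
```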

      The bound that comes out of that argument is pretty bad: turning things round, we get that \epsilon can be made as small as about \log\log n/\log n. Does anyone see an improvement to this? I quite like this question and I think I'll go and ask it on MathOverflow.

      Update: I’ve asked it here.

    • gowers Says:

      A byproduct of posting the question on MO is that someone has pointed out that the dual group of \mathbb{Q} is normally defined in terms of adeles. (My knowledge of what an adele is is almost nonexistent, but there was just enough of an impression of the kind of thing it was for this not to come as a surprise.)

      I’m about to see whether the lemma above gives a decomposition of characters into HAPs that could be used to give a ROI proof of EDP, modelled on the ROD proof of Roth’s theorem.

    • Moses Charikar Says:

      It seems to me that the question considered in the lemma you prove is very similar to the question you raised earlier here, and the bounds are in the same ballpark too. There, we reasoned that typical integers less than n are divisible by all integers up to about \log n, and that gave an approximation with error about 1/\log n.

    • gowers Says:

      I agree that the question is very similar — indeed, I am trying to model the smooth-number situation in an infinitary way by looking at the rationals instead. Whether the bounds are in the same ballpark is less obvious to me, because what the bounds are supposed to depend on is different.

      Just to explain this, let me recall your observation. The earlier question was how large a number N you would have to take if you wanted it to have the property that for every \alpha there was a factor m of N such that \|\alpha m\|\leq\epsilon. You pointed out that if \alpha=1/p, for some prime that does not divide N, then we cannot do better than \epsilon=1/p, which roughly translates into N having to be exponential in 1/\epsilon, or equivalently \epsilon cannot be smaller than 1/\log N.

      In the rationals, however, we are interested in the dependence of \epsilon not on N (since in the additive group of the rationals, numbers don’t have different “sizes”) but rather on N/m. In fact, though, we phrase the question slightly differently: we have a character \chi and want to know how big M has to be before there is guaranteed to be m\leq M such that |\chi(1/m)-1|\leq\epsilon. The proof of the exponential bound is essentially the same as before, but the example that showed that it was best possible no longer works, since all primes divide 1 in the rationals.

  16. gowers Says:

    Once again a new comment, but following on from the previous block of comments.

    Let \chi be a character on \mathbb{Q}. I want to show that \chi can be decomposed efficiently as a linear combination of HAPs, using the lemma above, which for convenience I will restate.

    Lemma. For every \epsilon>0 there exists a positive integer n such that |\chi(1/n)-1|\leq\epsilon.

    The first step towards decomposing \chi is to choose a large integer m and to find an integer n_1 such that |\chi(1/n_1)-1|\leq 1/m. This means that \chi varies very slowly as you run through the multiples of 1/n_1. To be a bit more precise, we can split up the multiples of 1/n_1 into HAPs of length cm on each of which \chi varies by at most c or so. (I’m defining a HAP to be a set of the form \{ar, (a+1)r,\dots,br\}.)

    Once we’ve done that, we can find a much larger integer n_2 such that n_1|n_2 and such that |\chi(1/n_2)-1|\leq 1/m. That allows us to divide up the multiples of 1/n_2 into HAPs of length cm on each of which \chi varies by at most c or so. We don’t want these HAPs to contain multiples of 1/n_1, but since n_2 is much larger than n_1, we can get the lengths approximately correct and just partition all the multiples of 1/n_2 that are not multiples of 1/n_1.

    We continue this process. If as we do so we also ensure that r|n_r, then we will have partitioned all of \mathbb{Q} into HAPs on which \chi is roughly constant. And that is morally the same as decomposing a character into HAPs. (We can do it better by convolving.)
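    As a sanity check on the “varies very slowly” step, here is a minimal numerical sketch. Along the multiples of 1/n_1 the values of \chi are just the powers of \chi(1/n_1), so if |\chi(1/n_1)-1|\leq 1/m then on a HAP of about m/4 consecutive multiples the values stay within a small constant of each other. The value m=1000 below is a hypothetical choice.

```python
import cmath

m = 1000
beta = 1 / (8 * m)                        # |e(beta) - 1| = 2*sin(pi*beta) < 1/m
chi1 = cmath.exp(2j * cmath.pi * beta)    # plays the role of chi(1/n_1)
assert abs(chi1 - 1) <= 1 / m

# On a HAP of m//4 consecutive multiples of 1/n_1, chi (whose values
# there are the powers of chi1) varies by at most a small constant.
L = m // 4
values = [chi1 ** k for k in range(L)]
spread = max(abs(v - values[0]) for v in values)
assert spread <= 0.5                      # "varies by at most c or so"
```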

    Does that mean the proof is finished? No, but I think it’s very encouraging that we have a set of characters that can be split up into HAPs.

    The missing ingredients — and at this stage I’m not clear whether there’s a missing idea or whether it’s just that this idea needs to be investigated more carefully — are that in the proof of Roth’s theorem we exploited a lot of cancellation between coefficients of products of APs used in the decomposition, and that to achieve that cancellation we used a nice formula for the decomposition. What’s more, the formula didn’t actually give a decomposition of the identity, but rather a decomposition of the identity plus a positive semidefinite matrix. (That last aspect might work more tidily in the rationals, however.)

    So the next step in investigating this approach is clear: one should try to think of a suitably canonical way of expressing a character as a linear combination of HAPs, and one should then look at the coefficients that result when one rewrites the decomposition I = \mathbb{E}_\chi \chi\otimes\overline{\chi} in terms of the HAPs used to make the characters \chi.

    • gowers Says:

      Actually, an even better next step might be to pursue what Uwe was suggesting and try to get a clean infinitary version of the Roth argument. The reason that feels like a good idea is that there as well the natural measure on the group (which is \mathbb{Z}) is infinite but the natural measure on the dual group (which is \mathbb{T}) is finite. Also, the measure on the dual group is continuous. I don’t know whether the proof would be difficult to adapt (the new statement would be that for any \pm 1 function defined on \mathbb{N} and any positive integer m you can find an AP of length m on which the discrepancy is at least c\sqrt{m}). My hope is that it would not be hard to adapt, and that the adapted proof would provide a better model for the modification to HAPs and the rationals.

    • gowers Says:

      A question that naturally arises if one wants a ROI proof when the ground set is infinite is what to do about the fact that the trace of the identity is infinite (as, presumably, is the sum of the coefficients of the HAP or AP products). Is there some way of arguing that the “density” of coefficients is small?

      As a toy problem, consider the following integral representation of the identity on \mathbb{Z} (that is, the matrix \delta_{mn}, where m and n range over all integers):

      \delta_{mn}=\int_0^1e(\alpha m)\overline{e(\alpha n)}d\alpha.

      We can also think of this as I=\int_0^1e_\alpha\otimes\overline{e_\alpha}d\alpha. The “sum of coefficients” is in fact an integral of coefficients, which comes to 1, whereas the trace of the identity is infinite. Therefore, given any \pm 1 function f on \mathbb{Z} we ought to expect a trigonometric function e_\alpha (which takes n to e(\alpha n)) with respect to which f has infinite discrepancy.
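      The toy representation can be checked numerically. Here is a sketch; the Riemann-sum helper `inner` is a hypothetical name, and note that for distinct integers m, n the sum is a geometric series that vanishes exactly.

```python
import cmath

def e(a: float) -> complex:
    """e(a) = exp(2*pi*i*a), as in the post."""
    return cmath.exp(2j * cmath.pi * a)

def inner(m: int, n: int, steps: int = 4096) -> complex:
    """Riemann-sum approximation of int_0^1 e(alpha*m) * conj(e(alpha*n)) d(alpha).

    The integrand is e(alpha*(m - n)), so for m = n the sum is exactly 1,
    and for m != n it is a geometric series summing to (essentially) 0.
    """
    h = 1.0 / steps
    return sum(e(k * h * (m - n)) for k in range(steps)) * h

assert abs(inner(3, 3) - 1) < 1e-9   # diagonal entry delta_{mm} = 1
assert abs(inner(3, 7)) < 1e-9       # off-diagonal entry delta_{mn} = 0
```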

      Can we make this rigorous? It’s tempting to say that f has infinite \ell_2 norm, and therefore that \int_0^1|\hat{f}(\alpha)|^2d\alpha is infinite, and therefore that \hat{f}(\alpha) is infinite for some \alpha, which is precisely saying that \sum_nf(n)e(\alpha n) is infinite.

      But that isn’t rigorous. Isn’t it more correct to say that the Fourier transform of f doesn’t exist? And even if it did exist, there would be questions about the nature of the convergence of \sum_nf(n)e(\alpha n).

      Maybe one can do something along these lines but I live so much in the finite world that I don’t have the right tricks at my fingertips.

      It looks to me as though things get yet more difficult if we have a decomposition of the identity where the sum of the coefficients is infinite, but “less infinite” than the trace of the identity. I don’t think it is an insuperable difficulty, but it’s not completely obvious that it can be solved in a purely infinitary way. Possibly, as Uwe suggests, taking limits along ultrafilters would help.

      Let me end this remark with a question: can anyone think of a good adaptation of the basic ROI or ROD lemma (the one that says if you have a representation such that the sum of the moduli of the coefficients is significantly less than the trace of the diagonal matrix then you are done) that works for infinite sets?
