In this post I want to elaborate on a strategy for proving EDP that I discussed in EDP15. Briefly, the idea is to take a representation-of-identity proof of Roth’s AP-discrepancy result and modify it so that it becomes a representation-of-diagonal proof of unbounded HAP-discrepancy.

The first step of this programme was an obvious one: obtain a clean and fully detailed proof in the APs case. That has now been completed, and a write-up can be found here. For the benefit of anyone who is interested in thinking about the next stage but doesn’t feel like reading a piece of formal mathematics, let me give a sketch of the argument here. That way, this post will be self-contained. Once I’ve given the sketch, I’ll say what I can about where we might go from here. It is just possible that we are in sight of the finishing line, but that is unlikely as it would depend on various guesses being correct, various potential technical problems not being actual, various calculations giving strong enough bounds, and so on. Thus, the final part of this post will be somewhat speculative, but I will make it as precise as I can in the hope that it will give rise to a fruitful line of enquiry.

**Roth’s theorem on AP-discrepancy.**

Recall that Roth proved that for every function there is an arithmetic progression such that , and that this result was shown to be sharp by Matousek and Spencer (who removed a log factor from an upper bound of Beck). Roth actually proved a more precise result: the discrepancy is attained on a progression of length at most . In this post I shall sketch an argument that shows that for any we can find an AP of length at most and common difference at most such that . (I haven’t actually checked, but I’m 100% confident that this slightly stronger result would also follow from Roth’s proof, and also from a proof due to Lovász that uses the semidefinite programming method that we’ve discussed at length in the past.)

Recall first some notation. If and are subsets of then I shall also write and for their characteristic functions. In general, given two functions and defined on will stand for the rank-1 matrix

Suppose now that we can find a diagonal matrix , arithmetic progressions and and coefficients such that Let the diagonal entries of be real numbers Then for any function defined on we have

Let be the ratio It follows from the above identity that there exists such that

and hence that there exists an arithmetic progression such that

That is the representation-of-diagonal method in its purest form. However, as Moses pointed out (I haven’t actually found the precise comment), it is enough to obtain an identity of the form where is positive semidefinite, and everything else is as before. That is because so including does not make things worse for us. And in fact, it turns out to introduce a useful extra flexibility.

**The decomposition.**

We start with the following simple lemma. Up to now, the inner product has been the obvious one But from now on it is more convenient to take the inner product (where is, as usual, shorthand for ).

**Lemma 1.** Let be an orthonormal basis of Then

**Proof.** Let be the unitary matrix with rows The entry at of is which is the inner product of the th and th columns of Since the columns of a unitary matrix also form an orthonormal basis, this gives us as required.

**Corollary 2.** For each let be the function defined by Then

**Proof.** The functions form an orthonormal basis.

The plan now is to decompose each trigonometric function into a linear combination of characteristic functions of suitably chosen APs. This automatically gives us a decomposition of each matrix into a linear combination of products We then find that each individual is used for several different s, and when we add up the coefficients there is a considerable amount of cancellation. It is that cancellation that gives us a good upper bound on and leads to the results claimed above.

At this point, let us fix We would like to decompose into a linear combination of arithmetic progressions mod with common difference at most It will turn out that their cardinalities will be at most as well. Provided that is at most this will mean that each AP wraps around (that is, passes zero) at most once, so we can split it into two genuine APs.

Given how do we choose a suitable common difference? Well, in an earlier and less successful version of this argument we chose for each a such that and Here stands for the distance from to the nearest integer. This estimate implies that and that (together with the triangle inequality) implies that and more generally that In more qualitative terms, adding a small multiple of to does not change the value of very much.

Let be the interval inside where for some smallish absolute constant Let and let be the *characteristic measure* of — that is, the function that takes the value on and 0 outside From the observation just made it is straightforward to prove the following lemma. The fact that is real follows from the symmetry of

**Lemma 3.** Let . Suppose that and Then there is a real constant between 1/2 and 1 such that

The next lemma is also easy to prove. Note that we have dropped the hypothesis that is small and weakened the conclusion accordingly.

**Lemma 4.** Let and suppose that Then there is a real constant such that

It will be problematic for us if is negative, but a simple trick makes sure that that never happens: we simply convolve with twice. By the two lemmas above, we may conclude that where is always non-negative, and is at least 1/4 when (That is because )

A discrepancy bound of can be obtained if for each we choose such that (which is possible by the usual pigeonhole principle argument), and replace by which can be seen to be a linear combination of functions of the form

The main ingredient that improves the bound from to is instead to choose rather carefully some coefficients and to replace by For each the earlier proof took to be 1 for precisely one and 0 otherwise. What we do instead is approximate this in such a way that for each with the function is "sufficiently smooth".

The next lemma, which I shall state without proof (the proof is straightforward), gives a good idea of what we mean by "sufficiently smooth". Given a function we define its Fourier transform by the formula

**Lemma 5.** Let and define a function by whenever Then is non-negative and real for every and

From this and easy Fourier manipulations, we know that the function also has these two properties: that is, for every and

We now take to be an integer approximately equal to and we define to be (the dependence on is via the definition of the function ).

We note here that for a typical there will be values of such that and that the usual pigeonhole argument gives us at least one. (There are some for which there are more values of with small, but this does not matter.) On the other hand, since is nothing other than we know that for each fixed the function has non-negative Fourier coefficients that sum to 1. These properties are what are needed for the rest of the calculations to work.

We then take the matrix

Since for each there is at least one with the remark following Lemmas 3 and 4 tells us that this matrix has the form with each at least 1/8. It follows that is positive semidefinite, since it equals

It remains to bound from above the sum of the absolute values of the coefficients of the various products of progressions used in the above decomposition. I won’t reproduce those here, but the Fourier estimates for the functions turn out to come in and allow us to calculate the sum exactly. It equals Since is proportional to this is proportional to a gain of a factor proportional to on the trace of It follows that the discrepancy bound we get is proportional to When this gives us

**From APs to HAPs.**

Is there any possibility of adapting the above proof to give a proof of EDP? I think there is, and in this final section I shall try to explain why.

To begin with, let us set to be the diagonal matrix with down the diagonal. (By I mean the function discussed in the previous post and set of comments, defined by the properties and ) Now let us suppose that we could find an efficient HAP-decomposition of a matrix of the form

with every The inner matrix is of the form with positive semidefinite, so this would give us an efficient decomposition of where is also positive semidefinite. Now the matrix we would like to decompose can also be written as

It is therefore tempting to look for a proof that is similar to the proof of Roth’s discrepancy theorem above: we try to decompose each function (the value of which at is ) in an efficient and unified way as a linear combination of characteristic functions of *homogeneous* arithmetic progressions rather than just arithmetic progressions. And then we hope that there will be plenty of cancellation in the coefficients.

Now the decomposition of into APs very definitely used non-homogeneous ones. So why is there any hope of decomposing into HAPs? Well, the kind of thing one might try is this. Choose a small and for each choose some such that Let be (as before) the interval and let be (as before) the characteristic measure of Then which tells us that can be written as a (very nice) linear combination of translates of The trouble is that most of these translates are non-homogeneous.

However, if we choose to grow to infinity so slowly that it can be thought of as a constant (this is a standard trick — treat as a large constant, and as long as for sufficiently large we can get a discrepancy lower bound that tends to infinity with we are done), then the fraction of translates of that are non-homogeneous is “only” Furthermore, and much more importantly, is on average very small on the complement of the HAP So perhaps we can afford to ignore the decomposition outside this HAP.

How would we decompose the part *inside* the HAP? Well, let’s suppose we were trying to decompose rather than (We could do this, and the result would be a decomposition of a matrix with positive semidefinite. I don’t know whether changing the coefficients from to would be a problem.) Since is naturally expressed as a Dirichlet convolution (except at 1) we have a natural decomposition of into HAPs. I think we should also have a natural decomposition into HAPs of the restriction of to So if we use that, and then take the coefficients from the Roth argument (or perhaps something else tailored to the new situation), then … just perhaps … we will have an efficient decomposition that gives a good approximation to And if we want more than just a good approximation, then we could decompose the part of that lies outside into functions supported on singletons. This would be inefficient, but (after cancellation of the contributions from different ) the total weight of the inefficiency should be small.

July 4, 2010 at 12:53 pm |

How can we decompose efficiently into HAPs? Let be the characteristic function of . One idea is to fix , and try to decompose as . The coefficients will be given by the Dirichlet convolution of with the Dirichlet inverse of the characteristic function of . If we want the growth of to be essentially slower than that of , then in terms of the Dirichlet series, I think there would have to be a

zeroof lying in the strip (where ).This disposes of the case : we know that the Riemann zeta function has no zeros to the right of 1. Of course is no good (); and is also eliminated (since the only zeros of are pure-imaginary). Is anything known in general about the zeros of ? I suspect the decomposition will have to be cleverer than this…

July 4, 2010 at 3:36 pm |

As a sidenote. Even if we cannot exploit the above to prove EDP, we might be able to generalize Roth’s result. The intuition is as follows. Consider and read it (with some hand-waving) as an equation with the Laplace transform of a translation group to the left and an eigenvector of the generator of this group to the right. In the situation of EDP we have no group, no generator, no eigenvectors and hence the trouble with finding a suitable orthonormal basis and thus Tim’s approximative approach with some f. In the AP situation we have a translation group (which I now let act on an infinite dimensional space with not more than intuition as justification), a first derivative as generator and exponentials as eigenvectors (together with estimates on the eigenvalues) and get Roth’s result. In the non-AP case, with (from the draft) being a set of subsets with some (my hands wave through the air) symmetry group which can be represented as acting on some Hilbert space giving us a basis and eigenvalues to work with. To make a long and shaky story short, the representation of diagonal matrices argument can probably be made independent of the translation group/exponentials as basis-part and go through for other group/eigenvectors of the generator couples thus giving us a chance for discrepancy results for more general .

July 4, 2010 at 4:59 pm

I think at some point Moses suggested a related project, which was to go through various other known discrepancy results that have been proved using the SDP method and see whether it is possible to find a representation-of-diagonals approach too. I don’t know enough about the area to know what result fall into this category, but I think it could be quite interesting, and perhaps a good way of beefing up the Roth note if we can’t get EDP.

July 4, 2010 at 5:27 pm |

I think I have some slightly more precise thoughts about how to decompose the function into a linear combination of HAPs.

First, let me look at the simple case where Here, we have the formula If (as we may) we define a HAP to be any arithmetic progression of the form then this gives us a decomposition of into a linear combination where is the HAP

For general this gives us a decomposition into functions of the form Suppose now that we have found such that This gives us a splitting of into APs of length about and common difference on each of which the function is roughly constant. However, what we would really like is to decompose functions of the form and make

thoseroughly constant.On the face of it, that looks possible to me. For each such function (which is supported on a HAP of common difference ) we find such that Then we use that to split up the HAP into HAPs of length roughly and common difference on which is roughly constant.

That defines a decomposition (though instead of choosing one for each and one would choose coefficients, as in the Roth proof). Of course, we are by no means finished, since most of the APs used in the decomposition are not HAPs. The hope is that the non-homogeneous APs make a small-weight contribution to the whole. The next step is perhaps to be a bit more precise about what exactly is needed. In the best of all possible worlds, we would end up reducing EDP to a technical question about properties of the function but that is being quite optimistic at this stage.

July 4, 2010 at 9:27 pm

That sounds reasonable. I think you mean and ?

Incidentally, I’ve just found an interesting paper by Borwein, Fee, Ferguson and van der Waall discussing the zeros of the truncated zeta function . It turns out to be a result of Montgomery that for any and for all sufficiently large , has zeros to the right of . Moreover, they are always to the left of . This implies that we

canwrite , with , for some fixed HAP-length and . Whether this is useful or not I don’t know.July 4, 2010 at 9:39 pm

Thanks for pointing out the typos, which I have now corrected. I’m still not quite sure what the genuine difficulties will turn out to be. What I’m expecting is that if we pursue this basic idea by just continually guessing what to do next, then eventually we’ll reach something that doesn’t work, at which point we’ll want to try to adjust it to deal with whatever the difficulties are. Then the more we know about how behaves, including the kinds of things you are talking about, the better. I’m about to try to do some calculations offline to see if I can reach a clearer understanding of this kind.

July 4, 2010 at 6:45 pm |

[…] Let me update this there is now a new thread for Polymath5 here. […]

July 4, 2010 at 9:27 pm |

Polymath4 is using Subversion to in writing up results. I wonder if it could be used here. As I recall from the last thread there was a problem with Scribtex in that there was a limit with the number of collaborators per project. To avoid this problem perhaps Subersion could be used instead.

July 4, 2010 at 9:41 pm

I think Subversion is similar in that it is somewhat limited unless you pay for a better version. I don’t mind doing that (as I will probably need it for other purposes), but if someone knows somewhere where one can download a Subversion client (if that’s the right word) that’s free, good, and Mac-compatible, then that would be great. Failing that, good and Mac-compatible. Oh, and downloadable by an idiot would be a nice bonus too.

July 4, 2010 at 10:11 pm |

Versions (http://versionsapp.com) is a good Subversion client for the Mac, though it is not free.

July 4, 2010 at 11:26 pm |

I’ve had a first attempt to pursue the suggestion in this comment and it got a bit complicated, so I’m giving up for the day.

Basically, it feels at the moment as though I’m hoping for a miracle. If you just take a function and decompose it into APs, then, as I mentioned before, you have the problem that most of those APs are HAPs. I then waved my hands and said that I hoped that the non-homogeneous parts would make a small-weight contribution. But they definitely don’t do so if you fix and sum over since then the calculation is essentially the Roth calculation, and we know that getting the identity matrix is hopeless. So if there is to be significant cancellation, it will have to be inter- cancellation, so to speak. What I’m hoping is that when you add everything up with the coefficients some reason will emerge for there being far more cancellation in the coefficients of the non-homogeneous parts than there is in the coefficients of the homogeneous parts, so much so that we can break up the non-homogeneous parts in a trivial way. But this would be the miracle: I don’t have a serious reason to expect it actually to happen.

Anyhow, when I’ve got the stomach for it, I’ll get a nice big piece of paper and write everything out very carefully and see what happens, in particular keeping an eagle eye out for any hint that the coefficients of non-homogeneous progressions might cancel out nicely.

July 5, 2010 at 10:00 am |

I still haven’t got very far, but one quite interesting question is rather persistently raising itself, so I thought I’d mention it.

If we are trying to decompose the function into a linear combination of HAPs (up to some good approximation), then the obvious thing to try is to decompose into HAPs with common differences such that is small. If we can do that, then we can chop each of those into pieces on which the function is roughly constant and then multiply each piece by its roughly constant value.

Is there any reason to suppose that this can be done? Well, there is a small piece of evidence in its favour, which is that it can be done if is a rational with small denominator. For instance, suppose that Then we can take the obvious decomposition of (mentioned in this comment) and split each HAP that we use into five parts if its common difference is not a multiple of 5, discarding the four non-homogeneous ones. The result will be a decomposition of the restriction of to multiples of 5, but that is almost all of so we’re OK.

If is strongly irrational, this argument breaks down completely, because we do not have long arithmetic progressions with small common differences on which is constant, or even approximately constant. The effect of this is that when we decompose a HAP into pieces on which the function is roughly constant, they don’t line up nicely in the way they did in the rational case (when we just restricted to multiples of the denominator).

So here is a rather concrete question. Suppose we take a smallish and for each we choose such that and is small. (Or perhaps better, suppose we define some nice coefficients in a similar way to the Roth proof.) Now suppose we replace each in the decomposition of by its restriction to multiples of (which will give us all multiples of unless in which case we miss out ). Is the resulting sum a good approximation to ?

The trouble is, the minute I ask that question the answer no seems to come back loud and clear. It looks as though we are removing a lot of stuff from each and if we are, then we cannot possibly hope to get almost all of

But why doesn’t that argument apply to the rational case? The difference there is that if for some small then -almost every is a multiple of so -almost every is restricted to itself, so -almost nothing is thrown away.

This is beginning to suggest to me that we don’t actually want to start with the trigonometric functions in this case. Recall that for the method to work, we can start with any orthonormal basis we like and the decomposition The trigonometric function seems to give us trouble (though I don’t know for sure that it’s real trouble) when is not

veryclose to a rational with small denominator. Perhaps we could start with an orthonormal basis that is tailored more to the symmetries of this problem, such as dilations by small primes and their reciprocals. (Indeed, perhaps we should go back to the rationals and an infinitary approach, or at least get some inspiration from that direction.)July 5, 2010 at 11:54 pm

Continuing this thought a little, I realize that I could express it slightly differently. What I observed in the comment just above is that if is a rational with small denominator, then the function is very close indeed to itself. (This follows from the fact that they are equal on the HAP defined by the denominator of and that almost all of is supported on this HAP.)

This seems to me to confirm the view that starting with the trigonometric decomposition of the identity and multiplying it pointwise by is not the way to go: all the functions you get are either almost equal to in which case it’s hard to see where any cancellation is going to come from, or hard to decompose into HAPs.

Hmm, writing that has given me another idea. (Funny how that can happen — a sort of scepticism about scepticism.) I haven’t time to pursue it right now, so let me ask it instead as a question. Let be a real number. We know that we can cover with APs on which the function is almost constant. We also know that we can’t cover it with HAPs with that property (for instance, if is prime then there are only two possible common differences, and it may well be that neither of them works). But what if is -typical. Then there will be many possible common differences of HAPs containing so it is rather less obvious that we couldn’t find a common difference such that is small. It’s by no means obvious that we could put those HAPs together in a nice way to approximate the function but it seems worth thinking about.

I’m not quite sure what the question is there: basically it’s to see whether any of those speculations can be justified. A good starting point is the assertion that an -typical is divisible by some for which is small — I think that could be a useful lemma if it’s true.

July 6, 2010 at 4:35 am

You can get a weak version about your assertion that an -typical is divisible by some for which is small from the (handwavy) arguments we had earlier about the prime factorization of -typical . But this seems almost trivial and the conclusion is quite weak (i.e. is only bounded by something like , so I doubt that this is what you are looking for.

We argued earlier that the exponent of prime in the factorization of -typical should be at least . (Here ). In particular, this means that an -typical is divisible by all primes , and hence by all . Of these, there must be some such that .

July 6, 2010 at 7:20 am

One more thought about your question. I don’t think it is possible to achieve smaller than something like if you only allow that divide an -typical .

Let’s say we allow any that contain only small primes in its

factorization, i.e. . Consider . Then .

July 6, 2010 at 10:18 am

I can’t quite decide whether a bound of “only” is too weak or whether any bound that tends to zero is good enough. Let’s be optimistic and imagine that the latter is the case. What would happen here? We’d be trying to approximate the function as a linear combination of HAPs. We have a decomposition of into HAPs. One thing we could do if we were feeling daring is simply get rid of all for which is not small and split the remaining into (not very long, but with lengths tending to infinity) chunks on which is roughly constant.

Let be the set of all such that is small. Is there any hope that the function is approximately proportional to ? Its value at is So the question seems to be asking this: for an -typical will we always get roughly the same -proportion of factors belonging to ? I don’t see how to prove such a statement, but it doesn’t look completely ridiculous either (though it does look as though it might be one small observation away from ridiculousness).

July 6, 2010 at 1:04 pm

Let me try to explain why I think that the statement discussed at the end of the previous comment might not be ridiculous.

I want to argue that it might be the case that if is a real number, and is the set of all such that is small (I don’t want to be precise at this stage about what “small” means), then there is some constant such that for -almost every we have

My rough heuristic justification for this statement is as follows. For simplicity I’ll discuss the case where is a badly approximable real number. The hypothesis I have in mind is that if we choose a random factor of a typical where each factor of is chosen with probability proportional to then the prime factorization of will have a “typical profile”. (I’m talking here about measure concentration. A different example might be that if you choose a random partition of into three sets and such that then almost always you would have and Proof: for each you would almost always have and of about the same size, and that then forces the rest to hold.) And it would usually be the case in these situations that a small change to a typical element produces another typical element. So for instance, mutliplying a typical factor of by 2 ought to give us another typical factor of But this means that multiplying by a small integer gives us a map that is approximately an -measure-preserving bijection from the set of factors of to itself. Now if is badly approximable, then for all but a few factors we would expect that the proportion of small multiples of such that is about the expected (Occasionally we would hit a such that is very close to an integer, and then this would not be true.) Therefore, it might be that we can approximately uniformly cover the set of factors of (with the appropriate measure) using HAPs on which is approximately constant.

One other thing to say is that if it turns out to be very hard to obtain a good enough understanding of to get something like the above to work, there would always be the option of simply concocting an that has the properties we want. For example, perhaps we could define a fairly simple-minded function such as taking some to the power the number of prime factors of and arguing that a typical factor of a typical has a typical profile. Or we could even choose primes to go into the factorization according to some nice distribution like the Poisson distribution (which seems to show up) and make everything independent so that it’s easier to analyse. That might allow a few numbers bigger than but with sufficiently small total weight for it not to be a problem.

I’d like to make very clear why I’m interested in this. The point is that if we restrict to a set of smooth numbers — or a measure that’s concentrated on smooth numbers — then there is much more chance of building things using HAPs, and therefore much more prospect of imitating the Roth proof. Somehow, it makes us work in an area where the HAP-constraint really bites.

July 6, 2010 at 9:38 am |

I’ve got another question that should be reasonably easy to sort out. I’m wondering whether we could get away with a system of weights that’s a lot simpler than the function that we’ve been discussing. We sort of know that any system of weights that has a chance of working needs to be concentrated on highly smooth numbers. Up to now, I’ve been imagining that that property has to be “hereditary”, in the sense that if you restrict to any HAP, then the restriction of the weights to that HAP would need to be concentrated on smooth multiples of the common difference. But I now see that that is not the case. For instance, let be a large prime (for the sake of example one could take it to be around say). Is it a problem if the weight of the multiples of that are not multiples of 3 is comparable to the weight of the multiples of ? I think not. The example that would show that it was a problem would be the function that takes the value 1 and -1 at numbers of the form and and 0 everywhere else. This sequence has discrepancy 1, but its weight is tiny, because most of the weight is on non-multiples of

Loosely speaking, if our system of weights is given by the function we need to worry only about sequences that are “spread out relative to “. This point has probably been obvious to many people, but somehow it had eluded me.

The kind of thing that might allow us to do is simply define a measure of smoothness in some fairly simple-minded way and use just that. It might even be something as simple as insisting on a certain number of prime factors up to multiplicity. That may well be too simple, but let me ask the question anyway. Let be some parameter that tends to infinity slowly, and let be the set of all numbers up to with at least primes in their prime factorization. Does there exist a sequence of HAP-discrepancy at most 1 such that for some absolute constant ? In looser terms, can we find a sequence of bounded discrepancy that has mean square value at least 1 on the smooth numbers?

A small remark about that: a sequence of bounded discrepancy must in particular be bounded. So we are asking for the mean square value on to be within a constant of the maximum possible, which is equivalent to asking for the mean absolute value on to be within a constant of the maximum possible.

So here is another formulation. Does there exist a sequence of bounded discrepancy such that ? If the answer is no, then EDP follows (unless I’ve made a mistake in the above reasoning, which is possible). So really I’m asking this as a way of testing whether using the characteristic function of as a system of weights is a reasonable thing to try.

Another remark is that would have to be bigger than since otherwise we know that almost all numbers have prime factors, even if we insist that they are not multiples of 3, and the usual counterexample works. But that is not a problem: nobody would want to call a number around smooth if it had only prime factors, given that that is typical behaviour.

July 6, 2010 at 4:27 pm

What is the measure of the set of non-multiples of 3 in the set (defined for some suitable ) ?

July 6, 2010 at 4:58 pm

Good question. Here’s a heuristic argument that it might be o(1). Suppose, as seems plausible, that a typical k-smooth integer less than n is divisible by Then disallowing multiples of 3 places quite a big restriction on the integer. To put it slightly differently, if you believe the (to me plausible) assertion that there will be a certain degree of measure concentration in the powers of small primes that divide a typical k-smooth integer, then the probability that the power of 3 is zero should be o(1). I need to think a bit about whether that measure-concentration assertion is a reasonable one though.

Sorry, I realize that some of what I wrote above is a bit daft: if a typical k-smooth integer is divisible by then it is divisible by 3. So let me try to say it differently. If you want to build up a smooth number without using 3, then you’re forced to use either powers of 2 or powers of primes larger than 3. The latter cause your number to get bigger faster, and the former give you less flexibility than you would have had if you were allowed to use 3 as well.

I’m trying to get something useful out of thinking about the simplex

and for every We are interested in integer points in this simplex with coordinates adding up to at least Can we say what a typical such point should look like? And does restricting to the plane cut down the number of points substantially?

July 7, 2010 at 10:26 am

It now seems to me that measure concentration is not really what’s going on. The plane ought to be the biggest of all the planes but small simply because we have lost a dimension.

July 7, 2010 at 10:36 am

Your heuristic argument sounds believable. I was trying to think about your simplex argument. I think one can show by induction that the number of points in the simplex you define (without the restriction) is something like . However I don’t have a good way to estimate the number of points with the lower bound on the sum of coordinates.

July 7, 2010 at 7:31 am |

I just wanted to say that the renewed efforts and the idea to go from AP insights to HAP insights look very exciting to me.

July 7, 2010 at 9:14 am |

Going back to the question of what conditions on make it suitable for proving EDP: obviously, a

necessarycondition is that for every sequence of bounded discrepancy, is concentrated on .It seems to me that this is also a

sufficientcondition. Here’s a rough argument. Let be the set of sequences with discrepancy less than or equal to 1. Now the maximum value of for will be attained at an extreme point of . But I think that the extreme points of are all -sequences. (I need to check this, but I think it can be shown by induction on .) Therefore it is sufficient to consider these. The step from finite to infinite sequences also needs checking.If this is correct, it suggests that it would be worth looking for classes of sequences of bounded discrepancy to see what they can tell us about (and whether a proof along these lines is possible). Do we have some nice examples of these, in addition to the characters that we have already considered?

July 7, 2010 at 10:22 am

That’s a very nice observation, one that makes me less worried about just guessing a function , since it suggests that the problem is not

toosensitive to the choice of In that connection, it would be interesting to see whether it can be extended. For example, does the condition characterize sequences for which the SDP proof can be made to work (in the sense that there is a quadratic form built out of HAPs with coefficients not too large such that subtracting off the -diagonal matrix results in something positive definite)? Or the representation-of-diagonal proof?Largely for my own benefit, let me see if I can spell out all the details of your argument. First, it is trivial that the set of sequences of discrepancy at most 1 is convex. Secondly, they trivially take values in (since we can look at singleton HAPs).

The less trivial claim is that the extreme points take values in You suggest proving this by induction. So let be minimal such that there is an extreme point with a value that does not belong to Let’s consider first the case where is also an extreme point. In that case it takes values in which implies that the sum up to but not including along any HAP that contains is 1, 0 or -1. If there are sums of 1 and -1 then we are forced to take and otherwise we can (and therefore, by extremeness, must) take So in this case we are done.

Now suppose that the sequence is not extreme. It occurs to me that the argument above allows us to conclude that unless there is a HAP that contains and also some such that

What is the precise constraint on once we know the values of ? Well, we take each HAP that includes and sum along it. This gives us a bunch of numbers one for each factor of We then know that and So is confined to some interval that contains zero. If our sequence is to be extreme, then will have to equal either or . WLOG it equals If the sequence up to is not extreme, then let it be a non-trivial convex combination of two other sequences and of discrepancy 1. Then they …

I’m getting a bit stuck. Let me ask a general question at this point. Does the proof use the fact that we’re talking about HAPs, or does it apply to the discrepancy relative to any system of sets that contains all singletons?

Having asked that second question, let me try to tackle it by a different induction. I’ll start with the cube (that is, taking all singletons) and add sets one at a time. And I’ll attempt to argue that all the new extreme points I create have coordinates in

Just as a warm-up, let’s think about the very first set. That is, I want to know what the extreme points are of the set of all such that and . Without loss of generality, Clearly if is extreme, then for every Also, if there is exactly one such that is not an integer, then the A-constraint is not biting, so we do not have an extreme point. Finally, if and are both non-integral, then we can replace them by and for any small (either positive or negative) so we don’t have an extreme point.

As a matter of interest, what are the new extreme points? Well, they must be defined by the intersection of one of the hyperplanes with of the coordinate hyperplanes So we choose values of the sequence to be and let the final value be determined by the sum in If has even cardinality that does not add any new extreme points, but if has odd cardinality then we get points that are everywhere except in one place in where they are zero (and of course, the sum in has to be ).

Now let’s suppose I’ve already got a set system Actually, I think I have a counterexample, at least at this level of generality. Let’s consider the set system consisting of all subsets of apart from itself. Then the point is an extreme point.

So this shows that the proof must use in some important way the fact that the sets are HAPs, whereas my previous attempt wasn’t engaging with that condition in a serious way. So I am now stuck properly.

July 7, 2010 at 11:24 am

You’re right, it’s not as clear as I thought. I’ll have a think about what the extreme points for the HAP system actually are…

July 8, 2010 at 7:31 am

Simple observation: you can implement Tim’s counterexample in terms of HAPs:

Pick primes and consider the numbers . Two of them appear in the HAP for , two in the HAP for and two in the HAP for . Assign to each of these three numbers and zero to everyone else.

July 8, 2010 at 8:21 am

That doesn’t yet contradict Alex’s assertion since you’ve not included all HAPs — in particular you haven’t included the HAP with common difference 1 — and as a result the sequence has discrepancy 3/2. But my guess is that it should be possible to find a counterexample with all HAPs included.

July 8, 2010 at 1:09 pm

Ah … good point. So a HAP counterexample must use values of different signs, else the sum of absolute values would be at most 1 (because of the HAP of common difference 1) and it would be easy to express this as a convex combination.

July 8, 2010 at 1:41 pm

Maybe there’s some easy fix actually. Suppose we set the sequence equal to 1/2 at pq, qr and pr, where p, q and r are three primes as in your example, and then we stick a value of -1 in between qr and pr at a number s that’s coprime to p, q and r. I’m assuming here that the order of the four numbers is pq, qr, s, pr. I haven’t checked carefully, but I think this may work.

July 8, 2010 at 9:03 pm

I think I have a counterexample: I claim that the sequence is an extreme point of .

Suppose not. Then we can find such that . By considering the singleton HAPs , and , we immediately deduce .

Then the HAP tells us that ; the HAP tells us that ; and the HAP tells us that . It follows that . So , a contradiction.

July 8, 2010 at 9:55 pm

I realize after thinking about that that the approach in my earlier comment (following Moses’s yet earlier one) doesn’t quite work. The smallest example you get from the method I was suggesting is

But there’s no reason to suppose that you can’t mess around with the zeros. Indeed, in this case you certainly can, but even if one makes the whole sequence add up to 1, it’s still hard to stop it being possible to add a small c to one zero and subtract c from another. So it’s nice that you’ve come up with a genuine example.

July 9, 2010 at 8:19 am

Yes, in general to show that is an extreme point, we need to exhibit

linearly independentHAPs on which sums to .This still leaves open the question whether we can form such a sequence ‘uniformly’: that is, whether there is an infinite sequence with elements not all in whose restriction to each interval is an extreme point of . I’m pretty sure it must be possible.

July 7, 2010 at 12:31 pm |

This will be another rather speculative post. It seems to me that a property that could be very useful to have if we want a measure on to work for us is that it should be approximately invariant under multiplication by small numbers. That is, if we call the measure then should be approximately equal to for all small This is certainly achievable: for example, one could pick an integer and multiplicatively convolve the characteristic measure of with itself a few times. This measure would be concentrated on smooth numbers and would have a useful approximate invariance property.

The nice thing about the approximate invariance is that it gives us plenty of HAPs to play with. For example, writing for the characteristic measure of I think we have that (where that is supposed to be a multiplicative convolution) in a strong enough sense that we can say that is roughly constant along HAPs of length More concretely, if is large, then I think it approximately equals whenever which gives us HAPs containing along each of which is roughly constant.

Now let’s suppose we want to approximate the function on One way we might try to do it is this. Out of the HAPs of length that contains we pick those with common difference such that is small. And those we approximate by a suitable linear combination of chunks.

Hmm … I have to go now, but may come back to this. I’ve only just thought of the invariance idea, so its apparent promisingness may be illusory.

July 7, 2010 at 2:09 pm

A brief further thought is that the invariance property is easy to achieve and doesn’t really need multiplicative convolutions. One can simply define a non-trivial and non-negative Lipschitz function on the simplex mentioned earlier — that is, the set of all points such that each and

with the distance between and being — that vanishes on the boundary, and let By “Lipschitz” I mean Lipschitz with constant o(1) or something like that.

July 7, 2010 at 2:52 pm

Now let’s suppose we have a measure that is approximately invariant in this sense. I would like to -approximate the function by a linear combination of HAPs. Let’s go back to the idea from the main comment above. How many HAPs containing do I expect to have common difference with ? The possible common differences are for So I am asking for how many we have small. (Note that if is -typical, then is an integer for every .)

Let me make a big assumption here, namely that the set is nicely quasirandom, in the sense that for -almost all dilates (by integers divisible by all of 1,2,…,m), the set is uniformly spread around the circle, or something along those lines. That would tell us that for almost all the number of HAPs containing along which varied slowly would be (for almost every ?) approximately the expected number, which would tell us that the function was -approximately proportional to which would approximately express (a multiple of) as a linear combination of HAPs.

There are quite a lot of potentially dodgy steps there, but if it could be got to work, then we might have a hope of showing that there was cancellation between the coefficients resulting from different

I’m not sure how much I believe this hazy sketch, but I think a further feasibility study might be called for.

July 7, 2010 at 3:34 pm

Another way of looking at the function discussed above is as follows. For each (which is playing the role of above) we look at and if it is small then we add in the HAP weighted by Or more likely we take some nice function that’s concentrated on the set of such that is small and has nice Fourier behaviour for each fixed (to get good cancellation when we look at all s at once). Then we could consider the function We would be hoping that was roughly proportional to

I’m aware that I’m probably becoming less and less comprehensible, with more and more statements whose motivations are likely to be difficult to grasp since I’m not really explaining them. At some point I might try to do some proper calculations to see whether I can be more precise and formal.

July 10, 2010 at 10:25 am |

Let me elaborate on my last comment and try to free the ROI approach to Roth’s theorem of the numbers. The goal is to get a ‘qualitative’ (infinitary) discrepancy result which hopefully is easier to adapt to EDP. The program looks as follows:

1) Choose an appropriate Hilbert space

2) Choose a group of linear operators acting on this space

3) Get a representation of identity by eigenvectors of these operators

4) Make sense of with the omegas as eigenvectors, pi as characteristic sets, lambdas as eigenvalues and convergence of the involved sums.

5) Solve the above for the eigenvector and plug it into 3)

6) Get some condition in the eigenvalues

7) Derive a qualitative discrepancy result on the sets used in 4)

Maybe I am too optimistic, but that seems to be doable. Since APs with common difference d are invariant under translations we get a periodic symmetry group and it is natural to consider: 1) and 2) the rotation group on it. We then get 3) a representation of the group action in terms of rank one operators at least on a dense subspace. Since identity is part of the group we are on the track.

How would these first three steps look like in the EDP situation? ROI is ‘essentially’ equivalent to a periodic symmetry. We do not have that. Hence we just try to get ROD or even less. Maybe not with trigonometric function, since they are tied to periodic symmetries. ROD ‘means’ that there is no group, which is reasonable. However, each step (but the estimates) in the ROI approach heavily uses groups. That would indicate that the hoped for transfer from AP to HAP will be more upon the estimates than the ‘abstract’ content.

July 10, 2010 at 1:54 pm

This definitely seems worth thinking about. I’d like to make one small remark, which is that if we work over the rationals then there

aresome important symmetries of the set of HAPs — we can dilate them. Having said that, I’m not sure how important it is, given that the eigenvectors corresponding to the group of dilations are likely to be more like GPs than APs and therefore not obviously decomposable in nice ways.July 12, 2010 at 8:52 am

It looks as if we had to work on an ultra power (along a non-principal ultra filter). Such constructions usually are space extensions turning approximate point spectrum of our operators into point spectrum. That means, we get eigenvectors. Then, translations, eigenvectors and ROI might yield a periodic group action on this larger space. Vague, but indeed a small hint that we should reconsider the problem over the rationals. This time with ROI in mind.

July 10, 2010 at 2:17 pm |

I have been away from this problem for a while now, going to conferences and other things, so I am still catching up with the recent developments.

While traveling I did try one thing which have mentioned here before, namely taking a look at the longest discrepancy 2 examples, truncated to the first 1024 values, in the Haar wavelet basis. I didn’t see any obvious structure, but as expected the fourier coefficients are nicely balanced. 536 of them being 0, 268 being 2 and 268 being -2.

One of the vague reasons for my interest in this basis is that these wavelets behave nicely under scalings, thereby giving a nice connection between HAPs with large and small differences with a common divisor.

Regarding decomposition of ON-bases in terms of APs, one might note that the Haar wavelets are very simple to decompose exactly using APs. This can be done for any difference d using 3k/d APs for a basis element of period 2k. So a nice proof of Roth should work in this basis as well.

Now back to reading.

July 13, 2010 at 11:07 pm |

I want to continue the discussion begun in this comment of Uwe’s, but am starting a new comment so that I have a full column’s width to work with.

The main thing I wanted to say was that I’ve become quite interested in the idea of trying to apply a representation-of-diagonals approach to the problem over the positive rationals. The main reason for that is that I think that there is some hope that we might be able to take the diagonal matrix to be the identity in this case, and if so then the problem ought to be a whole lot easier. Of course, we introduce a new difficulty, which is to make sense of various concepts over the rationals — in particular measures, probabilities, averages, etc. But there are plenty of techniques one can try to use to get round the problems that arise.

Why might the identity be OK? Well, the example that shows that it isn’t OK over the natural numbers is our old favourite 1 -1 0 1 -1 0 … and it doesn’t seem to extend to the rationals. Let me try to give a better justification for that than merely saying that I don’t see how to extend it. What we would want is a simple function that is large on a positive proportion of the rationals (whatever that means). That statement should be scale invariant … hmm … let me try again. The thing that makes the proof work is that every HAP intersects the mutliples of 3 in a set of relative density at least 1/3. Can we hope to find a set of rationals that intersects every HAP densely (with a uniform lower bound on the density)? We certainly can’t do it just by making the function vanish on one HAP.

I’m not getting very far with that, so I’ll just leave it dangling: if someone can think of a convincing example of a function defined on the rationals that is large on a “positive proportion” of rationals in some sense, and has bounded discrepancy, then I’ll stop feeling optimistic about this idea. And if someone can formulate this question in a nice way then I’ll be quite grateful.

Anyhow, back to the rationals. The general strategy would be to work out what the characters are on the rationals (I think we’d want all rationals and not just the positive ones, so we can talk about the additive group, but that’s OK) and then to try to decompose those into HAPs.

Let’s just think what a homomorphism from the additive group of to would be. An obvious example is to pick a real number and map to . Are there any others? I think there are (and this will either be well-known stuff or wrong). Let me try to explain why.

Suppose is a character such that It follows that for every integer But this does not determine For instance, could be So far that is just taking But what if we went further and ensured that for every positive integer Does that imply that is identically 1? It certainly implies that is 1 on the dyadic rationals. But what if we look at the subgroup generated by the dyadic rationals and 1/3? Then we seem to have the possibility of sending to where is any dyadic rational and is a positive integer. Earlier I thought I had a complete description of the characters, but I didn’t check it and now I’m not sure it was correct, so I won’t risk a botch job here.

What I will say is that I think it is worth sorting this question out and then seeing whether there are nice ways of making characters out of HAPs. The character that takes to is I think splittable up into HAPs. One simply picks a quickly growing sequence of positive integers such that is small and each integer in the sequence divides the next. Then the function varies very slowly as you go through mutliples of and then between each of those you can get all the multiples of that you haven’t yet included (in a long HAP, since is so much bigger than ), and so on. I forgot to add that every positive integer should divide some so that all rationals are eventually included.

One other reason for all this is that the talk of a measure on smooth numbers that is almost invariant under multiplying by small integers felt like an attempt to give a finite approximation of a much tidier world with rationals where the counting measure that is genuinely invariant.

July 14, 2010 at 11:03 am

I’ve now sorted out to my own satisfaction what the characters on are. Whether the description to follow is the nicest one I don’t know, but it works.

Observe that a character on is determined by what it does to for every Given any character we can define a sequence of real numbers such that and

Since is a character, it follows that for every integer Therefore, the numbers must satisfy the compatibility condition that and differ by an integer (so that ). In other words, can be any non-negative real number less than that differs from by a multiple of .

Conversely, if we are given a sequence of real numbers satisfying this condition, then we can define to be for any such that . Equivalently, we can define it to be the limit of as tends to infinity, the compatibility condition ensuring that the sequence will be constant once .

The characters of the form are the particular case where the sequence is itself eventually constant.

One nice aspect of this is that we can define a character by choosing an element of followed by an infinite sequence of finite choices. It is therefore possible to put a product measure on the dual of and make sense of expressions such as Therefore, we have a representation of the identity using characters. (Note that for each pair of rationals and we need only worry about an initial segment of the infinitely many choices we have to make in order to determine a character, since after that the choices will have no effect on or . So the fact that we are talking about an infinite product measure is particularly simple to deal with.)

The next thing to think about is whether there is a nice canonical way of decomposing characters into HAPs.

July 14, 2010 at 12:22 pm

I have a simple lemma that may help with this last task, but may not be strong enough. The lemma says that if is a character on then for every there exists a positive integer such that

The proof is simple. Let be an integer greater than let and let where as usual means Now choose a positive integer such that the distance from to the nearest integer is at most It follows that Since is an integer, we are done.

The bound that comes out of that argument is pretty bad: turning things round we get that can be made to be around Does anyone see an improvement to this? I quite like this question and I think I’ll go and ask it at Mathoverflow.

Update: I’ve asked it here.

July 14, 2010 at 3:45 pm

A byproduct of posting the question on MO is that someone has pointed out that the dual group of is normally defined in terms of adeles. (My knowledge of what an adele is is almost nonexistent, but there was just enough of an impression of the kind of thing it was for this not to come as a surprise.)

I’m about to see whether the lemma above gives a decomposition of characters into HAPs that could be used to to give a ROI proof of EDP, modelled on the ROD proof of Roth’s theorem.

July 16, 2010 at 7:47 am

It seems to me that the question considered in the lemma you prove is very similar to the question you raised earlier here and the bounds are in the same ballpark too. There, we reasoned that -typical integers less than are divisible by all integers upto about and that gave an approximation with error about .

July 16, 2010 at 10:28 am

I agree that the question is very similar — indeed, I am trying to model the smooth-number situation in an infinitary way by looking at the rationals instead. Whether the bounds are in the same ballpark is less obvious to me, because what the bounds are supposed to depend on is different.

Just to explain this, let me recall your observation. The earlier question was how large a number you would have to take if you wanted it to have the property that for every there was a factor of such that You pointed out that if for some prime that does not divide then we cannot do better than which roughly translates into having to be exponential in or equivalently cannot be smaller than

In the rationals, however, we are interested in the dependence of not on (since in the additive group of the rationals numbers don’t have different “sizes”) but rather on In fact, though, we phrase the question slightly differently: we have a character and want to know how big has to be before there is guaranteed to be such that The proof of the exponential bound is essentially the same as before, but the example that showed that it was best possible no longer works, since all primes divide 1 in the rationals.

July 14, 2010 at 4:07 pm |

Once again a new comment, but following on from the previous block of comments.

Let be a character on I want to show that can be decomposed efficiently as a linear combination of HAPs, using the lemma above, which for convenience I will restate.

Lemma.For every there exists a positive integer such thatThe first step towards decomposing is to choose a large integer and to find an integer such that This means that varies very slowly as you run through the multiples of To be a bit more precise, we can split up the multiples of into HAPs of length on each of which varies by at most or so. (I’m defining a HAP to be a set of the form .)

Once we’ve done that, we can find a much larger integer such that and such that . That allows us to divide up the multiples of into HAPs of length on each of which varies by at most or so. We don’t want these HAPs to contain multiples of but since is much larger than we can get the lengths approximately correct and just partition all the multiples of that are not multiples of

We continue this process. If as we do so we also ensure that then we will have partitioned all of into HAPs on which is roughly constant. And that is morally the same as decomposing a character into HAPs. (We can do it better by convolving.)

Does that mean the proof is finished? No, but I think it’s very encouraging that we have a set of characters that can be split up into HAPs.

The missing ingredients — and at this stage I’m not clear whether there’s a missing idea or whether it’s just that this idea needs to be investigated more carefully — are that in the proof of Roth’s theorem we exploited a lot of cancellation between coefficients of products of APs used in the decomposition, and that to achieve that cancellation we used a nice formula for the decomposition. What’s more, the formula didn’t actually give a decomposition of the identity, but rather a decomposition of the identity plus a positive semidefinite matrix. (That last aspect might work more tidily in the rationals, however.)

So the next step in investigating this approach is clear: one should try to think of a suitably canonical way of expressing a character as a linear combination of HAPs, and one should then look at the coefficients that result when one rewrites the decomposition in terms of the HAPs used to make the characters

July 14, 2010 at 4:52 pm

Actually, an even better next step might be to pursue what Uwe was suggesting and try to get a clean infinitary version of the Roth argument. The reason that feels like a good idea is that there as well the natural measure on the group (which is ) is infinite but the natural measure on the dual group (which is ) is finite. Also, the measure on the dual group is continuous. I don’t know whether the proof would be difficult to adapt (the new statement would be that for any function defined on and any positive integer you can find an AP of length on which the discrepancy is at least ). My hope is that it would not be hard to adapt, and that the adapted proof would provide a better model for the modification to HAPs and the rationals.

July 14, 2010 at 6:48 pm

A question that naturally arises if one wants a ROI proof when the ground set is infinite is what to do about the fact that the trace of the identity is infinite (as, presumably, is the sum of the coefficients of the HAP or AP products). Is there some way of arguing that the “density” of coefficients is small?

As a toy problem, consider the following integral representation of the identity on (that is, the matrix where and range over all integers):

We can also think of this as The “sum of coefficients” is in fact an integral of coefficients, which comes to 1, whereas the trace of the identity is infinite. Therefore, given any function on we ought to expect a trigonometric function (which takes to ) with respect to which has infinite discrepancy.

Can we make this rigorous? It’s tempting to say that has infinite norm, and therefore that is infinite, and therefore that is infinite for some which is precisely saying that is infinite.

But that isn’t rigorous. Isn’t it more correct to say that the Fourier transform of doesn’t exist? And even if it did exist, there would be questions about the nature of the convergence of .

Maybe one can do something along these lines but I live so much in the finite world that I don’t have the right tricks at my fingertips.

It looks to me as though things get yet more difficult if we have a decomposition of the identity where the sum of the coefficients is infinite, but “less infinite” than the trace of the identity. I don’t think it is an insuperable difficulty, but it’s not completely obvious that it can be solved in a purely infinitary way. Possibly as Uwe suggests, taking limits along ultrafilters would help.

Let me end this remark with a question: can anyone think of a good adaptation of the basic ROI or ROD lemma (the one that says if you have a representation such that the sum of the moduli of the coefficients is significantly less than the trace of the diagonal matrix then you are done) that works for infinite sets?