EDP24 — an attempt to get back into the diagonal-decomposition approach

Gil has a not quite finished third post that will appear soon on this blog. Meanwhile, here are a few thoughts I’ve had recently as I tried to get my brain into EDP mode again.

The approach to the Erdős discrepancy problem that, rightly or wrongly, I found most promising when we were last working on it was to prove a certain statement about matrices that can be shown quite easily to imply a positive solution to the problem. In this post, I’m going to treat that matrix statement as the problem, and think about how one might go about trying to prove it. I’ll give the brief explanation of why it implies EDP, but not of what the possible advantages of the approach are (discussions of which can be found in some of the earlier material).

First I’ll need to recall a couple of definitions and some notation. A homogeneous arithmetic progression, or HAP, is a set of the form \{d,2d,\dots,rd\} for some pair of positive integers r and d. Let me say that u is a HAP-function if it is the characteristic function of a HAP. Finally, if f and g are functions defined on \mathbb{N}, let f\otimes g denote the function defined on \mathbb{N}^2 that takes the value f(x)g(y) at (x,y), which can be thought of as an infinite matrix.
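Since these objects recur throughout the post, here is a minimal Python sketch of them (the helper names are my own, not standard notation): a HAP-function, the outer product f\otimes g, and the HAP-discrepancy of a sequence restricted to \{1,\dots,N\}.

```python
def hap(d, r):
    """Characteristic function of the HAP {d, 2d, ..., rd}."""
    members = {d * i for i in range(1, r + 1)}
    return lambda x: 1 if x in members else 0

def tensor(f, g):
    """(f (x) g)(x, y) = f(x) * g(y), thought of as an infinite matrix."""
    return lambda x, y: f(x) * g(y)

def hap_discrepancy(s, N):
    """Max of |sum_{x in P} s(x)| over all HAPs P contained in {1, ..., N}."""
    best = 0
    for d in range(1, N + 1):
        partial = 0
        for i in range(1, N // d + 1):
            partial += s(d * i)          # partial sum over {d, 2d, ..., id}
            best = max(best, abs(partial))
    return best

# the period-3 sequence 1, -1, 0, 1, -1, 0, ... that recurs in this post
s = lambda n: [0, 1, -1][n % 3]
assert hap_discrepancy(s, 300) == 1
```

The final assertion is the bounded-discrepancy fact about the sequence 1,-1,0,1,-1,0,\dots that will be used several times below.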

Here, then, is the statement that it would be great to prove. What is nice (but also quite daunting) about it is that it is a pure existence statement.

Problem. Prove that for every C there exists a real matrix decomposition D=\sum_i\lambda_iu_i\otimes v_i with the following properties.

(i) D is a diagonal matrix and \sum_{x=1}^\infty D_{xx}\geq C.

(ii) Each u_i and v_i is a HAP-function.

(iii) \sum_i|\lambda_i|\leq 1.

I have stated the problem in this form because (for a reason I will outline below) I am confident that such a decomposition exists. However, it seems to be quite hard to find.

Why would the existence of an efficient decomposition of a diagonal matrix into products of HAP-functions prove EDP? Well, let \epsilon=(\epsilon_1,\epsilon_2,\dots) be a \pm 1-sequence and consider the quantity \langle \epsilon, D\epsilon\rangle. On the one hand it equals

\displaystyle \sum_x D_{xx}\epsilon_x^2=\sum_x D_{xx}\geq C,

and on the other hand it equals

\displaystyle \sum_i\lambda_i \langle \epsilon,u_i\rangle\langle\epsilon,v_i\rangle.

Since \sum_i|\lambda_i|\leq 1, there must exist i such that |\langle \epsilon,u_i\rangle\langle\epsilon,v_i\rangle|\geq C, from which it follows that there exists a HAP P such that |\sum_{x\in P}\epsilon_x|\geq C^{1/2}. This shows that \epsilon must have HAP-discrepancy at least C^{1/2} for every C, and we are done.

Of course, one can deduce anything from a false premise, so one needs at least some reason to believe that the decomposition exists. Quite a good reason is that the existence of the decomposition can be shown (without too much difficulty) to be equivalent to the following generalization of EDP.

Problem. Show that if (a_n) and (b_n) are any two sequences of vectors in a Hilbert space such that \langle a_n,b_n\rangle=1 for every n, then for every constant C there exist HAPs P and Q such that |\langle\sum_{n\in P}a_n,\sum_{m\in Q}b_m\rangle|\geq C.

This is a strengthening of EDP, since it clearly implies EDP when the Hilbert space is \mathbb{R}. However, it doesn’t seem like such a huge strengthening that it might be false even if EDP is true. At any rate, it seems just as hard to find a counterexample to this as it does to EDP itself. (You might ask why I believe that EDP is true. It’s partly blind faith, but the heuristics presented by Gil Kalai in the previous post offer some support, as does — very weakly — the experimental evidence.)

How does one ever solve existence problems?

Let’s begin by thinking very generally about the situation we’re in: we have to come up with a mathematical object X that has certain properties P_1,\dots,P_k. (Of course, we could combine those into one property, but it is often more natural to think of the object as having several different properties simultaneously.)

I’m going to list strategies until I can’t (or at least can’t immediately) think of any more that seem to have any chance of success. (An example of a strategy that I think has no chance of success for this problem is to search for a just-do-it proof, the problem being that the constraints appear to be quite delicate, whereas a just-do-it proof works better when one has a number of independent constraints that are reasonably easy to satisfy simultaneously.)

1. Use an object you already know, possibly with some small modifications.

2. Start with examples that work for a certain constant C, and use them as building blocks for larger and more complicated examples that work with a larger constant C.

3. Use some kind of duality to prove that if an example does not exist, then some other object does exist, and show that in fact the second object does not exist.

4. Use a computer to search for small examples, try to discern a pattern in those examples, use that pattern to guess how to construct larger examples, and check that the guess works.

5. Strengthen the conditions on the object until there is a unique (or almost unique) object that satisfies those conditions.

6. Weaken the conditions on the object (e.g. by forgetting about some of the properties P_i), try to describe all objects that satisfy the weaker conditions, and then reformulate the problem as that of finding an object of that description that satisfies the stronger conditions. [As a very simple example, if you were required to find a positive solution of some quadratic equation, you would solve the equation without the positivity condition and then pick out a positive root.]

7. Assume that you have an object with the required properties, deduce some interesting further properties, find a natural object with the further properties, and hope that it has the original properties as well. [This is a variant of 6.]

How do the general strategies fare when it comes to EDP?

I will discuss them briefly at first, before concentrating in more detail on a few of them.

1. This doesn’t appear to be the kind of problem where there is already some matrix decomposition (or related mathematical object) waiting in the wings to be modified so that it becomes the example we want. Or if there is, the only way we are likely to discover it is by pinning down more precisely the properties we are interested in so that someone somewhere says, “Hang on, that sounds like X.” But I think it is very unlikely that we will discover an example that way.

2. I am definitely interested in the idea of starting with small examples and using them to build bigger examples. However, an approach like this needs plenty of thought: non-trivial small examples seem hard to come by, and the only obvious method for building bigger ones is taking linear combinations of smaller ones. (A trivial small example is to take a linear combination of the form \sum_i\lambda_iu_i\otimes u_i where each u_i is the characteristic function of a singleton. This gives us C=1. To get anything interesting, we need the u_i to be HAP-functions for longer HAPs, but then it is very difficult to get the off-diagonal terms to cancel.)
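The trivial singleton decomposition is easy to write down and check explicitly. A small sketch, assuming we take \lambda_i=1/N for each of the N singleton HAPs \{i\} (so that the coefficients sum to 1 and the diagonal sums to C=1):

```python
import random

N = 100
# u_i = characteristic function of the singleton HAP {i} (a HAP with r = 1)
lam = [1.0 / N] * N                       # coefficients lambda_i

# D = sum_i lambda_i u_i (x) u_i has entries D_{xy}:
def D(x, y):
    return lam[x - 1] if x == y and 1 <= x <= N else 0.0

# (i) D is diagonal and its diagonal sums to C = 1
assert abs(sum(D(x, x) for x in range(1, N + 1)) - 1.0) < 1e-12
# (iii) sum_i |lambda_i| <= 1
assert sum(abs(l) for l in lam) <= 1.0 + 1e-12
# the deduction: for any +-1 sequence eps, <eps, D eps> = 1, so some
# |<eps, u_i>|^2 >= 1, i.e. some HAP partial sum has absolute value >= 1
eps = [random.choice([-1, 1]) for _ in range(N)]
quad = sum(lam[i] * eps[i] * eps[i] for i in range(N))
assert abs(quad - 1.0) < 1e-12
```

Of course the conclusion here is vacuous (every \pm 1 sequence trivially has a HAP sum of absolute value 1), which is exactly why longer HAPs are needed.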

3. We are trying to write D as an efficient linear combination of objects of a certain form. That is a classic situation for applying the Hahn-Banach theorem: if there is no such linear combination, then there must be a separating functional. What can we say about that functional?

This approach seems promising at first, but is in fact of no use at all, since if you pursue it, the conclusion you come to is that if there is no such linear combination, then there are sequences of unit vectors (a_n) and (b_m) in a Hilbert space and a constant C such that |\langle\sum_{n\in P}a_n,\sum_{m\in Q}b_m\rangle|\leq C for every pair of HAPs P and Q. The reason duality doesn’t help is that we used duality to reformulate EDP as a problem about decomposing diagonal matrices. If we use duality again, we get back to (a generalization of) EDP.

4. We have tried this. Moses Charikar and others have written programs to search for the most efficient decompositions (for this and for a related problem). The results are useful in that they give one some feel for what the diagonal matrix D needs to be like, but it does not seem to be possible to obtain a large enough matrix to use it to go as far as guessing a formula. For that, more theoretical methods will be necessary (but knowing roughly what the answer needs to look like should make finding those methods quite a bit quicker).

5. One difficulty is that we are not looking for a unique object: it would be nice if the properties we were trying to obtain determined the matrix and its decomposition uniquely, since then we could hope to calculate them. An obvious property to add is that the object is in some sense extremal (though what that sense is is not completely obvious). Another thing we can do is insist on various symmetries — something that has already been tried. Yet another is to restrict the class of HAPs that may be used: as far as we can tell at the moment, we can get away with HAPs with common differences that are all either prime or a power of 2. (I say that simply because it seems to be hard to find a sequence that has bounded discrepancy on all such HAPs. Gil’s heuristics suggest that the discrepancy should grow like \log\log n for sequences of length n.)

6. What weaker properties might we go for? One idea is simply to try to find a linear combination of products of HAP-functions that leads to a considerable amount of cancellation off the diagonal, even if there are still off-diagonal terms left. This would be interesting because if one naively tries to find a decomposition that works, it seems to be very hard to get a substantial amount of cancellation (except if very small HAPs are used, in which case the resulting decomposition is not very informative). Another (somewhat related) idea is to look for interesting decompositions but with not particularly large values of C. Yet another might be to allow a wider class of functions than HAP-functions.

7. I don’t have much idea of how to apply this strategy to the entire decomposition, but there is plenty one can say about the diagonal matrix that one wishes to decompose, and that information, if it could be expressed neatly, would surely be very useful: it seems much easier to try to decompose a specific diagonal matrix than to try to decompose a diagonal matrix that you have to choose first, when the vast majority of diagonal matrices cannot be efficiently decomposed.

The proof that the diagonal decomposition implies EDP actually proves a much stronger result. It gives us a function d(n)=D_{nn} with \sum_nd(n)=\infty such that if \epsilon=(\epsilon_n) is any sequence of real numbers (not necessarily \pm 1-valued) with \sum_nd(n)\epsilon_n^2\geq 1, then \epsilon has unbounded HAP-discrepancy. To put that loosely, the sequence (d(n)) cannot correlate with the square of any sequence of bounded discrepancy. That puts very strong conditions on (d(n)), but so far we have not fully understood what those conditions are. It seems that we should be able to make progress on this, perhaps even adding reasonable conditions that would allow us to determine the function uniquely.

Creating new examples out of old ones.

Suppose that D=\sum_i\lambda_iu_i\otimes v_i and D'=\sum_j\mu_jw_j\otimes z_j. Clearly if we take a linear combination of D and D', we get another decomposition of a diagonal matrix, and if the original decompositions were into products of HAPs then so is the new one.

Can we do anything more interesting? Another possibility is to look for some kind of product. Now Dirichlet convolutions are natural objects to look at in the context of EDP, since the expression \sum_{d|n}f(d)g(n/d) can be thought of as follows. You start with a function g defined on \mathbb{N}, then you define the dilate g_d to be the function that takes the value g(r) at rd and 0 at non-multiples of d, and finally you take a linear combination \sum_df(d)g_d of those dilates. If g is identically 1, then this is taking a linear combination of HAPs.

We can’t take a Dirichlet product of two matrices, but we can do something similar: given functions f and g of two integer variables we can take the function h(x,y)=\sum_{d|x,d'|y}f(d,d')g(x/d,y/d'). If f and g are diagonal (meaning that they are zero if x\ne y), then the terms of the sum are zero unless d=d' and x=y, so in this case h(x,x), considered as a function of x only, is the Dirichlet convolution of f and g, also considered as functions of one variable.

This operation also respects products in the following sense. If f=u\otimes v and g=w\otimes z, then

\displaystyle h(x,y)=\sum_{d|x,d'|y}u(d)v(d')w(x/d)z(y/d'),

which equals p(x)q(y), where p is the Dirichlet convolution of u and w and q is the Dirichlet convolution of v and z.
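This product identity is easy to sanity-check numerically. The sketch below (not part of the argument; the function names are my own) builds f=u\otimes v and g=w\otimes z from random integer-valued functions and verifies that h=(u*w)\otimes(v*z), where * denotes Dirichlet convolution:

```python
import random

def dirichlet(a, b, N):
    """Dirichlet convolution (a*b)(n) = sum_{d|n} a(d) b(n/d), for n <= N.
    Functions are lists indexed from 1 (index 0 unused)."""
    c = [0] * (N + 1)
    for d in range(1, N + 1):
        for q in range(1, N // d + 1):
            c[d * q] += a[d] * b[q]
    return c

N = 60
rnd = lambda: [0] + [random.randint(-2, 2) for _ in range(N)]
u, v, w, z = rnd(), rnd(), rnd(), rnd()

def h(x, y):
    """h(x,y) = sum_{d|x, d'|y} f(d,d') g(x/d, y/d') with f = u(x)v, g = w(x)z."""
    return sum(u[d] * v[dp] * w[x // d] * z[y // dp]
               for d in range(1, x + 1) if x % d == 0
               for dp in range(1, y + 1) if y % dp == 0)

p = dirichlet(u, w, N)   # p = u * w
q = dirichlet(v, z, N)   # q = v * z
for x in range(1, N + 1):
    for y in range(1, N + 1):
        assert h(x, y) == p[x] * q[y]
```

Everything here is exact integer arithmetic, so the identity is checked on the nose rather than up to rounding.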

One further obvious fact is that the operation is bilinear in the two matrices that it is applied to.

So far everything is working swimmingly: out of two decompositions of diagonal matrices into linear combinations of products we build another one that has some number-theoretic significance. However, there is a problem: a Dirichlet convolution of two HAP-functions is not a HAP-function except in trivial cases.

How much does that matter? Without doing some detailed calculations I don’t know. Here are a couple of simple observations. Let u and v be two HAP-functions. Then their Dirichlet convolution is a sum of dilates of v, one for each point in the HAP of which u is the characteristic function. So we can at least decompose the Dirichlet convolution of two HAP-functions as a sum of HAP-functions. The sum of the coefficients is the number of points in the HAP corresponding to u — though since the situation is symmetrical, we can add dilates of u instead, so it is better to say that the sum of the coefficients can be taken to be the length of the smaller of the two HAPs.
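The observation that the Dirichlet convolution of two HAP-functions decomposes as a sum of dilates, each of which is itself a HAP-function, can be checked directly. A sketch with two small HAPs (helper names are my own):

```python
def hap_indicator(d, r, N):
    """The characteristic function of {d, 2d, ..., rd} as a list indexed by n."""
    f = [0] * (N + 1)
    for i in range(1, r + 1):
        if d * i <= N:
            f[d * i] = 1
    return f

def dirichlet(a, b, N):
    """Dirichlet convolution of two list-valued functions, up to N."""
    c = [0] * (N + 1)
    for d in range(1, N + 1):
        for q in range(1, N // d + 1):
            c[d * q] += a[d] * b[q]
    return c

N = 200
u = hap_indicator(3, 4, N)          # HAP {3, 6, 9, 12}
v = hap_indicator(5, 3, N)          # HAP {5, 10, 15}
conv = dirichlet(u, v, N)

# one dilate of v for each point m of u's HAP; the dilate is itself the
# HAP-function of {5m, 10m, 15m}
total = [0] * (N + 1)
for m in (3, 6, 9, 12):
    dilate = hap_indicator(5 * m, 3, N)
    total = [a + b for a, b in zip(total, dilate)]

assert conv == total   # 4 HAP-functions, coefficients summing to |HAP of u|
```

As the text says, the coefficient sum is the length of u's HAP (here 4), or by symmetry the length of the shorter of the two HAPs.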

That doesn’t sound very efficient, but we must also remember that when we take the Dirichlet convolution of two diagonal matrices, we obtain a diagonal matrix that sums to the product of the sums of the original two matrices. So there may be some gain there to compensate for the loss of efficiency. That is why I say that more detailed calculations are necessary, a point I will return to at some stage.

What can we say about the diagonal matrix D?

As I have already mentioned, if a diagonal decomposition of the required kind exists, then we can place strong constraints on the diagonal matrix D itself. Writing \delta(n) for D_{nn}, it must have the following property: if s is any real-valued function defined on the integers such that \sum_n\delta(n)s(n)^2\geq C, then the HAP-discrepancy of s must be at least \sqrt{C}. (This follows from the proof that the existence of the decomposition for a diagonal matrix with \sum_n\delta(n)\geq C implies that every \pm 1-sequence has HAP-discrepancy at least \sqrt{C}.)

Turning that round, if we have any real-valued sequence s(1),s(2),\dots of discrepancy at most T, then \sum_n\delta(n)s(n)^2 cannot be more than T^2.

A simple example of the kind of conclusion one can draw from that is that the sum of \delta(n) over all ns that are not multiples of 3 is at most 1. This follows from the fact that the sequence 1,-1,0,1,-1,0,\dots has HAP-discrepancy 1. More generally, if \sum_n\delta(n) is large, then for all small n most of the largeness must occur on multiples of n.

In qualitative terms, this suggests that \delta ought to be concentrated at numbers with many factors, and the experimental evidence backs this up, though not quite as cleanly as one might ideally like. (However, “edge effects” could well account for some of the peculiar features that are observed experimentally, so I still think it is worth trying to find a very aesthetically satisfying diagonal matrix D.)

Here is another reason to expect that \delta(n) should be larger for numbers with many factors: they are contained in lots of HAPs. We are trying to find a measure on \mathbb{N} (at least if we insist that \delta takes non-negative values, which seems a reasonable extra condition to try to impose) such that any function with a large L_2-norm with respect to that measure must have a large HAP-discrepancy. What might make it difficult to find a function with large L_2-norm and small HAP-discrepancy? It would be that we had lots of constraints on the values taken by the functions. And the natural way that we can place lots of constraints on one value s(n) is if n has many factors and is therefore contained in many HAPs.

As an example of the opposite phenomenon, suppose we take an arbitrary \pm 1 sequence and alter its values however we like at all primes. What will happen to its HAP-discrepancy? Well, each HAP with common difference greater than 1 contains at most one prime (namely its common difference, if that happens to be prime), so no sum along such a HAP can have changed by more than 2. This illustrates fairly dramatically how little it matters what a sequence does at numbers with few factors.
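A quick experiment confirming this (restricting, to be safe, to HAPs of common difference at least 2, where the at-most-one-prime observation certainly holds):

```python
import random

def hap_discrepancy(s, N, dmin=2):
    """Max |partial sum| over HAPs {d, 2d, ..., rd} with d >= dmin, rd <= N."""
    best = 0
    for d in range(dmin, N + 1):
        partial = 0
        for i in range(1, N // d + 1):
            partial += s[d * i]
            best = max(best, abs(partial))
    return best

def is_prime(n):
    return n >= 2 and all(n % k for k in range(2, int(n ** 0.5) + 1))

N = 500
s = [0] + [random.choice([-1, 1]) for _ in range(N)]
t = list(s)
for p in range(2, N + 1):
    if is_prime(p):
        t[p] = random.choice([-1, 1])     # re-randomise every prime position

# each HAP {d, 2d, ..., rd} with d >= 2 contains at most one prime (d itself),
# so each of its partial sums, and hence the discrepancy, moves by at most 2
assert abs(hap_discrepancy(s, N) - hap_discrepancy(t, N)) <= 2
```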

A first wild guess.

It is tempting to jump straight from this kind of observation to a guess of a function \delta(n) that might work. For example, what if we took some primes p_1,\dots,p_m and a collection \mathcal{A} of subsets of \{1,2,\dots,m\}, and let \delta be the characteristic measure of the set of all numbers that can be written as \prod_{i\in A}p_i for some A\in\mathcal{A}? That is, \delta takes the value |\mathcal{A}|^{-1} on all such numbers and zero on all other numbers.

Does a function like that have any chance of working? Because we have ensured that it sums to 1, we would now be looking for a decomposition with absolute values of coefficients summing to at most c for a small constant c. But before we even think about doing that, we should try to find a sequence that is large on many of the products of primes but that has bounded discrepancy. Only if we find it hard to do that is it worth looking for a decomposition.

Let me be more precise about what we are looking for. If we can find a decomposition of D (the diagonal matrix with entries given by \delta) as \sum_i\lambda_iu_i\otimes v_i where the u_i and v_i are HAP-functions and \sum_i|\lambda_i|\leq c, then for every sequence s(1),s(2),\dots, we have two ways of calculating \langle s,Ds\rangle. One of them gives us \sum_n\delta(n)s(n)^2, while the other gives us \sum_i\lambda_i\langle s,u_i\rangle\langle s,v_i\rangle. So if \sum_n\delta(n)s(n)^2\geq 1, then there must be some HAP on which s has discrepancy at least c^{-1/2}.

Therefore, in order to show that this particular \delta will not do, we need to find a sequence s such that s(n)^2 averages at least 1 on the numbers \prod_{i\in A}p_i but has bounded discrepancy.

Let us make our task easier to think about by looking for a sequence that takes \pm 1 values on the numbers \prod_{i\in A}p_i. Can we choose the values elsewhere in such a way that we cancel out any discrepancy that we might accidentally pick up with those values?

To answer this question, let us think about how HAPs intersect the set, which, since I’m mentioning it quite a bit, I’d better give a name to: I’ll write S=S(\mathcal{A}) for the set of numbers \prod_{i\in A}p_i with A\in\mathcal{A}. Let me also write p_A as shorthand for \prod_{i\in A}p_i. Any HAP that intersects S must have a common difference of the form p_B for some set B. The resulting HAP will contain every p_A for which B\subset A, which is rather pleasantly combinatorial.

Let’s suppose that we’ve chosen the \pm 1 values everywhere on S. What we are trying to do now is choose values off S (not necessarily \pm 1) in such a way that any HAP-discrepancy contributed by values in S is cancelled out by a roughly opposite discrepancy in the values outside S.

An obvious way to do that is to be fairly greedy about it. That is, we look at HAPs that have a substantial intersection with S and simply choose a few further values on those HAPs, hoping that our choices won’t mess up too many discrepancies elsewhere. Let’s see if we can get something like that to work.

Consider, then, the (infinite) HAP with common difference p_B. As already mentioned, this intersects S in every p_A such that B\subset A. So as we trundle along the multiples p_B, 2p_B, 3p_B, \dots, we find that the partial sums of the s(n)s that we encounter and have already chosen go up and down, and we would like to choose some further values to ensure that they remain bounded.

At which values of n are we still at liberty to choose the value s(n)? It must be a multiple rp_B of p_B, and two ways of ensuring that that multiple does not lie in S are (i) to make sure that r is also a multiple of some p_i with i\in B and (ii) to make sure that r is a multiple of p_j^2 for some j\notin B.

The question now is whether if we do that we will find that the value s(rp_B) we choose has an effect on lots of other HAPs at the same time. And it looks rather as though it will. Suppose B is quite a large set. Then any multiple rp_B that does not belong to S is also a multiple of p_C for every C\subset B, so there is indeed a danger that by choosing a value of s(rp_B) we are messing up quite a lot of HAPs and not just the one we were interested in. We could of course try to sort out the HAPs with large B first and later correct any damage we had done to smaller sets — but a large set B has a lot of subsets.

This is another situation where the devil is in the detail. Maybe there is enough flexibility that some kind of careful greedy approach would work — perhaps we could even choose s to be 1 everywhere on S — but so far it seems at least possible that there exists an efficient decomposition of a diagonal matrix with entries that are concentrated on square-free numbers with many prime factors.

It is sometimes convenient to describe arithmetic progressions of the form \{rd,(r+1)d,\dots,sd\} as HAPs. Let me do that now, and let me define two HAPs to be adjacent if they are disjoint and their union is also a HAP (in this generalized sense). In other words, two HAPs are adjacent if one is simply a continuation of the other.

If u and v are HAP-functions coming from adjacent HAPs P and Q (in which case I’ll say that they are adjacent HAP-functions), then (u-v)\otimes(u-v) is 1 on the diagonal at all points in the HAP P\cup Q, while off the diagonal it is 1 on P\times P\cup Q\times Q and -1 on P\times Q\cup Q\times P. If we take lots of products of this form, we can hope to get lots of cancellation off the diagonal, while getting none at all on the diagonal.
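The sign pattern of (u-v)\otimes(u-v) is easy to verify exhaustively for a small pair of adjacent HAPs:

```python
def hap_set(d, lo, hi):
    """The generalized HAP {lo*d, (lo+1)*d, ..., hi*d} as a set."""
    return {d * i for i in range(lo, hi + 1)}

d = 3
P = hap_set(d, 1, 4)      # {3, 6, 9, 12}
Q = hap_set(d, 5, 8)      # {15, 18, 21, 24} -- a continuation of P
u = lambda x: 1 if x in P else 0
v = lambda x: 1 if x in Q else 0

M = lambda x, y: (u(x) - v(x)) * (u(y) - v(y))

N = 30
for x in range(1, N + 1):
    # on the diagonal: 1 exactly at the points of P u Q, 0 elsewhere
    assert M(x, x) == (1 if x in P | Q else 0)
    for y in range(1, N + 1):
        if x != y:
            # off the diagonal: +1 on (PxP) u (QxQ), -1 on (PxQ) u (QxP)
            if (x in P and y in P) or (x in Q and y in Q):
                assert M(x, y) == 1
            elif (x in P and y in Q) or (x in Q and y in P):
                assert M(x, y) == -1
            else:
                assert M(x, y) == 0
```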

Let’s try to pursue this idea in more detail. Suppose we have some kind of probability distribution on functions of this form. That is, we pick, according to some distribution, a random set B and then a random pair of adjacent HAPs P and Q. Now let u and v be their characteristic functions and let x and y be two positive integers. What is the expected value of (u(x)-v(x))(u(y)-v(y))?

A necessary condition for this quantity to be non-zero is that p_B should divide both x and y. So let us condition on this event. If we are clever about the way we choose our probability distribution, we should be able to organize it so that if you choose a random B, then typically plenty of its subsets have a similar probability of being chosen. The reason that is potentially important is that if we fix a common difference and pick random pairs of adjacent HAPs P and Q of some given length, then pairs (x,y) that are close to the diagonal will tend to get positive values (because normally if one of them is in P then so is the other, and similarly for Q) while pairs that are a little further away tend to get negative values. But if we have many different common differences in operation then we can hope that some of these biases will cancel out: a difference |x-y| that tends to result in a positive value at one scale might tend to result in a negative value at another scale.

While writing this I have just remembered a useful trick that we were well aware of during the previous attempt at EDP. It is straightforward to show that EDP is equivalent to the same question for the rationals. That is, one wishes to show that for every \pm 1 function defined on the positive rationals and every constant C there is a HAP (the definition is obvious) on which the sum of the function has absolute value at least C. The nice thing about this formulation is a much greater symmetry: for every rational q, the map x\mapsto qx is an isomorphism from \mathbb{Q} (considered as a group under addition) to itself. This makes it unnecessary to think about numbers with lots of factors. (It is not hard to get into this nice situation using just integers — for example, you can just multiply everything by n! for a very large n and you can divide your numbers by whatever you like, within reason — but it is more natural to use rationals.)

The hope was that if we thought about functions defined on the rationals, then the rather peculiar properties of the diagonal matrix one wishes to decompose might become less peculiar. However, one would still need to think carefully about ratios of numbers.

I’m tempted to continue thinking aloud about these possibilities, but I’ll save that up for the comments.

A second wild guess.

I’m now going to ignore what I’ve just written about working with the rationals and go back to the question of trying to find a natural function that is biased towards positive integers with many factors. What I’d like to do is build a sequence of non-negative functions \delta, each better than the last, in the hope of eventually finding one that has (or at least doesn’t obviously not have) the property that every sequence s such that \sum_n\delta(n)s(n)^2\geq 1 has large HAP-discrepancy. (To make this non-trivial, I also need \sum_n\delta(n)\geq 1.)

I’ll start with a function I know doesn’t work: let \delta(n)=1/N for every n\leq N and 0 for every n>N. That fails because the sequence 1,-1,0,1,-1,0,\dots has HAP-discrepancy 1 but \sum_n\delta(n)s(n)^2=2/3 (at least if N is a multiple of 3). That doesn’t satisfy the condition I said, but it does if we multiply it by \sqrt{3/2}.
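A quick numerical check of the 2/3 computation, assuming as in the text that N is a multiple of 3:

```python
N = 300                      # a multiple of 3
delta = [0.0] + [1.0 / N] * N
s = [0] + [[0, 1, -1][n % 3] for n in range(1, N + 1)]

# only the 2N/3 non-multiples of 3 contribute, each with weight 1/N,
# so sum_n delta(n) s(n)^2 = 2/3
weighted = sum(delta[n] * s[n] ** 2 for n in range(1, N + 1))
assert abs(weighted - 2 / 3) < 1e-9
```

Multiplying the sequence by \sqrt{3/2} scales the weighted sum up to 1 and the discrepancy up to \sqrt{3/2}, as claimed.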

The problem there was that \delta has a large sum on non-multiples of 3. To correct that and similar problems, it feels natural to give greater weight to numbers with more factors. But how? Well, let’s do it in a rather naive way and simply weight each number by how many factors it has (and therefore how many HAPs it belongs to). This at least has the advantage of being a natural number-theoretic function: it is (up to a constant) the Dirichlet convolution of the constant function 1 with itself, at least until you get beyond N. That is, d(n)=\sum_{e|n}1\times 1.

Roughly how big is \sum_{n\leq N}d(n)? This sum is the number of pairs (a,b) of positive integers such that ab\leq N (since d(n) is the number of pairs (a,b) of positive integers such that ab=n). Counting the number of pairs with a=1,2,\dots,N we can crudely estimate this as N+N/2+N/3+\dots, which is roughly N\log N. On the other hand, the Dirichlet convolution with itself of the function that is 1 up to N and 0 thereafter sums to N^2 (since each pair of positive integers (a,b) with a,b\leq N contributes 1 to the sum). Since the two functions agree up to N, the vast majority of the mass of the second function lies beyond N, the point after which it ceases to equal d(n).
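These counts are easy to confirm. A sketch computing d(n) by a divisor sieve and comparing \sum_{n\leq N}d(n) with the pair count and the harmonic estimate:

```python
def divisor_counts(N):
    """d(n) for n <= N by a sieve: each a <= N contributes to its multiples."""
    d = [0] * (N + 1)
    for a in range(1, N + 1):
        for n in range(a, N + 1, a):
            d[n] += 1
    return d

N = 100
d = divisor_counts(N)
total = sum(d[1:])
# equivalently, the number of pairs (a, b) with ab <= N:
assert total == sum(N // a for a in range(1, N + 1))
# the crude estimate N(1 + 1/2 + ... + 1/N) ~ N log N overshoots only slightly:
estimate = sum(N / a for a in range(1, N + 1))
assert total <= estimate < total + N
```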

But let’s not worry about that. On average, d(n) grows like \log n, which is not too fast a rate of growth, so let’s simply take the function d(n) up to N. We now want to prove that every sequence s(n) such that \sum_{n=1}^Nd(n)s(n)^2\geq N\log N has HAP-discrepancy at least C (for some C that tends to infinity with N). Alternatively, we want to find a counterexample, which will induce us to try to improve the function.

An obvious example to try is the usual one, namely the sequence 1,-1,0,1,-1,0,\dots. To see whether this works, we need to estimate the sum of d(n) over all non-multiples of 3 up to N. That equals the number of pairs (a,b) such that neither a nor b is a multiple of 3 and ab\leq N. And that is roughly 4/9 times what you get without the divisibility condition. In other words, it is roughly (4/9)N\log N. So to get \sum_{n\leq N}d(n)s(n)^2 to equal \sum_{n\leq N}d(n) we need to multiply the sequence by 3/2, which gives it a HAP-discrepancy of 3/2, a slight improvement on the \sqrt{3/2} that we obtained with a constant function.
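The 4/9 proportion can be checked numerically, though the convergence is very slow (the lower-order terms of the two sums differ), so for moderate N the observed ratio is still well above 4/9. A sketch:

```python
def divisor_counts(N):
    """d(n) for n <= N by a divisor sieve."""
    d = [0] * (N + 1)
    for a in range(1, N + 1):
        for n in range(a, N + 1, a):
            d[n] += 1
    return d

def ratio_off_3(N):
    """Fraction of sum_{n<=N} d(n) carried by non-multiples of 3."""
    d = divisor_counts(N)
    return sum(d[n] for n in range(1, N + 1) if n % 3) / sum(d[1:])

# exact small check: for N = 10 there are 27 pairs (a, b) with ab <= 10,
# of which 18 have neither a nor b a multiple of 3 (n not a multiple of 3
# forces both a and b in any factorisation ab = n to be coprime to 3)
d10 = divisor_counts(10)
assert sum(d10[1:]) == 27
assert sum(d10[n] for n in range(1, 11) if n % 3) == 18

# the ratio drifts down towards 4/9 = 0.444..., though very slowly:
r = ratio_off_3(10 ** 4)
assert 4 / 9 < r < 0.55
```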

This doesn’t prove much — maybe there are better examples — but it suggests in a weak way that taking Dirichlet convolutions results in improved functions. To be clear what I mean by “improved”, I am now looking for a non-negative function \delta defined on \mathbb{N} such that for large N, or perhaps even all N, if s is any sequence such that \sum_{n\leq N}\delta(n)s(n)^2=\sum_{n\leq N}\delta(n), then s has HAP-discrepancy at least C. The bigger I can get C to be, the better I consider the function \delta to be.

If you take repeated Dirichlet convolutions of the constant function 1, then you get functions d_k, where d_k(n) is the number of k-tuples (a_1,\dots,a_k) of positive integers such that a_1a_2\dots a_k=n. (The function d(n) is equal to d_2(n).) Suppose k is large, N is very large, and s is a sequence such that \sum_{n\leq N}d_k(n)s(n)^2=\sum_{n\leq N}d_k(n). Does it follow that s has large HAP-discrepancy? More ambitiously, does it have HAP-discrepancy that grows exponentially with k, provided that N is large enough?
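The functions d_k are easy to generate by repeated Dirichlet convolution, which makes this question at least testable for small k and moderate N. A sketch computing d_3:

```python
def dirichlet(a, b, N):
    """Dirichlet convolution of list-valued functions (index 0 unused)."""
    c = [0] * (N + 1)
    for d in range(1, N + 1):
        for q in range(1, N // d + 1):
            c[d * q] += a[d] * b[q]
    return c

N = 100
one = [0] + [1] * N          # the constant function 1
d_k = one
for _ in range(2):           # convolve twice more: d_3 = 1 * 1 * 1
    d_k = dirichlet(d_k, one, N)

# d_3(n) counts ordered triples (a1, a2, a3) with a1*a2*a3 = n
assert d_k[1] == 1           # only (1, 1, 1)
assert d_k[7] == 3           # (7, 1, 1) and its permutations
assert d_k[4] == 6           # (4, 1, 1) x 3 and (2, 2, 1) x 3
assert d_k[8] == 10          # compositions of the exponent 3 into 3 parts
```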

Back to the rationals.

The following idea is one that feels familiar — I think something similar has come up already. But there’s nothing like a break from a problem to make an old stale idea seem new and fresh, and sometimes the apparent staleness was an illusion. So here it is again.

I’ll begin by creating a function on the diagonal that seems to me to have a sporting chance of being efficiently decomposable into HAPs. Let G (for generating set) be the set \{1,2,\dots,m\}\cup\{1,1/2,\dots,1/m\} and let \phi be the k-fold multiplicative convolution of the characteristic function of G with itself. That is, \phi(x) is the number of ways of writing x as a_1a_2\dots a_r/b_1b_2\dots b_s with r+s=k and all the a_i and b_j being integers between 1 and m.

Before I think about how one might go about decomposing the diagonal “matrix” (that is, a function defined on \mathbb{Q}_+\times\mathbb{Q}_+ that is zero at (x,y) unless x=y) into products of HAP-functions, I want to see whether it looks hard to find a function s such that \sum_x\phi(x)s(x)^2\geq\sum_x\phi(x) but s is of bounded HAP-discrepancy.

To have any chance of that, we need to understand a bit about \phi. We can get that understanding by representing every rational as a product of (possibly negative) powers of distinct primes. For the rationals where \phi is non-zero, the primes are all at most m, so we can think of \phi as a function defined on a lattice of dimension \pi(m) (the number of primes less than or equal to m). Then \phi is the k-fold convolution of the characteristic function of G regarded as a subset of this lattice. If k is large enough, then this convolution will look like a large and spread-out Gaussian, which will have the potentially useful property that it is approximately invariant if you multiply (or divide) by an integer between 1 and m.
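For very small m and k one can compute \phi by brute force over products of elements of G, using exact rational arithmetic. The Gaussian behaviour only shows up for large k, but even this small sketch exhibits the symmetry x\mapsto 1/x (which holds because G is closed under inversion):

```python
from fractions import Fraction
from itertools import product
from collections import Counter

m, k = 3, 4
G = [Fraction(1)] + [Fraction(j) for j in range(2, m + 1)] \
                  + [Fraction(1, j) for j in range(2, m + 1)]

# phi = k-fold multiplicative convolution of the characteristic function of G:
# phi(x) = number of ways to write x as g1*g2*...*gk with each gi in G
phi = Counter()
for gs in product(G, repeat=k):
    x = Fraction(1)
    for g in gs:
        x *= g
    phi[x] += 1

# total mass is |G|^k, and phi is symmetric under x -> 1/x
assert sum(phi.values()) == len(G) ** k
assert all(phi[x] == phi[1 / x] for x in phi)
```

Every rational in the support of \phi factors over the primes up to m (here 2 and 3), which is the lattice picture described above.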

If \phi has that property, does it rule out some kind of variant of the 1,-1,0,1,-1,0,\dots example? A key feature of that example is that the function in question is zero on multiples of 3, but in \mathbb{Q} there is no such thing as a multiple of 3 (or rather, everything is a multiple of 3 so the concept ceases to make sense).

But that argument is far from conclusive. How about defining a function as follows: given a rational p/q in its lowest terms, let s(p/q) be 0 if either p or q is divisible by 3, and otherwise 1 if p\equiv q mod 3 and -1 if p\not\equiv q mod 3?

Let us think what the values of s are at p/q, 2p/q, etc. If p is a multiple of 3, then all those values are 0 (since q is not a multiple of 3). If q is a multiple of 3^r but not 3^{r+1}, then s(ap/q)=0 except if a is a multiple of 3^r. If p and q/3^r are congruent mod 3, then the values of s(ap/q) go 1,-1,0,1,-1,0,\dots. If they are not congruent mod 3, then they go -1,1,0,-1,1,0,\dots. That last argument worked even if r=0, so we have covered all cases.
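
This case analysis is easy to check by machine; here is a small sketch (the names s3 and along_multiples are mine):

```python
from fractions import Fraction

def s3(x):
    """s(p/q) for p/q in lowest terms: 0 if 3 divides p or q,
    +1 if p ≡ q (mod 3), and -1 otherwise."""
    p, q = x.numerator, x.denominator
    if p % 3 == 0 or q % 3 == 0:
        return 0
    return 1 if p % 3 == q % 3 else -1

def along_multiples(x, terms):
    """the values s(x), s(2x), ..., s(terms * x)."""
    return [s3(a * x) for a in range(1, terms + 1)]
```

For x = 1/9 (so r = 2) the values vanish except at multiples of 9, and along those multiples they run 1, -1, 0, 1, -1, 0, \dots; for x = 2/3 they run -1, 1, 0, \dots along multiples of 3, as in the case analysis above.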

What is the density of rationals such that neither numerator nor denominator is a multiple of 3? I haven’t made the question precise, and that allows me to give two different answers. The first answer is that if we pick the numerator and denominator randomly, then the probability that neither is a multiple of 3 is 4/9. Also, the probability that they are coprime is 6/\pi^2, and the probability that they are both coprime and not multiples of 3 can easily be shown by similar methods to be at least an absolute constant. So it seems that our function is supported on a set of rationals of positive density.
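
The first answer can be checked numerically: by the usual Euler-product heuristic, the density of pairs that are coprime with neither divisible by 3 should be (4/9)\cdot\prod_{p\neq 3}(1-p^{-2})=3/\pi^2\approx 0.304. A quick count (function name mine):

```python
from math import gcd, pi

def density_coprime_no3(N):
    """fraction of pairs 1 <= p, q <= N with gcd(p, q) = 1 and
    neither p nor q divisible by 3."""
    count = sum(1
                for p in range(1, N + 1) if p % 3 != 0
                for q in range(1, N + 1)
                if q % 3 != 0 and gcd(p, q) == 1)
    return count / N ** 2
```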

But another way of thinking about it suggests that the support of the function has zero density: every rational can be written as a product of powers of primes, and the probability that the power of 3 that we choose is 0 tends to 0 as we choose more and more rationals.

The second answer is, I think, the correct one for us, though the question is a bit misleading. The \phi-density of rationals for which the “3-coordinate” is zero does indeed tend to zero, so if we regard \phi as our measure, rather than something more additive, then we do appear to have ruled out examples that are similar to the 1,-1,0,1,-1,0,\dots example.

Can we use information about the original problem to help us with the dual problem?

We know that for the integers there is not an efficient HAP-decomposition of a diagonal matrix that is 1 for the first N terms and zero afterwards. The proof we have is that if such a decomposition existed, then the sequence 1,-1,0,1,-1,0,\dots would have to have large HAP-discrepancy, which it doesn’t. But can we give a “direct” proof? I’m not quite sure what I mean by that, except that it should not involve the use of a separating functional. Ideally what I’d like to do is use that example and others like it to work out some rather general constraint that would end up saying a precise version of, “If an efficiently decomposable matrix equals 1 on the diagonal, then it must have quite a lot of stuff off the diagonal,” and ideally say a fair amount about that “stuff”.

The hope would be to find a clear obstacle on the dual side to a direct approach to decomposing the N\times N identity matrix, and then to show that when we move to the measure on the rationals defined in the previous section, that obstacle goes away and allows us to do what we wished we could do over the integers. The thing that would make life easier would be the ability to divide anything by small integers.

Actually, maybe it isn’t so important to go for the dual side. Let’s just understand what the existing proof is telling us goes wrong when we try to find an efficient HAP-decomposition. For the decomposition to be efficient, the HAPs we use have to be quite long. And for that to be the case, they must contain roughly as many numbers congruent to 1 mod 3 as they do numbers congruent to 2 mod 3. (They might contain none of either.) Therefore, any HAP product has roughly as many points congruent to (1,1) or (2,2) mod 3 as points congruent to (1,2) or (2,1) mod 3. Let us call these white points and black points, respectively. By roughly I mean that these two numbers differ by at most a constant. In fact, I think they differ by at most 1. But no points on the diagonal are congruent to (1,2) or to (2,1), so if a product of HAPs includes r points on the diagonal that are congruent to (1,1) or (2,2), then it must include at least r-1 more black points than white points off the diagonal.
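
The counting argument here can be verified by brute force: the imbalance between residues 1 and 2 mod 3 within a single HAP is at most 1, and the white-minus-black imbalance of a product of two HAPs factorizes as the product of the two one-dimensional imbalances. A sketch (names mine):

```python
def mod3_imbalance(d, r):
    """(#elements ≡ 1 mod 3) - (#elements ≡ 2 mod 3) in the HAP {d, 2d, ..., rd}."""
    return sum((x % 3 == 1) - (x % 3 == 2) for x in range(d, r * d + 1, d))

def white_black_imbalance(d1, r1, d2, r2):
    """(#white - #black) points of the product of two HAPs, where white means
    ≡ (1,1) or (2,2) mod 3 and black means ≡ (1,2) or (2,1) mod 3."""
    white = black = 0
    for x in range(d1, r1 * d1 + 1, d1):
        for y in range(d2, r2 * d2 + 1, d2):
            if x % 3 and y % 3:
                if x % 3 == y % 3:
                    white += 1
                else:
                    black += 1
    return white - black
```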

So, roughly speaking, we can’t build up the non-multiples of 3 on the diagonal without building up a bias towards black points off the diagonal.

Does that matter? Can’t we correct that unwanted bias by adding or subtracting some other HAP products? No we can’t, unless we also cancel out much of the sum on the diagonal.

What rescues us if we deal with rationals instead (or more precisely a portion of the rationals weighted by the function \phi)? Also, how easy is it to find an efficient decomposition so that the weight off the diagonal is comparable to the weight on it? I suspect the answer to the second question is that it is not easy, since we have used just the fact that the sequence 1,-1,0,1,-1,0,\dots has bounded discrepancy, but there are many other examples.

A variant of the HAP problem.

I stopped the previous section prematurely because I was feeling a bit stuck. Instead I would like to think about a modified problem that may well be easier. EDP asks for a high discrepancy inside some HAP. The set of HAPs has the nice symmetry property that if you dilate a HAP then you get another one. This is particularly good over the rationals, when you can also contract.

Now let me ask a slightly vague question and then attempt to make it precise. Can we prove EDP for a different “base collection” of sets? For EDP I’m regarding the base collection as the collection of all sets \{1,2,\dots,m\}. If you look at all dilations of those, you get all HAPs. Since I’m having trouble proving EDP for HAPs, I’d like to try to prove it for a different base collection. Of course, the smaller the collection, and the more closely related it is to the set of HAPs, the better, but for now my priority is to be able to prove something.

I actually have a collection of sets in mind. Let us think of the rationals multiplicatively as an infinite-dimensional lattice, or rather the portion of that lattice that consists of points with only finitely many non-zero coordinates. That is, the point (a_1,a_2,\dots) corresponds to the rational 2^{a_1}3^{a_2}5^{a_3}\dots, the a_i being integers. I’m not sure exactly what I want to do next, so let me be slightly less precise. If we look at the rationals multiplicatively, as we are doing here, then dilation becomes simply translation. What I’d really like is something like an orthonormal basis, but I’d like it to consist largely of functions that are translates of one another. So far that won’t work, but it looks a lot more hopeful if we allow some kind of “spreading out” (which would be the analogue of taking longer and longer intervals). The most natural type of spreading out is, I would suggest, to take repeated convolutions. So a possible question — not necessarily the exact one I’ll end up asking — is this. Suppose we start with the set G=\{1,2,\dots,m\}\cup\{1,1/2,1/3,\dots,1/m\} that we considered earlier. Is it true that every function that has large L_2 norm relative to the measure defined by (multiplicatively) convolving G k times has a large inner product with some (multiplicative) translate of some iterated convolution of G?

This problem should not be that hard, since it is purely multiplicative, so we can take logs and make it purely additive. If the answer turned out to be yes, then for any \pm 1-function we would get a large inner product with some iterated convolution of G. If we could nicely decompose that iterated convolution into HAPs, we would then be done, though that seems like quite a lot to ask.

Hmm … I’ve almost instantly run into trouble. It seems that my “multiplicative version” of EDP is just plain trivially false. Consider the function that is 1 for products of an even number of primes and -1 for products of odd numbers of primes. (This is the completely multiplicative function \lambda, with which we are already very familiar.) From a multiplicative point of view, where we think of \mathbb{Q} as an infinite-dimensional lattice, this is putting 1 at “white points” and -1 at “black points”. If we now take any “cuboid”, by which I mean a set of the form \{p_1^{a_1}\dots p_k^{a_k}:r_i\leq a_i<s_i for every i\leq k\}, then the sum of \lambda on that cuboid is either 0, 1 or -1. I was about to say that I couldn’t see an elegant proof of this, but here’s a fairly quick one. If we remove two adjacent slices from the cuboid, we remove an equal number of white and black squares. “Removing two adjacent slices” means reducing s_i by 2, provided s_i\geq r_i+2. If some s_i-r_i is even, then by repeating this operation for that i we can remove all points, which shows that the numbers of white and black points were originally the same. If all s_i-r_i are odd, then we end up reducing the cuboid to a set with just one point, in which case we get \pm 1.
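
The cuboid claim is easy to test by brute force for small cuboids with non-negative exponents (names mine; \lambda is the Liouville function):

```python
from itertools import product

def liouville(n):
    """(-1)^(number of prime factors of n, counted with multiplicity)."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return (-1) ** count

def cuboid_sum(primes, ranges):
    """sum of liouville over the cuboid {prod p_i^{a_i} : r_i <= a_i < s_i},
    with non-negative exponents so that every point is an integer."""
    total = 0
    for exps in product(*(range(r, s) for r, s in ranges)):
        n = 1
        for p, a in zip(primes, exps):
            n *= p ** a
        total += liouville(n)
    return total
```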

If we multiplicatively convolve any two cuboids, we obtain a function that can be expressed as a linear combination of translates of a cuboid, so its inner product with \lambda will also be small, as will those of all its translates. (What’s more, those small inner products will themselves alternate in sign as you multiply or divide by primes.) So \lambda will have small inner product with everything you build out of products of GPs using multiplicative convolutions.

It could be fruitful to think about why HAPs have a better chance of giving rise to discrepancy than the kinds of multiplicative sets I’ve just considered. In preparation for that, here is a question to which I know the answer. Let A_1,A_2,\dots be a nested collection of subsets of \mathbb{N} such that |A_n|=n for every n and \bigcup_nA_n=\mathbb{N}. Is it possible for there to be a \pm 1 function \epsilon and a constant C such that |\sum_{x\in A_n}\epsilon(mx)|\leq C for every m and every n? In the case that A_n=\{1,2,\dots,n\} for every n this is EDP. But for a general nested collection, it is easy to see that bounded discrepancy is possible for that collection and all its dilates. You just take a completely multiplicative function such as \lambda and choose A_n=\{a_1,a_2,\dots,a_n\} in such a way that \lambda(a_k)=(-1)^k for every k (and such that the a_k are distinct and every integer is equal to a_k for some k). Then the sum \sum_{x\in A_n}\lambda(x) is always either -1 or 0, and multiplicativity ensures that the sum on dilates of the A_n is always either 1, -1 or 0.

The partial sums of \lambda grow (it is believed) like \sqrt{n}, so if we choose our sets A_n greedily — that is, by taking a_n to be the smallest positive integer we have not yet chosen such that \lambda takes the value (-1)^n — then A_n will contain the first n-m integers, where m is around \sqrt{n} (at most), and look a bit like a random selection of integers within around \sqrt{n} of n for the rest of the set. In other words, A_n will be pretty close to the set \{1,2,\dots,n\}, so the system of sets mA_n will be pretty close to the set of all HAPs. This suggests that proving EDP is going to be really quite delicate.
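
The greedy construction can be carried out by machine; by construction the partial sums of \lambda over the A_n alternate between -1 and 0, and complete multiplicativity keeps the sums over the dilates mA_n equally small. A sketch (names mine):

```python
def liouville(n):
    """(-1)^(number of prime factors of n, counted with multiplicity)."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return (-1) ** count

def greedy_A(N):
    """a_1, ..., a_N, with a_n the smallest unused positive integer such
    that liouville(a_n) = (-1)^n; then the sum of liouville over
    A_n = {a_1, ..., a_n} is -1 for odd n and 0 for even n."""
    chosen, used = [], set()
    for n in range(1, N + 1):
        a = 1
        while a in used or liouville(a) != (-1) ** n:
            a += 1
        used.add(a)
        chosen.append(a)
    return chosen
```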

In fact, it will be more delicate still, since we can take a better multiplicative function than \lambda. If we take the unique completely multiplicative function that takes n to 1 if n is congruent to 1 mod 3, to -1 if n is congruent to 2 mod 3, and 3 to -1, its partial sums grow logarithmically, so we can say something similar to the above but with m around \log n rather than around \sqrt{n}.
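
For this function the logarithmic growth can be seen from the exact recursion S(3n)=-S(n) for the partial sums S: the terms coprime to 3 form the non-principal character mod 3, which sums to zero over each block of three consecutive integers, while the multiples of 3 contribute -S(n). This gives |S(N)|\leq\log_3 N+1. A quick check (names mine):

```python
def f3(n):
    """the completely multiplicative function with f(3) = -1,
    f(p) = 1 for p ≡ 1 (mod 3) and f(p) = -1 for p ≡ 2 (mod 3);
    for n coprime to 3 this is just the non-principal character mod 3."""
    val = 1
    while n % 3 == 0:
        n //= 3
        val = -val
    return val if n % 3 == 1 else -val

def partial_sums(N):
    """S[n] = f3(1) + ... + f3(n), with S[0] = 0."""
    S, s = [0], 0
    for n in range(1, N + 1):
        s += f3(n)
        S.append(s)
    return S
```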

We can go slightly further with this function. Because it is 1 on the infinite arithmetic progression 1,4,7,\dots and -1 on the infinite arithmetic progression 2,5,8,\dots, we can choose our sets A_n to be intervals up to roughly n plus logarithmically short (non-homogeneous) arithmetic progressions of common difference 3.

Since these sets are cooked up so as to ensure that the function has bounded discrepancy, they might seem not that interesting. But I think they are important, because if we want to find an efficient decomposition of a diagonal matrix into HAP products, we probably need to understand what it is that makes HAPs so much better than HAPs with highly structured tiny logarithmic perturbations at the end. Otherwise we are in danger of taking seriously arguments that would work just as well for statements that we know to be false.

As I write, it occurs to me that we might be able to prove a result that would be quite a bit easier than EDP but nevertheless interesting (and in particular interesting enough to be publishable, unless the answer turns out to be disappointingly easy): that there exists a permutation \pi of \mathbb{N} such that the discrepancy of every \pm 1 sequence on the sets of the form \{m\pi(1),\dots,m\pi(n)\} is unbounded. We know it is not true for all permutations — even ones that in some sense don’t permute by very much — and we suspect that it is true for the identity permutation but find that very hard to prove. What about if \pi is in some sense “fairly random”? For example, what if we chose a large N and randomly permuted the first N integers? That would give us N sets and all their dilates. Could we prove that the discrepancy on that set system tends to infinity with N? What if we drop the condition that the union of the A_n should be all of \mathbb{N}? What if instead we ask that A_n=\{a_1,\dots,a_n\} for some strictly increasing sequence (a_i)?

I rather like those questions, which makes this a good place to stop the post — for the length of which I apologize.

51 Responses to “EDP24 — an attempt to get back into the diagonal-decomposition approach”

  1. Sune Kristian Jakobsen Says:

    “Problem. Show that if (a_n) and (b_n) are any two sequences of unit vectors in a Hilbert space, then for every constant C there exist HAPs P and Q such that |\langle\sum_{n\in P}a_n,\sum_{m\in Q}b_m\rangle|\geq C.”

    This seems to fail if you choose two orthogonal unit vectors a and b and set the sequence a to be constantly a and the sequence b to be constantly b.

  2. gowers Says:

    After a good night’s sleep I’m losing faith in the questions with which I finished the post. Let me give a rough reason that I think the answer to the last one (about increasing sequences) is yes, but for an uninteresting reason. What I say won’t quite be a proof, but it will be close. Recall first that for every prime of the form 4m-1, there is a p\times p matrix with \pm 1 entries such that each row is the previous row translated by 1 mod p, and such that the inner product of any two rows is -1. (The matrix is simple to define: A_{xy}=1 if and only if x-y is a quadratic residue. If you add an extra row and column filled with 1s, you get the Paley matrix of size 4m, which has orthogonal rows and columns.) Inspired by that, let us take our sequence a_1,a_2,\dots as follows: we let p be a large prime of the form 4m-1 and let a_r be 2^s, where s is the rth quadratic residue mod p. If we dilate this sequence, then looked at along the powers of 2 what we are actually doing is translating it. If we also had some kind of wraparound (which we don’t), then the result would be a whole lot of translates that were all equal to the constant sequence 1/2,1/2,\dots plus contributions that were orthogonal to one another. In that case, we could simply expand any \pm 1 sequence in terms of that orthogonal basis and obtain a discrepancy of around \sqrt{m}. We could then string together a lot of sequences like that for a rapidly increasing sequence of primes p.
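
    The p\times p matrix can be written down directly with the Legendre symbol; if the diagonal (where x-y=0) is filled in with +1s, the pairwise inner products of distinct rows are still -1 when p\equiv 3 mod 4, because \chi(-1)=-1 makes the two diagonal corrections cancel. A sketch (names mine):

```python
def legendre(a, p):
    """the Legendre symbol (a|p) for an odd prime p (0 when p divides a)."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def qr_matrix(p):
    """p x p matrix of ±1s for a prime p ≡ 3 (mod 4): the (x, y) entry is
    +1 iff x - y is a quadratic residue mod p, with the diagonal set to +1.
    Each row is a cyclic shift of the previous one, and any two distinct
    rows have inner product -1."""
    return [[1 if legendre(x - y, p) >= 0 else -1 for y in range(p)]
            for x in range(p)]
```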

    Actually, I think this argument can be turned into a correct one if we simply repeat the sequence a couple of times so that a segment of it has the orthogonality properties I want, but I haven’t checked the details.

    If all that is correct, then we need to impose extra conditions on the sequence (a_n) to make it interesting if we can prove unbounded discrepancy.

    • gowers Says:

      As a PS to that last remark, if it is correct, it also kills off the permutations version, since we can pick a permutation in such a way that \pi(1),\pi(2),\dots,\pi(N) is any sequence we like, and then we can go back and “fill up” much later.

    • gowers Says:

      And a PPS that kills the question off completely. There was no need to use Paley matrices. All we have to do is make sure that for every n and every subset E\subset\{1,2,\dots,n\} there exists r such that for every m\leq n, 2^{r+m} is a term in the sequence if and only if m\in E. In other words, roughly speaking you make sure that the set of powers of 2 you choose contains a copy of every finite set. Then for every \pm 1 sequence, to show that it has discrepancy at least n, pick N so large that all subsets of \{1,2,\dots,2n\} have appeared by 2^N. Then we can dilate the sequence so that it either picks out exactly the 1s in the set 2^{2N+1},\dots,2^{2N+2n} or all the -1s. At least one of those will give us a discrepancy of n.

      The remark just above shows that this deals with the permutations version as well.

  3. gowers Says:

    The comment above and its subcomments dispose of the questions in the last paragraph. The following extra conditions may possibly rescue them. In the permutations case, one can insist that n/2\leq\pi(n)\leq 2n for every n, and in the sequence case one can insist that a_n\leq 2n.

    In both cases, the conditions are there to rule out the appearance of long sequences of powers of 2 (or of any number for that matter). I hope they are a good combination of quite weak from a structural point of view, but strong enough to rule out all counterexamples that use the basic set of ideas that showed that the questions as originally formulated were uninteresting.

  4. Sune Kristian Jakobsen Says:

    “The proof that the diagonal decomposition implies EDP actually proves a much stronger result. It gives us a function d(n)=D_{nn} such that \sum_nd(n)=\infty such that if \epsilon=(\epsilon_n) is any sequence of real numbers (not necessarily \pm 1-valued) with \sum_nd(n)\epsilon_n^2\geq 1, then \epsilon has unbounded HAP-discrepancy.”
    If the d(n)’s sum to infinity, there is a finite subset that sums to at least 1. By setting \epsilon_n=1 for all n in this set and 0 for all other n, we get a sequence with bounded discrepancy (since it has only finitely many non-zero terms) that satisfies \sum_nd(n)\epsilon_n^2\geq 1. I don’t know if you meant the last formula to be \sum_nd(n)\epsilon_n^2=\infty?

    What is the simplest non-trivial (that is, C>1) example we know of a decomposition of a diagonal matrix?

    • gowers Says:

      I seem to remember that we basically didn’t have a non-trivial example: I think finding one may be a rather hard problem (related to the fact that there are long sequences with small discrepancy even in the \pm 1 case and therefore quite possibly even longer ones in the two-vector-valued-sequences case). So if we want to obtain partial results, it might be better instead to try to find interesting decompositions of matrices that are “reasonably well concentrated on the diagonal”, whatever that might mean.

    • Sune Kristian Jakobsen Says:

      I thought there was a computer search for such decompositions? Someone used linear programming to find possible values on the diagonal. Didn’t we have a decomposition of the corresponding diagonal matrix?

    • gowers Says:

      I think so, but in order to be able to regard the constant as non-trivial we had to be generous in our definition of HAP — allowing sets of the form \{ad,(a+1)d,\dots,bd\}. I think if you insist on sets of the form \{d,2d,\dots,bd\} then we didn’t find any examples with C>1. That’s a problem for any approach that might hope to use some kind of product construction, since one would like the examples to get better rather than worse when you take products!

    • Sune Kristian Jakobsen Says:

      Ah. So do we have simple non-trivial examples using the generalized HAPs? Or just using any sets? I can’t think of any examples.

    • gowers Says:

      My memory is that even with generalized HAPs we basically didn’t have any examples. But I’ll wait to see whether anyone else remembers the situation better. (In theory one could look it up I suppose.)

  5. Sune Kristian Jakobsen Says:

    \delta cannot be supported on the square-free integers, since the sequence +,-,-,0,+,+,-,0,+,-,-,0,\dots has discrepancy 1 and is zero only on square-divisible numbers (on multiples of 4, in fact). In general, as you also noted, the sum \sum_{d\nmid n} \delta(n) must be small for small d (it is easy to see that it must be bounded by d-1 using the sequence +,+,\dots,+,0,-,-,\dots,-,0,\dots where every d’th term is 0, but as I recall from the last time we worked on EDP, we conjectured (or proved?) it to be O(\sqrt{d})).

    On the other hand, we know that if (a_i) is a sequence with a_i\mid a_{i+1} for all i, then \sum_i \delta(a_i)\le 1, so we cannot choose \delta to be supported on e.g. the factorials.

  6. gowers Says:

    I’ve remembered why the strengthening of EDP that says that the common difference of the HAP can be taken as either a prime or a power of 2 had a chance of being true. (But it seems a bit outrageous, so I would stop short of conjecturing that it is true.) If you want to have bounded discrepancy on all HAPs with common difference an odd prime, then that’s easy. For example, the sequence 1, -1, 1, -1, … will do the job. More generally, so will lots of periodic sequences of even length: if the period is 2m then you need to ensure that, for every prime p dividing m, the number of residues mod 2m divisible by p at which the sequence takes the value 1 equals the number at which it takes the value -1. An example with m=3 is 1,-1,1,1,-1,-1 repeated over and over again.
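
    Both halves of this are easy to confirm numerically for the period-6 example: every odd prime is congruent to 1, 3 or 5 mod 6, and in each case the partial sums along the HAP stay at most 2, whereas along common difference 2 each block of three terms contributes -1, so the sums grow linearly. A sketch (names mine):

```python
EPS = [1, -1, 1, 1, -1, -1]                 # repeated with period 6

def eps(n):
    return EPS[(n - 1) % 6]

def max_hap_sum(d, length):
    """max over r <= length of |eps(d) + eps(2d) + ... + eps(rd)|."""
    best = s = 0
    for a in range(1, length + 1):
        s += eps(a * d)
        best = max(best, abs(s))
    return best
```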

    But no periodic \pm 1 sequence can have bounded discrepancy on all HAPs with common difference a power of 2. Indeed, if the period is m and 2^k is the largest power of 2 that divides m, then the restriction of the sequence to the HAP of common difference 2^k is periodic with odd period, so has unbounded sum.

    The Morse sequence is the natural example of a sequence with bounded discrepancy on all HAPs with common difference a power of 2. But the Morse sequence has unbounded discrepancy on other HAPs. (I can’t remember whether we checked this for primes, but I think we did.)
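
    Both claims about the Morse sequence can be checked quickly: along a HAP of common difference 2^k the restriction is the Morse sequence itself (the binary digit sum satisfies s(2^k a)=s(a)), so the partial sums stay bounded, while along common difference 3 the sums grow like a power of the length (this is Newman’s phenomenon). A sketch (names mine):

```python
def morse(n):
    """±1 Thue-Morse: (-1)^(number of 1s in the binary expansion of n)."""
    return -1 if bin(n).count("1") % 2 else 1

def max_partial(d, length):
    """max over r <= length of |morse(d) + morse(2d) + ... + morse(rd)|."""
    best = s = 0
    for a in range(1, length + 1):
        s += morse(a * d)
        best = max(best, abs(s))
    return best
```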

    The fact that the strengthening isn’t obviously false suggests the following question: what sorts of matrices can we make out of products of HAPs of prime common difference? I’ll stop this comment and think about that question in a new comment.

  7. gowers Says:

    I now want to think about what we can do if we take combinations of HAP products where the HAPs have prime common differences. I’ll start with a rather simple case, where we choose weights w(p) for each odd prime and take the sum \sum_pw(p)A_p\otimes A_p, where A_p is 1 on all multiples of p and 0 everywhere else. The value of this sum at the point (x,y) is \sum_{p|(x,y)}w(p). Can we choose the weights w(p) in a way that makes this sum “nice”?

    One set of weights that springs to mind straight away is to take w(p)=\log p. In fact, I’d rather take all HAPs with prime power difference and take w(p^k)=\log p. This is using the von Mangoldt function \Lambda(n) (defined to be \log p at p^k and 0 at all integers that are not prime powers). That is, we are taking the sum \sum_{d|(x,y)}\Lambda(d), which is easily checked to equal \log((x,y)).
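
    The identity \sum_{d|n}\Lambda(d)=\log n can be confirmed directly (names mine):

```python
from math import log, isclose

def von_mangoldt(n):
    """log p if n = p^k for a prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)            # n itself is prime

def divisor_sum(n):
    """sum of von_mangoldt(d) over the divisors d of n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)
```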

    The level sets of this function are sets of the form \{(x,y):(x,y)=d\} if you’ll excuse the confusing notation (the first (x,y) is an ordered pair and the second the hcf of x and y). They are of some moderate interest in that pairs on the diagonal tend to have much bigger highest common factors than pairs off the diagonal.

    The matrix we have obtained is not diagonal, but we can still ask how efficient the decomposition is. The total mass on the diagonal up to N is roughly \sum_{p^k\leq N}N\log p/p^k. The number of powers of p up to N is roughly \log N/\log p, but … hmm … that’s killed by the p^{-k} term. Up to a constant I think this is \sum_{p\leq N}N\log p/p, which is around N\sum_{m\leq N}\log m/(m\log m)=N\log N. (I’m not certain that was correct.) As for the sum of the coefficients, the sum over all powers of p is roughly \log p(\log N/\log p)=\log N, so the total sum is roughly N\log N/\log N=N. So it looks as though we would have an efficient decomposition if only we could pretend that the off-diagonal terms didn’t exist.
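
    These two estimates are easy to confirm numerically: the diagonal mass is \sum_{x\leq N}\log\gcd(x,x)=\sum_{x\leq N}\log x\approx N\log N, while the sum of the coefficients is Chebyshev’s \psi(N)=\sum_{p^k\leq N}\log p\approx N. A rough check (names mine):

```python
from math import log

def chebyshev_psi(N):
    """psi(N) = sum of log p over prime powers p^k <= N,
    computed via a smallest-prime-factor sieve."""
    spf = list(range(N + 1))
    p = 2
    while p * p <= N:
        if spf[p] == p:
            for q in range(p * p, N + 1, p):
                if spf[q] == q:
                    spf[q] = p
        p += 1
    total = 0.0
    for d in range(2, N + 1):
        m, q = d, spf[d]
        while m % q == 0:
            m //= q
        if m == 1:                         # d is a prime power q^k
            total += log(q)
    return total

def diagonal_mass(N):
    """sum over x <= N of log gcd(x, x), i.e. sum of log x."""
    return sum(log(x) for x in range(1, N + 1))
```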

    What can we learn from a decomposition of a non-diagonal matrix M into HAP products? If M=\sum_i\lambda_iu_i\otimes v_i, then we can still look at \langle \epsilon,M\epsilon\rangle in two different ways. The fact that the sum of the coefficients above was at most N tells us that if \sum_{x,y}M(x,y)\epsilon(x)\epsilon(y) is at least CN, then the sequence \epsilon must have HAP discrepancy at least C^{1/2}. Recalling that M(x,y)=\log((x,y)), that tells us that for a sequence \epsilon to have bounded discrepancy on all HAPs with prime-power common difference, the quantity N^{-1}\sum_{x,y\leq N}\log((x,y))\epsilon(x)\epsilon(y) must be bounded.

    I’ll stop this comment here, because I have no idea whether that is a difficult condition for a sequence to satisfy.

    • gowers Says:

      A quick remark about that last question — one could consider attacking it using Moses Charikar’s idea. That is, one would like to try to subtract some mass from the diagonal and still have a matrix that is non-negative definite. Can that be done?

    • Sasho Nikolov Says:

      Just the obvious remark: since a diagonally dominant matrix is PSD, one question to ask is how much can we subtract from the diagonal while the remaining matrix is still diagonally dominant.

      So instead of a decomposition of a diagonal matrix, it is enough to look for a decomposition of a strongly diagonally dominant matrix, i.e. a decomposition of a symmetric matrix M where each diagonal entry M(x, x) satisfies M(x, x) - f(x) \geq \sum_{y \neq x}|M(x, y)| for some f(x) \geq 0. The goal is to maximise \sum_xf(x) with respect to the coefficients in the decomposition.

  8. gowers Says:

    It occurs to me that a useful thing we might do is describe all matrices that can be efficiently decomposed into products of HAPs of common difference 1. If we understood that properly, then for the primes/powers-of-2 question, we would be in a position where we needed to choose just one matrix (of a kind we understood) for each possible common difference. There would be a similar benefit for EDP itself, though possibly one that would be harder to use.

    Let me be clear what I mean by an “efficient decomposition” in this context. I mean a decomposition M=\sum_i\lambda_iu_i\otimes v_i such that \sum_i|\lambda_i| is significantly less than \sum_xM_{xx}. That is, the sum of the absolute values of the coefficients of the products of HAP-functions is significantly less than the sum along the diagonal of the resulting matrix. If u_i and v_i are both the characteristic function of P, then the contribution to the diagonal is \lambda_i|P|, so a rough way of thinking about efficient decompositions is to say that we mostly use HAPs of length significantly greater than 1, and the sum of the positive contributions is not cancelled out by the sum of the negative contributions.

    If all the HAPs have common difference 1, then one thing we know about the resulting matrix M is that its inner product will be small with any matrix of the form A_{xy}=\epsilon(x)\epsilon(y), where \epsilon is a sequence that has bounded partial sums. Note that this very strongly rules out M being diagonal. However, it doesn’t rule out M living in a band around the diagonal: indeed, if we take a sum of the form \sum_i\lambda_iu_i\otimes u_i where each u_i is an interval of length m, then the resulting matrix will be efficiently decomposed and will live in a band of width m around the diagonal.

    As an exercise, can we characterize infinite sequences that have inner product at most C with all sequences that have partial sums bounded above (in absolute value) by 1? It seems to say something about the amount of oscillation that’s allowed. Indeed, a necessary condition is that for any m_1<n_1<m_2<n_2<\dots the sum \sum_i|a_{n_i}-a_{m_i}| should be at most C. In fact, what it seems to boil down to is that the sum \sum_n|a_{n+1}-a_n| should be bounded.

    The condition for matrices rather than sequences will probably be a bit more complicated, but I think it should be possible to work it out.

  9. gowers Says:

    Come to think of it, maybe the condition for matrices is the obvious generalization of the condition for sequences. Could it be that a matrix M can be efficiently decomposed as a linear combination of products of intervals (of integers) if and only if the sum of all |M(x+1,y+1)-M(x,y+1)-M(x+1,y)+M(x,y)| is small? If M is itself a product of intervals, then that sum evaluates to 4 (a contribution of 1 coming from each corner of the rectangle), so if M is a linear combination of such products then the sum is at most 4 times the sum of the absolute values of the coefficients. So the condition is necessary. But it remains to see whether it is sufficient: can every matrix for which that sum is bounded be decomposed efficiently into products of intervals?

  10. gowers Says:

    To try to tackle that question, let me do the 1D case first, since I haven’t actually done so. If we have an arbitrary sequence, we can express it as a sum of initial segments of \mathbb{N} using something like partial summation. If the sequence is a_1,a_2,\dots,a_N, and I_n is the set \{1,2,\dots,n\} (or rather its characteristic function), then the sequence can be written as a_NI_N+(a_{N-1}-a_N)I_{N-1}+\dots+(a_1-a_2)I_1.

    In fact, that isn’t even remotely clever: we just observe that the sequence that is 1 in the nth place and 0 everywhere else can be written as I_n-I_{n-1} and see what we get.
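
    In code the observation is just Abel summation (names mine):

```python
def segment_coefficients(a):
    """Abel summation: for a = (a_1, ..., a_N), returns {n: c_n} with
    a = sum_n c_n * (indicator of {1, ..., n}), i.e. c_n = a_n - a_{n+1}
    (reading a_{N+1} = 0)."""
    N = len(a)
    return {n: a[n - 1] - (a[n] if n < N else 0) for n in range(1, N + 1)}

def eval_combination(c, k):
    """the value at position k of sum_n c[n] * (indicator of {1, ..., n})."""
    return sum(v for n, v in c.items() if k <= n)
```

    The sum of the |c_n| is exactly the total oscillation \sum_n|a_{n+1}-a_n| plus the boundary term |a_N|, which ties in with the bounded-oscillation condition above.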

    Can we do something similar in 2D? Let’s write I_{m,n} for the set of all (x,y)\in\mathbb{N}^2 such that x\leq m and y\leq n. Then I_{m,n}-I_{m-1,n}-I_{m,n-1}+I_{m-1,n-1} takes the value 1 at (m,n) and 0 everywhere else.

    If we now take a function f in two positive integer variables, then the sum we end up looking at is \sum_{m,n}f(m,n)(I_{m,n}-I_{m-1,n}-I_{m,n-1}+I_{m-1,n-1}), which works out as \sum_{m,n}(f(m,n)-f(m,n+1)-f(m+1,n)+f(m+1,n+1))I_{m,n}. In particular, if the sum \sum_{m,n}|f(m,n)-f(m,n+1)-f(m+1,n)+f(m+1,n+1)| is at most C, then there is a decomposition of f into products of intervals, with sums of absolute values of coefficients at most C.
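
    The two-dimensional version can be verified directly as well (names mine):

```python
def rectangle_coefficients(f, M, N):
    """f is a function on {1..M} x {1..N} (treated as 0 outside); returns
    {(m, n): c} with f = sum c * (indicator of {1..m} x {1..n}), where c
    is the second difference f(m,n) - f(m,n+1) - f(m+1,n) + f(m+1,n+1)."""
    g = lambda x, y: f(x, y) if 1 <= x <= M and 1 <= y <= N else 0
    coeffs = {}
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            c = g(m, n) - g(m, n + 1) - g(m + 1, n) + g(m + 1, n + 1)
            if c:
                coeffs[(m, n)] = c
    return coeffs

def eval_2d(coeffs, x, y):
    """the value at (x, y) of the combination of products of initial segments."""
    return sum(c for (m, n), c in coeffs.items() if x <= m and y <= n)
```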

  11. gowers Says:

    I was aware while writing that that I had done precisely the same calculation when we last worked on EDP. Indeed, I feel myself being drawn inexorably towards an approach we had to the problem back then, which is worrying, because that approach seemed pretty hard to make work.

    The observation back then was that if we work in the rationals, then we can take any decomposition of a diagonal map that we might have and average it with a whole lot of dilates of the same decomposition. In that way, we can obtain a decomposition of a diagonal matrix that is approximately invariant under multiplication or division by small positive integers. If we look “locally”, then we can pretend that we have a decomposition of the identity over the rationals (that is, the function \delta defined on \mathbb{Q}^2 that’s 1 when x=y and 0 otherwise).

    What this is effectively saying (I think) is that we should be able to take a single matrix M made out of products of intervals and then add up all its dilates. As long as M is absolutely summable, this will give us a well-defined infinite matrix indexed by \mathbb{Q}. Its value at (x,y) will be M(x,y)+M(2x,2y)+M(3x,3y)+\dots. So what we’d like is a matrix with the two-dimensional bounded-total-oscillation property from the last comment that sums to zero along all “rays” in \mathbb{Z}^2. This feels like a much easier thing to look for than a general decomposition of a diagonal matrix into products of HAP-functions, but is it? I’ll think about that in a new comment.

  12. gowers Says:

    I need to correct what I said above. To work out the value at (x,y) we first write (x,y) as (rz,sz) with r and s coprime positive integers, and then the value at (x,y) is M(r,s)+M(2r,2s)+\dots. So that has to be zero whenever r and s are coprime and not both equal to 1.

    This won’t work, but I’d nevertheless like to think about a function of the form M(x,y)=x^{-s}y^{-\overline{s}} for some complex s. That gives us x^{-s}y^{-\overline{s}}\zeta(t), where \zeta is the Riemann zeta function and t=s+\overline{s}. That is completely hopeless, since whether or not it is zero appears to have no significant dependence on the pair (x,y), and in any case \zeta(t)\ne 0.

    What was the thought that led to this hopeless idea? It was roughly this. If you take a function of the form \exp(i\lambda(x-y)) for, say, an irrational \lambda, then as long as x\ne y, the average along all multiples of (x,y) will be 0. Meanwhile, if x=y, then the average is 1. However, this function fails to do the job we want, because its total 2D oscillation is infinite.

    What I’d like is something similar but with a matrix whose entries are absolutely summable. To prove that the sum along non-trivial rays was zero, I would ideally like something like a multiplicativity property that would allow me to use essentially the same argument for all rays — apart, that is, from the main diagonal where we wouldn’t get zero because we wouldn’t have the oscillation.

    There may be a good number-theoretic reason for this being impossible, but suppose we had a completely multiplicative \pm 1-valued function \mu such that the Dirichlet series \sum_n\mu(n)n^{-s} had a zero at some real number s>1. What could we say about the function f(m,n)=\mu(m)\mu(n)(mn)^{-s}? If we summed it along the diagonal, we’d get \sum_n\mu(n)^2n^{-2s}, which would be positive. But if we summed it along multiples of (x,y) for some coprime pair, we would get \sum_n\mu(nx)\mu(ny)(n^2xy)^{-s}=\mu(x)\mu(y)(xy)^{-s}\sum_n\mu(n)^2n^{-2s}. This is again hopeless, since again it doesn’t in fact distinguish between points on the diagonal and points off it.

  13. gowers Says:

    On further reflection, it is silly to look at functions of the form f(x,y)=g(x)g(y) with g completely multiplicative, since the sum along multiples of (x,y) will be g(x)g(y)(g(1)^2+g(2)^2+\dots) which will not pick out the diagonal.

    What if g is not multiplicative? Is it too simple to look for a rank-1 function f(x,y)=g(x)\overline{g(y)}, say? Then the sum over multiples of (x,y) is g(x)\overline{g(y)}+g(2x)\overline{g(2y)}+\dots. For that to work, we would need the sequences s_x=(g(x),g(2x),g(3x),\dots) to be orthogonal to each other whenever x\ne y, and we would also want the function g to have bounded total oscillation and to tend to zero. That seems to be equivalent to being absolutely summable.

    Is there an absolutely summable sequence a_n defined on the positive integers such that for any pair of coprime positive integers x and y the sum \sum_na_{nx}\overline{a_{ny}} is zero? It seems like a lot to ask, and I think it might be reasonably easy to prove that no such sequence exists.

    • gowers Says:

      Realized later: having bounded total oscillation and tending to zero is not equivalent to being absolutely summable (which is good news for a number of reasons). For example, if (a_n) has bounded total oscillation and tends to zero and (b_n) is obtained by taking the first n_1 terms to equal a_1, the next n_2 terms to equal a_2, and so on, then (b_n) has bounded total oscillation and tends to zero. Also, any decreasing sequence that tends to zero trivially has bounded total oscillation.
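      A tiny numerical illustration of the last point (hypothetical, not from the comment): the decreasing sequence a_n = 1/n tends to zero and its total oscillation telescopes to a_1, yet it is not absolutely summable, since the harmonic series diverges.

```python
# Illustration (hypothetical): a_n = 1/n is decreasing to zero, so its total
# oscillation sum |a_{n+1} - a_n| telescopes to a_1 - a_N = 1 - 1/N, but the
# sequence is not absolutely summable (harmonic series ~ log N).
N = 100_000
a = [1.0 / n for n in range(1, N + 1)]

osc = sum(abs(a[i + 1] - a[i]) for i in range(N - 1))  # telescopes to 1 - 1/N
abs_sum = sum(abs(x) for x in a)                       # ~ log N + gamma

assert abs(osc - (1 - 1.0 / N)) < 1e-9
assert abs_sum > 10   # already far larger than the total oscillation
```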

      I think what was in the back of my mind was that if you want to choose moduli such that choosing even quite random-looking signs results in bounded total oscillation, then you’d better make the moduli have a finite sum.

  14. gowers Says:

    Although the condition on the sequence is pretty strong, it might be worth at least attempting to obtain it with the help of a greedy algorithm or just-do-it argument. For example, perhaps one could fix some s and take g(x) to be of the form \theta(x)x^{-s} for some bounded (maybe even \pm 1-valued) sequence \theta. If s was only just bigger than 1, then it is just conceivable that one could keep track of all the sums g(x)\overline{g(y)}+g(2x)\overline{g(2y)}+\dots that are required to be zero, and ensure that they are all tending to zero.

  15. gowers Says:

    How easy is it for a non-trivial sequence a_n to be absolutely summable and to have the property that \sum_na_{dn}=0 for every d? For that it would be enough to find a completely multiplicative \pm 1-valued sequence \epsilon and a real number s>1 such that \sum_n\epsilon(n)n^{-s}=0. And by the intermediate value theorem, it would be enough to find s such that that sum is negative, since it will tend to 1 as s\to\infty.

    But let’s try a greedyish algorithm. I’ll start with a_1=1. To compensate for that I’ll take a_2=-1. Now I’m fine for d=1 but not so fine for d=2. To compensate for a_2 I could take a_4=1, but now I’ve messed up d=4, as well as d=1 again. That’s more or less inevitable, but if I keep going like this I’m not going to get an absolutely summable sequence.

    So to make things a bit more hopeful, let me start with a_1=1 and then take a_2=a_3=-1/2. The advantage of this is that I can correct the d=2 and d=3 problems by taking a_6=1/2. However, let me learn from my earlier mistakes and take a_6=a_{12}=1/4. Now I have problems with d=1 (so far the sum is 1/2), d=6 (so far the sum is 1/2) and d=12 (so far the sum is 1/4). As for the sum of the absolute values, I’ve got up to 1+1/2+1/2+1/4+1/4. I must stop for today, but I have the feeling that pursuing a construction like this I can get an example. Of course, it’s just a 1D example whereas I’m looking for something 2D-ish. But it might be a useful start.
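    The bookkeeping in that last paragraph is easy to get wrong, so here is a quick check (hypothetical code) of the stated partial sums for the construction so far, i.e. a_1=1, a_2=a_3=-1/2, a_6=a_{12}=1/4 and all other terms 0:

```python
# Verify the partial ray sums for the greedy construction so far:
# a_1 = 1, a_2 = a_3 = -1/2, a_6 = a_12 = 1/4, everything else 0.
from fractions import Fraction as F

a = {1: F(1), 2: F(-1, 2), 3: F(-1, 2), 6: F(1, 4), 12: F(1, 4)}

def ray_sum(d):
    # current value of sum_n a_{dn}
    return sum(v for n, v in a.items() if n % d == 0)

assert ray_sum(2) == 0 and ray_sum(3) == 0            # already corrected
assert ray_sum(1) == F(1, 2)                          # still to fix
assert ray_sum(6) == F(1, 2) and ray_sum(12) == F(1, 4)
assert sum(abs(v) for v in a.values()) == F(5, 2)     # 1 + 1/2 + 1/2 + 1/4 + 1/4
```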

  16. gowers Says:

    Another little thought. Suppose that s is a zero of the Riemann zeta function. What can we say about the total oscillation of the sequence 1^{-s},2^{-s},\dots? If s=1/2+it, then n^{-s}=n^{-1/2}\exp(-it\log n). The sequence spirals in towards zero. Speaking roughly, there will be some number \alpha>1 such that between successive powers of \alpha the sequence does one full rotation. So the total oscillation is going to be something like 2\pi(\alpha^{-1/2}+\alpha^{-1}+\alpha^{-3/2}+\dots). In particular, it is finite.
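    One can also see the finiteness directly: |(n+1)^{-s}-n^{-s}| is at most |s|n^{-3/2}, which is summable. Here is a numerical check (with an arbitrarily chosen t, purely for illustration):

```python
# Numerical check (hypothetical parameter t): the total oscillation of n^{-s}
# for s = 1/2 + it is finite, since |(n+1)^{-s} - n^{-s}| <= |s| n^{-3/2}.
import cmath

s = 0.5 + 14.134725j   # t chosen arbitrarily for illustration

def term(n):
    return cmath.exp(-s * cmath.log(n))   # n^{-s}

def osc(N):
    # partial total oscillation sum_{n<N} |(n+1)^{-s} - n^{-s}|
    return sum(abs(term(n + 1) - term(n)) for n in range(1, N))

# the tail from 10000 to 20000 is bounded by 2|s|(10000^{-1/2} - 20000^{-1/2})
assert osc(20000) - osc(10000) < 0.1
```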

    And now, if we sum along multiples of d we get d^{-s}\zeta(s)=0.

  17. gowers Says:

    If a series sums to zero and has bounded total oscillation, must it converge absolutely? I don’t think so. Let’s take the series \sum_n\epsilon(n)/n, where \epsilon(n)=\pm 1 and changes sign at n_1,n_2,n_3,\dots. This series doesn’t converge absolutely. The total oscillation is, up to an additive constant, n_1^{-1}+n_2^{-1}+\dots. To ensure that the sums \sum_{n=n_k+1}^{n_{k+1}}n^{-1} tend to zero as k\to\infty, we need n_{k+1}=\alpha_kn_k for some sequence \alpha_k that tends to 1. That is, the n_k must grow more slowly than exponentially. But there is no problem in doing that while keeping the sum of their reciprocals bounded. For instance, we can take n_k=k^2. That gives us a series that converges. Adding an extra term at the beginning, we can get the sum to be zero.
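    A numerical check of this example (hypothetical code, with the sign changes at the perfect squares n_k=k^2, implemented as \epsilon(n)=(-1)^{\lfloor\sqrt{n}\rfloor}):

```python
# Check: epsilon(n) = (-1)^isqrt(n) changes sign exactly at the squares n = k^2.
# The terms a_n = epsilon(n)/n are not absolutely summable, but the total
# oscillation stays bounded (each sign change at k^2 costs about 2/k^2),
# and the partial sums settle down.
import math

N = 200_000
a = [0.0] + [(-1) ** math.isqrt(n) / n for n in range(1, N + 1)]

osc = sum(abs(a[n + 1] - a[n]) for n in range(1, N))

partial, vals = 0.0, []
for n in range(1, N + 1):
    partial += a[n]
    if n % 50_000 == 0:
        vals.append(partial)

assert osc < 5                           # bounded total oscillation
assert abs(vals[-1] - vals[0]) < 0.05    # partial sums are converging
assert sum(abs(x) for x in a) > 10       # but no absolute convergence
```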

  18. gowers Says:

    I’ve just remembered something from the last time we tried EDP, which suggests that one has to be a little bit careful with this approach. It’s that it is not hard to give a non-trivial example of a linear combination f of products of intervals, with absolutely summable coefficients, such that \sum_nf(nx,ny)=0 whenever (x,y)=1 and x and y are not both 1. The rough idea is this. First you take the interval \{1,2,\dots,N\} for some large N and multiply it by itself. You now have N 1s on the diagonal and N^2-N 1s off the diagonal. Now pick a pair (x,y) of coprime integers that are at most N. Suppose that the sum along multiples of (x,y) is k. We can make it zero by picking very long intervals I and J in such a way that I\times J contains m multiples of (x,y), and attaching a coefficient of -k/m to this product. Moreover, we can make m as large as we want, so we can make the coefficient as small as we want.

    Once we have made this correction, we can go on and correct other pairs. Moreover, once we have ensured that the sum along multiples of (x,y) is zero, we can be careful to avoid those multiples in all future products of intervals that we use — just by making those intervals start a very long way away from 0.

    So in a rather trivial way we can find an infinite matrix that’s a linear combination of HAP products with long HAPs and coefficients whose absolute values sum to at most 1, and we can do so in such a way that the sum along the diagonal is at least C and the sum along multiples of (x,y) is 0 whenever x and y are coprime and not both 1.

    I remember getting very excited about this, and then realizing that it doesn’t prove anything. I need to check why it doesn’t imply EDP, since that will make it necessary to impose an extra condition on the matrix M.

  19. gowers Says:

    OK, suppose we have an infinite matrix M, indexed by the positive integers, with the following properties.

    (i) M can be written as \sum_i\lambda_iu_i\otimes v_i, where u_i and v_i are characteristic functions of intervals of integers and \sum_i|\lambda_i|\leq 1.

    (ii) \sum_nM(n,n)\geq C.

    (iii) If x and y are coprime and not both equal to 1, then \sum_nM(nx,ny)=0.

    Let me try to deduce EDP from the existence of M. Barring a major surprise, I will fail, but I want to understand why.

    Incidentally, there are two reasons that it would be extraordinary if the proof worked. The first is that I realized last time that it didn’t work. But that on its own is not a good enough reason — it’s easy to make mistakes when deciding that something can’t work. A better reason is that there is something too simple about this approach. A solution to EDP ought to involve more work, and I can’t escape the feeling that if an approach like this worked, then it would prove a number of other statements that we know to be false. (However, I don’t have a way of making that hunch precise.)

    The first thing I’m going to do is create a matrix A out of M, this one indexed by the positive rationals. To do this, for each rational q define M_q by the formula M_q(x,y)=M(x/q,y/q), and then let A=\sum_{q\in\mathbb{Q}_+}M_q. If z is the largest rational that goes into both x and y, then (x,y)=(rz,sz) for a pair r and s of coprime integers. Then M_q(x,y)=M(rz/q,sz/q), which is non-zero only if z/q is an integer (since both rz/q and sz/q are integers and z/q is an integer combination of them). The values of q for which z/q is an integer are z, z/2, z/3,\dots, at which M_q(x,y) takes the values M(r,s), M(2r,2s), M(3r,3s),\dots. These sum to zero unless x=y, by property (iii) of M, while if x=y, so that r=s=1, they sum to a constant that is at least C, by property (ii).

    What we have here is a way of expressing the identity matrix indexed by \mathbb{Q}_+ as a linear combination of HAP products with coefficients that are in some sense “small”. The sense in which they are small is that each matrix M_q contributes more to the diagonal than it does to the sum of the absolute values of the coefficients, by a factor of at least C.

    The hope was to use some kind of limiting argument to show that this clean and tidy infinite situation could be approximated by a less clean and tidy large finite situation. It was there that the trouble arose. I’ll start a new comment to think about it.

  20. gowers Says:

    Let’s suppose I’ve got a \pm 1-valued function \epsilon defined on \mathbb{Q}_+. Ah, here’s where things got a bit hairy. As usual, let’s calculate \langle\epsilon,A\epsilon\rangle. On the one hand, since A is just C times the infinite identity matrix (indexed over \mathbb{Q}_+) we get \infty. On the other hand, if \epsilon has discrepancy at most D on every HAP, then the inner product \langle\epsilon,u\otimes v\epsilon\rangle is at most D^2 for any two HAP functions u and v, so the contribution to \langle\epsilon,A\epsilon\rangle from each M_q is at most D^2. That is, |\langle\epsilon,M_q\epsilon\rangle|\leq D^2 (by property (i) of the matrix M).

    So, speaking extremely loosely, on the one hand we get a contribution of C for every positive rational, since the diagonal contribution is \sum_{q\in\mathbb{Q}_+}C, and on the other hand we get a contribution of at most D^2 for every positive rational, since the sum is \sum_{q\in\mathbb{Q}_+}\langle\epsilon,M_q\epsilon\rangle.

    It would be great if this were some kind of contradiction when D^2<C, but clearly for that we need to go into a lot more detail about how things tend to infinity. I’ll turn to that in a new comment.

  21. gowers Says:

    One easy way to make everything finite is to take for \epsilon not a function that is \pm 1-valued everywhere, but rather a function that is \pm 1-valued on some very large finite set of positive rationals. To give this any chance of working, we’d want the set to be “approximately closed” under multiplication by rationals with small numerator and denominator.

    We know how to define such sets, but how big do we want “small” to be in this context? Here is where it gets problematic. One might think that it would be a good idea if numerators and denominators were allowed to go up to some N such that in some useful sense “most of M” is defined on \{1,2,\dots,N\}^2. But what does “most of” mean here? It can’t refer to the sums of the absolute values of the coefficients \lambda_i, since we can make all but the first one arbitrarily small.

    At this point one notices that even if the coefficients of the HAP products are small, the HAP products themselves are large in some pretty natural norms. For example, the Hilbert-Schmidt norm of m^{-1}P\otimes Q, if |P|=|Q|=m, is 1, and we were in a situation where the length of the HAPs we needed to take was proportional to the reciprocal of the coefficient we attached to the product.

  22. gowers Says:

    I don’t want to throw away these ideas completely, however. Let’s make sure everything is finite by looking at a very large positive integer N, and let’s go back to EDP where the HAPs have to have common differences that are either primes or powers of 2.

    I’ll say that a matrix M has total oscillation at most C if \sum_{x,y}|M(x+1,y+1)-M(x,y+1)-M(x+1,y)+M(x,y)|\leq C. I’ll also call a matrix M' a dilate of a matrix M if M'(dx,dy)=M(x,y) and M'(u,v) is zero unless both u and v are multiples of d.

    I would like to find a matrix M_d for each allowable common difference (that is, a prime or a power of 2) such that

    (i) each M_d is an m_d\times m_d matrix, where m_d=\lfloor N/d\rfloor;

    (ii) the sum of the total oscillations of the M_d is at most 1;

    (iii) if M_d' is the dilate of M_d by a factor d, then \sum_dM'_d is a diagonal matrix with trace at least C.

    Now this looks really pretty hard if almost all the d have to be prime, so I would like to try to argue that it cannot be done. If it can’t, then we might manage to find a very interesting sequence with bounded discrepancy (for one of the modified problems). If it can, then we’re in great shape for EDP.

  23. gowers Says:

    Here’s the intuitive, but very possibly wrong, reason that it seems to me not to be possible to find a set of matrices of the above form. If each M_d has low total oscillation, then there ought to be a fairly large rectangle inside which all the M_d' are roughly constant (or rather, roughly constant when both coordinates are multiples of d and zero elsewhere).

    Now let’s just look at primes and let’s suppose that the M_d are actually constant matrices. Then the value of the sum of the M_d' at (x,y) can be expressed by a formula of the form \sum_{p|(x,y)}\lambda_p. But that expression just doesn’t look as though it can be anything like constant: for instance, if we restrict attention to small primes, then in any large rectangle we can find pairs x and y such that the highest common factor is a product of all small primes for which \lambda_p is large and positive, and other pairs that are coprime.

    The weaknesses in that argument are that larger primes do exist, and they make things more complicated, and also that, as I noted earlier, one can impose quite a lot of oscillation on, say, the sequence n^{-1}, while still having a bounded total oscillation. So the large rectangle doesn’t obviously exist.

  24. gowers Says:

    I’ve just realized something else, which makes the whole discussion more complicated. I was implicitly assuming that we would try to decompose a diagonal matrix as a linear combination of products of HAPs with the same common difference. But if we do that, then for every prime p there are lots of points (x,y) that are contained only in products where the common difference is p, since a positive proportion of all points (px,py) have highest common factor p.

    I think this means that for the primes and powers-of-two problem one basically has to take products of HAPs with different common differences.

  25. gowers Says:

    I want to test that last claim more carefully. Suppose that we cannot represent any diagonal matrix of trace at least C as a linear combination of products of HAP functions where the two HAPs in each product have the same common difference that belongs to a set D and the sum of the absolute values of the coefficients is at most 1. The set of diagonal matrices of trace at least C is convex, as is the set of all linear combinations of products of HAP functions with the properties just specified, so there must be a separating functional \phi, which we can regard as another matrix. If \langle\phi,M\rangle\geq 1 for every diagonal matrix of trace at least C, then every diagonal entry of \phi must be at least C^{-1}. But also, no two diagonal entries can be distinct, because then we could add something to one diagonal entry of M and subtract the same amount from another, in such a way as to make the inner product with \phi whatever we like. So the diagonal entries of \phi must all equal some constant, and that constant must be at least C^{-1}.

    Now let’s think about what it says about \phi if \langle\phi,M\rangle\leq 1 for every linear combination of products of HAP functions, where the two HAPs in each product have the same common difference that belongs to a set D and the sum of the absolute values of the coefficients is at most 1. That will be the case if and only if it is the case for each individual product. So we are asking for |\sum_{x\in P,y\in Q}\phi(x,y)|\leq 1 whenever P and Q are HAPs that both have common difference d, where d is required to belong to D.

    This is a kind of 2D version of EDP. If we scale things up, then it asks the following. Do there exist a constant C and a function f:\mathbb{N}^2\to\mathbb{R} such that

    (i) for every x, f(x,x)=1;

    (ii) for every d\in D and every pair P,Q of HAPs of common difference d, the sum \sum_{(x,y)\in P\times Q}f(x,y) has absolute value at most C?

    In particular, do these exist if D is the set of all primes or powers of 2? If so, then for that problem we are forced to consider products of HAPs of different common differences.

    Maybe this is an interesting problem to add to our collection of EDP-related questions, but I think there may turn out to be a fairly simple example that has the given properties.

  26. gowers Says:

    Hmm, there is indeed a reasonably simple example. Start with a periodic doubly infinite sequence (a_n) of period 9 with the following properties.

    (i) a_0=1.

    (ii) \sum_{n=1}^9a_n=0.

    (iii) a_0+a_3+a_6=0.

    An example is the sequence where a_0,\dots,a_8 are 1,0,0,-1,0,0,0,0,0. The discrepancy of this sequence along any HAP of common difference that is not a multiple of 9 is at most 1. Now let us define f(m,n) to be a_{m-n}. Then we certainly have property (i) from the previous comment. How about property (ii)? I’m fairly sure we get it with C at most something like 6, for all d that are not multiples of 9, that is.

    Here’s a quick proof. For every m and n the sums \sum_{r=0}^8f((m+r)d,nd) and \sum_{r=0}^8f(md,(n+r)d) are both zero. But the sum over P\times Q can be decomposed into a sum of these zero sums plus the sum over a product of HAPs of length at most 8. So the discrepancy is bounded (by a constant I can’t be bothered to work out — it’s trivially at most 64 but that can be improved quite a lot; in fact, it’s trivially at most 16; in fact, it’s trivially at most 8, but even that isn’t best possible).
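    Since the constant is easy to get wrong by hand, here is a brute-force check (hypothetical ranges) of the claim for f(x,y)=a_{(x-y)\bmod 9} with the sequence 1,0,0,-1,0,0,0,0,0, over all products P\times Q of HAPs with the same common difference d not a multiple of 9:

```python
# Brute-force check: f(x,y) = a_{(x-y) mod 9}, a = (1,0,0,-1,0,0,0,0,0),
# has bounded sums over products P x Q of HAPs with the same common
# difference d, for every d that is not a multiple of 9.
a = [1, 0, 0, -1, 0, 0, 0, 0, 0]

def f(x, y):
    return a[(x - y) % 9]

N = 72
worst = 0
for d in range(1, N + 1):
    if d % 9 == 0:
        continue
    L = N // d
    # prefix sums: S[i][j] = sum over P = {d,...,id}, Q = {d,...,jd}
    S = [[0] * (L + 1) for _ in range(L + 1)]
    for i in range(1, L + 1):
        for j in range(1, L + 1):
            S[i][j] = (f(i * d, j * d) + S[i - 1][j]
                       + S[i][j - 1] - S[i - 1][j - 1])
            worst = max(worst, abs(S[i][j]))

assert worst <= 8   # well below the trivial bound of 64
```

In fact the value of `worst` on this range comes out much smaller than 8, consistent with the reduction to a product of HAPs of length at most 8.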

    This shows that we can’t use products with the same common difference if D consists only of non-multiples of 9. I have a strong suspicion that we can do the same for non-multiples of m for an arbitrary m. All we need is numbers a_0,\dots,a_{m-1} such that a_0=1 and such that for every d that divides m apart from m itself we have a_0+a_d+\dots+a_{m-d}=0. It is trivially possible to satisfy those equations. Just choose the numbers arbitrarily except that each time you reach a_{m-d} for some proper factor d of m make sure that it cancels out the sum a_0+a_d+\dots+a_{m-2d}.
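    The “choose arbitrarily, then correct” recipe in the last paragraph can be sketched as follows (hypothetical code; the one wrinkle is that the forced entries a_{m-d} should be fixed for the largest divisors d first, since the sum for a divisor d can involve forced entries belonging to larger divisors):

```python
# Build a_0, ..., a_{m-1} with a_0 = 1 and a_0 + a_d + ... + a_{m-d} = 0 for
# every proper divisor d of m: fill the unconstrained entries arbitrarily,
# then fix a_{m-d} for each proper divisor d, largest d first.
from fractions import Fraction as F
import random

def build(m, rng):
    a = [None] * m
    a[0] = F(1)
    divisors = [d for d in range(1, m) if m % d == 0]
    forced = {m - d for d in divisors}   # indices reserved for the corrections
    for i in range(1, m):
        if i not in forced:
            a[i] = F(rng.randint(-3, 3))  # arbitrary choice
    for d in sorted(divisors, reverse=True):
        # a_{m-d} cancels the partial sum a_0 + a_d + ... + a_{m-2d}
        a[m - d] = -sum(a[j] for j in range(0, m - d, d))
    return a

rng = random.Random(0)
for m in (6, 9, 12):
    seq = build(m, rng)
    for d in (d for d in range(1, m) if m % d == 0):
        assert sum(seq[j] for j in range(0, m, d)) == 0
```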

    This shows that if we want to decompose a diagonal matrix efficiently into products of HAPs of the same common difference, and if we insist that the differences all belong to some set D, then D must contain multiples of every positive integer (or every small positive integer if we are thinking more quantitatively).

    But we’re still left with a 2D question related to EDP that might turn out to be interesting. Let f:\mathbb{N}^2\to\mathbb{R} be an arbitrary function such that f(n,n)=1 for every n. Is it true that for every C there must exist HAPs P and Q with the same common difference such that |\sum_{(x,y)\in P\times Q}f(x,y)|\geq C?

    If f(x,y) is forced to be of the form g(x)g(y), then this question reduces to EDP. Since we are looking at a far more general class of functions, and since EDP appears to be “only just true”, it seems highly likely that the answer to the above question is no. I just haven’t seen it yet.

  27. gowers Says:

    Suppose we restrict attention in the problem just discussed to functions f(m,n) of the form g(m-n). Let P=\{d,2d,\dots,ad\} and Q=\{d,2d,\dots,bd\}. Then \sum_{(x,y)\in P\times Q}f(x,y)=\sum_{(x,y)\in P\times Q}g(x-y). We can write this as \sum_{u=1}^a\sum_{v=1}^bg(d(u-v)). This equals \sum_z\phi_{a,b}(z)g(dz), where \phi_{a,b}(z) counts the pairs (u,v)\in\{1,\dots,a\}\times\{1,\dots,b\} with u-v=z; that is, \phi_{a,b} is the cross-correlation of the characteristic functions of the intervals \{1,\dots,a\} and \{1,\dots,b\}.
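    A small numerical check of that rewriting (hypothetical g, chosen arbitrarily):

```python
# Check: sum over P x Q of g(x - y) equals sum_z phi_{a,b}(z) g(dz), where
# phi_{a,b}(z) counts pairs (u, v) with 1 <= u <= a, 1 <= v <= b, u - v = z.
from collections import Counter

def g(n):
    # arbitrary test function on the integers
    return ((n * n + 3 * n) % 7) - 3

a, b, d = 5, 8, 3
lhs = sum(g(d * u - d * v) for u in range(1, a + 1) for v in range(1, b + 1))
phi = Counter(u - v for u in range(1, a + 1) for v in range(1, b + 1))
rhs = sum(cnt * g(d * z) for z, cnt in phi.items())
assert lhs == rhs
```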

    This gives us quite a strange EDP-like problem. We have a class \Phi of functions \phi_{a,b,d} (d being the amount by which we dilate \phi_{a,b}) and we’d like to find a function g, defined on \mathbb{Z}, that has a bounded inner product with every function in \Phi. Of course, something has to force g to be “large” (so we can’t just take the zero function). At first, the condition appears to be ludicrously weak — all we ask is that g(0) should equal 1 — but it isn’t weak because the functions \phi_{a,b,d} are not bounded at zero.

    As ever, we can dualize. It will be impossible to find such a function g if one can write the function \delta_0 (which takes the value 1 at 0 and 0 everywhere else) as a linear combination of the functions \phi_{a,b,d} in such a way that the absolute values of the coefficients can have arbitrarily small sum.

    Another way of looking at the dual problem is that we try to find an efficient matrix decomposition into HAP products where the common differences in each product are the same, but instead of asking for the resulting linear combination to be a diagonal matrix with large trace, we ask merely that the matrix has large trace and that all diagonals apart from the main diagonal sum to zero.

  28. gowers Says:

    Actually, I don’t see that taking HAPs with different common differences helps all that much. Suppose we have our numbers a_0,\dots,a_{m-1} extended periodically as in the previous comment but one and, as then, we define f(x,y) to be a_{x-y}. If d and d' are not multiples of m then for every (x,y) we have f(x,y)+f(x+d,y)+f(x+2d,y)+\dots+f(x+(m-1)d,y)=0 and f(x,y)+f(x,y+d')+\dots+f(x,y+(m-1)d')=0. Therefore, if P and Q are HAPs with common differences d and d', we can subtract from \sum_{(x,y)\in P\times Q}f(x,y) a whole lot of zero contributions until we’re left with the sum over a product of HAPs of length less than m. So the discrepancy is bounded.

    If that is correct, then we can set m=6 and prove that there is no efficient decomposition of the identity into products of HAPs whose common differences are prime powers. That ought to prove that some generalization of EDP is false for such HAPs. I’ll try to work out the details — I’m not quite sure that this is correct.

  29. gowers Says:

    Looking back, I see that the proof that the nonexistence of a HAP-product decomposition implies the non-symmetric vector-valued EDP is very simple but also slightly strange. You take a separating functional \phi as above, thought of as a matrix with 1s down the diagonal. You then take your two sequences of vectors to be the rows of the matrix and the unit vector basis. Then we have that \langle u_i,v_i\rangle=1 for every i and that \langle\sum_{i\in P}u_i,\sum_{j\in Q}v_j\rangle is equal to the sum of the matrix values over P\times Q. What’s odd about this is that the norms of the v_i tend to infinity, so we can’t use compactness. So it looks as though we can find, for any N, sequences of vectors u_1,\dots,u_N and v_1,\dots,v_N such that \langle u_i,v_i\rangle=1 for every i and \langle\sum_{i\in P}u_i,\sum_{j\in Q}v_j\rangle has modulus at most 100 (say) for every pair of HAPs P and Q with common difference not a multiple of 6. But it doesn’t seem to follow obviously from that that we can find infinite sequences with the same property.

  30. gowers Says:

    Maybe that problem doesn’t exist actually. Let e_0,\dots,e_5 be the standard basis of \mathbb{R}^6. Suppose we define u_i to be e_i and v_j to be \sum_{s=0}^5a_se_{s+j}. (All indices are mod 6.) Then \langle u_i,v_j\rangle=a_{i-j} for every i,j, exactly as we want. In other words, the periodicity allows us to collapse everything down to six dimensions.
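    The six-dimensional collapse is easy to verify directly (hypothetical coefficients a_s, chosen arbitrarily):

```python
# Check: with u_i = e_i and v_j = sum_s a_s e_{s+j} (indices mod 6),
# the inner product <u_i, v_j> equals a_{(i-j) mod 6}.
a = [2, -1, 0, 3, 1, -5]   # arbitrary test coefficients

def v(j):
    w = [0] * 6
    for s in range(6):
        w[(s + j) % 6] += a[s]
    return w

for i in range(6):
    for j in range(6):
        # <e_i, v_j> is the i-th coordinate of v_j
        assert v(j)[i] == a[(i - j) % 6]
```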

    So I’m pretty sure that is a very simple example that shows that the non-symmetric vector-valued EDP relies strongly on taking HAPs of every common difference. Or at least, you have to have multiples of every positive integer amongst your common differences.

    That doesn’t answer the question about whether EDP is true for common differences that are primes or powers of 2, but it says that a 6D generalization is false.

    It is genuinely unclear to me whether the 6D counterexample should lead one to believe that there is a 1D counterexample. In one dimension there is a parity argument (in any \pm 1 sequence of odd length the number of 1s cannot equal the number of -1s) that has no counterpart in higher dimensions. Maybe that is just a superficial fact that stops a certain example from working. But maybe all examples have to be “essentially periodic” in some sense, in which case there is a genuine distinction between 1D and higher dimensions.

    Perhaps we should try to find a complex example …

  31. gowers Says:

    Hmm … suppose you take a_n=\exp(2\pi in/6). Then if m is not a multiple of 6, then the discrepancy along HAPs (or even APs) of common difference m is at most 2. So the primes and powers-of-2 version of EDP does indeed fail in the complex case (as does the prime-powers version).
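    A quick numerical confirmation (over hypothetical finite ranges): the partial sums are geometric, so for common difference m not a multiple of 6 they are bounded by 2/|1-\exp(2\pi im/6)|\leq 2.

```python
# Check: for a_n = exp(2*pi*i*n/6), partial sums along any AP of common
# difference m not a multiple of 6 have modulus at most 2 (geometric series).
import cmath

w = cmath.exp(2j * cmath.pi / 6)

worst = 0.0
for m in range(1, 30):
    if m % 6 == 0:
        continue
    for start in range(1, 13):
        s = 0
        for k in range(200):
            s += w ** (start + k * m)
            worst = max(worst, abs(s))

assert worst <= 2 + 1e-9
```

The bound 2 is attained (up to rounding) when m \equiv \pm 1 \bmod 6.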

    What I think this (embarrassingly simple) observation shows is that if we want to prove EDP for primes and powers of 2, then we are forced to use in a strong way the fact that the sequences are \pm 1-valued: we can’t use matrix decompositions to prove the result, since they prove more general results that are false.

    And what that shows — at least if you believe that matrix decompositions are the most promising approach — is that we are not likely to make EDP easier if we try to prove a stronger result by restricting the common differences that are allowed. It seems to be important to use all common differences.

  32. gowers Says:

    Backtracking a bit, what about the earlier rather big strengthening of EDP that I was wondering about, if we don’t restrict the possible common differences? The question was this. Suppose you have a constant C and a function f:\mathbb{N}^2\to\mathbb{R} such that f(x,x)=1 for every x. Must there exist HAPs P and Q with the same common difference (but if that’s too much to ask, then I’m happy to drop that condition) such that |\sum_{(x,y)\in P\times Q}f(x,y)|>C?

    On writing that, I have suddenly had an idea for a counterexample for the same-common-difference version. I’ll describe it very roughly and think about it properly tomorrow. The rough idea is to paint \mathbb{N}^2 with diagonal stripes of constant width and alternating colours (the colours being 1 and -1), but at an angle whose tangent is a badly approximable irrational. The hope would be that if P and Q have the same common difference, then …

    No, I don’t think that works. Given any two real numbers \alpha and \beta we can find an integer d such that \alpha d and \beta d are both very close to an integer, and I think that fact can be used to kill off the idea I had.

    Actually, I think a counterexample to this can be turned into a counterexample to the non-symmetric vector-valued EDP with the added condition that P and Q have the same common difference. So maybe I believe that the answer is yes (but possibly without the same-common-difference condition). It also feels like a friendlier formulation of the problem, and one where it might be interesting to do some computer searches.

    Just in case anyone’s reading this (it feels a bit monomathic, so maybe nobody is), here’s a search that would interest me a lot. How large a \pm 1 matrix, or even just a matrix with 1s on the diagonal, is it possible to find such that the sum over every product P\times Q of HAPs has absolute value at most … well, whatever small number is most interesting? Are there very big matrices where it’s at most 2? What about 3?
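    A search of this kind would at least need an evaluator for the objective. Here is a sketch (hypothetical helper, using prefix sums) that, given a finite matrix, computes the largest |\sum_{P\times Q}| over all pairs of HAPs with the same common difference:

```python
# Given a finite matrix M (list of rows, indexed from 1 conceptually),
# compute max |sum over P x Q| over all pairs P, Q of HAPs with the same
# common difference d, via 2D prefix sums along the multiples of d.
def hap_product_discrepancy(M):
    N = len(M)
    worst = 0.0
    for d in range(1, N + 1):
        L = N // d
        # S[i][j] = sum over P = {d,...,id}, Q = {d,...,jd}
        S = [[0.0] * (L + 1) for _ in range(L + 1)]
        for i in range(1, L + 1):
            for j in range(1, L + 1):
                S[i][j] = (M[i * d - 1][j * d - 1] + S[i - 1][j]
                           + S[i][j - 1] - S[i - 1][j - 1])
                worst = max(worst, abs(S[i][j]))
    return worst

# sanity check: the identity matrix has discrepancy N (take P = Q, d = 1)
N = 10
I = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
assert hap_product_discrepancy(I) == N
```

The same evaluator could serve as the constraint oracle in the linear-programming formulation suggested in the replies below.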

    • Sasho Nikolov Says:

      Seems like a good problem to try with linear programming. I hope to find the time to try it.

    • gowers Says:

      That would be great. On further reflection, I think the \pm 1 version is not so interesting (not that you could attack that with linear programming anyway) — it seems to me that a counterexample to EDP could probably be turned into a counterexample to the \pm 1 version. So I prefer to stick with the condition that the matrix is 1 on the diagonal.

    • Sasho Nikolov Says:

      Do you mean the other way around: a counterexample (= low discrepancy function) to the \pm 1 version can be turned into a counterexample to EDP? Because the direction you wrote seems clear: just take f(a,b)=\chi(a)\chi(b), where \chi is the low discrepancy function.

      In fact, you constructed this problem as a weakening of vector-valued EDP. In the vector-valued case the function f(a,b) would need to be PSD in addition to satisfying f(a,a)=1 for all a. I am not totally clear at what step the weakening came in. We took the dual of vector-valued EDP, and then restricted it further by looking for a decomposition of a diagonal matrix as opposed to a decomposition of a PSD matrix. Taking the dual of a dual with additional constraints resulted in this weaker discrepancy problem, I suppose?

      One benefit is that this can be tried with linear programming and hopefully scale a little further than the semidefinite programming search.

    • gowers Says:

      Let me try to sort out what I meant, or at least should have meant. First, if you have a \pm 1-function on \mathbb{N}^2 of discrepancy at most C (with respect to products of HAPs of the same common difference), then for arbitrarily large N you can find a \pm 1-function defined on \{1,2,\dots,N\} of discrepancy at most 2C. That is because if you look at the N!th row of the 2D function, then any HAP of common difference at most N along that row can be expressed as the difference between two products of HAPs of that same common difference.

      Conversely, if you have a low-discrepancy \pm 1-function on \mathbb{N} then, as you say, its product with itself obviously has at most the square of that discrepancy with respect to products of HAPs.

      Looking back at your question, it seems that I did indeed mean what you thought I meant, which is not what I actually wrote.

      The one thing that I still haven’t quite seen the answer to is what happens if you look at products of HAPs of the same common difference and the same length, or perhaps even products of HAPs with themselves. I’m not sure how interesting that is, though.
