EDP19 — removing some vagueness

In the comments on EDP18 we are considering a certain decomposition problem that can be understood in its own right. At various points I have asserted that if we can find a decomposition of a particular kind then we will have a positive solution to EDP. And at various points in the past I have even sketched proofs of this. But I think it would be a good idea to do more than merely sketch a proof. So in this post I shall (I hope) give a completely rigorous derivation of EDP from the existence of an appropriate decomposition. (Well, I may be slightly sketchy in places, but only about details where it is obvious that they can be made precise.) I shall also review some material from earlier posts and comments, rather than giving links.

Representing diagonal matrices

First, let me briefly look again at how the ROD (representation of diagonal) approach works. If P and Q are HAPs, I shall write P\otimes Q for the matrix A such that A_{xy}=1 if (x,y)\in P\times Q and 0 otherwise. The main thing we need to know about P\otimes Q is that \langle x,(P\otimes Q)x\rangle=(\sum_{i\in P}x_i)(\sum_{j\in Q}x_j) for every x.

Suppose now that D is a diagonal matrix with diagonal entries d_1,\dots,d_n and that we can write it as \sum_r\lambda_rP_r\otimes Q_r, where each P_r and each Q_r is a HAP. Then

\sum_r\lambda_r(\sum_{i\in P_r}x_i)(\sum_{j\in Q_r}x_j)=\sum_kd_kx_k^2.

If \sum_r|\lambda_r|=c\sum_kd_k and x_k=\pm 1 for every k, then it follows that there exists r such that

|\sum_{i\in P_r}x_i||\sum_{j\in Q_r}x_j|\geq c^{-1},

and from that it follows that there is a HAP P such that |\sum_{i\in P}x_i|\geq c^{-1/2}. So if we can make c arbitrarily small, then EDP is proved.
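To see the bookkeeping concretely, here is a minimal sanity check (my own illustration, not from the post): the trivial decomposition D=\sum_k d_k(\{k\}\otimes\{k\}), in which each singleton \{k\} is a HAP with common difference k, satisfies the identity above exactly, but only achieves \sum_r|\lambda_r|=\sum_kd_k, that is, c=1.

```python
# Sanity check of sum_r lambda_r (sum_{i in P_r} x_i)(sum_{j in Q_r} x_j)
#                 = sum_k d_k x_k^2
# for the trivial decomposition D = sum_k d_k ({k} x {k}); each singleton
# {k} is a HAP (the single term k of the progression k, 2k, ...).
import random

n = 10
d = [random.randint(1, 5) for _ in range(n)]      # diagonal entries d_1..d_n
decomp = [(d[k], [k], [k]) for k in range(n)]     # (lambda_r, P_r, Q_r), 0-indexed

x = [random.choice([-1, 1]) for _ in range(n)]
lhs = sum(lam * sum(x[i] for i in P) * sum(x[j] for j in Q)
          for (lam, P, Q) in decomp)
rhs = sum(d[k] * x[k] ** 2 for k in range(n))
assert lhs == rhs

# Here sum_r |lambda_r| = sum_k d_k, i.e. c = 1; the whole point of the
# approach is to find decompositions with c arbitrarily small.
```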

The advantage of this approach (and similar approaches related to semidefinite programming) is that it replaces a universal question — every \pm 1 sequence has unbounded discrepancy — with an existential one — there exists a diagonal matrix that can be decomposed in a certain way. However, it doesn’t solve the problem just like that, because it is far from clear how to find such a decomposition of a diagonal matrix.

Representing the identity over the rationals

A more recent idea that appears to simplify the problem considerably is to work over the rationals, which allows one to take the diagonal matrix to be the identity. Let me give a heuristic argument for this, which I hope will make it clear what I mean, and then follow it up with a more rigorous version of the argument.

By “work over the rationals” I mean that we shall consider infinite matrices A_{xy} where the indices x,y are positive rational numbers. The identity matrix is simply the matrix that is 1 if x=y and 0 otherwise. Matrix multiplication is defined in the usual way: (AB)_{xy}=\sum_zA_{xz}B_{zy}. We shall consider matrices with only finitely many non-zero entries in each row and column, so we do not have to worry about the sums being infinite.

Now let’s suppose, back in the normal integers-up-to-n case that we have a decomposition

D=\sum_s\lambda_sP_s\otimes Q_s.

Given any n\times n matrix A and any rational number r, define the r-dilation of A to be the matrix A^{(r)}, defined by A^{(r)}(x,y)=A(x/r,y/r). That is, A^{(r)} does to (ru,rv) what A does to (u,v). Now let us take the matrix B=\sum_{r\in\mathbb{Q}_+}A^{(r)} and call it the smearing of A. (The rough idea is that we’ve spread A all over the place.)

If A is a diagonal matrix, what is the smearing B of A? Well if x\ne y then rx\ne ry for all positive rationals r, so B(x,y)=0. And

B(x,x)=\sum_{r\in\mathbb{Q}_+}A(x/r,x/r)=\sum_i A_{ii}.

That is, B is the identity matrix multiplied by the trace of A.
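A small numerical illustration of this (my own, with a finite set R of rationals standing in for \mathbb{Q}_+, so only some diagonal entries of B receive every contribution):

```python
from fractions import Fraction as F
from collections import defaultdict
from itertools import product

# Toy diagonal matrix A, stored sparsely with rational indices.
A = {(F(i), F(i)): w for i, w in [(1, 2), (2, 5), (3, 1)]}   # trace = 8

# Finite stand-in for Q_+: all ratios p/q with 1 <= p, q <= 4.
R = {F(p, q) for p, q in product(range(1, 5), repeat=2)}

# Smearing B = sum_{r in R} A^{(r)}, where A^{(r)}(rx, ry) = A(x, y).
B = defaultdict(int)
for r in R:
    for (x, y), w in A.items():
        B[(x * r, y * r)] += w

trace = sum(A.values())
assert all(x == y for (x, y), w in B.items() if w != 0)   # B stays diagonal
assert B[(F(2), F(2))] == trace   # 2/1, 2/2 and 2/3 all lie in R
```

With R replaced by all of \mathbb{Q}_+, every diagonal entry would equal the trace, as in the text.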

Now each dilation of a matrix P_s\otimes Q_s gives us another matrix of the same form — that is, another product of HAPs. (Moreover, if P_s and Q_s have the same common difference, then that is true for the dilations as well.) So from a decomposition of a diagonal matrix into HAP products we can get a decomposition of the identity over \mathbb{Q}_+ into HAP products.

Where this gets a bit handwavy is the following statement: because the ratio of the sum of the absolute values of the coefficients to the trace of the diagonal matrix is small, the “average coefficient” in the infinite case is small too. Therefore, we can prove the same bound for functions defined on \mathbb{Q}_+ that we could prove for functions defined on \{1,2,\dots,n\}. It is this last step that I would like to make rigorous.

A finite approximation of the rationals

I shall use a technique that is standard in additive combinatorics (I’m referring here to regular Bohr sets), though the basic idea presumably goes back well beyond that. The particular way I’ll use it is essentially the same as what Terry Tao did in his Fourier reduction argument. I want to take a big finite subset X of \mathbb{Q}_+ that is almost closed under multiplication by any number I could conceivably want to multiply by. By that I mean that if X has size N and r is some rational that I might want to multiply by, then X\cap rX is almost the same size as X. To visualize this, imagine taking a huge sphere and intersecting it with \mathbb{Z}^3. Call the resulting set Z. If we take a random point x in Z and add a non-huge integer point y to it, then unless we are extremely unlucky and x is right near the edge of Z, we will find that x+y also belongs to Z. So Z is almost closed under adding small integer vectors. I want to do the same but for multiplication, which, via prime factorization, corresponds precisely to vector addition: the exponent vector of a product is the sum of the exponent vectors.

Suppose, then, that we have an n\times n matrix A. Then the non-zero entries of A occur at pairs of the form (i,j) where i and j are integers between 1 and n. Therefore, the non-zero entries of the dilation A^{(r)} are at pairs (x,y) of the form (ir,jr), where i and j are integers between 1 and n. And from this it follows that the non-zero entries of the smearing B of A occur only at pairs of the form (x,y) such that y/x is a rational with numerator and denominator between 1 and n. Let Q_n be the set of all such rationals.

Let us therefore construct a set X of rationals such that Q_nx\subset X for almost every x\in X. (Here, Q_nx stands for the obvious thing: the set \{yx:y\in Q_n\}.) We can do this in any number of ways. Perhaps the simplest is to let p_1,\dots,p_m be the primes up to n, to let M be some large positive integer, and to take X=X_M to be the set of all numbers of the form p_1^{a_1}\dots p_m^{a_m} such that each a_i lies between -M and M. Note that the product of any element of X_{M-\lfloor\log_2n\rfloor} by an element of Q_n is an element of X_M. Note also that X_M has cardinality (2M+1)^m. So if we want (1-\epsilon)|X| of the points x\in X to have the property that xQ_n\subset X, then all we have to do is choose M so large that (2(M-\lfloor\log_2n\rfloor)+1)^m\geq(1-\epsilon)(2M+1)^m, which we can certainly do.
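Here is a small computational check of this construction (my own sketch: X_M represented by exponent vectors over the primes up to n, verifying that X_{M-k}\cdot Q_n\subset X_M for k=\lfloor\log_2 n\rfloor):

```python
from itertools import product
from math import floor, log2

n = 8
primes = [2, 3, 5, 7]                 # the primes up to n
m, M = len(primes), 6
k = floor(log2(n))                    # max |exponent shift| caused by Q_n

def factor(num, den):
    """Exponent vector of the rational num/den over `primes`."""
    v = [0] * m
    for sign, t in ((1, num), (-1, den)):
        for i, p in enumerate(primes):
            while t % p == 0:
                v[i] += sign
                t //= p
    return tuple(v)

# Q_n as a set of exponent vectors of ratios a/b with 1 <= a, b <= n.
Qn = {factor(a, b) for a in range(1, n + 1) for b in range(1, n + 1)}

# Every point of X_{M-k} times every element of Q_n lands in X_M.
for x in product(range(-(M - k), M - k + 1), repeat=m):
    for q in Qn:
        assert all(abs(x[i] + q[i]) <= M for i in range(m))

print((2 * M + 1) ** m)   # |X_M| = (2M+1)^m exponent vectors
```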

From now on, all we shall use about X is this property: the details of its construction are not important. We shall also consider \pm 1 functions defined on X, which we are thinking of as a finite approximation to \mathbb{Q}_+. One remark about X is that we can dilate it so that it becomes a set of integers, so the fact that I have been talking about rationals is just a convenient infinitary way of thinking about the problem rather than a fundamental difference of approach. (In particular, I am not using any clever notions of limit.)

Running the argument inside X

Now let’s suppose that we have an n\times n diagonal matrix D that is decomposed as \sum_i\lambda_iP_i\otimes Q_i, where each P_i and Q_i is a HAP. (Incidentally, by “HAP” I mean here the slightly generalized HAPs of the form \{rd,(r+1)d,\dots,sd\}.) This time, instead of looking at the sum of all dilations of D, let us look instead at the sum of all dilations D^{(r)} such that r\in X. This gives us a diagonal matrix E. What can we say about it?

Well, E_{xx}=\sum_{r\in X}D_{x/r,x/r}. For each j\in\{1,2,\dots,n\}, the entry D_{jj} will be included in this sum if x/j\in X. In particular, it will be included if xQ_n\subset X. Therefore, E_{xx} equals the trace of D for at least (1-\epsilon)|X| elements x\in X. For all x\in Q_nX it is at most the sum of the absolute values of the entries of D, and for x\notin Q_nX it is zero.

This tells us, in a way that is easy to make precise, that the entries of E are given by a function that is approximately the trace of D times the characteristic function of X. (Because the functions are bounded and the support of the difference is much smaller than X, this approximation can be arranged to be good in any \ell_p norm with p<\infty.)

But since any dilation of a HAP product P\otimes Q is another such product, this means that we have a decomposition into HAP products of a diagonal matrix that is very close to the identity, where the indexing set is now X rather than \mathbb{Q}. (Actually, the way I said it above, the matrix is supported not on X\times X but on Q_nX\times Q_nX. But that can easily be fixed by summing over dilations corresponding not to all r\in X but to all r such that rQ_n\subset X. So I won’t worry too much about this technical detail.)

The key point here is that the sum of the absolute values of the coefficients in the decomposition is roughly |X|\sum_i|\lambda_i|, and the trace of the diagonal matrix we get, which is very close to the identity, is roughly |X| times the trace of D. So we obtain roughly the same c as before but now with the “identity matrix on X“.

That is a precise and rigorously proved version of the statement that if we can find a diagonal decomposition with a small c then we can find a representation of the identity on \mathbb{Q}_+ with the same small c.

The next thing to think about is what happens if we take a HAP product P\otimes Q and sum over all its dilates. We are making the simplifying assumption that P and Q have the same common difference. (It may turn out that we cannot achieve this. Then we simply switch to searching for more general decompositions where the common differences are allowed to be different.) Let us therefore assume that P=\{sd,\dots,td\} and Q=\{ud,\dots,vd\}. What is the sum of all dilates of P\otimes Q at a point (x,y) such that xQ_n\subset X and yQ_n\subset X?

Well, we need there to be some rational r\in X and integers a\in[s,t] and b\in[u,v] such that rad=x and rbd=y. Given such r,a,b we know that y/x=b/a. Conversely, if y/x can be written as b/a with b\in[u,v] and a\in[s,t], then we can set r=x/(ad). We know then that r\in X (since xQ_n\subset X), and we have x/r=ad and y/r=bd. Thus, the value of the smearing of P\otimes Q at (x,y) is the number of ways of writing y/x=b/a with b\in[u,v] and a\in[s,t]. That is, it is the number of integer points in the intersection of the rectangle [s,t]\times[u,v] with the line through the origin of gradient y/x.

Two things follow from this. First, if we are going to look at smearings, then we need only look at products P\otimes Q where the common differences of P and Q are 1. That is, WLOG P=[s,t] and Q=[u,v]. The second thing is that if we have a linear combination of functions \sum_i\lambda_iP_i\otimes Q_i, where each P_i and each Q_i is a subinterval of \{1,2,\dots,n\}, then its smearing will be diagonal if and only if for every pair of distinct coprime integers x,y\in\{1,2,\dots,n\} we have \sum_i\lambda_i N_{x,y}(P_i\times Q_i)=0, where N_{x,y}(P_i\times Q_i) denotes the number of multiples of (x,y) that lie in the rectangle P_i\times Q_i.
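The quantity N_{x,y} can be computed two ways, and it is worth checking (my own sketch) that the lattice-point count agrees with the “number of ways of writing y/x=b/a” description above:

```python
from math import gcd

def N(x, y, s, t, u, v):
    """Multiples (kx', ky') of the reduced pair inside [s,t] x [u,v]."""
    g = gcd(x, y)
    x, y = x // g, y // g
    return sum(1 for k in range(1, max(t, v) + 1)
               if s <= k * x <= t and u <= k * y <= v)

def N_direct(x, y, s, t, u, v):
    """Number of ways of writing y/x = b/a with a in [s,t], b in [u,v]."""
    return sum(1 for a in range(s, t + 1) for b in range(u, v + 1)
               if a * y == b * x)

s, t, u, v = 2, 12, 3, 10
for x in range(1, 13):
    for y in range(1, 13):
        assert N(x, y, s, t, u, v) == N_direct(x, y, s, t, u, v)
```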

If we put all these observations together, together with the observation made in this comment, we obtain the following conclusion: the representation-of-diagonals approach to EDP works if and only if for every c>0 there exists a system of rectangles P_1\times P_1,\dots,P_k\times P_k (each P_i being an interval of positive integers between 1 and n) and coefficients \lambda_1,\dots,\lambda_k such that

  • \sum_i|\lambda_i|\leq c\sum_i\lambda_i|P_i|
  • for every pair (x,y)\ne(1,1) of coprime positive integers between 1 and n, we have \sum_i\lambda_iN_{x,y}(P_i\times P_i)=0.
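As a baseline (my own illustration, not discussed in the post): the trivial decomposition with P_i=\{i\} and \lambda_i=1 satisfies the second condition, since a multiple (kx,ky) of a coprime pair (x,y)\ne(1,1) can never lie in a diagonal singleton square, but it only achieves c=1 in the first condition — close to the barrier the search is trying to beat.

```python
from math import gcd

n = 12
rects = [((i, i), 1) for i in range(1, n + 1)]   # (interval (lo,hi), lambda_i)

def N(x, y, lo, hi):
    """Multiples (kx, ky) of the coprime pair (x, y) inside [lo,hi]^2."""
    return sum(1 for k in range(1, hi + 1)
               if lo <= k * x <= hi and lo <= k * y <= hi)

# Second condition: the signed counts vanish for every coprime (x,y) != (1,1).
for x in range(1, n + 1):
    for y in range(1, n + 1):
        if (x, y) != (1, 1) and gcd(x, y) == 1:
            assert sum(lam * N(x, y, lo, hi) for (lo, hi), lam in rects) == 0

# First condition: sum |lambda_i| = n = sum lambda_i |P_i|, so c = 1 here.
abs_sum = sum(abs(lam) for _, lam in rects)
weight = sum(lam * (hi - lo + 1) for (lo, hi), lam in rects)
print(abs_sum / weight)   # -> 1.0
```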

I have explained the “only if” part of this. To see that a decomposition of this kind works, suppose that we have a function f defined on X. For convenience we shall assume that f takes values in [-1,1] though more general assumptions are possible. For convenience let us normalize so that \sum_i\lambda_i|P_i|=1. We then take the smearing of the matrix \sum_i\lambda_iP_i\otimes P_i. This equals 1 on the diagonal (except at the boundary of X — I’m referring to the smearing where we add all dilates for r\in X) and 0 off it (again, perhaps with a few exceptions at the boundary). In other words, it gives us a very good approximation of the identity on X.

Let us write this decomposition as \sum_j\mu_jQ_j\otimes Q_j. Here the HAPs Q_j run over all dilations of the HAPs P_i by numbers r\in X. Therefore, \sum_j|\mu_j|=|X|\sum_i|\lambda_i|. Since \sum_i|\lambda_i|\leq c and the trace of the identity on X is |X|, this tells us that our decomposition of the identity has been achieved with the same constant c.

Let us now calculate \sum_{x\in X}f(x)^2 in two ways. The first is to think of it as \langle f,I_Xf\rangle (where I_X is the identity indexed by X). The second is to use our decomposition and to think of it as

\sum_j\mu_j(\sum_{x\in Q_j}f(x))^2.

Since the two are the same (almost, at least), there must be some j such that (\sum_{x\in Q_j}f(x))^2\geq (c|X|)^{-1}\sum_xf(x)^2. In particular, if f(x)=\pm 1 for every x, then there must be some j such that (\sum_{x\in Q_j}f(x))^2\geq c^{-1}, which proves EDP (since for sufficiently large N the interval \{1,2,\dots,N\} contains a dilate of X, and HAPs remain HAPs after dilation).

EDP and EDP for multiplicative functions

Here is a thought that I haven’t followed through properly, but I’m pretty sure something can be got out of it. Suppose one tries to prove the existence of a decomposition of the kind we want, not by going ahead and finding one but by showing abstractly that such a thing exists. One can return to the formulation of the previous post, where we are trying to write \delta_1 as a linear combination of functions of the form P*Q^{-1}, which, like Alec, I prefer to write as P/Q on the understanding that this represents not the characteristic function of the set of all a/b with a\in P and b\in Q but the number of ways of representing a number as a/b with a\in P and b\in Q. By the Hahn-Banach theorem, this is impossible with a given upper bound for the sum of the absolute values of coefficients if and only if there is a certain functional that separates \delta_1 from all the functions P/Q. By that I mean that \phi(\delta_1)=1 and |\phi(P/Q)|\leq C for every P,Q. Now if \phi is such a functional, and \phi^{(r)} is a dilate of \phi, then I claim that (\phi+\phi^{(r)})/2 is another such functional. Indeed,


\frac{\phi+\phi^{(r)}}{2}(P/Q)=\frac{\phi(P/Q)}{2}+\frac{\phi^{(r)}(P/Q)}{2}=\frac{\phi(P/Q)}{2}+\frac{\phi(rP/Q)}{2},

and both terms on the right are at most C/2 in absolute value, since rP is again a HAP.

(I haven’t checked that that is correct — it might be that the second term should be \phi(P/rQ)/2.)

I have a slight technical difficulty here, which is that I don’t know the value of \phi^{(r)}(\delta_1)=\phi(\delta_r). Let me assume that it is \pm 1 and try to justify that assumption in a moment. I then want to adjust what I said above by changing the new functional to (\phi-\phi^{(r)})/2 if \phi(\delta_r)=-1. That ensures that we still send \delta_1 to 1. I think that to justify that assumption I might want to do something like assuming that C is optimal. But I didn’t promise a rigorous proof here, so I will come back to this unless someone else manages to fill in the details.

What it is supposed to be leading to is that we can assume that the function \phi is a completely multiplicative \pm 1-valued function. The rough idea would be that we could replace \phi by bigger and bigger averages of the form \mathbb{E}_{r\in R}\phi(r)\phi^{(r)} and that if \phi wasn’t completely multiplicative in the first place then these averages would eventually become small, which would contradict the fact that they separated \delta_1 from all functions of the form P/Q.

If that is correct (and at the moment we have a rather dodgy step so this certainly cannot be assumed) then we seem to have something like a proof that if the representation-of-diagonals approach to EDP fails, then there is a completely multiplicative \pm 1-valued function with bounded discrepancy. I’m suspicious of that statement, because it seems to suggest that the vector-valued formulation of the problem is equivalent to the \pm 1-valued version. More likely, the effort to rescue the above argument from its technical problems would result in a modified statement that would be precisely what Terry proved in his Fourier reduction argument.

The moral of the previous section

I’m interested in trying to get the ideas of the previous section to work for their own sake, but there’s also a point I want to make about how one might hope to prove EDP by finding a rectangle decomposition of the kind discussed above. What the previous section suggests is that if we try to prove this using an abstract existence argument then we will have applied duality twice and will be back to the original hard problem (albeit in its completely multiplicative form, but even that seems to be hard). So we must resist the temptation to try to find an argument that says something like, “These functions are sufficiently numerous and spread about that we must be able to find a decomposition of the desired form,” for the simple reason that if there is some function of bounded discrepancy then it shows that they are not sufficiently numerous and spread about.

Instead, we must try to find a constructive proof that such a decomposition exists. Alec has got the process started for us by finding a general construction that gets c arbitrarily close to 1/2. I think we will learn a lot if we can beat that barrier, even if initially we do so by finding a fairly formless example by means of a brute-force search.

6 Responses to “EDP19 — removing some vagueness”

  1. Polymath5 « Euclidean Ramsey Theory Says:

    […] Polymath5 By kristalcantwell There is a new thread for Polymath5 here. Let me update this there is another thread here. […]

  2. DC Says:

    Apologies in advance for this being more meta-mathematics than mathematics. Reading over this approach to the EDP I am reminded of some notoriously thorny problems in operator theory: the Kadison-Singer problem, the Feichtinger conjecture, and other issues related to “pavable” operators and matrix pavings. Appropriately formulated, all concern the existence or nonexistence of various kinds of “decompositions” of matrices, with special attention paid to what is happening on the diagonal. (See e.g. http://www.aimath.org/WWN/kadisonsinger/ the paper “Equivalents of the Kadison-Singer problem.”) A good number of papers on pavable operators seem to follow the general theme that an operator that is “pavable” in any number of weak senses, must in fact be “pavable” in stronger, “more structured” senses. That seems very similar to what is desired here. (Nik Weaver even has a paper on Kadison-Singer stuff with the word “discrepancy” in the title— which is all I can say, not being all that familiar with his body of work, or any work on the EDP.)

    Of course in this approach to the EDP it seems that the matrices one wants decompositions of are far from arbitrary (as they tend to be in Kadison-Singer type problems). They have a ton of extra “structure.” I wonder if there is anyone who is familiar with both this approach to the EDP and Kadison-Singer type problems and can comment on the distinction between the two. (e.g. are they similar enough that somebody with a strong opinion on the probable truth of the EDP should have a strong opinion on certain kinds of matrix pavings, or vice versa?)

    • Alec Edgington Says:

      Gil also mentioned this a while back. It does look rather similar — though the impression I get from a skim through that article of Casazza and Edidin is that the problem has more of an algebraic and less of a number-theoretic flavour. (For example, the word ‘prime’ doesn’t occur anywhere in that article, whereas it’s hard to imagine a discussion of EDP that didn’t talk about primes at some point.) But it could well be that we could learn something from approaches to the Kadison–Singer problem.

  3. Jason Says:

    Not directly related to this post, but I am curious if experimental results of long sequences of discrepancy 2 are still interesting/relevant/helpful. I thought about it over the weekend after reading the EDP18 post and checking out the polymath wiki, and managed to come up with a method to generate such sequences, assuming I am computing the discrepancy correctly, and not overlooking some desired structure that makes the problem harder. Found a sequence of length 2026 that is relatively uniform here: http://pastebin.com/S3KZVx4V Another sequence of length 20386 that isn’t very uniform was found here: http://pastebin.com/fUwr6tg0

    • Klas Markström Says:

Jason, your sequences start with three 1s, so they have discrepancy at least 3.

    • gowers Says:

      Indeed, if you look along the even numbers, you find that the first one (and possibly the second — I haven’t checked) has discrepancy a lot more than 3. This makes me think you may have some misconception about the definition of discrepancy.
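(To pin down the definition at issue in this exchange, here is a short checker, my own sketch: the HAP discrepancy of a finite \pm 1 sequence is the maximum of |x_d+x_{2d}+\dots+x_{kd}| over all common differences d and lengths k. A discrepancy-2 sequence must keep every such partial sum in [-2,2].)

```python
def hap_discrepancy(x):
    """Max |x_d + x_{2d} + ... + x_{kd}| over all d and k; x[0] is the term at 1."""
    n = len(x)
    best = 0
    for d in range(1, n + 1):
        s = 0
        for i in range(d, n + 1, d):
            s += x[i - 1]
            best = max(best, abs(s))
    return best

# A sequence starting 1, 1, 1 already has a partial sum of 3 along d = 1,
# hence discrepancy at least 3 (Klas's point above).
print(hap_discrepancy([1, 1, 1, -1]))   # -> 3
```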
