## EDP19 — removing some vagueness

In the comments on EDP18 we are considering a certain decomposition problem that can be understood in its own right. At various points I have asserted that if we can find a decomposition of a particular kind then we will have a positive solution to EDP. And at various points in the past I have even sketched proofs of this. But I think it would be a good idea to do more than merely sketch a proof. So in this post I shall (I hope) give a completely rigorous derivation of EDP from the existence of an appropriate decomposition. (Well, I may be slightly sketchy in places, but only about details where it is obvious that they can be made precise.) I shall also review some material from earlier posts and comments, rather than giving links.

Representing diagonal matrices

First, let me briefly look again at how the ROD (representation of diagonal) approach works. If $P$ and $Q$ are HAPs, I shall write $P\otimes Q$ for the matrix $A$ such that $A_{xy}=1$ if $(x,y)\in P\times Q$ and 0 otherwise. The main thing we need to know about $P\times Q$ is that $\langle x,(P\otimes Q)x\rangle=(\sum_{i\in P}x_i)(\sum_{j\in Q}x_j)$ for every $x.$

Suppose now that $D$ is a diagonal matrix with diagonal entries $d_1,\dots,d_n$ and that we can write it as $\sum_r\lambda_rP_r\otimes Q_r,$ where each $P_r$ and each $Q_r$ is a HAP. Then

$\sum_r\lambda_r(\sum_{i\in P_r}x_i)(\sum_{j\in Q_r}x_j)=\sum_kd_kx_k^2.$

If $\sum_r|\lambda_r|=c\sum_kd_k$ and $x_k=\pm 1$ for every $k,$ then it follows that there exists $r$ such that

$|\sum_{i\in P_r}x_i||\sum_{j\in Q_r}x_j|\geq c^{-1},$

and from that it follows that there is a HAP $P$ such that $|\sum_{i\in P}x_i|\geq c^{-1/2}.$ So if we can make $c$ arbitrarily small, then EDP is proved.

The advantage of this approach (and similar approaches related to semidefinite programming) is that it replaces a universal question — every $\pm 1$ sequence has unbounded discrepancy — with an existential one — there exists a diagonal matrix that can be decomposed in a certain way. However, it doesn’t solve the problem just like that, because it is far from clear how to find such a decomposition of a diagonal matrix.

Representing the identity over the rationals

A more recent idea that appears to simplify the problem considerably is to work over the rationals, which allows one to take the diagonal matrix to be the identity. Let me give a heuristic argument for this, which I hope will make it clear what I mean, and then follow it up with a more rigorous version of the argument.

By “work over the rationals” I mean that we shall consider infinite matrices $A_{xy}$ where the indices $x,y$ are positive rational numbers. The identity matrix is simply the matrix that is 1 if $x=y$ and $0$ otherwise. Matrix multiplication is defined in the usual way: $(AB)_{xy}=\sum_zA_{xz}B_{zy}.$ We shall consider matrices with only finitely many non-zero entries in each row and column, so we do not have to worry about the sums being infinite.

Now let’s suppose, back in the normal integers-up-to-$n$ case that we have a decomposition

$D=\sum_s\lambda_sP_s\otimes Q_s.$

Given any $n\times n$ matrix $A$ and any rational number $r,$ define the $r$-dilation of $A$ to be the matrix $A^{(r)},$ defined by $A^{(r)}(x,y)=A(x/r,y/r).$ That is, $A^{(r)}$ does to $(ru,rv)$ what $A$ does to $(u,v).$ Now let us take the matrix $B=\sum_rA^{(r)}$ and call it the smearing of $A.$ (The rough idea is that we’ve spread $A$ all over the place.)

If $A$ is a diagonal matrix, what is the smearing $B$ of $A$? Well if $x\ne y$ then $rx\ne ry$ for all positive rationals $r,$ so $B(x,y)=0.$ And

$B(x,x)=\sum_{r\in\mathbb{Q}_+}A(x/r,x/r)=\sum_i A_{ii}.$

That is, $B$ is the identity matrix multiplied by the trace of $A.$

Now each dilation of a matrix $P_s\otimes Q_s$ gives us another matrix of the same form — that is, another product of HAPs. (Moreover, if $P_s$ and $Q_s$ have the same common difference, then that is true for the dilations as well.) So from a decomposition of a diagonal matrix into HAP products we can get a decomposition of the identity over $\mathbb{Q}_+$ into HAP products.

Where this gets a bit handwavy is the following statement: because the ratio of the sum of the absolute values of the coefficients to the trace of the diagonal matrix is small, the “average coefficient” in the infinite case is small too. Therefore, we can prove the same bound for functions defined on $\mathbb{Q}_+$ that we could prove for functions defined on $\{1,2,\dots,n\}.$ It is this last step that I would like to make rigorous.

A finite approximation of the rationals

I shall use a technique that is standard in additive combinatorics (I’m referring here to regular Bohr sets), though the basic idea presumably goes back well beyond that. The particular way I’ll use it is essentially the same as what Terry Tao did in his Fourier reduction argument. I want to take a big finite subset $X$ of $\mathbb{Q}_+$ that is almost closed under multiplication by any number I could conceivably want to multiply by. By that I mean that if $X$ has size $N$ and $r$ is some rational that I might want to multiply by, then $X\cap rX$ is almost the same size as $X.$ To visualize this, imagine taking a huge sphere and intersecting it with $\mathbb{Z}^3.$ Call the resulting set $Z.$ If we take a random point $x$ in $Z$ and add a non-huge integer point $y$ to it, then unless we are extremely unlucky and $x$ is right near the edge of $Z,$ we will find that $x+y$ also belongs to $Z.$ So $Z$ is almost closed under adding small integer vectors. I want to do the same but for multiplication, which is pretty similar to vector addition by prime factorization.

Suppose, then, that we have an $n\times n$ matrix $A.$ Then the non-zero entries of $A$ occur at pairs of the form $(i,j)$ where $i$ and $j$ are integers between $1$ and $n.$ Therefore, the non-zero entries of the dilation $A^{(r)}$ are at pairs $(x,y)$ of the form $(ir,jr),$ where $i$ and $j$ are integers between $1$ and $n.$ And from this it follows that the non-zero entries of the smearing $B$ of $A$ occur only at pairs of the form $(x,y)$ such that $y/x$ is a rational with numerator and denominator between $1$ and $n.$ Let $Q_n$ be the set of all such rationals.

Let us therefore construct a set $X$ of rationals such that $Q_nx\subset X$ for almost every $x\in X.$ (Here, $Q_nx$ stands for the obvious thing: the set $\{yx:y\in Q_n\}.$) We can do this in any number of ways. Perhaps the simplest is to let $p_1,\dots,p_m$ be the set of all primes up to $n,$ to let $M$ be some large positive integer, and to take $X=X_M$ to be the set of all numbers of the form $p_1^{a_1}\dots p_m^{a_m}$ such that each $a_i$ lies between $-M$ and $M.$ Note that the product of any element of $X_{M-\lfloor\log_2n\rfloor}$ by an element of $Q_n$ is an element of $X_M.$ Note also that $X_M$ has cardinality $M^m.$ So if we want $(1-\epsilon)|X|$ of the points $x\in X$ to have the property that $xQ_n\subset X,$ then all we have to do is choose $M$ so large that $(M-\lfloor\log_2n\rfloor)^m\geq(1-\epsilon)M^m,$ which we can certainly do.

From now on, all we shall use about $X$ is this property: the details of its construction are not important. We shall also consider $\pm 1$ functions defined on $X,$ which we are thinking of as a finite approximation to $\mathbb{Q}.$ One remark about $X$ is that we can dilate it so that it becomes a set of integers, so the fact that I have been talking about rationals is just a convenient infinitary way of thinking about the problem rather than a fundamental difference of approach. (In particular, I am not using any clever notions of limit.)

Running the argument inside X

Now let’s suppose that we have an $n\times n$ diagonal matrix $D$ that is decomposed as $\sum_i\lambda_iP_i\otimes Q_i,$ where each $P_i$ and $Q_i$ is a HAP. (Incidentally, by “HAP” I mean here the slightly generalized HAPs of the form $\{rd,(r+1)d,\dots,sd\}$.) This time, instead of looking at the sum of all dilations of $D,$ let us look instead at the sum of all dilations $D^{(r)}$ such that $r\in X.$ This gives us a diagonal matrix $E.$ What can we say about it?

Well, $E_{xx}=\sum_{r\in X}D_{x/r,x/r}.$ For each $j\in\{1,2,\dots,n\},$ the entry $D_{jj}$ will be included in this sum if $x/j\in X.$ In particular, it will be included if $xQ_n\subset X.$ Therefore, $E_{xx}$ equals the trace of $D$ for at least $(1-\epsilon)|X|$ elements $x\in X.$ For all $x\in Q_nX$ it is at most the sum of the absolute values of the entries of $D,$ and for $x\notin Q_nX$ it is zero.

This tells us, in a way that is easy to make precise, that the entries of $E$ are given by a function that is approximately the trace of $D$ times the characteristic function of $X.$ (Because the functions are bounded and the support of the difference is much smaller than $X,$ this approximation can be arranged to be good in any $\ell_p$ norm with $p<\infty.$)

But since any dilation of a HAP product $P\otimes Q$ is another such product, this means that we have a decomposition into HAP products of a diagonal matrix that is very close to the identity, where the indexing set is now $X$ rather than $\mathbb{Q}.$ (Actually, the way I said it above, the matrix is supported not on $X\times X$ but on $Q_nX\times Q_nX.$ But that can easily be fixed by summing over dilations corresponding not to all $r\in X$ but to all $r$ such that $rQ_n\subset X.$ So I won’t worry too much about this technical detail.)

The key point here is that the sum of the absolute values of the coefficients in the decomposition is roughly $|X|\sum_i|\lambda_i|,$ and the trace of the diagonal matrix we get, which is very close to the identity, is roughly $|X|$ times the trace of $D.$ So we obtain roughly the same $c$ as before but now with the “identity matrix on $X$“.

That is a precise and rigorously proved version of the statement that if we can find a diagonal decomposition with a small $c$ then we can find a representation of the identity on $\mathbb{Q}$ with the same small $c.$

The next thing to think about is what happens if we take a HAP product $P\otimes Q$ and sum over all its dilates. We are making the simplifying assumption that $P$ and $Q$ have the same common difference. (It may turn out that we cannot achieve this. Then we simply switch to searching for more general decompositions where the common differences are allowed to be different.) Let us therefore assume that $P=\{sd,\dots,td\}$ and $Q=\{ud,\dots,vd\}.$ What is the sum of all dilates of $P\otimes Q$ at a point $(x,y)$ such that $xQ_n\subset X$ and $yQ_n\subset X$?

Well, we need there to be some rational $r\in X$ and integers $a\in[s,t]$ and $b\in[u,v]$ such that $rad=x$ and $rbd=y.$ Given such $r,a,b$ we know that $y/x=b/a.$ Conversely, if $y/x$ can be written as $b/a$ with $b\in[u,v]$ and $a\in[s,t],$ then we can set $r=x/ad.$ We know then that $r\in X$ (since $xQ_n\subset X$), and we have $x/r=ad$ and $y/r=bd.$ Thus, the value of the smearing of $P\otimes Q$ at $(x,y)$ is the number of ways of writing $y/x=b/a$ with $b\in[u,v]$ and $a\in[s,t].$ That is, it is the number of integer points in the intersection of the rectangle $[s,t]\times[u,v]$ with the line of gradient $y/x.$

Two thing follow from this. First, if we are going to look at smearings, then we need only look at products $P\otimes Q$ where the common differences of $P$ and $Q$ are 1. That is, WLOG $P=[s,t]$ and $Q=[u,v].$ The second thing is that if we have a linear combination of functions $\sum_i\lambda_iP_i\otimes Q_i,$ where each $P_i$ and each $Q_i$ is a subinterval of $\{1,2,\dots,n\},$ then its smearing will be diagonal if and only if for every pair of distinct coprime integers $x,y\in\{1,2,\dots,n\}$ we have $\sum_i\lambda_i N_{x,y}(P_i\times Q_i)=0,$ where $N_{x,y}(P_i\times Q_i)$ denotes the number of multiples of $(x,y)$ that lie in the rectangle $P_i\times Q_i.$

If we put all these observations together, together with the observation made in this comment, we obtain the following conclusion: the representation-of-diagonals approach to EDP works if and only if for every $c>0$ there exists a system of rectangles $P_1\times P_1,\dots,P_k\times P_k$ (each $P_i$ being an interval of positive integers between 1 and $n$) and coefficients $\lambda_1,\dots,\lambda_k$ such that

• $\sum_i|\lambda_i|\leq c\sum_i\lambda_i|P_i|$
• for every pair $(x,y)\ne(1,1)$ of coprime positive integers between 1 and $n,$ we have $\sum_i\lambda_iN_{x,y}(P_i\times P_i)=0.$

I have explained the “only if” part of this. To see that a decomposition of this kind works, suppose that we have a function $f$ defined on $X.$ For convenience we shall assume that $f$ takes values in $[-1,1]$ though more general assumptions are possible. For convenience let us normalize so that $\sum_i\lambda_i|P_i|=1.$ We then take the smearing of the matrix $\sum_i\lambda_iP_i\otimes P_i.$ This equals 1 on the diagonal (except at the boundary of $X$ — I’m referring to the smearing where we add all dilates for $r\in X$) and 0 off it (again, perhaps with a few exceptions at the boundary). In other words, it gives us a very good approximation of the identity on $X.$

Let us write this decomposition as $\sum_j\mu_jQ_j\otimes Q_j.$ Here the HAPs $Q_j$ run over all dilations of the HAPs $P_i$ by numbers $r\in X.$ Therefore, $\sum_j|\mu_j|=|X|\sum_i|\lambda_i|.$ Since $\sum_i|\lambda_i|\leq c$ and the trace of the identity on $X$ is $|X|,$ this tells us that our decomposition of the identity has been achieved with the same constant $c.$

Let us now calculate $\sum_{x\in X}f(x)^2$ in two ways. The first is to think of it as $\langle f,I_Xf\rangle$ (where $I_X$ is the identity indexed by $X$). The second is to use our decomposition and to think of it as

$\sum_j\mu_j(\sum_{x\in Q_j}f(x))^2.$

Since the two are the same (almost, at least), there must be some $j$ such that $(\sum_{x\in Q_j}f(x))^2\geq (c|X|)^{-1}\sum_xf(x)^2.$ In particular, if $f(x)=\pm 1$ for every $x,$ then there must be some $j$ such that $(\sum_{x\in Q_j}f(x))^2\geq c^{-1},$ which proves EDP (since for sufficiently large $N$ the interval $\{1,2,\dots,N\}$ contains a dilate of $X$, and HAPs remain HAPs after dilation).

EDP and EDP for multiplicative functions

Here is a thought that I haven’t followed through properly, but I’m pretty sure something can be got out of it. Suppose one tries to prove the existence of a decomposition of the kind we want, not by going ahead and finding one but by showing abstractly that such a thing exists. One can return to the formulation of the previous post, where we are trying to write $\delta_1$ as a linear combination of functions of the form $P*Q^{-1},$ which, like Alec, I prefer to write as $P/Q$ on the understanding that this represents not the characteristic function of the set of all $a/b$ with $a\in P$ and $b\in Q$ but the number of ways of representing a number as $a/b$ with $a\in P$ and $b\in Q$. By the Hahn-Banach theorem, this is impossible with a given upper bound for the sum of the absolute values of coefficients if and only if there is a certain functional that separates $\delta_1$ from all the functions $P/Q.$ By that I mean that $\phi(\delta_1)=1$ and $|\phi(P/Q)|\leq C$ for every $P,Q.$ Now if $\phi$ is such a functional, and $\phi^{(r)}$ is a dilate of $\phi,$ then I claim that $(\phi+\phi^{(r)})/2$ is another such functional. Indeed,

$(\phi+\phi^{(r)})(P/Q)/2=\phi(P/Q)/2+\phi(rP/Q)/2$

(I haven’t checked that that is correct — it might be that the second term should be $\phi(P/rQ)/2.$)

I have a slight technical difficulty here, which is that I don’t know about $\phi^{(r)}(\delta_1)=\phi(\delta_r).$ Let me assume that it is $\pm 1$ and try to justify that assumption in a moment. I then want to readjust what I said above by changing the new functional to $(\phi-\phi^{(r)})/2$ if $\phi(r)=-1.$ That ensures that we still send $\delta_1$ to 1. I think that to justify that assumption I might want to do something like assuming that that $C$ is optimal. But I didn’t promise a rigorous proof here so will come back to this unless someone else manages to fill in the details.

What it is supposed to be leading to is that we can assume that the function $\phi$ is a completely multiplicative $\pm 1$-valued function. The rough idea would be that we could replace $\phi$ by bigger and bigger averages of the form $\mathbb{E}_{r\in R}\phi(r)\phi^{(r)}$ and that if $\phi$ wasn’t completely multiplicative in the first place then these averages would eventually become small, which would contradict the fact that they separated $\delta_1$ from all functions of the form $P/Q.$

If that is correct (and at the moment we have a rather dodgy step so this certainly cannot be assumed) then we seem to have something like a proof that if the representation-of-diagonals approach to EDP fails, then there is a completely multiplicative $\pm 1$-valued function with bounded discrepancy. I’m suspicious of that statement, because it seems to suggest that the vector-valued formulation of the problem is equivalent to the $\pm 1$-valued version. More likely, the effort to rescue the above argument from its technical problems would result in a modified statement that would be precisely what Terry proved in his Fourier reduction argument.

The moral of the previous section

I’m interested in trying to get the ideas of the previous section to work for their own sake, but there’s also a point I want to make about how one might hope to prove EDP by finding a rectangle decomposition of the kind discussed above. What the previous section suggests is that if we try to prove this using an abstract existence argument then we will have applied duality twice and will be back to the original hard problem (albeit in its completely multiplicative form, but even that seems to be hard). So we must resist the temptation to try to find an argument that says something like, “These functions are sufficiently numerous and spread about that we must be able to find a decomposition of the desired form,” for the simple reason that if there is some function of bounded discrepancy then it shows that they are not sufficiently numerous and spread about.

Instead, we must try to find a constructive proof that such a decomposition exists. Alec has got the process started for us by finding a general construction that gets $c$ arbitrarily close to 1/2. I think we will learn a lot if we can beat that barrier, even if initially we do so by finding a fairly formless example by means of a brute-force search.

### 6 Responses to “EDP19 — removing some vagueness”

1. Polymath5 « Euclidean Ramsey Theory Says:

[...] Polymath5 By kristalcantwell There is a new thread for Polymath5 here. Let me update this there is another thread here. [...]

2. DC Says:

Apologies in advance for this being more meta-mathematics than mathematics. Reading over this approach to the EDP I am reminded of some notoriously thorny problems in operator theory: the Kadison-Singer problem, the Feichtinger conjecture, and other issues related to “pavable” operators and matrix pavings. Appropriately formulated, all concern the existence or nonexistence of various kinds of “decompositions” of matrices, with special attention paid to what is happening on the diagonal. (See e.g. http://www.aimath.org/WWN/kadisonsinger/ the paper “Equivalents of the Kadison-Singer problem.”) A good number of papers on pavable operators seem to follow the general theme that an operator that is “pavable” in any number of weak senses, must in fact be “pavable” in stronger, “more structured” senses. That seems very similar to what is desired here. (Nik Weaver even has a paper on Kadison-Singer stuff with the word “discrepancy” in the title— which is all I can say, not being all that familiar with his body of work, or any work on the EDP.)

Of course in this approach to the EDP it seems that the matrices one wants decompositions of are far from arbitrary (as they tend to be in Kadison-Singer type problems). They have a ton of extra “structure.” I wonder if there is anyone who is familiar with both this approach to the EDP and Kadison-Singer type problems and can comment on the distinction between the two. (e.g. are they similar enough that somebody with a strong opinion on the probable truth of the EDP should have a strong opinion on certain kinds of matrix pavings, or vice versa?)

• Alec Edgington Says:

Gil also mentioned this a while back. It does look rather similar — though the impression I get from a skim through that article of Casazza and Edidin is that the problem has more of an algebraic and less of a number-theoretic flavour. (For example, the word ‘prime’ doesn’t occur anywhere in that article, whereas it’s hard to imagine a discussion of EDP that didn’t talk about primes at some point.) But it could well be that we could learn something from approaches to the Kadison–Singer problem.

3. Jason Says:

Not directly related to this post, but I am curious if experimental results of long sequences of discrepancy 2 are still interesting/relevant/helpful. I thought about it over the weekend after reading the EDP18 post and checking out the polymath wiki, and managed to come up with a method to generate such sequences, assuming I am computing the discrepancy correctly, and not overlooking some desired structure that makes the problem harder. Found a sequence of length 2026 that is relatively uniform here: http://pastebin.com/S3KZVx4V Another sequence of length 20386 that isn’t very uniform was found here: http://pastebin.com/fUwr6tg0

• Klas MarkstrĂ¶m Says: