EDP12 — representing diagonal maps

In this post I want to use my unfair advantage as host of this blog to push an approach that is the one that I am currently most interested in. So let me precede it with the qualification that although I shall present the approach as favourably as I can, it may well be that Moses’s SDP approach or Gil’s polynomial approach will in the end turn out to be more fruitful. I should also add that this new approach makes use of a crucial insight from the SDP approach, which is the idea that one can prove a result about all sequences (x_n), including our troublesome friend 1, -1, 0, 1, -1, 0, …, if we bound the square of the discrepancy below by an expression of the form \sum_ib_ix_i^2.

Lewis’s theorem

Before I explain the new result, I’d like to discuss a result from the geometry of Banach spaces. (I don’t need to do this, but it is a very nice result and this is a good excuse to write about it. If you just skim this section and take note of the definition of “well spread about” below, that will be enough to follow the rest of the post.) The result is a special case of an important theorem of Dan Lewis, though the proof that I shall sketch is a little different from Lewis’s.

Let K be a symmetric convex body in \mathbb{R}^n. The minimal volume ellipsoid of K is, as its name suggests, the ellipsoid of minimal volume that contains K. Now if E is this ellipsoid, then obviously the boundary of E will touch the boundary of K, or else we could just shrink E by a small amount and have a new ellipsoid with smaller volume that still contains K. (For what it’s worth, this argument relies on compactness, which guarantees that if the boundaries of E and K are disjoint, then there is a positive distance between them.)

However, a moment’s thought suggests that we ought to be able to say more. Suppose, for instance, that E is a sphere and that its boundary touches the boundary of K only at two antipodal points. It feels as though we ought to be able to expand K in the subspace orthogonal to those points and get a larger affine copy of K inside E. Note that the problem of finding the minimal volume ellipsoid that contains K is equivalent to the problem of finding the maximum volume linear copy of K contained in the unit sphere, where by “linear copy” I mean the image of K under a linear map.

More generally, it ought to be the case that if we take the maximum volume linear copy of K inside the unit sphere B, then the contact points (that is, the points where the boundary of K touches the boundary of B) should be “well spread about”. The intuition would be that if the contact points are not well spread about, then we could find some big area of the sphere that is nowhere in contact with K and expand K into that area (perhaps very slightly contracting in other places in such a way as to increase the volume in total).

To turn this intuition into a formal proof, we need to start with a formal definition. When do we count some unit vectors (the contact points) as being “well spread about”? There turns out to be a very nice answer to this: define the unit vectors v_1,\dots,v_N to be well spread about if there are positive scalars \lambda_1,\dots,\lambda_N such that the identity matrix I_n=\sum_i\lambda_iv_i\otimes v_i. Here, I write v\otimes w for the matrix (v\otimes w)_{rs}=v(r)w(s). One could alternatively think of the v_i as column vectors and write \sum_i\lambda_iv_iv_i^T for the sum instead. Either way, if x is any vector, then (\sum_i\lambda_iv_i\otimes v_i)x=\sum_i\lambda_i\langle v_i,x\rangle v_i. Another small remark is that saying that I_n can be written as \sum_i\lambda_iv_i\otimes v_i with positive \lambda_i is equivalent to saying that n^{-1}I_n belongs to the convex hull of the v_i\otimes v_i, since the trace of v_i\otimes v_i is \|v_i\|^2=1, and this implies that the \lambda_i have to add up to n.

A more geometrical way of expressing this condition is as follows. A well-known property of orthonormal bases e_1,e_2,\dots,e_n is that each vector x is equal to the sum of its projections on to the 1-dimensional subspaces generated by the basis vectors: that is, x=\sum_i\langle x,e_i\rangle e_i for every x. The unit vectors v_1,\dots,v_N are well spread about if we can find weights such that every x is equal to the weighted sum of the “coordinate projections”: that is, we need positive constants \lambda_i such that every x is equal to \sum_i\lambda_i\langle x,v_i\rangle v_i.
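
To make the definition concrete, here is a tiny numerical illustration (a sketch of my own, assuming Python with numpy; the vectors and weights are just an example, not part of the theorem): the three unit vectors at angles 0, 2\pi/3 and 4\pi/3 in \mathbb{R}^2 are well spread about, with equal weights \lambda_i=2/3.

    import numpy as np

    angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
    vs = [np.array([np.cos(t), np.sin(t)]) for t in angles]

    # The weighted sum of rank-one projections reproduces the identity...
    S = sum((2 / 3) * np.outer(v, v) for v in vs)
    assert np.allclose(S, np.eye(2))

    # ...equivalently, every x is the weighted sum of its coordinate projections.
    x = np.array([0.3, -1.7])
    assert np.allclose(sum((2 / 3) * (v @ x) * v for v in vs), x)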

Lewis’s theorem is the statement that the contact points of B and the maximal volume copy of K inside B are well spread about in this sense. (Or rather, Lewis’s theorem is a more general statement of which this is a special case.) Before I sketch the proof, let me point out that it implies Fritz John’s famous result that for any symmetric convex body K\subset\mathbb{R}^n there is an ellipsoid E such that E\subset K\subset \sqrt{n}E. To show this, let |.| be the Euclidean norm and let \|.\| be the norm with unit ball K. Since K\subset B we know that |x|\leq\|x\| for every x. If we can write x as \sum_i\lambda_i\langle x,v_i\rangle v_i, where each v_i is a contact point, then since each v_i is a unit vector in both norms, we know that \|x\|\leq\sum_i\lambda_i|\langle x,v_i\rangle|. By Cauchy-Schwarz, the square of this is at most the product of \sum_i\lambda_i and \sum_i\lambda_i|\langle x,v_i\rangle|^2. But the latter expression is equal to \langle\sum_i\lambda_i\langle x,v_i\rangle v_i,x\rangle=\langle x,x\rangle. Therefore, \|x\|^2\leq n|x|^2, so \|x\|\leq\sqrt{n}|x|, which is equivalent to saying that n^{-1/2}B\subset K. And this obviously implies John’s theorem.

And now for the proof of the assertion about the contact points. Let us suppose that n^{-1}I_n is not in the convex hull of the matrices v\otimes v with v a contact point. Then by the Hahn-Banach separation theorem there exists a linear functional that separates n^{-1}I_n from all such matrices. We can phrase this in terms of the matrix inner product \langle A,B\rangle=\sum_{i,j}A_{ij}B_{ij}, which is the trace of A^TB. Then the linear functional can be thought of as a matrix B with the property that \langle n^{-1}I_n,B\rangle, which is n^{-1} times the trace of B, is 1, whereas there is some \alpha>0 such that \langle B,v\otimes v\rangle\leq 1-\alpha for every contact point v. Now \langle B,v\otimes v\rangle is equal to \langle Bv,v\rangle.

Now if B is any n\times n matrix, then the determinant of I_n+\delta B is 1+\delta\mathrm{tr}(B)+o(\delta). It follows that if the trace of B is n, then the determinant of (1-\delta)I_n+\delta B is (1-\delta)^n+\delta n+o(\delta)=1+o(\delta).

Next, we use the fact that \langle Bv,v\rangle\leq 1-\alpha for every contact point v. If u is another unit vector, then

\displaystyle \langle Bu,u\rangle\leq\langle Bv,v\rangle+|\langle Bv,u-v\rangle|+|\langle B(u-v),v\rangle|+|\langle B(u-v),u-v\rangle|,

which is at most 1-\alpha+2\|B\|\|u-v\|+\|B\|\|u-v\|^2. If 2\|u-v\|+\|u-v\|^2\leq\alpha/(4\|B\|), then this is at most 1-\alpha/2. Thus, there is a positive constant \gamma, independent of \delta, such that if v is a contact point and \|u-v\|\leq\gamma then \langle Bu,u\rangle\leq 1-\alpha/2. From this it follows that \langle ((1-\delta)I_n+\delta B)u,u\rangle\leq 1-\delta+(1-\alpha/2)\delta=1-\alpha\delta/2.

By compactness, there is some positive distance between the inner symmetric convex body K and the set of all points on the surface of the sphere that are not within \gamma of a contact point. Therefore, if we perturb K by a sufficiently small amount, we will not cause any point not within \gamma of a contact point to go inside K. But the volume of ((1-\delta)I_n+\delta B)K is equal to the volume of K up to o(\delta) and every point within \gamma of a contact point maps to a point of norm at most 1-\alpha\delta/2. Therefore, for sufficiently small \delta, the body (1+\alpha\delta/4)((1-\delta)I_n+\delta B)K has volume greater than that of K but still lies inside the unit sphere. The proof is complete.

What has this to do with EDP?

The Erdős discrepancy problem asks us to prove that for every constant C, if x is any \pm 1 sequence of sufficiently great length n, then there is some HAP with characteristic function P such that |\langle x,P\rangle|\geq C. Now one way one might imagine trying to show this would be by showing that these functions P are so spread about that every sequence of \ell_2 norm \sqrt{n} (that is, every sequence such that \sum_{i=1}^nx_i^2=n) has inner product at least C with one of them. And that is something one could prove if one had a representation I_n=\sum_{i=1}^N\lambda_iP_i\otimes P_i with \sum_i\lambda_i=o(n), since

\displaystyle \|x\|_2^2=\sum_{i=1}^N\lambda_i|\langle x,P_i\rangle|^2\leq (\sum_i\lambda_i)(\max_i|\langle x,P_i\rangle|^2),

so that \max_i|\langle x,P_i\rangle|^2\geq n/\sum_i\lambda_i, which tends to infinity.

Actually, there is an obvious problem with this idea, which is that if all the \lambda_i are positive, then unless all the P_i are singletons there will be positive elements off the diagonal and therefore no hope of representing the identity. However, one can rescue the situation by considering slightly more general vectors, such as convex combinations of the set of all Ps and their negatives. If v is such a combination and |\langle x,v\rangle|\geq C, then x must have discrepancy at least C in one of the HAPs that makes up v. And the advantage of these vectors v is that they are allowed to have negative coordinates.

I had this thought a few years ago when attempting to solve EDP, and dismissed it once I spotted that the troublesome 1,-1,0,1,-1,0,… example kills the idea. It simply isn’t true that if x is a sequence of length n and \|x\|_2^2=n, then x has unbounded discrepancy in some HAP, as the troublesome example (multiplied by \sqrt{3/2}) demonstrates.

However, an important lesson from the SDP approach is that this is a less serious problem than it at first appears, because we are not forced to use the \ell_2 norm as a lower bound for discrepancy. Instead, we can use a weighted \ell_2 norm. In the context of this proof, this suggestion becomes the idea that instead of trying to represent the identity, we could go for a diagonal matrix with large trace instead.

To spell this out, let us define a test vector to be any vector of the form v=\sum_i\rho_iP_i, where \sum_i|\rho_i|=1 and the P_i are characteristic functions of HAPs. Suppose that D is a diagonal matrix with entries b_1,\dots,b_n down the diagonal, and that we can write it as a convex combination \sum_i\lambda_iv_i\otimes v_i, where the v_i are test vectors. Then if x is any \pm 1 sequence, we know that \langle x,Dx\rangle=\sum_jb_jx_j^2=\sum_jb_j. But from the representation of D we also know that

\displaystyle \langle x,Dx\rangle=\sum_i\lambda_i|\langle x,v_i\rangle|^2\leq\max_i|\langle x,v_i\rangle|^2.

It follows that there exists i such that |\langle x,v_i\rangle|\geq(\sum_jb_j)^{1/2}, and hence, since v_i is a test vector, that the discrepancy of x in some HAP is at least (\sum_jb_j)^{1/2} as well.

Thus, we can prove EDP if we can find a convex combination \sum_i\lambda_iv_i\otimes v_i, where the v_i are test vectors, that equals a diagonal matrix with unbounded trace.

Moses Charikar points out a modification of this requirement that is somewhat simpler. If v=\sum_j\rho_jP_j with \sum_j|\rho_j|=1, then v\otimes v=\sum_{j,j'}\rho_j\rho_{j'}P_j\otimes P_{j'}, which is a convex combination of matrices of the form \pm P\otimes Q, where both P and Q are characteristic functions of HAPs. Conversely, if we write a diagonal matrix D as \sum_i\lambda_iP_i\otimes Q_i, where the P_i and Q_i are HAPs and \sum_i|\lambda_i|=1, then

\displaystyle \langle x,Dx\rangle=\sum_i\lambda_i\langle x,P_i\rangle\langle x,Q_i\rangle\leq\max_i|\langle x,P_i\rangle||\langle x,Q_i\rangle|.

From this it follows, as before, that if x is a \pm 1 sequence then there is some HAP on which the discrepancy of x is at least \sqrt{\mathrm{tr}D}.
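
For small n one can search for such a representation directly. The following is a rough LP sketch of my own (assuming Python with numpy and scipy; it illustrates the statement just proved and is not the code actually used in the project): it looks for coefficients \lambda with \sum|\lambda|=1 making \sum\lambda P\otimes Q diagonal while maximizing the trace, so that the square root of the optimal value is a provable lower bound for the discrepancy of every \pm 1 sequence of length n.

    import numpy as np
    from scipy.optimize import linprog

    n = 12

    # All HAPs {d, 2d, ..., kd} inside {1, ..., n}, as 0/1 indicator vectors.
    haps = []
    for d in range(1, n + 1):
        for k in range(1, n // d + 1):
            v = np.zeros(n)
            v[[d * j - 1 for j in range(1, k + 1)]] = 1.0
            haps.append(v)

    # One variable per ordered pair (P, Q), split into a positive and a
    # negative part so that the sum of |lambda| is a linear quantity.
    pairs = [(P, Q) for P in haps for Q in haps]
    m = len(pairs)

    # Equalities: every off-diagonal entry of sum lambda P (x) Q vanishes,
    # and the absolute values of the coefficients add up to 1.
    off = [(i, j) for i in range(n) for j in range(n) if i != j]
    A = np.zeros((len(off) + 1, 2 * m))
    for col, (P, Q) in enumerate(pairs):
        M = np.outer(P, Q)
        for row, (i, j) in enumerate(off):
            A[row, col] = M[i, j]
            A[row, m + col] = -M[i, j]
    A[-1, :] = 1.0
    b = np.zeros(len(off) + 1)
    b[-1] = 1.0

    # Objective: maximize the trace, i.e. sum_i lambda_i |P_i intersect Q_i|.
    traces = np.array([P @ Q for P, Q in pairs])
    c = np.concatenate([-traces, traces])        # linprog minimizes

    res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")
    print("provable lower bound for the discrepancy:", np.sqrt(-res.fun))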

The non-symmetric vector-valued EDP

Why should we believe that good representations of diagonal matrices exist? The best answer I can give to this is that it turns out to be equivalent to a strengthening of EDP that has (I think) no obvious counterexample. Let me give the strengthening and then prove the equivalence.

We have already met the vector-valued EDP: given any sequence v_1,v_2,\dots of unit vectors in a Euclidean space and any constant C there exists some HAP P such that \sum_{i\in P}v_i has norm at least C. The problem that we shall now take is the “non-symmetric” vector-valued EDP: given any constant C and any two sequences v_1,v_2,\dots and w_1,w_2,\dots of vectors in a Euclidean space satisfying the condition \langle v_i,w_i\rangle=1 for every i, there are HAPs P and Q such that |\langle\sum_{i\in P}v_i,\sum_{j\in Q}w_j\rangle|\geq C. If we insist that v_i=w_i, then this reduces to the usual vector-valued EDP (for \sqrt{C}).

The reason this looks as though it is probably true (or rather, that it and EDP are “equi-true”) is that if you try to make one of the sequences have small discrepancy by making it small on some HAP, then you have to make the other one large on that HAP. (This argument applies rather better to the real-valued case.) Also, the condition \langle v_i,w_i\rangle=1 means that the directional bias of the w_i is somehow similar to that of the v_i, which helps the discrepancies of the two sequences to align with each other.

These are fairly feeble arguments, which reflects the fact that just at the moment I very much want to believe this statement and have not made a big effort to disprove it.

How about the equivalence? This again follows from the Hahn-Banach theorem. Let’s suppose that we cannot represent any diagonal matrix with trace greater than C as a convex combination of \pm P\otimes Qs. Then the Hahn-Banach theorem implies that there is a separating functional, in this case a matrix B such that \langle B,D\rangle\geq 1 whenever D is a diagonal matrix with trace at least C, and |\langle B,P\otimes Q\rangle|\leq 1 whenever P and Q are characteristic functions of HAPs. Moreover, this is an equivalence: if such a matrix exists, then we trivially cannot represent D as a convex combination of \pm P\otimes Qs.

The first condition (that \langle B,D\rangle\geq 1 whenever D is a diagonal matrix with trace at least C) is easily seen to fail if B is not constant on the diagonal, and if it is constant then we see that its value on the diagonal must be at least C^{-1}.

If such a matrix B exists, then choose vectors v_1,\dots,v_n and w_1,\dots,w_n such that \langle v_i,w_j\rangle=B_{ij} for every i and j. (For instance, v_1,\dots,v_n could be the rows of B and w_1,\dots,w_n could be the standard basis of \mathbb{R}^n.) Then \langle B,P\otimes Q\rangle=\sum_{i,j}B_{ij}P_iQ_j=\langle\sum_{i\in P}v_i,\sum_{j\in Q}w_j\rangle. Therefore, the condition that |\langle B,P\otimes Q\rangle| is always at most 1 is telling us that |\langle\sum_{i\in P}v_i,\sum_{j\in Q}w_j\rangle| is always at most 1. If we now rescale by multiplying the v_i by C then we have a counterexample to the non-symmetric vector-valued EDP.

Conversely, if we have such a counterexample, then we can set B_{ij}=C^{-1}\langle v_i,w_j\rangle and we have a separating functional that proves that no diagonal matrix with trace greater than C can be expressed as a convex combination of \pm P\otimes Qs.

What are the difficulties in carrying out this programme?

The main problem we now face is this. If we want to write a matrix with unbounded trace as a convex combination of \pm P\otimes Qs, then we need to use fairly large HAPs. Indeed, the trace of P\otimes Q is |P\cap Q| (if I may mix up sets and characteristic functions), so if our convex combination is \sum_i\lambda_iP_i\otimes Q_i then we need \sum_i\lambda_i|P_i\cap Q_i| to be unbounded. This means more than merely the statement that the weighted average size of P_i\cap Q_i is unbounded, since some of the \lambda_i are of necessity negative.

In particular, we need the sizes of the P_i and Q_i to be unbounded. But if that is the case, then we are trying to write a diagonal matrix with a large trace as a convex combination of matrices that are very far from diagonal, which forces us to rely on clever cancellations.

Having said that, I would also like to point out that the task is by no means hopeless. For example, if n=2^k and w_1,\dots,w_n are the Walsh functions (with respect to some bijection between \{1,2,\dots,n\} and \{0,1\}^k), then n^{-1}\sum_iw_i\otimes w_i equals the identity, even though each individual w_i\otimes w_i is maximally far from being diagonal. Thus, there is nothing in principle to stop cancellations of the kind we want occurring. It is just a technical challenge to find them in the particular case of interest.
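
This cancellation is easy to witness numerically. A minimal sketch (my own, assuming Python with numpy and scipy, whose hadamard function has the Walsh functions as its rows):

    import numpy as np
    from scipy.linalg import hadamard

    n = 2 ** 4
    H = hadamard(n)                          # rows are the +-1 Walsh functions
    S = sum(np.outer(w, w) for w in H) / n   # each summand has no zero entries
    assert np.allclose(S, np.eye(n))         # yet the sum is exactly diagonal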

Note that if we are looking for clever cancellations, then we are more likely to be able to find them if we deal as much as possible in points that belong to several HAPs. This suggests, and the suggestion is borne out by the experimental evidence we have so far, that the diagonal matrix we produce will probably be bigger at points with many factors. But one of the nice aspects of this approach is that one can concentrate on getting rid of the off-diagonal parts and not worry too much about which exact diagonal matrix results in the end. Making sure that its trace is big is a far easier task than guessing what it should be and then trying to produce it.

I will end this post by reiterating what I said after Moses first suggested using SDP: it seems to me that we have now “softened up” EDP. That is, it no longer feels like a super-hard problem with major obstacles in the way of a solution. Obviously I can’t be sure, but I feel optimistic that we are going to get there in the end.

102 Responses to “EDP12 — representing diagonal maps”

  1. Gil Says:

    A question in trying to “push” EDP’s conclusion. Suppose we consider the mean square weighted discrepancies, where HAPs of period m have weights 1/d(m). (d(m) is the number of divisors of m.) Can we expect this average to still be unbounded?

    • Moses Charikar Says:

      Gil, what was your motivation in picking this particular weight function? Did you mean to give weight 1/d(m) to each HAP of period m, or did you intend to give total weight 1/d(m) to all such HAPs?

    • Gil Says:

      Dear Moses

      These weights came in the following way.

      Let f be a function from the natural numbers to {+1,-1}.

      Let f_k be the restriction of f to the HAP {ik}: that is, f_k(n)=f(n) if n is a multiple of k, and 0 for other integers.

      Let X_k(n) be 0 if n is not of the form ik, and otherwise, if n=ik,

      X_k(n)=f(k)+f(2k)+…+f(ik).

      Now, note that

      \sum_k f_k(n)/(1+d(k))=f and if f has bounded discrepancy then

      (*) \sum_k |X_k(n)|/(1+d(k)) is a bounded function.

      The hope was to play with Fourier expansions of the f_k and X_k to get some contradiction to (*). This did not get very far (I don’t see a nice Fourier expansion for f_k and X_k), but it raised the question: does (*) already lead to a contradiction?

      In other words, is there a function f so that for every HAP of period k the discrepancy divided by d(k) is bounded?

      (Also, here, after giving this weight 1/d(k) to discrepancies for HAPs of period k, you can take various \ell_2 norms instead of the infinity norm.)

      d(k) is the number of divisors of k.

    • Gil Says:

      By the way, pardon my ignorance, but how does \sum_{k=1}^n(1/kd(k)) behave?

    • gowers Says:

      If nobody knows the answer, maybe that’s one for Mathoverflow, where I imagine a few minutes would be enough. I can just make the simple observation that if you restrict the sum to primes you see that it grows at least as fast as \log\log n.

    • Moses Charikar Says:

      \log \log n would be my guess too. If we use the crude approximation d(k) \approx \log(k), then we get \log \log (n) for the sum.

    • Gil Kalai Says:

      1. Well, it took 4 hours on Mathoverflow. (But that’s OK, of course; I see no advantage to a few-minutes time-scale.)

      2. Here is the link

      http://mathoverflow.net/questions/18483/an-elementary-number-theoretic-infinite-series

      3. The answer is \sqrt{\log n}

      4. I especially liked the answer by David Hansen, based on a beautiful formula by Ramanujan: http://mathoverflow.net/questions/18483/an-elementary-number-theoretic-infinite-series/18501#18501
      Maksym Radziwill asserted that the Selberg-Delange method gives the same answer.

      5. I would still like to see an explanation that I can understand, of the form: take numbers which are products of m different primes, where m=\frac{1}{2}\log\log n/\log\log\log n (say), and then you see how \sqrt{\log n} arises…

      6. I think this answer makes these weights more appealing for the SDP approach. It may give the same answer as with unit weights, and maybe these weights will smooth the data so that it will be easier to guess the pattern.

    • Moses Charikar Says:

      Wow! Hats off to Mathoverflow. I think insights into what weights to use for various HAPs in the SDP approach would be very valuable. Currently, I don’t have a good sense for what these weights should be.

      Gil, I wanted to clarify some things you wrote earlier.
      \sum_k f_k(n)/(1+d(k))=f
      Presumably, here you meant to write d(n) in the denominator instead of 1+d(k)?
      The number of k such that f_k(n) = f(n) is exactly d(n); for all other k, f_k(n) = 0. If that is what you meant, then presumably the same change should be made in the next expression too?
      \sum_k |X_k(n)|/(1+d(k))

    • Gil Says:

      Right, I forgot to count 1 as a divisor, so indeed it should be /d(k) in both formulas.

      (I should also mention that Maksym’s answer came first, about an hour before David’s.) I agree that this is a nice demonstration of MO’s usefulness.

      Also, if you give all HAPs of period k a weight 1/d(k), what happens to the vector-valued EDP? Can we still expect \sqrt{\log n} to be the answer, or a smaller power?

    • Moses Charikar Says:

      Dear Gil,
      I don’t think I was very clear. It seems to me that in this argument, the denominator is a function of n, the last element of the HAP, rather than k, the common difference. This suggests that we should compute average discrepancy by weighting the contribution of HAPs that end in n by 1/d(n). This would not give a common weight for all HAPs with common difference k as a function of k. Is this correct?

    • Gil Says:

      Right, I was confused; it should be /d(n).

      Hmm, the rationale for the weights 1/d(k) for HAPs of period k is now based on a typo. (Still, the Mathoverflow computation suggests that these weights may be useful.)

      The corrected hope was to get a contradiction from (*) \sum_k X_k(n)^2/d(n) being bounded.

      So this is a simple way of averaging the discrepancies…

      Anyway, I suppose the interesting related conjecture would be that if the average square discrepancies for all HAPs ending with an integer m are below s for every m, then s should already behave like \sqrt{\log n}.

      (I think we suggested square-mean averaging over all HAPs of period k, and over all HAPs altogether, so this one is a slightly different SDP.)

  3. Moses Charikar Says:

    I wanted to mention some thoughts on the SDP approach. Recall that we are trying to prove that the quadratic form
    \displaystyle \sum_{k,d} c_{k,d} (x_d+\ldots+x_{kd})^2 - \sum_n b_n x_n^2
    is positive semidefinite. The lower bound established by this is (\sum b_n)/(\sum_{k,d} c_{k,d}).
    Under the assumption that the tails \tau_{k,d} = \sum_{i \geq k} c_{i,d} = C_d e^{-\mu kd}, we get the following expression:
    \displaystyle \sum_{m,n} x_m x_n e^{-\mu \max(m,n)} \sum_{d|(m,n)} C_d - \sum_n b_n x_n^2

    Now let’s build in the assumption that the b_n fall off exponentially as well, in fact with the same decay parameter \mu. Suppose that b_n = e^{-\mu n} b_n' and e^{-\mu n} x_n^2 = y_n^2, i.e. x_n = e^{\mu n/2} y_n. Writing the quadratic form in terms of y_n and b_n' we get:
    \displaystyle \sum_{m,n} y_m y_n e^{-\mu |m-n|/2} \sum_{d|(m,n)} C_d - \sum_n b_n' y_n^2.
    Notice that the \max(m,n) term in the exponent has been replaced by |m-n|/2. Does this sort of quadratic form look familiar? The lower bound established by this is (\sum_n e^{-\mu n} b_n')/(\sum_d e^{-\mu d} C_d).

    One could go further and assume that b_n' = \sum_{d|n} a_d, and now the goal is to determine the a_d. The quadratic form now is
    \displaystyle \sum_{m,n} y_m y_n e^{-\mu |m-n|/2} \sum_{d|(m,n)} C_d - \sum_n y_n^2 \sum_{d|n} a_d.
    The lower bound this proves is \frac{\sum_d a_d e^{-\mu d}/(1-e^{-\mu d})}{\sum_d C_d e^{-\mu d}}.

    So far, we haven’t really figured out what the unknown coefficients a_d, C_d should be for this proof to work. Also, how should we show that the quadratic form obtained after subtracting the diagonal term is positive semidefinite? Giving a sum of squares expression is one way to do it, but we should also consider other direct ways to argue that the remainder is psd. For example, given a matrix A, consider f(A), obtained by applying a function f to each entry of A. Given that A is psd, f(A) is also psd if f satisfies the following condition: all derivatives of f are non-negative and f(0) is non-negative. This gives a technique to prove that a matrix is psd without explicitly constructing a sum of squares representation. We have a whole slew of functions we could use, e.g. e^{\lambda x} or any polynomial f(x) with non-negative coefficients. Is there a clever choice of a_d, C_d and a direct argument to show that the resulting quadratic form is psd?
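
    The entrywise fact is easy to test numerically. A minimal sketch (my own, assuming Python with numpy), with f = exp as an example of a function all of whose derivatives are non-negative:

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.standard_normal((8, 8))
        A = X @ X.T                                   # a random psd matrix
        B = np.exp(A)                                 # f applied entrywise, f = exp
        assert np.linalg.eigvalsh(B).min() > -1e-9    # f(A) is psd as well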

  4. gowers Says:

    Another point I’d like to make is this. We know that sequences do not have to have unbounded discrepancy on HAPs with common difference 1. We also know that they do not have to have unbounded discrepancy on HAPs of the form \{d,2d,\dots,\lfloor n/d\rfloor d\}. This should imply that it is not possible to write a large-trace diagonal matrix as a convex combination of \pm P\otimes Qs if the HAPs P and Q are restricted to one or other of these classes. This is indeed the case, and it is interesting to see why.

    One way of seeing it is the obvious one of finding a separating matrix B. In the common-difference-1 case we can take B to be the matrix B_{ij}=(-1)^{i+j}. Then any diagonal matrix of trace C has inner product C with B. But the inner product of B with any P\otimes Q is 1 if |P| and |Q| are both odd, and 0 otherwise. Therefore, we cannot get C to be greater than 1. In the full-HAPs case, we can partition \{1,2,\dots,n\} into the integers up to n/2 and the integers beyond n/2. And then we can define B_{ij} to be 1 if i and j belong to the same class, and -1 otherwise.
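
    The first separating matrix is easy to verify by brute force; here is a small check of my own (assuming Python with numpy): every product of initial segments has inner product at most 1 with B, while B agrees with the identity on the diagonal.

        import numpy as np

        n = 16
        B = np.array([[(-1.0) ** (i + j) for j in range(1, n + 1)]
                      for i in range(1, n + 1)])
        segs = [np.concatenate([np.ones(k), np.zeros(n - k)]) for k in range(1, n + 1)]
        assert max(abs(P @ B @ Q) for P in segs for Q in segs) <= 1
        assert np.trace(B) == n   # so <B, D> = tr(D) for diagonal D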

    Going back to the difference-1 HAPs, the problem they have is that if x and y are close, then for almost all HAPs P with common difference 1, either both x and y belong to P or neither does. This means that a typical linear combination of products of HAPs with common difference 1 contains not just the diagonal but also much of the band of points close to the diagonal — unless, that is, one takes very very short HAPs.

    Similarly, if we use HAPs with common difference d, we will find it hard to separate two nearby multiples of d.

    With that thought in mind, let us informally think of two integers x and y as “close” if |x-y| is a small multiple of the highest common factor (x,y). This makes it difficult to find HAPs that contain one of x and y without the other.

    Having said that, if (x,y) has plenty of small factors, then |x-y| can be a largish multiple of some of these small factors, so there is still a chance of separating x and y.

    This suggests that in order to contribute to the diagonal and get plenty of cancellation off the diagonal, one should have plenty of products of short (but still of unbounded length) HAPs with small common differences.

    Staring at the experimental evidence makes me despair slightly of finding any obvious pattern, so I am tempted by probabilistic methods: I would like to choose randomized combinations of products of shortish HAPs with small common differences, hoping to obtain enough cancellation that what remains off the diagonal can be tidied up in a more brutal way without destroying the benefit of what has been done already.

  5. gowers Says:

    Here’s an example of the kind of method I had in mind for attempting to represent a diagonal map. First, let’s look at the matrix one obtains by adding together all P\otimes P such that P is the intersection of an interval of length k with the set \{1,2,\dots,n\}. If this matrix is A, then A_{xy} equals \max\{k-|x-y|,0\}. Note that it is concentrated around the diagonal, but also that we cannot make it diagonal unless we take k=1, which is not an economical representation.
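
    This is quick to confirm numerically; a sketch of my own (assuming Python with numpy), summing over every length-k interval that meets \{1,2,\dots,n\}:

        import numpy as np

        n, k = 20, 5
        A = np.zeros((n, n))
        for start in range(2 - k, n + 1):    # intervals [start, start + k - 1]
            P = np.array([1.0 if start <= x < start + k else 0.0
                          for x in range(1, n + 1)])
            A += np.outer(P, P)
        target = np.array([[max(k - abs(x - y), 0) for y in range(1, n + 1)]
                           for x in range(1, n + 1)])
        assert np.allclose(A, target)        # A_{xy} = max(k - |x - y|, 0)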

    More generally, if we do the same but with progressions of the form \{rd,(r+1)d,\dots,(r+\lfloor k/d\rfloor)d\} (and shorter ones at the beginning and end) and multiply by an appropriate constant, then we can produce exactly the same function, but restricted to pairs (x,y) such that d|x-y. (Confession: I haven’t checked exactly what the effect is of taking those integer parts. In fact, I think what I’ve just written is wrong, but that by averaging over all pointwise products of real intervals of length k with the set of multiples of d between 1 and n we can indeed get the matrix I am claiming we can get.)

    Now let us add together all the matrices we have obtained. This gives us a new matrix whose xy entry is d(x-y)\max\{k-|x-y|,0\}.

    I find this encouraging, because there ought to be a negative correlation between d(x-y) being large and k-|x-y| being large. Indeed, k-|x-y| is large only if x and y are close, and if two numbers are close (but not equal) then their highest common factor is necessarily fairly small.

    On the other hand, I’m not going to get too excited about it, because it somehow feels as though the cost of “localizing to distance k” is sufficiently high that having highest common factors of size at most k is no longer all that much of a gain.

    While I’m at it, another tiny observation is that if P_d is the HAP \{d,2d,\dots,\lfloor n/d\rfloor d\}, then \sum_d\mu(d)P_d\otimes P_d takes the value 1 when x and y are coprime and 0 otherwise.
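
    And the last observation can be checked the same way (again a sketch of my own, assuming Python with numpy; the Mobius function is computed naively):

        import numpy as np
        from math import gcd

        n = 24

        def mu(m):                       # naive Mobius function
            val, p = 1, 2
            while m > 1:
                if m % p == 0:
                    m //= p
                    if m % p == 0:
                        return 0         # repeated prime factor
                    val = -val
                p += 1
            return val

        A = np.zeros((n, n))
        for d in range(1, n + 1):
            P = np.array([1.0 if x % d == 0 else 0.0 for x in range(1, n + 1)])
            A += mu(d) * np.outer(P, P)
        target = np.array([[1.0 if gcd(x, y) == 1 else 0.0 for y in range(1, n + 1)]
                           for x in range(1, n + 1)])
        assert np.allclose(A, target)    # 1 exactly when x and y are coprime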

  6. Alec Edgington Says:

    I’m intrigued by the generalized notion of multiplicativity that Tim mentioned at the end of EDP 11. To recap (and give it a name), call a sequence of unit vectors x_n (n \geq 1) in k-dimensional Euclidean space quasi-multiplicative if \langle x_{dm}, x_{dn} \rangle = \langle x_m, x_n \rangle for all d, m, n \geq 1.

    These sequences are potentially interesting to us because if a quasi-multiplicative sequence has bounded partial sums then it has bounded discrepancy (with respect to the Euclidean norm).

    Can we characterize such sequences in some more arithmetic way? The simplest examples are those of the form x_n = f(n) u where f is a \pm 1-valued multiplicative sequence and u is a unit vector; but there are plenty of others.

    For example, given a multiplicative \pm 1-valued sequence f, an additive function \theta : \mathbb{N}^+ \rightarrow \mathbb{Z} (by which I mean \theta(mn) \equiv \theta(m) + \theta(n)), an integer a \geq 1, and any two unit vectors u and v, we can define x_n = f(n) u if a \mid \theta(n), and x_n = f(n) v otherwise.

    • gowers Says:

      Another source of examples is to take an orthonormal basis (e_n) of Hilbert space and define a multiplication on it by e_ne_m=e_{nm}. Then for each prime you can choose a multiple \mu_pe_{n_p} of some basis vector, where |\mu_p|=1, and extend by multiplicativity (setting f(1)=e_1). Moses’s modification of \lambda_3 and \mu_3 (which has discrepancy that grows like \sqrt{\log n}) was of this type.

    • gowers Says:

      More generally, one can take any semigroup G, index the orthonormal basis by elements of G, and define e_ge_h to be e_{gh}.

    • gowers Says:

      I’m not sure how general this really is without thinking about it for a bit longer, but one could take a family of commuting orthogonal (or unitary in the complex case) transformations of a finite-dimensional space, map each p to some unitary U_p in the family and extend by multiplicativity. The inner product would be the usual one \langle U_1,U_2\rangle=n^{-1}\mathrm{tr}(U_1^TU_2).

      As I write, I realize that we might as well look at the complex case and take the maps all to be diagonal. And then we can forget about the maps altogether and simply take vectors in \mathbb{C}^n with each coordinate of modulus 1, with pointwise multiplication, and with inner product \langle v,w\rangle=n^{-1}\sum_iv_i\overline{w_i}.

    • Alec Edgington Says:

      Ah yes — I hadn’t thought about the group or semigroup idea. (Actually, I don’t quite see how it works in a general semigroup — don’t we need cancellation, ab = ac \Rightarrow b = c?)

      The last example is nice, and suggests a general ‘direct sum’ construction: if x_n \in X and y_n \in Y are quasi-multiplicative, then (x_n, y_n) \in X \oplus Y is quasi-multiplicative, if we define

      \langle (x_1, y_1), (x_2, y_2) \rangle = \frac{1}{2}(\langle x_1, x_2 \rangle + \langle y_1, y_2 \rangle)

      (and similarly for a general finite direct sum).

  7. gowers Says:

    Following on from this comment, here is another speculation about how a proof might go that one can represent a diagonal matrix with unbounded trace as a convex combination of \pm P\otimes Qs.

    First, let us look at the sum of P\otimes P over all full HAPs P with prime power common difference. If A is this matrix, then A_{xy} is the number of prime powers that divide (x,y), which is the number of prime factors of (x,y), with multiplicity.
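
    This identity is easy to verify for small n; a sketch of my own (assuming Python with numpy):

        import numpy as np
        from math import gcd

        n = 30

        def is_prime_power(d):
            for p in range(2, d + 1):
                if d % p == 0:
                    while d % p == 0:
                        d //= p
                    return d == 1        # true iff d was a power of p
            return False

        def big_omega(m):                # prime factors counted with multiplicity
            count, p = 0, 2
            while m > 1:
                while m % p == 0:
                    count, m = count + 1, m // p
                p += 1
            return count

        A = np.zeros((n, n))
        for d in (d for d in range(2, n + 1) if is_prime_power(d)):
            P = np.array([1.0 if x % d == 0 else 0.0 for x in range(1, n + 1)])
            A += np.outer(P, P)
        for x in range(1, n + 1):
            for y in range(1, n + 1):
                assert A[x - 1, y - 1] == big_omega(gcd(x, y))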

    Now let me get a bit woolly. The Erdős-Kac theorem tells us that the number of prime factors of a random integer n is pretty concentrated around its average \log\log n. So perhaps we can think of this matrix A as behaving like \log\log((x,y)) plus a small error. Is there any hope of producing \log\log((x,y)) in an economical way?

    It seems to me that there might conceivably be. After all, I observed above that one can produce the matrix that is 1 when (x,y)=1 and 0 otherwise fairly easily. Unfortunately, that representation is not all that economical, but it suggests that there are ways of favouring pairs with fewer common factors. The idea here would be to add something to the matrix \log\log((x,y)) to make it roughly constant off the diagonal and end by subtracting a multiple of 1\otimes 1 to obtain something that is very small off the diagonal.

    If this worked it would prove the yet stronger “prime-power-difference-non-symmetric-vector-valued EDP”, which would be like the normal non-symmetric vector-valued EDP except that the HAPs would have common differences equal to prime powers.

    • gowers Says:

      Oops — I need to think more carefully about that, because so far all the HAPs mentioned are full ones, and we know that those are not enough.

  8. gowers Says:

    Here’s another thought about how to get the decomposition to work. I think it may be more promising than my other recent ideas (which wouldn’t be hard).

    Let’s choose a k that tends to infinity very slowly with n. Next, let us choose, by some random procedure, a collection P_1,\dots,P_m of HAPs of length k (here, by “length” I mean cardinality rather than diameter) and let us form the matrix \sum_iP_i\otimes P_i. This has trace mk and on the diagonal it tends to favour numbers with more common factors. Also, the sum of the coefficients we have used so far is m, so as things stand we have an unbounded ratio (namely k) of trace to sum of coefficients. However, the matrix is not diagonal, so this is not very interesting.

    Now let us try to remove everything that is not on the diagonal by subtracting matrices of the form P\otimes Q where P and Q are disjoint HAPs. We will be done if we can achieve this using at most o(k)m such matrices. Is there any chance that that can be managed?

    If m is something like n^2/k^2, then \sum_iP_i\otimes P_i will be fairly dense (though more so when (x,y) has plenty of factors) and fairly random. So one might expect to be able to find reasonably large disjoint pairs P,Q of HAPs such that P\times Q\subset\bigcup P_i\times P_i. Indeed, if \bigcup P_i\times P_i has density \theta and P and Q have length t, then, very naively (and ignoring all sorts of important number-theoretic considerations) one might expect that the probability that P\times Q\subset\bigcup P_i\times P_i was around \theta^{t^2}. And then we might hope that what is left after we have subtracted off P\otimes Q was still reasonably random-looking (perhaps we could do something like a Rödl nibble to ensure this) so that we could repeatedly remove fairly large HAP products.

    As we continued the process, the matrix that was left would become sparser and sparser and the pieces we subtracted would have to become smaller and smaller. There would therefore be a delicate balancing act: the hope is that even if some of the pieces had to be small, they would on average be large enough for the sum of coefficients to be at most o(k)m.

    After thinking about this proof strategy for all of a few seconds, I don’t see an obviously fatal flaw in it. However, it’s quite possible that one will emerge.

    Incidentally, I do not at all imagine that the P_i would be chosen uniformly from the set of all HAPs of length k. My guess is that they would tend to have fairly small common differences and that larger differences would come into play when one began the nibbling process. I say that because of the fact that the diagonal seems to favour smooth numbers. (Note also that the above suggestion would lead to a non-negative diagonal, whereas the optimal solution has some negative coefficients on the diagonal. I suppose it isn’t crucial to choose P\otimes Q with P and Q disjoint, as typically their intersections will be much smaller than the sets themselves.)

  9. Moses Charikar Says:

    I have implemented the suggestions about presenting the LP solutions. The solution is expressed as rational numbers for n <= 40, and both P \otimes Q and Q \otimes P are explicitly written down. Look for the files *a.txt here:
    http://webspace.princeton.edu/users/moses/EDP/try-lp/

  10. Moses Charikar Says:

    In thinking about the SDP approach and what choices of C_d might be good, the following question comes to mind: what functions f(n) can be represented in the form f(n) = \sum_{d|n} C(d), where the C(d) are non-negative? Is there a well-known characterization? For example, all multiplicative functions that are non-decreasing on prime powers can be represented in this form. Let f be such a function. Then we choose C(d) to be multiplicative and C(p^i) = f(p^i) - f(p^{i-1}).

    It seems to me that we can represent more complicated functions as well. e.g. I think we can represent f(n)\log f(n) where f(n) is multiplicative and non-decreasing on prime powers. In order to do this, note that we can represent f^{1+\epsilon} (since it is multiplicative etc). Now
    \displaystyle \lim_{\epsilon \rightarrow 0} \frac{f^{1+\epsilon}-f}{\epsilon} = f \log f
    Say that C_f(d) is the function C constructed above in representing a multiplicative function f. Then consider
    C'(d) = \displaystyle \lim_{\epsilon \rightarrow 0} \frac{C_{f^{1+\epsilon}}(d)-C_f(d)}{\epsilon}
    Then \sum_{d|n} C'(d) = f(n) \log f(n).

    • Alec Edgington Says:

      We can use Mobius inversion to get some kind of characterization. If f(n) = \sum_{d \mid n} C(d), then C(n) = \sum_{d \mid n} f(d) \mu(n/d). Therefore, if we know f(1), \ldots, f(n-1), f(n) is constrained by:

      f(n) \geq -\sum_{d \mid n, d < n} f(d) \mu(n/d)
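
      A tiny sketch of this inversion (my own, in pure Python; the example function is just an illustration): taking f(n)=d(n), the number of divisors, recovers C identically equal to 1, so this f is representable with non-negative C.

          def divisors(n):
              return [d for d in range(1, n + 1) if n % d == 0]

          def mu(m):                       # naive Mobius function
              val, p = 1, 2
              while m > 1:
                  if m % p == 0:
                      m //= p
                      if m % p == 0:
                          return 0
                      val = -val
                  p += 1
              return val

          f = lambda n: len(divisors(n))   # d(n) = sum_{d|n} 1
          C = {n: sum(f(d) * mu(n // d) for d in divisors(n))
               for n in range(1, 25)}
          assert all(v == 1 for v in C.values())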

  11. gowers Says:

    Continuing the line of enquiry from this comment, I want to consider in abstract terms what the obstacle might be to representing the off-diagonal part of \sum_iP_i\otimes P_i as a sum \sum_j\lambda_jQ_j\otimes R_j with some given bound on \sum_j|\lambda_j|.

    By Hahn-Banach once again, we know that if A is a matrix that cannot be represented as a sum of that kind with the moduli of the coefficients adding up to at most M, then there is a matrix B such that \langle A,B\rangle>M and |\langle Q\otimes R,B\rangle|=|\sum_{(i,j)\in Q\times R}B_{ij}|\leq 1 for every pair (Q,R) of HAPs.

    In our situation, the assumption was that we had a matrix A=\sum_{i=1}^mP_i\otimes P_i, where each P_i was a HAP of cardinality k. This gave us a trace of mk, so we needed to write the off-diagonal part as a linear combination of Q\otimes Rs with absolute values of coefficients adding up to o(mk). To put this in perspective, if all coefficients were equal to \pm 1, then we would need the average cardinality of the sets Q\times R to be mk^2/o(mk)=\omega(k). So the typical size of each Q or R should be distinctly bigger than \sqrt{k}.

    The Hahn-Banach argument tells us that we can do this if and only if the following statement holds: whenever B is a matrix such that \langle A,B\rangle\geq o(mk), there exist HAPs Q and R such that |\sum_{(i,j)\in Q\times R}B_{ij}|\geq 1. Note that if the sets P_i\times P_i are approximately disjoint, then the average value of B_{ij} inside that set is about o(k^{-1}). Note also that B can be negative outside the support of A. So what we actually have is a new discrepancy problem. I haven’t yet thought about it properly, but a preliminary point to make is that a slight disadvantage of this perspective is that it is requiring us once again to make a big guess: namely, what the matrix should be. But I think that may be a necessary feature of this problem. Presumably it will be bigger at pairs (x,y) such that the highest common factor (x,y) has many factors. So we could ask ourselves: why should such matrices be better candidates for ones that give a positive answer to the discrepancy problem just described?

    It would also be a useful (and presumably simple, but I have to stop for a bit) exercise to prove that something obvious like the matrix that is 1 everywhere cannot have its off-diagonal part decomposed. The matrix B that shows this is probably (-1)^{i+j}. I’ll check later.

    • gowers Says:

      That final sentence was, on reflection, ridiculous. Not only is it not true that \langle A,B\rangle is large, but it is easy to find Q\otimes Rs that correlate with it.

      A better example is this. Let A be the matrix with A_{ij}=1 if i\equiv j\equiv 1 mod 3 and 0 otherwise. (This is not of the form \sum_i P_i\otimes P_i, but that’s the point.) Then let B_{ij}=r(i)r(j), where r is our old friend that takes a number to 1 if it’s 1 mod 3, -1 if it’s 2 mod 3 and 0 if it’s 0 mod 3. Then B certainly correlates with products of progressions, but not with products of HAPs.

    • gowers Says:

      Actually, a slight modification of that example illustrates better what is going on. Let us take A to be the all-1s matrix (which equals \mathbf{1}\otimes \mathbf{1}, where \mathbf{1} is the constant vector (1,1,\dots,1)). Now let B be r\otimes r as above. Then (assuming that n is not 1 mod 3) we find that \langle A,B\rangle=0. However, the off-diagonal part of A correlates with B: if we let A' be the same as A except with 0s down the diagonal, then \langle A',B\rangle=-2n/3, since the diagonal of B is 1,1,0,1,1,0,… . Now \langle B,Q\otimes R\rangle=(\sum_{x\in Q}r(x))(\sum_{y\in R}r(y)), which is at most 1 in modulus. Therefore, if we want to represent the off-diagonal part of A as a linear combination of Q\otimes Rs, the sum of the absolute values of the coefficients has to be at least 2n/3, rather than the o(n) that we would need to prove EDP.

  12. Moses Charikar Says:

    Here is a question about the relationship of the representation-of-diagonals approach versus the SDP approach. We know that the best lower bound provable via the SDP approach is at least as large as the best bound provable via the representation-of-diagonals approach. One way to see this is the fact that a representation-of-diagonals proof proves a lower bound on the non-symmetric vector-valued EDP, which is a stronger statement than the vector-valued EDP. A lower bound for the non-symmetric vector-valued EDP applies to the vector-valued EDP but not vice versa.

    So, given a lower bound using the representation-of-diagonals approach, is there a direct way to construct an SDP proof that establishes the same (or higher) lower bound? We know that such an SDP argument must exist.

    • gowers Says:

      That’s a good question. I think I have an answer for it. Let me sketch the answer. First, observe that

      \displaystyle P\otimes Q+Q\otimes P=

      \displaystyle =P\otimes P+Q\otimes Q-(P-Q)\otimes(P-Q)

      and

      \displaystyle -P\otimes Q-Q\otimes P=

      \displaystyle =P\otimes P+Q\otimes Q-(P+Q)\otimes(P+Q).

      Therefore, if we have D=\sum_i\lambda_i(P_i\otimes Q_i+Q_i\otimes P_i) we have

      \displaystyle D=\sum_i|\lambda_i|(P_i\otimes P_i+Q_i\otimes Q_i)-

      \displaystyle -\sum_i|\lambda_i|(P_i\pm Q_i)\otimes(P_i\pm Q_i),

      where we take P_i-Q_i if \lambda_i is positive and P_i+Q_i if it is negative.

      Therefore, \sum_i|\lambda_i|(P_i\otimes P_i+Q_i\otimes Q_i) is written as the sum of a diagonal and a sum of squares. What’s more, the ratio of trace to sum of relevant coefficients is preserved.

      If this is correct, it might be interesting to try to understand what is special about this particular kind of SDP solution that does not have to happen in general.

    • Moses Charikar Says:

      Very nice. This looks correct, except that you don’t need the 2’s everywhere. So does this suggest that we ought to look for differences of HAPs (or more generally, linear combinations of HAPs) in building a sum of squares expression in the SDP proof?

      Do you have a sense for what pairs of HAPs might be useful in getting a diagonal representation? If we ignore the lengths of the HAPs for now, what pairs of common differences might make good combinations?

    • gowers Says:

      I’ve removed the 2s now.

    • gowers Says:

      In answer to your last paragraph, at the moment I have absolutely no sense at all of how to get a good diagonal representation. The only thing I would say is that if we are trying to represent the same matrix in two different ways, then it will probably help if as much of that matrix as possible is concentrated at points with plenty of factors — to give us plenty of choice. So that would suggest taking pairs P,Q such that both P and Q have plenty of factors.

      I find it quite hard to see what’s going on in the decompositions found by computer. However, it does seem that they have a number of “one-dimensional” parts: for example, things like P\otimes Q-R\otimes S, where R is P without its last element and S is Q with an extra element on the end.

      With that thought in mind, is there an obvious counterexample to the following yet further strengthening of the non-symmetric version? It would say that for every C, if (u_n) and (v_n) are sequences of vectors with \langle u_n,v_n\rangle=1 for every n, then either there exist n and a HAP Q such that |\langle u_n,\sum_{m\in Q}v_m\rangle|\geq C or there exist P and m such that |\langle\sum_{n\in P}u_n,v_m\rangle|\geq C.

    • gowers Says:

      Oops, the answer to that last question is obvious. Just take u_n=v_n to be an orthonormal basis.

      I think that means we can’t do the decomposition into one-dimensional pieces.

  13. Gil Kalai Says:

    I have a question regarding greedy algorithms approach for multiplicative sequences.

    Let’s assume we consider sequences on {1,2,…,N}, where N is very large.

    Did we try the following? We assign the value +1 or -1 to each prime, one by one. When we assign the value for the (i+1)th prime, we do it so as to minimize the discrepancy.

    By minimizing the discrepancy I mean that we compute the discrepancy for ALL intervals {1,…,t}, where t is at most N, under the two possible values for the (i+1)th prime, assigning the value 0 to all larger primes. Then we take the possibility that gives the smaller discrepancy.

    • gowers Says:

      My guess is that that would have some undesirable features. For instance, if an interval has lots of zeros in it (as the larger ones will) we should care less about its discrepancy because it will be much easier to correct. But some variant of a look-ahead greedy algorithm would definitely be interesting. I can’t remember what we’ve tried along those lines.

    • Gil Kalai Says:

      I don’t think so; it is true that the first few prime assignments will be rather arbitrary, but I think this very simple look-ahead greedy actually optimizes the “right” thing.

      So I would expect the overall discrepancy for my greedy proposal to behave more or less logarithmically, even for problems where we don’t know such sequences at all (like EDP for square-free numbers).

      (I do agree that maybe by giving less weight to sequences with many zeroes you can “fine-tune” this suggestion and perhaps practically improve it somewhat. Also, if you want a provably low discrepancy, maybe you need to make the algorithm practically worse by introducing randomness. In any case, I propose trying this algorithm as is.)

    • gowers Says:

      I definitely agree that that would be worth doing, and perhaps it is better to think about refinements only after looking at the simplest possibilities first.

      I think part of my reason for being anxious about your suggestion is that it minimizes only the maximum discrepancy at any one time, which leaves everything else free to misbehave. Another thing that I think could be very interesting, especially in the light of recent approaches to the problem, would be to minimize some kind of average discrepancy. If that led to an average discrepancy that grew very slowly, it would be pretty interesting, even if the maximum was somewhat bigger than the average. And this feels more likely to work, somehow.

      Having said that, I see your point that once you start assigning values to slightly larger primes, my objections become weaker.

  14. gowers Says:

    Here’s a question that might be helpful. We are aiming to find a sequence (b_i) for which we can prove that the discrepancy of any sequence (x_i) is at least (\sum_ib_ix_i^2)^{1/2}. Now if the b_i are non-negative, then (\sum_ib_ix_i^2)^{1/2}=\max\{\sum_ib_ix_iy_i:\sum_ib_iy_i^2\leq 1\}. Therefore, if we succeed in our goal, then for every sequence (y_i) with \sum_ib_iy_i^2\leq 1 we will have proved that the discrepancy of (x_i) is at least \sum_ib_ix_iy_i, which depends on a linear function of the x_i rather than a quadratic one. So it might be interesting to ask the following general question: for what sequences (w_i) of non-negative weights can we prove that the discrepancy of every sequence (x_i) is at least \sum_iw_ix_i? If W is the set of all the sequences we manage to find, and (b_i) is a sequence of weights such that (\sum_ib_ix_i^2)^{1/2}\leq\max\{\langle x,w\rangle:w\in W\} for every x, then we have a lower bound of the kind we are looking for.

    A trivial example of a sequence w that belongs to W is the characteristic function of a single HAP, or any convex combination of such characteristic functions.

    OK, this comment is slightly embarrassing because I’m going to end up with the triviality that W consists of convex combinations of characteristic functions of HAPs, and I’ve ended up with almost exactly the question we started with. However, I had a different thought that led to this one, and I haven’t yet seen that the different thought is a waste of time. I’ll address it in a new comment.

  15. Kristal Cantwell Says:

    I have been looking at the wiki. I think the possibility of some pages for some of the new approaches has been mentioned before. The last change seems to be February 20, so I think it has fallen behind some of what has been done over the last four threads.

  16. gowers Says:

    A small observation, but unless it can be strengthened I don’t think it’s good for anything.

    Let us define a graph on \mathbb{N} by joining x to y if and only if |x-y| divides x and y. The observation is that this graph contains arbitrarily large complete bipartite graphs.

    To see this, pick a sequence n_1,\dots,n_k such that n_i\geq n_{i+1}!+\dots+n_k!+k! for every i and n_k\geq k!. For i=1,\dots,k let r_i=n_1!-n_2!-n_3!-\dots-n_i! and let s_i=n_1!-n_2!-\dots-n_k!-i. Then r_i-s_j=n_{i+1}!+\dots+n_k!+j, which is at most n_i. It therefore divides n_i!, and indeed it divides n_1!-n_2!-\dots-n_i!=r_i. And if r_i-s_j divides r_i, then it must divide s_j as well.
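
    The construction can be sanity-checked for small k; here is a sketch of my own in Python, with k=2 and the particular choice n_1=4, n_2=2 (which satisfies the constraints):

        from math import factorial

        k = 2
        ns = [4, 2]     # n_i >= n_{i+1}! + ... + n_k! + k!, and n_k >= k!
        top = factorial(ns[0])
        r = [top - sum(factorial(m) for m in ns[1:i + 1]) for i in range(k)]
        s = [top - sum(factorial(m) for m in ns[1:]) - j for j in range(1, k + 1)]
        for x in r:
            for y in s:
                assert x % (x - y) == 0 and y % (x - y) == 0   # |x-y| divides both
        print(r, s)     # a complete bipartite K_{2,2} in the graph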

    The motivation for this was to see whether the rather sparse graph where you join two numbers if they are consecutive in some HAP has dense bits. What I would prefer is something much stronger, where one had a multipartite graph in several layers. I tried something like this before, and I can’t remember whether I had a good reason for giving it up. I came back to it because I thought that maybe it would be a way of producing decompositions of diagonal maps.

  17. Moses Charikar Says:

    Here is a suggestion for an exercise to sharpen our understanding of diagonal representations: can we give a proof of Roth’s theorem on the discrepancy of arithmetic progressions using diagonal representations? If so, it would be a generalization of the n^{1/4} result for APs to the non-symmetric vector-valued case. We know that the result is true in the vector-valued case, e.g. via Lovasz’s SDP proof of Roth’s theorem. See Section 5.2, pages 31-33 of these notes that were pointed out before:
    http://www.cs.elte.hu/~lovasz/semidef.ps

    One caveat: We know that the answer for HAPs is very different from that for APs. Not only is the value of the lower bound different, the nature of the problem is very different too, since there are symmetries in the case of APs which don’t hold for the case of HAPs. One way this manifests itself is that for APs, in lower bounding the discrepancy by \sum b_i x_i^2, you don’t need to use different b_i values because of symmetry. So perhaps this exercise will give no insights about constructing a proof for the HAP case. Nevertheless, it would be a good “proof of concept” and increase our understanding of diagonal representations.

    • Klas Markström Says:

      Moses, have you tried to solve the SDP corresponding to Roth’s theorem numerically? If one were to formulate an SDP for this problem along the same lines as the one we have solved for EDP, there might be some insight to be gained from comparing the numerical solutions for the two problems.

    • gowers Says:

      Another very good question, and again (after a failed attempt) I think I have found an answer. I have a train to catch, so for now I’ve got time only for a very brief sketch.

      The idea is to start by representing the identity in terms of trigonometric functions. I’ll use complex ones for now, to save time, but in the end I’d do it with sines and cosines. Let \omega=e^{2\pi i/n} and let \omega_r(x)=\omega^{rx}. Then I_n=n^{-1}\sum_r\omega_r\otimes\overline{\omega_r} (if this is interpreted appropriately with complex conjugates — again, it will work better with sines and cosines).

      We now pick, for each \omega_r, some arithmetic progression P with common difference at most C\sqrt{n} and length c\sqrt{n}, such that \omega_r convolved with P, which has to be a multiple of \omega_r, is a reasonably big multiple. To do this we use the pigeonhole principle to find a common difference d such that \omega^{rd} is very close to 1.

      But \omega_r*P is a linear combination of lots of translates of P. So this allows us to convert the trigonometric representation of the identity into one that involves APs that do not wrap around too much.

      I’ll check the details on the train, but I’m fairly confident that this is going to work. I’ll be able to report back in about four or five hours from now.

    • gowers Says:

      I’m now back from my train journey and have realized that the question is subtler than I thought. I still think that a representation-of-identity proof is possible, however. Let me explain what was wrong with my suggestion above, and then why I think that the problem is not fatal.

      Let m be about \sqrt{n} and let P be the interval \{-m,-(m-1),\dots,m\} and let P_d be the characteristic function of dP. My previous idea was this. First, for each r we find some d\leq \sqrt{n} such that \omega_r*P_d is \lambda m\omega_r for some real number \lambda\geq 1/2. (Here I’m defining convolutions using sums rather than expectations.) Then we take the representation of the identity I_n=n^{-1}\sum_r\omega_r\otimes\overline{\omega_r} and expand each \omega_r into (\lambda m)^{-1}\sum_x\omega^{rx}(P_d+x), which represents \omega_r\otimes\overline{\omega_r} as a sum of P\otimes Qs with absolute values of coefficients adding up to (n/\lambda m)^2, which is proportional to n. Therefore, the average of the \omega_r\otimes\overline{\omega_r} is represented with coefficients adding up to Cn, and it gives us the identity, which has trace n. So we get a ratio equal to a constant and have proved nothing at all.

      What, one might ask, is the weakness in the above argument? The answer, which is obvious in retrospect, is that we are missing the chance to exploit some cancellation. As soon as I can demonstrate that cancellation, I will have already more or less proved that the discrepancy on arbitrary APs is unbounded. But I think that if the argument is pushed as far as it will go, then we will get the n^{1/4} bound (though it looks as though a bit of effort would be needed to remove logarithmic factors).

      To see that there could be cancellation, let us go back to how we choose d such that \omega_r*P_d is a large multiple of \omega_r. The idea is to use the pigeonhole principle to show that for every r there exists d\leq\sqrt{n} such that |1-\omega^{rd}|\leq Cn^{-1/2}. But this means that each d is used on average for about \sqrt{n} different values of r, and therefore we attach several different coefficients \omega^{r(x-y)} to (P_d+x)\otimes(P_d+y). Let R(d) be the set of r for which I chose d. (It will be something like an arithmetic progression with common difference d^{-1}, all in a mod-n sense.) Then the sum of \omega^{r(x-y)} over all r\in R(d) is a geometric series, which the bad argument above estimates by the trivial bound |R(d)|. But of course for most x-y a much better bound holds.

      I did a back-of-envelope calculation and convinced myself that this would lead to an n^{1/4}-type bound (give or take log factors, and I had ideas for what to do about those). I then tried it more carefully and got into some trouble. But I think I see where I was going wrong the second time. I will try to sort it out and write something rigorous.

      I think that this was, as Moses said, a valuable exercise. It suggests a general way of going about finding a diagonal representation for EDP that I had not previously thought of. The idea would be to start with a representation of a diagonal matrix D as D=\sum \lambda_iv_i\otimes v_i, where the v_i are convex combinations of HAPs and \sum_i|\lambda_i| equals the trace of D. So far, that would prove nothing. But then one would look more closely at the representation and see that the coefficients it was attaching to individual P\otimes Qs involved a certain amount of cancellation, enabling us to rewrite the expression for D as \sum_i\mu_iP_i\otimes Q_i with \sum_i|\mu_i|=o(\sum_i|\lambda_i|).

      I don’t know whether that is a feasible approach, but I would believe in it more if we could find a rather general sufficient condition for a vector v to be a convex combination of HAPs. This condition should say something like that v is not too big on APs that are not HAPs. The Hahn-Banach theorem says that if v is not a convex combination of \pm HAPs, then there is a function w such that \langle v,w\rangle>1 and w has discrepancy at most 1 on all HAPs. Can anyone think of a function v such that it is easier to prove that the HAP discrepancy of w is always at least \langle v,w\rangle than it is to show that v is a convex combination of HAPs and their negatives?

  18. Moses Charikar Says:

    No, I haven’t tried to solve the SDP corresponding to discrepancy of APs. Lovasz’s exposition describes a feasible solution to this SDP (not optimal, but good enough to prove the n^{1/4} lower bound). This solution is as follows:
    Let k = \lfloor \sqrt{n/8} \rfloor. Consider all APs with common difference at most 8k and length exactly k. (One small technical detail is that he considers arithmetic progressions modulo n, i.e. allows APs to wrap around.) There are 8kn such APs and in the SDP solution, he gives equal weight to all the APs in this set. The crux of the matter is to argue that the value of this solution is at least \Omega(n^{1/2}). Here the proof exploits the symmetries of APs crucially.

    • Klas Markström Says:

      I thought it might be interesting to see the optimal solutions as well, since it is the optimal solutions for the EDP-SDP that we have been looking at. This might give some idea of how to find good enough solutions to the EDP-SDP.

    • Moses Charikar Says:

      I see … so you’re saying that we know there is a clean, understandable SDP solution for the discrepancy problem for APs. If we weren’t aware of its existence, would looking at the optimal SDP solution for small n give us any clues about how to construct an SDP solution for general n?

      One issue with solving the discrepancy problem for general APs is that there are many more APs than HAPs. If we impose reasonable symmetry conditions on the SDP solution (i.e. by averaging discrepancy over all translations of an AP), the number of constraints is the same as the number of HAPs, but the constraints are a lot denser. So my sense is that the SDP solver will have a harder time solving these SDPs and we may only get solutions for smaller n than we have got for the HAP case.

      Nevertheless, it might be interesting to look at these solutions. Similarly, it might be interesting to look at the LP solution corresponding to the optimal diagonal representation to see if we would have had any hope of reverse engineering a solution for general n by looking at such solutions.

  19. Gil Says:

    A sort of a vague general question: Do Tim’s new approaches or some variant of the Moses SDP approaches allow us to give lower bounds for EDP (or some related problem) based on an LP problem rather than an SDP problem?

    • Moses Charikar Says:

      Dear Gil,
      Yes, linear programming can be used to find the optimal solution to Tim’s approach of expressing a diagonal matrix as a convex combination of P \otimes Q and -P \otimes Q (where P,Q are characteristic vectors of HAPs, and the objective is to maximize the sum of the diagonal entries).

      In order to do this, you have two non-negative variables for every pair of HAPs: a_{P,Q}, the coefficient for P \otimes Q and b_{P,Q}, the coefficient of -P \otimes Q. You stipulate that \sum a_{P,Q} + \sum b_{P,Q} = 1. The convex combination is an n \times n matrix; call it M. Each entry of M can be expressed as a linear function of the variables a_{P,Q},b_{P,Q}. We impose the condition that M_{i,j} = 0 for i \not = j and the objective function is \sum_i M_{i,i}.
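
      To make this concrete, here is a minimal sketch of that LP in Python (my own illustration: the enumeration of HAPs, the tiny n, and the use of scipy’s linprog are choices of convenience, not part of the argument):

          import numpy as np
          from itertools import product
          from scipy.optimize import linprog

          n = 6   # tiny: the number of HAP pairs grows like (n log n)^2

          haps = []   # characteristic vectors of the HAPs {d, 2d, ..., kd}
          for d in range(1, n + 1):
              for k in range(1, n // d + 1):
                  v = np.zeros(n)
                  v[np.arange(1, k + 1) * d - 1] = 1.0
                  haps.append(v)
          pairs = list(product(haps, repeat=2))
          T = len(pairs)

          # variables are [a_{P,Q}; b_{P,Q}]; M = sum (a - b) P (x) Q
          traces = np.array([P @ Q for P, Q in pairs])  # trace of P (x) Q
          c = -np.concatenate([traces, -traces])        # linprog minimizes

          rows, rhs = [], []
          for i in range(n):                            # M_{ij} = 0 for i != j
              for j in range(n):
                  if i != j:
                      coeffs = np.array([P[i] * Q[j] for P, Q in pairs])
                      rows.append(np.concatenate([coeffs, -coeffs]))
                      rhs.append(0.0)
          rows.append(np.ones(2 * T))                   # convex combination
          rhs.append(1.0)

          res = linprog(c, A_eq=np.array(rows), b_eq=np.array(rhs),
                        bounds=(0, None), method="highs")
          print("best trace of a diagonal matrix:", -res.fun)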

      The SDP approaches do not directly lead to a linear programming problem, although generally speaking there are iterative methods to solve SDPs where each iteration requires the solution of a linear program. I don’t think this is what you were asking though.

    • Gil Says:

      Thanks, Moses

      One “role-model” case I have in mind is the LP approach to the maximum size of error-correcting codes and spherical codes. There, after moving to the dual description, there is a way to study the LP analytically, which ties in with certain hypergeometric functions and gives good upper bounds that cannot be achieved by other methods. It is not known what the limits of the LP method are. There is a recent SDP approach which leads to better bounds in particular cases. For small cases the LP is much easier to compute than the SDP. It is not clear whether the SDP can potentially lead to better asymptotic results and, in any case, it looks too difficult.

      In our case, since the LP talks about pairs of HAPs to start with, I don’t think it is computationally better for small cases. As for studying it theoretically, I suppose trying to look at the dual is an instinct (probably it has already been considered, has it?), and then the difficult part is trying to relate it to some familiar mathematics. This looks difficult.

  20. gowers Says:

    In a somewhat similar spirit to Gil’s question, here is a question that I should have asked in this comment, but have only just thought of. The question is whether we can find interesting sequences (w_i) of positive numbers such that the discrepancy of every sequence (x_i) is bounded below by \sum_iw_i|x_i|. We know that \sum_iw_i|x_i|\leq(\sum_iw_i)^{1/2}(\sum_iw_i|x_i|^2)^{1/2}, so if the discrepancy is bounded below by (\sum_ib_i|x_i|^2)^{1/2}, then we can take w_i=b_i(\sum_jb_j)^{-1/2}.

    • Moses Charikar Says:

      If your question is about efficient computational methods to find such sequences w_i, then I don’t know of any other than the SDP approach and your diagonal-representation approach. If you think about how one might get such sequences via a convex programming approach, the w_i should correspond to constraints |x_i| \geq 1. But |x_i| \geq 1 is not a convex constraint, i.e. the set of feasible vectors x is not convex. Both the SDP and the diagonal-representation approach get around this problem in similar, but slightly different ways by solving an optimization problem on a different set of variables: In the SDP approach, the ‘variables’ are all products x_i x_j, and we place the constraint x_i x_i \geq 1. In the representation of diagonals approach, we consider two sequences (x_i), (y_i), the ‘variables’ are the products x_i y_j and we impose the constraint x_i y_i =1. As stated, the feasible set is still not convex, but actually, we allow convex combinations of matrices: convex combinations of x^T x are simply the set of positive semidefinite matrices. In the diagonal-representation approach, convex combinations of x^T y are arbitrary matrices.

      Ok … I think this comment went off on a little bit of a tangent. What you are really asking is: can we find interesting sequences (w_i) by any means, not necessarily by convex programming? I don’t know the answer to that.

    • gowers Says:

      Yes, my thought was roughly this. If the SDP approach works, then there must exist a lower bound of the form (\sum_ib_ix_i^2)^{1/2} for the discrepancy, with \sum_ib_i unbounded. But (\sum_ib_ix_i^2)^{1/2} is at least as big as (\sum_ib_i)^{-1/2}(\sum_ib_i|x_i|), so there must be a lower bound of the form \sum_iw_i|x_i| with \sum_iw_i unbounded. Moreover, since we cannot reverse the implication, this should be a weaker statement.

      Looking at it now, I realize that it won’t be weak enough to be easy, since it clearly implies EDP.

  21. Uwe Stroinski Says:

    From computer experiments and various computer-aided proofs we know that there is no completely multiplicative function with values in {-1,+1}, discrepancy bounded by 2, and domain larger than {1,…,246}. Our human proofs are still inefficient in the sense that they contain a huge number of cases and that they do not extend to the general situation. This is maybe the reason why we have lost interest in them. Let me just point out that we can reduce (or at least disguise) the number of cases to consider by exploiting properties of the set of all such partially defined completely multiplicative functions. Each of the following 6 facts is interesting in its own right and comparatively easy to prove (at least compared to what we have so far). Let the functions be defined on {1,…,n}.

    1. If n >= 123, then f(2)=-1.
    2. If n >= 133, then f(3)=-1.
    3. If n >= 215, then f(13)=-1.
    4. If n >= 215, then f(19)=1 (the optimal bound is 123, but this is what we need, easier to prove and still the toughest case).
    5. If n>= 225, then f(5)=-1.
    6. If n>= 225, then f(41)=-1.

    The final statement: let f:{1,…,n}->{-1,+1} be a completely multiplicative function with discrepancy bounded by 2. Then n <= 246. Using the lemmas we get f(242)=f(2)f(11)^2=-1 and f(243)=f(3)^5=-1. This implies a cut at 242, and thus the partial sum f[242] is zero. Since f(245) = f(5)f(7)^2 = -1, f(246) = f(2)f(3)f(41) = -1 and f(247) = f(13)f(19) = -1, we have f[247]=-4+f(244), which is a contradiction.
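
    For what it’s worth, here is a quick machine check of this arithmetic (my own sketch, not part of the proof; the values at the primes 7 and 11, which the lemmas do not determine, are left free):

        from itertools import product

        lemma_values = {2: -1, 3: -1, 5: -1, 13: -1, 19: 1, 41: -1}

        def f_of(n, vals):
            # completely multiplicative extension of the values at primes
            total, p = 1, 2
            while p * p <= n:
                while n % p == 0:
                    total *= vals[p]
                    n //= p
                p += 1
            if n > 1:
                total *= vals[n]
            return total

        # whatever signs f takes at 7 and 11, the five products are all -1
        for f7, f11 in product((-1, 1), repeat=2):
            vals = dict(lemma_values, **{7: f7, 11: f11})
            assert all(f_of(n, vals) == -1 for n in (242, 243, 245, 246, 247))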

    • Kristal Cantwell Says:

      I’ve looked at these statements. They do seem to fit together to prove the theorem. I think the first two are proved in the wiki at http://michaelnielsen.org/polymath1/index.php?title=Human_proof_that_completely_multiplicative_sequences_have_discrepancy_at_least_2 but the proofs do not look particularly easy. The last 4 do not have easy proofs as far as I know. If you have easy proofs for these six statements I would like to see them, as they would probably result in a proof of the overall result that is simpler than anything we have now.

    • Uwe Stroinski Says:

      What I meant to say, but probably didn’t, was that with this six-lemmas-and-one-theorem approach we can reorganize the multitude of cases considered so as to reach subgoals which can be communicated on their own (as opposed to being subcases of some rather long proof).

      That might motivate us to do the polishing (of the proof you quoted), something nobody has done so far.

      I have TeXed the first lemma, which comes to somewhere between 1 and 2 pages, and let the prover work on the other cases. Its output makes me believe that we might need 20 (TeX-)pages.

  22. Uwe Stroinski Says:

    In the proof of the final statement we have to assume n>=247. That got lost.

  23. gowers Says:

    Here is an attempt to answer Moses’s exercise. That is, I shall try to be more precise about my earlier suggestion of how to produce a representation-of-diagonals proof of Roth’s theorem that the AP discrepancy of a \pm 1 sequence of length n is always at least cn^{1/4}.

    I start with the well known lemma that for every real number \alpha and every m there exists some k\leq m such that |1-e(\alpha k)|\leq C/m. (Here, C is an absolute constant — 2\pi should do the job.)

    Now let \omega=e^{2\pi i/n}. Then for every r we can find d\leq\sqrt{n} such that |\omega^{rd}-1|\leq Cn^{-1/2}.
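
    As a quick numerical sanity check (my own illustration, with n=997 an arbitrary choice):

        import cmath, math

        n = 997
        m = math.isqrt(n)
        # for every r, some d <= sqrt(n) brings omega^{rd} within 2*pi/m of 1
        for r in range(n):
            best = min(abs(cmath.exp(2j * math.pi * r * d / n) - 1)
                       for d in range(1, m + 1))
            assert best <= 2 * math.pi / m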

    Let m=c\sqrt{n} for a sufficiently small absolute constant c>0. Let P be the interval \{-m,-(m-1),\dots,m\}. Let us estimate \sum_{x\in dP}\omega^{rx}. For every x\in dP we know that |\omega^{rx}-1|\leq Cmn^{-1/2}=cC. (This follows easily from the triangle inequality and the fact that |\omega^{rd}-1|\leq Cn^{-1/2}.) We also know that \omega^{rx}+\omega^{-rx} is real. Hence, \sum_{x\in dP}\omega^{rx} is a real number that lies between (1-cC)(2m+1) and 2m+1.

    Let us now define \pi_d to be the characteristic measure of dP. (That is, \pi_d takes the value n/(2m+1) on dP and 0 everywhere else, so that its average is 1; the normalization matters because I am about to define convolutions using expectations.) Let \omega_r be the function \omega_r(x)=\omega^{rx}. Then the above calculation proves that \pi_d*\omega_r=\lambda_r\omega_r for some real constant \lambda_r that lies between 1-cC and 1.

    Now \omega_r*\pi_d(y)=\mathbb{E}_x\omega^{rx}\pi_d(y-x). (That is how I am going to define the convolution in this post.) Thus, if we define \pi_{d,x}(y) to be \pi_d(y-x), then \omega_r*\pi_d=n^{-1}\sum_x\omega^{rx}\pi_{d,x}, from which it follows that \omega_r=\lambda_r^{-1}n^{-1}\sum_x\omega^{rx}\pi_{d,x}.

    From this we deduce that \omega_r\otimes\overline{\omega_r}=\lambda_r^{-2}n^{-2}\sum_{x,y}\omega^{r(x-y)}\pi_{d,x}\otimes\pi_{d,y}.

    We can use this to express nI_n=\sum_r\omega_r\otimes\overline{\omega_r} as a linear combination of tensor products of HAPs. Note that \pi_{d,x} is the characteristic measure of dP+x, that is, n/(2m+1) times its characteristic function, so associated with \omega_r\otimes\overline{\omega_r} are coefficients with absolute values adding up to \lambda_r^{-2}n^{-2}\cdot n^2\cdot(n/(2m+1))^2=\lambda_r^{-2}(n/(2m+1))^2, the order of magnitude of which is n (since m is around n^{1/2}). Adding this together for all r takes us up to n^2, which is no good since the trace of nI_n is also n^2.

    Now let’s try to do better. For each d let R(d) be the set of all r such that we chose d for that r. The above argument has a notable inefficiency in it, which is that attached to each \pi_{d,x}\otimes\pi_{d,y} is a sum of coefficients \sum_{r\in R(d)}\lambda_r^{-2}n^{-2}\omega^{r(x-y)}, and we were bounding \sum_{r\in R(d)}\lambda_r^{-2}\omega^{r(x-y)} above by |R(d)|\max_r\lambda_r^{-2}, thereby ignoring a great deal of potential cancellation.

    What can we say about \sum_{x,y}|\sum_{r\in R(d)}\lambda_r^{-2}\omega^{r(x-y)}|? It is at most n(\sum_{x,y}|\sum_{r\in R(d)}\lambda_r^{-2}\omega^{r(x-y)}|^2)^{1/2}, which after a small back-of-envelope calculation can be seen to equal n^2(\sum_{r\in R(d)}\lambda_r^{-4})^{1/2}. Now \lambda_r^{-4} is bounded above by some absolute constant and R(d) has size at most C\sqrt{n}, where C is another absolute constant. Therefore, this is at most Cn^{9/4}.

    The contribution to the sum of the coefficients from d is therefore n^{-2}Cn^{9/4}, which equals Cn^{1/4}. There are Cn^{1/2} different values of d to consider, so the total sum of coefficients is at most Cn^{3/4}. Now recall that each \pi_{d,x} is itself around n^{1/2} times the characteristic function of dP+x, so we have to multiply this by a further n (because so far I’ve evaluated a sum of coefficients attached to functions of the form \pi_{d,x}\otimes\pi_{d,y}). That gets us up to Cn^{7/4}. Since the trace of nI_n is n^2, we have proved a lower bound for AP discrepancy of (cn^{1/4})^{1/2}, which is the square root of what I wanted.

    Now I’m pretty sure I know why that happened. It’s because I used an inefficient method for estimating the amount of cancellation. The reason it was inefficient is that the sets R(d) are not just any old sets of size \sqrt{n} but are more like dilations of arithmetic progressions. We were essentially working out \ell_1 norms of Fourier transforms of sets R(d), but using only their cardinalities and not their special structure.

    However, to exploit this structure, we have to choose the R(d) in a more careful way, probably with weights. The weights would be needed because the Fourier transform of the characteristic function of an arithmetic progression will introduce a logarithmic factor. So we want to apply the Fejer kernel trick, and we also want to make sure that the coefficients \lambda_r vary smoothly somehow.

    This I haven’t yet done. Let me state the problem more clearly. I would like to define for each d\leq C\sqrt{n} a function R_d, and I would like these functions to have the following three properties. First, they sum to the constant function 1. Secondly, R_d(r)\ne 0 only if |\omega^{rd}-1| is at most Cn^{-1/2}. Thirdly, each R_d should vary smoothly enough for the \ell_1 norm of its Fourier transform to be as small as it can reasonably be (which, if you normalize appropriately, is at most C, because roughly n^{1/2} coefficients will take the value roughly n^{1/2}/n and the rest will be small).
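
    As a small illustration of that last point (my own sketch, not part of the argument): numerically, the normalized \ell_1 norm of the Fourier transform of a flat interval grows logarithmically with its length, while a triangular (Fejer-style) weight of comparable support keeps it pinned at 1.

        import numpy as np

        for L in (16, 64, 256, 1024):
            n = 8 * L
            flat = np.zeros(n)
            flat[:L] = 1.0                  # characteristic function of an interval
            tri = np.zeros(n)               # Fejer-style triangular weights
            tri[:2 * L - 1] = 1.0 - np.abs(np.arange(2 * L - 1) - (L - 1)) / L
            for name, v in (("flat", flat), ("fejer", tri)):
                print(L, name, round(np.abs(np.fft.fft(v)).sum() / n, 2))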

  24. gowers Says:

    Something I’ve been meaning to think about properly for some time is whether the existence of character-like functions gives us some clues about what an SDP solution or a representation-of-diagonal solution could possibly look like. I plan to try to work out something more precise in due course, but for now let me merely point out the triviality that if you think you have an approach to writing a diagonal map with unbounded trace as a linear combination A=\sum_i\lambda_iP_i\otimes Q_i with \sum_i|\lambda_i|=1, then a good test is to see whether there is any chance that x^TAx is unbounded, when x is a character-like function such as \mu_3. For this to happen, it will be necessary for the \lambda_i to attach plenty of weight to HAPs on which the sum of the x_i is unbounded.

    Having said that, maybe it isn’t such a huge constraint, since for a typical large integer the partial sum of \mu_3 up to that integer will be unbounded.

  25. gowers Says:

    This is not a new thought, but an old one that is perhaps due a revival. It’s been a while since we tried working on the rationals rather than the integers. Given that the rationals enjoy much better symmetry properties than the integers, and given that symmetry is very useful in the proof that the discrepancy for ordinary APs is unbounded, there is at least a chance that the SDP or ROD (= “representation of diagonal”) method could work more easily for the rationals.

    For example, if we are working on the integers, then we find ourselves having to guess some strange diagonal map with coefficients that favour, in a way that we do not fully understand, numbers with more factors. If we were working over the rationals, then all numbers would have equal status (since the automorphisms of the additive group of \mathbb{Q} act transitively), so the diagonal map we would be trying to work with would be the identity.

    Of course, that instantly raises a problem, which is that everything’s a bit too infinite. I think that means we’d have to resurrect some of the limiting techniques that Terry explained back in about EDP2 or something. Or perhaps we’d somehow be able to represent the identity using only a finite sum of coefficients of HAPs.

    What, one might ask, about the usual example that shows that representing the identity efficiently is impossible? Well, over the rationals it is far less clear that that example exists, if I can put it that way. After all, what might it be? We could take the function that’s defined only on the positive integers that takes the value 1 when an integer is 1 mod 3, -1 when it is 2 mod 3, and 0 otherwise. If we call that function {}f then we are not going to be able to prove that the square of the discrepancy of {}f is at least \sum_{q\in\mathbb{Q}_+}f(q)^2. But it’s not clear that that’s the right Hilbert space to be taking for our functions defined on the rationals. Maybe something that’s a bit more like an average (but infinite for functions that are \pm 1 everywhere) could be defined that would do the job: it doesn’t seem unreasonable to suggest that the square of the function {}f just defined has zero density, rather than the density 2/3 that it would have if it were just defined on \mathbb{N}.

    Just to get my head round this, I want to think about what the sum of P\otimes P over all HAPs P of length m would be. When do rational numbers r and s both belong to the same HAP of length m? Note that the notion of a highest common factor makes sense over the rationals. If (r,s)=t, then the number of HAPs of length m that contain both r and s is the number of positive integers k such that k\max\{r/t,s/t\}\leq m, which equals the integer part of mt/\max\{r,s\}.

    Here’s another idea. (For now I’m just going to play around a bit.) We can form out of HAPs a function that’s -1 at odd multiples of some rational d and 1 at even multiples, simply by taking twice the characteristic function of the HAP with common difference 2d and subtracting the characteristic function of the HAP with common difference d. So let’s try summing all functions of the form f\otimes f, where f alternates signs on a HAP of length m.

    Once again, let us pick rationals r and s with (r,s)=t. Then the number of HAPs of length m that contain both r and s is, as we have seen, the integer part of mt/\max\{r,s\}. The common differences of these HAPs are all of the form t/h for some positive integer h. If f_h is the alternating signs function on the HAP of common difference t/h, then f_h(r)f_h(s)=(-1)^{h(r+s)/t}. Therefore, if (r+s)/t is odd, we get a lot of cancellation.

    I’m not sure where to go with these observations, so instead let me discuss another idea. The ROD approach to Roth’s discrepancy theorem was to take a known representation of the identity and show that it can be rewritten in terms of characteristic functions of appropriately chosen APs (ones that do not wrap around too much). Can we argue that the known representation of the identity (in terms of trigonometric functions) is a natural one? Obviously yes, so I suppose what I mean is, can we argue that it is natural in a relevant way? I think the answer is still yes: the trigonometric functions are simultaneous eigenvectors of all shifts, and dilations move them from one to another. So they are very closely connected to the symmetries of the problem.

    That makes me wonder whether we might be able to use some representation-theoretic thoughts to help us guess what to do in the HAPs case. The rough idea would be that we have a group of symmetries that take HAPs to HAPs (except that it’s only a group if we work on the rationals) and we’d like to find a nice basis of characters that will be eigenvectors for all these symmetries.

    Actually, I think we’ve got those — aren’t they simply the completely multiplicative functions to the complex numbers?

    So the next step might be to “integrate” over all \mu\otimes\overline{\mu} and hope desperately that the result was a diagonal map (by which I mean a map from \mathbb{Q}_+\times\mathbb{Q}_+ to \mathbb{C} — but in fact to \mathbb{R}_+ — that is zero at (r,s) unless r=s).

    And then it gets a bit less clear, but I think the idea would be to show that each multiplicative function can be decomposed in a fairly natural way as a linear combination of HAPs, and then to show that the same pairs of HAPs often get used for several different multiplicative functions, with coefficients that point in different directions and therefore cancel to a significant extent.

    I’ve gone far enough into the realms of fantasy at this point, so will finish for the day.

    • gowers Says:

      A small extra remark: decomposing a multiplicative function into HAPs might not be that easy, since to do it efficiently one would seem to want to use HAPs that have a reasonably large inner product with the function, but that seems to require showing that the function has large discrepancy. But perhaps we can do an inefficient decomposition and then gain on the cancellation later.

  26. Alec Edgington Says:

    Klas, re this comment: I’ve tried running my search program with these sets of constraints, but I doubt whether it will get anywhere (even as far as 1124). It might be possible to make better use of the structure of the constraints, but I can’t immediately see how to do so.

    • Klas Markström Says:

      Alec, I have resumed that calculation as well. The program has eliminated several of the stubs and I think it will be able to eliminate one more in the next few days.

      There is then one left which you could focus on, the last stub in the file. Right now my program is not working on that case at all, and has eliminated all but one of the others.

    • gowers Says:

      Do I understand correctly that if you can eliminate all stubs, then we will have for the first time ever a proof that for sufficiently large n (in fact, for n slightly bigger than 2000), every \pm 1 sequence of length n has HAP discrepancy at least 3?

    • Alec Edgington Says:

      Yes, I believe so.

    • Klas Markström Says:

      Tim, that is correct.

    • Klas Markström Says:

      My program has now eliminated all stubs but one. The remaining case is the last stub in the file. This problem has required a surprisingly large amount of CPU time.

    • Alec Edgington Says:

      Good progress! My program has been working on the last case for a few days now (on a single CPU) — but from its progress so far I’m not encouraged to believe it will finish any time soon, so if you’re working on it with multiple CPUs I’m inclined to abandon my attempt.

  27. Moses Charikar Says:

    Let H be a t \times n matrix whose rows are characteristic functions of HAPs (t is the number of HAPs on 1,2,…,n). Let U,V be n \times n matrices such that \langle U^i,V^j \rangle = 0 for i \not = j where M^i denotes the ith column of matrix M. Note that U^T V = D, a diagonal matrix. We find n \times t matrices B,C, such that B H = U and C H = V. Then H^T B^T C H = (BH)^T CH = U^T V = D. Now H^T B^T C H = \sum_{i,j} \langle B^i,C^j \rangle H_i \otimes H_j, where H_i denotes the ith row of H. Thus this gives a diagonal representation. The trace of the diagonal matrix D is \sum_i \langle U^i,V^i \rangle. On the other hand, the sum of the absolute values of the coefficients of H_i \otimes H_j in the representation is \sum_{i,j} |\langle B^i,C^j \rangle |. The lower bound on discrepancy established by this diagonal representation is (\sum_i \langle U^i,V^i \rangle)/(\sum_{i,j} |\langle B^i,C^j \rangle |).

    It is not clear that finding a diagonal representation of this form that yields a good lower bound is a simpler task in any way. There is considerable freedom in picking matrices U,V and even after picking them, there is freedom in picking B and C. This ought to be connected to eigenvalues and eigenvectors, but I don’t know the connection yet.

    Are the optimal diagonal representations constructed by the LP for small n of this form ? … Actually, now that I think about it, perhaps any diagonal representation is of this form, because given the coefficients of H_i \otimes H_j in the representation, we can always express the matrix of coefficients as B^T C … ok, that’s not quite true because B,C are supposed to be n \times t matrices and n < t. For this to work, the rank of the matrix of coefficients should be at most n.
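
    A minimal numerical sketch of this framework (my own; the choices U=V=I_n and B=C=UH^+, with H^+ the pseudo-inverse of H, are just the laziest ones satisfying the constraints, and they certify only a weak bound):

        import numpy as np

        n = 12
        haps = []
        for d in range(1, n + 1):
            for k in range(1, n // d + 1):
                v = np.zeros(n)
                v[np.arange(1, k + 1) * d - 1] = 1.0
                haps.append(v)
        H = np.array(haps)            # t x n, rows are HAP characteristic vectors

        U = V = np.eye(n)             # aim for D = I_n
        B = U @ np.linalg.pinv(H)     # B H = U, since H has full column rank
        C = V @ np.linalg.pinv(H)
        coeff = B.T @ C               # coefficients of H_i (x) H_j
        D = H.T @ coeff @ H           # equals U^T V = I_n up to rounding

        print(np.trace(D) / np.abs(coeff).sum())   # the certified lower bound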

    • Moses Charikar Says:

      Oops … that previous comment started a little abruptly. In reading it, insert the following text at the beginning:

      Here is one class of diagonal representations. (Although I haven’t fully understood Tim’s construction for proving Roth’s theorem, I believe it is of this form).

  28. gowers Says:

    Continuing for a moment the line of thought that can be found in this comment (and I think it is also closely related to Moses’s comment just above, but I haven’t yet got a computational feel for that one), let me make a couple of further observations.

    Recall that a character of the multiplicative group \mathbb{Q}_+ is nothing other than a completely multiplicative function from \mathbb{Q}_+ to \mathbb{T}. It is determined by its values at primes, and that allows us to put a probability measure on the space of all characters: we just uniformly and independently choose the values at the primes. If \mu is a random character, then what can we say about the matrix \mu\otimes\overline{\mu}? Its value at (r,r) is |\mu(r)|^2=1, and its value at (r,s) is \mu(r)\overline{\mu(s)}=\mu(r/s), which averages zero if r\ne s. Therefore, \mathbb{E}\mu\otimes\overline{\mu} is the identity. (This is of course just a special case of a much more abstract result.)
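
    Here is a small Monte Carlo illustration of this averaging (my own sketch, over the integers up to 20 rather than over the rationals):

        import numpy as np

        rng = np.random.default_rng(0)
        N = 20
        primes = [p for p in range(2, N + 1) if all(p % q for q in range(2, p))]

        def random_character():
            # independent uniform unit-circle values at the primes,
            # extended completely multiplicatively
            vals = {p: np.exp(2j * np.pi * rng.random()) for p in primes}
            f = np.empty(N + 1, dtype=complex)
            f[1] = 1
            for x in range(2, N + 1):
                p = next(q for q in primes if x % q == 0)
                f[x] = vals[p] * f[x // p]
            return f[1:]

        T = 5000
        A = sum(np.outer(f, f.conj())
                for f in (random_character() for _ in range(T))) / T
        print(np.abs(A - np.eye(N)).max())   # small, of order 1/sqrt(T)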

    Now let us think about how we might try to decompose a multiplicative function into a linear combination of HAPs. The obvious way to do it, which we do not expect to be efficient, is to use a greedy algorithm that’s close to how one proves Möbius inversion. To illustrate it, let us take a function that might be regarded as the first non-trivial completely multiplicative function: it takes a rational in its lowest terms and returns 1 if the number of 2s in the prime factorization is even and -1 if the number of 2s is odd. For instance, f(2/3)=-1 and f(5/12)=1.

    Having written that, I suddenly realize that I was thinking about what to do when we are talking about integers rather than rationals. If we write P_d for the characteristic function of \{d,2d,3d,\dots\} then we would end up taking P_1-2P_2+2P_4-2P_8+\dots. It’s not hard to check that that gives the right answer at every integer. As a matter of fact, so does -P_{1/2}+2P_1-2P_2+\dots, so maybe we could say that this function is, in some formal sense, equal to 2\sum_{n\in\mathbb{Z}}(-1)^nP_{2^n}. A formal sense that would work is to take the sums from -N to N and then average over consecutive values of N as N tends to infinity: the partial sums themselves alternate between 0 and twice the right answer, so these averages converge pointwise to the function we want.

    By that kind of technique we can build multiplicative functions that take the value 1 at all primes except one, and by multiplying those together we can get all multiplicative functions. However, I’m a bit troubled by the fact that the coefficients look as though they are getting very large. For instance, if we took the Liouville function, then the coefficient of P_6 would be 4 (since P_6 is the pointwise product of P_2 and P_3), and more generally, the absolute value of the coefficient of n would be 2 raised to the power the number of distinct prime factors of n. I suppose that usually isn’t too big.

    I’m going to stop this comment before it gets too long and do a bit of work away from the computer. What I want to do is take an arbitrary completely multiplicative function \mu and see what its decomposition of this kind is. Then for any pair (d,d') of rationals I want to take the corresponding coefficient of P_d\otimes P_{d'} in the expansion of \mu\otimes\overline{\mu}. Finally, I want to average over all \mu. At this last stage I am hoping for a significant saving as a result of cancellation.

    You might object that it shouldn’t be possible to prove anything just using infinite HAPs. That objection has considerable force, but note that we do not have the usual “1 on first half, -1 on second half” counterexample at our disposal in the infinite case. So I’m going to pursue this thought for a bit longer and see where it gets me. The dream scenario would be to get a purely formal and nonrigorous argument that feels like a proof of EDP, and then to find some way of making it rigorous via a limiting argument.

  29. gowers Says:

    For some reason I was blind in the previous comment and failed to notice that what I was doing was exactly Möbius inversion. In fact, it’s not even quite accurate to say that I didn’t notice — I noticed the possibility and then for some reason wrongly concluded that it wasn’t quite the same, even though in the past I have known perfectly well that it was.

    So, here I am with sanity restored. Let me start with the integer case, and then I’ll try to move to the rationals. Let {}f be an arbitrary completely multiplicative function from \mathbb{N} to \mathbb{T}. We would like to write {}f as \sum_m \lambda(m)P_m for some coefficients \lambda(m). But that is saying that we want f(n) to equal \sum_{m|n}\lambda(m). But that’s saying f=\lambda*1, where * is the Dirichlet convolution and 1 denotes the constant function 1. The Dirichlet inverse of 1 is the Möbius function \mu, so if we Dirichlet convolve on both sides by \mu we get that \lambda=\mu*f. That is, \lambda(m)=\sum_{d|m}\mu(d)f(m/d). Since f is multiplicative, we can write this as f(m)\sum_{d|m}\mu(d)\overline{f(d)}. Now let the primes that divide m be p_1,\dots,p_k and write [k] for the set \{1,2,\dots,k\}. Then this expression becomes f(m)\sum_{A\subset[k]}\prod_{i\in A}(-\overline{f(p_i)}), which equals f(m)\prod_{i=1}^k(1-\overline{f(p_i)}).
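
    A quick machine check of this inversion (my own sketch, with f the Liouville function, for which \overline{f(p)}=f(p)=-1 and the closed form becomes f(m)2^{\omega(m)}, where \omega(m) is the number of distinct prime factors of m):

        def factorize(n):
            fs, p = {}, 2
            while p * p <= n:
                while n % p == 0:
                    fs[p] = fs.get(p, 0) + 1
                    n //= p
                p += 1
            if n > 1:
                fs[n] = fs.get(n, 0) + 1
            return fs

        def mobius(n):
            fs = factorize(n)
            return 0 if any(e > 1 for e in fs.values()) else (-1) ** len(fs)

        def liouville(n):
            return (-1) ** sum(factorize(n).values())

        def lam(m):
            # lambda(m) = sum_{d | m} mu(d) f(m/d)
            return sum(mobius(d) * liouville(m // d)
                       for d in range(1, m + 1) if m % d == 0)

        for n in range(1, 200):
            # f(n) = sum_{m | n} lambda(m), and the closed form for lambda
            assert liouville(n) == sum(lam(m) for m in range(1, n + 1) if n % m == 0)
            assert lam(n) == liouville(n) * 2 ** len(factorize(n))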

    Now I’m going to try to work out what Möbius inversion over the rationals is.

  30. gowers Says:

    Actually, before I do that I’d like to think about the integer case. The coefficient attached to the HAP P_m in the expansion of the multiplicative function {}f is, as we have seen, \mu*f(m). It follows that the coefficient attached to P_m\otimes P_n in f\otimes \overline{f} is \mu*f(m)\overline{\mu*f(n)}.

    Now I need to think about what this works out to be. At first I thought it was going to be rather easy, since all one has to do is average over the possible values of the f(p_i). But now I see that it’s more complicated because of the double appearance of \mu. So I’m going to do a calculation based on the formula in the previous comment, and I’ll come back with the answer.

  31. gowers Says:

    OK, let’s begin with the case m=n. Then we’re just looking at |f(m)|^2\prod_{i=1}^k|1-f(p_i)|^2=\prod_{i=1}^k|1-f(p_i)|^2. Expanding that back out, we get \sum_{A,B\subset[k]}(-1)^{|A|+|B|}\prod_{i\in A}f(p_i)\prod_{j\in B}\overline{f(p_j)}. Now if A\ne B then there is at least one i that occurs in exactly one of A and B. Then for every choice of the f(p_j) with j\ne i the average over the possible choices of f(p_i) is zero. And if A=B then obviously we get 1. So the whole thing works out to be \sum_{A\subset[k]}1=2^k.

    To summarize: the coefficient attached to P_m\otimes P_m in this expansion is 2^k, where k is the number of distinct prime factors of m. I’ll tackle the off-diagonal case in the next comment.

  32. gowers Says:

    This time let us write m=\prod_{i\in X}p_i^{r_i} and n=\prod_{j\in Y}p_j^{s_j}. Then we get f(m)\overline{f(n)}\sum_{A\subset X}\sum_{B\subset Y}(-1)^{|A|+|B|}\prod_{i\in A}\overline{f(p_i)}\prod_{j\in B}f(p_j). Here one has to be slightly careful: the factor f(m)\overline{f(n)} involves the very same values f(p_i) that appear in the sum, so it cannot be pulled outside the average. Taking the expectation prime by prime (the values at distinct primes being independent), the average coefficient works out to be a product with one factor for each prime p: the factor is 2 if p occurs to the same positive power in m and n, -1 if its exponents in m and n differ by exactly 1, and 0 if they differ by 2 or more (primes dividing neither m nor n contribute 1). In particular, when m=n we recover the 2^k of the previous comment, with k=|X\cap Y| the number of distinct primes in the prime factorization of (m,n), while off the diagonal the average coefficient vanishes unless m and n are multiplicatively very close.

    The next thing to think about is whether this observation counts as some kind of useful cancellation, along similar lines to what I was talking about in connection with Roth’s theorem. The answer is almost certainly no, since it looks as though we can simply truncate and fit everything into \{1,2,\dots,n\}, and we know that the result is false for HAPs of the form \{d,2d,\dots,\lfloor n/d\rfloor d\}.

    • gowers Says:

      The answer is that it certainly gives us a lot of cancellation, but the initial estimate (before the cancellation) is so bad that all the cancellation does is get us to something that is not quite so bad. But it’s pretty bad nonetheless — it proves a lower bound for EDP that tends to zero!

      Another way of decomposing a multiplicative function into HAPs is to observe that each vector e_m in the standard basis is the difference between the characteristic function of the first m integers and the characteristic function of the first m-1 integers. If we then decompose a multiplicative function f as \sum_mf(m)e_m, we find when we expand out f\otimes\overline{f} and average out, that all the off-diagonal terms cancel and we end up writing the identity as \sum_m e_m\otimes e_m, which (since each e_m is made out of two HAPs) gives us the usual trivial lower bound of 1/4.
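
      (A two-line check of this decomposition, my own:)

          import numpy as np

          n = 8
          prefix = [np.concatenate([np.ones(m), np.zeros(n - m)])
                    for m in range(n + 1)]
          I = sum(np.outer(prefix[m] - prefix[m - 1], prefix[m] - prefix[m - 1])
                  for m in range(1, n + 1))
          assert np.array_equal(I, np.eye(n))
          # trace n against 4n unit coefficients on HAP tensor products: ratio 1/4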

      After all that, it’s not clear that the initial decomposition of the identity as \mathbb{E} f\otimes\overline{f} (where f ranges over all completely multiplicative functions) is especially helpful as a staging post to an efficient HAP decomposition.

  33. Alec Edgington Says:

    Here is a half-baked idea following on from Tim’s remarks at the top about volumes of convex bodies, and probably not a new one, but it’s only just dawned on me. I wonder if we can say anything about the volume of the symmetric convex hull of the characteristic vectors of HAPs in \{ 1, 2, \ldots, n\}.

    Let K_n be that convex hull (so K_n is the set of all \sum_i \lambda_i P_i where \sum_i \lvert \lambda_i \rvert \leq 1 and P_i is the characteristic function of a HAP). And let B_n be the unit ball in l_{\infty}^n (so K_n \subseteq B_n). Since EDP would imply that every extreme point of B_n correlates well with a point of K_n, we might hope that the volume of K_n would be large (as a proportion of the volume of B_n, which is 2^n).

  34. Moses Charikar Says:

    I wanted to think about diagonal representations where one constructs a linear combination of tensor products of characteristic vectors of HAPs with the same common difference. So for example, we are allowed to take tensor products of HAPs of common difference 1, and add them to tensor products of HAPs of common difference 2, and so on, but we cannot mix HAPs with two different common differences. Hopefully, this additional restriction makes it easier to think about possible diagonal representations, but it may be that the best bound achievable under this restriction is bounded, even if the best bound achievable in general is unbounded.

    Does anyone have a reason to believe that this restriction is too strong ? In order to show that the best bound achievable by this restriction is bounded, you need to devise a construction of n \times n matrices A_n such that for all d, for all HAPs P and Q of common difference d, |P^T A_n Q| \leq C, where C is a constant. (Here, I am using P to represent both the HAP and its characteristic vector, but the meaning should be clear from context).

    • gowers Says:

      Another way of thinking about what it would mean for the proof to work in principle (that is, for the best bound using just these matrices to be unbounded) is that the following strengthening of non-symmetric vector-valued EDP holds: given any two sequences of vectors u_1,\dots,u_n and v_1,\dots,v_n such that \langle u_i,v_i\rangle=1 for every i, there exist HAPs P and Q with the same common difference such that \langle\sum_{i\in P}u_i,\sum_{j\in Q}v_j\rangle\geq C(n), where C(n) tends to infinity.

      At the moment, I don’t have much feel for how far it is reasonable to expect to be able to strengthen EDP in this direction. For instance, it’s not obvious to me that one can’t go as far as insisting that P=Q. That would say that we can decompose a diagonal matrix with unbounded trace as a linear combination \sum_i\lambda_iP_i\otimes P_i with the absolute values of the coefficients summing to 1.

    • Moses Charikar Says:

      I looked at some LP solutions for the optimal diagonal representation with this restriction. Inspired by them, here is a simple construction of a diagonal representation A=\sum_i\lambda_iP_i\otimes Q_i such that the ratio of the trace of A to \sum_i |\lambda_i| approaches 0.5 asymptotically. Admittedly not a great lower bound, but recall that the best LP solution (without this restriction) we had been able to obtain earlier had a value of about 0.43178 (this was for n=100). I will deliberately not use singleton HAPs in this construction, since we get a bound of 1 in a trivial way by using them.

      In order to describe this family of solutions, I’ll introduce some notation. I use (n,k) to denote the characteristic vector of the HAP of common difference k ending in n (this is only defined if k divides n). If v is a linear combination of HAPs, I use v^2 to denote v \otimes v. Let k be a large integer, and let n = lcm(1,2,\ldots,k). Now consider \sum_{i=1}^k  ((n,i)-(n-2i,i))^2. Note that ((n,i)-(n-2i,i))^2 is a matrix with 1s in locations (n,n), (n-i,n-i), (n-i,n), (n,n-i) and 0s elsewhere. Thus this summation has a total of 2k on the diagonal, and the non-diagonal entries are of the form (n-i,n) and (n,n-i) for i=1,…,k. The sum of absolute values of coefficients used up in the summation is 4k. Now the non-diagonal entries are all in row n and column n. They can be easily killed off by subtracting ((n,1)-(n-1,1)) \otimes ((n-1,1)-(n-k-1,1)) and its transpose. (There may be a better way to do this, but it doesn’t matter.) These additional terms add a total of 8 to the sum of coefficients. Thus the ratio of the trace to the sum of coefficients is 2k/(4k+8), which approaches 0.5 asymptotically.
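
      Here is a quick numerical verification of this construction (my own sketch, for k=6, n=lcm(1,\ldots,6)=60):

          import numpy as np

          k, n = 6, 60

          def hap(e, d):
              # characteristic vector of the HAP with difference d ending at e
              v = np.zeros(n)
              v[np.arange(d, e + 1, d) - 1] = 1.0
              return v

          M = sum(np.outer(hap(n, i) - hap(n - 2 * i, i),
                           hap(n, i) - hap(n - 2 * i, i))
                  for i in range(1, k + 1))
          e_n = hap(n, 1) - hap(n - 1, 1)        # standard basis vector at n
          w = hap(n - 1, 1) - hap(n - k - 1, 1)  # indicator of {n-k, ..., n-1}
          M -= np.outer(e_n, w) + np.outer(w, e_n)

          assert np.array_equal(M, np.diag(np.diag(M)))   # M is diagonal
          print(np.trace(M), "/", 4 * k + 8)              # 2k against 4k+8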

      Now it is clear that this solution was doomed to achieve no better than 0.5 because it built up the diagonal by using terms of the form ((n,i)-(n-2i,i))^2. Each of these terms contributed 4 to the sum of coefficients, but only 2 to the diagonal. In order to do better, presumably, we should use terms of the form ((n,i)-(n-ti,i))^2 where t is a large integer. This would contribute t to the diagonal, while the contribution to the sum of coefficients would still be 4. However, it is now much harder to subtract off the non-diagonal terms.

      One final comment: expressing such solutions may be easier if we include 0 in every HAP, and have 0 play the role of n in the solutions above.

    • Moses Charikar Says:

      I thought the following argument should rule out the stronger restriction P=Q. I think one should be able to construct a matrix A with 1s on the diagonal such that P^T A P = 0 for all HAPs P. We have n^2-n variables available to us (one for each of the non-diagonal entries), and only (about) n \log n constraints. … not quite a proof, but it seems believable that such a matrix should exist for any n. If so, the stronger restriction will not yield any positive lower bound.

    • gowers Says:

      Another small comment, which follows on from my previous one, but is actually relevant to what you have just posted too. Observe first that if P and Q are (characteristic functions of) HAPs of the same common difference, then P-Q is plus or minus a segment of a HAP.

      Since P\otimes Q+Q\otimes P=P\otimes P+Q\otimes Q-(P-Q)\otimes(P-Q), it follows that any (WLOG symmetric) decomposition into P\otimes Qs where P and Q have the same common difference can be made into a decomposition of the form \sum_i\lambda_iP_i\otimes P_i, except that now the P_i are sets of the form \{ad,(a+1)d,\dots,bd\}. In your construction you clearly did the reverse — starting with a decomposition of the latter type and translating it into one of the former type.

    • gowers Says:

      Your most recent comment came while I was writing mine. Yes, I think what you say sounds plausible. Actually, I had in mind the more general idea of a HAP, and then a decomposition of the desired kind is easily seen to be equivalent to what you were doing, as I have just pointed out. Even so, it would be quite interesting to try to make your argument rigorous.

    • gowers Says:

      Just to be clear — in your informal argument, you are presumably still insisting that your HAPs are not singletons.

    • Moses Charikar Says:

      Good point (about using the more general notion of HAPs). So perhaps decompositions of the form \sum_i\lambda_iP_i\otimes P_i, where P_i are sets of the form \{ad,(a+1)d,\dots,bd\} are worth thinking about. Now the informal argument about the existence of a matrix A such that P_i^T A P_i = 0 for all P_i breaks down. We have about \sum_i n^2/i^2 constraints now. And yes, the informal argument assumes that singleton HAPs are excluded.

      In order to make the informal argument precise, I think something like this may suffice: for any subset of variables S, let C(S) be the set of constraints that involve only variables in S. Then for all S, |C(S)| \leq |S|. Alternatively, one can simply give an explicit construction of such matrices.

    • Alec Edgington Says:

      I haven’t checked exactly what matrix this gives, but I think we can easily construct such a matrix step by step, by imposing the constraints (in addition to a_{nn} = 1):

      a_{mn} = a_{nm}

      a_{mn} = 0 if m < n and m \nmid n

      Then, once we’ve got a_{mn} for all m, n < N, together with a_{mN} for every divisor m of N with m > d (where d \mid N and d < N), we can calculate a_{dN} from the single linear equation

      2 \sum_{1 \leq a < b \leq \frac{N}{d}} a_{da,db} + \frac{N}{d} = 0

      which we can solve because the coefficient of a_{dN} is 2 and the other terms are all known.

    • Alec Edgington Says:

      Actually, I have found the formula for this matrix. It is given by a_{d,kd} = \frac{1}{2} (\mu(k) - \mu(k/2)) for k > 1, and a_{d,d} = 1. Here \mu is the Möbius function at integers and zero at non-integers. For k > 1 this is Sloane’s A092673. It’s straightforward to prove by direct calculation that this works.

      In the course of finding this I’ve stumbled upon a fact that has (I assume) nothing to do with EDP, but which is rather odd. Consider

      \{n : a_{1,n} = 0\} = \{n: \mu(n) = \mu(n/2)\} = \{ 8, 9, 16, 18, 24, \ldots\}

      This seems to be precisely this sequence from Sloane’s database. I’m rather confused by the definition of that sequence, however…

    • gowers Says:

      I am confused both by that definition and yours. Actually, no I’m not. Your set consists of all multiples of 8, and all numbers that are multiples of an odd perfect square. Equivalently, its complement consists of all square-free numbers and all numbers that are twice a square-free number. But the definition on Sloane’s database seems to be circular — or at least the principal definition does.

    • Alec Edgington Says:

      For what it’s worth, I now see that one can just take the (non-symmetric) matrix A defined for all m, n \geq 1 by a_{mn} = \mu(\frac{n}{m}) - \mu(\frac{n}{2m}) (where by convention \mu is zero at non-integers). This satisfies P^{\mathrm{T}} A P = 0 for all HAPs P (excluding singletons), and has ones on the diagonal.
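
      A direct machine verification of this matrix (my own sketch, checking every non-singleton HAP up to 60):

          def mobius(k):
              result, x, p = 1, k, 2
              while p * p <= x:
                  if x % p == 0:
                      x //= p
                      if x % p == 0:
                          return 0
                      result = -result
                  p += 1
              return -result if x > 1 else result

          def mu_ratio(num, den):
              # mu(num/den), with mu taken to vanish at non-integers
              return mobius(num // den) if num % den == 0 else 0

          N = 60
          A = [[mu_ratio(col, row) - mu_ratio(col, 2 * row)
                for col in range(1, N + 1)] for row in range(1, N + 1)]

          for d in range(1, N + 1):
              for k in range(2, N // d + 1):
                  idx = [a * d - 1 for a in range(1, k + 1)]
                  assert sum(A[i][j] for i in idx for j in idx) == 0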

  35. gowers Says:

    This is not a new point, but I want to articulate it clearly for my own benefit, and perhaps for that of some others. In a sense, what we are trying to do is “quantitative linear algebra”. Given a bunch of HAPs P_1,\dots,P_m used to form a quadratic form \sum_i c_i\langle x,P_i\rangle^2, we know that this form is positive semidefinite, and it is positive definite if the P_i span. (Proof: then the only way for the sum to be zero — I’m assuming the c_i are positive — is for every \langle x, P_i\rangle to be zero, which implies that x is zero.) But we want to show that the P_i are not just a spanning set but “well spread about”, to enable us to prove not just the qualitative fact that that quadratic form is positive definite, but the quantitative fact that it is bounded below by a diagonal quadratic form with unbounded trace.

    Now it still seems worthwhile to investigate aspects of the purely qualitative question. For example, we could look, and indeed to some extent have looked, at small sets of HAPs where we know that they are not spanning sets, and we could think about what subspaces they annihilate. If we could show that the subspace annihilated by one set of HAPs is very different from the subspace annihilated by some disjoint set of HAPs, it might help us quite a bit.

    The idea of this vague proposal would be to identify sets of HAPs (such as the ones of the form “all multiples of d up to n”) where we can explicitly calculate the annihilated subspace. Well, those ones span, but we could remove one of them, say.

    I admit that I don’t have a clear view of how this might work, so I’ll leave this comment here for now.

  36. Moses Charikar Says:

    I explored this strengthened diagonal-representation question that we discussed recently: Can we construct n \times n diagonal matrices of the form A=\sum_i\lambda_iP_i\otimes P_i, where P_i are sets of the form \{ad,(a+1)d,\dots,bd\} and the ratio of the trace of A to \sum_i |\lambda_i| is larger than any fixed constant for n large enough ?

    I looked at LP solutions that construct the best such representation for a given n. One change from before was that I looked at HAPs over {0,1,2,…,n}. The experience with the LP solutions before indicated that the LP benefits from having n be a highly divisible number. However, this requires very large n (I chose n=lcm(1,2,…,k) earlier). Including 0 gets around this since 0 plays the role of the highly divisible number and you might get a good solution for much smaller n.

    Note that allowing sets of the form \{ad,(a+1)d,\dots,bd\} can potentially lead to solutions that are better by a factor of 4, since representing (P-Q)\otimes (P-Q) can be done using a single term instead of 4. Indeed the optimal LP values that I got seem to approach 2 (instead of 0.5 earlier). Moreover, the optimal solutions were exactly of the form \sum_d P_d\otimes P_d, where P_d is the set \{0,d\} and d runs from 1 to n. This gives a total of 2n on the diagonal, and off-diagonal terms only in the 0 row and column. As before, these off-diagonal terms can easily be cancelled off by using a constant number of additional terms.

    I had hoped to see more complex LP solutions, but the fact that the optimal solutions are of this form (for n up to 100) leads me to conjecture that perhaps 2 is the best bound we can prove using these restricted diagonal representations.

    To confirm this, we need to construct n \times n matrices A such that the absolute value of P^T A P is at most 2. (Again, P is a set of the form \{ad,(a+1)d,\dots,bd\}.) The LP actually constructs such matrices (in fact this is the dual solution). Having looked at them briefly, I didn’t see any obvious pattern. I believe the LP may have a lot of flexibility in constructing such a matrix, so the LP solution may be a result of some arbitrary choices made by the LP solver. We may get some clues from these LP solutions if we suitably constrain them.

    • gowers Says:

      Here is a small thought in that direction. Let r and s be two coprime integers, both with plenty of factors, such that rs> n, and consider the matrix M_r\otimes M_s, where M_t stands for the set (or its characteristic function) of multiples of t.

      The value of P^T(M_r\otimes M_s)P is |P\cap M_r||P\cap M_s|, which is zero.

      OK I made some mistake in my mind before I started this comment — I was hoping to use M_r\otimes M_s as a building block but I don’t see what use it is. (The idea in the back of my mind was that we are trying to find matrices that have a different effect on P\otimes Ps from the effect they have on P\otimes Qs.)

    • Moses Charikar Says:

      Along those lines, here is a half-baked idea: is it possible to construct such a matrix A = \sum_i \lambda_i v_i \otimes v_i where the coordinates of the vectors v_i are cosine (or sine) functions? Then P^T A P = \sum_i \lambda_i \langle P,v_i \rangle^2 and hopefully the dot products \langle P,v_i \rangle have small magnitude.

    • gowers Says:

      Here are some difficulties with that idea, though I don’t have a proof that it cannot work.

      Obviously just taking one sine or cosine function won’t work. This is partly for the trivial reason that you won’t get 1s down the diagonal. And in fact, that is the only thing wrong with taking v = (0,1,-1,0,1,-1,0,...) = \sin (2\pi x/3). But for a sine with irrational period we can also pick a HAP along which it varies slowly and then pick a segment of that HAP where it is persistently close to 1.

      These simple thoughts push us towards some kind of average. But the average over all trigonometric functions gives us the identity, which is no good to us.

      Maybe another point I’d make is that I don’t understand why if A is a sum of trigonometric functions then P^TAP might always be small even though P^TAQ=\sum_i\lambda_i\langle P,v_i\rangle\langle Q,v_i\rangle could be large.

      While I’m at it, doesn’t Cauchy-Schwarz imply that whatever symmetric matrix A you take, it will not be able to be positive semidefinite (at least if this method of proof has any chance of working)? After all, if A can be written in the form \sum_i\lambda_iv_i\otimes v_i with all the \lambda_i positive, then by Cauchy-Schwarz we have

      P^TAQ=\sum_i\lambda_i\langle P,v_i\rangle\langle Q,v_i\rangle

      \leq (\sum_i\lambda_i\langle P,v_i\rangle^2)^{1/2}(\sum_i\lambda_i\langle Q,v_i\rangle^2)^{1/2}

      =(P^TAP)^{1/2}(Q^TAQ)^{1/2}.

      I haven’t thought that through to the point of seeing whether it is a sensible thing to say, and I’m not claiming to be certain that trigonometric functions won’t help.

    • Moses Charikar Says:

      Yes, if we have a construction of arbitrarily large n \times n matrices A = \sum_i \lambda_i v_i \otimes v_i such that |P^T A P| is bounded, then I would hope that this is not possible with all \lambda_i positive. If they were all positive, A would be positive semidefinite, and then A would be a counterexample to the SDP approach as well. What you point out above is that a counterexample to the SDP approach automatically is a counterexample to the representation of diagonals approach, since P^T A Q \leq (P^TAP)^{1/2}(Q^TAQ)^{1/2}.

      Any matrix A has a representation of the form A = \sum_i \lambda_i v_i \otimes v_i where \lambda_i,v_i are eigenvalues and corresponding eigenvectors. If we know that P^T A P = \sum_i \lambda_i \langle P,v_i \rangle^2 is small for all HAPs (over intervals), what does that tell us about good choices for v_i ? (Here I’m thinking of P as being the characteristic vector of a set of the form \{ad,(a+1)d,\dots,bd\}, so we don’t need to think about expressions of the form P^T A Q for HAPs P,Q.) Somehow, choosing the coordinates of v_i to be a periodic function with alternating signs seems to be a natural choice to try.

  37. gowers Says:

    Some time ago we had a discussion of how much it might be possible to strengthen EDP by reducing the set of HAPs against which you test the discrepancy. The smallest one that we couldn’t give an answer to consisted of all HAPs with common difference either a prime or a power of 2. (If you just go for primes then the sequence 1,1,-1,-1,1,1,-1,-1,… is a counterexample.) Sune asked whether one could take only HAPs that are of the form \{d,2d,\dots,\lfloor n/d\rfloor d\}, but the sequence that’s 1 for the first n/2 terms and -1 for the rest is a counterexample to that. However, I don’t immediately see a counterexample if we enlarge this set slightly by allowing all HAPs with common difference 1, while continuing to insist that all other HAPs are “full”. In other words, we would insist that the partial sums are bounded.

    As I was writing that, I suddenly wondered about a trigonometric function (in the complex case) such as \exp(2\pi i\alpha x), where \alpha is a badly approximable real. But this doesn’t work: we just pick a pretty large m such that \alpha m is very close to an integer and let that be the common difference.

    It seems a bit too much to hope that EDP could be true for this restricted set of HAPs, but if it is, then it might be slightly easier to think about because it would divide the “reason for the high discrepancy” up more clearly into what I think of as the number-theoretic part (to do with divisibility and so on) and the metric part (to do with lengths of HAPs).

    One reason to think that it’s false is that the number of HAPs we would be considering is only 2n-1. That is at least more constraints than we have variables, but it still feels like a dangerously small number. Also, we know that a lot of the constraints play no serious role. The number of HAPs we’d be taking with length greater than C would be down to n(1+1/C) or so.

    So let me go the whole hog and ask this. Is it true that for every C there exists k such that for every sufficiently large n, if you take all HAPs with common difference 1 and all full HAPs with common difference at most k, then you get discrepancy at least C? Or, more realistically, can anyone think of a counterexample?
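
    In case anyone wants to experiment, here is a naive brute-force sketch (my own, feasible only for very small n): it computes the least possible maximum discrepancy over exactly this restricted family.

        from itertools import product

        def restricted_discrepancy(n, k):
            # all difference-1 HAPs (i.e. partial sums), plus all full HAPs
            # {d, 2d, ..., floor(n/d)d} with 2 <= d <= k
            best = None
            for x in product((-1, 1), repeat=n):
                worst, s = 0, 0
                for v in x:
                    s += v
                    worst = max(worst, abs(s))
                for d in range(2, k + 1):
                    worst = max(worst, abs(sum(x[i] for i in range(d - 1, n, d))))
                best = worst if best is None else min(best, worst)
            return best

        for n in range(4, 15):
            print(n, restricted_discrepancy(n, 4))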

    If not, then the next question would be whether we can represent a diagonal matrix if we restrict to just this class of HAPs …

    • Gil Says:

      Well, the probabilistic heuristic (in its more reliable direction) suggests that this is NOT true: the number of such sequences when C is large should be larger than 2^{0.999n}.

    • Gil Says:

      Sorry, I got the computation wrong (I considered only divisors d of n). The PH (in its less reliable direction) indicates that this may well be correct (and also for the original version, which we know fails).

    • Alec Edgington Says:

      On the subject of strengthening EDP by reducing the set of HAPs under consideration, I have done some experiments restricting the difference to approximately the square root of the maximum of each HAP. To be precise, I considered the discrepancy with respect to HAPs \{ d, 2d, \ldots, kd\} where \lvert \log d - \log k \rvert < \epsilon. It may be that even this restricted set of HAPs suffices to guarantee unbounded discrepancy (for any \epsilon > 0).

    • Gil Says:

      Actually I think I have to retract the retraction. The probability for the full HAP with period d (if d is not too large) to sum up to a number below x is, up to a multiplicative constant, x/\sqrt{n/d}. So for large x, multiplying all these numbers gives something which is not exponentially small. Adding the condition that the partial sums for period 1 are below x, or that all bounded sums of HAPs with periods at most t are below x, will still leave us with more than 2^{0.999n} sequences.
