EDP10 — a new and very promising approach

From time to time, there has been an input into this project that has given rise to a burst of optimism (on my part anyway). Perhaps the first was the rapid discovery of very long sequences with low discrepancy, and especially the fact that these sequences had interesting structure to them. (The length led me to hope that the conjecture might be false, or at the very least that it might be possible to construct sequences with extremely slow-growing discrepancy, and the structure led me to hope the opposite.) I’m probably forgetting a few things, but the next one I remember is Terence Tao’s amazing observation that we could restrict attention to multiplicative functions if we were prepared to change the problem very slightly. We then discovered (though we sort of knew it anyway) that multiplicative functions are not easy objects to understand …

Since I posted EDP9, there has been a development that has radically changed my perception of the problem, and I imagine that of anyone else who is following closely what is going on. It began with this comment of Moses Charikar.

Moses’s idea, which I shall partially explain in a moment (for about the fifth time) is based on the theory of semi-definite programming. The reason I find it so promising is that it offers a way round the difficulty that the sequence 1, -1, 0, 1, -1, 0, 1, -1, 0, … has bounded discrepancy. Recall that this fact, though extremely obvious, is also a serious problem when one is trying to prove EDP, since it rules out any approach that is just based on the size of the sequence (as measured by, say, the average of the squares of its terms). It seemed to be forcing us to classify sequences into ones that had some kind of periodicity and ones that did not, and treat the two cases differently. I do not rule out that such an approach might exist, but it looks likely to be hard.

Moses proposes (if you’ll excuse the accidental rhyme) the following method of proving that every \pm 1 sequence has unbounded discrepancy. I’ll state it in infinitary terms, but one can give finitary versions too. Suppose you can find non-negative coefficients c_{k,d} (one for each pair of natural numbers k,d) that sum to 1, and non-negative coefficients b_i summing to infinity, such that the quadratic form

\displaystyle \sum_{k,d}c_{k,d}|x_d+x_{2d}+\dots+x_{kd}|^2-\sum_ib_ix_i^2

is positive semi-definite. Then you are done. Why? Because if (x_n) were a sequence of \pm 1s with discrepancy at most C, then the first term in the above sum would be at most C^2, while the second would be -\sum b_i=-\infty, which contradicts the positivity in a rather big way.

Why does this deal with the troublesome sequences? Because it is perfectly possible (and necessary, if this method is to work) for the sum of the b_i over all i that are not multiples of 3 to be finite. So this method, unlike many previous proof ideas, would not accidentally be trying to prove something false.

Note that to prove that the quadratic form is positive semi-definite, it is sufficient to write it as a sum of squares. So EDP is reduced to an existence problem: can we find appropriate coefficients and a way of writing the resulting form as a sum of squares?
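
To make the criterion concrete, here is a minimal numerical sketch of the finitary check, assuming Python with NumPy; the particular weights below are placeholders chosen only to illustrate the test, not ones that are expected to certify anything:

# Build the matrix of \sum_{k,d} c_{k,d}(x_d+...+x_{kd})^2 - \sum_i b_i x_i^2 for a small n
# and test positive semidefiniteness by inspecting the smallest eigenvalue.
import numpy as np

n = 30
Q = np.zeros((n + 1, n + 1))                    # indices 1..n; row/column 0 unused

# toy weights: equal weight on every HAP {d, 2d, ..., kd} with kd <= n and k >= 2,
# normalised so that the c_{k,d} sum to 1
haps = [(k, d) for d in range(1, n + 1) for k in range(2, n // d + 1)]
for (k, d) in haps:
    v = np.zeros(n + 1)
    v[d:k * d + 1:d] = 1.0                      # incidence vector of the HAP {d, 2d, ..., kd}
    Q += np.outer(v, v) / len(haps)

b = np.full(n + 1, 1e-4)                        # placeholder diagonal weights b_i
b[0] = 0.0
Q -= np.diag(b)

print("smallest eigenvalue:", np.linalg.eigvalsh(Q[1:, 1:]).min())
# A non-negative answer would certify that every +-1 (or unit-vector) sequence of
# length n has discrepancy at least sqrt(sum_i b_i) for this choice of weights.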

Now this idea, though very nice, would not be much use if there were absolutely no hope of finding such an identity. But there is a very clear programme for finding one, which Moses and Huy Nguyen have started. The idea is to begin by using semidefinite programming to find the optimal set of coefficients for large n (that is, for a finite truncation of the infinite problem), which can be done on a computer, and which they have already done (see the comments following the one I linked to above for more details). Next, one stares very hard at the data and tries to guess a pattern. It is not necessary to use the very best possible set of coefficients, so at this point there may be a trade-off between how good the coefficients are and how easy they are to analyse. (This flexibility is another very nice aspect of the idea.) However, looking at very good sets of coefficients is likely to give one some idea about which choices have a chance of working and which don’t. Having made a choice, one then tries to prove the positive semidefiniteness.
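
For anyone who wants to experiment, here is a sketch of one natural finitary SDP for the vector-valued problem, written with the cvxpy package; this is a plausible formulation rather than necessarily the exact one used in the experiments, and the dual variables attached to the HAP constraints play the role of the c_{k,d}, while those attached to the unit-norm constraints play the role of the b_i:

# Vector-discrepancy SDP: X is the Gram matrix of unit vectors x_1,...,x_n, and for each
# HAP P we bound ||sum_{i in P} x_i||^2 = <1_P 1_P^T, X> by t, then minimise t.
import numpy as np
import cvxpy as cp

n = 20                                          # small; the experiments used 512, 1024, 1500
X = cp.Variable((n, n), PSD=True)
t = cp.Variable()
constraints = [cp.diag(X) == 1]
for d in range(1, n + 1):
    for k in range(2, n // d + 1):              # singleton HAPs excluded, as in the experiments
        v = np.zeros(n)
        v[np.arange(d, k * d + 1, d) - 1] = 1.0
        constraints.append(cp.sum(cp.multiply(np.outer(v, v), X)) <= t)

prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve()
print("lower bound on the squared vector discrepancy:", t.value)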

As Moses points out, if such coefficients can be found, then they automatically solve the vector-valued problem as well, since we can look at the expression

\displaystyle \sum_{k,d}c_{k,d}\|x_d+x_{2d}+\dots+x_{kd}\|^2-\sum_ib_i\|x_i\|^2

instead, and the positivity will carry over. As he also points out, if you modify our low-discrepancy multiplicative examples such as \lambda_3 by multiplying \lambda_3(n) by the unit vector e_k, where 3^k is the largest power of 3 that divides n, then you get a sequence of discrepancy that grows like \sqrt{\log n}, which shows that this method cannot hope to do better than a \sqrt{\log n} bound. But I’d settle for that!
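
Here is a small sketch (assuming NumPy) of that modified example, measuring how the largest HAP-sum norm grows up to N = 3^9; here \lambda_3 is the character-like function with \lambda_3(3)=1 and e_k is the kth standard basis vector:

import numpy as np

N = 3 ** 9                                      # length of the sequence
K = 10                                          # enough coordinates e_0, ..., e_9 for n <= N

def x(n):
    # x_n = lambda_3(n) e_k, where 3^k is the largest power of 3 dividing n
    k = 0
    while n % 3 == 0:
        n //= 3
        k += 1
    v = np.zeros(K)
    v[k] = 1.0 if n % 3 == 1 else -1.0
    return v

best = 0.0
for d in range(1, N + 1):
    s = np.zeros(K)
    for m in range(d, N + 1, d):
        s += x(m)
        best = max(best, np.linalg.norm(s))

print("largest HAP-sum norm up to N =", N, ":", best)
print("compare with sqrt(log_3 N) =", np.sqrt(np.log(N) / np.log(3)))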

Finally, I want to draw attention to another comment of Moses, in which he introduces a further idea for getting a handle on the problem. I won’t explain in detail what the idea is because I haven’t fully digested it myself. However, it gives rise to some Taylor coefficients that take values that are all of the form n/2 for some integer n. It is clear that they have a great deal of structure, but we have not yet got to the bottom of what that structure is. If we do, then it may lead to a concrete proposal for a matrix of coefficients that should be a good one.

My optimism may fade in due course, but at the time of writing it feels as though these new ideas have changed the problem from one that felt very hard into one that feels approachable.

108 Responses to “EDP10 — a new and very promising approach”

  1. New ideas in Polymath5 « Euclidean Ramsey Theory Says:

    […] ideas in Polymath5 By kristalcantwell There is a new thread for Polymath5. There some new ideas involving among other things quadratic forms which seem to make […]

  2. Mathematicians collaborate in an open joint research project through Professor Timothy Gowers’s blog « problemas | teoremas Says:

    […] Under the suggestive title EDP — a new and very promising approach, the […] continues

  3. Moses Charikar Says:

    I just want to review the approach suggested here and assess where we are with this line of thought. I was trying to compute an approximate Cholesky decomposition of a symmetric matrix A whose entries are A_{m,n} = e^{-\alpha \max(m,n)} d((m,n)), where d(i) is the number of divisors of i. Getting a closed-form expression for the Cholesky decomposition seems hopeless, but perhaps getting a truncated Taylor series approximation of each entry, of the form \gamma(i,j) + \beta(i,j)\alpha, is tractable. We know what these \gamma(i,j) are. Can we figure out expressions for \beta(i,j)?

    Further down the line, I was hoping to pick \alpha small, and then consider such a matrix of size 1/\alpha or perhaps \log(1/\alpha)/\alpha, subtract a large diagonal matrix and show that the remainder is positive semidefinite. If such a diagonal matrix could be found with sum of entries at least C/\alpha, this would establish a lower bound of \sqrt{C} for EDP.

    The hope was that the approximate Cholesky decomposition will give hints about what this diagonal matrix should look like. Intuitively, if a diagonal entry is large (because of a large \beta(n,n) value), then this might be where a large quantity could be subtracted out in the diagonal matrix. Now we don’t quite understand the form of these \beta(n,n) values. Tim has worked out nice expressions for prime powers and a fairly complicated expression for products of two prime powers. This suggests that getting a general expression will be quite tricky. Put that aside for a moment. If we did understand these \beta(i,j) completely, would their values be good enough to be useful?

    By looking at the \beta(i,j) values generated by a program, it looks like the sum of the diagonal values \beta(i,i) for a size-n matrix grows as n^2 \log n. Say n = 1/\alpha. The ith diagonal term in the Cholesky decomposition is 1+\beta(i,i)\alpha. Assume optimistically that we can subtract out a term like \beta(i,i)\alpha from the (i,i) entry. This would mean that the sum of the diagonal entries subtracted out would be \alpha n^2 \log(n) = \log(1/\alpha)/\alpha. This would be great of course, since our goal is for this quantity to be larger than C/\alpha. If this works as planned, this would give a \sqrt{\log n} lower bound.

    Other than the optimistic assumption that we haven’t justified, there is another issue we need to deal with. The use of the Taylor series approximation assumes that the coefficients of \alpha are much smaller than 1/\alpha. But in fact the maximum of these coefficients for an n \times n matrix seems to grow like n \log^2 n (this is from looking at values output by a program; in fact the average on the diagonal is already n \log n). Recall that n = 1/\alpha. This means that these coefficients are too large for the truncated Taylor series approximation to be valid.
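
    For anyone who wants to reproduce these numbers, here is a minimal sketch (assuming NumPy) that estimates the \beta(i,i) by a finite difference in \alpha; running it for a few values of n gives figures that can be compared with the growth rates quoted above:

    import numpy as np
    from math import gcd

    def num_divisors(x):
        return sum(1 for d in range(1, x + 1) if x % d == 0)

    def A(n, alpha):
        # A_{m,n} = exp(-alpha * max(m,n)) * d((m,n))
        return np.array([[np.exp(-alpha * max(i, j)) * num_divisors(gcd(i, j))
                          for j in range(1, n + 1)] for i in range(1, n + 1)])

    n, alpha = 60, 1e-5
    # np.linalg.cholesky uses the lower-triangular convention; the diagonal entries are the
    # same as for the upper-triangular factor V with V^T V = A, and at alpha = 0 they are all 1.
    L0 = np.linalg.cholesky(A(n, 0.0))
    L1 = np.linalg.cholesky(A(n, alpha))
    beta_diag = (np.diag(L1) - np.diag(L0)) / alpha   # first-order coefficients beta(i,i)

    print("sum of beta(i,i):", beta_diag.sum())       # to compare with the n^2 log n growth
    print("range of beta(i,i):", beta_diag.min(), beta_diag.max())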

    If we are going to think along these lines, we need a different way to approximate the Cholesky decomposition.

    • Gil Says:

      Dear Moses, Two questions. First: what, again, is the reason that you and Huy do not include the HAPs of difference 1? Second: perhaps there will be some better insight if we exclude all HAPs of small difference \leq c and see how the solution behaves as c grows. EDP suggests that we will get \sqrt{\log n} bounds for every c, but maybe the pattern will become clearer if we consider large c or the behavior as c grows.

    • Moses Charikar Says:

      Dear Gil, We did include HAPs of difference 1. What we excluded was singleton HAPs, because they trivially give a lower bound of 1 on the discrepancy. In this case, the SDP dual solution would not have exhibited any useful structure. We haven’t thought of excluding HAPs of small common differences to see the effect on the value. It does look like the SDP dual weight placed on HAPs of difference d is a (roughly) decreasing function of d. But I don’t know how much the value will drop when we exclude those with small differences. We’ll try it out and report back.

  4. gowers Says:

    Moses, can I check with you that the condition for success in the Cholesky decomposition, as you describe it in this comment, should be that A_{ii}-\sum_{j<i}V_{i,j}^2\geq 0? I don’t understand what you’ve written at the moment (which could be a sign that it was garbled by WordPress), but this seems to be the condition that would allow the algorithm to continue to find a decomposition with non-negative coefficients.

  5. Alec Edgington Says:

    Given the formulae that Tim has worked out for f(n) = \beta(n,n), a simple guess is that f(n)/n is of the order of \frac{1}{2} d(n) - 1, where d(n) is the divisor function. This plot seems to support this:

    In particular, it seems reasonable to conjecture that 2f(n)+1 \leq n (d(n) - 2) for all n.

  6. gowers Says:

    This is a response to Moses’s comment above. First, I spent a little time rereading the comments in which you introduced the truncated-Taylor-series idea, and I now understand it rather better. (Yesterday I just saw a matrix with some patterns and became determined to work out what the patterns were, though now that they appear to have a piecewise linearity property that is likely to get ever more complicated as the number of distinct prime factors increases, my enthusiasm for that has somewhat dimmed. Nevertheless, I don’t rule out that it may be possible to work out exactly the set of linear forms in the exponents of which one should take the maximum.) I too was slightly worried about whether truncating the Taylor series was going to work in the region that truly interests us, but even if that is the case I think it should still be quite a useful thing to have done.

    I am actually a little more disturbed by the presence of the maximum in the formula for the coefficient A_{m,n}=e^{-\alpha(m\vee n)}d((m,n)) of the matrix. That, it seems to me, rules out any truly nice expression for anything else that one might wish to compute in terms of A, and probably accounts for the piecewise linearity that was appearing yesterday (though I do not claim to have found a direct connection).

    However, it may be that some of the ideas I was having when I was trying a Fourier approach could accidentally help to produce a matrix with a better formula. Let me briefly review the extra idea that seems to have some chance of achieving that (though someone else may spot some reason that this extra idea cannot give rise to a matrix that is both given by a nice formula and useful).

    The reason that the maximum occurs in the formula for A_{m,n} is that each HAP has a sharp cutoff. What do I mean by that? Well, the criterion for belonging to the HAP \{d,2d,\dots,md\} is that you should be a multiple of d and you should be at most md. The second condition switches off suddenly after md — that is the sharp cutoff. (I realize that many people reading this do not need to be told what a sharp cutoff is, but I see no harm in explaining it to people who might not otherwise know exactly what I meant.) If we now take a sum like \sum_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2, then the coefficient of x_mx_n is equal to twice the sum of the c_{k,d} over all pairs (k,d) such that d is a common factor of m and n and kd\geq\max\{m,n\}. So there seems to be no way of avoiding a dependence on \max\{m,n\}.

    That is true if we use HAPs with sharp cutoffs. But there is another option that may be available to us. As with several ideas connected with proving EDP, this further option will not work unless a stronger version of EDP is also true, which means that there is no guarantee that it works. (This is a point that Moses made when he introduced his approach and that I forgot to make in the post above: the stronger the statements we try to prove, the more chance there is of finding counterexamples, so we probably shouldn’t give up the search for counterexamples just yet. Even a counterexample to a strengthening of EDP would have major implications for our understanding of EDP itself.)

    The idea is to exploit the following simple fact (and now I am repeating things I have said before). Let (\theta_n) be any decreasing sequence of real numbers that tends to zero, and suppose that \sum_n\theta_n=S<\infty. Let (x_n) be a sequence such that |x_1+\dots+x_n|\leq C for every n. Then the sum \sum_n\theta_nx_n, which equals

    \displaystyle (\theta_1-\theta_2)x_1+(\theta_2-\theta_3)(x_1+x_2)+\dots,

    has modulus at most C\sum_n(\theta_n-\theta_{n+1})=C\theta_1.

    Thus, if we can find a lower bound of A\theta_1 for |\sum_n\theta_nx_n|, then we may conclude that there exists a partial sum with modulus at least A.

    This gives us the option of considering not just a sum such as

    \displaystyle \sum_{k,d}c_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2

    but also smoother expressions such as

    \displaystyle \sum_d\int_\beta c_{\beta,d}(\sum_k e^{-\beta kd}x_{kd})^2 d\beta.

    My choice of e^{-\beta kd} there may not be optimal — I can choose any \theta_{\beta,k,d} that has the right sort of decreasing behaviour in k.

    What is the coefficient of x_mx_n in the above expression? It is

    \displaystyle 2\sum_{d|(m,n)}\int_\beta c_{\beta,d}e^{-\beta(m+n)}d\beta,

    except when m=n when we lose the factor 2.

    One possible choice for c_{\beta,d} is 1. If we make that choice, then we get d((m,n))\int_\beta e^{-\beta(m+n)}d\beta, which equals d((m,n))/(m+n).

    Unfortunately, for that choice \sum_d\int_\beta c_{\beta,d}=\infty. Another choice would be e^{-\alpha d}e^{-\beta}, which sums/integrates to approximately 1/\alpha when \alpha is small. Then we would get a coefficient of d_\alpha((m,n))/(m+n+1), where I am defining d_\alpha(x) to be the sum of e^{-\alpha d} over all factors d of x.

    The reason this general method is not guaranteed to work is that the unboundedness of the partial sums does not imply the unboundedness of the smoothed partial sums. It may be that someone can completely kill off this idea by coming up with a counterexample to “smooth EDP”. That is, one would want a \pm 1 sequence (x_n) (or a sequence of unit vectors if you prefer) such that there is some absolute constant C with |\sum_k e^{-\beta kd}x_{kd}|\leq C for every d and every \beta>0.

    I would be very interested in such an example, as it would show that the truth of EDP would have to depend in a crucial way on the sharp cutoffs of the HAPs.

  7. gowers Says:

    It occurs to me that I should try out smooth EDP on the usual character-like examples. What’s more, I have a horrid feeling that it may fail for such examples, precisely because of the smoothing.

    So let me try to work out the sum e^{-\beta}-e^{-2\beta}+e^{-4\beta}-e^{-5\beta}+\dots. It is e^{-\beta}(1-e^{-\beta})/(1-e^{-3\beta}). We now add to that the same expression with \beta replaced by 3\beta, then 9\beta, and so on. The result looks pretty close in size to e^{-\beta}+e^{-3\beta}+e^{-9\beta}+\dots.

    So far that is OK, since this tends to infinity as \beta tends to 0. But I think disaster strikes if we consider \mu_3 instead of \lambda_3, so that now the sum we care about is the alternating sum e^{-\beta}-e^{-3\beta}+e^{-9\beta}-\dots. In that case, the sum is uniformly bounded for all \beta.
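
    A quick numerical check of this (assuming NumPy), summing \sum_n e^{-\beta n}f(n) directly for f=\lambda_3 and f=\mu_3, with the exponential weight truncated where it becomes negligible:

    import numpy as np

    def charlike(n, sign_at_3):
        # completely multiplicative: +1/-1 according to the residue mod 3 of the 3-free part,
        # and sign_at_3 at the prime 3 (so +1 gives lambda_3, -1 gives mu_3)
        k = 0
        while n % 3 == 0:
            n //= 3
            k += 1
        return (1.0 if n % 3 == 1 else -1.0) * (sign_at_3 ** k)

    for beta in [1e-1, 1e-2, 1e-3, 1e-4]:
        N = int(40 / beta)                      # exp(-40) is negligible, so truncate here
        ns = np.arange(1, N + 1)
        w = np.exp(-beta * ns)
        lam = np.array([charlike(n, 1) for n in ns])
        mu = np.array([charlike(n, -1) for n in ns])
        print(beta, "lambda_3:", round((w * lam).sum(), 3), " mu_3:", round((w * mu).sum(), 3))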

    I’ll have to think about how much of a problem this is.

    • gowers Says:

      Does it, for example, imply that it is not possible to subtract from the matrix A_{m,n}=d_\alpha((m,n))/(m+n+1) a large diagonal part while maintaining positive semi-definiteness? Indeed, does it imply something of this kind whenever A_{mn} is given by a formula of the form

      \displaystyle \sum_{d|(m,n)} \int_\beta c_{d,\beta} e^{-\beta(m+n)}d\beta?

      That would be quite interesting, as it would suggest that in order for Moses’s approach to work, it would be essential to deal with sharp cutoffs and the resulting slightly unpleasant expressions. (Here’s a completely speculative idea: perhaps from any such matrix one can subtract off a smooth part and be left with a matrix from which it is easier to see how to remove a large diagonal without losing positive definiteness.)

    • gowers Says:

      One further small remark. The failure of “smooth EDP” fits quite well with the way that one actually proves that the partial sums of \mu_3 diverge. (Recall that to calculate \mu_3(n) you write it as 3^km with m not a multiple of 3, and then set it to be (-1)^k if m is congruent to 1 mod 3, and -(-1)^k if m is congruent to 2 mod 3. This function is easily checked to be completely multiplicative.) To prove that, one decomposes the sequence up into 1,-1,0,1,-1,0, … plus 0,0,-1,0,0,1,0,0,0,0,0,-1,0,0,1,0,0,0,… and so on. The contributions to the partial sums are then 1,0,0,1,0,0,1,0,0,… plus 0,0,-1,-1,-1,0,0,0,0,0,0,-1,-1,-1,0,0,… and so on, which makes it easy to choose numbers at which the partial sum is a sum of several 1s and no -1s. But if we smooth this off, then we lose what it was that was allowing us to do this. We were depending on the fact that the sum up to n of the various parts of \mu_3 varies in a discontinuous way. By contrast, if {}f is one of those parts, then the sum \sum_n e^{-\beta n}f(n) varies in a very smooth way as \beta tends to 0.

      This slightly vague observation leads to a slightly vague question: what constraints does the existence of the function \mu_3 place on the coefficients c_{k,d} and b_m if Moses’s method is to work?

      I shall try to work this out very directly. It tells us that

      \sum_{k,d} c_{k,d}(\mu_3(d)+\mu_3(2d)+\dots+\mu_3(kd))^2\geq \sum_m b_m.

      Let \nu(n)=\mu_3(1)+\dots+\mu_3(n) and let c_k=\sum_dc_{k,d}. Since \mu_3 is completely multiplicative, the above formula implies that \sum_kc_k\nu(k)^2\geq\sum_m b_m.

      Hmm, that doesn’t seem to tell us all that much, since on average \nu(k)^2 grows roughly logarithmically.

    • gowers Says:

      Oops, I’ve just realized that I made a mistake. I approximated e^{-\beta}(1-e^{-\beta})/(1-e^{-3\beta}) by e^{-\beta} because I was accidentally thinking of e^{-\beta} as being very small. However, the mistake turns out not to make much difference, since when \beta is small we get roughly e^{-\beta}/3. I haven’t checked 100%, but I’m pretty sure that the sum is still uniformly bounded.

  8. gowers Says:

    Here’s a small observation. Suppose we look at the complex case. I think we can do slightly better than \mu_3 if we are trying to find the lowest possible discrepancy. The idea is to define a completely multiplicative function by letting f(p)=1 if p is congruent to 1 mod 3, f(p)=-1 if p is congruent to 2 mod 3, and f(3)=e^{2\pi i\alpha}, where \alpha=(\sqrt{5}-1)/2. The point about choosing the golden ratio is to make the multiples of \alpha as equidistributed as possible mod 1, or equivalently to make the powers of f(3) as equidistributed as possible round the circle.

    When we break up {}f in the usual way according to which is the largest power of 3 that divides n, we find that for each k we have a set of partial sums that either equals 0 or e^{2\pi i\alpha k}. So to make these partial sums reinforce each other, we need to choose a bunch of ks such that the corresponding e^{2\pi i\alpha k}s are pointing in roughly the same direction. If we choose \alpha=1/2, this is easy to do: we just choose alternate ks (this is the function \mu_3). But for a badly approximable irrational, things are harder. I haven’t worked out what the best thing to do is, but I think it is probably this. We choose an angle \theta, and we pick an integer n=\sum_{k\in A}3^k, where A is the set of all k such that e^{2\pi ik\alpha} lies on the arc that goes from e^{-i\theta} to e^{i\theta}. Assuming that the multiples of \alpha are perfectly equidistributed (this is a heuristic argument from now on) the density of A is \theta/\pi, and \sum_{k\in A}e^{2\pi ik\alpha}=|A|(2\theta)^{-1}\int_{-\theta}^{\theta}\cos x dx=|A|\sin\theta/2\theta. That means that |A| is roughly \theta\log n/\pi and the sum is therefore roughly (\sin\theta/2\pi)\log n. Thus, the best one can do is to take \theta=\pi/2, which gives a bound of \log n/2\pi. Here, logs are to base 3.

    I find this quite interesting because it just might provide an explanation for the seeming appearance of golden ratios in the 1124 examples. I note that \log_3(1124)/2\pi is fairly close to 1. Is there some way of making a \pm 1 sequence out of this complex example at a cost to the discrepancy of a factor of 2 or so?
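
    Here is a short sketch (assuming NumPy) that computes the largest partial sum of this function up to N=3^{10}; since the function is completely multiplicative and never vanishes, the partial sums along any other HAP have the same moduli, so this is the full discrepancy:

    import numpy as np

    alpha = (np.sqrt(5) - 1) / 2
    N = 3 ** 10

    def f(n):
        # completely multiplicative: f(3) = e^{2 pi i alpha}, f(p) = +-1 according to p mod 3
        k = 0
        while n % 3 == 0:
            n //= 3
            k += 1
        chi = 1.0 if n % 3 == 1 else -1.0
        return chi * np.exp(2j * np.pi * alpha * k)

    partial = np.cumsum([f(n) for n in range(1, N + 1)])
    print("max |partial sum| up to N =", N, ":", np.abs(partial).max())
    print("for comparison, log_3(N)/(2*pi) =", (np.log(N) / np.log(3)) / (2 * np.pi))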

    • gowers Says:

      Incidentally, if one moves to the unit-vectors case, then up to a constant one can’t do better than the \sqrt{\log n} that Moses has already observed. To see this, let’s suppose that we set f(3^k)=u_k for some unit vector u_k and then define f(3^km) to be the usual multiple of f(3^k). We find ourselves wanting to choose u_1,\dots,u_m in such a way that no sum \sum_{k\in A}u_k has large norm. (Here, m=\log_3n.)

      However, we know that if \epsilon_1,\dots,\epsilon_m is a random choice of signs, then the expectation of \|\sum_i\epsilon_iu_i\|^2 is exactly m. This proves that there exists a choice of signs such that \|\sum_i\epsilon_iu_i\|\geq\sqrt{m}. From the triangle inequality it follows that either \|u_1+\dots+u_m\|\geq\sqrt{m}/3 or when you add it to the random \pm 1 combination you get a vector with norm at least 2\sqrt{m}/3. In the latter case you find a set A such that the norm of 2\sum_{i\in A}u_i is at least 2\sqrt{m}/3, so in this case you also have a sum of the unit vectors of size at least \sqrt{m}/3.

    • Moses Charikar Says:

      That’s interesting. I have been wondering whether it is possible to devise clever constructions of low-discrepancy vector sequences that use sequences for other primes, say, and combine them in an ingenious way. A couple of remarks:

      1. Arguing about complex numbers and reasoning about angles plays a role in the proof of Roth’s theorem on discrepancy of APs.

      2. It would be interesting to show some connection between vector discrepancy and the discrepancy of \pm 1 sequences. For example, is it possible that the vector discrepancy over HAPs is bounded by a constant, but the \pm 1 discrepancy is \Omega(\log n)?

      Possibly relevant to this is a recent paper that uses semidefinite programming to efficiently construct low-discrepancy \pm 1 sequences for various set systems. The author gets a constructive version of Spencer’s result of O(\sqrt{n}) discrepancy for an arbitrary system of n sets on a universe of n elements, improving on the O(\sqrt{n \log n}) achieved by random sequences. The technique in this paper is applicable to situations where the discrepancy remains bounded for the set system restricted to an arbitrary subset of the elements.

  9. SDPLover Says:

    In retrospect, it should have been obvious to think of using SDPs. In Lovasz’s survey on SDPs (http://www.cs.elte.hu/~lovasz/semidef.ps), Section 5.2, he uses an SDP argument to give a 1-page proof of Roth’s 1964 theorem. Interestingly his argument is a primal-only argument; he does not look at the dual.

  10. gowers Says:

    I’ve just done a little experiment, which I found quite interesting, though I don’t have a completely clear programme for what to do with it.

    I am trying to pursue Moses’s idea of using the Cholesky decomposition to help us guess which diagonal entries we can reduce while maintaining positive semidefiniteness. I thought it would be quite interesting to take the matrix A_{mn}=d((m,n)) (where d(x) is the number of divisors of x), which, as Moses observed, can be split up as V^TV, where V_{ij}=1 if i|j and 0 otherwise, subtract an infinitesimal amount from the diagonal, and apply the Cholesky decomposition algorithm to the perturbed matrix. (This is similar in spirit to Moses’s Taylor-expansion idea but computationally I think it is slightly easier.)

    As a first attempt to get some understanding of what was going on, I decided to subtract an infinitesimal \delta from A_{11} and leave the rest of the diagonal alone. The result was to replace V by V+\delta W, where the first few rows of W look like this.

    -1/2, 1/2, 1/2, 1/2, 1/2, 1/2, …

    0, -1/2, -1, -1/2, -1, -1/2, …

    0, 0, -1/2, 0, -1, 1/2, …

    0, 0, 0, 0, 0, 0, …

    0, 0, 0, 0, -1/2, 1, …

    0, 0, 0, 0, 0, -1/2, …

    I was a little sad when W_{6,6} turned out to be -1/2 because up to then the diagonal entries had magically agreed with the diagonal entries of the matrix that Moses calculated earlier. However, there was no reason for the two matrices to be the same.

    I’d be quite interested to know more of this matrix. In particular, if the diagonal entries appear to be bounded below then it would be quite encouraging.

    Note that if one subtracts an infinitesimal amount from the diagonal and ignores all quadratic and higher-order terms, then the corrections to the matrix for each diagonal element are independent of each other. In other words, we can study each diagonal entry separately. What I would hope to see is that when n has many factors the effect of replacing A_{nn} by A_{nn}-\delta is not too drastic. It would also be interesting if the converse were true as well. In fact, what I’d really like to see is something like this. You subtract \delta from A_{nn} and obtain a matrix V+\delta W_n. The diagonal terms of W_n are bounded below by some function g(n) (which is negative). The bound gets worse as n gets larger, but it is not so bad for n with many factors. As a result, we can choose a reasonably large diagonal matrix D (with non-negative entries \delta_n that sum to infinity) such that there is a uniform lower bound on the diagonal entries of \sum_n\delta_nW_n.

    • Moses Charikar Says:

      Tim, one comment is that your matrix corresponds to the quadratic form
      \displaystyle \sum_d (x_d+x_{2d}+\dots+x_{\lfloor n/d \rfloor d})^2

      By setting x_i = +1 for i \leq n/2 and x_i = -1 for i > n/2, the value of this expression is at most n. So we cannot subtract more than a sum total of n from the diagonal.

      Another (actually simpler) way to say the same thing is that the diagonal entries of the Cholesky decomposition are all 1, so clearly we will not be able to subtract more than n from the diagonal. Is this a problem for what you were thinking about ?

    • gowers Says:

      Moses, I’m struggling to understand the last remark of yours. In particular, which matrix are you referring to when you write “your matrix”? The fact that you have a free variable n in your definition suggests that you are talking about W_n, or a quadratic form related to W_n, but from the rest of what you write I am not so sure.

      However, to answer your last question, I was wanting to take the case \alpha=0 of the matrix you defined in this comment and to try to understand, purely formally (that is, without thinking about bounds on sums of coefficients etc.) what happens to the Cholesky decomposition if you subtract a tiny amount from the diagonal. My reason for doing that was not that I expected a proof of EDP to be a direct result, but just that I hoped that the calculations would be easy and would throw up patterns that would enable one to make intelligent guesses about the sequence (b_m).

      But perhaps your point is that the \alpha=0 case is not informative because it corresponds to putting a weight of 1 on all “full” HAPs (that is, ones that go right up to n) and 0 on the rest, and we can easily construct examples of bounded discrepancy sequences in this case: just have the first half of the terms equal to 1 and the second half equal to -1. (This point came up recently actually in a question that Sune asked.)

      I have to say that I don’t know whether one can hope to find anything interesting out about EDP by messing about with matrices that are provably not useful for the problem. But I think I’ll try one or two other things for a bit!

    • gowers Says:

      Instead of trying some other guess, I have had a go at seeing what I can say in general about subtracting an infinitesimal amount from the diagonal of a matrix. Recall that once we have chosen our coefficients c_{k,d} we end up with a quadratic form, we take the matrix of that form, and we try to subtract as much as we can from the diagonal while keeping it positive semidefinite. My hope is that we can get some idea of what to subtract if we linearize the problem by subtracting an infinitesimal amount from the diagonal. What I hope this shows is that some diagonal entries (ones corresponding to integers with few factors) are “expensive”, in the sense that they have a big effect on the diagonal entries in the Cholesky decomposition, whereas others are “cheap”. This, I hope, will give a clue about what we should subtract if we are subtracting a non-infinitesimal amount. (For example, perhaps one could repeat the infinitesimal process infinitely many times and end up with expressions involving exponentials.)

      Suppose, then, that A is the matrix of the quadratic form, and that V is an upper triangular matrix such that V^TV=A. The upper triangular property says that if V_{ij}\ne 0 then we must have i\leq j. Now let D be a diagonal matrix with entries d_i, and let \delta be an infinitesimally small constant. We would like to find an upper triangular matrix U such that (V+\delta U)^T(V+\delta U)=A-\delta D. Then V+\delta U will be the Cholesky decomposition of the perturbed matrix A-\delta D. Since \delta is infinitesimal, this works out as U^TV+V^TU=-D.

      Now let the columns of V be v_1,v_2,\dots and the columns of U be u_1,u_2,\dots. The upper triangular property tells us that v_i and u_i are vectors whose coordinates are zero after the ith coordinate.

      The ijth entry of U^TV is just the scalar product u_i.v_j, so the equations that these column vectors must satisfy are that u_i.v_j=-u_j.v_i whenever i\ne j and 2u_i.v_i=-d_i for every i.

      This makes it easy to solve for U (at least algorithmically, though whether one can get a closed-form expression for the solution in any case worth solving is open to question). If we have worked out u_1,\dots,u_{i-1}, we let w_i be the column vector that is the same as u_i except that w_{ii}=0. Then w_i is fully determined by the equations w_i.v_j=-u_j.v_i for j<i. We now need to work out u_{ii}, which is determined by the equation 2u_{ii}v_{ii}=-d_i-2w_i.v_i.

      We are hoping that with a judicious choice of D the u_{ii} will not get too large and negative. However, I don’t think there is much one can say about that question unless one chooses a specific matrix V to work with.

      Once one is actually doing calculations, it will be easiest to choose D to have just one non-zero entry equal to 1. One can then take linear combinations.
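
      Here is a short implementation of exactly this procedure (assuming NumPy). The example at the end takes the \alpha=0 matrix A_{mn}=d((m,n)) and D with a single 1 in the first entry, which reproduces the matrix W displayed in the earlier comment:

      import numpy as np

      def cholesky_perturbation(V, dvec):
          # V is upper triangular with V^T V = A; dvec holds the diagonal of D.
          # Returns the upper triangular U with U^T V + V^T U = -D, so that
          # V + delta*U is the first-order Cholesky factor of A - delta*D.
          n = V.shape[0]
          U = np.zeros_like(V)
          for i in range(n):
              v_i = V[:, i]
              if i > 0:
                  # strict upper part w_i of u_i: solve w_i.v_j = -u_j.v_i for j < i
                  rhs = -(U[:, :i].T @ v_i)
                  U[:i, i] = np.linalg.solve(V[:i, :i].T, rhs)
              w_dot_v = U[:i, i] @ v_i[:i]
              U[i, i] = (-dvec[i] - 2.0 * w_dot_v) / (2.0 * V[i, i])
          return U

      n = 6
      V = np.array([[1.0 if j % d == 0 else 0.0 for j in range(1, n + 1)]
                    for d in range(1, n + 1)])   # V_{dj} = 1 if d | j, so V^T V = d((i,j))
      D = np.zeros(n)
      D[0] = 1.0                                 # subtract an infinitesimal from A_{11}
      print(np.round(cholesky_perturbation(V, D), 3))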

    • Moses Charikar Says:

      Sorry, I should have been more precise. As you correctly inferred, I meant that A_{ij}=d((i,j)) corresponds to the quadratic form \sum_d (x_d+x_{2d}+\dots+x_{\lfloor n/d \rfloor d})^2.

      There is something peculiar about the \alpha > 0 case that causes the diagonal entries in the Cholesky decomposition to shoot up. But you are right that, since we are so far from understanding this at this point, we’ve got to get started somewhere, and the \alpha = 0 case is a good starting point.

  11. Sune Kristian Jakobsen Says:

    Some necessary conditions on the coefficients b_i in Moses approach (necessary if we want a proof of EDP):

    \sum_{i=1}^{\infty}b_i=\infty
    If we have a sequence (x_i) that takes values in \{1,-1\} (or more generally: unit vectors) for all i\in A\subset \mathbb{N}, and 0 for all other i, and the sequence has discrepancy C, then \sum_{i\in A}b_i\leq C^2.

    Are these conditions sufficient? And do you think it would make sense to try to find the coefficients using only these two “axioms”? (I’m not up to date with everything here, so you might be working on approaches that are strictly smarter).

  12. Huy Nguyen Says:

    I just wanted to point out that there is a fairly simple set of n vectors whose dot products exactly equal the matrix of the quadratic form when we choose the tails t_{k, d}=e^{-\alpha k d}. Vector v_i is a concatenation of u_{i1}, \ldots,u_{in}. If j < i, then u_{ij}=0 . If i \le j < n, u_{ij}=\sqrt{e^{-\alpha j}-e^{-\alpha (j+1)}} (1_{d|i})_d, where (1_{d|i})_d is a vector whose dth coordinate is 1 if d|i and 0 otherwise. Lastly u_{in}=\sqrt{e^{-\alpha n}} (1_{d|i})_d.
    It is easy to check that v_i\cdot v_j = e^{-\alpha \max(i,j)}d((i,j)).
    The unfortunate thing about these vectors is that they are in n^2 dimensions. However, maybe we can gain some intuition by working on these vectors for now while still trying to understand the Cholesky decomposition.
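
    Here is a quick numerical check (assuming NumPy) that these concatenated vectors do have the claimed Gram matrix:

    import numpy as np
    from math import gcd

    n, alpha = 12, 0.1

    def num_divisors(x):
        return sum(1 for d in range(1, x + 1) if x % d == 0)

    V = np.zeros((n, n * n))                     # row i-1 holds v_i, as n blocks u_{i1},...,u_{in}
    for i in range(1, n + 1):
        inc = np.array([1.0 if i % d == 0 else 0.0 for d in range(1, n + 1)])   # (1_{d|i})_d
        for j in range(i, n + 1):
            w = np.exp(-alpha * n) if j == n else np.exp(-alpha * j) - np.exp(-alpha * (j + 1))
            V[i - 1, (j - 1) * n: j * n] = np.sqrt(w) * inc

    target = np.array([[np.exp(-alpha * max(i, j)) * num_divisors(gcd(i, j))
                        for j in range(1, n + 1)] for i in range(1, n + 1)])
    print("max deviation:", np.abs(V @ V.T - target).max())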

    • Moses Charikar Says:

      Huy’s expression actually applies to the quadratic form obtained from any choice of c_{k,d} and can be interpreted as follows: Write out a matrix with one row for each HAP. The row vector for the HAP d,2d,..,kd is \sqrt{c_{k,d}} times the incidence vector for the HAP. Now the columns of this matrix are the required vectors v_i.

      This gives a way to understand what the entries of the diagonal of the Cholesky decomposition are. The ith diagonal entry is the magnitude of the component of v_i orthogonal to the space spanned by v_1, \ldots,v_{i-1}. Is this something we can get a handle on ?

  13. gowers Says:

    I don’t know how much it will help, or what the arguments are in favour of choosing c_{k,d}=e^{-\alpha kd}, but it may be of some use to consider coefficients that ought to be the other natural choice, namely c_{k,d}=(kd)^{-(1+s)} for some small positive s. In this case, \tau_{k,d}=d^{-(1+s)}\sum_{j\geq k}j^{-(1+s)}, which is roughly equal to d^{-(1+s)}k^{-s}/s. Also, \sum_{k,d}c_{k,d} is, by roughly the same calculation, about 1/s^2. (One thing I like about this choice is that it splits as a product k^{-(1+s)}d^{-(1+s)}, which makes a lot of calculations pretty clean.)

    The coefficient of x_mx_n in the quadratic form

    \displaystyle \sum_{k,d}c_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2

    is, as ever, \sum_{d|(m,n), kd\geq\max\{m,n\}}c_{k,d}, which equals \sum_{d|(m,n)}d^{-(1+s)}\sum_{k\geq\max\{m/d,n/d\}}k^{-(1+s)}. If we approximate the latter sum by (\max\{m,n\}/d)^{-s}/s, then this simplifies to s^{-1}(\max\{m,n\})^{-s}\sum_{d|(m,n)}d^{-1}, which we can also write as s^{-1}(\max\{m,n\})^{-s}\sum_{d|(m,n)}\frac d{(m,n)}. But \sum_{d|x}d is the well-known arithmetical function \sigma(x) with many nice properties. (See this Wikipedia article for more details.) For example, it is multiplicative, though not completely multiplicative. So we end up with the nice expression \frac{s^{-1}(\max\{m,n\})^{-s}\sigma((m,n))}{(m,n)}. That’s not quite right — if m\ne n it needs to be doubled. But it gives the matrix entry A_{mn} for the matrix of the quadratic form.
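
    To get a feel for how good this approximation is, here is a small sketch (assuming NumPy and SciPy) that compares the exact value of \sum_{d|(m,n), kd\geq\max\{m,n\}}c_{k,d}, computed via the Hurwitz zeta function, with the expression above for a few pairs (m,n):

    import numpy as np
    from math import gcd
    from scipy.special import zeta                # zeta(x, q) is the Hurwitz zeta function

    s = 0.1

    def exact(m, n):
        g, M = gcd(m, n), max(m, n)
        return sum(d ** (-(1.0 + s)) * zeta(1.0 + s, -(-M // d))
                   for d in range(1, g + 1) if g % d == 0)

    def approx(m, n):
        g, M = gcd(m, n), max(m, n)
        sigma = sum(d for d in range(1, g + 1) if g % d == 0)
        return M ** (-s) * sigma / (s * g)

    for m, n in [(6, 9), (12, 18), (10, 25), (30, 45), (7, 13)]:
        print((m, n), round(exact(m, n), 3), round(approx(m, n), 3))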

    • gowers Says:

      After doing the above, I went to have a look at some of Moses’s data to see whether the best possible matrices for the dual problem exhibit the kind of behaviour that is coming out of the above calculations. In retrospect I realize that the answer was doomed to be no, because I now remember that Moses chose the exponential decay of the coefficients c_{k,d} precisely because that is what was coming out of the experiments.

      Anyhow, it seems that the matrix elements are more closely connected with d((m,n)) (the number of divisors of (m,n)) than they are with \sigma((m,n)).

      I’ve just done another calculation and it’s left me a bit puzzled. Actually, perhaps it’s OK. The coefficient A_{m,n} should equal \sum_{d|(m,n)}\tau_{\max\{m,n\}/d,d}, and the experimental evidence suggests that this is equal to \sum_{d|(m,n)}C_de^{-\alpha\max\{m,n\}}. At least for small (m,n), given that \alpha is small, this should be roughly \sum_{d|(m,n)}C_d. If this is to be roughly proportional to the number of factors of (m,n) then C_d should be roughly constant. But this doesn’t seem to be the case.

      Actually, that isn’t such a problem, because we don’t get a matrix that’s proportional to d((m,n)). From the data, this is particularly noticeable in the fifth row, which has a bump at every fifth column, but up to more like 8/5 of the underlying level. This suggests that C_5 is more like 3/5 times C_1. The square of 5/3 is 25/9, which is not far from e, suggesting that the log of the ratio of C_1 to C_5 should be around 1/2. And that seems to be borne out by this diagram of Huy’s, as do various other facts that one can spot when looking at the matrix.

      The ratio corresponding to 7 appears to be about 4/3, so C_7 seems to be more like 1/3. Let me have a look at a few more of these at prime values, to see if a pattern for the C_d emerges. At 11 I get a ratio that’s pretty close to 8/7, so C_{11} is fairly close to 1/7. At 13 I get something that doesn’t look close to a rational with small denominator, so I’ve hit a brick wall. But at least it is smaller than the ratio that I got for 11.

      However, I do get the distinct impression that the coefficients C_d are approximately multiplicative — no time to check this. But the reason I’m interested is that it looks as though it should be possible to say roughly what the coefficients C_d are, and if we can do that then we have a conjectured good matrix.

      Just to reiterate what I’m doing, in case anyone feels ready to get a computer to do the calculations quickly and easily, I’m guessing the coefficients C_d by assuming that the matrix is given by a formula A_{m,n}=\sum_{d|(m,n)}C_d times something that decays slowly along each row. This makes it easy to guess C_d. For example, if I look at the fifth row, then C_5 should be approximately equal to (A_{55}-A_{15})/A_{15} (if I normalize by setting C_1=1).

      The dream scenario would be that C_d turns out to be a power of d. I’d be interested to see how close this dream is to reality by seeing a plot of log of (A_{mm}-A_{m1})/A_{m1} against \log m.
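
      For anyone with the data to hand, here is a sketch of that plot (assuming NumPy and matplotlib, and assuming the quadratic-form matrix has been loaded into a dense array A with A[i-1,j-1]=A_{ij}; the file name below is only a placeholder):

      import numpy as np
      import matplotlib.pyplot as plt

      A = np.loadtxt("qf_matrix_dense.txt")      # placeholder name: load the n x n matrix here
      n = A.shape[0]
      ms = np.arange(2, n + 1)
      ratio = (A[ms - 1, ms - 1] - A[0, ms - 1]) / A[0, ms - 1]   # estimate of C_m / C_1

      keep = ratio > 0                           # discard non-positive estimates before taking logs
      plt.plot(np.log(ms[keep]), np.log(ratio[keep]), '.')
      plt.xlabel('log m')
      plt.ylabel('log ((A_mm - A_1m) / A_1m)')
      plt.show()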

    • gowers Says:

      Moses, I’ve just had a look at the b values that you obtained and they are distressingly lacking in any obvious pattern. For example b_{13} is almost 100000 times bigger than b_{11}. This makes no sense to me whatsoever. Is it perhaps an ugly “edge effect” that comes from looking at small progressions?

      It seems to me that to make any more progress by staring at the data it will be necessary to modify the SDP in some way so as to make the data nicer.

      Other things that don’t make sense: the value at 12 is very big, but the values at 1,2,3,4 and 6 are all tiny. The values at 2, 4, 7 and 49 are all tiny but the values at 8 and 14 are big. Etc. etc.

    • gowers Says:

      I’m now at home and have an even more fundamental problem with staring at the data, which is that Princeton won’t let me look at it. (When I was on my work computer it somehow recognised that I didn’t have some malicious reason for wanting to look at 512-by-512 matrices.)

    • Alec Edgington Says:

      I had that problem briefly but it seems OK now. I’ve put Moses’ tables of b_n on the wiki: here.

    • gowers Says:

      Great, thanks. A first thing to try to work out might be which values of n are the ones for which b_n is absolutely tiny, and which are the values for which it is reasonably substantial. That seems to be reasonably independent of the size of the matrix. By “work out” I mean of course that one should try to find some simple rule that says which values are small.

    • Alec Edgington Says:

      The n with b_n = 0 (to within the accuracy of the data) for the 1500 matrix are: 1, 2, 3, 263, 367, 479, 727, 907, 1181, 1277, 1282, 1303, 1429, 1447, 1487.

      Most of these are prime, the exceptions being 1 (fair enough) and 1282. Not much to go on there. Of the 274 n with b_n <  0.00000001, 174 are prime, and as you've noted they include some quite smooth numbers, like 18 and 49.

      There is some hint of regularity when one looks at geometric progressions. For example, this plot shows \log b_{2n} - \log b_n

    • Alec Edgington Says:

      Curiously, in light of Sune’s observation that \sum b_{n!} \leq 1, the largest value is at a factorial (b_{720} = 0.1115). (I’m still focusing on the 1500 sequence). The n with b_n > 0.005 are: 108, 120, 180, 216, 240, 252, 270, 288, 360, 420, 432, 480, 504, 540, 630, 648, 720, 756, 810, 840, 864, 900, 1008, 1080. These are all very smooth.

    • Sune Kristian Jakobsen Says:

      This seems to be a sign that the b_n sequence depends a lot on N (that is, the largest number we consider): smooth numbers “just below” N (on a log scale) take high values. So perhaps it is important to remember that we don’t have to find a sequence b_n with infinite sum, but only with arbitrarily large sum.
      Perhaps one way of doing this would be to make b_i a sequence indexed by the rational numbers in (0,1), and then require the sum to be infinite?

    • Moses Charikar Says:

      I don’t know why you are having trouble with accessing the files. I’ll make a copy and link them off of my Princeton CS web page as well.

      About reading patterns in the b_n values … I have a feeling that the large values are more reliable than the small ones. There are all sorts of reasons why the small values might be misleading: roundoff errors, and the fact that we truncated the size of the problem at 512, 1024, 1500, etc. (the latter is probably more significant than the former). Huy mentioned that in general the patterns in the data seem to become more robust for larger values of n. Unfortunately, solving the 1500-size SDP took about a day; 1024 took an hour or two I think. So it is unlikely we will be able to get data for much larger n. (The solver was running on my mac mini – admittedly not the most powerful computing platform available, but one I had complete control over.)

      In looking through the data earlier, I’ve always sorted the values and tried to spot patterns in the large values – that seems to be a useful way to look at the data. I’d be quite happy to understand the largest b_n that contain half the total mass, say. As Sune and Alec point out, large values seem to occur at large smooth n. I have a very hand-wavy heuristic explanation for this from thinking about the \alpha = 0 matrix that Tim was looking at earlier and the Taylor coefficients on the diagonal we’ve observed for the small \alpha case before. I’ll try to put down my thoughts in a later comment.

    • Moses Charikar Says:

      I agree with Tim that it would be a good idea to modify the SDP to make the data “nicer”. If we have a guess of what setting of c_{k,d} is good/reasonable/fits the data, then we can plug this into the SDP and it will return the optimal values of b_i for this particular choice of c_{k,d}. Perhaps this will produce better values for b_i ?

    • Moses Charikar Says:

      I think I know what was causing the file access problems. To access the dual solutions I put up, you needed to use this link:
      https://webspace.princeton.edu/xythoswfs/webui/_xy-2045882_1-t_TWwKmqrU
      This contains a password that gives you access to the directory. After your browser displays this page, the URL it displays in the address bar is different. If instead you copy this new URL into a new browser window, it takes you to the Princeton webspace login page.

      I’ve changed the access permissions, so you can now access the directory by simply following this link:
      https://webspace.princeton.edu/users/moses/EDP/
      Should have done this in the first place.

    • Klas Markström Says:

      Moses, what kind of machine can your solver be used on? If it runs on linux, and can be run by someone not too familiar with it, I can provide some larger machines and try to push it a few steps further.

      Are there any good parallel SDP-solvers?

    • Moses Charikar Says:

      Sure, it would be nice to have access to larger solutions. I used some publicly available solvers. The code is available and you can install them on linux. I had to install LAPACK and BLAS to compile. It’s easy to run them even if you know nothing about SDPs. I have code that produces the input file, you invoke the solver from the command line, and it outputs the solution to a file.

      The solvers I used are available here:
      http://www.mcs.anl.gov/hs/software/DSDP/
      https://projects.coin-or.org/Csdp/
      DSDP seemed to perform slightly better than CSDP for these instances.

      There are some implementations of parallel SDP solvers, but I don’t have experience with them (and I imagine it would take more effort to get them up and running):
      http://sdpa.indsys.chuo-u.ac.jp/sdpa/software.html#sdpara

    • Moses Charikar Says:

      Klas, my reply to you is stuck in the moderation queue presumably because there were too many URLs in it. So let me repeat sans URLs:

      Sure, it would be nice to have access to larger solutions. I used some publicly available solvers. The code is available and you can install them on linux. I had to install LAPACK and BLAS to compile. It’s easy to run them even if you know nothing about SDPs. I have code that produces the input file, you invoke the solver from the command line, and it outputs the solution to a file.

      The solvers I used were DSDP and CSDP (you can find them easily).
      DSDP seemed to perform slightly better than CSDP for these instances.

      There are some implementations of parallel SDP solvers, but I don’t have experience with them (and I imagine it would take more effort to get them up and running). One such solver I found was SDPARA.

    • Klas Markström Says:

      Moses, OK, I’ll look into it; I have a busy day today, so it might have to wait until the weekend. What is mainly limiting your runs now, memory or time?

      It seems that DSDP just needs to link to scalapack in order to become a parallel solver as well.

    • Anonymous Says:

      Time is the bottleneck, but the memory requirement is also going to be prohibitively large for larger n. The 1500 size problem used up 2G of memory I think and it rises quadratically in the problem size.

    • Klas Markström Says:

      Moses, could you send me, or put on the web, the code for producing the input files?

    • Moses Charikar Says:

      Sure, the code is here:
      https://webspace.princeton.edu/users/moses/EDP/disc4.c

      It takes the problem size as an argument and produces the SDP formulation in SDPA format – all solvers are able to read this.

    • Klas Markström Says:

      Moses, I’ll try to compile the parallel version of DSDP this afternoon. Which of the output options would you want in order to compare with what you have already done. The solution and dual seem obvious to output, but is there anything else that will be useful?

    • Moses Charikar Says:

      Klas, I’ve been using the default output. I use this:
      dsdp5 input-file -save output-file
      If you want the code to print out reassuring messages that progress is being made, use something like this:
      dsdp5 input-file -save output-file -print 1 -dloginfo 10
      We have code to parse the output.

      Most likely, we will want to solve other kinds of SDP formulations soon (TBD), but it would be good to know how large an instance we can hope to solve in reasonable time with the older SDP formulation.

  14. Sune Kristian Jakobsen Says:

    As Moses commented earlier, we must have \sum_{n}b_{n!}\leq 1 (to be precise: Moses only wrote explicitly that it was finite, but his argument shows that it is less than 1). The only thing we needed in this argument is that n! divides (n+1)!, so more generally: if a_n is a sequence such that a_n|a_{n+1} then \sum_{n}b_{a_n}\leq 1. At the same time we know that for any k, we can find a bounded-discrepancy sequence that is 0 on multiples of k and takes the value 1 or -1 for all other terms. This implies that for any k, \sum_{n, k\nmid n}b_n< \infty. These two conditions, together with \sum_{n}b_n=\infty, are pretty strong, and it took me some time to find an example, but such sequences do exist. E.g.: b_{(p_n\#)^k}=\frac{1}{10(n+k)^2} for positive n,k, and b_n=0 for all other n. Here p_n\# is the product of the first n primes.

    • Moses Charikar Says:

      Here is another such sequence: Define a_1 = 2, a_{i+1} = 1+\prod_{j=1}^i a_j. Now let b_n = 1 if n = a_i for some i, else b_n = 0. This satisfies the condition, but I don’t think this is a good setting of b_n values however. Our choice of a_i ensures that at most one element of the sequence belongs to any HAP. Now consider a sequence with value 1 at positions a_i and 0 elsewhere. The discrepancy of such a sequence is 1. Hence \sum b_{a_i} \leq 1.

    • Sune Kristian Jakobsen Says:

      That sequence doesn’t satisfy \sum_{2\nmid n}b_n< \infty.

    • Moses Charikar Says:

      You’re right. I forgot about that condition.

    • Sune Kristian Jakobsen Says:

      If b_n is a non-negative sequence that satisfies these two conditions (\forall k: \sum_{k\nmid n}b_n<\infty, and for all increasing sequences a_n with a_n | a_{n+1}: \sum b_{a_n} is finite) then \lim_{n\to\infty} b_n=0. Assume for contradiction that this is not the case. Then there is a c >0 such that infinitely many terms of b are >c. Pick an integer a_1 such that b_{a_1}>c. If possible, pick a larger a_2 such that a_1|a_2 and b_{a_2}>c, and so on. If we can continue this way, we get a sequence a_n such that a_n | a_{n+1} and \sum b_{a_n}= \infty, contradicting the second condition. If we can’t continue, there is an a_i such that a_i\nmid n for all other n with b_n>c. Since there are infinitely many such n, we have \sum_{a_i\nmid n}b_n=\infty, contradicting the first condition.

      When we look at the experimental data, it doesn’t look like b tends to 0: b_1 is very small, and b_{720} is large. Again, I think this suggests that we shouldn’t try to make the sum of the b-values infinite, but only arbitrarily large. Another way of looking at the problem would be to work over the positive rationals: if we divide through by 720 in the data we have, b_{720} becomes b_1 and the old b_1 becomes b_{\frac{1}{720}}.

  15. Sune Kristian Jakobsen Says:

    Do we have a plot of the b-values for N=512, 1024 and 1500 so we can compare?
    Here is one experiment I think could be interesting: find the sequence of b-values for N=719 and N=720. I think that the largest factorial less than or equal to N takes a large value, so my guess is that there might be a somewhat large difference between the two b_{120} values.

    • Alec Edgington Says:

      Here is a crude attempt to compare the three b-vectors visually. The plot shows, superimposed, \log b^{(N_1)}_n against \log b^{(N_2)}_n for all 1 \leq n \leq N_1, for (N_1, N_2) = (512, 1024), (512, 1500), (1024, 1500) in red, blue and green respectively:

  16. Alec Edgington Says:

    Moses, what would happen if we looked at zero-based HAPs instead? In other words, at the quadratic form:

    \sum_{k,d \geq 1} c_{k,d} (x_0 + x_d + \ldots + x_{(k-1)d})^2 - \sum_{i \geq 0} b_i x_i^2

    Then there is only one ‘singleton’, (x_0). (I must admit I don’t fully understand why including the singleton HAPs is a problem — after all, HAPs of any odd length give a trivial lower bound of 1 on the discrepancy.)

    • Moses Charikar Says:

      For the values of n for which we have been able to solve the SDP, the optimum value does not exceed 1 if we exclude singletons. If we included singletons, the value would jump to 1, but in an uninteresting way, and it would not give us any useful structure that we could extrapolate to larger n.

      Odd length HAPs do give a lower bound for \pm 1 sequences, but not for vector sequences. Remember that the SDP produces a lower bound on vector discrepancy. e.g. it turns out that \min_{v_1,v_2,v_3} \max \{||(v_1+v_2||, ||v_1+v+2+v_3||\} is 1/2 when v_1,v_2,v_3 are unit vectors, but clearly the minimum is 1 when they are 1-dimensional unit vectors.

      So what will happen if we throw in x_0 as well ? I don’t know, but it should not change the asymptotic behavior of the SDP value with n.
      It is an interesting question.

    • Moses Charikar Says:

      That expression did not come out quite correct. It should read:
      \min_{v_1,v_2,v_3} \max \{||v_1+v_2||, ||v_1+v_2+v_3||\}

    • Alec Edgington Says:

      Ah, I see, thanks.

  17. Moses Charikar Says:

    I wanted to clarify that the matrix*.txt files contain the matrices corresponding to the quadratic form with the diagonal term \sum_i b_i x_i^2 subtracted out. So you should be careful in reading off diagonal entries from that matrix.

    I’ve put up a bunch of other files to make sifting through the data easier. They are available here:
    https://webspace.princeton.edu/users/moses/EDP/

    cd*.txt contains the values of \tau_{2,d} which are an approximation of C_d.

    qf*.txt: contains the entries of the matrix corresponding to the quadratic form without the diagonal term subtracted out. I only specify the upper triangular matrix, with one entry of the matrix per line.

    • Moses Charikar Says:

      As you will see from the data, the C_d values are far from uniform.
      If we normalize the C_d values (by dividing by C_1), they start looking a lot like d(i)/i (within a factor of 2 say).

    • Moses Charikar Says:

      One more thing I should have mentioned about the qf*.txt files corresponding to the quadratic form. There are three numbers in the lines corresponding to diagonal terms: For the i,i entry the first number is the coefficient of x_i^2 in the quadratic form, the second is the b_i term that is subtracted out from this entry, and the third is what remains after the subtraction.

    • Moses Charikar Says:

      Here is some data in support of the hypothesis that d(i)/i is a good predictor of C_i. Look at the files cd-pattern*.txt here:
      https://webspace.princeton.edu/users/moses/EDP/

      Each line has the following format:
      i……..C_i/C_1………..i*C_i/C_1……..d(i)

      Note that we use the tails \tau_{2,i} from the SDP solution as an approximation of C_i. Our working conjecture is that \tau_{2,i} = C_i e^{-2\alpha i}, so this approximation gets progressively worse with i.

      Also, the quantity \alpha in the exponent seems to be a function of n. In Huy’s data, the value of \alpha (from the curve fitting for \tau_{k,1}) is between 3/n and 4/n.

      This tells you how bad the approximation \tau_{2,i} = C_i e^{-2\alpha i} \approx C_i gets for large i.

    • gowers Says:

      I’m trying to get my head round the C_d data, and this too is quite hard. Initially, as you say, they look roughly proportional to d(i)/i, but this pattern doesn’t last once the values of i get large. For example, on the 512 list, if I multiply the value at 100 by 100 I get roughly 0.1. Since d(100)=9, one would expect that to be about 9 times as big as the value at 1, which is about 0.07. But 0.1 is well short of 0.63.

      Perhaps more informative is to compare the values for C_d that come from the 512 list with the values that come from the 1024 list. Their ratios are definitely not the same, but their general behaviour is pretty comparable, at least to start with. However, when you get to the mid-80s, their behaviour starts to diverge. Here is a chunk of values from the 512 list, with the 1024 values to the right:

      80: 0.002609483    0.005165121
      81: 0.004640467    0.005737507
      82: 0.000066366    0.000311586
      83: 0.000167644    0.000153519
      84: 0.006531641    0.006488271
      85: 0.000480438    0.002050214
      86: 0.000197278    0.000293866
      87: 0.000144329    0.000546010
      88: 0.000761433    0.001270844
      89: 0.000257089    0.000254195
      90: 0.002463915    0.006283851

      Note that not only are the ratios far from constant, but there are instances (86 and 87, for instance) where even the ordering on the 512 list differs from the ordering on the 1024 list.

      I don’t know how seriously to take this, however. I would naively expect that we should not be able to read too much out of the experimental data about the behaviour of C_p for large primes, say, because their impact on the problem should typically be fairly small, but sometimes, by a fluke so to speak, the obstacle to low discrepancy will just happen to involve a particular large prime. This is possibly borne out by the data: for example, on the 512 list we have values of 0.000000071 and 0.000000079 at 101 and 103, respectively, while on the 1024 list the values are 0.000089463 and 0.000044972, several orders of magnitude larger. However, this, I would contend, is because after a while one stops using large primes to prove large discrepancy, and that moment arrives earlier if you’re just looking at sequences of length 512 than it does if you’re looking at sequences of length 1024.

      This is, I would imagine, closely related to a phenomenon we are already very familiar with: when you look at finite examples you get a lot of structure on the smooth numbers and quite a lot of rather arbitrary behaviour on the non-smooth numbers. For instance, we saw it with low-discrepancy multiplicative functions, which start out character-like and end up non-character-like once the effects of the “cheating” corrections take over. It may well be that something like this also applies to the b_i.

      This suggests to me a possible way of cleaning up the data. Perhaps we could do an additional Cesaro-style averaging. That is, instead of looking at the 512 problem, one could average over all the problems up to 512, so that in a certain sense the weight associated with each integer decreases linearly to zero. Of course, I’m not suggesting doing 512 separate analyses and averaging the results. As a matter of fact, I’m not entirely clear what I am suggesting, since if all we do is attach weights to the HAPs (giving more weight to ones that end earlier, say), then the SDP will just cancel those weights out when it chooses its own weights c_{k,d}. And if we attach decreasing weights to the elements, then we run into the problem that our problem becomes too smooth and character-like functions can have bounded discrepancy. I’ll see if I can come up with a suitable cleaning-up process, but I don’t see it at the moment.

      For now, I suppose the main point I’m making is that in understanding the data we probably have to restrict attention to smooth numbers and extrapolate from there.

    • gowers Says:

      Annoyingly, even for very small or very smooth numbers the behaviour isn’t what one would ideally like. For instance, if C_i really is given by d(i)/i then both C_i and iC_i should be multiplicative (but not completely multiplicative) in i. There are some signs of this, but there are also several disappointments. For example, C_{12} is quite a lot smaller than C_3C_4. (Here I’m using the data from this page, the interpretation of which Moses explains in the comment before last.)

    • Anonymous Says:

      Tim, some of the discrepancy you point out for the value C_{100} in the 512 list can be explained by the fact that I used \tau_{2,100} as a proxy for C_{100}. But \tau_{2,100} \approx e^{-200\alpha}C_{100}, where \alpha is between 3/512 and 4/512. Say \alpha = 3/512. Then the approximation to C_{100} that we used is smaller than the real value by a factor of e^{-600/512}, which works out to about 0.3.

  18. gowers Says:

    Here’s a list of numbers n such that b_n is very small. Alec has already done something like this, but I want to set the threshold quite a bit higher than he did, partly because the resulting set seems to have more structure, and partly in the light of Moses’s remark that the small values should not be taken too seriously. (I presume this means that the actual values should not be taken seriously, but that their smallness is probably a genuine phenomenon.) I’m working from this wiki page. A couple of question marks means that the decision was a bit borderline.

    1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 17, 23, 37, 41, 46, 47, 49, 53, 55, 61, 71, 73??, 79, 82, 83, 98, 101, 103, 106, 107??, 111, 113, 122, 127, 134 (the first truly annoying value, since b_{67} is large, so it breaks the pattern that any number on this list should have all its factors in the list as well), 137, 139, 145 (a second counterexample, since b_{29} is large), 149, 151, 159, 166, 173, 179, 181, 188, 191, 193, 194 (b_{97} is pretty large), 197, 199, 203, 205, 223, 227, 229, 233, 247, 251, 254, 259, 263, 269, 271, 274, 277, 278, 281, 283, 289, 293, 307, …

    The fact that there seem to be several pairs of twin primes (such as 281 and 283) on the list suggests that there probably isn’t any congruence condition on the set of primes that shows up in this list (as does the fact that all the primes up to 11 are on it). Here, however, is a list of primes that do not appear on the above list of numbers: 13, 19, 29, 31, 43, 59, 67, 89, 97 (though twice 97 does appear), 109, 131, 157, 163, 167, 211, 239, 241, 257, …

    Plugging the beginnings of these lists into Sloane’s database doesn’t seem to throw up anything interesting, though I did discover the following utterly irrelevant fact: the numbers 1, 2, 3, 4, 5, 6, 7, 9, 10, 11 and 17 — that is, the first eleven numbers on the first list above — are also the first eleven numbers n such that the decimal representation of 5^n contains no zeros. The next such number is 18, so no conjecture there …

  19. gowers Says:

    I’d like to throw out an idea in order to see whether there is an obvious objection to it. The character-like functions have the property that their mean-square discrepancy is unbounded, in the following sense: the average of (x_1+x_2+\dots+x_m)^2 over all m\leq n is at least c\log n for some positive constant c. Here, we only have to consider the partial sums because the function is completely multiplicative. However, we could hypothesize that a more general statement holds, namely that for every \pm 1 sequence of length n there exists some d such that the mean-square sum along the multiples of d is at least c\log n (or at least is unbounded). If that is true, then we might be able to prove it using SDP. And I think that is equivalent to insisting that the coefficients c_{k,d} are independent of k. At any rate, if we can find non-negative coefficients c_d such that \sum_dc_d\lfloor n/d\rfloor=1 and non-negative coefficients b_i that have unbounded sum, such that the quadratic form

    \displaystyle \sum_{kd\leq n}c_d(x_d+x_{2d}+\dots+x_{kd})^2-\sum_{i=1}^nb_ix_i^2

    is positive semidefinite, then we are obviously done. The point of this observation is twofold. First, it should be possible to do the computations more quickly (since one has far fewer coefficients to work out). Secondly, I think this restriction imposes a kind of smoothing effect of the sort I was trying to guess in this comment, and therefore could potentially lead to coefficients with a more obvious structure.

    This comes into the category of “experiment that I would very much like to see done even though it has a positive probability of having a serious flaw and being completely uninformative”.

    • gowers Says:

      On second thoughts, I don’t see why this should be smoother than c_{k,d}=C_de^{-\alpha kd}. But that makes me wonder about something else.

      First, if we make the simplifying assumption that c_{k,d}=C_d e^{-\alpha kd} and then try to optimize the choice of C_d subject to that assumption (choosing \alpha to be 3/n, say), do we end up with a computational problem that is significantly smaller? Prima facie it looks as though we do, as now we are trying to work out linearly many coefficients instead of quadratically many. But my understanding of SDP is not good enough to be sure that this is correct. It would be nice if it did turn out to be smaller, since then we could look at much larger matrices and be reasonably confident that the results we got behaved similarly to the actual best possible matrices.

      Secondly, what if we don’t go for the optimal matrix at all and simply set every single c_{k,d} equal to 1/N, where N is the number of pairs (k,d) such that kd\leq n, and then try to optimize the b_i for the resulting quadratic form? How tractable is that? Note that the matrix of this quadratic form has A_{ij} proportional to \sum_{d|(i,j)}(1+\lfloor n/d\rfloor-\max\{i/d,j/d\}).
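
      Here is a small numpy check of that closed form for A_{ij} against the brute-force sum over HAPs (just a sketch for small n, to make sure the formula is the right one):

      import numpy as np
      from math import gcd

      n = 30
      pairs = [(k, d) for d in range(1, n + 1) for k in range(1, n // d + 1)]
      N = len(pairs)

      # Brute force: (1/N) * sum over HAPs {d, 2d, ..., kd} of the rank-one matrix p p^T.
      A_brute = np.zeros((n, n))
      for k, d in pairs:
          p = np.zeros(n)
          p[d - 1:k * d:d] = 1.0
          A_brute += np.outer(p, p) / N

      # Closed form: A_{ij} = (1/N) * sum over d | (i,j) of (1 + floor(n/d) - max(i,j)/d).
      A_formula = np.zeros((n, n))
      for i in range(1, n + 1):
          for j in range(1, n + 1):
              g = gcd(i, j)
              s = sum(1 + n // d - max(i, j) // d for d in range(1, g + 1) if g % d == 0)
              A_formula[i - 1, j - 1] = s / N

      print(np.allclose(A_brute, A_formula))   # True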

      Now that I ask the question, I remember Moses saying something about needing to go at least as far as 2^{70} in order to prove a discrepancy of 2, or something like that. So obviously there is no hope of seeing the sum of the b_i getting large, but we just might get the sequence itself having a nice structure.

    • Moses Charikar Says:

      Tim, I haven’t thought through what you are suggesting and I am about to sign out for the next several hours, but let me address your comment about whether the problem becomes significantly easier if one guesses values for c_{k,d}. I agree that it should become easier. We had n \log n constraints to start with (each has a corresponding c_{k,d}) and we have now reduced them to n constraints, each of which is of the form v_i\cdot v_i = 1. In fact the resulting SDP has a similar structure to the SDPs used for Max-Cut, a well-studied problem.

      Now whether or not off-the-shelf solvers will be able to take advantage of this simplicity is hard to say. To illustrate some of the subtleties here, I actually rewrote the SDP formulation a few times (all versions equivalent to each other) before I obtained a form that solvers seemed able to handle for large instances. It is possible that one could write custom code to solve this simpler SDP faster, but that would involve significant effort starting from scratch. More likely, one may be able to use SDP code for Max-Cut if such specialized code is available.

  20. Moses Charikar Says:

    Going back to something Tim discussed earlier in this comment, I want to mention something interesting about the matrix he was trying to compute. To summarize, we consider the matrix A_{mn} = d((m,n)) and consider the effect on the Cholesky decomposition when we subtract an infinitesimal \delta from A_{11}.

    Firstly, the Cholesky decomposition of A is V^T V, where V_{ij} = 1 if i|j and 0 otherwise. Let V+\delta W be the new decomposition, where we ignore quadratic and higher terms in \delta.
    I wrote some code to compute W and the result was a little surprising to me. The entries of this matrix for sizes 10, 100 and 1000 are given here:
    https://webspace.princeton.edu/users/moses/EDP/w10.txt
    https://webspace.princeton.edu/users/moses/EDP/w100.txt
    https://webspace.princeton.edu/users/moses/EDP/w1000.txt
    Each line is of the form:
    i \ldots j \ldots 2*W_{ij}
    (The values should be divided by 2 to obtain the entries of W that Tim was looking at). Here is what was surprising to me: the minimum value in the file for size 10 is -2, the minimum for size 100 is -3 and the minimum for size 1000 is -4. I had expected to see much lower values than that.
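
    For anyone who wants to play with this, the first-order change W can be approximated by a finite difference of the Cholesky factorization. The following Python/numpy sketch does that for size 10 (this is not the code used to produce the files above, and the finite-difference step introduces a small numerical error, but it is enough to eyeball the entries):

    import numpy as np
    from math import gcd

    def num_divisors(m):
        return sum(1 for k in range(1, m + 1) if m % k == 0)

    n = 10
    A = np.array([[float(num_divisors(gcd(i, j))) for j in range(1, n + 1)]
                  for i in range(1, n + 1)])

    V = np.linalg.cholesky(A).T               # upper triangular, A = V^T V, with V_{ij} = 1 iff i | j

    delta = 1e-6                              # small perturbation of the (1,1) entry
    E = np.zeros((n, n)); E[0, 0] = 1.0
    V_pert = np.linalg.cholesky(A - delta * E).T

    W = (V_pert - V) / delta                  # approximates the first-order change in the factor
    print(np.round(W, 3))
    print("minimum entry of W:", round(float(W.min()), 3))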

    • gowers Says:

      Let me mention that yesterday I did some more hand computations of this matrix, and observed that, for each k>1, once you get beyond the diagonal, the kth row of the matrix is a linear combination of the vectors 11111111111…, 0101010101…, 001001001… up to the kth. That is, it is a linear combination of the HAPs up to k, which are of course the rows of V. I’m sure there’s a simple proof of this, but I haven’t yet found it — it just seemed to work for the beginnings of the first few rows.

      I think it partially explains why the entries grow quite slowly: in order to get a big entry it has to be at a number with a lot of factors and in addition to that the coefficients have to line up.

      Going back to the form of U, this is saying that U is obtained by multiplying V by a lower triangular matrix and then getting rid of everything below the diagonal. I’d be interested to know what that lower triangular matrix was.

  21. Gil Says:

    Looking at the l2-average discrepancy is a very nice variation, and indeed it reduces the number of coefficients. Now, is it possible for the l2-norm over all partial sums and all d’s together to be small?

    Namely, maybe not only does there exist some d such that the mean-square sum along the multiples of d is at least c\log n (or at least unbounded), but the mean-square sum over all partial sums for all d together is unbounded.

    • gowers Says:

      Here is a small sanity check. One might think that it was perhaps too crude to give equal weight to all HAPs, since, for example, there are n HAPs of length 1. So let’s see whether the contribution from these HAPs to the average is likely to be small.

      We can approximate the number of HAPs of length m by n/m. If we do, then we see that the number of HAPs of length between 2^k and 2^{k+1} is roughly independent of k as k runs from 0 to \log_2n. Therefore, about half of all HAPs will have length between n^{1/4} and n^{3/4}, so if the mean-square discrepancy along these is logarithmic, then the entire mean-square discrepancy is logarithmic. And that seems like a fairly reasonable conjecture.

      However, by “fairly reasonable” I don’t mean that I’ve tried to test it in any way. Perhaps it would be worth thinking about whether we can construct a \pm 1 sequence of length n such that the mean-square discrepancy over HAPs with length between n^{1/4} and n^{3/4} (or any two powers of your choice) is bounded. An additional benefit of this problem is that it would force us to think about EDP in a more global, large-n, big-difference-HAP sort of way.

      Actually, I’ve just realized that if you have unbounded mean-square discrepancy for EDP, then you probably do for this problem as well. The only way it could fail would be if for every smallish common difference d the partial sums along multiples of d started out possibly being unbounded but began to control themselves after you reached n^{1/4}d. But we know that there would have to be little pockets of unbounded drift — consider, for example, what happens to HAPs of common difference at most m after the point x_{m!}. This isn’t a proof — just an argument that the mean-square discrepancy for HAPs of length between n^{1/4} and n^{3/4} is likely to be unbounded.

    • Moses Charikar Says:

      I tried solving the SDP having fixed c_{k,d} = 1 for all k,d. This means the objective function of the SDP is
      \displaystyle \min \sum_{kd\leq n} (v_d + v_{2d} + \ldots + v_{kd})^2
      subject to the constraints v_i^2 = 1 for all i.

      Actually I normalized so that \sum c_{k,d} = 1. This is a much easier problem to solve (this should be true for any fixing of the c_{k,d}). I was able to solve instances of size up to 4096 and we could probably go up to the next power of 2 if we wanted. Here are the optimal values I got:
      256 0.4548
      512 0.4686
      1024 0.4797
      2048 0.4897
      4096 0.4982

      The b_n values in the SDP dual solution are available here:
      http://webspace.princeton.edu/users/moses/EDP/try1/
      The files have the format:
      i … b_i … #divisors(i) … factorization of i

      Note that some b_i values are negative. This is because I used the constraint v_i^2 = 1 in the SDP. If I had used the constraint v_i^2 \geq 1 instead, all of them would have been non-negative.
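
      For reference, the fixed-coefficient problem is simple enough to set up with a general-purpose modelling tool. The following is a rough Python sketch using cvxpy (an assumption on my part: the runs above used DSDP and CSDP, not cvxpy, and the normalization and the sign convention for the dual variables here may differ). Here X plays the role of the Gram matrix of the vectors v_i, and an SDP-capable solver such as SCS is assumed to be installed.

      import numpy as np
      import cvxpy as cp

      n = 64                                   # a small instance; the runs above go up to 4096
      pairs = [(k, d) for d in range(1, n + 1) for k in range(1, n // d + 1)]

      # Matrix of the quadratic form sum_{kd <= n} c_{k,d} (x_d + ... + x_{kd})^2
      # with every c_{k,d} equal to 1/(number of HAPs).
      M = np.zeros((n, n))
      for k, d in pairs:
          p = np.zeros(n)
          p[d - 1:k * d:d] = 1.0
          M += np.outer(p, p) / len(pairs)

      X = cp.Variable((n, n), PSD=True)        # Gram matrix of v_1, ..., v_n
      diag_constraint = cp.diag(X) == 1        # the constraints v_i . v_i = 1
      prob = cp.Problem(cp.Minimize(cp.trace(M @ X)), [diag_constraint])
      prob.solve()

      print("optimal value:", prob.value)
      b = diag_constraint.dual_value           # plays the role of the b_i (sign convention may differ)
      print("sum of the dual variables:", float(np.sum(b)))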

  22. Kartik Sivaramakrishnan Says:

    Moses, what software are you using to solve the SDP? Are you using an interior-point code? I must mention that there are other fast approaches for solving an SDP approximately. For starters, you might want to look at SDPLR, which is available at
    http://dollar.biz.uiowa.edu/~sburer/www/doku.php?id=software#sdplr
    Of course, these techniques only produce approximate solutions, not the high accuracy that interior-point methods provide. However, since you are only using the SDP as a relaxation of your original problem, I assume this is not a problem.

    You mention that the SDP has a structure similar to the one Goemans-Williamson used in their max-cut algorithm. This is the SDP where the only constraints are that X_{ii} = 1 for all i and X is psd. Is this it, or are there other constraints in your SDP? I must mention that SDPLR is well equipped to solve a problem with this “max-cut”-like structure. If you provide some more feedback, then I can be of greater help.

    • Moses Charikar Says:

      Kartik, I used DSDP to solve the SDP. I’ve also tried CSDP, which works well too. What we are really looking for is the dual solution. Does SDPLR generate this?

      The min average square discrepancy problem is equivalent to max-cut in the following sense: Construct a weighted graph on n vertices, one for each element in the sequence. The edges of the graph are a superposition of cliques, one for each HAP (where the clique is on the elements contained in the HAP). Now the optimum value of the max-cut problem on this graph is linearly related to the optimum value of the min average square discrepancy question. The same connection holds for the SDP formulations of the two problems.
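
      The linear relation is easy to check numerically. In the sketch below (Python, a small check of my own rather than anything from the SDP runs), w_{ij} counts the HAPs containing both i and j, and for any \pm 1 sequence one has \sum_{HAPs}(\sum_{i\in HAP}x_i)^2 = \sum_{HAPs}|HAP| + 2W - 4\,\mathrm{cut}(x), where W is the total edge weight and \mathrm{cut}(x) is the weight of the cut defined by x.

      import numpy as np

      n = 20
      haps = [list(range(d, n + 1, d))[:k] for d in range(1, n + 1)
              for k in range(1, n // d + 1)]

      # Superposition of cliques: w[i][j] = number of HAPs containing both i and j.
      w = np.zeros((n + 1, n + 1))
      for hap in haps:
          for a in hap:
              for b in hap:
                  if a < b:
                      w[a, b] += 1

      total_weight = w.sum()
      const = sum(len(h) for h in haps)          # diagonal contribution: sum of the HAP lengths

      rng = np.random.default_rng(0)
      for _ in range(3):
          x = rng.choice([-1, 1], size=n + 1)    # x[1..n] is a random +-1 sequence
          form = sum(sum(x[i] for i in h) ** 2 for h in haps)
          cut = sum(w[a, b] for a in range(1, n + 1) for b in range(a + 1, n + 1)
                    if x[a] != x[b])
          print(form, const + 2 * total_weight - 4 * cut)   # the two numbers agree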

  23. gowers Says:

    Another idea I’d like to put forward, though it’s not obvious that it can be made to work, is to combine SDP with infinitary methods somehow. The motivation for doing this is that it might be easier to use soft methods and just prove unboundedness.

    I thought of this because of the aspect of Moses’s method that he mentioned above: that the decay rate \alpha depends on the size of the problem n: it seems that \alpha is around c/n, where c is between 3 and 4. That means, if EDP is true, that the best SDP proof is rather heavily dependent on n. So it’s not clear that we get anything if we let n tend to infinity: the normalizing factor also tends to infinity, so all the coefficients tend to 0.

    But this doesn’t completely rule out some renormalization that would give rise to an infinitary proof. Now one would hope for a decay rate of zero — that is, no decay, and consequently easier computations. But instead of taking a sum, we would have to take an average. And of course this wouldn’t be an average in the conventional sense, as there are infinitely many coefficients c_{k,d}, but rather some kind of clever average that we would have to devise. For instance, we could take averages over increasingly large sets of pairs (k,d) and take some limit. Or we could take the limit of C_d(x_d+x_{2d}+\dots+x_{kd}) along some carefully constructed ultrafilter on \mathbb{N}^2. Or we could do something else of this general kind.

    We would then hope to be able to find an infinite sequence (b_m) that sums to infinity such that subtracting the form \sum_mb_mx_m^2 still leaves one with a positive semidefinite quadratic form. (We would have to make sense of this statement.)

    It isn’t obvious what any of this means, but there seems to be at least a chance that the calculations might be easier, if one could find a meaning, than they are in the finitary case.

    • gowers Says:

      Let me say what I think the infinitary matrix should look like and why. First, if \tau_{k,d}=C_de^{-\alpha kd}, then c_{k,d} should be minus the “derivative” of \tau_{k,d} with respect to k, or \alpha dC_de^{-\alpha kd}. Letting \alpha tend to zero and “renormalizing” by dividing by \alpha we get c_{k,d}=dC_d. Now Moses suggests that C_r looks like d(r)/r, which gives us c_{k,r}=d(r).

      The corresponding quadratic form is therefore \sum_{k,r}d(r)(x_r+x_{2r}+\dots+x_{kr})^2, which has matrix (A_{mn}), where A_{mn}=\sum_{r|(m,n)}d(r)\sum_{kr\geq\max\{m,n\}}1. Of course, this is infinite, but the sum over k seems to be more or less the same infinity for every m,n,r, so renormalizing again, let’s go for A_{mn}=\sum_{r|(m,n)}d(r). This counts the number of pairs r,s such that r|s and s|(m,n). Equivalently, it counts the number of ordered factorizations of (m,n) as rst. The first few values of this function are 1,3,3,6,3,9,3,10, and it turns out to be a reasonably well-known arithmetical function, denoted by d_3(n). (This last piece of information I found out by plugging these sequence values into Sloane’s database.)

      If V is once again the matrix with ijth entry V_{ij}=\mathbf{1}_{i|j}, and if D is a diagonal matrix with D_{nn}=d(n), then this matrix A factorizes as V^TDV. I’m not sure whether that is of any use to us, but at least it is reasonably clean.

      So a question I’d like to ask now is whether, in some purely formal sense, we can write V^TDV as U^TU+E where E is a nice big positive diagonal matrix. That wouldn’t be an immediate solution to the problem, but it might suggest a way forward.
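
      Here is a quick numerical confirmation of the factorization (a Python sketch for small n): it checks that the matrix with entries d_3((m,n)) is exactly V^TDV with D_{nn}=d(n), and that d_3 takes the values 1,3,3,6,3,9,3,10 listed above.

      import numpy as np
      from math import gcd

      def num_divisors(m):
          return sum(1 for k in range(1, m + 1) if m % k == 0)

      def d3(m):
          # number of ordered factorizations m = rst, i.e. the sum of d(r) over r | m
          return sum(num_divisors(r) for r in range(1, m + 1) if m % r == 0)

      n = 12
      A = np.array([[float(d3(gcd(i, j))) for j in range(1, n + 1)] for i in range(1, n + 1)])
      V = np.array([[1.0 if j % i == 0 else 0.0 for j in range(1, n + 1)] for i in range(1, n + 1)])
      D = np.diag([float(num_divisors(i)) for i in range(1, n + 1)])

      print(np.allclose(A, V.T @ D @ V))       # True
      print([d3(m) for m in range(1, 9)])      # [1, 3, 3, 6, 3, 9, 3, 10]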

    • gowers Says:

      Actually, I’m not so sure about the second of those “renormalizations” so let me try again. We know always that, whatever the choice of c_{k,r}, the matrix entry A_{mn} equals \sum_{r|(m,n)}\sum_{kr\geq\max\{m,n\}}c_{k,r}, which equals \sum_{r|(m,n)}\tau_{\max\{m,n\}/r,r}. If \tau_{k,r}=C_re^{-\alpha kr} then this gives us \sum_{r|(m,n)}C_re^{-\alpha\max\{m,n\}}. If in addition C_r=d(r)/r, then on letting \alpha go to zero we get \sum_{r|(m,n)}d(r)/r. I’m in a slight hurry now, so I have no time to think about why this extra factor of 1/r has come in, but it feels more correct to me.

      The effect on the above discussion if we go for this matrix is that the diagonal matrix D should have entry D_{nn}=d(n)/n. For what it’s worth, \sum_{r|n}d(r)/r is the Dirichlet convolution of the divisor function with the identity function, all divided by n.

    • gowers Says:

      Now I’m a bit confused. If our quadratic form is V^TDV for some positive diagonal matrix D, then it is also W^TW, where W=D^{1/2}V. But the nth row of W is supported on the multiples of n, so for this to work it would seem to require us to be able to find for each d some linear combination of x_d,x_{2d},\dots in such a way that for every \pm 1 sequence at least one of those linear combinations was large. (I’m leaving unanswered the question of how one would make sense of this in the infinitary world.) But that doesn’t seem very plausible.

      In fact, we can say explicitly what W is. The ijth entry is d(i)/i if i|j and 0 otherwise. This means that we would be attempting to show that there exists r such that \sum_m d(mr)x_{mr}/mr was large. Or at least I think that’s what we would be attempting to show — I haven’t carefully checked.

      The reason this doesn’t feel plausible is that in the finite case it definitely fails: if you’ve got n rows supported on the first n HAPs, then you can find a sequence that is killed by the first n-1 rows, and the effect of the last row will not be a drastic one.

      I need to think through the ramifications of this, but I may not get a chance to do so for some time.

    • gowers Says:

      Ah, I’ve seen it now. A model in which the tails \tau_{k,r} don’t decay is not a good model because it tells you that only the last c_{k,r} is non-zero. So it’s not all that surprising that it was defective in the way I said.

  24. gowers Says:

    Again, no time to think through properly what I am saying, but here’s a remark about the linear-algebraic approach that Gil and I were discussing at one point.

    Let V be, once again, the matrix where V_{ij}=1 if i|j and 0 otherwise. This is an upper triangular matrix, so we know that it is invertible. In fact, it is quite easy to invert it explicitly: the inverse is the matrix U where U_{ij}=\mu(j/i) if i|j and 0 otherwise. To see this, note that \sum_kU_{ik}V_{kj}=\sum_{k|j}\mu(k/i) (where we’ll interpret \mu(k/i) to be zero if i does not divide k). This equals \sum_{r|j/i}\mu(r), which is 1 if j/i=1 and 0 otherwise.
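
    (For anyone who wants to see this concretely, here is a small Python sketch that builds V and U for n=20 and checks that UV is the identity; the Möbius function is computed naively.)

    import numpy as np

    def mobius(m):
        # naive Mobius function: 0 if m has a squared prime factor,
        # otherwise (-1)^(number of prime factors)
        result, k = 1, 2
        while k * k <= m:
            if m % k == 0:
                m //= k
                if m % k == 0:
                    return 0
                result = -result
            k += 1
        if m > 1:
            result = -result
        return result

    n = 20
    V = np.array([[1 if j % i == 0 else 0 for j in range(1, n + 1)] for i in range(1, n + 1)])
    U = np.array([[mobius(j // i) if j % i == 0 else 0 for j in range(1, n + 1)]
                  for i in range(1, n + 1)])

    print(np.array_equal(U @ V, np.eye(n, dtype=int)))   # True: U is the inverse of V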

    Once we know how to invert V, we know how to build sequences whose partial sums along all full HAPs (that is, ones of the form \{d,2d,\dots,\lfloor n/d\rfloor d\}) are small. We also know other things: for example, that for smaller HAPs we would ideally like to use just the first few columns of U to build low-discrepancy sequences, but these will give us sequences that are zero from some point on.

    This makes me wonder whether the columns of U could be a useful basis for looking at the problem. The idea, very roughly indeed, would be that if you want to create a \pm 1 sequence, then you are forced to use the columns of U in a certain way (in particular, you are forced to use plenty of late columns), and that should mess up some of the shorter HAPs. A disadvantage of this approach is that it might involve an understanding of \mu that is better than one can realistically hope for.

    • gowers Says:

      Let me be slightly more specific. The jth column of U is the vector u_j=\sum_{i|j}\mu(i)e_i, where (e_i) is the standard basis of \mathbb{R}^n. If we take a linear combination \sum_ja_ju_j, then its sum along the full HAP of common difference d is a_d. Therefore, the sequences with bounded partial sums along all full HAPs are precisely the linear combinations \sum_ja_ju_j for which the a_j are bounded.

      For a sequence to have bounded discrepancy, however, we need more. We need not only for the sequence to be a bounded linear combination of the u_j, but also for the same to be true of all the projections of that sequence to the first m coordinates.

      Going back to the disadvantages, another one is that with this approach one is back with the 1, -1, 0, 1, -1, 0, … problem (unless one uses this new basis to build a better SDP or something like that).

      For what it’s worth, if you want \sum a_ju_j to equal this annoying sequence (turned into a column vector) then you have to take a_j to be x_j if \lfloor n/j\rfloor is congruent to 1 mod 3 and 0 otherwise. That one works out by using the fact that U is the inverse of V. More generally, by the way U is constructed, a_j is (as has already been mentioned) the sum of your given sequence over every jth term.
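
      A quick way to check this is to compute the coefficients directly (the a_j are just the sums of the sequence over the multiples of j); the following small Python sketch does this for n=24:

      import numpy as np

      n = 24
      # The troublesome sequence 1, -1, 0, 1, -1, 0, ...
      x = np.array([1 if i % 3 == 1 else (-1 if i % 3 == 2 else 0) for i in range(1, n + 1)])

      # a_j = sum of the sequence over the multiples of j, i.e. the coefficient of u_j.
      a = np.array([x[j - 1::j].sum() for j in range(1, n + 1)])

      # Claimed form of the coefficients: a_j = x_j when floor(n/j) is 1 mod 3, and 0 otherwise.
      a_claimed = np.array([x[j - 1] if (n // j) % 3 == 1 else 0 for j in range(1, n + 1)])
      print(np.array_equal(a, a_claimed))   # True; try other values of n as well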

  25. gowers Says:

    One last thought. It has now been observed a few times that EDP becomes false if one restricts attention to HAPs of the form \{d,2d,\dots,\lfloor n/d\rfloor d\}. But the obvious example to show this is the sequence that is 1 up to n/2 and -1 thereafter.

    Now suppose we gradually make the task harder by allowing other lengths. (For this discussion I’ll define the length of a HAP to be the distance between its end points rather than its cardinality.) If, for example, we also allowed HAPs of length n/2, then the obvious example is 1 up to n/4, then -1 up to n/2, then 1 up to 3n/4, then -1 up to n. It’s not perfect, but it’s pretty good.

    More generally, if we want to deal with all HAPs of length m, then it seems to be a good idea to have a sequence that has long strings of 1s and -1s and whose average over any interval of length m is zero (which forces it to be periodic with period m). The first property makes whatever you do robust under small changes to the common difference, and the second means that as you shift the HAP you don’t change its drift. But note that the periodicity is not such a good idea when you come to consider HAPs with common difference m.

    Here I’m just floating the idea that perhaps one could prove EDP by picking a few different lengths and showing that they can’t all be dealt with simultaneously.

    The fact that I’m thinking about new ideas doesn’t mean I’ve given up on Moses’s approach — far from it. I’m just taking a little time out from it.

    • Alec Edgington Says:

      This approach feels well suited to the EDP over the rationals. Let’s consider \mathbb{T}-valued functions, because it makes the equations easier to write. If we want a function f : \mathbb{Q}^+ \rightarrow \mathbb{T} such that for all m, n \geq 1, \lvert \sum_{r=1}^n f(r \frac{m}{n}) \rvert \leq C, we can take f(x) = e^{2i\pi x}. That even gives us C=0; but then f(1) + f(2) + \ldots is unbounded.

      A naive attempt to fix this by replacing f(x) by f(x) g(\lfloor x \rfloor) takes us back to the original EDP over the integers, so we need to be more subtle. C = 0 feels too strong, so we have some rope to play with.

    • Alec Edgington Says:

      Correction: it only gives us C=0 if we restrict to the cases where n \nmid m.

  26. gowers Says:

    Going back to the SDP approach (I quite like the idea of using SDP to solve EDP — just spotted that), here’s an idea for cleaning it up.

    Up to now, the computer experiments have looked at the problem of finding the best possible dual function in the case of sequences of length n. But that seems to lead to problems. I don’t mean mathematical problems with the approach, but just oddnesses in the data.

    To get rid of these oddnesses, it would be good to get rid of the sharp cutoff at n. I tried one idea for doing that recently, but ended up smoothing the whole problem so much that the result became false (because character-like functions had bounded “smoothed discrepancy”). But there I was smoothing the HAPs — I now realize that that was a bad thing to do and that the sharp cutoffs in the HAPs are somehow essential to the truth of EDP (if it is true). However, that doesn’t stop us smoothing off the entire interval in which we are working.

    So what I propose is to choose a decay rate \eta, which will give the interval a “half-life” proportional to 1/\eta, which we think of as the essential size of the problem. We then try to show that there must exist k,d such that |\sum_{j=1}^k e^{-\eta jd}x_{jd}|>C. To do that, we do what we’ve been doing with the sharp cutoff. That is, we try to find non-negative coefficients c_{k,d} and b_i such that the c_{k,d} sum to 1, the b_i sum to more than C^2, and

    \sum c_{k,d}(\sum_{j=1}^k e^{-\eta jd}x_{jd})^2\geq\sum_ib_ix_i^2.

    From a computational point of view, I would imagine choosing a half-life of something like 200 and truncating the decaying HAPs at 1500, at which point there is little chance of further terms contributing to the discrepancy. (It may be possible to choose a bigger half-life than this, come to think of it.)

    One reason to hope that this is a more natural set-up is precisely the observed decay of the \tau_{k,d}. Perhaps the computer is trying to tell us that it would have preferred an exponentially decaying ground set to one that suddenly stops at n. And perhaps the decay rate for the smoothed problem would actually equal the decay we put in in the first place: that is, perhaps \alpha would turn out to equal \eta. At any rate, it seems at least possible that this will be a more stable problem, in the sense that the values of C_d and b_i might be expected to depend on \eta in a nicer way than they do on n. If so, then we might be able to read more out of the experimental data.

    • Moses Charikar Says:

      I’m a little confused. This looks a lot like the smooth EDP you were trying earlier. Why doesn’t the problem with smooth EDP that you pointed out here apply to the new proposal?

    • gowers Says:

      The main difference here is that the coefficient c_{k,d} depends on k (and therefore involves HAPs that are suddenly cut off at k) whereas before I had coefficients c_{\beta,d} that depended on a parameter \beta that always gave rise to something smooth.

      To put the point another way, if \eta is small and kd is a lot smaller than \eta^{-1}, then the HAP of length k and difference d is basically counted in full here, whereas in the previous set-up I would have had a smoothly decaying HAP with half-life \beta^{-1} or so.

      Yet another way of putting the point is that if you work out the coefficient of A_{mn} here then it will involve \max\{m,n\}. I haven’t got time to work out exactly what it is right now, but will do so soon.

      And a final way of putting it is that here I am smoothing n but I am not smoothing k.

      Also when I get a spare moment, I’ll work out what happens for character-like functions and post the results. That should make everything completely clear.

    • gowers Says:

      Here are the two computations promised earlier. The first is of the matrix entry A_{mn}. Since all we are doing is replacing the variable x_n by e^{-\eta n}x_n, and since the previous entry was \sum_{d|(m,n)}C_d\tau_{\max\{m,n\},d}, it is now \sum_{d|(m,n)}C_d\tau_{\max\{m,n\},d}e^{-\eta(m+n)}.

      It’s a bit complicated to work out the discrepancy of a character-like function (though if I felt less lazy I think I could approximate it reasonably well), but here is an informal argument that it is unbounded.

      I’ll begin by discussing roughly how big the sum \sum_{j\leq k}e^{-\gamma j}\mu_3(j) ought to be. Recall that we work this out by splitting the integers up to k according to the highest power of 3 that divides them. Then the sum over non-multiples of 3 is something very small (at most e^{-\gamma}-e^{-2\gamma} I think), plus an extra contribution of e^{-\gamma k} if k is congruent to 1 mod 3. Provided k is not much bigger than \gamma^{-1}, this extra contribution will be substantial. We now apply the same to the multiples of 3 that are not multiples of 9, and we get a pretty similar calculation, but this time the extra contribution is negative.

      What this demonstrates is that at least up to around \gamma^{-1} the calculation is more or less the same as it is when there is no decay. Since the contribution up to \gamma^{-1} is a constant proportion of the whole lot, this means that unbounded discrepancy in the usual case more or less implies unbounded discrepancy in this case too.

      Where I hope that this new expression scores over the old one is that the influence of non-smooth numbers should fade out gradually rather than suddenly. For example, if we are looking at a prime p, it won’t make a big difference whether 2p>n (because there is no sharp cutoff at n, just a half-life of around n).
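
      One can also just compute the smoothed discrepancy of a character-like function numerically. The following Python sketch takes the completely multiplicative function with value -1 at 3 and the character mod 3 elsewhere (one common normalization of \mu_3) and reports \max_{k,d}|\sum_{j\leq k}e^{-\eta jd}x_{jd}| for a few values of \eta, truncating once e^{-\eta m} is negligible:

      import math

      def mu3(m):
          # character-like function: completely multiplicative, equal to the
          # character mod 3 away from 3, with mu3(3) = -1
          s = 1
          while m % 3 == 0:
              m //= 3
              s = -s
          return s if m % 3 == 1 else -s

      def smoothed_discrepancy(eta, cutoff):
          best = 0.0
          for d in range(1, cutoff + 1):
              partial = 0.0
              for j in range(1, cutoff // d + 1):
                  partial += math.exp(-eta * j * d) * mu3(j * d)
                  best = max(best, abs(partial))
          return best

      for eta in [1 / 50, 1 / 200, 1 / 800, 1 / 3200]:
          cutoff = int(8 / eta)              # beyond this point the weights e^{-eta m} are tiny
          print(eta, round(smoothed_discrepancy(eta, cutoff), 3))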

    • Moses Charikar Says:

      Here is another way to think about Tim’s proposed smoothing. To repeat what he said, we want to find non-negative coefficients c_{k,d} and b_i such that the c_{k,d} sum to 1, the b_i sum to more than C^2, and
      \displaystyle \sum c_{k,d}(\sum_{j=1}^k e^{-\eta jd}x_{jd})^2\geq\sum_ib_ix_i^2.

      Now substitute y_i = e^{-\eta i} x_i and b'_i = e^{2\eta i} b_i. Then the problem is equivalent to proving that
      \displaystyle \sum c_{k,d}(\sum_{j=1}^k y_{jd})^2\geq\sum_i b'_i y_i^2,
      where the c_{k,d} and b'_i are such that the c_{k,d} sum to 1 and \sum_i e^{-2\eta i}b'_i \geq C^2. In other words, this is almost the same as the original problem except that the weighting in the objective function is changed to give exponentially decreasing weights to the b'_i’s.

    • gowers Says:

      That’s a nice way of looking at it. Just to be absolutely clear, I would also add that a second very important difference, in a sense the main point of the modification, is that the problem is now infinite, though in practice, when programming, one would truncate at an n for which e^{-\eta n} is small — so it is “morally finite”.

  27. Kartik Sivaramakrishnan Says:

    Moses, DSDP is an interior-point solver. Yes, I believe SDPLR also generates the dual solution. Although DSDP exploits the low rank of the data matrices in the SDP, SDPLR should be a lot faster on SDPs that have a max-cut-like structure.

    Hans Mittelmann has done some benchmarking of SDP solvers. See
    http://plato.asu.edu/ftp/sdplib.html
    You can see the results on maxG11, maxG32, and maxG51, which are SDP relaxations arising from the max-cut problem. SDPLR is the fastest on these problems, and they are of small size; for large matrix sizes, SDPLR should be faster still. However, as I mentioned before, the primal/dual solutions will be less accurate.

    I was reading some of your earlier comments and you mention that the SDP for N = 1500 took more than a day to solve. Is N here the size of the solution matrix in the SDP? The simplest SDP relaxation for the max-cut problem has only N equality constraints (X_{ii} = 1) and the requirement that X be psd. Is this the SDP that you are trying to solve? If so, I am surprised that it is taking this long.

    Can you post the coefficient matrix for the SDP (the Laplacian matrix or the adjacency matrix for the max-cut relaxation) somewhere? I am assuming that this is the only input that is needed. I can try to run the SDP relaxation through SDPLR for you. Please let me know if there are any additional features in the SDP as well.

    • Moses Charikar Says:

      Kartik, the SDPs we were trying to solve earlier do not have a Max-Cut-like structure. The newer ones, where we guess values for all the c_{k,d}, do. The earlier ones that took a day to solve have n \log n constraints, one corresponding to each HAP, in addition to the n constraints X_{ii} = 1. As expected, the new ones are much easier to solve (see here). At this point, we haven’t settled on what good guesses for these c_{k,d} values ought to be, so for some time we might still need to solve SDPs that are more complex than Max-Cut-like formulations.

  28. Alec Edgington Says:

    I see that there is an answer on Math Overflow to the question Gil asked about the number of \pm 1 sequences of length n with all partial sums bounded by C. If I understand it correctly the answer is approximately (2 \cos \frac{\pi}{2C+2})^n.

    This means that if we make the crude assumption that being completely multiplicative is an ‘independent’ event from having partial sums bounded by C, then the probability of both occurring is about 2^{\pi(n) - n} (\cos \frac{\pi}{2C+2})^n, so that the expected number of such sequences in this model is about 2^{\pi(n)} (\cos \frac{\pi}{2C+2})^n, which decays roughly exponentially with n. I think this constitutes a kind of heuristic argument for EDP, and so it may be worth exploring whether we can prove anything in the direction of ‘independence’ of the two conditions here.
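
    To get a feel for the numbers, here is a tiny Python computation of the base-2 logarithm of the heuristic count 2^{\pi(n)}(\cos\frac{\pi}{2C+2})^n, for the illustrative choice C=2 (just to make the heuristic concrete, of course, not evidence of anything):

    import math

    def prime_pi(n):
        # number of primes up to n, by a simple sieve
        sieve = [True] * (n + 1)
        sieve[0:2] = [False, False]
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p::p] = [False] * len(sieve[p * p::p])
        return sum(sieve)

    C = 2
    for n in [100, 1000, 10000, 100000]:
        log2_expected = prime_pi(n) + n * math.log2(math.cos(math.pi / (2 * C + 2)))
        print(n, round(log2_expected, 1))    # positive: expect some; very negative: expect none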

    • Gil Says:

      Dear Alec, indeed, Douglas Zare gave a very nice answer. I do plan to say a little more about the probabilistic heuristics, how we should treat their “predictions”, and how they might be used to create examples with low discrepancy. (I think the heuristic for multiplicative sequences indeed gives similar predictions to the one for general sequences.)

    • Alec Edgington Says:

      Reading Douglas Zare’s answer in a little more detail, I see that he also gives the stable distribution of the endpoints of the paths that stay within [-C, C]. The form is very nice: essentially the probability of being at m (assuming m has the right parity) is proportional to \cos \frac{\pi m}{2C+2}. Furthermore, the distribution remains stable if C is allowed to increase slowly (for example, as n^{\frac{1}{3}}).

      I don’t know if this is what you had in mind, Gil, but this may indeed be helpful in searching for sequences with logarithmic discrepancy C(n), say: given choices for x_1, x_2, \ldots, x_{n-1}, we can choose x_n to maximize the likelihood
      \sum_{d \mid n} \log \cos \frac{\pi (x_d + x_{2d} + \ldots + x_n)}{2C(n/d)+2}
      or perhaps a weighted likelihood:
      \sum_{d \mid n} \frac{1}{d} \log \cos \frac{\pi (x_d + x_{2d} + \ldots + x_n)}{2C(n/d)+2}
      (Here when I write \cos I really mean a function that is zero outside the interval [-\frac{\pi}{2}, \frac{\pi}{2}] and equal to the cosine inside it.)
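
      Here is a literal-minded greedy implementation of that suggestion in Python (the target function C(m) = 2 + \log(m+1) below is an arbitrary illustrative choice, and of course a greedy choice can paint itself into a corner, so this is only a sketch):

      import math

      def C(m):
          return 2 + math.log(m + 1)        # illustrative logarithmic target discrepancy

      def choose_next(x, n):
          """Pick x_n in {+1, -1} maximizing the weighted likelihood
          sum over d | n of (1/d) * log cos(pi * (x_d + x_{2d} + ... + x_n) / (2 C(n/d) + 2)),
          with cos treated as zero outside [-pi/2, pi/2]."""
          best_val, best_sign = -math.inf, 1
          for sign in (1, -1):
              total = 0.0
              for d in range(1, n + 1):
                  if n % d == 0:
                      s = sum(x[m] for m in range(d, n, d)) + sign   # partial sum along d, 2d, ..., n
                      arg = math.pi * s / (2 * C(n // d) + 2)
                      if abs(arg) >= math.pi / 2:
                          total = -math.inf                          # outside the allowed band
                          break
                      total += math.log(math.cos(arg)) / d
              if total > best_val:
                  best_val, best_sign = total, sign
          return best_sign

      N = 200
      x = {}
      for n in range(1, N + 1):
          x[n] = choose_next(x, n)

      worst = 0
      for d in range(1, N + 1):                 # the HAP discrepancy actually achieved
          partial = 0
          for m in range(d, N + 1, d):
              partial += x[m]
              worst = max(worst, abs(partial))
      print("maximum HAP partial sum up to", N, "is", worst)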

  29. Gil Kalai Says:

    Let me say a little more about a polynomial approach to EDP. (I also think it is good, especially in the polymath mode, to consider in parallel various things.)

    Here is a conjecture:

    Conjecture (*). Let F be the field with p elements, where p is fixed. Let a_1,a_2,\dots,a_n be nonzero elements of F, and suppose that n is large enough.

    Then for every element t in F there is a HAP whose sum is t.

    Remarks

    1) If all the a_i’s are plus or minus one then this conjecture is equivalent to a positive solution of EDP.

    2) When we try to apply the polynomial method to EDP this more general conjecture comes up naturally and I will elaborate on this a little.

    3) The conjecture is interesting already for t=0. (I don’t know if we can reduce it to this case.)

    4) Of course, maybe it is obviously false, but I don’t see it off-hand. I will be happy to see a counterexample.

    5) The probabilistic heuristics that we discussed do not conflict with this conjecture. (Even when p is roughly \sqrt{\log n}.)

    Let x_1,x_2,\dots,x_n and z be variables.

    Let Q=Q(x_1,x_2,\dots,x_n,z)=\prod_{r=1}^{n}\prod_{m=1}^{\lfloor n/r\rfloor}(\sum_{i=1}^m x_{ir}-z).

    Let P=P(x_1,x_2,\dots,x_n,z)=Q\prod_{i=1}^nx_i.

    The direct approach to EDP via the polynomial method says that if we start with Q and then reduce it to a sum of square-free monomials in the x_i’s by using x_i^2=1, then when n is large we get identically zero. One critique of such an approach is that the multilinear reduction (i.e. forcing the identities x_i^2=1 for every i) is where the combinatorics is hidden, and the algebraic formulation is artificial. However, if Conjecture (*) is true then the task is much simpler and algebraically more natural. Conjecture (*) is equivalent to

    Conjecture (**): Over the field with p elements, if n is large enough then P vanishes identically as a function on F^{n+1}.

    When we describe the polynomial P explicitly it is complicated, but not terribly so. For example, the free term (which corresponds to the t=0 case of Conjecture (*)) is the product of the partial sums of the variables along all HAPs. For the other powers of z we need to take certain elementary symmetric functions of such partial sums.

    • Sune Kristian Jakobsen Says:

      I think this conjecture is very interesting. We could generalize it to APs: then the conjecture would be that for any t there exist n, d, and 0< k \leq d such that a_{k}+a_{d+k}+\dots + a_{(n-1)d+k}=t. Here I think that the requirement 0< k\leq d is unnatural, and if we remove it we get a weaker conjecture. That led me to the following conjecture, which is weaker than yours but still stronger than EDP:

      Let F be a field with p elements and let a_1,a_2,\dots be an infinite sequence of non-zero elements of F. Now for every t in F there are d and n_1 < n_2 such that a_{dn_1}+a_{d(n_1+1)}+\dots + a_{dn_2}=t. (That is, every element of F is a HAP-drift of the sequence.)

    • Gil Says:

      Dear Sune, I like the drift version of the conjecture. Of course you can write it algebraically by adding more terms to the product in the definitions of P and Q. In my Conjecture (*) the case t=0 is already interesting, and perhaps there it is not needed to require that the terms are nonzero. (In other words, perhaps the free term of Q is identically zero for n large enough.) For the drift case, of course, the case t=0 is very easy.

    • obryant Says:

      Let G(p,t) be the length of the longest sequence of nonzero elements of GF(p) whose HAPs never sum to p, and let G(p) be the max of G(p,t) over all t.

      G(2) = 1, trivially.

      G(3) = 5, as after the first term the rest are forced: 11212 is the unique best.

      G(5,0) \geq 155, and a witness to this bound is:
      1344 4244 3444 3342 4132 1322 2334 4231 1441 1314 4241 1414 4234 2422 3342 3332 3243 3233 1334 3233 2231 2221 3234 2134 3311 4144 2442 4441 1134 3311 2123 1324 2342 1414 4141 1323 4213 2444 242

      No pattern, just found by (an ongoing) depth first search.

      For G(5,t), there’s some hope of understanding it by hand. First, since we can multiply the entire sequence by st^{-1}, we get G(5,1)=G(5,2)=G(5,3)=G(5,4). So t=3=-2 is typical. A sequence avoiding -2 contains only the terms -1, 1 and 2, and by the Mathias result it must contain infinitely many 2’s. This seems like enough to get started…
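
      For anyone who wants to reproduce this kind of search, here is a minimal depth-first search in Python (a sketch only: it recomputes HAP sums naively and has no pruning, so it is fine for p=3 but will take a very long time for p=5):

      def longest_avoiding(p, t, max_len=40):
          """Longest sequence of nonzero residues mod p none of whose HAP sums
          a_d + a_{2d} + ... + a_{kd} is congruent to t."""
          best = {"len": 0, "seq": []}

          def extend(seq):
              if len(seq) > best["len"]:
                  best["len"], best["seq"] = len(seq), list(seq)
              if len(seq) == max_len:
                  return
              m = len(seq) + 1
              for a in range(1, p):                # try each nonzero value for the next term
                  seq.append(a)
                  # the only new HAP sums are those ending at position m, one for each d | m
                  ok = all(sum(seq[j - 1] for j in range(d, m + 1, d)) % p != t % p
                           for d in range(1, m + 1) if m % d == 0)
                  if ok:
                      extend(seq)
                  seq.pop()

          extend([])
          return best["len"], best["seq"]

      print(longest_avoiding(3, 0))    # (5, [1, 1, 2, 1, 2]), matching G(3) = 5 above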

    • obryant Says:

      “sum to p” should have been “sum to t”, of course.

    • Gil Says:

      (the argument why G(5,i) does not depend on a nonzero i is not clear.)

    • obryant Says:

      If we have a sequence a_1, a_2,\dots,a_k with no HAP summing to t, with t\not=0, then s t^{-1}a_1, s t^{-1} a_2, \dots s t^{-1} a_k has no HAP summing to s t^{-1} t = s. Thus, G(p,s)\geq G(p,t) for nonzero s,t, and so by symmetry G(p,1)=G(p,2)=…=G(p,p-1).

  30. Kartik Sivaramakrishnan Says:

    Klas, regarding your inquiry about parallel SDP solvers

    CSDP has a parallel OpenMP version. See
    https://projects.coin-or.org/Csdp/
    I’ve used it in the past and it is easy to install.

  31. EDP15 — finding a diagonal matrix « Gowers's Weblog Says:

    […] relation to this question, it is worth looking at this comment of Sune’s, and the surrounding […]
