EDP11 — the search continues

I do not have too much to say about the new programme to use SDP to solve EDP, other than that it is still being actively pursued. One remark I would make is that I have rather forgotten recently about keeping the wiki updated, and SDP is a major omission from it. It would be good to have a theoretical account of the method (which I suppose I could paste quite easily from one of my existing posts), and an experimental page devoted particularly to data connected with this approach, with links to all the matrices, sequences and plots that have been produced.

So far, the data has had some expected features (such as the sequence values $b_m$ tending to be larger when $m$ is smooth) and some puzzling ones (such as the fact that $b_{13}$ is much much larger than $b_{11}$). An initial hope for the method was that the experimental data would give rise to a very precise conjecture, but so far that has not happened. There are, however, various avenues that have not been fully explored, and I still have some hope that we will suddenly stumble on some data that we can fully understand.

One way of doing this is to introduce a smoothing. One idea I had turned out to go too far: in the search for a pretty formula I ended up with a version of EDP that was false. For the record, here is that version. One can think of the discrepancy of a sequence as its largest inner product with the characteristic function of any HAP. Those characteristic functions have sharp cutoffs (in that they go up to $md$ for some $m$ and then suddenly stop), so one might hope to make the problem nicer by smoothing the cutoffs. One natural way of doing this is to look at functions such as $f_{d,\alpha}$, which take the value $e^{-\alpha k}$ at $kd$ and 0 at non-multiples of $d$. For this to be a sensible idea, we need a “smoothed EDP” to be true. That is, we need it to be the case that for every $\pm 1$ sequence $(x_n)$ there exist $d$ and $\alpha>0$ such that $\sum_k e^{-\alpha k}x_{kd}$ has absolute value at least $C$. However, this is false for the character-like function $\mu_3$. The multiplicativity of $\mu_3$ implies that it will be enough to show this when $d=1$, so let us fix some $\alpha$ and calculate the sum $\sum_ne^{-\alpha n}\mu_3(n)$.

We do this in the usual way. First, let us look at non-multiples of 3. That gives us the sum

$\displaystyle e^{-\alpha}-e^{-2\alpha}+e^{-4\alpha}-e^{-5\alpha}+\dots,$

which equals $e^{-\alpha}(1-e^{-\alpha})/(1-e^{-3\alpha})$. More generally, the sum over multiples of $3^k$ that are not multiples of $3^{k+1}$ works out as $(-1)^ke^{-3^k\alpha}(1-e^{-3^k}\alpha)/(1-e^{-3^{k+1}\alpha})$. One can check that the function $x/(1+x+x^2)$ is increasing in the interval $[0,1]$, so if we sum this over all $k$ then the alternating series test tells us that the total is at most 1/3, whatever the value of $\alpha$.

A similar argument works if we use a weight of $n^{-\sigma}$ instead, but the calculations are easier. By the alternating series test, we know that $1^{-\sigma}-2^{-\sigma}+4^{-\sigma}-5^{-\sigma}+\dots$ is positive and at most 1. Call this value $S$. Then $\sum_nn^{-\sigma}\mu_3(n)=S(1-3^{-s}+3^{-2s}-\dots)$, which is at most $S$.

At one point I observed that if you have a sequence $(x_n)$ of bounded discrepancy then the Dirichlet series $\sum_n x_nn^{-\sigma}$ is uniformly bounded for all positive real $\sigma$, and wondered if we could deduce anything useful from this. The fact that $\mu_3$ has this property shows that we certainly cannot derive a contradiction, though it leaves open the possibility of deducing that the sequence $(x_n)$ is forced to have character-like behaviour.

In the light of these observations, what hope is there for obtaining nicer formulae by smoothing? The answer is that even if we cannot smooth the HAPs, we can at least smooth the interval of integers we are thinking about as a whole. That is, instead of thinking of discrepancy as a function of $n$ (the length of the sequence $x_1,x_2,\dots,x_n$), we could take an infinite sequence $(x_k)$, associate with $x_k$ a weight $e^{-\alpha k}$, and think of the discrepancy as a function of the decay rate $\alpha$ (or equivalently the half-life, which is proportional to $1/\alpha$). That is, we would like to prove that for each $\alpha>0$ there must exist $k,d$ such that $\sum_{j=1}^k e^{-\alpha jd}x_d$ has absolute value at least $C(\alpha)$, where $C(\alpha)\rightarrow\infty$ as $\alpha\rightarrow 0$.

If we try to prove this using Moses’s SDP approach, then, as Moses points out, there is a nice formulation of the problem. We would like to find non-negative coefficients $c_{k,d}$ that sum to 1, and coefficients $b_m$ such that $\sum_me^{-2\alpha m}b_m\geq C(\alpha)^2$, such that

$\sum_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2\geq\sum_mb_mx_m^2$

for every sequence $(x_n)$. This is similar to what we were looking at before, but now we have attached some slowly decaying weights to the $b_m$.

Note that we could think of the previous version of the problem as doing exactly the same, but with weights that equal 1 up to $n$ and $0$ thereafter. I am hoping that with these smoother weights, we will get nicer numbers coming out.

I would like to end this post by drawing attention to Gil’s polynomial method. This is a completely different approach to EDP, where one constructs a polynomial over a finite field that is identically zero if and only if EDP is true. Rather than explain the idea in detail, let me link to two comments in which Gil himself gives a nice explanation. He introduced the idea in this comment and an interesting variant of the idea in this one. It would be good to add this method too to the list of possible proof techniques on the wiki.

110 Responses to “EDP11 — the search continues”

1. gowers Says:

It occurs to me that if the exponentially decaying weights do not produce nice numbers, then it would also be worth trying weights of the form $n^{-\sigma}$ for some smallish $\sigma$. But these weights would decay quite a bit more slowly, so, given the limits of computer power we face, we might have to settle for $\sigma=1/2$ or something like that. (In other words, the quantity we would be trying to maximize would be $\sum_mm^{-1/2}b_m$.) It’s a bit unsettling that when $\sigma\leq 1$ these weights sum to infinity, but I think it isn’t a serious problem.

2. gowers Says:

Moses, here is a question that you will I think be able to answer immediately, whereas for me it would take some effort. Suppose you did not exclude the HAPs of length 1 from the data. Would the effect of that (for small $n$) be that the best solution would be to take $c_{1,1}=b_1=1$ and everything else zero, or would one get a tiny improvement on that (perhaps by taking $c_{d,1}=1-\epsilon_1$ and $b_d=1-\epsilon_2$ for some very smooth $d$, and being able to choose the remaining coefficients in such a way that the sum of the $b_i$ is very slightly greater than 1?

I ask because I am wondering whether the rather strange observed behaviour of the $b_i$ at small numbers might be explained somehow by the exclusion of the very small HAPs, or more generally by “edge effects at zero”.

If that is a problem, then one could perhaps do a double smoothing: instead of just weighting the $b_i$ with an exponential decay, one could make them start fairly smoothly as well. For instance, one might choose a weight such as $\alpha^2 me^{-\alpha m}$ for $b_m$.

• gowers Says:

Correction, a better normalizing factor would be $\alpha/e$, so that the weight of $b_m$ was $\alpha e^{-1}me^{-\alpha m}$. The maximum of these weights would be 1 and the sum would be proportional to $\alpha^{-1}$. If $m=r/\alpha$, then the value of the weight is $(r/e)e^{-r}$, so the weight has order of magnitude comparable to 1 when $m$ is proportional to $1/\alpha$. So the “effective length of the interval” is still proportional to $1/\alpha$ but now it is smoothed at both ends.

• Moses Charikar Says:

Good question about excluding the singleton HAPs. I had tacitly assumed that in order to prove a lower bound $> C^2$, one should not any place on HAPs of length $\leq C$. Let’s think about $C=1$. The value of the bound is the ratio of $\sum_i b_i$ to $\sum_{k,d} c_{k,d}$. Suppose this ratio is $> 1$. Then for every $d$, I claim that it cannot be the case that both $c_{1,d} > 0$ and $b_d >0$. If this happens, reducing both $c_{1,d}$ and $b_d$ by $\epsilon$ gives a better bound (and the quadratic form is unchanged and hence still positive semidefinite). So it is possible that the bound is $> 1$ for the values of $n$ we have looked at and if so, we should see interesting structure by introducing singleton HAPs.

But now that I think about it, we actually have the flexibility to set the $b_i$ values negative if it helps us. (Technically the $b_i$ variables are dual variables for the constraint $x_i^2 = 1$. For such a constraint, the sign of $b_i$ is unrestricted. We had thought that $b_i$ negative was unnatural and relaxed the constraint to $x_i^2 \geq 1$. – a lower bound that uses this constraint continues to be a valid lower bound for the $\pm 1$ setting. Having relaxed the constraint to $x_i^2 \geq 1$, the dual variables $b_i$ are constrained to be non-negative). If we allow ourselves to be use the flexibility of setting $b_i$ negative, then by the argument above, if the bound is $> 1$, w.l.o.g. for all $d$, $c_{1,d} = 0$. In other words, if we can establish a bound $> 1$ by including singleton HAPs, then we can establish such a bound without using singleton HAPs as well.

Our earlier experiments with sign-unrestricted $b_i$ (without singleton HAPs) did not yield bounds $> 1$. This seems to indicate that we should not see interesting structure if we throw in the singleton HAPs. Perhaps we should revisit those solutions in any case to see if the $b_i$ have nicer structure.

• Moses Charikar Says:

Another way to mitigate the “edge effects at zero” might be to focus on bounding the drift instead of the discrepancy. Then we would include terms corresponding to the “shifted HAP” $id,(i+1)d,\ldots,jd$ in the quadratic form. The end points $0$ and $n$ are not completely symmetric, but perhaps they can be made somewhat symmetric with respect to step sizes $d \leq t$ by choosing n = gcd(2,3,…,t). But gcd(2,3,…,9) = 2520 is already above the limit of what we can currently solve (although we might be able to go beyond that with more powerful/parallel computing machinery).

One downside of this idea is that we get constraints (and dual variables) for every shifted HAP, giving something like $n^2 \sum 1/d^2$ number of constraints. If instead, we hypothesize that for any $d,k$, all shifted HAPs
with common difference $d$ and length $k$ should be given the same weight, then we are back to the $n \log n$ constraints we had before.

• Moses Charikar Says:

A slight extension of the idea above (motivated by an earlier comment by Alec) would be to construct the quadratic form for shifted HAPs over the interval [-n,n]. Now the end-points are symmetric, although there is still an edge effect. Multiples of $d$ appear in roughly the same number of shifted HAPs of common difference $d$ and length $k$. This number is $k$ except for those multiples of $d$ that are close to the end points. Perhaps one should only consider pairs $d,k$ such that $dk$ is small w.r.t $n$.

3. Sune Kristian Jakobsen Says:

An equivalent formulation of Gil’s field conjecture is this (this only works when we are working with HAPs):

Let F be the field with p elements and let $a_0,a_1,\dots$ be nonzero elements in F. Then there is $n,d$ with $a_0+a_d+\dots a_{dn}=0$.

Gil’s conjecture for some fixed t is trivially equivalent to this with $a_0=-t$.

• Alec Edgington Says:

Just to state the obvious, Gil’s conjecture doesn’t involve the multiplicative structure of the field at all — so would make sense for any abelian group, or even any group. What groups might it hold for? Clearly every element must have finite order for the conjecture to stand a chance. That isn’t enough, though: it’s easy to construct counterexamples in $\mathbb{Q}/\mathbb{Z}$. Can we find a counterexample in a finite group?

• Gil Says:

So what we need now is to prove the relation: $\prod_{d,i:di \le n}(x_0+x_d+\dots+x_{di})=0$, over a field with p variables, when n is large.

Does it relate to some interesting algebra? Is there a method to tell when a product of many linear terms in n variables vanishes over a field with p elements? Does it ring any other bell?

• Alec Edgington Says:

I don’t think that is quite right, because $x_0 = 1$ and $x_i = 0$ for $i > 0$ gives a product of 1. In other words, the ‘nonzero’ in Sune’s formulation is essential.

• obryant Says:

Doesn’t this start-from-zero formulation not guarantee a (proper) HAP with sum 0, since we can’t take $a_0=-0$?

• Gil Says:

You are right: we need that $\prod_{i=1}^n x_i\prod_{d,i:di \le n}(x_0+x_d+\dots+x_{di})=0,$ over a field with p variables, when n is large.

• Sune Kristian Jakobsen Says:

@obryant: You right.

• Sune Kristian Jakobsen Says:

*You -> You are

@Alec: We can’t make the conjecture more general that way, since we need that every element generate the group. If there was an element, that didn’t generated the group, we could take the constant sequence of the element. Of course we can forget about the multiplicative structure, but that doesn’t help us.

• Alec Edgington Says:

Sune, you’re right: I was just thinking about the case $t=0$ of Gil’s conjecture.

• Gil Says:

One comment about the polynomial above $\prod_{i=1}^n x_i\prod_{d,i:di \le n}(x_0+x_d+\dots+x_{di})=0$, (I hope now it is ok) is that what we want is that when p is small and n is large it gives identically 0. I think we can use Chevallay-Warning theorem to show that when n is not too large w.r.t. p (and it appears that it will be the threshold of the probabilistic heuristics) then there is a non zero solution.

• Sune Kristian Jakobsen Says:

Alec, I see. That might be true in some new cases, but at least we need the group to be finite. Each term is only contained in a finite number of HAPs, so if the group is infinite, it will always be possible to add another term at the end of a finite sequence, without creating a HAP with sum 0.

• Gil Says:

well, on further thought I do not see how Chevallay’s theorem (that requires more variables than the degree) is applicable to any interesting case here.

• Alec Edgington Says:

Sune: indeed. And actually the multiplicative structure in $\mathbb{Z}_p$ does make it more interesting. For one thing, it reduces the problem to two cases: $t=0$ and $t=1$ (since as Kevin points out, all $t \neq 0$ cases are equivalent).

In the $t=0$ case there is the possibility to consider multiplicative sequences as a special case, interesting in its own right.

However, I think that for EDP it is enough to prove the conjecture for $t=1$.

• obryant Says:

Here’s the Mathworld entry for Chevalley’s theorem.

• Alec Edgington Says:

Let me write $\Gamma(p)$ for Kevin’s $G(p,1)$. If my calculations are correct, we have:

$\Gamma(3) = 1$:

2



$\Gamma(5) = 83$:

233232432423232233433232232233233432242232233433232233232232
33323234323223343323223



$\Gamma(7) \geq 269$:

222322322325225232242525222224223243222233224526232425223225
235222322523225324322326222325242326244622224335222322443245
324324246224223332244436334432235333232444225226236225636222
322325545325222223262232245455224222225535226325322224343624
26445233232623252625222325222



(This last search is still ongoing; I’m not hopeful of getting a definite value by brute force.)

4. New ideas in Polymath5 « Euclidean Ramsey Theory Says:

[...] me update this there is another thread. There are two methods used one involves semidefinite programming. The other method which is [...]

5. Moses Charikar Says:

We have some experiments underway to try out the exponential interval smoothing proposed here . We should have some data to report soon.

I am also intrigued by the exponential “HAP length smoothing” that Tim proposed here . Basically, you build in the exponential decay $c_{k,d} = C_d e^{-\alpha kd}$ that we have observed experimentally and get the SDP to figure out the values for $C_d$. For every $d$, this is a smooth version of taking the average discrepancy for HAPs involving elements of the sequence upto $1/\alpha$. I should say that Huy pointed out earlier that while the tails $\tau_{k,d}$ seem to fall off exponentially, the $c_{k,d}$ are not monotone – in fact there seem to be some patterns there too, but for now, we are assuming (hoping) that these patterns are possibly a lower order feature of the SDP solutions that are not critical to getting a good lower bound. Trying out Tim’s HAP length smoothing idea will help build confidence in this.

We do have some data from a related idea that Tim and Gil put forth, namely to
take average square discrepancy over all HAPs, giving equal weight to every HAP
involving sequence elements upto $n$. That SDP solutions are described here , but we haven’t had a chance to look through them to see if the $b_i$ values have nice patterns.

6. gowers Says:

I’ve had another look at the $b_i$ values data on this page, concentrating on the case $n=1500$. I thought I’d try to ignore any boundary effects, and would therefore ignore very small numbers. One way of doing that seemed to be to look at multiples of 6, starting at 24 (the point at which the values cease to be tiny).

This offers the usual mix of tantalizing hints that something’s going on and counterexamples to any hypothesis one might actually come up with. It definitely seems as though $b_i$ likes to be large at smooth numbers, so it is tempting to guess that it is (typically) given by a formula of the form $b_m=\sum_{d|m}f(d)$. Any reasonable such formula would, one might naively think, involve a monotonic function ${}f$. However, this is incompatible with the data: at 42, 66 and 78 the values decrease, suggesting that ${}f$ is a decreasing function, and at 102 (that is, 6 times 17) it increases again.

I’m hoping this is another anomaly that results from edge effects and will disappear if one looks at a smoother version of the problem. Indeed, I’m hoping that the best choice for $b_m$ is indeed of the given form, perhaps multiplied by an exponential decay in $m$. Indeed, for the smoothed problem, I might conjecture (not on the basis of strong evidence) a formula of the form $e^{-\gamma m}\sum_{d|m}f(d)$ for some nice monotonic function ${}f$ (such as a power of $d$).

7. Greg Martin Says:

To exploit the captive audience here, I’ll shamelessly give another plug for using the terminology “friable” for a number with only small prime factors, rather than “smooth” as in the second paragraph of Tim’s post: to support this preference, notice how many times the word “smooth” is used in the rest of the post, with meanings having nothing to do with prime factors!

8. gowers Says:

Here is a guess about what the coefficients $b_n$ might be (or rather, might be closely related to).

I mentioned a couple of comments ago that it looks as though the formula for $b_n$ might involve an expression such as $\sum_{d|n}f(d)$ for some nice function ${}f$. (I say “involve” because I would also expect some kind of decay term as well.) I decided to think about what the constraints on the $b_n$ would tell us about ${}f$. In particular, I was thinking about the constraint that we want the sum of the $b_n$ to be infinite but the sum along non-multiples of ${}m$ to be finite for any ${}m$.

Let’s pretend we already know ${}f$ and know that it is multiplicative. What would that tell us about the sum of the $b_n$? Well, we’d get

$\displaystyle \sum_n\sum_{d|n}f(d)=\sum_d\sum_mf(md)=\sum_df(d)\sum_mf(m).$

As for the sum over all odd $n$, this is the same except that $md$ has to be odd, so the sums on the right hand side end up being over all odd $d$. Morally speaking, we end up with a sum of a quarter of the size of the original sum.

Assuming that ${}f$ decreases smoothly, this “morally” can be turned into a rigorous approximate statement. So we learn that ${}f$ can’t decrease smoothly after all.

However, we have managed to penalize the odd numbers, and more generally non-multiples of $p$ have been penalized by $(1-1/p)^2$. This suggests that we could iterate the procedure.

I’m now going to start with $f(d)=1$ for every $d$ and do the iteration. I don’t care whether things tend to infinity (they will) but I want the sum over multiples of $m$ to be much larger than the sum over non-multiples of $m$, so that I can then multiply by some decay factor and get sums over HAPs to diverge and sums over complements of HAPs to converge.

The first step is to get $f_1(n)=\sum_{d|n}1=d(n)$. The next stage is to get $f_2(n)=\sum_{d|n}f_1(d)=\sum_{r|n}\sum_{s|r}1,$ which is the number of ordered triples $(r,s,t)$ such that $rst=n$. (We’ve seen this before and it’s usually called $d_3(n)$.) The sum of $f_2$ over odd $n$ is the sum over all odd $r,s,t$ which gives us a penalty of $1/8$.

Jumping ahead, I propose stopping at a fixed point, which is the function $f_\omega(n),$ defined as the number of ordered $k$-tuples $(r_1,\dots,r_k)$ (for any $k$) of integers greater than 1 such that $r_1r_2\dots r_k=n$. Unfortunately, this is infinite, since we can just tack on 1s the whole time. But we get something nicer if we count the closely related quantity where each $r_i$ has to be greater than 1.

This is equivalent to counting the number of chains $a_0,a_1,\dots,a_k$ such that $a_0=1$, $a_k=n$ and $a_i|a_{i+1}$ for every $i$.

If we sum this function up to $N$, then we get the number of ordered tuples of integers greater than 1 with product at most $N$. And if we restrict to odd numbers, we’ll get a penalty factor that tends to zero with $N$. I haven’t yet worked out exactly what it is.

I also haven’t checked whether a function like this could satisfy the other constraints, but I feel as though it might be in the right ball park.

• gowers Says:

It occurs to me that what I’ve just been calculating is something like the principal eigenvector of the matrix whose $d$th row is the characteristic function of the HAP with common difference $d$. No time to check exactly what is going on.

• gowers Says:

Actually, I had a quick check and it was easy. At least formally speaking it’s this.

Take the matrix whose $d$th column is the HAP of common difference $d$. This is lower triangular with 1s down the diagonal. Remove the diagonal part, except for the 1 in the top left hand corner. Then ${}f$ is an eigenvector of the resulting matrix … I think.

9. Moses Charikar Says:

Here is some data from the experiments with exponential interval smoothing proposed here . That was relatively easy to do because it only involved a minor change in the SDP. (This does not implement Tim’s later suggestion of smoothing the beginning of the interval). I solved the problem for n=512 (with half life 100,200,300,400,500,1000) and n=1024 (with half life 200,500,1000). The solutions are here:

http://webspace.princeton.edu/users/moses/EDP/try2/

Briefly, the idea was that the solution should be a gracefully evolving function of the halflife parameter and for sufficiently large n (as a function of the halflife) the solution ought to be a close approximation to the solution for infinite n. Here are the values of the solutions for various combinations of n and halflife:

512, 100: 0.3545
512, 200: 0.4119
512, 300: 0.4363
512, 400: 0.4510
512, 500: 0.4616
512, 1000: 0.5510

1024 200: 0.4119
1024 500: 0.4616
1024 1000: 0.5537

At this level of precision, it looks like the solution value has settled for half life 500 at n=512. Of the solutions we have, I would say the solutions to look at are 512 (500) and 1024 (500 and 1000). The files dual*.txt in the directory above give the values of $b_i$ and tails $\tau_{k,d}$.

Some initial observations are that the $b_i$ values drop off very rapidly. It doesn’t look like the tails $\tau_{k,d}$ drop off exponentially – I’m not quite sure what the trend is. It might make sense to examine individual $c_{k,d}$ values instead.

10. Moses Charikar Says:

Another piece of data, from the SDP to minimize drift over the interval [-n,n], suggested here . I’ve assumed that for any $d,k$, all shifted HAPs $id, (i+1)d,\ldots, (i+k-1)d$ of common difference $d$ and length $k$ have equal weight and hence, a common coefficient $c_{k,d}$. Also, I’ve assumed that $b_i = b_{-i}$. As before, the SDP dual tries to maximize
$\sum b_i$ subject to the constraint $\sum c_{k,d} = 1$.
(Note that $c_{k,d}$ does not have a multiplier in this summation). The SDP considers pairs $k,d$ such that $kd \leq n$ and so far, I have solved the SDP for for n=100 and n=200. (These SDPs are harder to solve because the constraints are dense).
The solutions are here:

http://webspace.princeton.edu/users/moses/EDP/try3/

Look at the dual solutions dual*.txt. For each $c_{k,d}$, the last two items on the line are the product $kd$ and the factorization of $kd$. I think the solutions look very interesting. In particular, the $c_{k,d}$ values look like they are either very close to 0 or bounded away from 0. If we could figure out a rule to classify them into these two categories, that would be a promising start.

• Moses Charikar Says:

It seems useful to group the $c_{k,d}$ values with the same product $kd$. For each such product, it seems that at most one or two $c_{k,d}$ values are bounded away from zero.

There may be some irregularities in the data for large $i$ (i.e close to $n$). Ideally I would have liked to solve the SDP with $\max kd$ much smaller than $n$, but given that I was only able to solve for $n=200$ so far, I used pairs $k,d$ with product all the way upto $n$.

• gowers Says:

Here, in case it makes life easier for anyone, is a list of all the $k$ for which $c_{k,d}$ isn’t more or less zero, done for each $d$ up to 17. Recall that the product of $d$ and $k$ does not exceed 200.

d=1: 4,11,13,22,23,29,32,77,79,103,183,195

d=2: 2,4,13,23,84,93,99

d=3: 2,11,13,23,29,32,46,62,65

d=4: 2,4,13,45

d=5: 2,4,13,16,33,39

d=6: 2,4,7,11,13

d=7: 2,4,7,11,13

d=8: 2,4,7,11,13

d=9: 2,8,12,13,21

d=10: 2,4,7,13,16

d=11: 2,4,11,12,13,18

d=12: 2,4,11,16

d=13: 2,7,8,11,14

d=14: 2,4,7,12,13

d=15: 2,4,8,10,11

d=16: 2,4,8,10,11

d=17: 4,7,9,10,11

These pairs certainly don’t look arbitrary, but it’s also far from clear what is going on.

• Moses Charikar Says:

If you want to look through the data, the file

http://webspace.princeton.edu/users/moses/EDP/try3/cgroup200g.txt

contains the $c_{k,d}$ values from the solution for $n=200$, grouped
by $kd$. The format is
c … $k$$d$$c_{k,d}$$kd$ … factorization of $kd$

11. gowers Says:

I wanted to make a general point, in response to the seeming irregularities in the data. I have suggested that perhaps they result from sharp cutoffs, but it is not absolutely obvious that smoothing the cutoffs should suddenly result in beautiful functions with structure that jumps out at us.

Consider, for example, the values $b_p$ as $p$ runs through the primes. If the $b_m$ were given by a formula that was a product of a decay term and something arithmetical, we would expect a graph of $b_p$ against $p$ to be pleasant: probably a function that decays in a regular looking way, or perhaps one that starts by increasing but then decays. One would not expect discontinuities, or that some primes would be favoured over others (with no obvious congruence explanations). However, discontinuities and favouritism seem to abound when one actually looks at the data.

One possible explanation for this is that the $b_m$ depend quite a bit on “random” features of the primes, such as whether a prime belongs to a cluster such as 101,103,107,109, or whether it is fairly isolated. Indeed, given what it is like to try to construct long sequences with low discrepancy, it seems highly plausible that this should be the case. If it is the case, then it’s possible that there is no hope of actually working out a formula for the $b_m$, however cleverly one smooths the problem.

Another possibility is that one would get greater regularity by not allowing HAPs with small common differences. But it looks unlikely that we would be able to get any useful data on this if $n=2048$ is too large, as the large HAPs almost certainly don’t kick in for a very long time.

If all this is the case, then we may want to think about making guesses that are inspired by the figures we have without actually trying to match them exactly.

I’m not saying this is definitely the case though. One possibility is that if we use a more number-theoretic smoothing such as $n^{-\sigma}$ we will suddenly get something more number-theoretic coming out of the other end, despite not having the optimal matrix.

• Moses Charikar Says:

Yes, I agree that our goal should not be to fit the observed solution exactly,but rather to extrapolate some nice patterns that they exhibit. We could also impose on them patterns that we think they ought to exhibit (such as your guess about what the $b_m$ values should be.

It is possible that we may be able to obtain solutions for much larger instances than the ones we currently have. Klas was going to try to get an SDP solver running on a parallel machine setup and Kartik mentioned he has some experience in the past with this. We also have an impressive high performance computing setup at Princeton, but exploiting this is probably well beyond my programming skills.

• Klas Markström Says:

Actually I now have some test jobs in the machine queue, with n up to 4096.

Getting the parallel version set up was a bit trickier than expected, but after some discussion with our support staff that could be solved. I have tested the program for small n and only 8 processors and that looked fine. The machine should be capable of solving really large cases, but I’m not the only user so for large jobs there will be quite a bit of queuing time.
This is the machine which I’m using for this http://www.hpc2n.umu.se/resources/Akka/

• Klas Markström Says:

The first test case ran this night and everything went well. n=1024 took a less than half an hour using 32 processors. I hope the next few cases will start today and bring some information beyond what we already had.

12. gowers Says:

It occurs to me that I wouldn’t mind knowing what the sequence $x_1,\dots,x_n$ such that the ratio of $\sum c_{k,d}(x_d+\dots+x_{kd})^2$ to $\sum_ib_ix_i^2$ is minimized looks like. My guess is that this would be a pretty easy computational task, and I think it might give us some interesting structure. It would also be interesting to see how near/far the extremal sequence was from being a $\pm 1$ sequence.

• Moses Charikar Says:

The SDP actually produces unit vectors $v_i$ that minimize this expression. You can either view the (primal) SDP solution as the Gram matrix of a set of unit vectors, or equivalently, as a covariance matrix of a set of $n$ unit normal random variables. The question is: what is the best way to view/think of this $n \times n$ matrix.

13. Anonymous Says:

Tim suggested here that perhaps $b_n = \sum_{d|n} f(d)$. We can actually build this into the SDP dual, i.e. force the dual to have this form. I think this is equivalent to using the constraint that for every $d$, the average value of $x_{id}^2$ is 1. This is a weaker constraint than saying that $x_j^2 = 1$ for all $j$. Is it conceivable that the HAP discrepancy of length $n$ sequences (of reals, say) grows unboundedly even under this weaker constraint ? The weaker constraint rules out the problematic sequence 1,-1,0,1,-1,0,…

• Gil Says:

That’s very interesting! But I do not see even that when you fix n having average sum of squeres =1 for every HAP does not imply that the individuals $x_i^2$ are 1.

More generally, relaxing the condition on the x_i s in order to force some structure on the dual problem is a nice general idea. Some possible suggestions:

1) We can require that the sum of squeres average to 1 for HAP hose period divides n. We can even take n=m! and require only that the sum of squares along HAP with periods 1,2,…,m average to 1.

2) In a different direction we can just assume that x_i^2 are bounded below by 1

3) Or, that their average on HAP (or some selected HAPs) are bounded below by 1.

(Minimizing the discrepency forces the individual x_is to be bounded from above so we do not need to assume it.)

• gowers Says:

I think forcing the average of $x_{id}^2$ to be 1 does force all the $x_i^2$ to be 1, simply because one is solving some simultaneous equations in the $x_i^2$ that have a unique solution (because the corresponding matrix is upper triangular with no zeros on the diagonal). So one would probably have to ask for all the averages to be at least 1, or restrict the sizes of the common differences, or something like that.

• Moses Charikar Says:

Yes, forcing equality constraints for multiples of $d$ with $d=1,\ldots,n$ is too strong. I like the idea of doing this for $d$ that are small w.r.t $n$, or as Gil suggests, for $d$ that divide $n$. (this might be better in terms of controlling edge effects – but this is unavoidable to some extent, since all HAPs start at one end point). Also, I think that using an inequality constraint is better than using a hard equality constraint. Like Gil, I don’t think this ought to change the nature of the problem. But this results in the corresponding dual variables being non-negative instead of being sign unrestricted and I think we might have a better shot at interpreting non-negative values. In fact, we have been doing this already with the dual solutions we have constructed. Most of the SDPs we have solved used the constraints $x_i^2 \geq 1$ instead of $x_i^2 = 1$.

14. gowers Says:

Here’s a probabilistic model it might be interesting to think about.

Let $N$ be some large integer and let $M\leq N$. (It might be that choosing $M=N$ would be OK, but I’m not sure.) For each prime $p\leq M$, randomly choose nested subsets $A_{p,1}\supset A_{p,2}\supset\dots\supset A_{p,m}$ of $\{1,2,\dots,N\}$, where $m$ is maximal such that $p^m\leq N$, and do so in such a way that the cardinality of $A_{p,k}$ is $\lfloor N/p^k\rfloor$. (Or, if it makes things much easier, choose $A_{p,k}$ by randomly picking elements of ${}A_{p,k-1}$ with probability $p$ and keep going until you reach the empty set.) For every integer $n=p_1^{k_1}\dots p_r^{k_r}$ we then let $A_n$ be the set $A_{p_1,k_1}\cap\dots\cap A_{p_r,k_r}$.

We then think of the set $A_d$ as a randomized analogue of the set of multiples of $d$, and of the sets $A_d\cap\{1,2,\dots,m\}$ as randomized analogues of HAPs.

The obvious general question, which could be thought about either experimentally or theoretically, is this: how similar is this randomized model to what you get with actual HAPs? An even more obvious specific question is whether EDP is true in this set-up. That is, given a $\pm 1$ sequence, must it have unbounded discrepancy on randomized HAPs?

I think this question might turn out to be easier than EDP, but might also give us some useful insight into EDP because the structure of the set system is very similar. It’s conceivable that it might also give smoother experimental data, because we wouldn’t have any obvious analogue of clusters of primes.

• Alec Edgington Says:

Interesting! As a detail, I wonder if it might be more natural to take all initial segments of $A_d$, rather than all $A_d \cap \{1, 2, \ldots, md\}$, as the HAP analogue (since this would give about the right number of HAPs).

• gowers Says:

Oops, that’s what I meant to do and I absent-mindedly messed up the definition. I’ve now changed it.

I also realize that I definitely prefer the model where you don’t fix the cardinality of $A_{p,k}$ but rather you choose elements independently at random from ${}A_{p,k-1}$ with probability $1/p$. That way for each $n$ the set $A_n$ has its elements chosen independently with probability $1/n$. I haven’t quite checked but I think it also means that if $m|n$ then the joint distribution of $A_m$ and $A_n$ is that you randomly choose elements of $A_m$ with probability $1/m$ and to make $A_n$ you randomly choose elements inside $A_m$ with probability $m/n$. I’m sure lots of other statements like that are true. In other words, with this model a lot of probabilistic statements become quite neat.

• gowers Says:

Alec, it’s probably already occurred to you, but it’s only just dawning on me that this question (is EDP true for randomized HAPs?) is closely connected with the kinds of things you were asking about bounded partial sums. We would be asking for the partial sums to be bounded along every $A_n$. We know quite well what the probability is for any individual $n$, but how, for instance, would the events for $A_2$ and $A_3$ correlate? And what is the probability that the event for $A_{2^k}$ holds for every $k$?

That last question is also quite interesting in the usual HAPs case. That is, if I take a random $\pm 1$ sequence, what is the probability that the partial sums will be bounded along the multiples of any power of 2? We know that the Morse sequence has this property, but how rare is the property?

Let me write ${}E_n$ for the event that the sums along the randomized set $A_n$ are bounded. My hunch is that the events $A_1,A_2,A_4,\dots$ should behave rather like independent random variables. Let me try to justify that heuristically.

Suppose we know that the partial sums along $A_1$ are bounded and want to estimate the probability that, given that, the partial sums along $A_2$ are bounded. I’ll regard the sequence $(x_n)$ as given and $A_2$ as random. So we know that the $x_n$ are very well balanced between 1s and -1s. Therefore, when we choose $A_2$, it is rather similar to choosing a random sequence of 1s and -1s. So it looks as though the problem is rather similar to that of simply choosing a random sequence of length roughly $N/2$ and asking for the probability that the partial sums are bounded.

Moreover, one would expect that if $E_2$ holds, then the restriction of the sequence to $A_2$ will be fairly typical, so that if we look at the event $E_3$ then we can argue in a similar way.

It also occurs to me that this model may have features that are genuinely different from EDP. For example, it seems to me perfectly possible that “randomized prime EDP” could be true. That is, it could be the case that with high probability every sequence has unbounded discrepancy on some $A_p\cap\{1,2,\dots,m\}$ with $p$ prime.

The reason this could be the case is that if the event $E_p$ has probability something like $e^{-cN/p}$ and the events are independent, then the conjunction of the events has probability $e^{-cN\log\log N}$, so the expected number of sequences for which they all hold is less than 1 when $N$ is sufficiently large.

• Sune Kristian Jakobsen Says:

Let me first point out that this model is very similar to this one, but there is one important difference: That we only look at a finite number of primes at a time. I think it is an important difference (improvement), since otherwise each number belongs to infinitely many $A_n$s because $\sum p_i^{-1}=\infty$. I’m not sure exactly why this could be a problem, but I think there is a god chance that it is.

I think that the question should be: For a random system of sets $A_n$ (random in the way Tim described it) does there exist a sequence with bounded discrepancy wrt. this system of sets? That is, we should allow the sequence to depend on the set system. But of course this problem is harder to analyse, because ${}E_n$ isn’t well-defined in this question.

Here is a generalisation: Instead of using the probabilities $1/p$, we could use some probabilities $q_p\in (0,1)$, where $q_p$ is a decreasing sequence over the primes. This way we don’t get a model of EDP over the integers, but of EDP over the pseudo-integers!

• gowers Says:

I agree that what one is trying to show is that with high probability there is no sequence with bounded discrepancy on every single set of the form $A_n\cap\{1,2,\dots,m\}$. But the way I imagined one might try to prove this was by taking an arbitrary fixed sequence and proving that the probability that it has bounded discrepancy on all those sets is much smaller than $2^{-n}$.

Thanks for reminding me about the earlier proposals for randomizing EDP — I had forgotten about them.

• Alec Edgington Says:

Here is another idea for a randomized construction. Instead of constructing the $A_{p,k}$ independently for each prime and forming intersections, can we do a kind of randomized sieve of Eratosthenes to generate ‘random primes’?

That is, let $A_0 = \{1\}$; given $A_0, \ldots, A_{k-1}$, let $p_k$ be the smallest number not in any of them, and let $A_k$ consist of $p_k$ together with a random selection of other numbers each with probability $1/p_k$.

I’ve just found out that this is known as the Hawkins random sieve, and an analogue of the Riemann hypothesis holds for it, as well as lots of other nice facts: see for example this paper.

We could then proceed to define $A_{p,k}$ as above, as subsets of $A_p$, and then form intersections; or there may be some more general sieve-like construction to get general HAP analogues more directly.

Yet another, very simple, idea would be simply to define $A_d$, for any $d \geq 1$, by $\mathrm{P}(n \in A_d) = 1/d$, and take the initial segments of the $A_d$ as the HAP analogues.

15. gowers Says:

It occurred to me that although I have persuaded myself that the $b_m$ should be bigger when $m$ belongs to more HAPs, I have not actually justified that statement to myself, unless you count the rather feeble argument that points that belong to more HAPs should be more important and harder to deal with.

I tried to think of some simpler problem that would still be hard. To simplify things, let me talk about the mean-square discrepancy problem: that is, the problem where we want to show that the average over all $kd\leq n$ of $(x_d+x_{2d}+\dots+x_{kd})^2$ is unbounded for $\pm 1$ sequences. One way of doing that would be to find a sequence ${}b_1,\dots,b_m$ such that ${}\sum_ib_i$ is unbounded and prove that the average in question is at least $\sum_ib_ix_i^2$ for all sequences. In other words, we have a certain quadratic form and we would like to know not just that it is positive semidefinite (which is trivial as it’s a sum of squares) but that it is positive semidefinite with room to spare — allowing us to subtract a diagonal and still have a positive semidefinite form.

What I have just tried to do is find any non-trivial sequence ${}b_1,\dots,b_n$ such that I can subtract $\sum_ib_ix_i^2$ and preserve positive semidefiniteness. In a way it’s a hopeless task, because as soon as I manage to find a sequence it becomes, almost by definition, trivial. But there are at least levels of triviality.

To simplify the discussion, I shall approximate the number of HAPs we are considering by ${}n\log n$ (since there are about $n/d$ HAPs with common difference $d$. So to begin at the bottom level of triviality, since for each $d$ we have the singleton HAP $\{d\}$, we know that the average over all the HAPs is at least ${}(n\log n)^{-1}\sum_ix_i^2$. Indeed, this contribution is so trivial that Moses has simply subtracted it off.

However, we can do a lot better than that while still remaining in the realms of the trivial, as one might expect, since for $\pm 1$ sequences we can prove easily that the mean-square discrepancy is at least some positive constant. Indeed, observe (not for the first time) that ${}a^2+(a+b)^2=2(a+b/2)^2+b^2/2$. It follows that for any sequence ${}y_1,\dots,y_k$ we have ${}y_1^2+(y_1+y_2)^2+\dots+(y_1+\dots+y_k)^2$ is at least $y_2^2/2+y_4^2/2+\dots+y_k^2/2$ (in the case where $k$ is even) and also at least $y_1^2+y_3^2/2+y_5^2/2+\dots+y_{k-1}^2/2$. Averaging the two we find that it is at least $y_1^2/2+(y_2^2+\dots+y_k^2)/4$.

If we now sum this over all HAPs, we find that a lower bound for the sum is $\sum_mB_mx_m^2$, where ${}B_m=1/4+\sum_{d|m}1/4=(1+d(m))/4$. Dividing through by ${}n\log n$ gives us ${}b_m=(1+d(m))/4n\log n$.

Now what is $\sum_md(m)$? It’s the sum over all $d$ of the number of multiples of $d$ less than or equal to $n$. In other words, it is precisely the quantity that we approximated by ${}n\log n$. So we end up with a bound of $1/4$. (I’m fairly sure that Moses made exactly this point in one of his early comments.) Or rather, the bound is more like $1/4+(\log n)^{-1}$.

This, it seems to me, is the trivial bound one would like to beat. So a useful exercise would be to find any rigorous proof that we can subtract off some diagonal quadratic form $\sum_ib_ix_i^2$ with sum greater than $1/4+(\log n)^{-1}$ and maintain positive semidefiniteness. Probably getting a constant greater than 1/4 is already challenging enough to give useful insights into the problem (unless there is a slightly less trivial triviality I have not taken account of — I’ll think about that now).

What’s truly trivial about the above approach is that for each $d$ it subtracts off a diagonal form from ${}x_d^2+(x_d+x_{2d})^2+\dots+(x_d+\dots+x_{\lfloor n/d\rfloor d})^2$ and keeps all the results positive semidefinite. It is easy to see that such an approach cannot give a good lower bound. Somehow one has to be “daring” and do the subtraction in such a way that the resulting form inextricably mixes HAPs with different common differences and does not allow the form to decompose into separate positive semidefinite forms for each common difference. It is for this reason that I think that to make progress beyond $1/4$ will require a genuine improvement in understanding.

One final comment is that it is interesting that even in this trivial case we see that the $b_i$ are favouring numbers with more factors.

• gowers Says:

I’ve convinced myself that one can in fact improve on the constant $1/4$ using, if not a trivial argument, at least what one might think of as a disappointingly elementary one. But I think it may not be a complete waste of time as it may give another hint about how to come up with an SDP proof.

The idea is this. We are trying to improve on the rewriting of

$x_d^2+(x_d+x_{2d})^2+\dots+(x_d+\dots+x_{kd})^2$

as

$\frac 12x_d^2+(x_d+\frac{x_{2d}}2)^2+\frac{x_{2d}^2}4+(x_d+x_{2d}+\frac {x_{3d}}2)^2+\frac{x_{3d}^2}4+\dots$

Let’s just concentrate on that expression when $d=1$. We see that equality can occur (leaving aside what happens right at the end), since we can take $x_1=1/2$, $x_2=-1$, $x_3=1$, $x_4=-1$, etc.. However, this best possible sequence for the HAPs with common difference 1 is terrible for HAPs with common difference 2, and that reflects itself in the fact that if we make this choice for $x_1,x_2,x_3,\dots$ then the rewritten sum for $d=2$, which contains terms like $(x_2+x_4+x_6+\frac{x_8}2)^2$ plus diagonal terms, will be strictly bigger than its diagonal part when the non-diagonal terms for $d=1$ are all zero.

This means that if we put together the non-diagonal terms for $d=1$ and $d=2$ we will have something positive definite, so we can subtract a further diagonal contribution. I haven’t done the calculations, but I’m 99% certain that this allows us to improve on the contribution from $d=1$ and $d=2$ by a constant factor. If we now split up all the different $ds$ into pairs of the form $\{r,2r\}$ and play the same game for every pair, we should improve on the constant 1/4.

One could imagine then trying to continue to put different differences together and attempt to get something unbounded. However, getting the extra contributions to diverge would probably be quite challenging.

• gowers Says:

I’ve now worked a little harder, and can provide an actual proof that the constant is bigger than $1/4$. Let’s continue to focus on the differences $d=1$ and $d=2$. I want to group together the various terms such as $(x_1+\dots+x_r)^2$ and $(x_2+\dots+x_{2s})^2$ and squeeze more out of the diagonal. The preliminary stage is to write the sum of the $(x_1+\dots+x_r)^2$ terms as

$\frac{x_1^2}2+(x_1+\frac{x_2}2)^2+(x_1+x_2+\frac{x_3}2)^2+\dots$

$+(x_2^2+x_3^2+\dots)/4.$

We also do the same for the even terms. I’m not going to bother to specify what happens when you get to the other end.

Now let’s leave out the diagonal parts. Let $r$ be an even number. Let $A_r=x_1+\dots+x_r$ and let $B_r=x_2+x_4+\dots+x_r$. Then I want to take the terms $(A_r+x_{r+1}/2)^2$ up to $(A_r+x_{r+1}+x_{r+2}+x_{r+3}+x_{r+4}/2)^2$ together with the terms $(B_r+x_{r+2}/2)^2$ and $(B_r+x_{r+2}+x_{r+4}/2)^2$ together and try to find a diagonal expression that bounds it from below.

To simplify the notation, observe that we are trying to find a lower bound for an expression of the following form:

$(A+a/2)^2+(A+a+b/2)^2+(A+a+b+c/2)^2+$

$+(A+a+b+c+d/2)^2+(B+b/2)^2+(B+b+d/2)^2.$

Now using the identity $x^2+(x+y)^2=2(x+y/2)^2+y^2/2$, we see that $(B+b/2)^2+(B+b+d/2)^2$ is equal to

$2(B+3b/4+d/4)^2+(b+d)^2/8.$

Similarly, $(A+a+b/2)^2+(A+a+b+c/2)^2$ is at least $(b+c)^2/8$ and $(A+a+b+c/2)^2+(A+a+b+c+d/2)^2$ is at least $(c+d)^2/8$. So the sum of the $A$ terms can be represented as a sum of squares plus $(b+c)^2/16$ plus $(c+d)^2/16$. (We could probably do better than this — I am not trying to optimize the bound that comes out of this sort of argument since a computer could do that rather better.) Now $(b+c)^2+(c+d)^2+(b+d)^2$ is equal to $b^2+c^2+d^2+(b+c+d)^2$, so we can gain an extra $(b^2+c^2+d^2)/16$ on the diagonal.

We needed to use six terms to get that gain, which I think means that the eventual gain to the answer will be only $1/96$. So I think we get an improvement from $1/4$ to $25/96$. (But as I said, we could squeeze more out here.)

The reason I am interested in this is that it genuinely mixes two different common differences, in this case with one equal to twice the other. Perhaps the next case to look at would be $d=2$ and $d=3$ combined. Again, one would like a slight improvement on the trivial bound that comes from treating each difference separately.

An important remark is that eventually it is not enough to consider just pairs of differences, since we know, for example, that it is easy to get bounded discrepancy on all HAPs with odd common difference. So the kind of parity argument that’s implicit in the calculation I’ve just done is an essential feature of the problem.

16. Moses Charikar Says:

Interesting. Continuing along the same lines for $d=2$ and $d=3$ combined, one should be able to squeeze out additional diagonal terms by grouping together terms from Tim’s rewriting of each individual HAP thus:
$\displaystyle (A+x_2/2)^2 + (A+x_2+x_4/2)^2 + (A+x_2+x_4+x_6/2)^2 + (A+x_2+x_4+x_6+x_8/2)^2 + (A+x_2+x_4+x_6+x_8+x_{10}/2)^2 + (A+x_2+x_4+x_6+x_8+x_{10}+x_{12}/2)^2$
$\displaystyle (B+x_3/2)^2 + (B+x_3+x_6/2)^2 + (B+x_3+x_6+x_9/2)^2 + (B+x_3+x_6+x_9+x_{12}/2)^2$
Now it is not possible for this sum of squares expression to be 0 if $x_6^2+x_{12}^2 > 0$.
For this to happen, $x_2=-2A,x_4=2A,x_6=-2A,x_8=2A,x_{10}=-2A,x_{12}=2A$.
Also $x_3=-2B,x_6=2B,x_9=-2B,x_{12}=2B$.
This implies $x_6=x_{12}=0$. If $x_6^2 + x_{12}^2 > 0$, then this sum of squares expression must be positive. So for some constant $c>0$, it should be possible to subtract $c(x_6^2+x_{12}^2)$ from this expression to obtain something positive semidefinite.

17. gowers Says:

1. If one were going to try to produce a proof along these lines, one would have to add to the constant something not much less than $1/n$ when taking account of HAPs of common difference $n$, even after having squeezed plenty out already using smaller HAPs. This doesn’t look very possible.

2. However, that was assuming that the contributions from the HAPs decay reasonably smoothly, but there is no reason for this to be the case.

3. It is also assuming that the best thing to do is greedy in the following sense: you squeeze as much as you can out of the diagonal, and then you introduce a new HAP and squeeze as much extra out as you can, and then introduce a new HAP, and so on. But even just looking at $d=1$ and $d=2$ I get the impression that that is not the right thing to do. For instance, the reason I got only $(b+c)^2/16$ and $(c+d)^2/16$ in the calculation above was that I had to use the term $(A+a+b+c/2)$ in the derivation of both terms. If I had held back a little with $d=1$, it might have paid off twofold when I came to deal with the extra contribution with $d=2$. If that is the case, then understanding it is probably quite important when it comes to understanding why numbers with lots of common factors are particularly important.

4. A problem with this is that the calculations look as though they might get messier and messier as you take account of more and more HAPs. So at some point it might be necessary to pass from precise formulae for everything to a vaguer description of what is going on.

5. I looked above at “bunches” of terms and did a small trick to forget about the initial parts of the HAPs. But effectively that is just estimating the average discrepancy by looking at the average drift on small intervals. So we know (as is obvious anyway) that these intervals will have to get rapidly larger if we want to make progress. In practice, however, we can more or less forget about them: whatever we can prove about the first interval we’ll probably be able to prove about subsequent intervals, more or less anyway, by applying the trick.

6. Something it might be worth trying to do is to take a bunch of odd common differences and try to do the rewriting for them. We know that a sequence such as $x_n=(-1)^n$ will have small discrepancy, but perhaps we will find that we can do the rewriting in such a way that we end up with a bunch of terms that can be simultaneously zero only for that sequence. (An example of such a bunch of terms is the one we started with, namely $x_1$, $x_1+x_2/2$, $x_1+x_2+x_3/2$, and so on, which will be simultaneously zero only for multiples of $(1/2,-1,1,-1,\dots)$.) So we would have a choice: either we disappoint a lot of HAPs with odd common difference, or we have very rapid growth along the HAP with common difference 2. That’s an oversimplification, since we could also take the sequence 1,1,-1,-1,1,1,-1,-1,… but we might be able to get a handle on the prime-or-power-of-2 version of EDP somehow.

7. One can regard 6 as a bit like the original task except that now, instead of trying to squeeze out diagonal terms, we try to squeeze out terms of the form $(x_k+x_{k+1})^2$ that are putting pressure on the sequence to alternate. But we’d probably want to squeeze out not just those but also terms like $(x_k+x_{k+1}+x_{k+2}+x_{k+3})^2$ or perhaps things like $(x_k+x_{k+8})^2$ — anyhow, things that try to force periodicity with period a power of 2.

• gowers Says:

Further remarks.

8. Here is a very simple demonstration of the obvious fact that if we rewrite a positive semidefinite quadratic form as a sum of a diagonal form and another form that is positive semidefinite and that vanishes for some sequence that is never zero (which implies that we cannot subtract any more from the diagonal without losing positive semidefiniteness), then we may still be able to improve the sum of the diagonal coefficients. This is obvious because the whole point of the SDP is to optimize the sum of the coefficients, which wouldn’t be an interesting task if you always got the same answer. A simple example that illustrates it is the quadratic form $2x^2+2xy+2y^2$. We could try working out the Cholesky decomposition, in which case we would write this as $2(x+y/2)^2+3y^2/2$, which puts it in the required form. Or we could do the same in reverse and get $3x^2/x+2(y+x/2)^2$. Or we could go the more balanced approach of writing $(x+y)^2+x^2+y^2$, which gives a sum of 2 to the diagonal coefficients and is therefore better. It’s easy to prove that that is best possible: we need somehow to deal with the $2xy$ term. For that we need some combination of terms of the form $(ax+by)^2$. Each individual term takes care of $2abxy$ of the total and subtracts $a^2+b^2$ from the sum of the diagonal coefficients. But the ratio $(a^2+b^2)/2ab$ is minimized when $a=b$.

Suppose now that we saw the decomposition $2(x+y/2)^2+3y^2/2$. What would alert us to the fact that we could improve it? The answer is that we would say to ourselves that the term $(x+y/2)^2$ cannot be best possible if there is a positive $ay^2$ term around, because the coefficients of $x$ and $y$ are not equal, so it’s better to increase the coefficient of $y$ by a small amount and reduce the coefficient of $x$ by a larger amount (or more to the point, reduce the square of that coefficient by more than you increase the square of the coefficient of $y$).

Note that if we were looking at the version of the problem where we do not insist that all the diagonal terms are positive, then we wouldn’t even need the positive $ay^2$ term to run this argument. We could just take the term $2(x+y/2)^2$ and argue that it would have been better to take $(x+y)^2+x^2-y^2/2$.

9. In general, we have a quadratic form and we want to produce the off-diagonal terms as efficiently as possible — that is, we want to find a sum of squares that contributes as little as possible to the diagonal terms. Maybe we can think in advance what counts as efficient and then apply an intelligent greedy algorithm rather than a stupid one. To make that a bit clearer, what I mean is that there is a simple algorithm for writing the quadratic form as the sum of a diagonal part and a part that cannot be improved (because there exists a sequence that takes the value zero nowhere and yet maps to zero when you apply that part of the form — I’ve gone back to insisting that the diagonal coefficients are positive). However, there is a lot of flexibility in how one applies the algorithm, and a good choice might lead to a much better result than a bad choice.

10. For example, perhaps one could apply a Cholesky decomposition after all, but take the variables in a better order than order of size.

18. gowers Says:

I want to pursue remark 10 just a little. Let’s take $n=6$ and see what we can do with the quadratic form

$a^2+(a+b)^2+(a+b+c)^2+(a+b+c+d)^2$

$+(a+b+c+d+e)^2+(a+b+c+d+e+f)^2$

$+b^2+(b+d)^2+(b+d+f)^2$

$+c^2+(c+f)^2+d^2+e^2+f^2.$

If we expand this out we get

$6a^2+8b^2+6c^2+6d^2+3e^2+3f^2$

$+10ab+8ac+6ad+4ae+2af+8bc+10bd+4be$

$+4bf+6cd+4ce+4cf+4de+4df+2ef.$

Now I want to apply the Cholesky decomposition, but I want to think a little bit about which order to take the variables out. Bearing in mind that I am trying to get rid of all the off-diagonal terms using linear forms with coefficients that are reasonably equal, a fairly natural tactic seems to be to start with the smallest diagonal term. Also, whenever we have two variables that do not form a cross term together, we can’t pair them in a bracket unless we use negative coefficients inside brackets and aim for cancellation. (This raises a question: does the best alternative splitting up use negative coefficients? At least in some examples it almost certainly does.) Anyhow, since 5 is prime, $e$ pairs up less with the other variables than ${}f$, so let’s start by getting rid of $e$. That means we take a term

$3(e+\frac 23a+\frac 23b+\frac 23c+\frac 23d+\frac 13f)^2$

and subtract it off. Actually, I am now realizing the obvious fact that the diagonal term I am going to end up with will just involve one variable, which is not good at all. So really I should be stealing a little bit of $e^2$ before I start and hoping for the best. (This is very similar to what Moses suggested right at the start.) If I do that, then it seems quite natural to remove $e^2$ and then apply the first stage of the Cholesky decomposition. If I do that, then I end up subtracting

$e^2+(e+a+b+c+d+f/2)^2.$

Hmm … that looks substantially worse, actually, since I’ve removed more from the other diagonal terms than I’ve gained with this $e^2$. So I’ll go back to the first idea and reserve the option to depart from Cholesky later. Subtracting the first expression I get the quadratic form

$\frac{14}3a^2+\frac{20}3b^2+\frac{14}3c^2+\frac{14}3d^2+\frac 73f^2$

$+\frac{22}3ab+\frac{16}3ac+\frac{10}3ad+\frac 43af+\frac{16}3bc$

$+\frac{22}3bd+\frac{8}3bf+\frac{16}3cd+\frac{8}3cf+\frac 83df.$

For convenience, let us multiply that by 3, getting

$14a^2+20b^2+14c^2+14d^2+7f^2$

$+22ab+16ac+10ad+4af+16bc$

$+22bd+8bf+16cd+8cf+8df.$

I think I’m going to abort this calculation as it’s getting rather unpleasant and isn’t obviously going to lead to a useful insight. Instead, I want to pursue a different idea.

19. gowers Says:

The idea I want to think about is this. Suppose we have a representation of our quadratic form as a sum of squares. I’m wondering (and this may well be known) whether one can solve the SDP problem by means of a succession of small improvements. If so, then maybe we could get a good idea of what the optimal solution is like by using the fact that none of these small “local” improvements is possible. I’m hoping that that could give us a much more precise idea about why $b_m$ tends to be large at numbers with more common factors, why it is often very small, and so on.

• gowers Says:

The difficulty is that it isn’t possible to change just one or two terms at a time if those terms contain more than just one or two variables. Here instead is a general criterion for the representation of the matrix of a quadratic form in $n$ variables as $U^TU+D$ to be best possible (where $D$ is diagonal and $U$ may as well be upper triangular).

It tells us that if $W$ is an upper triangular matrix such that $U^TW+W^TU$ is diagonal, then the trace of $U^TW$ must be zero. This follows from an easy variational argument: if the trace is non-zero, then add a very small multiple $\delta W$ to $U$. We have $(U+\delta W)^T(U+\delta W)=U^TU+\delta(U^TW+W^TU)$ plus a lower-order term. If the trace of $U^TW$ is non-zero, then choosing the sign of $\delta$ appropriately, we find that, up to first order, $(U+\delta W)^T(U+\delta W)$ gives us the same quadratic form, except on the diagonal where the sum is smaller. So we can add the deficit to $D$ and we have an improved representation.

If you try to solve the equations that arise when you look for an upper triangular matrix $W$ such that $U^TW+W^TU$ is diagonal, you find (i) that it is reasonably easy, and that you are free to choose the diagonal of $W$ as you go along. That is, you end up with an $n$-dimensional space of solutions. (I haven’t checked this statement carefully, but I think it is a good approximation of the truth.) Therefore, the fact that the trace of $U^TW$ is zero for all such matrices seems to be telling us something reasonably strong about $U$. (Note that the trace of $U^TW$ is a well-known object, given by $\sum_{ij}U_{ij}W_{ij}$.) However, I have not got my head around what it is saying.

I’m not actually aiming to understand the best possible representation, for reasons discussed earlier, but I hope that by looking at this variational problem it may be possible to say that certain representations are clearly not best, and thereby perform the algorithm more intelligently. (I think of the algorithm as “diagonalizing without worrying about the diagonal”. That is, given a symmetric matrix $A$, instead of the usual process of finding $U$ such that $U^TU=A$, one finds $U$ such that $U^TU-A$ is diagonal. Also, there is no need to insist that $U$ is a square matrix.)

• Moses Charikar Says:

I think you are converging towards some sort of primal-dual algorithm for solving this SDP, but hopefully one that we might be able to reason about given that we have a specific semidefinite program that we care about.

Let’s interpret what’s going on. Let $A$ be the matrix corresponding to the quadratic form we obtained by plugging in a particular guess for $c_{k,d}$. Having guessed these values, our goal now is to prove that
$\displaystyle \min \sum_{k,d} c_{k,d} (v_d^2+ v_{2d}^2 + \ldots + v_{kd}^2)^2$
is large, subject to the constraints $v_i^2 = 1$.
Note that the $v_i$ are vector analogs of the $\pm 1$ variables $x_i$ in the original formulation of EDP. Now let’s write this new problem as a semidefinite program, using the matrix $V$ to denote the matrix of dot products of the vectors $v_i$:
$\min A \cdot V$
$V_{ii} = 1$
$V \succeq 0$
Here the notation $M \succeq 0$ denotes that $M$ is positive semidefinite (psd). Let’s call this program the primal SDP. The dual of this is the following:
$\max \mbox{trace\ } D$
$D \mbox{\ is diagonal}$
$A - D \succeq 0$
It is this dual semidefinite program that we are focussing on: how do you subtract the largest possible diagonal term from a quadratic form so that the remainder is positive semidefinite ? By semidefinite programming duality, the optimum values of the primal and dual are exactly the same. This gives us a way to certify that a particular dual solution is best possible: we simply exhibit a primal solution of the same value – this serves as a certificate of optimality. If we are not able to do this, then somehow this ought to give us an idea of how the dual solution can be improved. Very roughly speaking, this is the idea behind primal-dual methods. There are additional conditions we can infer about the structure of the optimal primal-dual pair of solutions (the so called complementary slackness conditions) and these further restrict the kind of primal solutions certificate we should look for.

At a high level, it looks like this is what you are proposing to do. You also point out that we can construct the dual solution in a series of steps, each increasing the sum of diagonal entries subtracted out and we never need to backtrack and explictly undo what we did before (if we allow the dual variables to be negative, which we can). At any point of time, if we succeed in constructing a primal certificate for the remainder quadratic form that we have so far, then we terminate and declare success, else (hopefully) we have a way to improve the current solution.

• gowers Says:

I’m sure you’re right that if my line of thought leads anywhere, it will be to a standard technique, and I definitely agree that what matters is not so much the abstract ideas but how they play out in the particular case we’re looking at. I still don’t feel I have a good account of why it will end up favouring numbers with many common factors. Obviously I can see why such numbers are relevant, but I’d like is to describe a way of squeezing little extra bits out of the diagonal that favours them. Such a method would, I think, have to mix different $d$s to a much greater extent than what we have done so far.

One thought, which I think Gil will agree with, is that we might get some useful information by thinking about linear algebra. We already know that if we have a sum of the form $L_1(x)^2+\dots+L_n(x)^2$, where the $L_i$ are linear forms and $x$ is shorthand for $(x_1,\dots,x_n)$, and if the $L_i$ span $\mathbb{R}^n$ (which is really the dual space) then we can remove some more from the diagonal, since the quadratic form is positive definite rather than just positive semidefinite. Of course, what we actually need is for the characteristic functions of HAPs to do more than merely span the dual space: they have to be “well distributed” in some sense. But perhaps we can do things like picking bases of (characteristic functions of) HAPs and seeing how other HAPs are expanded in terms of those bases. Maybe if we could find a collection of bases that are all “significantly different” in some sense, we would be in good shape.

Another useful thing to do, I think, would be to look at what the annihilators are of certain subsets of HAPs. For example, I presume that the annihilator of the set of all HAPs of odd common difference and even length is precisely the sequence 1,-1,1,-1,… and its multiples. In fact, that’s more or less trivial. Some time ago, Gil and I were speculating that this kind of linear algebra fact should point towards looser facts that replace “adds up to zero” by “is bounded”. I now wonder whether looking at this SDP may be a way of making this thought precise.

• Moses Charikar Says:

This seems closely related to an earlier line of thought that you were pursuing here . Repeating a few things from that post, say $A$ is the matrix of the quadratic form, and $V^TV=A$. You were looking for a matrix $U$ such that $(V+\delta U)^T(V+\delta U)=A-\delta D$ (where $D$ is a diagonal matrix). Since $\delta$ is infinitesimal, this gives the condition $U^TV+V^TU=-D$. At that point, you were looking for upper triangular matrices $V$ and $U$, but let’s remove this condition. Huy pointed out that there is a direct way to construct a matrix $V$ (not a square matrix) that satisfies $V^TV=A$ from the known sum of squares representation of the quadratic form corresponding to $A$. By known representation, I mean
$\displaystyle \sum_{k,d} c_{k,d} (x_d+\ldots+x_{kd})^2$
To repeat again, $V$ has $n$ columns and the rows of $V$ correspond to the HAPs. The row corresponding to the HAP d,2d,..,kd is $\sqrt{c_{k,d}}$ times the incidence vector for the HAP.

Now we are looking for $U$ with the same dimensions as $V$ such that
$U^TV+V^TU=-D$. Let $U_i$ denote the $i$th column of $U$ (similarly for $V$). Then this condition says that $U_i \cdot V_j + U_j \cdot V_i = 0$ for $i \not = j$. Given that the coordinates of $U$ are determined by the membership of $i$ in each of the HAPs, we should be able to figure out some reasonable choices for $V_i$.

One possible choice is to set $V_i$ equal to the component of $U_i$ orthogonal to the space spanned by $\{U_1,\ldots U_n\}\setminus\{U_i\}$.
This satisfies the stronger condition that $U_i \cdot V_j = 0$ for all $i \not = j$.
Can we understand what these orthogonal components are, say in the special case that all $c_{k,d}=1$ ?

• gowers Says:

I am now in a state of confusion because I have an argument that proves something obviously false and I can’t see what’s wrong with the argument. The statement it proves is that for every positive semidefinite form one can subtract a non-zero positive diagonal form and retain positive semidefiniteness. The “proof” goes as follows.

Let the quadratic form be given by the matrix $U^TU$. Now follow the suggestion of your last paragraph and choose $V_i$ to be the component of $U_i$ orthogonal to the space spanned by the other $U_j$. Since the $U_i$

OK, that was useful. I was assuming that the $U_i$ were linearly independent, but of course they won’t be in general — and it is indeed obvious that if they are then we can subtract off something, since in that case the form is positive definite.

So now let’s scale down our ambition and instead of trying to find a contradiction in ZFC we’ll just continue the argument and see what it gives. For each $U_i$ that is not contained in the linear span of the other $U_j$ it gives us a space of possible choices for $V_i$, and thereby allows us to put any number we like in the $i$th place of $D$. This will lead to an improvement to the matrix $U$ unless …

OK, what the argument seems to show is that every $U_i$ must be in the linear span of the other $U_j$, a stronger property than linear dependence, but weaker than the kind of quantitative property one would ultimately need.

20. gowers Says:

I want to write a few things down that may well turn out to be equivalent to things that have already been said. But even if that is true, it may be a useful reformulation of the problem.

Let me start by thinking about the following general problem. You have a bunch of vectors $v_1,\dots,v_N$ in $\mathbb{R}^n$ and you would like to show that for every $x$ there exists $i$ such that $\langle |x,v_i|\rangle\geq c\|x\|_2$. How can you do it?

In some cases, it is easy. For example, if the $v_i$ are just the standard basis $e_1,\dots,e_n$, then we know that $\|x\|_2^2=\sum_i|\langle x,v_i\rangle|^2$, from which it follows immediately that we can take $c=n^{-1/2}$. But what if we are presented with some other set of vectors that isn’t necessarily an orthonormal basis?

One method (and this feels very similar to SDP) is to construct something that I remember from my Banach-spaces days: a representation of the identity. That means that we write the $n\times n$ identity matrix $I_n$ as a positive linear combination $\sum_i\lambda_iv_i\otimes v_i$. (If you don’t want to think about tensor products, then regard $v\otimes w$ as notation for the matrix that takes the value $v_iw_j$ in the $ij$th place.) Note that $(v_i\otimes v_i)(x)=\langle v_i,x\rangle v_i$. Therefore, from this representation of the identity we can deduce that

$\|x\|^2=\langle x,I_nx\rangle=\sum_i\lambda_i|\langle x,v_i\rangle|^2.$

This allows us to take $c=(\sum_i\lambda_i)^{-1/2}.$

Funnily enough, I had this idea many years ago as a way of trying to solve EDP, but then noticed that the 1, -1, 0, 1, -1, 0, … example meant that no useful inequality of this kind could be proved if the $v_i$ were (multiples of) characteristic functions of HAPs. However, what I did not notice is that if we applied a diagonal transformation to $\mathbb{R}^n$ then we could in principle get rid of this irritating obstacle.

Now for the main idea of what I wanted to say. As a matter of fact, it feels not quite the same as SDP, though it may be that what is different is just at the computational level rather than the theoretical level. Instead of aiming for a representation of the identity, let us aim for a representation of a diagonal transformation. That is, we shall try to find a diagonal matrix $B$ that we can write in the form $\sum_i\lambda_iv_i\otimes v_i$. We would like the trace of $B$ to be big compared with the sum of the $\lambda_i$.

An immediate objection might seem to be that if the $\lambda_i$ are non-negative and the $v_i$ are characteristic functions of HAPs, or multiples thereof, then everything in sight is positive and there is no chance of getting rid of off-diagonal terms. However, we are not forced to use characteristic functions of HAPs. For example, suppose I take a vector that’s given by a string of r 1s followed by a string of r -1s followed by a string of r 1s, and so on. Call this vector $v$. Let $x$ be a $\pm 1$ sequence and suppose that $|\langle x,v\rangle|\geq t$. Then by averaging, there must be a block inside which the sum of the $x_i$ is at least $tr/n$. Therefore, if $|\langle x,v\rangle|\geq Cn/r$ we may deduce that there is an interval in which the drift of $x$ is at least $C$, from which it follows that some partial sum is at least $C/2$.

The point of using such a vector $v$ is that $v\otimes v$ would have many negative terms off the diagonal, and therefore would be useful for the purposes of getting rid of the off-diagonal part.

I now need to do some calculations before I’ll be able to formulate precisely what would be required of a representation of a diagonal map $D$ for it to prove EDP. But the basic idea is that each $v_i$ would have to be normalized so that $\langle x,v_i\rangle$ gives the same lower bound for the discrepancy, and it would be with respect to those normalizations that one would want to make the coefficients $\lambda_i$ have as small a sum as possible.

Note that in this formulation it is extremely natural to expect that the diagonal terms will be bigger for numbers with many factors, simply because if $v_i$ is made out of some HAP with common difference $d$ it will affect the diagonal only at multiples of $d$.

• gowers Says:

I now have a clearer picture of what is needed. Let me call $v$ a test vector if $|\langle x,v\rangle|\geq C$ implies that the discrepancy of $x$ on some HAP is at least $C$. Obviously plus or minus the characteristic function of any HAP is a test vector, and so is any convex combination of such functions. It is not hard to prove that these are in fact the only test vectors. So we can prove a lower bound of $C$ for EDP if we can show that for every $\pm 1$ sequence $x$ there is a test vector $v$ such that $|\langle x,v\rangle|\geq C$.

Now suppose that we have a diagonal matrix $D$ with entries $b_i$, and suppose that we can represent it as $\sum_i\lambda_iv_i\otimes v_i$, where each $v_i$ is a test vector (that is, a linear combination of characteristic functions of HAPs with the absolute values of the coefficients summing to 1) and the $\lambda_i$ are positive. Then for every $x$ we have

$\sum_ib_ix_i^2=\langle x,Dx\rangle=\sum_i\lambda_i|\langle v_i,x\rangle|^2,$

from which it follows that there exists $i$ such that $|\langle v_i,x\rangle|$ is at least $(\sum_i\lambda_i)^{-1/2}(\sum_ib_ix_i^2)^{1/2}.$ In particular, if $x$ is a $\pm 1$ sequence, we find that its discrepancy is at least $(\sum_i\lambda_i)^{-1/2}(\sum_ib_i)^{1/2}.$

Thus, our task is to find a representation as above with the ratio of the sum of the $b_i$ to the sum of the $\lambda_i$ being unbounded.

Now given the $\lambda_i$ and the $v_i$ there is a simple expression for the sum of the $b_i$: the trace of $v_i\otimes v_i$ is simply $\|v_i\|_2^2$, so the trace of $\sum_i\lambda_iv_i\otimes v_i$, which we know equals $\sum b_i$, is $\sum_i\lambda_i\|v_i\|^2$. So our task is to produce a diagonal matrix out of tensors $v_i\otimes v_i$ such that the $v_i$ are built out of characteristic functions of HAPs and have, on average, reasonably large $\ell_2$ norms.

In the next comment I’ll show how to represent a diagonal map uneconomically, just to give an idea of what this task entails.

• gowers Says:

All I want to do here is a trivial calculation that just uses HAPs with common difference 1. I know that I can represent the unit vector $e_i$ as the difference between the interval from 1 to i and the interval from 1 to i-1. Therefore, $e_i/2$ is a test vector. But the identity is equal to $4\sum_i(e_i/2)\otimes(e_i/2)$. So we’ve managed to get $4n$ as our sum of coefficients, while the trace is $n$. Therefore, we have proved a lower bound of $1/2$ for the discrepancy. This is, of course, very similar to the trivial lower bound for the SDP dual problem, reinforcing my view that I am not doing anything fundamentally new here (but I still find this an easier way to think about what is going on).

The reason we got a bad bound is that our test vectors had $\ell_2$ norms equal to $1/2$, whereas what we actually need is that their average norms (where by this I mean a weighted average with weights given by the $\lambda_i$) should be unbounded. This is somehow saying that we need to achieve the cancellation of the off-diagonal parts in a “clever” way, with quite a lot all cancelling at once, rather than taking the easy way out and using vectors with small support.

• gowers Says:

I’d like to discuss the problem in probabilistic language for a bit. So let us impose the condition that the $\lambda_i$ are non-negative (which I asked for before) and sum to 1. What we are then asking for is a probability distribution on the set of all test vectors with the property that if you pick a random test vector $v$ according to this distribution, then the expected value of $v_rv_s$ will be 0 whenever $r\ne s$, while the expected value of $\|v\|_2^2=\sum_r|v_r|^2$ will be at least $C$.

Let us consider some initial attempts. Here, instead of trying to produce a diagonal matrix and finding that its trace is too small, I shall try to get a large trace but will not succeed in making the matrix diagonal. (But I hope that I shall make progress in that direction.)

Let $T_2$ be the class of test vectors that sum to zero and are formed out of two HAPs. An example of such a vector (when $n$ is a multiple of 6) is 2/5 times the characteristic function of $\{2,4,6,\dots,n\}$ minus 3/5 times the characteristic function of $\{3,6,9,\dots,n\}$. This is the sequence

$(0,2/5,-3/5,2/5,0,-1/5,0,2/5,-3/5,$
$2/5,0,-1/5,\dots).$

For this particular vector, there is a significant positive correlation between its entries at places that differ by a multiple of 6 (not to mention various other correlations), but there is also a fair amount of negative correlation.

This illustrates a difficulty we are likely to have: if two numbers have plenty of common factors, then test vectors that use HAPs with common differences equal to those factors will have a tendency to correlate at those two numbers. So this positive correlation will have to be compensated for by some negative correlation from elsewhere.

Another point is that the sum of all the correlations will be negative (since the autocorrelations on the diagonal are trivially positive and the sum of all the correlations is zero if the sum of the coordinates of the test vector is zero). Since we are aiming to produce a diagonal matrix with positive entries down the diagonal, we will in fact need to use at least some test vectors with non-zero sums of coordinates, and for the others we will be looking for negative correlations off the diagonal.

How can we minimize the effect of the positive correlations discussed in the previous paragraph but one? Here is a possible approach, though one that is guaranteed in advance not to work because I shall just consider HAPs that go all the way up to $n$. Let us build test vectors as follows. We first choose some random numbers $d_1,\dots,d_k$ between 1 and $m$, for some carefully chosen $m$. Let $P_i$ be the characteristic function of the HAP consisting of all multiples of $d_i$. We then take the linear combination $k^{-1}\sum_i\epsilon_iP_i$, where the $\epsilon_i$ are randomly chosen signs.

Let’s fix the $d_i$ and think about the correlations when we choose the random $\epsilon_i$. In other words, let $v_\epsilon$ be the above test vector and let us consider its distribution when we choose $\epsilon$ randomly. If $r$ and $s$ are two integers, then usually they will be divisible by only very few of the $d_i$, and unless they have an extremely large common factor they will tend to be divisible by almost disjoint sets of the $d_i$. This implies that there will be very little correlation between their values. (Note that the value of $v_\epsilon(r)$ is the sum over all $d_i$ that divide $r$ of $\epsilon_i$, so the expectation of $v_\epsilon(r)v_\epsilon(s)$ is the number of $i$ such that $d_i|(r,s)$.

What we produce by this means is a matrix $A$ that is not diagonal, but it has a sort of quasi-diagonal property: $A_{rs}$ is much larger when $(r,s)$ is large than when $(r,s)$ is small. (If it were zero except when $(r,s)=r=s$, then it would be diagonal.) One might perhaps hope that by pursuing this kind of idea, one could get rid of almost all the off-diagonal part, being left mainly with $A_{rs}$ with $(r,s)$ large. And then one could perhaps repeat this game but restricting attention to multiples of various products of $d_i$, and perhaps one could get all the numbers to work out in the end.

One problem with what I’ve just suggested is that the correlations seem to be positive. I don’t quite understand that, because if we ensure (by means of some coefficients) that their coordinates sum to zero, then we know that the off-diagonal terms are on average negative.

I’m deliberately putting forward only half thought out ideas here. The way I imagine something like this working is that one would try to use clever methods to get the matrix approximately diagonal in some sense, and then crude methods to get rid of the error. The hope would be that if the error was small enough, then the crude methods would not be too expensive (meaning that one wouldn’t use too large a sum of coefficients for too small a gain in the trace).

• gowers Says:

Let me quickly try to explain how the fact that I was considering only “full” HAPs (that is, ones of the form $\{d,2d,\dots,\lfloor n/d\rfloor d\}$) causes the approach to fail. (I don’t mean that it completely fails — just that one would have to modify it by using shorter HAPs as well.)

The problem is that if we consider the top right and bottom left quarters of the matrix, we see that we are not managing to deal with them. Let me argue for that rather roughly. We can think of a full HAP as consisting of two identical halves (the roughness of the argument being that they are not in fact identical). Therefore, the matrix we eventually produce will look a bit like a 2-by-2 block matrix with four identical blocks. But this means that whatever we see down the diagonal we also see a sort of shadow of going down the diagonals of the other blocks (the shadow being because the matrices were not actually identical).

As a matter of fact, building test vectors out of small-diameter HAPs may help with the other difficulties as well, by introducing a further element of randomness and therefore more opportunities for cancellation.

21. Klas Markström Says:

The first of the larger cases, N=2048, of the original semidefinite program has now finished. I have created a directory where I will post the solutions here, and have a link to it from my data page. http://abel.math.umu.se/~klasm/Data/EDP/

The n=2048 solutions is already there so Moses can now process it with his programs.

• Klas Markström Says:

Now the solutions for N=2500 is in place.

• gowers Says:

Many thanks for these. Could you remind me, and possibly others, how to interpret the data? In particular, what is the meaning of the intial 1 1, 2 1 or 2 3? (I know these appeared on Moses’s data too.) Are the 2 3 rows the b values? (They don’t look like it, because they are very small at a lot of smooth numbers.)

• Moses Charikar Says:

Thanks Klas. This should be very useful. Tim, we’ll have these files processed to produce output like the ones we’ve seen for the SDPs before. I’ve been away from my machine and wasn’t able to do this earlier.

• Moses Charikar Says:

I’ve added files containing the dual values from the solutions Klas generated. The data is here:

http://webspace.princeton.edu/users/moses/EDP/try0/

In the files dual*.txt, lines have the following format
‘b’ …. n …. $b_n$ …. d(n) …. factorization(n)
‘t’ …. k …. d …. $\tau_{k,d}$

The files b*-sorted.txt contain the $b_n$ values in decreasing order.
The files sum-ckd-*.txt contain the $\tau_{2,d} = \sum_{k \geq 2} c_{k,d}$ values.

Klas, can you tell me briefly what you needed to do to get the SDP solver code compiled and running ? Eventually, do you execute it using a command line interface ? I was hoping to get some help to run it on some parallel machines here, but it would help to know what the major steps are.

• Klas Markström Says:

Now N=3000 has finished as well. This case used 512 processor cores for about 21 hours.

The running times have increased quite fast so I don’t think I will solve any larger cases until we are done with the data from the three which have already finished. The machine has over 5000 core so going up to 8192 would probably be possible, but that will use up a lot of CPU time and take quite some time to get through the queue.

Moses, the main things to do was o identify the correct linking options for scalapack, lapack, and blas. I also had to remove some of the include statements for the parallel program, the calls were redundant. In order to use scalpack you might have to load some modules into your run time environment, this varies depending on the set up at the cluster you are using.
The final program is executed via a program called mpirun, which, on a cluster with a batch queue, is called from a batch script which you submit to the queue.

• Moses Charikar Says:

Thanks again Klas. We’re exploring alternate SDP formulations to force the dual solution to be “nicer” in some way, but we haven’t quite settled on a good way to do this yet. When we do, it would be good to harness your cluster again to solve the new SDPs. I’ve added the data from your solution to the 3000 size problem here:

http://webspace.princeton.edu/users/moses/EDP/try0/

• Alec Edgington Says:

These data strongly support the hypothesis of logarithmic growth of $\sum_i b^{(N)}_i$. A linear regression of $\sum_i b^{(N)}_i$ against $\log N$ gives:

$\sum_i b^{(N)}_i \approx 0.452 + 0.019 \log N$

with a standard error of $0.001$.

22. Gil Kalai Says:

Let me say a little more about what the probabilistic heuristics (PH) predicts for EDP and for various variations of the problem, and some thoughts about how this heuristics can be used for upper bounds (namely, for probabilistic constructions.) I will give several examples where we can expect that the PH is correct and few where we know it is incorrect and there are sequences with lower discrepency than the prediction. I do not know of cases where the prediction is wrong in the other direction.

1) Conjecture (*): Let $x_1 x_2, \dots x_n$ be non zero elements in a field with p elements. If n is sufficiently large w.r.t. p then every number t can be written as a sum of an HAP.

(A weaker interesting version due to Sue: every t can be written as a sum of an interval in an HAP.)

The number of sequences is $(p-1)^n$. The probability for an HAP to have the value t is roughly 1/p. If these probabilities were independent then the probability for for all sequence to sum to a value other than t is $(1-1/p)^{n\log n}$. This is smaller than $(p-1)^{-n}$ when p is roughly $(\log n)/(\log \log n)$.

So the prediction says that

(A) if p is much smaller than log n /log log n the assertion of Conjecture (*) holds

(B) If p is much larger than logn /log log n then the assertion of Conjecture (*) fails

The probabilistic heuristics does not take into account that we talk about sequences of non zero entries rather than arbitrary sequences. But for arbitrary sequences, of course, we can consider 1 -1 0 1 -1 0 1 -1 0 … So the prediction is incorrect in this version of the problem.

Overall, part (B) seems more robust then part (A).

2) The original EDP.

It is not a priori clear that the heuristic will give the same estimate for arbitrary sequences and for completely multiplicative sequences but (See Alec’s comment http://gowers.wordpress.com/2010/03/02/edp10-a-new-and-very-promising-approach/#comment-6540 ) this looks to be the case (at least roughly). The computation by Douglas Zare of the probability of a sequence of +- 1 of length n to have partial sums bounded between T and -T may be useful for fine tuning of the PH for the original EDP. (See http://mathoverflow.net/questions/16892/the-probability-for-a-sequence-to-have-small-partial-sums )

Actually I am not sure what the PH gives for the original EDP (But I think it is log n .)

3) Maximum spacing and average spacing between zeroes

Initially I tried to work not with the discrepency but with spacing netween zeroes. So we replace “maximum discrepency” in EDP with “maximum spacing between zeroes”. We can also consider average spacing between zeroes.

It looks that the prediction from PH will be that there is a sequence of +-1s of length n so that the spacing between 0s are below t (or on average below t) if t is larger than $\sqrt{\log n}$, but not if t is smaller than $\sqrt{\log n}$.

4) Prime power differences

If we care only about HAP with prime power differences then the PH prediction for the discrepency is roughly is log log n.

5) Prime differences

If we care only about HAP with prime differences then the PH prediction is also log log n. We know that in this case there are sequences of bounded discrepency so the heuristics is incorrect.

6) Squere free numbers

We can ask EDP when we forget about integers which are not squere free. The PH will give the same prediction for the discepency as for the original problem. For this problem we do not know any examples with discrepency in the log n areas.

7) Very restricted periods.

Suppose that n = m! and you consider HAP only with period 1,2,…,m then the PH threshold for the discrepency is log m. This is also for the analog of conjecture (*) we start with. I mention it because somehow this looks like a case where the polynomial method or linear algebra methods might be more feasible.

Here are two general problems:

Problem 1 : Are there variants of the problem for which the PH predictions are too low?

Problem 2: Can we identify (also heuristically) variants of the problems where the PH prediction should be correct.

Related probabilistic constructions

I believe that the heuristic is more robust in the upper bound direction. While we saw a few cases when the PH gives an answer too large, I am not aware of examples where the where the heuristics gives answers too small. I have some ideas (and so did Alec) about how to try to use this heuristic for constructions of sequences with low (log-ish) discrapancy (also for the variations where it is not known). Probably I should think more about it and write separately. I suspect that if you have an algorithm tuned towards unrealistic targets (e.g. greedy algorithm for minimizing discrapancy) then you get a random behavior because your optimization is not really relevant. The hope id to succeed with a probabilistic process tuned towards a realistic goal.

23. Gil Kalai Says:

A few more words about the polynomial approach.

We look at the polynomial

$\prod_{i=1}^n x_i \prod _{i,d: id \le n} (x_d+x_{2d}+\dots+x_{id})$

and we want to show that it is identically 0 over a field with p elements when p is fixed and n is large.

Lets look at one of the term

$\prod _{i=1}^n (x_1+x_2+\dots+x_i)$

When we expand it to monomials the coefficients have a combinatorial meaning in terms of permutations and inversions.

Given a permutation $\pi (0) \pi(1) \pi(2)\dots p(n)$ we can ask how many inversions does the number i contributes to? This is a number between 1 and i.

The coefficient of $x_1^{i_1}x_2^{i_2}\dots x_n^{i_n}$ is the number of permutations where there are $i_j$ integers contributing j inversions. Lets refer to $i_1,i_2,...,i_n$ as the inversion pattern. (I am not sure what is the standard term in the theory of permutation statistics.)

When we look at our original polynomial and expand it to monomials we have contributions which correspond to various permutations on [n/d] elements for all ds (or all the periods d we want to consider).

So if we have a good understanding of the number of permutations with a given inversion pattern modulo p (say modulo 7) this can be helpful for us.

24. Alec Edgington Says:

Klas, did your search for long discrepancy-2 sequences eventually finish, or is it still running, or did you decide to discontinue it?

• Klas Markström Says:

It was running until a few days ago when I needed that machine for something else and paused it.

Actually you might be able to finish the search using one of your search programs. There are five partial assignments with the property that if none of them can be extended to n=2016 then there are no discrepancy 2 sequences with n=2016.

The partial assignment are here http://abel.math.umu.se/~klasm/Data/EDP/stubs-2016
Note that these are not necessarily assignments to consecutive integers, and that they are not multiplicative.

• Alec Edgington Says:

OK, thanks. But 2016 sounds rather ambitious. Is it the case that any discrepancy-2 sequence of length 1125 must agree with one of these stubs?

• Klas Markström Says:

No, some of the values in the stubs probably depend on the behaviour at integers larger than 1125.

25. gowers Says:

I want to pursue the line of thought begun in this comment. I am still not sure whether it is a genuinely different approach from SDP or whether it can be seen to be equivalent, but at least I am now clearer in my mind about exactly what that question amounts to. At the moment, I am very tentatively of the opinion that it is a different idea, but my mind could in principle be changed: all one would have to do is give a method for converting a representation-of-diagonal proof into a dual-SDP proof and vice versa. Also, I need to think about what strengthening of EDP the representation-of-diagonal proof is equivalent to.

The general problem is to construct a quadratic form that has two properties: (i) it is larger than some diagonal form; (ii) if it is large for a sequence $(x_1,\dots,x_n)$, then that sequence has large discrepancy.

The SDP approach to this task is as follows. You build your quadratic form as a convex combination of terms of the form $(x_d+x_{2d}+\dots+x_{kd})^2$, and then you show that it can be rewritten as $q'(x)+\sum_ib_ix_i^2$, where $q'(x)$ is positive semidefinite and $\sum_ib_i$ is unbounded.

The representation-of-diagonal approach is to build the quadratic form as a convex combination of linear forms determined by what I called test vectors above, which are themselves convex combinations of plus or minus characteristic functions of HAPs. The main property of a test vector $v$ is that if $|\langle x,v\rangle|\geq C$ then the discrepancy of $x$ is at least $C$ on at least one of the HAPs you used to build $v$. We then try to choose test vectors $v_1,\dots,v_N$ and non-negative coefficients $\lambda_1,\dots,\lambda_N$ that sum to 1, such that the quadratic form $x\mapsto\sum_i\lambda_i|\langle x,v_i\rangle|^2$ is diagonal and has diagonal entries with unbounded sum. The matrix of this quadratic form is $\sum_i\lambda_iv_i\otimes v_i$ and the sum of the diagonal entries is equal to $\sum_i\lambda_i\|v_i\|_2^2$.

Thus, in one case one tries to write a large diagonal form as $q-q'$, where $q$ is the form that cannot be large unless the discrepancy is large somewhere, and $q'$ is positive semidefinite. In the other case, one tries to write a large diagonal form as a convex combination of “test forms” that have negative as well as positive coefficients, and it is the negative coefficients that lead to the cancellation of the off-diagonal terms.

At the moment, that feels to me like two completely different things to be trying to do. But perhaps one is dual to the other or something like that. I’ll think about it right now.

• Moses Charikar Says:

Tim, I haven’t thought through this carefully, but it seems to me that your representation-of-diagonal approach is a spectral decomposition of the matrix corresponding to the positive semidefinite form $q'(x)$ (where $q'(x)$ is defined in your comment above). Let $A$ be the matrix corresponding to $q'(x)$, i.e. $q'(x) = x^T A x$. Then $A = \sum_i \lambda_i v_i v_i^T$ where $\lambda_i$ are the eigenvalues of $A$ and $\lambda_i$ are the corresponding eigenvectors. This looks a lot like your decomposition.

I’m not quite sure why the sum of eigenvalues should be 1 (it should be related to the fact that $\sum c_{k,d} = 1$), and I’m not sure why the eigenvectors $v_i$ should be convex combinations of plus or minus characteristic functions of HAPs.

• Moses Charikar Says:

Hmm … what I said doesn’t make sense. You are actually representing a diagonal matrix $D = \sum_i \lambda_i v_i v_i^T$, not the matrix corresponding to $q'(x)$.

• gowers Says:

Also, I don’t mind using many more than $n$ vectors in the representation.

26. Moses Charikar Says:

I’m still trying to understand your approach. Is it true that the lower bound works not only when $\sum_i \lambda_i v_i v_i^T = D$, but also when $\sum_i \lambda_i v_i v_i^T - D$ is positive semidefinite ?

• gowers Says:

I think so, so let me attempt to prove it rigorously, just to be absolutely sure.

The $v_i$ have the property that if $\langle v_i,x\rangle\geq C$ then the discrepancy of $x$ on some HAP is at least $C$. So if $\sum_i\lambda_iv_iv_i^T-D$ is positive semidefinite, then we can apply it to a $\pm 1$ sequence (that is, multiply on the left by $x^T$ and on the right by $x$) and we will find that $\sum_i\lambda_i|\langle x,v_i\rangle|^2$ is at least as big as the trace of $D$ (by positive semidefiniteness of the difference). It follows that the discrepancy of the sequence is at least the square root of this trace, as usual.

Put like this, it makes it look as though all the approach is doing is replacing characteristic functions of HAPs with convex combinations of plus or minus these characteristic functions. But this extra flexibility raises the possibility that one might be able to prove semidefiniteness of the difference by showing that the difference is actually zero. I’m still not sure whether that is attempting to prove something far too strong.

• Moses Charikar Says:

So if you do allow the positive semidefinite generalization, this seems to give a more general lower bound than the SDP dual. Having thought about it briefly, I don’t see a way to get a semidefinite program to generate the best such bound. The problem seems to be dealing with the coefficients $\mu_i$ (a test vector is of the form $v= \sum \mu_i P_i$ where $P_i$ is the characteristic vector for a HAP, and $\sum |\mu_i| =1$.)

On the other hand, if you insist that you get a diagonal matrix exactly, my guess is that the best such bound is incomparable with the SDP dual. Allowing the positive semidefinite generalization may allow you to handle situations where you have almost got the non-diagonal entries zero, but not quite.

27. gowers Says:

The representation-of-diagonal approach involves expressing a diagonal matrix with certain properties as a convex combination of matrices with certain other properties. Therefore, if it is not possible, the Hahn-Banach theorem tells us that we can find some kind of separating functional that demonstrates this. (Other people have different ways of expressing this point, but this is the language I feel most comfortable with.) One would expect the nonexistence of such a functional to be saying that some kind of discrepancy statement holds. I’ve tried to work out what the statement is in this case, but haven’t managed to work it into anything neat. (What I’d like is for it to be a natural and interesting strengthening of EDP.)

Anyhow, we would like to prove that there is a diagonal matrix $D$ with entries that sum to $C$ that can be written as a convex combination of matrices $v\otimes v$ where $v$ is a test vector. If it can’t then there is a matrix $B$ such that $\langle B,D\rangle>1$ for every diagonal matrix $D$ with entries that sum to $C$ and $\langle B,v\otimes v\rangle\leq 1$ for every test vector $v$. The first condition tells us that $B$ is constant on the diagonal and that that constant is greater than $C^{-1}$. The second condition is saying that $v^TBv\leq 1$ for every test vector $v$.

So the failure of the representation-of-diagonal approach is equivalent to saying that for any matrix $B$ that equals 1 down the diagonal there exists a test vector $v$ such that $v^TBv>C$. That is, we can find characteristic functions $P_1,\dots,P_k$ of HAPs and coefficients $\mu_i$ with $\sum_i|\mu_i|=1$ such that $\sum_{i,j}\mu_i\mu_j P_i^TBP_j>C$.

That looks vaguely like a discrepancy statement, but I’m still puzzled by it. We can replace $B$ by $(B+B^T)/2$, so we may assume that it is symmetric. Therefore, we can express $B$ in the form $\sum\rho_iu_i\otimes u_i$. Since $\langle u\otimes u,v\otimes v\rangle=\langle u,v\rangle^2$, we can in this way introduce discrepancies of functions into the picture, but I haven’t quite pursued that thought and reached a nice discrepancy statement.

• gowers Says:

I think I’ve sorted it out now, and it seems that the approach using a representation of a diagonal matrix implies the Hilbert space version of EDP. Indeed, if we can’t represent a diagonal matrix in the way we want, then, as explained in the comment just above, we can find a symmetric matrix $B$ with 1s down the diagonal such that $v^TBv\leq C$ for every test vector $v$. Moreover, that is an equivalence. So if this method of proof works, then for every symmetric matrix $B$ with 1s down the diagonal there exists a test vector $v$ such that $v^Tv>C$. Now $v$ is a convex combination of plus or minus characteristic functions of HAPs. Therefore, by averaging we can find HAPs $P_i$ and $P_j$ such that $P_i^TBP>C$. Now since $B$ is symmetric and has 1s down the diagonal, we can find vectors $u_i$ such that $B_{rs}=\langle u_r,u_s\rangle$. (Actually, I’m not sure I see this — I surely need $B$ to be positive definite. But I’m in a hurry so I’ll leave this argument as it is for now.)

But what I can definitely say is that if $B$ is of that form then we can find $P_i$ and $P_j$. That tells us that $\langle\sum_{r\in P_i}u_r,\sum_{s\in P_j}u_s\rangle>C$, which by Cauchy-Schwarz tells us that the sum of the $u_r$ in either $P_i$ or $P_j$ has norm at least $C^{1/2}$. So the success of the method implies the vector-valued EDP, but it may prove something a bit more general still. I’ll have to think further.

• Moses Charikar Says:

To aid my own understanding, let me compare the conditions required for the SDP argument to work and those required for the representation of a diagonal matrix to work:

In order to prove a lower bound of $C$ on discrepancy via the SDP argument, we need the following property: for any positive semidefinite matrix $B$ with 1′s on the diagonal, there must be a HAP with corresponding characteristic vector $B$ such that $P^T B P \geq C^2$. This is equivalent to saying that in the vector version of EDP (where elements of the sequence are unit vectors instead of $\pm 1$), the square discrepancy of some HAP is at least $C^2$. This is by the argument in Tim’s comment above. Since $B$ is psd, we can find vectors $u_i$ such that $B_{ij}=\langle u_i,u_j\rangle$. Then $P^T B P = ||\sum_{i \in P} u_i||_2^2$.

In order to prove a lower bound of $C$ on discrepancy via the representation of a diagonal matrix argument, we need the following property: for any matrix $B$ with 1′s on the diagonal, there must be a test vector
$v$ such that $v^T B v \geq C^2$. I don’t think you can assume that $B$ is positive semidefinite, and if so, I don’t have a good interpretation for what this condition is saying.

On the other hand, consider the modification of the representation-of-diagonal argument to allow $\sum_i \lambda_i v_i v_i^T - D$ to be positive semidefinite (earlier we insisted that this was 0). For a proof of this kind to succeed, I think you need the following: for any psd matrix $B$ with 1′s on the diagonal, there must be a test vector $v$ such that $v^T B v \geq C^2$. If this is the case, as Tim points out, there exist $u_i$ such that $B_{ij} = \langle u_i,u_j \rangle$ and the existence of the test vector $v$ implies that $||\sum_{i \in P} u_i||_2^2 \geq C^2$ for some characteristic vector $P$ of a HAP. But then, $P^T B P \geq C^2$. Does this mean that instead of test vectors $v_i$, we could have used characteristic vectors of HAPs instead ? If this is true, it seems that this proof technique is equivalent in power to the SDP dual argument.

• gowers Says:

Here is an interpretation of the representation-of-diagonals approach, conditional on a linear algebra statement that seems plausible to me but that I have not checked. That statement is that if $B$ is any $n\times n$ matrix then one can find vectors $u_1,\dots,u_n$ and $v_1,\dots,v_n$ such that $\langle u_i,v_j\rangle=B_{ij}$ for every $i,j$. This says I can factor $B$ as $U^TV$, which starts to look disappointingly trivial as I can take $V$ to be any invertible matrix and then take $U^T$ to be $BV^{-1}$.

Never mind the fact that it is trivial — let’s see what it implies. As above, we know that the representation-of-diagonal approach fails if and only if there exists some matrix $B$ with 1s down the diagonal such that $v^TBv\leq C$ for every test vector $v$. Therefore it succeeds if and only if for every matrix $B$ with 1s down the diagonal there is a test vector $v$ such that $v^TBv>C$. As I noted above, that implies the existence of HAPs $P$ and $Q$ such that $|P^TBQ|>C$. (Actually, I forgot the modulus sign earlier, but once it’s there the conclusion is valid.) Now if $B_{ij}=\langle u_i,v_j\rangle$, then this tells us that $|\langle\sum_{i\in P}u_i,\sum_{j\in Q}v_j\rangle|>C$.

Thus, if the method succeeds, then we get the following apparently stronger version of the vector-valued EDP: let $u_1,\dots,u_n$ and $v_1,\dots,v_n$ be any two sequences of vectors such that $\langle u_i,v_i\rangle=1$ for every $i$. Then there exist HAPs $P$ and $Q$ such that $\langle\sum_{i\in P}u_i,\sum_{j\in Q}v_j\rangle> C$ for some $C=C(n)$ that tends to infinity with $n$. I haven’t thought about whether this statement is likely to be true. If we insist that $u_i=v_i$ for each $i$ then we recover the usual vector-valued EDP. Note also that the statement is potentially interesting even for real numbers: if we have two sequences $(x_n)$ and $(y_n)$ such that $x_ny_n=1$ for every $n$, does it follow that we can find HAPs $P$ and $Q$ such that the product of $\sum_{i\in P}x_i$ and $\sum_{j\in Q}y_j$ is unbounded in size?

Is this non-symmetric vector-valued version equivalent to the representation-of-diagonals statement? I was expecting to have something to check there, but now that I look back I think it’s trivial, since characteristic functions of HAPs are test vectors.

• gowers Says:

If it turns out to be correct that the representation-of-diagonals statement is equivalent to the non-symmetric vector-valued EDP, and if we can’t find a counterexample to the latter statement (and a very brief think about it in the real-valued case makes me think it has a good chance of being true), then I would like to argue that the representation-of-diagonals approach (rather than its weakening where you allow the difference to be positive semidefinite) is a good thing to think about. I have various reasons for this.

1. Proving that numbers are zero is easier than proving that matrices are positive semidefinite.

2. If we try to represent a diagonal matrix, we don’t have to decide in advance what that matrix should be, so we save ourselves from having to make some huge guess. All we care about is that on average the $\ell_2$ norms of the test vectors we use should be unbounded.

3. Trying to prove a stronger statement is often easier than trying to prove a weaker one, because one’s moves are somehow more forced. (There are counterexamples to this — consider van der Waerden’s theorem and Szemerédi’s theorem, for instance — so there is no guarantee that this principle applies in our case, but I have a hunch that it does.)

• Moses Charikar Says:

Very interesting question and I agree that getting a diagonal matrix seems easier to do than proving that some matrix is positive semidefinite. Tim, it looks like all you need for your proof to work is to be able to express a diagonal matrix as a convex combination of matrices of the form $P_i \otimes P_j$ and $- P_i \otimes P_j$ where $P_i,P_j$ are characteristic vectors of HAPs. If this is correct, this is easier to think about than taking convex combinations of test vectors. Also, the best such convex combination (that maximizes the sum of diagonal entries for an $n \times n$ matrix can be found by linear programming. The problem involves roughly $n^2 \log^2 n$ variables (corresponding to all pairs of HAPs) and roughly $n^2$ constraints (corresponding to the entries of the $n \times n$ matrix).

• gowers Says:

Ah yes — I don’t know why I was blind to this simpler formulation, which will make it easier to think about. And of course, I’d be very interested to see what the best decomposition is for various values of $n$. I think even a moderate-sized $n$ such as 250 could be quite informative.

Since the number of comments on this post is approaching 100, I am in the process of writing another post. I am using this opportunity to try to formulate this representation-of-diagonals approach as clearly as possible, and I will also mention there a few thoughts I have about how one might actually go about finding a good representation. But these thoughts could well need to be refined a great deal in the face of the experimental evidence. (Another possibility is that the optimal solution involves an apparently magical cancellation that leaves one none the wiser about how to approach the problem theoretically, but I hope it will give some useful clues.)

• Moses Charikar Says:

I have some initial LP solutions for this problem. The optimum values for small values of $n$ are as follows (an optimum value of p/q indicates that the convex combination involves multiples of 1/q):
6: 4/13
10: 25/81
12: 7/20
16: 40/110
18: 20/50
50: 0.422825
The solutions produced by the LP solver are here:

http://webspace.princeton.edu/users/moses/EDP/try-lp/

(I am not sure that the optimum solutions are unique).
The values $d_i$ give the value of the $(i,i)$ diagonal entry of the matrix (off-diagonal entries are zero) that the LP expresses as a convex combination of $\pm P_{d_1,k1} \otimes P_{d_2,k2}$. Here $P_{d,k}$ is the characteristic vector of the HAP with common difference $d$ and length $k$.
The lines of the form $(d1,k1)(d2,k2)$ give the coefficient of $P_{d_1,k_1} \otimes P_{d_2,k_2}$. I impose the condition (w.l.o.g.) that the coefficient of $P_{d_1,k_1} \otimes P_{d_2,k_2}$ is equal to the coefficient of $P_{d_2,k_2} \otimes P_{d_1,k_1}$ and output only one of them. Zero coefficients are not output and there are many of them (less than 2% of the coefficients for the $n=50$ instance are non-zero).

The solver is currently working on $n=100$. It’s taking up about 3G of memory for this instance, so it is unlikely that I can solve bigger sizes, but we should be able to get solutions for larger $n$ by running an LP solver on a parallel cluster.

• Alec Edgington Says:

Moses, I’m a little confused by these numbers. Looking at the data for $n=6$, I see that the $(1,1)$ diagonal entry is nonzero. But the only contributions to this are from matrices of the form $P_{1,k_1} \otimes P_{1,k_2}$; the only such terms here are $(1,2) (1,4)$ and $(1,3) (1,3)$ (and their transposes, which you’ve constrained to have the same coefficients); and these two terms have equal and opposite coefficients, so should cancel out. What have I missed?

• Alec Edgington Says:

Ah — I think I see my mistake. The $(1,3) (1,3)$ term doesn’t get added to its transpose!

• Moses Charikar Says:

Yes, I thought this might be a source of confusion. Perhaps I should have printed out all pairs of HAPs that are used in the convex combination explicitly.

The solution for 100 is finally done – the LP solver took almost 23 hours and more than 170,000 iterations to solve the problem. I’m not going to be able to solve any larger problems on my machine unless we restrict the problem somehow.
The data for 100 is here:

http://webspace.princeton.edu/users/moses/EDP/try-lp/

• Klas Markström Says:

Moses, I have two suggestions.
1. Could you also try some additional small sizes, say n=30, 40, 60?

2. There are some values which are quite small for m=50,100, and fairly small for several other values of n, e.g. d7 and d17. Could these values actually be set to exactly 0 and still give an optimal solution for all n?

Once some analysis of the solutions at hand has been done I’d be happy to run some larger cases as well.

• Moses Charikar Says:

Klas, I have put up solutions for 30, 40 and 60. I presume the extremely small values can be set to zero and still yield an optimal solution, but I am not sure about this. I don’t know of a good way to force the LP solver to not generate such small values. I decreased the error tolerance of the solver from 1e-6 to 1e-8. That may help. Other than that, one would have to examine the solution and resolve the problem with new constraints setting extremely small values to 0 explicitly. It’s possible to automate this process, but it will require some work.

BTW, there are some small values in the solutions with a double negative sign –. They should be positive. In fact here the solver returned a very small negative value for a variable that was supposed to be non-negative, so this is a tolerance issue.

• gowers Says:

I have a small presentational suggestion, which is that if the numbers are in fact numbers with small denominators, then it might be nice to present them as such rather than as decimals.

I also like the idea of presenting both $P\otimes Q$ and $Q\otimes P$. I find that when trying to understand what is going on in a particular representation it is slightly inconvenient to have only half of the off-diagonal coefficients displayed.

• Klas Markström Says:

Moses, you might want to try GNUs lp-solver from the glpk-package. That solver has an option which makes it solve a linear program using exact arithmetic. This slows the solver down a lot but it might give some interesting information. for small n at least.

• Klas Markström Says:

28. Gil Says:

(Continuing the linear algebra approach that Tim mentioned.)

Here is the simplest linear-algebraic question coming from EDP that I can think about: Let A_1, A_2,…,A_m be a partition of {1,2,…,n} into intervals of length at most t.

Consider the system of linear equations $\sum_{i \in rA_j} x_i=0$, where we have an equation for every r and j such that $rA_j$ is contained in {1,2,…,n}.

Then we conjecture that this syetem of equations implies for a large n that one of the $x_i$s equals to zero.

It is simple but sort of a strange question. (I suppose we also conjecture that in order to have any non zero solution the A_is should be very “structured” but we dont know what “structured” precisely means. It should be related to the lengths of the intervals being periodic)

An equivalent way to state it that takes us closer to SDP is that if we add to the equations the inequalities $x_i^2 \ge 1$ then the system is never feasible.

Maybe looking at the dual problem can add some insight?

The fact that we have equations for every way we try to partition [n] to short intervals makes it seemingly more difficult than the direct SDP approach.

29. Kristal Cantwell Says:

I have been looking at one problem that came up a while back that is the following: There are two players one is trying to force a large discrepancy the other is trying to stop it. The players pick primes and assign signs to them and the discrepancy is of the multiplicative function which is determined by the signs of the primes.I have an idea for showing a that the large discrepancy can be forced in some cases.

Let me describe the winning strategy if the idea works.
The player trying to force the discrepancy chooses
the lowest available prime and sets it positive.

Let me look at a specific case and show how the discrepancy can be forced. I believe that this idea can be extended to further cases.

The case is that the player who is trying to force the discrepancy moves
first and chooses 1/2 then for some reason the other player does not choose 1/3 so the first player chooses that and both primes are set to 1.

The first step of the proof is to group the prime
factors in pairs. Because of the way signs are assigned
they can be arranged in pairs. The first of these pairs will
be two and something. Now we can show that the
constributions made by all numbers having factors nontrivially
in a in any of the later pairs can be ignored. We note
that for any pair p,q and multiple rx, with x having factors only
in p and q we can order p pr qr p^2r etc in a way that the series
is alternating and starting with a positive number this will result
in a positive or zero discrepancy to the total discrepancy so if
the factors having factors in the first pair contributes k
to the discrepancy then we will be done if we can make k arbitrarily
large. So the idea is to ignore everything except the first pair. Now we can make the first pair consist of 2 and 7. And then if we look at these we can see the discrepancy will be high as we can pair powers of consecutive powers of 7 which will have different signs and which will increase the discrepancy. Anyway that is roughly the idea that I am looking at now.

30. Kristal Cantwell Says:

If you are looking at a multiplicative function then instead of looking at the the quadratic form

\displaystyle \sum_{k,d}c_{k,d}|x_d+x_{2d}+\dots+x_{kd}|^2-\sum_ib_ix_i^2 look
at the following

\displaystyle \sum_{k,d}c_{k,d}|x_1+x_{2}+\dots+x_{k}|^2-\sum_ib_ix_i^2

the idea is to try to use the information that the function is multiplicative to get a simplification as it looks like the number of different types of sums that are squared depends only on the number of terms and that might be useful.

31. gowers Says:

Apropos nothing much, here is a small observation: the notion of multiplicativity can be generalized to the vector-valued setting even when there is no given algebra structure on the vectors. In the $\pm 1$ case we can characterize multiplicativity by saying that the angle between $x_i$ and $x_j$ (which is always either 0 or $\pi$) depends only on $i/j$. This way of phrasing the definition makes just as good sense for unit vectors in a Euclidean space. This may be relevant to Kristal’s proposal to restrict the SDP approach to multiplicative functions.

Another way of thinking about the multiplicativity condition is to say that the map $S_d:x_i\mapsto x_{di}$ is an isometric embedding for every $d$.

• Moses Charikar Says:

Yes, at some point initially, I had considered imposing these multiplicative constraints in the SDP. With these constraints in place, the discrepancy of any HAP is equal to the discrepancy of the 1-HAP of the same length. Eventually, I did not pursue this because I felt the large number of constraints would make it difficult for SDP solvers to solve the problem for large $n$. It is possible that the SDP solutions are “nicer” with these constraints. Another interesting thing to check would be whether the optimal SDP solutions (without multiplicativity constraints) are in fact “multiplicative”, i.e. $\langle v_{di},v_{dj}\rangle = \langle v_i,v_j\rangle$.