I do not have too much to say about the new programme to use SDP to solve EDP, other than that it is still being actively pursued. One remark I would make is that I have rather forgotten recently about keeping the wiki updated, and SDP is a major omission from it. It would be good to have a theoretical account of the method (which I suppose I could paste quite easily from one of my existing posts), and an experimental page devoted particularly to data connected with this approach, with links to all the matrices, sequences and plots that have been produced.
So far, the data has had some expected features (such as the sequence values tending to be larger when
is smooth) and some puzzling ones (such as the fact that
is much much larger than
). An initial hope for the method was that the experimental data would give rise to a very precise conjecture, but so far that has not happened. There are, however, various avenues that have not been fully explored, and I still have some hope that we will suddenly stumble on some data that we can fully understand.
One way of doing this is to introduce a smoothing. One idea I had turned out to go too far: in the search for a pretty formula I ended up with a version of EDP that was false. For the record, here is that version. One can think of the discrepancy of a sequence as its largest inner product with the characteristic function of any HAP. Those characteristic functions have sharp cutoffs (in that they go up to for some
and then suddenly stop), so one might hope to make the problem nicer by smoothing the cutoffs. One natural way of doing this is to look at functions such as
, which take the value
at
and 0 at non-multiples of
. For this to be a sensible idea, we need a “smoothed EDP” to be true. That is, we need it to be the case that for every
sequence
there exist
and
such that
has absolute value at least
. However, this is false for the character-like function
. The multiplicativity of
implies that it will be enough to show this when
, so let us fix some
and calculate the sum
.
We do this in the usual way. First, let us look at non-multiples of 3. That gives us the sum
which equals . More generally, the sum over multiples of
that are not multiples of
works out as
. One can check that the function
is increasing in the interval
, so if we sum this over all
then the alternating series test tells us that the total is at most 1/3, whatever the value of
.
A similar argument works if we use a weight of instead, but the calculations are easier. By the alternating series test, we know that
is positive and at most 1. Call this value
. Then
, which is at most
.
At one point I observed that if you have a sequence of bounded discrepancy then the Dirichlet series
is uniformly bounded for all positive real
, and wondered if we could deduce anything useful from this. The fact that
has this property shows that we certainly cannot derive a contradiction, though it leaves open the possibility of deducing that the sequence
is forced to have character-like behaviour.
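The formulas in the last two paragraphs did not survive, but the boundedness phenomenon is easy to poke at numerically. Below is a small Python sketch (not a proof) that computes weighted sums of the character-like function built from residues mod 3 along a few HAPs; the triangular cutoff weight is my own stand-in for the smoothing discussed above.

```python
# Numerical sketch only: the smoothing weight below (a triangular cutoff)
# is an illustrative assumption, not necessarily the one discussed above.

def lambda3(n):
    """Character-like function mod 3: strip factors of 3, then return +1 if
    the remaining part is 1 mod 3 and -1 if it is 2 mod 3."""
    while n % 3 == 0:
        n //= 3
    return 1 if n % 3 == 1 else -1

def smoothed_hap_sum(d, N):
    """Weighted sum of lambda3 along the HAP d, 2d, 3d, ... with a smooth
    cutoff w(m) = 1 - m/N instead of a sharp one."""
    return sum(lambda3(m * d) * (1.0 - m / N) for m in range(1, N))

for d in (1, 2, 3, 5, 9, 27):
    print(d, round(smoothed_hap_sum(d, 10000), 3))
```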
In the light of these observations, what hope is there for obtaining nicer formulae by smoothing? The answer is that even if we cannot smooth the HAPs, we can at least smooth the interval of integers we are thinking about as a whole. That is, instead of thinking of discrepancy as a function of (the length of the sequence
), we could take an infinite sequence
, associate with
a weight
, and think of the discrepancy as a function of the decay rate
(or equivalently the half-life, which is proportional to
). That is, we would like to prove that for each
there must exist
such that
has absolute value at least
, where
as
.
If we try to prove this using Moses’s SDP approach, then, as Moses points out, there is a nice formulation of the problem. We would like to find non-negative coefficients that sum to 1, and coefficients
such that
, such that
for every sequence . This is similar to what we were looking at before, but now we have attached some slowly decaying weights to the
.
Note that we could think of the previous version of the problem as doing exactly the same, but with weights that equal 1 up to and
thereafter. I am hoping that with these smoother weights, we will get nicer numbers coming out.
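Since the displayed formulas in this paragraph did not survive, here is a minimal sketch, in cvxpy, of how I read the smoothed dual problem. Everything about the exact formulation is a guess: I attach the exponential weight to the sequence entries inside the HAP sums, exclude singleton HAPs, and take the certified bound to be the sum of the diagonal coefficients; the function name and the small parameter values are mine. The coefficients that come out are what one would then inspect for nicer structure.

```python
import numpy as np
import cvxpy as cp

def smoothed_edp_dual(n=40, half_life=20.0, include_singletons=False):
    # Exponentially decaying weight attached to sequence position m.
    w = 2.0 ** (-np.arange(1, n + 1) / half_life)

    # One row per HAP {d, 2d, ..., kd} contained in {1, ..., n}, with each
    # entry scaled by the weight of its position.
    rows = []
    for d in range(1, n + 1):
        v = np.zeros(n)
        for k in range(1, n // d + 1):
            v[k * d - 1] = w[k * d - 1]
            if k >= 2 or include_singletons:
                rows.append(v.copy())
    H = np.array(rows)

    c = cp.Variable(len(rows), nonneg=True)   # weights on HAPs, summing to 1
    b = cp.Variable(n)                        # diagonal coefficients

    # sum_i c_i (weighted HAP sum)^2 - sum_m b_m x_m^2 must be PSD.
    quad_form = H.T @ cp.diag(c) @ H
    constraints = [cp.sum(c) == 1, quad_form - cp.diag(b) >> 0]
    prob = cp.Problem(cp.Maximize(cp.sum(b)), constraints)
    prob.solve(solver=cp.SCS)
    return prob.value, b.value, c.value

value, b, c = smoothed_edp_dual()
print("certified lower bound for this toy instance:", value)
```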
I would like to end this post by drawing attention to Gil’s polynomial method. This is a completely different approach to EDP, where one constructs a polynomial over a finite field that is identically zero if and only if EDP is true. Rather than explain the idea in detail, let me link to two comments in which Gil himself gives a nice explanation. He introduced the idea in this comment and an interesting variant of the idea in this one. It would be good to add this method too to the list of possible proof techniques on the wiki.
March 7, 2010 at 12:36 pm |
It occurs to me that if the exponentially decaying weights do not produce nice numbers, then it would also be worth trying weights of the form
for some smallish
. But these weights would decay quite a bit more slowly, so, given the limits of computer power we face, we might have to settle for
or something like that. (In other words, the quantity we would be trying to maximize would be
.) It’s a bit unsettling that when
these weights sum to infinity, but I think it isn’t a serious problem.
March 7, 2010 at 2:53 pm |
Moses, here is a question that you will I think be able to answer immediately, whereas for me it would take some effort. Suppose you did not exclude the HAPs of length 1 from the data. Would the effect of that (for small
) be that the best solution would be to take
and everything else zero, or would one get a tiny improvement on that (perhaps by taking
and
for some very smooth
, and being able to choose the remaining coefficients in such a way that the sum of the
is very slightly greater than 1)?
I ask because I am wondering whether the rather strange observed behaviour of the
at small numbers might be explained somehow by the exclusion of the very small HAPs, or more generally by “edge effects at zero”.
If that is a problem, then one could perhaps do a double smoothing: instead of just weighting the
with an exponential decay, one could make them start fairly smoothly as well. For instance, one might choose a weight such as
for
.
March 7, 2010 at 4:10 pm
Correction, a better normalizing factor would be
, so that the weight of
was
. The maximum of these weights would be 1 and the sum would be proportional to
. If
, then the value of the weight is
, so the weight has order of magnitude comparable to 1 when
is proportional to
. So the “effective length of the interval” is still proportional to
but now it is smoothed at both ends.
March 7, 2010 at 6:54 pm
Good question about excluding the singleton HAPs. I had tacitly assumed that in order to prove a lower bound
, one should not place any weight on HAPs of length
. Let’s think about
. The value of the bound is the ratio of
to
. Suppose this ratio is
. Then for every
, I claim that it cannot be the case that both
and
. If this happens, reducing both
and
by
gives a better bound (and the quadratic form is unchanged and hence still positive semidefinite). So it is possible that the bound is
for the values of
we have looked at and if so, we should see interesting structure by introducing singleton HAPs.
But now that I think about it, we actually have the flexibility to set the
values negative if it helps us. (Technically the
variables are dual variables for the constraint
. For such a constraint, the sign of
is unrestricted. We had thought that
negative was unnatural and relaxed the constraint to
. A lower bound that uses this constraint continues to be a valid lower bound for the
setting. Having relaxed the constraint to
, the dual variables
are constrained to be non-negative.) If we allow ourselves to use the flexibility of setting
negative, then by the argument above, if the bound is
, w.l.o.g. for all
,
. In other words, if we can establish a bound
by including singleton HAPs, then we can establish such a bound without using singleton HAPs as well.
Our earlier experiments with sign-unrestricted
(without singleton HAPs) did not yield bounds
. This seems to indicate that we should not see interesting structure if we throw in the singleton HAPs. Perhaps we should revisit those solutions in any case to see if the
have nicer structure.
March 7, 2010 at 11:29 pm
Another way to mitigate the “edge effects at zero” might be to focus on bounding the drift instead of the discrepancy. Then we would include terms corresponding to the “shifted HAP”
in the quadratic form. The end points
and
are not completely symmetric, but perhaps they can be made somewhat symmetric with respect to step sizes
by choosing n = lcm(2,3,…,t). But lcm(2,3,…,9) = 2520 is already above the limit of what we can currently solve (although we might be able to go beyond that with more powerful/parallel computing machinery).
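As a tiny check of the arithmetic (and of the fact that it is the least common multiple that matters here):

```python
from math import lcm  # available from Python 3.9

# The smallest n that is a common multiple of all step sizes 2, ..., 9.
print(lcm(*range(2, 10)))  # 2520
```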
One downside of this idea is that we get constraints (and dual variables) for every shifted HAP, giving something like
number of constraints. If instead, we hypothesize that for any
, all shifted HAPs with common difference
and length
should be given the same weight, then we are back to the
constraints we had before.
March 8, 2010 at 2:20 am
A slight extension of the idea above (motivated by an earlier comment by Alec) would be to construct the quadratic form for shifted HAPs over the interval [-n,n]. Now the end-points are symmetric, although there is still an edge effect. Multiples of
appear in roughly the same number of shifted HAPs of common difference
and length
. This number is
except for those multiples of
that are close to the end points. Perhaps one should only consider pairs
such that
is small w.r.t
.
March 7, 2010 at 3:51 pm |
An equivalent formulation of Gil’s field conjecture is this (this only works when we are working with HAPs):
Let F be the field with p elements and let
be nonzero elements in F. Then there is
with
.
Gil’s conjecture for some fixed t is trivially equivalent to this with
.
March 7, 2010 at 4:44 pm
Just to state the obvious, Gil’s conjecture doesn’t involve the multiplicative structure of the field at all — so would make sense for any abelian group, or even any group. What groups might it hold for? Clearly every element must have finite order for the conjecture to stand a chance. That isn’t enough, though: it’s easy to construct counterexamples in
. Can we find a counterexample in a finite group?
March 7, 2010 at 7:43 pm
So what we need now is to prove the relation:
, over a field with p elements, when n is large.
Does it relate to some interesting algebra? Is there a method to tell when a product of many linear terms in n variables vanishes over a field with p elements? Does it ring any other bell?
March 7, 2010 at 8:01 pm
I don’t think that is quite right, because
and
for
gives a product of 1. In other words, the ‘nonzero’ in Sune’s formulation is essential.
March 7, 2010 at 8:57 pm
Doesn’t this start-from-zero formulation fail to guarantee a (proper) HAP with sum 0, since we can’t take
?
March 7, 2010 at 8:31 pm
You are right: we need that
over a field with p elements, when n is large.
March 7, 2010 at 10:12 pm
@obryant: You right.
March 7, 2010 at 10:18 pm
*You -> You are
@Alec: We can’t make the conjecture more general that way, since we need every element to generate the group. If there were an element that didn’t generate the group, we could take the constant sequence equal to that element. Of course we can forget about the multiplicative structure, but that doesn’t help us.
March 8, 2010 at 8:07 am
Sune, you’re right: I was just thinking about the case
of Gil’s conjecture.
March 8, 2010 at 8:32 am
One comment about the polynomial above
, (I hope now it is ok) is that what we want is that when p is small and n is large it gives identically 0. I think we can use the Chevalley-Warning theorem to show that when n is not too large w.r.t. p (and it appears that it will be the threshold of the probabilistic heuristics) then there is a nonzero solution.
March 8, 2010 at 10:12 am
Alec, I see. That might be true in some new cases, but at least we need the group to be finite. Each term is only contained in a finite number of HAPs, so if the group is infinite, it will always be possible to add another term at the end of a finite sequence, without creating a HAP with sum 0.
March 8, 2010 at 12:34 pm
Well, on further thought I do not see how Chevalley’s theorem (which requires more variables than the degree) is applicable to any interesting case here.
March 8, 2010 at 1:36 pm
Sune: indeed. And actually the multiplicative structure in
does make it more interesting. For one thing, it reduces the problem to two cases:
and
(since as Kevin points out, all
cases are equivalent).
In the
case there is the possibility to consider multiplicative sequences as a special case, interesting in its own right.
However, I think that for EDP it is enough to prove the conjecture for
.
March 8, 2010 at 11:13 pm
Here’s the MathWorld entry for Chevalley’s theorem.
March 9, 2010 at 7:44 am
Let me write
for Kevin’s
. If my calculations are correct, we have:
(This last search is still ongoing; I’m not hopeful of getting a definite value by brute force.)
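For anyone who wants to reproduce this kind of search, here is a rough brute-force sketch under my reading of Sune’s formulation (a sequence of nonzero elements of Z/p should always contain a HAP whose sum is 0 mod p); the function name and the cut-off nmax are mine, and the search is exponential, so it is only usable for very small p.

```python
from itertools import product

def longest_without_zero_hap(p, nmax=12):
    """Longest n (capped at nmax) for which some sequence of nonzero elements
    of Z/p has no HAP d, 2d, ..., kd whose sum is 0 mod p."""
    best = 0
    for n in range(1, nmax + 1):
        found = False
        for seq in product(range(1, p), repeat=n):
            ok = True
            for d in range(1, n + 1):
                s = 0
                for k in range(1, n // d + 1):
                    s = (s + seq[k * d - 1]) % p
                    if s == 0:
                        ok = False
                        break
                if not ok:
                    break
            if ok:
                found = True
                break
        if found:
            best = n
        else:
            break   # a valid longer sequence would restrict to a valid shorter one
    return best

print(longest_without_zero_hap(2), longest_without_zero_hap(3))
```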
March 7, 2010 at 7:53 pm |
We have some experiments underway to try out the exponential interval smoothing proposed here. We should have some data to report soon.
I am also intrigued by the exponential “HAP length smoothing” that Tim proposed here. Basically, you build in the exponential decay
that we have observed experimentally and get the SDP to figure out the values for
. For every
, this is a smooth version of taking the average discrepancy for HAPs involving elements of the sequence up to $1/\alpha$. I should say that Huy pointed out earlier that while the tails
seem to fall off exponentially, the
are not monotone – in fact there seem to be some patterns there too, but for now we are assuming (hoping) that these patterns are possibly a lower-order feature of the SDP solutions that is not critical to getting a good lower bound. Trying out Tim’s HAP length smoothing idea will help build confidence in this.
We do have some data from a related idea that Tim and Gil put forth, namely to take the average square discrepancy over all HAPs, giving equal weight to every HAP involving sequence elements up to
. Those SDP solutions are described here, but we haven’t had a chance to look through them to see if the
values have nice patterns.
March 7, 2010 at 8:37 pm |
I’ve had another look at the
values data on this page, concentrating on the case
. I thought I’d try to ignore any boundary effects, and would therefore ignore very small numbers. One way of doing that seemed to be to look at multiples of 6, starting at 24 (the point at which the values cease to be tiny).
This offers the usual mix of tantalizing hints that something’s going on and counterexamples to any hypothesis one might actually come up with. It definitely seems as though
likes to be large at smooth numbers, so it is tempting to guess that it is (typically) given by a formula of the form
. Any reasonable such formula would, one might naively think, involve a monotonic function
. However, this is incompatible with the data: at 42, 66 and 78 the values decrease, suggesting that
is a decreasing function, and at 102 (that is, 6 times 17) it increases again.
I’m hoping this is another anomaly that results from edge effects and will disappear if one looks at a smoother version of the problem. Indeed, I’m hoping that the best choice for
is indeed of the given form, perhaps multiplied by an exponential decay in
. Indeed, for the smoothed problem, I might conjecture (not on the basis of strong evidence) a formula of the form
for some nice monotonic function
(such as a power of
).
March 7, 2010 at 10:11 pm |
To exploit the captive audience here, I’ll shamelessly give another plug for using the terminology “friable” for a number with only small prime factors, rather than “smooth” as in the second paragraph of Tim’s post: to support this preference, notice how many times the word “smooth” is used in the rest of the post, with meanings having nothing to do with prime factors!
March 8, 2010 at 10:33 am |
Here is a guess about what the coefficients
might be (or rather, might be closely related to).
I mentioned a couple of comments ago that it looks as though the formula for
might involve an expression such as
for some nice function
. (I say “involve” because I would also expect some kind of decay term as well.) I decided to think about what the constraints on the
would tell us about
. In particular, I was thinking about the constraint that we want the sum of the
to be infinite but the sum along non-multiples of
to be finite for any
.
Let’s pretend we already know
and know that it is multiplicative. What would that tell us about the sum of the
? Well, we’d get
As for the sum over all odd
, this is the same except that
has to be odd, so the sums on the right hand side end up being over all odd
. Morally speaking, we end up with a sum of a quarter of the size of the original sum.
Assuming that
decreases smoothly, this “morally” can be turned into a rigorous approximate statement. So we learn that
can’t decrease smoothly after all.
However, we have managed to penalize the odd numbers, and more generally non-multiples of
have been penalized by
. This suggests that we could iterate the procedure.
I’m now going to start with
for every
and do the iteration. I don’t care whether things tend to infinity (they will) but I want the sum over multiples of
to be much larger than the sum over non-multiples of
, so that I can then multiply by some decay factor and get sums over HAPs to diverge and sums over complements of HAPs to converge.
The first step is to get
. The next stage is to get
which is the number of ordered triples
such that
. (We’ve seen this before and it’s usually called
.) The sum of
over odd
is the sum over all odd
which gives us a penalty of
.
Jumping ahead, I propose stopping at a fixed point, which is the function
defined as the number of ordered
-tuples
(for any
) of integers such that
. Unfortunately, this is infinite, since we can just tack on 1s the whole time. But we get something nicer if we count the closely related quantity where each
has to be greater than 1.
This is equivalent to counting the number of chains
such that
,
and
for every
.
If we sum this function up to
, then we get the number of ordered tuples of integers greater than 1 with product at most
. And if we restrict to odd numbers, we’ll get a penalty factor that tends to zero with
. I haven’t yet worked out exactly what it is.
I also haven’t checked whether a function like this could satisfy the other constraints, but I feel as though it might be in the right ball park.
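The candidate fixed-point function is easy to compute (it is the number of ordered factorizations into parts greater than 1, equivalently the number of divisor chains from 1 up to n). Here is a short sketch that also estimates the “penalty” for restricting to odd numbers; the cut-off is arbitrary.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ordered_factorizations(n):
    """Number of ordered tuples of integers > 1 with product n (the empty
    tuple is counted for n = 1); equivalently, the number of chains
    1 = n_0 < n_1 < ... < n_k = n in which each term divides the next."""
    if n == 1:
        return 1
    return sum(ordered_factorizations(n // d)
               for d in range(2, n + 1) if n % d == 0)

print([ordered_factorizations(n) for n in range(1, 25)])

# Crude look at the penalty for non-multiples of 2: ratio of the sum over odd
# numbers to the sum over all numbers up to N.
N = 2000
total = sum(ordered_factorizations(n) for n in range(1, N + 1))
odd = sum(ordered_factorizations(n) for n in range(1, N + 1, 2))
print(odd / total)
```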
March 8, 2010 at 10:54 am
It occurs to me that what I’ve just been calculating is something like the principal eigenvector of the matrix whose
th row is the characteristic function of the HAP with common difference
. No time to check exactly what is going on.
March 8, 2010 at 11:11 am
Actually, I had a quick check and it was easy. At least formally speaking it’s this.
Take the matrix whose
th column is the HAP of common difference
. This is lower triangular with 1s down the diagonal. Remove the diagonal part, except for the 1 in the top left hand corner. Then
is an eigenvector of the resulting matrix … I think.
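A quick numerical check of this eigenvector claim, following my reading of the construction (columns indexed by the common difference; the ordered-factorization function from the previous sketch is repeated here so the snippet runs on its own):

```python
import numpy as np
from functools import lru_cache

@lru_cache(maxsize=None)
def f(n):
    # Ordered factorizations into parts > 1, as in the previous sketch.
    return 1 if n == 1 else sum(f(n // d) for d in range(2, n + 1) if n % d == 0)

N = 40
# Column d is the indicator of the multiples of d, so the matrix is lower
# triangular with 1s down the diagonal.
M = np.array([[1.0 if m % d == 0 else 0.0 for d in range(1, N + 1)]
              for m in range(1, N + 1)])
Mp = M - np.eye(N)
Mp[0, 0] = 1.0          # keep the 1 in the top left-hand corner

v = np.array([f(m) for m in range(1, N + 1)], dtype=float)
print(np.allclose(Mp @ v, v))   # True: v is fixed by the modified matrix
```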
March 8, 2010 at 10:16 pm |
Here is some data from the experiments with exponential interval smoothing proposed here. That was relatively easy to do because it only involved a minor change in the SDP. (This does not implement Tim’s later suggestion of smoothing the beginning of the interval.) I solved the problem for n=512 (with half-life 100, 200, 300, 400, 500, 1000) and n=1024 (with half-life 200, 500, 1000). The solutions are here:
http://webspace.princeton.edu/users/moses/EDP/try2/
Briefly, the idea was that the solution should be a gracefully evolving function of the half-life parameter, and for sufficiently large n (as a function of the half-life) the solution ought to be a close approximation to the solution for infinite n. Here are the values of the solutions for various combinations of n and half-life:
512, 100: 0.3545
512, 200: 0.4119
512, 300: 0.4363
512, 400: 0.4510
512, 500: 0.4616
512, 1000: 0.5510
1024, 200: 0.4119
1024, 500: 0.4616
1024, 1000: 0.5537
At this level of precision, it looks like the solution value has settled for half-life 500 at n=512. Of the solutions we have, I would say the solutions to look at are 512 (500) and 1024 (500 and 1000). The files dual*.txt in the directory above give the values of
and tails
.
Some initial observations are that the
values drop off very rapidly. It doesn’t look like the tails
drop off exponentially – I’m not quite sure what the trend is. It might make sense to examine individual
values instead.
March 8, 2010 at 10:32 pm |
Another piece of data, from the SDP to minimize drift over the interval [-n,n], suggested here. I’ve assumed that for any
, all shifted HAPs
of common difference
and length
have equal weight and hence, a common coefficient
. Also, I’ve assumed that
. As before, the SDP dual tries to maximize
subject to the constraint
. (Note that
does not have a multiplier in this summation.) The SDP considers pairs
such that $kd \leq n$, and so far I have solved the SDP for n=100 and n=200. (These SDPs are harder to solve because the constraints are dense.)
The solutions are here:
http://webspace.princeton.edu/users/moses/EDP/try3/
Look at the dual solutions dual*.txt. For each
, the last two items on the line are the product
and the factorization of
. I think the solutions look very interesting. In particular, the
values look like they are either very close to 0 or bounded away from 0. If we could figure out a rule to classify them into these two categories, that would be a promising start.
March 8, 2010 at 11:53 pm
It seems useful to group the
values with the same product
. For each such product, it seems that at most one or two
values are bounded away from zero.
There may be some irregularities in the data for large
(i.e. close to
). Ideally I would have liked to solve the SDP with $\max kd$ much smaller than
, but given that I was only able to solve for
so far, I used pairs
with product all the way up to
.
March 9, 2010 at 12:23 am
Here, in case it makes life easier for anyone, is a list of all the
for which
isn’t more or less zero, done for each
up to 17. Recall that the product of
and
does not exceed 200.
d=1: 4,11,13,22,23,29,32,77,79,103,183,195
d=2: 2,4,13,23,84,93,99
d=3: 2,11,13,23,29,32,46,62,65
d=4: 2,4,13,45
d=5: 2,4,13,16,33,39
d=6: 2,4,7,11,13
d=7: 2,4,7,11,13
d=8: 2,4,7,11,13
d=9: 2,8,12,13,21
d=10: 2,4,7,13,16
d=11: 2,4,11,12,13,18
d=12: 2,4,11,16
d=13: 2,7,8,11,14
d=14: 2,4,7,12,13
d=15: 2,4,8,10,11
d=16: 2,4,8,10,11
d=17: 4,7,9,10,11
These pairs certainly don’t look arbitrary, but it’s also far from clear what is going on.
March 9, 2010 at 5:19 am
If you want to look through the data, the file
http://webspace.princeton.edu/users/moses/EDP/try3/cgroup200g.txt
contains the
values from the solution for
, grouped by
. The format is
…
…
…
c …
… factorization of
March 9, 2010 at 12:45 am |
I wanted to make a general point, in response to the seeming irregularities in the data. I have suggested that perhaps they result from sharp cutoffs, but it is not absolutely obvious that smoothing the cutoffs should suddenly result in beautiful functions with structure that jumps out at us.
Consider, for example, the values
as
runs through the primes. If the
were given by a formula that was a product of a decay term and something arithmetical, we would expect a graph of
against
to be pleasant: probably a function that decays in a regular looking way, or perhaps one that starts by increasing but then decays. One would not expect discontinuities, or that some primes would be favoured over others (with no obvious congruence explanations). However, discontinuities and favouritism seem to abound when one actually looks at the data.
One possible explanation for this is that the
depend quite a bit on “random” features of the primes, such as whether a prime belongs to a cluster such as 101,103,107,109, or whether it is fairly isolated. Indeed, given what it is like to try to construct long sequences with low discrepancy, it seems highly plausible that this should be the case. If it is the case, then it’s possible that there is no hope of actually working out a formula for the
, however cleverly one smooths the problem.
Another possibility is that one would get greater regularity by not allowing HAPs with small common differences. But it looks unlikely that we would be able to get any useful data on this if
is too large, as the large HAPs almost certainly don’t kick in for a very long time.
If all this is the case, then we may want to think about making guesses that are inspired by the figures we have without actually trying to match them exactly.
I’m not saying this is definitely the case though. One possibility is that if we use a more number-theoretic smoothing such as
we will suddenly get something more number-theoretic coming out of the other end, despite not having the optimal matrix.
March 9, 2010 at 6:00 am
Yes, I agree that our goal should not be to fit the observed solutions exactly, but rather to extrapolate some nice patterns that they exhibit. We could also impose on them patterns that we think they ought to exhibit (such as your guess about what the
values should be).
It is possible that we may be able to obtain solutions for much larger instances than the ones we currently have. Klas was going to try to get an SDP solver running on a parallel machine setup and Kartik mentioned he has some experience in the past with this. We also have an impressive high performance computing setup at Princeton, but exploiting this is probably well beyond my programming skills.
March 9, 2010 at 2:14 pm
Actually I now have some test jobs in the machine queue, with n up to 4096.
Getting the parallel version set up was a bit trickier than expected, but after some discussion with our support staff that could be solved. I have tested the program for small n and only 8 processors and that looked fine. The machine should be capable of solving really large cases, but I’m not the only user so for large jobs there will be quite a bit of queuing time.
This is the machine which I’m using for this http://www.hpc2n.umu.se/resources/Akka/
March 10, 2010 at 9:55 am
The first test case ran last night and everything went well. n=1024 took less than half an hour using 32 processors. I hope the next few cases will start today and bring some information beyond what we already had.
March 9, 2010 at 1:15 am |
It occurs to me that I wouldn’t mind knowing what the sequence
such that the ratio of
to
is minimized looks like. My guess is that this would be a pretty easy computational task, and I think it might give us some interesting structure. It would also be interesting to see how near/far the extremal sequence was from being a
sequence.
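If the ratio in question is the sum of the squared HAP sums divided by the sum of the squares of the entries, then this really is an easy computation: it is a Rayleigh quotient, so the minimizer is the eigenvector for the smallest eigenvalue of the HAP quadratic form. A sketch (the size n is arbitrary):

```python
import numpy as np

def hap_form(n):
    """Q with x^T Q x = sum over HAPs {d, 2d, ..., kd}, kd <= n, of
    (x_d + x_{2d} + ... + x_{kd})^2."""
    Q = np.zeros((n, n))
    for d in range(1, n + 1):
        v = np.zeros(n)
        for k in range(1, n // d + 1):
            v[k * d - 1] = 1.0
            Q += np.outer(v, v)
    return Q

n = 60
vals, vecs = np.linalg.eigh(hap_form(n))
x = vecs[:, 0]                       # minimizes (x^T Q x) / (x^T x)
print("minimum ratio:", vals[0])
print("signs of the minimizer:", np.sign(x).astype(int))
```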
March 9, 2010 at 1:33 am
The SDP actually produces unit vectors
that minimize this expression. You can either view the (primal) SDP solution as the Gram matrix of a set of unit vectors, or equivalently, as a covariance matrix of a set of
unit normal random variables. The question is: what is the best way to view/think of this
matrix?
March 9, 2010 at 1:24 am |
Tim suggested here that perhaps
. We can actually build this into the SDP dual, i.e. force the dual to have this form. I think this is equivalent to using the constraint that for every
, the average value of
is 1. This is a weaker constraint than saying that
for all
. Is it conceivable that the HAP discrepancy of length
sequences (of reals, say) grows unboundedly even under this weaker constraint? The weaker constraint rules out the problematic sequence 1,-1,0,1,-1,0,…
March 9, 2010 at 7:36 am
That’s very interesting! But I do not see, even when you fix n, that having average sum of squares = 1 for every HAP does not imply that the individual
are 1.
More generally, relaxing the condition on the x_i’s in order to force some structure on the dual problem is a nice general idea. Some possible suggestions:
1) We can require that the sums of squares average to 1 for HAPs whose period divides n. We can even take n=m! and require only that the sums of squares along HAPs with periods 1,2,…,m average to 1.
2) In a different direction, we can just assume that the x_i^2 are bounded below by 1.
3) Or that their averages on HAPs (or some selected HAPs) are bounded below by 1.
(Minimizing the discrepancy forces the individual x_i’s to be bounded from above, so we do not need to assume it.)
March 9, 2010 at 2:21 pm
I think forcing the average of
to be 1 does force all the
to be 1, simply because one is solving some simultaneous equations in the
that have a unique solution (because the corresponding matrix is upper triangular with no zeros on the diagonal). So one would probably have to ask for all the averages to be at least 1, or restrict the sizes of the common differences, or something like that.
March 10, 2010 at 4:29 am
Yes, forcing equality constraints for multiples of
with
is too strong. I like the idea of doing this for
that are small w.r.t
, or as Gil suggests, for
that divide
. (This might be better in terms of controlling edge effects – but this is unavoidable to some extent, since all HAPs start at one end point.) Also, I think that using an inequality constraint is better than using a hard equality constraint. Like Gil, I don’t think this ought to change the nature of the problem. But this results in the corresponding dual variables being non-negative instead of being sign-unrestricted, and I think we might have a better shot at interpreting non-negative values. In fact, we have been doing this already with the dual solutions we have constructed. Most of the SDPs we have solved used the constraints
instead of
.
March 9, 2010 at 9:38 am |
Here’s a probabilistic model it might be interesting to think about.
Let
be some large integer and let
. (It might be that choosing
would be OK, but I’m not sure.) For each prime
, randomly choose nested subsets
of
, where
is maximal such that
, and do so in such a way that the cardinality of
is
. (Or, if it makes things much easier, choose
by randomly picking elements of
with probability
and keep going until you reach the empty set.) For every integer
we then let
be the set
.
We then think of the set
as a randomized analogue of the set of multiples of
, and of the sets
as randomized analogues of HAPs.
The obvious general question, which could be thought about either experimentally or theoretically, is this: how similar is this randomized model to what you get with actual HAPs? An even more obvious specific question is whether EDP is true in this set-up. That is, given a
sequence, must it have unbounded discrepancy on randomized HAPs?
I think this question might turn out to be easier than EDP, but might also give us some useful insight into EDP because the structure of the set system is very similar. It’s conceivable that it might also give smoother experimental data, because we wouldn’t have any obvious analogue of clusters of primes.
March 9, 2010 at 10:07 am
Interesting! As a detail, I wonder if it might be more natural to take all initial segments of
, rather than all
, as the HAP analogue (since this would give about the right number of HAPs).
March 9, 2010 at 10:43 am
Oops, that’s what I meant to do and I absent-mindedly messed up the definition. I’ve now changed it.
I also realize that I definitely prefer the model where you don’t fix the cardinality of
but rather you choose elements independently at random from
with probability
. That way for each
the set
has its elements chosen independently with probability
. I haven’t quite checked but I think it also means that if
then the joint distribution of
and
is that you randomly choose elements of
with probability
and to make
you randomly choose elements inside
with probability
. I’m sure lots of other statements like that are true. In other words, with this model a lot of probabilistic statements become quite neat.
March 9, 2010 at 11:18 am
Alec, it’s probably already occurred to you, but it’s only just dawning on me that this question (is EDP true for randomized HAPs?) is closely connected with the kinds of things you were asking about bounded partial sums. We would be asking for the partial sums to be bounded along every
. We know quite well what the probability is for any individual
, but how, for instance, would the events for
and
correlate? And what is the probability that the event for
holds for every
?
That last question is also quite interesting in the usual HAPs case. That is, if I take a random
sequence, what is the probability that the partial sums will be bounded along the multiples of any power of 2? We know that the Morse sequence has this property, but how rare is the property?
Let me write
for the event that the sums along the randomized set
are bounded. My hunch is that the events
should behave rather like independent random variables. Let me try to justify that heuristically.
Suppose we know that the partial sums along
are bounded and want to estimate the probability that, given that, the partial sums along
are bounded. I’ll regard the sequence
as given and
as random. So we know that the
are very well balanced between 1s and -1s. Therefore, when we choose
, it is rather similar to choosing a random sequence of 1s and -1s. So it looks as though the problem is rather similar to that of simply choosing a random sequence of length roughly
and asking for the probability that the partial sums are bounded.
Moreover, one would expect that if
holds, then the restriction of the sequence to
will be fairly typical, so that if we look at the event
then we can argue in a similar way.
It also occurs to me that this model may have features that are genuinely different from EDP. For example, it seems to me perfectly possible that “randomized prime EDP” could be true. That is, it could be the case that with high probability every sequence has unbounded discrepancy on some
with
prime.
The reason this could be the case is that if the event
has probability something like
and the events are independent, then the conjunction of the events has probability
, so the expected number of sequences for which they all hold is less than 1 when
is sufficiently large.
March 9, 2010 at 9:54 pm
Let me first point out that this model is very similar to this one, but there is one important difference: that we only look at a finite number of primes at a time. I think it is an important difference (improvement), since otherwise each number belongs to infinitely many
s because
. I’m not sure exactly why this could be a problem, but I think there is a good chance that it is.
I think that the question should be: For a random system of sets
(random in the way Tim described it) does there exist a sequence with bounded discrepancy wrt. this system of sets? That is, we should allow the sequence to depend on the set system. But of course this problem is harder to analyse, because
isn’t well-defined in this question.
Here is a generalisation: Instead of using the probabilities
, we could use some probabilities
, where
is a decreasing sequence over the primes. This way we don’t get a model of EDP over the integers, but of EDP over the pseudo-integers!
March 10, 2010 at 12:01 am
I agree that what one is trying to show is that with high probability there is no sequence with bounded discrepancy on every single set of the form
. But the way I imagined one might try to prove this was by taking an arbitrary fixed sequence and proving that the probability that it has bounded discrepancy on all those sets is much smaller than
.
Thanks for reminding me about the earlier proposals for randomizing EDP — I had forgotten about them.
March 10, 2010 at 10:17 am
Here is another idea for a randomized construction. Instead of constructing the
independently for each prime and forming intersections, can we do a kind of randomized sieve of Eratosthenes to generate ‘random primes’?
That is, let
; given
, let
be the smallest number not in any of them, and let
consist of
together with a random selection of other numbers each with probability
.
I’ve just found out that this is known as the Hawkins random sieve, and an analogue of the Riemann hypothesis holds for it, as well as lots of other nice facts: see for example this paper.
We could then proceed to define
as above, as subsets of
, and then form intersections; or there may be some more general sieve-like construction to get general HAP analogues more directly.
Yet another, very simple, idea would be simply to define
, for any
, by
, and take the initial segments of the
as the HAP analogues.
March 9, 2010 at 5:48 pm |
It occurred to me that although I have persuaded myself that the
should be bigger when
belongs to more HAPs, I have not actually justified that statement to myself, unless you count the rather feeble argument that points that belong to more HAPs should be more important and harder to deal with.
I tried to think of some simpler problem that would still be hard. To simplify things, let me talk about the mean-square discrepancy problem: that is, the problem where we want to show that the average over all
of
is unbounded for
sequences. One way of doing that would be to find a sequence
such that
is unbounded and prove that the average in question is at least
for all sequences. In other words, we have a certain quadratic form and we would like to know not just that it is positive semidefinite (which is trivial as it’s a sum of squares) but that it is positive semidefinite with room to spare — allowing us to subtract a diagonal and still have a positive semidefinite form.
What I have just tried to do is find any non-trivial sequence
such that I can subtract
and preserve positive semidefiniteness. In a way it’s a hopeless task, because as soon as I manage to find a sequence it becomes, almost by definition, trivial. But there are, at least, levels of triviality.
To simplify the discussion, I shall approximate the number of HAPs we are considering by
(since there are about
HAPs with common difference
). So to begin at the bottom level of triviality, since for each
we have the singleton HAP
, we know that the average over all the HAPs is at least
. Indeed, this contribution is so trivial that Moses has simply subtracted it off.
However, we can do a lot better than that while still remaining in the realms of the trivial, as one might expect, since for
sequences we can prove easily that the mean-square discrepancy is at least some positive constant. Indeed, observe (not for the first time) that
. It follows that for any sequence
we have
is at least
(in the case where
is even) and also at least
. Averaging the two we find that it is at least
.
If we now sum this over all HAPs, we find that a lower bound for the sum is
, where
. Dividing through by
gives us
.
Now what is
? It’s the sum over all
of the number of multiples of
less than or equal to
. In other words, it is precisely the quantity that we approximated by
. So we end up with a bound of
. (I’m fairly sure that Moses made exactly this point in one of his early comments.) Or rather, the bound is more like
.
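Since the displayed formulas in this derivation did not survive, here is my reconstruction of what I take to be the intended elementary estimate; treat it as a guess at the lost details rather than a transcription.

```latex
% My reconstruction (a guess, not a transcription) of the trivial 1/4 bound.
% Fix a common difference d and write S_k = x_d + x_{2d} + \cdots + x_{kd},
% the sum along the HAP of length k.  For a \pm 1 sequence,
\[
  1 = x_{kd}^2 = (S_k - S_{k-1})^2 \le 2S_{k-1}^2 + 2S_k^2,
  \qquad\text{so}\qquad S_{k-1}^2 + S_k^2 \ge \tfrac12 .
\]
% Pairing consecutive values of k gives
\[
  \sum_{k \le n/d} S_k^2 \;\ge\; \frac{1}{4}\Big\lfloor \frac{n}{d}\Big\rfloor + O(1),
\]
% and summing over d and dividing by H(n) = \sum_{d \le n} \lfloor n/d \rfloor
% (the number of HAPs) gives a mean-square discrepancy of at least about 1/4.
```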
This, it seems to me, is the trivial bound one would like to beat. So a useful exercise would be to find any rigorous proof that we can subtract off some diagonal quadratic form
with sum greater than
and maintain positive semidefiniteness. Probably getting a constant greater than 1/4 is already challenging enough to give useful insights into the problem (unless there is a slightly less trivial triviality I have not taken account of — I’ll think about that now).
What’s truly trivial about the above approach is that for each
it subtracts off a diagonal form from
and keeps all the results positive semidefinite. It is easy to see that such an approach cannot give a good lower bound. Somehow one has to be “daring” and do the subtraction in such a way that the resulting form inextricably mixes HAPs with different common differences and does not allow the form to decompose into separate positive semidefinite forms for each common difference. It is for this reason that I think that to make progress beyond
will require a genuine improvement in understanding.
One final comment is that it is interesting that even in this trivial case we see that the
are favouring numbers with more factors.
March 9, 2010 at 8:32 pm
I’ve convinced myself that one can in fact improve on the constant
using, if not a trivial argument, at least what one might think of as a disappointingly elementary one. But I think it may not be a complete waste of time as it may give another hint about how to come up with an SDP proof.
The idea is this. We are trying to improve on the rewriting of
as
Let’s just concentrate on that expression when
. We see that equality can occur (leaving aside what happens right at the end), since we can take
,
,
,
, etc. However, this best possible sequence for the HAPs with common difference 1 is terrible for HAPs with common difference 2, and that reflects itself in the fact that if we make this choice for
then the rewritten sum for
, which contains terms like
plus diagonal terms, will be strictly bigger than its diagonal part when the non-diagonal terms for
are all zero.
This means that if we put together the non-diagonal terms for
and
we will have something positive definite, so we can subtract a further diagonal contribution. I haven’t done the calculations, but I’m 99% certain that this allows us to improve on the contribution from
and
by a constant factor. If we now split up all the different
into pairs of the form
and play the same game for every pair, we should improve on the constant 1/4.
One could imagine then trying to continue to put different differences together and attempt to get something unbounded. However, getting the extra contributions to diverge would probably be quite challenging.
March 9, 2010 at 11:44 pm
I’ve now worked a little harder, and can provide an actual proof that the constant is bigger than
. Let’s continue to focus on the differences
and
. I want to group together the various terms such as
and
and squeeze more out of the diagonal. The preliminary stage is to write the sum of the
terms as
We also do the same for the even terms. I’m not going to bother to specify what happens when you get to the other end.
Now let’s leave out the diagonal parts. Let
be an even number. Let
and let
. Then I want to take the terms
up to
together with the terms
and
together and try to find a diagonal expression that bounds it from below.
To simplify the notation, observe that we are trying to find a lower bound for an expression of the following form:
Now using the identity
, we see that
is equal to
Similarly,
is at least
and
is at least
. So the sum of the
terms can be represented as a sum of squares plus
plus
. (We could probably do better than this — I am not trying to optimize the bound that comes out of this sort of argument since a computer could do that rather better.) Now
is equal to
, so we can gain an extra
on the diagonal.
We needed to use six terms to get that gain, which I think means that the eventual gain to the answer will be only
. So I think we get an improvement from
to
. (But as I said, we could squeeze more out here.)
The reason I am interested in this is that it genuinely mixes two different common differences, in this case with one equal to twice the other. Perhaps the next case to look at would be
and
combined. Again, one would like a slight improvement on the trivial bound that comes from treating each difference separately.
An important remark is that eventually it is not enough to consider just pairs of differences, since we know, for example, that it is easy to get bounded discrepancy on all HAPs with odd common difference. So the kind of parity argument that’s implicit in the calculation I’ve just done is an essential feature of the problem.
March 10, 2010 at 7:13 am |
Interesting. Continuing along the same lines for
and
combined, one should be able to squeeze out additional diagonal terms by grouping together terms from Tim’s rewriting of each individual HAP thus:
Now it is not possible for this sum of squares expression to be 0 if
For this to happen,
Also
This implies
. If
, then this sum of squares expression must be positive. So for some constant
, it should be possible to subtract
from this expression to obtain something positive semidefinite.
March 10, 2010 at 10:01 am |
A few remarks about this.
1. If one were going to try to produce a proof along these lines, one would have to add to the constant something not much less than
when taking account of HAPs of common difference
, even after having squeezed plenty out already using smaller HAPs. This doesn’t look very possible.
2. However, that was assuming that the contributions from the HAPs decay reasonably smoothly, but there is no reason for this to be the case.
3. It is also assuming that the best thing to do is greedy in the following sense: you squeeze as much as you can out of the diagonal, and then you introduce a new HAP and squeeze as much extra out as you can, and then introduce a new HAP, and so on. But even just looking at
and
I get the impression that that is not the right thing to do. For instance, the reason I got only
and
in the calculation above was that I had to use the term
in the derivation of both terms. If I had held back a little with
, it might have paid off twofold when I came to deal with the extra contribution with
. If that is the case, then understanding it is probably quite important when it comes to understanding why numbers with lots of common factors are particularly important.
4. A problem with this is that the calculations look as though they might get messier and messier as you take account of more and more HAPs. So at some point it might be necessary to pass from precise formulae for everything to a vaguer description of what is going on.
5. I looked above at “bunches” of terms and did a small trick to forget about the initial parts of the HAPs. But effectively that is just estimating the average discrepancy by looking at the average drift on small intervals. So we know (as is obvious anyway) that these intervals will have to get rapidly larger if we want to make progress. In practice, however, we can more or less forget about them: whatever we can prove about the first interval we’ll probably be able to prove about subsequent intervals, more or less anyway, by applying the trick.
6. Something it might be worth trying to do is to take a bunch of odd common differences and try to do the rewriting for them. We know that a sequence such as
will have small discrepancy, but perhaps we will find that we can do the rewriting in such a way that we end up with a bunch of terms that can be simultaneously zero only for that sequence. (An example of such a bunch of terms is the one we started with, namely
,
,
, and so on, which will be simultaneously zero only for multiples of
.) So we would have a choice: either we disappoint a lot of HAPs with odd common difference, or we have very rapid growth along the HAP with common difference 2. That’s an oversimplification, since we could also take the sequence 1,1,-1,-1,1,1,-1,-1,… but we might be able to get a handle on the prime-or-power-of-2 version of EDP somehow.
7. One can regard 6 as a bit like the original task except that now, instead of trying to squeeze out diagonal terms, we try to squeeze out terms of the form
that are putting pressure on the sequence to alternate. But we’d probably want to squeeze out not just those but also terms like
or perhaps things like
— anyhow, things that try to force periodicity with period a power of 2.
March 10, 2010 at 1:05 pm
Further remarks.
8. Here is a very simple demonstration of the obvious fact that if we rewrite a positive semidefinite quadratic form as a sum of a diagonal form and another form that is positive semidefinite and that vanishes for some sequence that is never zero (which implies that we cannot subtract any more from the diagonal without losing positive semidefiniteness), then we may still be able to improve the sum of the diagonal coefficients. This is obvious because the whole point of the SDP is to optimize the sum of the coefficients, which wouldn’t be an interesting task if you always got the same answer. A simple example that illustrates it is the quadratic form
. We could try working out the Cholesky decomposition, in which case we would write this as
, which puts it in the required form. Or we could do the same in reverse and get
. Or we could go the more balanced approach of writing
, which gives a sum of 2 to the diagonal coefficients and is therefore better. It’s easy to prove that that is best possible: we need somehow to deal with the
term. For that we need some combination of terms of the form
. Each individual term takes care of
of the total and subtracts
from the sum of the diagonal coefficients. But the ratio
is minimized when
.
Suppose now that we saw the decomposition
. What would alert us to the fact that we could improve it? The answer is that we would say to ourselves that the term
cannot be best possible if there is a positive
term around, because the coefficients of
and
are not equal, so it’s better to increase the coefficient of
by a small amount and reduce the coefficient of
by a larger amount (or more to the point, reduce the square of that coefficient by more than you increase the square of the coefficient of
).
Note that if we were looking at the version of the problem where we do not insist that all the diagonal terms are positive, then we wouldn’t even need the positive
term to run this argument. We could just take the term
and argue that it would have been better to take
.
9. In general, we have a quadratic form and we want to produce the off-diagonal terms as efficiently as possible — that is, we want to find a sum of squares that contributes as little as possible to the diagonal terms. Maybe we can think in advance what counts as efficient and then apply an intelligent greedy algorithm rather than a stupid one. To make that a bit clearer, what I mean is that there is a simple algorithm for writing the quadratic form as the sum of a diagonal part and a part that cannot be improved (because there exists a sequence that takes the value zero nowhere and yet maps to zero when you apply that part of the form — I’ve gone back to insisting that the diagonal coefficients are positive). However, there is a lot of flexibility in how one applies the algorithm, and a good choice might lead to a much better result than a bad choice.
10. For example, perhaps one could apply a Cholesky decomposition after all, but take the variables in a better order than order of size.
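Returning to the two-variable example in remark 8: the “best diagonal” can be found mechanically, and here is a minimal cvxpy sketch of that optimization. The 2x2 matrix below is an illustrative stand-in of my own choosing, since the formulas of the original example were lost.

```python
import numpy as np
import cvxpy as cp

# Subtract the largest possible diagonal from a small quadratic form while
# keeping the remainder positive semidefinite.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

b = cp.Variable(2)
prob = cp.Problem(cp.Maximize(cp.sum(b)), [A - cp.diag(b) >> 0])
prob.solve(solver=cp.SCS)
print(prob.value)   # 2.0: the balanced choice b = (1, 1) beats the lopsided ones
print(b.value)
```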
March 10, 2010 at 3:33 pm |
I want to pursue remark 10 just a little. Let’s take
and see what we can do with the quadratic form
If we expand this out we get
Now I want to apply the Cholesky decomposition, but I want to think a little bit about which order to take the variables out. Bearing in mind that I am trying to get rid of all the off-diagonal terms using linear forms with coefficients that are reasonably equal, a fairly natural tactic seems to be to start with the smallest diagonal term. Also, whenever we have two variables that do not form a cross term together, we can’t pair them in a bracket unless we use negative coefficients inside brackets and aim for cancellation. (This raises a question: does the best alternative splitting up use negative coefficients? At least in some examples it almost certainly does.) Anyhow, since 5 is prime,
pairs up less with the other variables than
, so let’s start by getting rid of
. That means we take a term
and subtract it off. Actually, I am now realizing the obvious fact that the diagonal term I am going to end up with will just involve one variable, which is not good at all. So really I should be stealing a little bit of
before I start and hoping for the best. (This is very similar to what Moses suggested right at the start.) If I do that, then it seems quite natural to remove
and then apply the first stage of the Cholesky decomposition. If I do that, then I end up subtracting
Hmm … that looks substantially worse, actually, since I’ve removed more from the other diagonal terms than I’ve gained with this
. So I’ll go back to the first idea and reserve the option to depart from Cholesky later. Subtracting the first expression I get the quadratic form
For convenience, let us multiply that by 3, getting
I think I’m going to abort this calculation as it’s getting rather unpleasant and isn’t obviously going to lead to a useful insight. Instead, I want to pursue a different idea.
March 10, 2010 at 3:46 pm |
The idea I want to think about is this. Suppose we have a representation of our quadratic form as a sum of squares. I’m wondering (and this may well be known) whether one can solve the SDP problem by means of a succession of small improvements. If so, then maybe we could get a good idea of what the optimal solution is like by using the fact that none of these small “local” improvements is possible. I’m hoping that that could give us a much more precise idea about why
tends to be large at numbers with more common factors, why it is often very small, and so on.
March 10, 2010 at 7:04 pm
The difficulty is that it isn’t possible to change just one or two terms at a time if those terms contain more than just one or two variables. Here instead is a general criterion for the representation of the matrix of a quadratic form in
variables as
to be best possible (where
is diagonal and
may as well be upper triangular).
It tells us that if
is an upper triangular matrix such that
is diagonal, then the trace of
must be zero. This follows from an easy variational argument: if the trace is non-zero, then add a very small multiple
to
. We have
plus a lower-order term. If the trace of
is non-zero, then choosing the sign of
appropriately, we find that, up to first order,
gives us the same quadratic form, except on the diagonal where the sum is smaller. So we can add the deficit to
and we have an improved representation.
If you try to solve the equations that arise when you look for an upper triangular matrix
such that
is diagonal, you find (i) that it is reasonably easy, and (ii) that you are free to choose the diagonal of
as you go along. That is, you end up with an
-dimensional space of solutions. (I haven’t checked this statement carefully, but I think it is a good approximation of the truth.) Therefore, the fact that the trace of
is zero for all such matrices seems to be telling us something reasonably strong about
. (Note that the trace of
is a well-known object, given by
.) However, I have not got my head around what it is saying.
I’m not actually aiming to understand the best possible representation, for reasons discussed earlier, but I hope that by looking at this variational problem it may be possible to say that certain representations are clearly not best, and thereby perform the algorithm more intelligently. (I think of the algorithm as “diagonalizing without worrying about the diagonal”. That is, given a symmetric matrix
, instead of the usual process of finding
such that
, one finds
such that
is diagonal. Also, there is no need to insist that
is a square matrix.)
March 10, 2010 at 9:39 pm
I think you are converging towards some sort of primal-dual algorithm for solving this SDP, but hopefully one that we might be able to reason about given that we have a specific semidefinite program that we care about.
Let’s interpret what’s going on. Let
be the matrix corresponding to the quadratic form we obtained by plugging in a particular guess for
. Having guessed these values, our goal now is to prove that
. Note that the
are vector analogs of the
variables
in the original formulation of EDP. Now let’s write this new problem as a semidefinite program, using the matrix
to denote the matrix of dot products of the vectors
:
Here the notation
denotes that
is positive semidefinite (psd). Let’s call this program the primal SDP. The dual of this is the following:
is large, subject to the constraints
It is this dual semidefinite program that we are focussing on: how do you subtract the largest possible diagonal term from a quadratic form so that the remainder is positive semidefinite ? By semidefinite programming duality, the optimum values of the primal and dual are exactly the same. This gives us a way to certify that a particular dual solution is best possible: we simply exhibit a primal solution of the same value – this serves as a certificate of optimality. If we are not able to do this, then somehow this ought to give us an idea of how the dual solution can be improved. Very roughly speaking, this is the idea behind primal-dual methods. There are additional conditions we can infer about the structure of the optimal primal-dual pair of solutions (the so called complementary slackness conditions) and these further restrict the kind of primal solutions certificate we should look for.
At a high level, it looks like this is what you are proposing to do. You also point out that we can construct the dual solution in a series of steps, each increasing the sum of diagonal entries subtracted out, and we never need to backtrack and explicitly undo what we did before (if we allow the dual variables to be negative, which we can). At any point in time, if we succeed in constructing a primal certificate for the remainder quadratic form that we have so far, then we terminate and declare success; otherwise (hopefully) we have a way to improve the current solution.
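To make the primal/dual pair concrete, here is a minimal sketch in Python with cvxpy (a sketch only: the matrix A below is a random positive semidefinite stand-in rather than the actual HAP-based quadratic form, and an SDP-capable solver such as SCS is assumed to be installed).

import cvxpy as cp
import numpy as np

# A small symmetric PSD matrix standing in for the quadratic form; in the
# actual problem A would be built from the HAP data and the coefficients.
n = 6
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
A = B @ B.T

# Primal SDP: minimize <A, X> over PSD matrices X with unit diagonal.
X = cp.Variable((n, n), PSD=True)
primal = cp.Problem(cp.Minimize(cp.trace(A @ X)), [cp.diag(X) == 1])
primal.solve()

# Dual SDP: subtract as large a diagonal as possible while staying PSD.
b = cp.Variable(n)
dual = cp.Problem(cp.Maximize(cp.sum(b)), [A - cp.diag(b) >> 0])
dual.solve()

print(primal.value, dual.value)   # strong duality: the two values agree

Strong duality is what makes the two printed values agree, which is exactly the certification idea described above.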
March 11, 2010 at 1:27 am
I’m sure you’re right that if my line of thought leads anywhere, it will be to a standard technique, and I definitely agree that what matters is not so much the abstract ideas but how they play out in the particular case we’re looking at. I still don’t feel I have a good account of why it will end up favouring numbers with many common factors. Obviously I can see why such numbers are relevant, but what I’d like is to describe a way of squeezing little extra bits out of the diagonal that favours them. Such a method would, I think, have to mix different
s to a much greater extent than what we have done so far.
One thought, which I think Gil will agree with, is that we might get some useful information by thinking about linear algebra. We already know that if we have a sum of the form
, where the
are linear forms and
is shorthand for
, and if the
span
(which is really the dual space) then we can remove some more from the diagonal, since the quadratic form is positive definite rather than just positive semidefinite. Of course, what we actually need is for the characteristic functions of HAPs to do more than merely span the dual space: they have to be “well distributed” in some sense. But perhaps we can do things like picking bases of (characteristic functions of) HAPs and seeing how other HAPs are expanded in terms of those bases. Maybe if we could find a collection of bases that are all “significantly different” in some sense, we would be in good shape.
Another useful thing to do, I think, would be to look at what the annihilators are of certain subsets of HAPs. For example, I presume that the annihilator of the set of all HAPs of odd common difference and even length is precisely the sequence 1,-1,1,-1,… and its multiples. In fact, that’s more or less trivial. Some time ago, Gil and I were speculating that this kind of linear algebra fact should point towards looser facts that replace “adds up to zero” by “is bounded”. I now wonder whether looking at this SDP may be a way of making this thought precise.
March 11, 2010 at 9:09 am
This seems closely related to an earlier line of thought that you were pursuing here. Repeating a few things from that post, say
is the matrix of the quadratic form, and
. You were looking for a matrix
such that
(where
is a diagonal matrix). Since
is infinitesimal, this gives the condition
. At that point, you were looking for upper triangular matrices
and
, but let’s remove this condition. Huy pointed out that there is a direct way to construct a matrix
(not a square matrix) that satisfies
from the known sum of squares representation of the quadratic form corresponding to
. By known representation, I mean

To repeat again,
has
columns and the rows of V correspond to the HAPs. The row corresponding to the HAP d, 2d, …, kd is
times the incidence vector for the HAP.
Now we are looking for
with the same dimensions as
such that
. Let
denote the
th column of
(similarly for
). Then this condition says that
for
. Given that the coordinates of
are determined by the membership of
in each of the HAPs, we should be able to figure out some reasonable choices for
.
One possible choice is to set
equal to the component of
orthogonal to the space spanned by
.
This satisfies the stronger condition that
for all
. Can we understand what these orthogonal components are, say in the special case that all
?
March 11, 2010 at 10:15 am
I am now in a state of confusion because I have an argument that proves something obviously false and I can’t see what’s wrong with the argument. The statement it proves is that for every positive semidefinite form one can subtract a non-zero positive diagonal form and retain positive semidefiniteness. The “proof” goes as follows.
Let the quadratic form be given by the matrix
. Now follow the suggestion of your last paragraph and choose
to be the component of
orthogonal to the space spanned by the other
. Since the
…
OK, that was useful. I was assuming that the
were linearly independent, but of course they won’t be in general — and it is indeed obvious that if they are then we can subtract off something, since in that case the form is positive definite.
So now let’s scale down our ambition and instead of trying to find a contradiction in ZFC we’ll just continue the argument and see what it gives. For each
that is not contained in the linear span of the other
it gives us a space of possible choices for
, and thereby allows us to put any number we like in the
th place of
. This will lead to an improvement to the matrix
unless …
OK, what the argument seems to show is that every
must be in the linear span of the other
, a stronger property than linear dependence, but weaker than the kind of quantitative property one would ultimately need.
March 11, 2010 at 10:56 am |
I want to write a few things down that may well turn out to be equivalent to things that have already been said. But even if that is true, it may be a useful reformulation of the problem.
Let me start by thinking about the following general problem. You have a bunch of vectors
in
and you would like to show that for every
there exists
such that
. How can you do it?
In some cases, it is easy. For example, if the
are just the standard basis
, then we know that
, from which it follows immediately that we can take
. But what if we are presented with some other set of vectors that isn’t necessarily an orthonormal basis?
One method (and this feels very similar to SDP) is to construct something that I remember from my Banach-spaces days: a representation of the identity. That means that we write the
identity matrix
as a positive linear combination
. (If you don’t want to think about tensor products, then regard
as notation for the matrix that takes the value
in the
th place.) Note that
. Therefore, from this representation of the identity we can deduce that
This allows us to take
Funnily enough, I had this idea many years ago as a way of trying to solve EDP, but then noticed that the 1, -1, 0, 1, -1, 0, … example meant that no useful inequality of this kind could be proved if the
were (multiples of) characteristic functions of HAPs. However, what I did not notice is that if we applied a diagonal transformation to
then we could in principle get rid of this irritating obstacle.
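Here is a tiny numerical illustration of the inequality that a representation of the identity gives; the three-vector tight frame in the plane is just a stand-in example and has nothing to do with HAPs.

import numpy as np

# Three unit vectors at 120 degrees form a tight frame for R^2:
# (2/3) * sum_i v_i v_i^T = I, i.e. a "representation of the identity".
angles = np.array([0, 2 * np.pi / 3, 4 * np.pi / 3])
V = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # rows are the v_i
lam = np.full(3, 2 / 3)

I_rep = sum(l * np.outer(v, v) for l, v in zip(lam, V))
assert np.allclose(I_rep, np.eye(2))

# For any x:  ||x||^2 = sum_i lam_i <v_i, x>^2  <=  (sum_i lam_i) * max_i <v_i, x>^2,
# so max_i |<v_i, x>| >= ||x|| / sqrt(sum_i lam_i).
x = np.random.randn(2)
lhs = np.max(np.abs(V @ x))
rhs = np.linalg.norm(x) / np.sqrt(lam.sum())
print(lhs >= rhs - 1e-12)   # True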
Now for the main idea of what I wanted to say. As a matter of fact, it feels not quite the same as SDP, though it may be that what is different is just at the computational level rather than the theoretical level. Instead of aiming for a representation of the identity, let us aim for a representation of a diagonal transformation. That is, we shall try to find a diagonal matrix
that we can write in the form
. We would like the trace of
to be big compared with the sum of the
.
An immediate objection might seem to be that if the
are non-negative and the
are characteristic functions of HAPs, or multiples thereof, then everything in sight is positive and there is no chance of getting rid of off-diagonal terms. However, we are not forced to use characteristic functions of HAPs. For example, suppose I take a vector that’s given by a string of r 1s followed by a string of r -1s followed by a string of r 1s, and so on. Call this vector
. Let
be a
sequence and suppose that
. Then by averaging, there must be a block inside which the sum of the
is at least
. Therefore, if
we may deduce that there is an interval in which the drift of
is at least
, from which it follows that some partial sum is at least
.
The point of using such a vector
is that
would have many negative terms off the diagonal, and therefore would be useful for the purposes of getting rid of the off-diagonal part.
I now need to do some calculations before I’ll be able to formulate precisely what would be required of a representation of a diagonal map
for it to prove EDP. But the basic idea is that each
would have to be normalized so that
gives the same lower bound for the discrepancy, and it would be with respect to those normalizations that one would want to make the coefficients
have as small a sum as possible.
Note that in this formulation it is extremely natural to expect that the diagonal terms will be bigger for numbers with many factors, simply because if
is made out of some HAP with common difference
it will affect the diagonal only at multiples of
.
March 11, 2010 at 12:19 pm
I now have a clearer picture of what is needed. Let me call
a test vector if
implies that the discrepancy of
on some HAP is at least
. Obviously plus or minus the characteristic function of any HAP is a test vector, and so is any convex combination of such functions. It is not hard to prove that these are in fact the only test vectors. So we can prove a lower bound of
for EDP if we can show that for every
sequence
there is a test vector
such that
.
Now suppose that we have a diagonal matrix
with entries
, and suppose that we can represent it as
, where each
is a test vector (that is, a linear combination of characteristic functions of HAPs with the absolute values of the coefficients summing to 1) and the
are positive. Then for every
we have
from which it follows that there exists
such that
is at least
In particular, if
is a
sequence, we find that its discrepancy is at least 
Thus, our task is to find a representation as above with the ratio of the sum of the
to the sum of the
being unbounded.
Now given the
and the
there is a simple expression for the sum of the
: the trace of
is simply
, so the trace of
, which we know equals
, is
. So our task is to produce a diagonal matrix out of tensors
such that the
are built out of characteristic functions of HAPs and have, on average, reasonably large
norms.
In the next comment I’ll show how to represent a diagonal map uneconomically, just to give an idea of what this task entails.
March 11, 2010 at 12:27 pm
All I want to do here is a trivial calculation that just uses HAPs with common difference 1. I know that I can represent the unit vector
as the difference between the interval from 1 to i and the interval from 1 to i-1. Therefore,
is a test vector. But the identity is equal to
. So we’ve managed to get
as our sum of coefficients, while the trace is
. Therefore, we have proved a lower bound of
for the discrepancy. This is, of course, very similar to the trivial lower bound for the SDP dual problem, reinforcing my view that I am not doing anything fundamentally new here (but I still find this an easier way to think about what is going on).
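For anyone who wants to see the trivial calculation spelled out, here is a short numpy check (with n = 10 chosen arbitrarily); it confirms that the identity is the combination of the interval-difference test vectors with coefficients 4, which yields only a constant lower bound (1/2 with this normalization).

import numpy as np

n = 10
# chi[i] = characteristic vector of the HAP {1, 2, ..., i} with common difference 1
chi = np.tril(np.ones((n + 1, n)), k=-1)   # row i is the indicator of [1, i]

# test vectors psi_i = (1/2) * (chi_i - chi_{i-1}); the coefficients of the two
# HAPs have absolute values summing to 1, as required of a test vector
psi = 0.5 * (chi[1:] - chi[:-1])

coeffs = np.full(n, 4.0)                    # identity = sum_i 4 * psi_i psi_i^T
D = sum(c * np.outer(p, p) for c, p in zip(coeffs, psi))
assert np.allclose(D, np.eye(n))

bound = np.sqrt(np.trace(D) / coeffs.sum())  # sqrt(n / 4n) = 1/2
print(bound)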
The reason we got a bad bound is that our test vectors had
norms equal to
, whereas what we actually need is that their average norms (where by this I mean a weighted average with weights given by the
) should be unbounded. This is somehow saying that we need to achieve the cancellation of the off-diagonal parts in a “clever” way, with quite a lot all cancelling at once, rather than taking the easy way out and using vectors with small support.
March 11, 2010 at 2:55 pm
I’d like to discuss the problem in probabilistic language for a bit. So let us impose the condition that the
are non-negative (which I asked for before) and sum to 1. What we are then asking for is a probability distribution on the set of all test vectors with the property that if you pick a random test vector
according to this distribution, then the expected value of
will be 0 whenever
, while the expected value of
will be at least
.
Let us consider some initial attempts. Here, instead of trying to produce a diagonal matrix and finding that its trace is too small, I shall try to get a large trace but will not succeed in making the matrix diagonal. (But I hope that I shall make progress in that direction.)
Let
be the class of test vectors that sum to zero and are formed out of two HAPs. An example of such a vector (when
is a multiple of 6) is 2/5 times the characteristic function of
minus 3/5 times the characteristic function of
. This is the sequence
For this particular vector, there is a significant positive correlation between its entries at places that differ by a multiple of 6 (not to mention various other correlations), but there is also a fair amount of negative correlation.
This illustrates a difficulty we are likely to have: if two numbers have plenty of common factors, then test vectors that use HAPs with common differences equal to those factors will have a tendency to correlate at those two numbers. So this positive correlation will have to be compensated for by some negative correlation from elsewhere.
Another point is that the sum of all the correlations will be negative (since the autocorrelations on the diagonal are trivially positive and the sum of all the correlations is zero if the sum of the coordinates of the test vector is zero). Since we are aiming to produce a diagonal matrix with positive entries down the diagonal, we will in fact need to use at least some test vectors with non-zero sums of coordinates, and for the others we will be looking for negative correlations off the diagonal.
How can we minimize the effect of the positive correlations discussed in the previous paragraph but one? Here is a possible approach, though one that is guaranteed in advance not to work because I shall just consider HAPs that go all the way up to
. Let us build test vectors as follows. We first choose some random numbers
between 1 and
, for some carefully chosen
. Let
be the characteristic function of the HAP consisting of all multiples of
. We then take the linear combination
, where the
are randomly chosen signs.
Let’s fix the
and think about the correlations when we choose the random
. In other words, let
be the above test vector and let us consider its distribution when we choose
randomly. If
and
are two integers, then usually they will be divisible by only very few of the
, and unless they have an extremely large common factor they will tend to be divisible by almost disjoint sets of the
. This implies that there will be very little correlation between their values. (Note that the value of
is the sum over all
that divide
of
, so the expectation of
is the number of
such that
.)
What we produce by this means is a matrix
that is not diagonal, but it has a sort of quasi-diagonal property:
is much larger when
is large than when
is small. (If it were zero except when
, then it would be diagonal.) One might perhaps hope that by pursuing this kind of idea, one could get rid of almost all the off-diagonal part, being left mainly with
with
large. And then one could perhaps repeat this game but restricting attention to multiples of various products of
, and perhaps one could get all the numbers to work out in the end.
One problem with what I’ve just suggested is that the correlations seem to be positive. I don’t quite understand that, because if we ensure (by means of some coefficients) that their coordinates sum to zero, then we know that the off-diagonal terms are on average negative.
I’m deliberately putting forward only half thought out ideas here. The way I imagine something like this working is that one would try to use clever methods to get the matrix approximately diagonal in some sense, and then crude methods to get rid of the error. The hope would be that if the error was small enough, then the crude methods would not be too expensive (meaning that one wouldn’t use too large a sum of coefficients for too small a gain in the trace).
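As a rough numerical companion to the construction above, here is a Monte Carlo sketch that estimates the correlations E[u_x u_y] for random test vectors built from full HAPs with random common differences and random signs (the parameters are arbitrary and the normalization of the coefficients is omitted).

import numpy as np

rng = np.random.default_rng(1)
n, m, k, trials = 60, 10, 5, 20000

def random_vector():
    # pick k random common differences in [1, m] and random signs, and combine
    # the corresponding "full" HAPs (all multiples up to n)
    ds = rng.integers(1, m + 1, size=k)
    eps = rng.choice([-1.0, 1.0], size=k)
    u = np.zeros(n)
    for d, e in zip(ds, eps):
        u[d - 1::d] += e
    return u

# Monte Carlo estimate of the correlation matrix E[u u^T]
C = np.zeros((n, n))
for _ in range(trials):
    u = random_vector()
    C += np.outer(u, u)
C /= trials

print(np.round(np.diag(C)[:12], 2))   # roughly k * (#divisors of x up to m) / m
print(np.round(C[5, 11], 2), np.round(C[6, 10], 2))   # (6,12) vs (7,11): bigger when gcd is large

In this particular construction the off-diagonal estimates come out non-negative, consistent with the worry in the penultimate paragraph, since nothing here forces the coordinates to sum to zero.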
March 11, 2010 at 6:08 pm
Let me quickly try to explain how the fact that I was considering only “full” HAPs (that is, ones of the form
) causes the approach to fail. (I don’t mean that it completely fails — just that one would have to modify it by using shorter HAPs as well.)
The problem is that if we consider the top right and bottom left quarters of the matrix, we see that we are not managing to deal with them. Let me argue for that rather roughly. We can think of a full HAP as consisting of two identical halves (the roughness of the argument being that they are not in fact identical). Therefore, the matrix we eventually produce will look a bit like a 2-by-2 block matrix with four identical blocks. But this means that whatever we see down the diagonal, we also see a sort of shadow of it going down the diagonals of the other blocks (the shadow being because the matrices were not actually identical).
As a matter of fact, building test vectors out of small-diameter HAPs may help with the other difficulties as well, by introducing a further element of randomness and therefore more opportunities for cancellation.
March 11, 2010 at 11:50 am |
The first of the larger cases, N=2048, of the original semidefinite program has now finished. I have created a directory where I will post the solutions, and have a link to it from my data page: http://abel.math.umu.se/~klasm/Data/EDP/
The n=2048 solution is already there, so Moses can now process it with his programs.
March 11, 2010 at 3:47 pm
Now the solution for N=2500 is in place.
March 11, 2010 at 7:19 pm
Many thanks for these. Could you remind me, and possibly others, how to interpret the data? In particular, what is the meaning of the initial 1 1, 2 1 or 2 3? (I know these appeared on Moses’s data too.) Are the 2 3 rows the b values? (They don’t look like it, because they are very small at a lot of smooth numbers.)
March 11, 2010 at 9:01 pm
Thanks Klas. This should be very useful. Tim, we’ll have these files processed to produce output like the ones we’ve seen for the SDPs before. I’ve been away from my machine and wasn’t able to do this earlier.
March 12, 2010 at 5:57 am
I’ve added files containing the dual values from the solutions Klas generated. The data is here:
http://webspace.princeton.edu/users/moses/EDP/try0/
In the files dual*.txt, lines have the following format
…. d(n) …. factorization(n)
‘b’ …. n ….
‘t’ …. k …. d ….
The files b*-sorted.txt contain the
values in decreasing order.
The files sum-ckd-*.txt contain the $\tau_{2,d} = \sum_{k \geq 2} c_{k,d}$ values.
Klas, can you tell me briefly what you needed to do to get the SDP solver code compiled and running? Eventually, do you execute it using a command-line interface? I was hoping to get some help to run it on some parallel machines here, but it would help to know what the major steps are.
March 12, 2010 at 9:08 am
Now N=3000 has finished as well. This case used 512 processor cores for about 21 hours.
The running times have increased quite fast so I don’t think I will solve any larger cases until we are done with the data from the three which have already finished. The machine has over 5000 cores so going up to 8192 would probably be possible, but that will use up a lot of CPU time and take quite some time to get through the queue.
Moses, the main things to do were to identify the correct linking options for scalapack, lapack, and blas. I also had to remove some of the include statements for the parallel program, as the calls were redundant. In order to use scalapack you might have to load some modules into your run-time environment; this varies depending on the set-up at the cluster you are using.
The final program is executed via a program called mpirun, which, on a cluster with a batch queue, is called from a batch script which you submit to the queue.
March 12, 2010 at 10:49 am
Thanks again Klas. We’re exploring alternate SDP formulations to force the dual solution to be “nicer” in some way, but we haven’t quite settled on a good way to do this yet. When we do, it would be good to harness your cluster again to solve the new SDPs. I’ve added the data from your solution to the 3000 size problem here:
http://webspace.princeton.edu/users/moses/EDP/try0/
March 12, 2010 at 10:15 pm
These data strongly support the hypothesis of logarithmic growth of
. A linear regression of
against
gives:
with a standard error of
.
March 11, 2010 at 11:41 pm |
Let me say a little more about what the probabilistic heuristic (PH) predicts for EDP and for various variations of the problem, and some thoughts about how this heuristic can be used for upper bounds (namely, for probabilistic constructions). I will give several examples where we can expect the PH to be correct and a few where we know it is incorrect, in that there are sequences with lower discrepancy than the prediction. I do not know of cases where the prediction is wrong in the other direction.
1) Conjecture (*): Let
be non-zero elements in a field with p elements. If n is sufficiently large w.r.t. p then every number t can be written as a sum along an HAP.
(A weaker interesting version due to Sue: every t can be written as a sum along an interval of an HAP.)
The number of sequences is
. The probability for an HAP to have the value t is roughly 1/p. If these probabilities were independent then the probability that every HAP sums to a value other than t is
. This is smaller than
when p is roughly
.
So the prediction says that
(A) if p is much smaller than log n / log log n then the assertion of Conjecture (*) holds;
(B) if p is much larger than log n / log log n then the assertion of Conjecture (*) fails.
The probabilistic heuristic does not take into account that we are talking about sequences of non-zero entries rather than arbitrary sequences. But for arbitrary sequences, of course, we can consider 1 -1 0 1 -1 0 1 -1 0 … So the prediction is incorrect in this version of the problem.
Overall, part (B) seems more robust than part (A).
2) The original EDP.
It is not a priori clear that the heuristic will give the same estimate for arbitrary sequences and for completely multiplicative sequences but (See Alec’s comment https://gowers.wordpress.com/2010/03/02/edp10-a-new-and-very-promising-approach/#comment-6540 ) this looks to be the case (at least roughly). The computation by Douglas Zare of the probability of a sequence of +- 1 of length n to have partial sums bounded between T and -T may be useful for fine tuning of the PH for the original EDP. (See http://mathoverflow.net/questions/16892/the-probability-for-a-sequence-to-have-small-partial-sums )
Actually I am not sure what the PH gives for the original EDP (but I think it is log n).
3) Maximum spacing and average spacing between zeroes
Initially I tried to work not with the discrepancy but with the spacing between zeroes. So we replace “maximum discrepancy” in EDP with “maximum spacing between zeroes”. We can also consider the average spacing between zeroes.
It looks as though the prediction from the PH will be that there is a sequence of +-1s of length n so that the spacings between 0s are below t (or on average below t) if t is larger than
, but not if t is smaller than
.
4) Prime power differences
If we care only about HAPs with prime power differences then the PH prediction for the discrepancy is roughly log log n.
5) Prime differences
If we care only about HAPs with prime differences then the PH prediction is also log log n. We know that in this case there are sequences of bounded discrepancy, so the heuristic is incorrect.
6) Square-free numbers
We can ask EDP when we forget about integers which are not square-free. The PH will give the same prediction for the discrepancy as for the original problem. For this problem we do not know any examples with discrepancy in the log n range.
7) Very restricted periods.
Suppose that n = m! and you consider HAPs only with periods 1, 2, …, m; then the PH threshold for the discrepancy is log m. The same applies to the analogue of Conjecture (*) that we started with. I mention it because somehow this looks like a case where the polynomial method or linear algebra methods might be more feasible.
Here are two general problems:
Problem 1 : Are there variants of the problem for which the PH predictions are too low?
Problem 2: Can we identify (also heuristically) variants of the problem where the PH prediction should be correct?
Related probabilistic constructions
I believe that the heuristic is more robust in the upper bound direction. While we saw a few cases where the PH gives an answer that is too large, I am not aware of examples where the heuristic gives answers that are too small. I have some ideas (and so did Alec) about how to try to use this heuristic for constructions of sequences with low (log-ish) discrepancy (also for the variations where it is not known). Probably I should think more about it and write separately. I suspect that if you have an algorithm tuned towards unrealistic targets (e.g. a greedy algorithm for minimizing discrepancy) then you get random behavior, because your optimization is not really relevant. The hope is to succeed with a probabilistic process tuned towards a realistic goal.
March 12, 2010 at 10:05 am |
A few more words about the polynomial approach.
We look at the polynomial
and we want to show that it is identically 0 over a field with p elements when p is fixed and n is large.
Let’s look at one of the terms
When we expand it to monomials the coefficients have a combinatorial meaning in terms of permutations and inversions.
Given a permutation
we can ask how many inversions the number i contributes to. This is a number between 1 and i.
The coefficient of
is the number of permutations where there are
integers contributing j inversions. Let’s refer to
as the inversion pattern. (I am not sure what the standard term is in the theory of permutation statistics.)
When we look at our original polynomial and expand it to monomials we have contributions which correspond to various permutations on [n/d] elements for all ds (or all the periods d we want to consider).
So if we have a good understanding of the number of permutations with a given inversion pattern modulo p (say modulo 7) this can be helpful for us.
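For very small cases one can just compute these numbers by brute force. Here is a sketch that expands the product (x_1)(x_1+x_2)...(x_1+...+x_m), which (if I am reading the term above correctly) is the relevant shape after relabelling, and reduces the coefficients modulo p; it enumerates all m! choices, so it is only feasible for tiny m.

from collections import Counter
from itertools import product

def expansion_mod_p(m, p):
    """Coefficients, mod p, of prod_{i=1}^m (x_1 + x_2 + ... + x_i).

    Brute force: each factor i contributes one variable x_j with j <= i,
    so we enumerate all such choices and tally the exponent vectors.
    """
    counts = Counter()
    for choice in product(*[range(1, i + 1) for i in range(1, m + 1)]):
        exponents = [0] * m
        for j in choice:
            exponents[j - 1] += 1
        counts[tuple(exponents)] += 1
    return {e: c % p for e, c in counts.items()}

coeffs = expansion_mod_p(5, 7)
zero_mod_7 = sum(1 for c in coeffs.values() if c == 0)
print(len(coeffs), zero_mod_7)   # how many exponent patterns occur, and how many vanish mod 7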
March 12, 2010 at 10:25 am |
Klas, did your search for long discrepancy-2 sequences eventually finish, or is it still running, or did you decide to discontinue it?
March 12, 2010 at 11:44 am
It was running until a few days ago when I needed that machine for something else and paused it.
Actually you might be able to finish the search using one of your search programs. There are five partial assignments with the property that if none of them can be extended to n=2016 then there are no discrepancy-2 sequences with n=2016.
The partial assignments are here: http://abel.math.umu.se/~klasm/Data/EDP/stubs-2016
Note that these are not necessarily assignments to consecutive integers, and that they are not multiplicative.
March 13, 2010 at 8:12 am
OK, thanks. But 2016 sounds rather ambitious. Is it the case that any discrepancy-2 sequence of length 1125 must agree with one of these stubs?
March 13, 2010 at 9:01 am
No, some of the values in the stubs probably depend on the behaviour at integers larger than 1125.
March 12, 2010 at 10:36 am |
I want to pursue the line of thought begun in this comment. I am still not sure whether it is a genuinely different approach from SDP or whether it can be seen to be equivalent, but at least I am now clearer in my mind about exactly what that question amounts to. At the moment, I am very tentatively of the opinion that it is a different idea, but my mind could in principle be changed: all one would have to do is give a method for converting a representation-of-diagonal proof into a dual-SDP proof and vice versa. Also, I need to think about what strengthening of EDP the representation-of-diagonal proof is equivalent to.
The general problem is to construct a quadratic form that has two properties: (i) it is larger than some diagonal form; (ii) if it is large for a sequence
, then that sequence has large discrepancy.
The SDP approach to this task is as follows. You build your quadratic form as a convex combination of terms of the form
, and then you show that it can be rewritten as
, where
is positive semidefinite and
is unbounded.
The representation-of-diagonal approach is to build the quadratic form as a convex combination of linear forms determined by what I called test vectors above, which are themselves convex combinations of plus or minus characteristic functions of HAPs. The main property of a test vector
is that if
then the discrepancy of
is at least
on at least one of the HAPs you used to build
. We then try to choose test vectors
and non-negative coefficients
that sum to 1, such that the quadratic form
is diagonal and has diagonal entries with unbounded sum. The matrix of this quadratic form is
and the sum of the diagonal entries is equal to
.
Thus, in one case one tries to write a large diagonal form as
, where
is the form that cannot be large unless the discrepancy is large somewhere, and
is positive semidefinite. In the other case, one tries to write a large diagonal form as a convex combination of “test forms” that have negative as well as positive coefficients, and it is the negative coefficients that lead to the cancellation of the off-diagonal terms.
At the moment, that feels to me like two completely different things to be trying to do. But perhaps one is dual to the other or something like that. I’ll think about it right now.
March 12, 2010 at 11:20 am
Tim, I haven’t thought through this carefully, but it seems to me that your representation-of-diagonal approach is a spectral decomposition of the matrix corresponding to the positive semidefinite form
(where
is defined in your comment above). Let
be the matrix corresponding to
, i.e.
. Then
where
are the eigenvalues of
and
are the corresponding eigenvectors. This looks a lot like your decomposition.
I’m not quite sure why the sum of eigenvalues should be 1 (it should be related to the fact that
), and I’m not sure why the eigenvectors
should be convex combinations of plus or minus characteristic functions of HAPs.
March 12, 2010 at 11:34 am
Hmm … what I said doesn’t make sense. You are actually representing a diagonal matrix
, not the matrix corresponding to
.
March 12, 2010 at 11:42 am
Also, I don’t mind using many more than
vectors in the representation.
March 12, 2010 at 11:49 am |
I’m still trying to understand your approach. Is it true that the lower bound works not only when
, but also when
is positive semidefinite ?
March 12, 2010 at 12:28 pm
I think so, so let me attempt to prove it rigorously, just to be absolutely sure.
The
have the property that if
then the discrepancy of
on some HAP is at least
. So if
is positive semidefinite, then we can apply it to a
sequence (that is, multiply on the left by
and on the right by
) and we will find that
is at least as big as the trace of
(by positive semidefiniteness of the difference). It follows that the discrepancy of the sequence is at least the square root of this trace, as usual.
Put like this, it makes it look as though all the approach is doing is replacing characteristic functions of HAPs with convex combinations of plus or minus these characteristic functions. But this extra flexibility raises the possibility that one might be able to prove semidefiniteness of the difference by showing that the difference is actually zero. I’m still not sure whether that is attempting to prove something far too strong.
March 12, 2010 at 7:31 pm
So if you do allow the positive semidefinite generalization, this seems to give a more general lower bound than the SDP dual. Having thought about it briefly, I don’t see a way to get a semidefinite program to generate the best such bound. The problem seems to be dealing with the coefficients
(a test vector is of the form
where
is the characteristic vector for a HAP, and
.)
On the other hand, if you insist that you get a diagonal matrix exactly, my guess is that the best such bound is incomparable with the SDP dual. Allowing the positive semidefinite generalization may allow you to handle situations where you have almost got the non-diagonal entries zero, but not quite.
March 12, 2010 at 12:11 pm |
The representation-of-diagonal approach involves expressing a diagonal matrix with certain properties as a convex combination of matrices with certain other properties. Therefore, if it is not possible, the Hahn-Banach theorem tells us that we can find some kind of separating functional that demonstrates this. (Other people have different ways of expressing this point, but this is the language I feel most comfortable with.) One would expect the nonexistence of such a functional to be saying that some kind of discrepancy statement holds. I’ve tried to work out what the statement is in this case, but haven’t managed to work it into anything neat. (What I’d like is for it to be a natural and interesting strengthening of EDP.)
Anyhow, we would like to prove that there is a diagonal matrix
with entries that sum to
that can be written as a convex combination of matrices
where
is a test vector. If it can’t then there is a matrix
such that
for every diagonal matrix
with entries that sum to
and
for every test vector
. The first condition tells us that
is constant on the diagonal and that that constant is greater than
. The second condition is saying that
for every test vector
.
So the failure of the representation-of-diagonal approach is equivalent to saying that for any matrix
that equals 1 down the diagonal there exists a test vector
such that
. That is, we can find characteristic functions
of HAPs and coefficients
with
such that
.
That looks vaguely like a discrepancy statement, but I’m still puzzled by it. We can replace
by
, so we may assume that it is symmetric. Therefore, we can express
in the form
. Since
, we can in this way introduce discrepancies of functions into the picture, but I haven’t quite pursued that thought and reached a nice discrepancy statement.
March 13, 2010 at 12:47 am
I think I’ve sorted it out now, and it seems that the approach using a representation of a diagonal matrix implies the Hilbert space version of EDP. Indeed, if we can’t represent a diagonal matrix in the way we want, then, as explained in the comment just above, we can find a symmetric matrix
with 1s down the diagonal such that
for every test vector
. Moreover, that is an equivalence. So if this method of proof works, then for every symmetric matrix
with 1s down the diagonal there exists a test vector
such that
. Now
is a convex combination of plus or minus characteristic functions of HAPs. Therefore, by averaging we can find HAPs
and
such that
. Now since
is symmetric and has 1s down the diagonal, we can find vectors
such that
. (Actually, I’m not sure I see this — I surely need
to be positive definite. But I’m in a hurry so I’ll leave this argument as it is for now.)
But what I can definitely say is that if
is of that form then we can find
and
. That tells us that
, which by Cauchy-Schwarz tells us that the sum of the
in either
or
has norm at least
. So the success of the method implies the vector-valued EDP, but it may prove something a bit more general still. I’ll have to think further.
March 13, 2010 at 8:23 am
To aid my own understanding, let me compare the conditions required for the SDP argument to work and those required for the representation of a diagonal matrix to work:
In order to prove a lower bound of
on discrepancy via the SDP argument, we need the following property: for any positive semidefinite matrix
with 1’s on the diagonal, there must be a HAP with corresponding characteristic vector
such that
. This is equivalent to saying that in the vector version of EDP (where elements of the sequence are unit vectors instead of
), the square discrepancy of some HAP is at least
. This is by the argument in Tim’s comment above. Since
is psd, we can find vectors
such that
. Then
.
In order to prove a lower bound of
on discrepancy via the representation of a diagonal matrix argument, we need the following property: for any matrix
with 1’s on the diagonal, there must be a test vector
such that
. I don’t think you can assume that
is positive semidefinite, and in that case I don’t have a good interpretation of what this condition is saying.
On the other hand, consider the modification of the representation-of-diagonal argument to allow
to be positive semidefinite (earlier we insisted that this was 0). For a proof of this kind to succeed, I think you need the following: for any psd matrix
with 1’s on the diagonal, there must be a test vector
such that
. If this is the case, as Tim points out, there exist
such that
and the existence of the test vector
implies that
for some characteristic vector
of a HAP. But then,
. Does this mean that instead of test vectors
, we could have used characteristic vectors of HAPs? If this is true, it seems that this proof technique is equivalent in power to the SDP dual argument.
March 13, 2010 at 11:52 am
Here is an interpretation of the representation-of-diagonals approach, conditional on a linear algebra statement that seems plausible to me but that I have not checked. That statement is that if
is any
matrix then one can find vectors
and
such that
for every
. This says I can factor
as
, which starts to look disappointingly trivial as I can take
to be any invertible matrix and then take
to be
.
Never mind the fact that it is trivial — let’s see what it implies. As above, we know that the representation-of-diagonal approach fails if and only if there exists some matrix
with 1s down the diagonal such that
for every test vector
. Therefore it succeeds if and only if for every matrix
with 1s down the diagonal there is a test vector
such that
. As I noted above, that implies the existence of HAPs
and
such that
. (Actually, I forgot the modulus sign earlier, but once it’s there the conclusion is valid.) Now if
, then this tells us that
.
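As a throwaway check of the trivial factorization a couple of paragraphs back: taking the u_i to be the standard basis and the v_j to be the columns of the matrix gives <u_i, v_j> equal to the (i,j) entry, with 1s when i = j.

import numpy as np

# A hypothetical matrix with 1s on the diagonal (other entries arbitrary).
n = 5
A = np.random.randn(n, n)
np.fill_diagonal(A, 1.0)

# Trivial factorization A_{ij} = <u_i, v_j>: u_i the standard basis (U = I),
# v_j the columns of A (V = A).
U = np.eye(n)
V = A.copy()

G = U.T @ V                        # matrix of inner products <u_i, v_j>
print(np.allclose(G, A))           # True
print(np.allclose(np.diag(G), 1))  # <u_i, v_i> = 1 for every i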
Thus, if the method succeeds, then we get the following apparently stronger version of the vector-valued EDP: let
and
be any two sequences of vectors such that
for every
. Then there exist HAPs
and
such that
for some
that tends to infinity with
. I haven’t thought about whether this statement is likely to be true. If we insist that
for each
then we recover the usual vector-valued EDP. Note also that the statement is potentially interesting even for real numbers: if we have two sequences
and
such that
for every
, does it follow that we can find HAPs
and
such that the product of
and
is unbounded in size?
Is this non-symmetric vector-valued version equivalent to the representation-of-diagonals statement? I was expecting to have something to check there, but now that I look back I think it’s trivial, since characteristic functions of HAPs are test vectors.
March 13, 2010 at 12:00 pm
If it turns out to be correct that the representation-of-diagonals statement is equivalent to the non-symmetric vector-valued EDP, and if we can’t find a counterexample to the latter statement (and a very brief think about it in the real-valued case makes me think it has a good chance of being true), then I would like to argue that the representation-of-diagonals approach (rather than its weakening where you allow the difference to be positive semidefinite) is a good thing to think about. I have various reasons for this.
1. Proving that numbers are zero is easier than proving that matrices are positive semidefinite.
2. If we try to represent a diagonal matrix, we don’t have to decide in advance what that matrix should be, so we save ourselves from having to make some huge guess. All we care about is that on average the
norms of the test vectors we use should be unbounded.
3. Trying to prove a stronger statement is often easier than trying to prove a weaker one, because one’s moves are somehow more forced. (There are counterexamples to this — consider van der Waerden’s theorem and Szemerédi’s theorem, for instance — so there is no guarantee that this principle applies in our case, but I have a hunch that it does.)
March 13, 2010 at 4:02 pm
Very interesting question and I agree that getting a diagonal matrix seems easier to do than proving that some matrix is positive semidefinite. Tim, it looks like all you need for your proof to work is to be able to express a diagonal matrix as a convex combination of matrices of the form
and
where
are characteristic vectors of HAPs. If this is correct, this is easier to think about than taking convex combinations of test vectors. Also, the best such convex combination (that maximizes the sum of diagonal entries for an
matrix) can be found by linear programming. The problem involves roughly
variables (corresponding to all pairs of HAPs) and roughly
constraints (corresponding to the entries of the
matrix).
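Here is a sketch of that linear program written with cvxpy and kept deliberately tiny (n = 12); the enumeration of the HAPs and the choice of solver are incidental, and the computations reported below used a dedicated LP solver on much larger instances.

import itertools
import numpy as np
import cvxpy as cp

def hap_vectors(n):
    """Characteristic vectors of all HAPs {d, 2d, ..., kd} contained in [1, n]."""
    haps = []
    for d in range(1, n + 1):
        v = np.zeros(n)
        for m in range(d, n + 1, d):
            v[m - 1] = 1.0
            haps.append(v.copy())   # one HAP for every length k = m/d
    return haps

n = 12
H = hap_vectors(n)
pairs = list(itertools.combinations_with_replacement(range(len(H)), 2))

lam = cp.Variable(len(pairs), nonneg=True)

# M = sum over pairs of HAPs of lam * (P Q^T + Q P^T)/2
M = sum(lam[k] * 0.5 * (np.outer(H[i], H[j]) + np.outer(H[j], H[i]))
        for k, (i, j) in enumerate(pairs))

constraints = [cp.sum(lam) == 1]                 # convex combination
constraints += [M[i, j] == 0                     # off-diagonal entries vanish
                for i in range(n) for j in range(n) if i != j]

prob = cp.Problem(cp.Maximize(cp.trace(M)), constraints)
prob.solve()
print(prob.value)   # largest achievable sum of diagonal entries for this n

With the convex-combination constraint in place, the optimum value is exactly the sum of diagonal entries achieved, which is the quantity we want to grow without bound as n increases.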
March 13, 2010 at 5:09 pm
Ah yes — I don’t know why I was blind to this simpler formulation, which will make it easier to think about. And of course, I’d be very interested to see what the best decomposition is for various values of
. I think even a moderate-sized
such as 250 could be quite informative.
Since the number of comments on this post is approaching 100, I am in the process of writing another post. I am using this opportunity to try to formulate this representation-of-diagonals approach as clearly as possible, and I will also mention there a few thoughts I have about how one might actually go about finding a good representation. But these thoughts could well need to be refined a great deal in the face of the experimental evidence. (Another possibility is that the optimal solution involves an apparently magical cancellation that leaves one none the wiser about how to approach the problem theoretically, but I hope it will give some useful clues.)
March 14, 2010 at 9:04 am
I have some initial LP solutions for this problem. The optimum values for small values of
are as follows (an optimum value of p/q indicates that the convex combination involves multiples of 1/q):
6: 4/13
10: 25/81
12: 7/20
16: 40/110
18: 20/50
50: 0.422825
The solutions produced by the LP solver are here:
http://webspace.princeton.edu/users/moses/EDP/try-lp/
(I am not sure that the optimum solutions are unique).
The values
give the value of the
diagonal entry of the matrix (off-diagonal entries are zero) that the LP expresses as a convex combination of
. Here
is the characteristic vector of the HAP with common difference
and length
.
The lines of the form
give the coefficient of
. I impose the condition (w.l.o.g.) that the coefficient of
is equal to the coefficient of
and output only one of them. Zero coefficients are not output and there are many of them (less than 2% of the coefficients for the
instance are non-zero).
The solver is currently working on
. It’s taking up about 3G of memory for this instance, so it is unlikely that I can solve bigger sizes, but we should be able to get solutions for larger
by running an LP solver on a parallel cluster.
March 14, 2010 at 2:34 pm
Moses, I’m a little confused by these numbers. Looking at the data for
, I see that the
diagonal entry is nonzero. But the only contributions to this are from matrices of the form
; the only such terms here are
and
(and their transposes, which you’ve constrained to have the same coefficients); and these two terms have equal and opposite coefficients, so should cancel out. What have I missed?
March 14, 2010 at 2:37 pm
Ah — I think I see my mistake. The
term doesn’t get added to its transpose!
March 15, 2010 at 8:13 am
Yes, I thought this might be a source of confusion. Perhaps I should have printed out all pairs of HAPs that are used in the convex combination explicitly.
The solution for 100 is finally done – the LP solver took almost 23 hours and more than 170,000 iterations to solve the problem. I’m not going to be able to solve any larger problems on my machine unless we restrict the problem somehow.
The data for 100 is here:
http://webspace.princeton.edu/users/moses/EDP/try-lp/
March 15, 2010 at 9:37 am
Moses, I have two suggestions.
1. Could you also try some additional small sizes, say n=30, 40, 60?
2. There are some values which are quite small for n=50, 100, and fairly small for several other values of n, e.g. d7 and d17. Could these values actually be set to exactly 0 and still give an optimal solution for all n?
Once some analysis of the solutions at hand has been done I’d be happy to run some larger cases as well.
March 15, 2010 at 10:10 am
Klas, I have put up solutions for 30, 40 and 60. I presume the extremely small values can be set to zero and still yield an optimal solution, but I am not sure about this. I don’t know of a good way to force the LP solver to not generate such small values. I decreased the error tolerance of the solver from 1e-6 to 1e-8. That may help. Other than that, one would have to examine the solution and resolve the problem with new constraints setting extremely small values to 0 explicitly. It’s possible to automate this process, but it will require some work.
BTW, there are some small values in the solutions with a double negative sign "--". They should be positive. In fact here the solver returned a very small negative value for a variable that was supposed to be non-negative, so this is a tolerance issue.
March 15, 2010 at 11:04 am
I have a small presentational suggestion, which is that if the numbers are in fact numbers with small denominators, then it might be nice to present them as such rather than as decimals.
I also like the idea of presenting both
and
. I find that when trying to understand what is going on in a particular representation it is slightly inconvenient to have only half of the off-diagonal coefficients displayed.
March 15, 2010 at 11:23 am
Moses, you might want to try GNU's LP solver from the glpk package. That solver has an option which makes it solve a linear program using exact arithmetic. This slows the solver down a lot, but it might give some interesting information, for small n at least.
March 15, 2010 at 11:46 am
We should continue this thread in EDP12 instead of here.
March 12, 2010 at 3:14 pm |
(Continuing the linear algebra approach that Tim mentioned.)
Here is the simplest linear-algebraic question coming from EDP that I can think about: Let A_1, A_2,…,A_m be a partition of {1,2,…,n} into intervals of length at most t.
Consider the system of linear equations
, where we have an equation for every r and j such that
is contained in {1,2,…,n}.
Then we conjecture that this system of equations implies, for large n, that one of the
s equals zero.
It is simple but a sort of strange question. (I suppose we also conjecture that in order to have any non-zero solution the A_i's should be very “structured”, but we don't know precisely what “structured” means. It should be related to the lengths of the intervals being periodic.)
An equivalent way to state it that takes us closer to SDP is that if we add to the equations the inequalities
then the system is never feasible.
Maybe looking at the dual problem can add some insight?
The fact that we have equations for every way we try to partition [n] into short intervals makes it seemingly more difficult than the direct SDP approach.
March 13, 2010 at 10:03 pm |
I have been looking at one problem that came up a while back, which is the following: there are two players, one trying to force a large discrepancy and the other trying to stop it. The players pick primes and assign signs to them, and the discrepancy is that of the multiplicative function determined by the signs of the primes. I have an idea for showing that a large discrepancy can be forced in some cases.
Let me describe the winning strategy if the idea works.
The player trying to force the discrepancy chooses
the lowest available prime and sets it positive.
Let me look at a specific case and show how the discrepancy can be forced. I believe that this idea can be extended to further cases.
The case is that the player who is trying to force the discrepancy moves
first and chooses 1/2; then for some reason the other player does not choose 1/3, so the first player chooses that and both primes are set to 1.
The first step of the proof is to group the prime factors in pairs. Because of the way signs are assigned, they can be arranged in pairs. The first of these pairs will be two and something. Now we can show that the contributions made by all numbers having factors nontrivially in any of the later pairs can be ignored. We note that for any pair p, q and multiple rx, with x having factors only in p and q, we can order r, pr, qr, p^2r, etc. in a way that the series is alternating; starting with a positive number, this will result in a positive or zero contribution to the total discrepancy. So if the numbers having factors in the first pair contribute k to the discrepancy, then we will be done if we can make k arbitrarily large. So the idea is to ignore everything except the first pair. Now we can make the first pair consist of 2 and 7. And then if we look at these we can see the discrepancy will be high, as we can pair up consecutive powers of 7, which will have different signs and which will increase the discrepancy. Anyway, that is roughly the idea that I am looking at now.
March 13, 2010 at 10:14 pm |
If you are looking at a multiplicative function then instead of looking at the quadratic form
$\sum_{k,d}c_{k,d}|x_d+x_{2d}+\dots+x_{kd}|^2-\sum_i b_i x_i^2$
look at the following:
$\sum_{k,d}c_{k,d}|x_1+x_{2}+\dots+x_{k}|^2-\sum_i b_i x_i^2$
The idea is to try to use the information that the function is multiplicative to get a simplification, since it looks as though the number of different types of sums that are squared depends only on the number of terms, and that might be useful.
March 13, 2010 at 11:17 pm |
Apropos nothing much, here is a small observation: the notion of multiplicativity can be generalized to the vector-valued setting even when there is no given algebra structure on the vectors. In the
case we can characterize multiplicativity by saying that the angle between
and
(which is always either 0 or
) depends only on
. This way of phrasing the definition makes just as good sense for unit vectors in a Euclidean space. This may be relevant to Kristal’s proposal to restrict the SDP approach to multiplicative functions.
Another way of thinking about the multiplicativity condition is to say that the map
is an isometric embedding for every
.
March 14, 2010 at 9:23 am
Yes, at some point initially, I had considered imposing these multiplicative constraints in the SDP. With these constraints in place, the discrepancy of any HAP is equal to the discrepancy of the 1-HAP of the same length. Eventually, I did not pursue this because I felt the large number of constraints would make it difficult for SDP solvers to solve the problem for large
. It is possible that the SDP solutions are “nicer” with these constraints. Another interesting thing to check would be whether the optimal SDP solutions (without multiplicativity constraints) are in fact “multiplicative”, i.e.
.
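One quick way to check this on the data we already have would be something like the following (a sketch only: the file name is hypothetical, and solver tolerances mean the defect will never be exactly zero):

import numpy as np

def multiplicativity_defect(X):
    """Largest |X[i,j] - X[d*i, d*j]| over valid i, j, d (1-based indices).

    A quick check of whether a Gram matrix X from the SDP is "multiplicative"
    in the sense that the dot product of x_i and x_j is unchanged when both
    indices are scaled by a common factor d.
    """
    n = X.shape[0]
    worst = 0.0
    for d in range(2, n + 1):
        for i in range(1, n // d + 1):
            for j in range(1, n // d + 1):
                worst = max(worst, abs(X[i - 1, j - 1] - X[d * i - 1, d * j - 1]))
    return worst

# usage: load a Gram matrix produced by the solver and print the defect, e.g.
# X = np.loadtxt("gram-matrix.txt"); print(multiplicativity_defect(X))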