EDP9 — a change of focus

The discussion in the last thread has noticeably moved on to new topics. In particular, multiplicative functions have been much less in the spotlight. Some progress has been made on the question of whether the Fourier transform of a sequence of bounded discrepancy must be very large somewhere, though the question is far from answered, and it is not even clear that the answer is yes. (One might suggest that the answer is trivially yes if EDP is true, but that is to misunderstand the question. An advantage of this question is that there could in theory be a positive answer not just for \pm 1-valued functions but also for [-1,1]-valued functions with L_2 norm at least c>0, say.)

Another question that has been investigated, mostly by Sune, is the question about what happens if one takes another structure (consisting of “pseudointegers”) for which EDP makes sense. The motivation for this is either to find a more general statement that seems to be true or to find a more general statement that seems to be false. In the first case, one would see that certain features of \mathbb{N} were not crucial to the problem, which would decrease the size of the “proof space” in which one was searching (since now one would try to find proofs that did not use these incidental features of \mathbb{N}). In the second case, one would see that certain features of \mathbb{N} were crucial to the problem (since without them the answer would be negative), which would again decrease the size of the proof space. Perhaps the least satisfactory outcome of these investigations would be an example of a system that was very similar to \mathbb{N} where it was possible to prove EDP. For example, perhaps one could find a system of real numbers X that was closed under multiplication and had a counting function very similar to that of \mathbb{N}, but that was very far from closed under addition. That might mean that there were no troublesome additive examples, and one might even be able to prove a more general result (that applied, e.g., to [-1,1]-valued functions). This would be interesting, but the proof, if it worked, would be succeeding by getting rid of the difficulties rather than dealing with them. However, even this would have some bearing on EDP itself, I think, as it would be a strong indication that it was indeed necessary to prove EDP by showing that counterexamples had to have certain properties (such as additive periodicity) and then pressing on from there to a contradiction.

A question I have become interested in is understanding the behaviour of the quadratic form with matrix A_{xy}=\frac{(x,y)}{x+y}. The derivation of this matrix (as something to be interested in in connection with EDP) starts with this comment and is completed in this comment. I wondered what the positive eigenvector would look like, and Ian Martin obliged with some very nice plots of it. Here is a link to where these plots start. It seems to be a function with a number-theoretic formula (that is, with a value at n that strongly depends on the prime factorization of n — as one would of course expect), but we have not yet managed to guess what that formula is.

I now want to try to understand this quadratic form in Fourier space. That is, for any pair of real numbers (\alpha,\beta)\in [0,1)^2 I want to calculate K(\alpha,\beta)=\sum_{x,y}A_{x,y}e(\alpha x-\beta y), and I would then like to try to understand the shape of the kernel K. Now looking back at this comment, one can see that

\displaystyle \langle f,Af\rangle=\sum_d w_d\int_0^\infty|\sum_{n=1}^\infty e^{-\theta n}f(dn)|^2 d\theta.

Since the bilinear form K(\alpha,\beta) is determined by the quadratic form K(\alpha,\alpha) I’ll concentrate on the latter (which in any case is what interests me). So substituting f(n)=e(\alpha n) into the above formula gives me

\displaystyle K(\alpha,\alpha)=\sum_d w_d\int_0^\infty|\sum_{n=1}^\infty e^{2\pi i\alpha dn-\theta n}|^2 d\theta.

The infinite sum is a geometric progression, so this simplifies to

\displaystyle K(\alpha,\alpha)=\sum_d w_d\int_0^\infty\Bigl|\frac{e^{2\pi i\alpha d-\theta}}{1-e^{2\pi i\alpha d-\theta}}\Bigr|^2 d\theta.

Note that for each d the integrand is bounded unless \alpha is a multiple of 1/d, and more generally is small unless \alpha is close to a multiple of 1/d and \theta is close to 0. So we do at least have the condition of being close to a rational with small denominator making an appearance here. (Why small denominator? Because then there will be more d such that \alpha is a multiple of 1/d.)

I plan to investigate the sequence 1, -1, 0, 1, -1, 0, … from this perspective. It takes the value \frac 2{\sqrt{3}}\sin(2\pi n/3) at n. I shall attempt to understand from the Fourier side why this gives a sequence with such small discrepancy.

Before I finish this post, let me also mention a nice question of Alec’s, or perhaps it is better to call it a class of questions. It’s a little bit like the “entropy” question that I asked about EDP, but it’s about multiplicative functions. The question is this: you play a game with an adversary in which you take turns assigning \pm 1 values to primes. You want the resulting completely multiplicative function to have as small discrepancy as you can, whereas your adversary wants the discrepancy (that is, growth of partial sums) to be large. How well can you do? One can ask many variants, such as what happens if your adversary is forced to choose certain primes (for instance, every other prime), or if your adversary’s choices are revealed to you in advance (so now the question is what you can do if you are trying to make a low-discrepancy function but someone else has filled in half the values already and done so as badly as possible), or if you choose your values randomly, etc. etc. So far there don’t seem to be any concrete results, and yet it feels as though it ought to be possible to prove at least something non-trivial here.

One other question I’d like to highlight before I finish this post. It seems that we do not know whether EDP is true even if you insist that the HAPs have common differences that are either primes or powers of 2. The powers of 2 rule out all periodic sequences, but for a strange parity reason: for instance, if you have a sequence that’s periodic with period 72, then along the HAP with common difference 8 it is periodic with period 9, which means that the sum along each block of 9 is non-zero (because it is an odd number) and therefore the sums along that HAP grow linearly. Sune points out that the sequence f(n)=\exp(2\pi i n/6) is a simple counterexample over \mathbb{T}, but it’s not clear what message we can take from that, given that periodic sequences don’t work in the \pm 1 case. I like this question, because finding a counterexample should be easier if there is one, and if there isn’t, then proving the stronger theorem might be easier because HAPs with prime common differences are “independent” in a nice way.

Update: I intended, but forgot, to mention also some interesting ideas put forward by Gil in this comment. He has a proposal for trying to use probabilistic methods, and in particular methods that are suited to proving that rare events have non-zero probability when there is sufficient independence around, to show that there are many sequences with slow-growing discrepancy. It is not clear at this stage whether such an argument can be made to work, but it seems very likely that thinking about the problems that arise when one attempts to make it work will be fruitful.

108 Responses to “EDP9 — a change of focus”

  1. Sune Kristian Jakobsen Says:

    I just realized that we have some very integer-like pseudointegers where the EDP is false, namely the set of integers not divisible by p for some prime p (the sequence is the Legendre character). I know that this is nothing new, we already knew that these examples told us something about what a proof of EDP could look like, but I didn’t think of them as pseudointeger (actually I did once but the two parts of my brain that knew that “the set of integers not divisible by p can have bounded discrepancy” and “the set of integers not divisible by p is a set of pseudointeger” where too far from each other).

    I think these examples have discrepancy about at most \sqrt(p) (if I remember correctly) and the density of them is d=1-1/p. The next question is: Can we find pseudointegers with density 1 and bounded discrepancy? Or a weaker question: Can we find sets of pseudointeger with density arbitrary close to 1 and discrepancy <C for some fixed C?

    • Sune Kristian Jakobsen Says:

      I think the following gives a set of pseudointegers with density 1 and a multiplicative sequence with discrepancy 2:
      Start with the set of integers not divisible by 3. This have discrepancy 1 (I’m using the 1,-1,0,1,-1,0,… sequence) and density 2/3, so we need to increase the density. In order to do so, we add two primes a and b such that a< b but b-a is infinitesimal (so they are not a subset of R. More about this later. I think of a as a real number). We define x_a=1 and x_b=-1 and let multiplicativity define the rest. So for any n\in \mathbb{N} not divisible by 3 the numbers na and nb are close and x_{na} and x_{nb} have opposite signs, so these doesn’t increase the discrepancy to more than 2. The only problem is at powers of a: Around a^2 we have the numbers a^2,ab, b^2 with x_{a^2}=1,x_{ab}=-1,x_{b^2}=1 so we “create” a new prime c> b^2 such that c-a^2 is infinitesimal, but infinitely larger than b-a. Now everything is fine for a^3. Here x_{a^3}=1,x_{a^2b}=-1,x_{ab^2}=1,x_{b^3}=-1,x_{ac}=-1,x_{bc}=1, but at a^4 we get intro trouble again, so we need to add a new prime d> c^2. Continuing this way, we get a set of pseudointegers with bounded discrepancy.

      Lets calculate the density. To begin with we have the set of integers not divisible by 3. This has density 2/3. The set of numbers an or bn, where n is a integer not divisible by 3, has density \frac{2}{3}\frac{2}{a}. The set of numbers on the form aan,abn,bbn or cn has density \frac{2}{3}\frac{4}{a^2}, and so on. Now the whole set has density

      d=\frac{2}{3}\left(1+\frac{2}{a}+\frac{4}{a^2}+\frac{6}{a^3}+\frac{10}{a^4}+\dots\right)

      Where the nominator is A000123 (number of partitions of 2n into powers of 2). Since the sequence of nominators grows slower than some geometric sequence (e.g. 2^{2n-1}=4^n/2 using that it is less than the number of ordered partitions of 2n), we can choose a so that the density is 1.

      This is of course not a subset of the real numbers as I used infinitesimal, but if we just choose very small numbers instead, think we can correct the errors when they arise. I think this shows that a proof of EDP must somehow use the additive structure of the integers.

    • Sune Kristian Jakobsen Says:

      @Tim:
      “Perhaps the least satisfactory outcome of these investigations would be an example of a system that was very similar to \mathbb{N} where it was possible to prove EDP.”
      I’m not sure I understand you correctly. Do you mean that this wouldn’t be interesting or just that we should at least be able to get this result? I think that finding a set of pseudointegers where we can prove EDP is the only interesting pseudointeger-question left (when I say interesting I mean with relevances to EDP). Perhaps I have forgot some questions.

    • gowers Says:

      It was a slightly peculiar thing for me to say, because I changed my mind as I was writing it. Initially I thought that finding such a proof wouldn’t shed any light on EDP itself, but by the end of the comment I had started to realize that it could do after all.

  2. Jason Dyer Says:

    Going back to my graph-theoretic construction, I wanted to include a meaning for “completely multiplicative”.

    This likely could be more elegant. I apologize for the mess.

    Define everything as the comment, including the 4 extra conditions (it might be possible to do without some, but I haven’t had time to think about it). I will also call the set of edges from condition #1 the omega set, and the root node of that set to be the alpha node.

    A node is called prime if the in-degree is 1; that is, the only edge incoming is from the omega set.

    A node is a power if the in-degree is 2.

    Consider all incoming edges to a particular node; trace the edges backwards to their root nodes. These are the divisors of the node.

    Here is a prime factorization algorithm for the node n:

    1. List the k divisors of n (excluding the alpha node) d_1, d_2, ... d_k. Call the path of a divisor to be the traversal going from the divisor to n. Given any divisor a that has divisor b in its path, remove a from the list.

    2. For any divisor that is not prime and is not a power on the list d_1, d_2, ... d_k, connect the divisors of each of d_1, d_2, ... d_k as a branching tree (again excluding the alpha node); again, given any divisor a that has divisor b in its path, remove a from the list.

    For any divisor that is a power on the list d_1, d_2, ... d_k, connect the divisor that is not the alpha node c a multiple number of times m+1, where m is the number of times powers occur in the traversal between the divisor of c and c.

    3. Repeat #2 until all divisors on the bottom of the tree are prime.

    4. The set of divisors at the bottom of the tree is the prime factorization of n.

    So, a graph is completely multiplicative if that multiplying the values of all the nodes in the prime factorization of a node n (including repeated nodes as ncessary) gives the value of the node n.

    • Jason Dyer Says:

      There’s a bug in the algorithm: if n is already a power, it should jump to the “for any divisor that is a power on the list” process.

      Also, an extra condition #5 should be added if we want multiplication to be a function: each node can be the root node of only one labelled set of directed edges.

  3. Klas Markström Says:

    Regarding Gill’s probabilistic approach https://gowers.wordpress.com/2010/02/19/edp8-what-next/#comment-6278
    it might be a good idea to consider separate “bad” events for “sum is less than -C” and “sum is greater than C”

    The “less than “-events along a given AP are clearly pairwise positively correlated, and likewise for the “greater than”-events. However the “less than”-events and “greater than”-events along an AP are negatively correlated.

    To me mixing the two types of failure for the discrepancy bound seems to make things harder to keep track of. I guess I’m going more in the direction of the Janson inequalites than the local lemma here.

    • Gil Says:

      I think what I have in mind is to choose the locations of the zeros for the partial sum. We need to be able to show (1) that we can locate the zeroes of the partial sums where we want them, and then that (2) conditioned on such locations that the probabilities for “sum is greater than C” or “sum is smaller than -C” are very small.

      For part (2) the probabilities may perhaps be small enough that we can use union bounds and not worry about dependenies.

      For the location of zeroes part (1), we certainly need to be able to handle dependencies. And it seems that if we want to go below \sqrt{\log n}-discrepancy we will need to exploit positive dependencies (for the events of vanishing partial sums along intervals.) Indeed Janson’s inequalities may be quite relevant.

  4. gowers Says:

    A couple of observations that will I hope make the calculations in the post make a bit more sense.

    First, note that \int_0^\infty\frac{e^{-2\theta}}{(1-e^{-\theta})^2}d\theta equals \int_0^\infty \frac d{d\theta}\Bigl(\frac 1{1-e^{-\theta}}\Bigr)d\theta, which is infinite. Therefore, K(\alpha,\alpha) blows up when \alpha is equal to a rational number.

    Next, note that the bilinear form derived from K(\alpha,\alpha) is given by the formula

    \displaystyle K(\alpha,\beta)=\sum_dw_d\int_0^\infty \frac{e^{2\pi i\alpha-\theta}}{1-e^{2\pi i\alpha-\theta}}\frac{e^{-2\pi i\beta-\theta}}{1-e^{-2\pi i\beta-\theta}}d\theta.

    Now let’s continue to write e(x) for e^{2\pi i x} and let us also write c(x)=\cos(2\pi x)=(e(x)+e(-x))/2 and s(x)=\sin(2\pi x)=(e(x)-e(-x))/2i. Our basic example of a bounded-discrepancy sequence is (up to a constant multiple) the sequence s(n/3), which is begging to be understood from a Fourier perspective. Let us call this function s_3 and let e_3(n)=e(n/3). Now \langle s_3,As_3\rangle=\langle e_3-\overline{e_3},e_3-\overline{e_3}\rangle/4, which equals

    \displaystyle (1/4)(K(1/3,1/3)+K(-1/3,-1/3)+K(1/3,-1/3)+K(-1/3,1/3)),

    which, because of easy symmetry properties of K, is equal to

    \frac 14\sum_dw_d\int_0^\infty \Bigl(\frac{2e^{-2\theta}}{|1-e^{2\pi i\alpha d-\theta}|^2}-\frac{e^{4\pi i\alpha d-2\theta}}{(1-e^{2\pi i\alpha d-\theta})^2}-\frac{e^{-4\pi i\alpha d-2\theta}}{(1-e^{-2\pi i\alpha d-\theta})^2}\Bigr)d\theta,

    where \alpha=1/3 (but I’d rather keep it as \alpha, as the argument is perfectly general). Every time \alpha d is an integer, this gives us zero, and when \alpha d is not an integer it gives us something fairly small. So in this way we (sort of) see the bounded average discrepancy coming about. I still need to think more about this calculation before I can say I fully understand it, or be confident that it isn’t rubbish in some way.

  5. Kristal Cantwell Says:

    If player one and player two alternate assigning signs to primes with one have the objective of forcing a discrepancy of four then the player trying to force a discrepancy of four will always win.

    If player one and player two assign values to the primes of a multiplicative function and player one is trying to force a discrepancy of 4 and player 2 to prevent this player one will win.
    Player one chooses to assign 2 the value 1 then player 2 must assign 3 the value -1 or the sum of 1 to 4 will be 4 after 3 is assigned the value 1.
    Then player one assigns 5 the value 1 and since f(1),f(2),f(4),f(5) and f(8),f(9) and f(10) are positive the sum at 10 is at least 4 and we are done.

    If player one and player two assign values to the primes of a multiplicative function and player two is trying to force a discrepancy of 4 and player 2 to prevent this player two will win.

    First we will show that the first player must choose 2. If player one does not choose this then let the second player choose 2 and assign it the value 1 at player 2’s first move. Then the first two moves of player one must be assigning 3 and 5 the values -1. If not then player two will assign one of these the value one on player 2’s second move and either f(1),f(2),f(4),f(5) and f(8),f(9)
    and f(10) are positive or f(1),f(2),f(3), and f(4) are positive and in either
    case the discrepancy is 4.

    So the first player must assign f(2) the value -1 on the first players first
    move. Let the second player assign f(3) the value 1 on the second players first move. Now f(1),f(3),f(4),f(5),
    f(9),f(12),f(15),f(16) are positive one of f(7) and f(14) is positive. So if the first player does not assign 11 and 13 the value -1 on the second and third moves then the second player can assign one of them value 1 and f(1),f(3),f(4),f(5),f(9),f(12),f(15),f(16), one of f(7) and f(14) and the number chosen make 10 positive values less than or equal to 16 which gives a discrepancy of four.So the second and third moves of the first player are choosing 11 and 13 and assigning them value -1. The second player assigns 7 the value 1 on the third move. Now f(1),f(3),f(4),f(5), f(9),f(12),f(15),f(16) f(7),f(20),f(21),f(22)f(25),f(26),f(27) and f(28) are positive and the discrepancy is at least 4 at 28 and we are done.

  6. Polymath5 « Euclidean Ramsey Theory Says:

    […] Polymath5 By kristalcantwell There is a new thread for Polymath5. The pace seems to be slowing down a bit. Let me update this there is another thread. […]

  7. gowers Says:

    I want to follow up on this comment with some rather speculative ideas about why EDP might be true.

    The annoying example we keep coming up against is 1, -1, 0, 1, -1, 0, …, which, as I have already pointed out, is given by the formula f(n)=(2/\sqrt{3})s(n/3), where s(x) stands for \sin(2\pi x). Now perhaps something like the following is true. Given any \pm 1 sequence (or indeed any sequence) we can decompose it as a linear combination of sequences of the form c(\alpha n) and s(\alpha n), where \alpha is rational. A simple proof is as follows. We can produce the characteristic function of the HAP with common difference d as the sum d^{-1}(1+e(n/d)+e(2n/d)+\dots+e((d-1)n/d). And we can produce any function of the natural numbers as a linear combination of HAPs, just by solving an infinite system of linear equations that’s in triangular form. (Basically that is the proof of the Möbius inversion formula.)

    Let’s gloss over the fact that the decomposition is far from unique. It means that what I am about to say is not correct as it stands, but my hope is that it could be made correct somehow.

    On the face of it, using cosines in a decomposition is likely to lead to large discrepancies, because not only are they periodic, but they are at their maximum at the end of each period, so the HAP with common difference that period (which is an integer if we are looking at c(\alpha n) for some rational \alpha) is being given a linear discrepancy that we would hope could not be cancelled out. (This is where we would need a much more unique decomposition, since as things stand it can be cancelled out.) So perhaps one could prove that for a \pm 1 sequence to have any chance of having small discrepancy, it has to be built out of sines rather than cosines.

    Now the problem with sines is that they keep being zero. So one could ask the following question: how large a sum of coefficients do you need if you want to build a \pm 1 sequence out of sines?

    I think the answer to this question may be that to get values of \pm 1 all the way up to N requires coefficients that sum to at least c\log N. My evidence for that is the weak evidence that the obvious way of getting a \pm 1 sequence does give this kind of logarithmic growth. Here is how it works.

    I know that s(n/3) gives me \pm 1 values (after normalization) except at multiples of 3 where it gives me zero. To deal with multiples of 3, I first create the HAP with common difference 3 by taking (1+c(x/3)+c(2x/3))/3 (which works because it’s the real part of (1+e(x/3)+e(2x/3))/3, which also works. That, however, is not allowed because I’ve used cosines. So I’ll multiply it by s(x/9). The addition formulae for trigonometric functions tell us that s(\alpha x)c(\beta x)=(s(\alpha+\beta)x)+s((\alpha-\beta)x))/2, so this results in a sum of sines with coefficients adding up to 1 (or 2/\sqrt{3} when we do the normalization). But this new sequence has gaps at multiples of 9. Continuing this process, we find that for each power of 3 we need coefficients totalling 1 (or 2/\sqrt{3}) to fill the gaps at multiples of that power, which gives us a logarithmic bound.

    So the following might be a programme for proving EDP, which has the nice feature of using the \pm 1 hypothesis in a big way.

    1. Show that any sequence that involves cosines in a significant way must have unbounded discrepancy. (One might add functions such as s(\alpha x) where \alpha is irrational.)

    2. Show that any sequence involving sines must have a discrepancy at least as big as the sum of the coefficients of those sines.

    3. Show that to make a \pm 1 sequence out of sines one must have coefficients that grow at least logarithmically.

    As I say, before even starting to try to prove something like this, one would need to restrict the class of possible decompositions, so some preliminary thought is required before one can attempt anything like 1 or 2. Can anyone come up with a precise conjecture that isn’t obviously false? As for 3, it may be that one can already attempt to prove what I suggested above, that to get a \pm 1 sequence all the way up to N requires coefficients with absolute values that sum to at least c\log N.

    • Sune Kristian Jakobsen Says:

      A small remark: If we only show (2) for finite sums of sines, it might still be possible to for an infinite sum of sines where the coefficients doesn’t converge to have bounded discrepancy. A sequence with bounded discrepancy can easily be written as a limit of sequences with unbounded discrepancy.

    • gowers Says:

      I agree. For an approach like this to work there definitely needs to be some extra ingredient, which I do not yet see (and it is not clear that it even exists), that restricts the way you are allowed to decompose a sequence.

    • Sune Kristian Jakobsen Says:

      This sounds interesting, but I have to ask: Do we have any reason to prefer to use sine functions rather than characteristic functions of APs?

    • gowers Says:

      I’ve wondered about that, and will try to come up with a convincing justification for thinking about sines. However, that doesn’t mean that characteristic functions of APs shouldn’t be tried too. If we did, then perhaps the idea would be to show that using HAPs themselves is a disaster, and using other APs requires too many coefficients, or something like that.

      On a not quite related note, does anyone know whether EDP becomes true if for each p you are allowed to choose some shift of the HAP with common difference p? (We considered a related problem earlier, but here I do not insist that the shifts are consistent: e.g., you can shift the 2-HAP by 1 and the 4-HAP by 2.)

    • Gil Says:

      I suppose the following concrete questions are relevant:

      Conjecture 1 If a sequence of +1 -1 0 has bounded discrepency then, except for a set of density 0 it is periodic.

      I do not know a counter example to:

      Conjecture 2 If a multiplicative sequence of +1 -1 0 has bounded discrepency then it is periodic.

      A periodic sequence with period r has bounded discrepency if the sum of elements with indices di for every d which devides r is 0. (In particular, x_r=0.) So it make sense to check the suggested conjecture 3 of fine tune it on such periodic sequences.

  8. Gil Says:

    How about the periodic sequence whose period is 1 -1 1 1 -1 -1 1 -1 0

    • gowers Says:

      I don’t understand your question.

    • Gil Says:

      I was just curious about the sine decomposition (or sine/cosine decomposition) for the basic sequence \mu_3 and for “r-truncated mu_3” where you make the sequence 0 if at integers divisible by 3^r. (You mentiones the r=1 case and I wondered especially about r=2.)

    • gowers Says:

      I would decompose it by the method I mentioned above. That is, at the first stage I obtain the function 1 -1 0 1 -1 0 … as a multiple of s(n/3). I then obtain 0 0 1 0 0 1 0 0 1 … by taking (1+c(n/3)+c(2n/3))/3. I then convert that into 0 0 1 0 0 -1 0 0 0 … by pointwise multiplying it by s(n/9) (or rather 3/\sqrt{2}s(n/9)). By the addition formulae for sin and cos, that gives you the periodic sequence 1 -1 1 1 -1 -1 1 -1 0 recurring as a sum of sines, where the total sum of coefficients is 3\sqrt{2}.

    • Gil Says:

      so part 2) of the conjecture works ok for such trubcations?

  9. Moses Charikar Says:

    I wanted to mention an approach to proving a lower bound similar to some of the ideas being discussed here. Consider the generalization of EDP to sequences of unit vectors. The problem is to find unit vectors v_i so that \max_{k,d} \{||v_{d} + v_{2d} + \ldots + v_{kd}||_2^2\} is bounded.

    For sequences of length n, this leads to the optimization problem: Find unit vectors v_1,\ldots,v_n so as to minimize \max_{k,d} \{||v_{d} + v_{2d} + \ldots + v_{kd}||_2^2\}. This can be expressed as a semidefinite program (SDP) – a convex optimization problem that we know how to solve to any desired accuracy. Moreover, there is a dual optimization problem
    such that any feasible solution to the dual gives us a lower bound on the value of the (primal) SDP and the optimal solutions to both are equal.

    The dual problem to this SDP is the following: Find values c_{k,d}, b_i
    so as to maximize \sum b_i such that \sum_{k,d} c_{k,d} \leq 1
    and the quadratic form \sum_{k,d} c_{k,d}(x_{d}+x_{2d}+\ldots+x_{kd})^2 - \sum_i b_i x_i^2 is positive semidefinite.

    The semidefinite program is usually referred to as a relaxation of the original optimization question over \pm 1 variables. Note that any feasible dual solution gives a valid lower bound for the original \pm 1 question. This is easy to see directly. Suppose that \sum_{k,d} c_{k,d}(x_{d}+x_{2d}+\ldots+x_{kd})^2 - \sum_i b_i x_i^2 \geq 0 for all x_1,\ldots,x_n. In particular, it is non-negative for any x_i \in \{\pm 1\} and for such values \sum_{k,d} c_{k,d}(x_{d}+x_{2d}+\ldots+x_{kd})^2 \geq \sum_i b_i. Since \sum_{k,d}c_{k,d} \leq 1, there must be some k,d such that (x_d + x_{2d} + \ldots + x_{kd})^2 \geq \sum b_i.

    Now what is interesting is that for the vector discrepancy question for sequences of length n, there is always a lower bound of this form that matches the optimal discrepancy. If the correct answer for vector discrepancy was a slowly growing function of n, and if we could figure out good values for c_{k,d} and b_i to prove this, we would have a lower bound for EDP.

    Now the nice thing is that we can actually solve these semidefinite programs for small values of n and examine their optimal solutions to see if there is any useful structure. Huy Nguyen and I have been looking at some of these solutions. Firstly here are the optimal values of the SDP for some values of n.

    128: 0.53683
    256: 0.55467
    512: 0.56981
    1024: 0.58365
    1500: 0.59064

    (I should mention that we excluded HAPs of length 1 from the objective function of the SDP, otherwise the optimal values would be trivially at least 1.) The discrepancy values are very small, but the good news is that they seem to be growing with n, perhaps linearly with \log n. The bad news is that it is unlikely we can solve much larger SDPs. (We haven’t been able to solve the SDP for 2048 yet.)

    The main point of this was to say that there seems to be significant structure in the dual solutions of these SDPs which we should be able to exploit if we understand what’s going on. One pattern we discovered in the c_{k,d} values is that the sums of tails of this sequence (for fixed d) seem to drop exponentially. More specifically, if t_{j,d} = \sum_{k\geq j} c_{k,d} then t_{j,d} = C_d e^{-\alpha jd} (approximately). It looks like the scaling factor C_d is dependent on d, but the factor \alpha in the exponent is not. The b_i values in the dual solution also seem to have interesting structure. The values are not uniform and tend to be higher for numbers with many divisors (not surprising since they appear in many HAPs).

    We should figure out the easiest way to share these SDP dual solutions with everyone so others can play with the values as well.

    • gowers Says:

      This looks like a very interesting idea to pursue. One aspect I do not yet understand is this. It is crucial to EDP that we look at functions that take values \pm 1 and not, say sequences that take values in [-1,1] and are large on average. For example, the sequence 1, -1, 0, 1, -1, 0, … has bounded discrepancy. In your description of the dual problem and how it directly gives a lower bound, it seems to me that any lower bound would also be valid for this more general class of sequences. But perhaps I am wrong about this. If so, then where is the \pm 1 hypothesis being used?

      If it was in fact not being used, that does not stop what you wrote being interesting, since one could still try to find a positive semidefinite matrix and try to extract information from it. For example, it might be that the sequences that one could build out of eigenfunctions with small eigenvalues had properties that meant that it was not possible to build \pm 1-valued sequences out of them. (This is the kind of thing I was trying to think about with the quadratic form mentioned in the post.)

      I have edited your comment and got rid of all the typos I can, but I can’t quite work out what you meant to write in the formula t_{j,d}=C_de^{-\alpha j,d}.

    • Moses Charikar Says:

      Tim, the \pm 1 hypothesis is being used in placing the constraint v_i \cdot v_i = 1 in the SDP. So a sequence with \pm 1 would be a valid solution but one with \{\pm 1, 0\} would not. The variables b_i correspond to this constraint. In fact, it seems sufficient to use the constraint v_i \cdot v_i \geq 1 (the optimal values are almost the same). In this case, the dual variables b_i \geq 0. In some sense, value of b_i is a measure of how important the constraint v_i \cdot v_i \geq 1 is.

      While \{\pm 1,0\} sequences are not valid solutions, distributions
      over such sequences which are large on average on every coordinate are valid
      solutions to the SDP. So a lower bound on the vector question means that any distribution on sequences which is large on average on every coordinate must contain a sequence with high discrepancy.

      The expression t_{j,d} = C_d e^{-\alpha j,d} should not have a comma in the exponent, i.e. it should read: t_{j,d} = C_d e^{-\alpha jd}. Sorry for all the typos ! I wish I could visually inspect the comment before it posts (or edit it later).

    • gowers Says:

      Ah, I see now. This is indeed a very nice approach, and I hope that we will soon be able to get some experimental investigations going and generate some nice pictures.

    • Moses Charikar Says:

      I mentioned that it looks like the optimal values of the SDP seem to be growing linearly with \log n. If true, this would establish a lower bound of \sqrt{\log n} on the discrepancy of \pm 1 sequences. This is because for the vector problem, the objective function is actually the square of the discrepancy for integer sequences. The analog of integer discrepancy would be to look at ||v_{d} + v_{2d} + \ldots + v_{kd}||_2 but in fact, the objective function of the SDP is the maximum of ||v_{d} + v_{2d} + \ldots + v_{kd}||_2^2

      In fact the growth rate for vector sequences could be different from the growth rate for integer sequences. I think we can construct sequences unit vectors of length n such that the maximum value of ||v_{d} + v_{2d} + \ldots + v_{kd}||_2 is \sqrt{\log n}. This can be done using the familiar 1,-1,0,1,-1,0,… sequence: Construct the vectors one coordinate at a time. Every vector in the sequence will have a 1 or -1 in exactly one coordinate and 0’s elsewhere. In the first coordinate we place the 1,-1,0,1,-1,0,… sequence. Look at the subsequence of vectors with 0 in the first coordinate. For these, we place the 1,-1,0,1,-1,0,… sequence in the second coordinate. Now look at the subsequence of vectors with 0’s in the first two coordinates. For these, we place the 1,-1,0,1,-1,0,… sequence in the third coordinate and so on. All unspecified coordinate values are 0. The first n vectors in this sequence have non-zero coordinates only amongst the first \log_3 n coordinates. Now for any HAP, the vector sequence has bounded discrepancy in every coordinate. Thus the maximum of ||v_{d} + v_{2d} + \ldots + v_{kd}||_2 for the first n vectors is bounded by \sqrt{\log n}.

    • gowers Says:

      I like that observation. One could even think of it as a multiplicative function as follows. We take the function from \mathbb{N} to L_\infty(\mathbb{T}) defined by the following properties: (i) if n is congruent to \pm 1 mod 3, then f(n) is the constant function \pm 1; (ii) f(3) is the function z; (iii) f is completely multiplicative (where multiplication in L_\infty(\mathbb{T}) is pointwise).

      To be more explicit about it, to calculate f(n) you write n as m3^k, where m is congruent to \pm 1 mod 3, and f(n) is then the function z\mapsto\pm z^k.

    • gowers Says:

      Out of curiosity, let me assume that c_{k,d} is given by a formula of the kind C_d e^{-\alpha kd} (which would give the right sort of behaviour for the tails). What does that give us for \sum_{k,d}c_{k,d}(x_d+\dots+x_{kd})^2?

      Well, if m\ne n then x_mx_n is counted 2\sum c_{k,d} times, where the sum is over all common factors d of m and n and over all k that exceed \max\{m/d,n/d\}. For each fixed d, … OK, I’ll change to assuming that the tail of the sum of the c_{k,d} is given by that formula, so we would get C_d\exp(-\alpha\max\{m,n\}), and then we’d sum that over all d dividing (m,n).

      Now let me choose, purely out of a hat, to go for C_d=\phi(d), so that when we sum over d we get (m,n)\exp(-\alpha\max\{m,n\}). This is not a million miles away from what I was looking at before, but now the task is much clearer. We don’t have to worry about the fact that some functions like 1, -1, 0, 1, -1, 0, … have bounded discrepancy. Rather, we must find some sequence b_1,b_2,\dots such that subtracting the corresponding diagonal matrix leaves us with something that’s still positive semidefinite, in such a way that the sum of the b_i is large.

      I haven’t checked in the above what the sum of the c_{k,d} is, so I don’t know how large the sum of the b_i has to be. But, for those who share the anxiety I had earlier, the way we deal with the problem of the bounded-discrepancy sequences is that the sequence b_i will tend to be bigger at numbers with many prime factors, to the point where, for example, the sum of the b_i such that i is not a multiple of 3 will be bounded.

      Here’s a simple toy problem, but it could be a useful exercise. Find a big supply of positive sequences of real numbers (b_i) such that \sum_ib_i=\infty but for every m the sum of all b_i such that i is not a multiple of m is finite.

      I’ve just found one to get the process going: take b_n to be 1 if n=m! for some m and 0 otherwise. So the question is to find more interesting examples, or perhaps even something closer to a characterization of all such sequences.

    • gowers Says:

      The more I think about this the more I look forward to your sharing the values of the solutions to the SDP dual problem that you have found, especially if they can also be represented visually. You’ve basically already said this, but what excites me is that we could then perhaps make a guess at some good choices for the c_{k,d} and the b_i and end up with a much more tractable looking conjecture than EDP — namely, that some particular matrix is positive semidefinite.

    • Alec Edgington Says:

      Regarding the toy problem, here’s a slight generalization of your example: let K_m (m \geq 1) be any sequence of finite non-empty subsets of \mathbb{N}, and let c_m \geq 0 be any sequence such that \sum_m c_m = \infty. Then the sequence

      \sum_m c_m \chi_{m! K_m}

      satisfies the condition (where \chi_S is the characteristic function of S).

    • Alec Edgington Says:

      To generalize a bit further: let \beta_{m,r} (m, r \geq 1) be a matrix of non-negative reals such that \sum_m \beta_{m,1} = \infty and \sum_r \beta_{m,r} < \infty for all m. Then let b_n = \beta_{m,r} where n = m! r with m maximal.

    • gowers Says:

      One can greedily create such sequences as follows. First, choose a sequence (a_n) of positive reals that sums to infinity. Next, arbitrarily choose a sequence with finite sum that takes the value a_1 somewhere, and put it down on the odd numbers. That takes care of the non-multiples of 2. Now we take care of multiples of 3 by choosing a sequence that has finite sum and takes the value a_2 somewhere and placing it at the points that equal 2 or 4 mod 6 (that is, the non-multiples of 3 that have not had their values already assigned). Next, we deal with non-multiples of 5 by choosing values for numbers congruent to 6, 12, 18 or 24 mod 30 … and so on.

    • Huy Nguyen Says:

      I have put some summary data of the SDP solutions at http://www.cs.princeton.edu/~hlnguyen/discrepancy/discrepancy.html
      The data mostly focus on the tails rather than c_{k, d}.

    • Moses Charikar Says:

      This comment got posted in the wrong spot earlier. Please delete the other copy.

      The dual solutions for n=512,1024,1500 and the corresponding positive semidefinite matrices are available
      here . Double click on a file to download it.

      The files dual*.txt have the following format.
      Lines beginning with “b” specify the b_i values
      b i b_i
      Lines beginning with “t” specify the tails t_{k,d}
      t k d t_{k,d}

      The files matrix*.txt have the following format: the ith line of the file contains the entries of the ith row of the matrix.

  10. gowers Says:

    Gil, I meant to mention your ideas about creating low-discrepancy sequences probabilistically in my latest post, but forgot. I have now updated it. I would like to bring that particular discussion over to here, which is why I am writing this comment.

    I am trying to come up with a purely linear-algebraic question that is nevertheless relevant to EDP. Here is one attempt. Let r be an integer, and let us try to build a sequence (x_n) of real numbers such that for every d and every k the sum x_{((k-1)r+1)d}+\dots+x_{krd}=0. That is, for every d, if you break the HAP with common difference d into chunks of length r, then the sum over every chunk is zero.

    If r is prime, then one way of achieving this is to create a sequence with the following three properties: (i) it is periodic with period r; (ii) x_n=0 whenever n is a multiple of r; (iii) x_1+\dots+x_{r-1}=0. If r is composite, then the condition is similar but slightly more complicated. A general condition that covers (ii) and (iii) simultaneously is that for every factor d of r the sum x_d+x_{2d}+\dots+x_r must be zero.

    The set of all such sequences is a linear subspace of the set of all real sequences. My question is whether every sequence satisfying the first condition (namely summing to zero along chunks of HAPs) must belong to this subspace.

    I have given no thought to this question, so it may have a simple and uninteresting answer.

    Let me just remark that it is very important that the “chunks” are not just any sets of r consecutive terms of HAPs, since then periodicity would follow trivially (because when you remove x_n from the beginning of a chunk and add x_{n+r}, you would need the sum to be the same). So, for example, if r=5, then the conditions imposed on the HAP with common difference 3 are that x_3+x_6+x_9+x_{12}+x_{15}=0, that x_{18}+x_{21}+x_{24}+x_{27}+x_{30}=0, and so on.

    • Gil Kalai Says:

      Sure, lets continue discussing it here along with the various other interesting avenues. The idea was to try to impose zeroes as partial sums along intervals of HAPs. If the distance between these zeroes is order k then in a random such sequence we can hope for discrepency \sqrt k.

      The value k=\log n appears to be critical. (Perhaps, more accurately, $k=\log n \log\log n$.) When k is larger the number of conditions is sublinear and the probability for such a condition to hold is roughly is 1/\sqrt k. So when k=(\log n)^{1.1} we can expect that the number of sequences satisfying the conditions is typically 2^{n-o(1)}.

      This give a heuristic prediction of \sqrt{\log n} (up to lower order terms, or perhaps, more accurately, \sqrt{\log n\log\log n}) as the maximum discrepency of a sequence of length n.

      Of course, the evidence that this is the answer is rather small. This idea suggests that randomized constructions may lead to examples with descrepency roughly \sqrt{\log n}. In fact, this may apply to variants of the problems like the one where we restrict ourselves to square free integers. I will make some specific suggestions in a different post.

      Regarding lower bounds, if k is smaller than $\log n$ then the number of constrains is larger then the number of variables. So this may suggest that even to solve the linear equations might be difficult. Like with any lower bound approach we have to understand the case that we consider only HAP with odd periods where we have a sequence of discrepency 1. It is possible, as Tim suggested, that solutions to the linear algebra problem exists only if the imposed spacing are very structured which hopefully implies a periodic solution.

  11. Moses Charikar Says:

    The dual solutions for n=512,1024,1500 and the corresponding positive semidefinite matrices are available here. Double click on a file to download it.

    The files dual*.txt have the following format.
    Lines beginning with “b” specify the b_i values
    b i b_i
    Lines beginning with “t” specify the tails t_{k,d}
    t k d t_{k,d}

    The files matrix*.txt have the following format: the ith line of the file contains the entries of the ith row of the matrix.

  12. gowers Says:

    Although the mathematics of this comment is entirely contained in the mathematics of Moses Charikar’s earlier comment, I think it bears repeating, since it could hold the key to a solution to EDP.

    Amongst other things, Moses points out that a positive solution to EDP would follow if for large n one could find coefficients c_{k,d} and b_m such that the quadratic form

    \displaystyle \sum_{k,d}c_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2-\sum_mb_mx_m^2

    is positive semidefinite, the coefficients c_{k,d} are non-negative and sum to 1, and the coefficients b_m are non-negative and sum to \omega(n), where \omega is some function that tends to infinity. Here, the sums are over all k,d such that kd\leq n and over all m\leq n. The proof is simple: if such a quadratic form exists, then when each x_i=\pm 1 we have that \sum_mb_mx_m^2=\sum_mb_m=\omega(n), and since the c_{k,d} are non-negative and sum to 1 we know by averaging that there must exist k,d such that (x_d+\dots+x_{kd})^2\geq\omega(n).

    Here are a few things that are very nice about this.

    (i) The condition that x_i=\pm 1 is used in an important way: if the b_m are mainly concentrated on pretty smooth numbers, then we will not be trying to prove false lower bounds for sequences like 1, -1, 0, 1, -1, 0, … since the sum of the b_m over non-multiples of 3 can easily be at most 1 or something like that.

    (ii) We can use semidefinite programming to calculate the best possible quadratic form for fairly large n (as Moses and Huy Nguyen have done already) and try to understand its features. This will help us to make intelligent guesses about what sorts of coefficients c_{k,d} and b_m have a chance of working.

    (iii) We don’t have to be too precise in our guesses in (ii), since to prove EDP it is not necessary to find the best possible quadratic form. It may be that there is another quadratic form with similar qualitative features that we can design so that various formulae simplify in convenient ways.

    (iv) To prove that a quadratic form is positive semi-definite is not a hopeless task: it can be done by expressing the form as a sum of squares. So we can ask for something more specific: try to find a positive linear combination of squares of linear forms in variables x_1,\dots,x_n such that it equals a sum of the form

    \displaystyle \sum_{k,d}c_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2-\sum_mb_mx_m^2.

    To do this, it is not necessary to diagonalize the quadratic form, though that would be one way of expressing it as a sum of squares.

    In theory, therefore, a simple identity between two sums of squares of linear forms could give a one-line proof of EDP. It’s just a question of finding one that will do the job.

    At this point I’m going to stick my neck out and say that in view of (i)-(iv) I now think that if we continue to work on EDP then it will be only a matter of time before we solve it. That is of course a judgment that I may want to revise in the light of later experience.

    • gowers Says:

      Here is a very toy toy problem, just to try to get some intuition. The general aim is to try to produce a sum of squares of linear forms, which will automatically be a positive semidefinite quadratic form, from which it is possible to subtract a diagonal quadratic form and still have something positive semidefinite.

      Here is a simple example where this can be done. Let us consider the quadratic form

      \displaystyle (a+b)^2+(a+c)^2+\lambda(b+c)^2

      in three variables a,b and c. Here, \lambda is a small positive constant. Now this form is positive definite, since the only way it could be zero is if a+b=a+c=b+c=0, which implies that a=b=c=0. But we want to quantify that statement.

      One possible quantification is as follows. We rewrite the quadratic form as

      \displaystyle (1-\lambda)((a+b)^2+(a+c)^2)+

      \displaystyle +\lambda((a+b)^2+(b+c)^2+(a+c)^2)

      and observe that the part in the second bracket can be rewritten as a^2+b^2+c^2+(a+b+c)^2. Therefore, the whole quadratic form is bounded below by \lambda(a^2+b^2+c^2).

      This isn’t a complete analysis, since if a+b+c=a+b=a+c=0 then a=b=c=0, so I haven’t subtracted enough. But I have to go.

    • gowers Says:

      Actually, for intuition-building purposes, I think the identity

      \displaystyle (a+b)^2+(b+c)^2+(a+c)^2=a^2+b^2+c^2+(a+b+c)^2

      is better, because it is very simple, and it shows how the “bunchedupness” on the left-hand side can be traded in for a diagonal part and something that’s more spread out. Now all we have to do is work out how to do something similar when we have HAPs on the left-hand side …

    • gowers Says:

      Here’s an exercise that would be one step harder than the above, but still easier than EDP and possibly quite useful. I’d like to know what the possibilities are for subtracting something diagonal and positive from the infinite quadratic form

      \displaystyle ax_1^2+a^2(x_1+x_2)^2+a^3(x_1+x_2+x_3)^2+\dots

      and still ending up with something positive definite, where a is some constant less than 1. That is, I would like to find a way of rewriting the above expression as a sum of squares of linear forms, with as much as possible of the weight of the coefficients being given to squares of single variables.

      Actually, I’ve seen one possible way of doing it. Note that x^2+a(x+y)^2=(1-a)x^2+a(2(x+y/2)^2+y^2/2). That allows us to take the terms of the above series in pairs and write each pair as a square plus a^{2n}x_{2n}^2/2. So we can subtract off the diagonal form

      \displaystyle \frac12(a^2x_2^2+a^4x_4^2+a_6x_6^2+\dots)

      and still have something positive semidefinite.

      However, it looks to me as though that is not going to be good enough by any means, because if we sum the coefficients in that kind of expression over all d we are going to get something of comparable size to the sum of all the coefficients, which will be bounded. So we have to take much more account of how the HAPs mix with each other (which is not remotely surprising).

      So I’d like some better examples to serve as models.

    • Moses Charikar Says:

      If you were able to subtract off a large diagonal term from the expression \displaystyle ax_1^2+a^2(x_1+x_2)^2+a^3(x_1+x_2+x_3)^2+\dots and still end up with something positive semidefinite, then this would serve as a lower bound for the discrepancy of the collection of subsets \{1\}, \{1,2\}, \{1,2,3\}, \ldots. But the minimum discrepancy of this collection is 1. Hence the sum of coefficients of the diagonal terms can be no larger than \sum_{i=1}^{\infty} a^i. (The ratio of the two quantities is a lower bound on the square discrepancy). The best you can do I suppose is to have a be very small, and subtract a x_1^2.

    • gowers Says:

      I didn’t mean that just that form on its own would suffice, but that even if you add up a whole lot of expressions of that kind, one for each d, the bound for the sum of the diagonal terms will be bounded in terms of the sum of all the coefficients. But I suppose your remark still applies: that is inevitably the case, or else one could prove for some fixed d that the discrepancy always had to be unbounded on some HAP of common difference d, which is obvious nonsense.

      Let’s define the dpart of the quadratic form that interests us to be \sum_kc_{k,d}(x_d+\dots+x_{kd})^2. And let’s call it q_d. Then it is absolutely essential to subtract a diagonal form \Delta from \sum_dq_d in such a way that we cannot decompose \sum_dq_d-\Delta as a sum \sum_d(q_d-\Delta_d) of positive semi-definite forms.

      Maybe the next thing to do is try to find a non-trivial example of subtracting a large diagonal from a sum of a small number of q_ds. (By non-trivial, I mean something that would beat the bound you get by subtracting the best \Delta_d from each q_d separately.)

    • Moses Charikar Says:

      We could get some inspiration from the SDP solutions for small values of n. Note that we explicitly excluded the singleton terms from each HAP because that would trivially give us a bound of 1. So far, the best bound we have (for n=1500) from the SDP is still less than 1. Getting a bound that exceeds 1 by this approach is going to require a very large value of n. That being said, here are some dual solutions that have clean expressions:

      \displaystyle 2(x_1+x_2)^2 + 2(x_1+x_2+x_3)^2 - x_3^2 \geq 0

      This gives a lower bound of \frac{1}{4} on the square discrepancy for n=3.

      \displaystyle (x_1+x_2+x_3+x_4+x_5)^2 + 2 (x_1+x_2+x_3+x_4+x_5+x_6)^2 + 2 (x_1+x_2+x_3+x_4+x_5+x_6+x_7)^2 + (x_1+x_2+x_3+x_4+x_5+x_6+x_7+x_8)^2 + (x_2+x_4)^2 + (x_2+x_4+x_6)^2 + (x_2+x_4+x_6+x_8)^2 - x_6^2 - x_7^2 - x_8^2 \geq 0

      This gives a lower bound of \frac{1}{3} on the square discrepancy for n=8.

      I don’t have proofs that these quadratic forms are non-negative, except that they ought to be if we believe that the SDP solver is correct.

    • Moses Charikar Says:

      Of course, we can get a sum of squares representation from the Cholesky decomposition of the matrix corresponding to the quadratic form.

    • gowers Says:

      The non-negativity of the first of those forms is a special case of what I said earler: x^2+(x+y)^2=2(x+y/2)^2+y^2/2. I’ll think about the other one.

    • Moses Charikar Says:

      The second form has the following decomposition:

      \displaystyle 6(x_1+x_2+x_3+x_4+x_5+\frac{5}{6}x_6 + \frac{1}{2}x_7 + \frac{1}{6} x_8)^2 + 3(x_2 + x_4 + \frac{2}{3} x_6 + \frac{1}{3} x_8)^2 + \frac{1}{2} (x_6 + x_7 + x_8)^2

      We know that the SDP bound is \frac{1}{4} for n \leq 7 and jumps to \frac13 for n=8 thanks to this quadratic form. Hence this decomposition is non-trivial in that it cannot be decomposed as a sum (q_1-\Delta_1) + (q_2 -\Delta_2) of positive semidefinite forms.

    • Huy Nguyen Says:

      If we look at the Cholesky decomposition of the matrix M corresponding to the quadratic form computed by the SDP solver for n=1500 for inspiration for the sum of squares, there seem to be some interesting patterns going on there. Let R^T R be the decomposition, and R_i be the i-th row of R, then x^T Mx can be rewritten as \sum_i (R_i x)^2. R_i seems to put most of its weight on numbers that are multiple of i. The weight at the multiples of i decreases quickly as the multiples get larger. Among the non-multiples of i, there is also some pattern in the weights of numbers with common divisors with i as well. I have put some plots at http://www.cs.princeton.edu/~hlnguyen/discrepancy/cholesky.html

    • gowers Says:

      Here’s another way of thinking about the problem. Let’s assume that we have chosen (by inspired guesswork based on experimental results, say) the coefficients c_{k,d}. So now we have a quadratic form \sum_{k,d}c_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2, and we want to write it in a different way, as \Delta+\sum_iL_i(x)^2, where x is shorthand for (x_1,x_2,\dots) and for each i L_i(x) is some linear form \sum_j a_{ij}x_j. Also, \Delta is a non-negative diagonal form (that is, one of the form \sum \delta_ix_i^2, and our aim is to get the sum of the \delta_i to be as large as possible.

      Instead of focusing on \Delta, I think it may be more fruitful to focus on the off-diagonal part of the quadratic form. That is, we try to choose the L_i so as to produce the right coefficients at every x_ix_j, and we try to do that so efficiently that when we’ve finished we find that the diagonal part is not big enough — hence the need to add \Delta.

      To explain what I mean about “efficiency” here, let me give an extreme example. Suppose we have a quadratic form in x_1,\dots,x_n and all we are told about it is that the coefficient of every x_ix_j is 2. An inefficient way of achieving this is to take \sum_{i<j}(x_i+x_j)^2. If we do this, then the diagonal part is (n-1)\sum_ix_i^2. But we can do much much better by taking (x_1+\dots+x_n)^2, which gives a diagonal part of \sum_ix_i^2.

      In the HAPs case, what we’d like to do is find ways of reexpressing the sum \sum_{k,d}c_{k,d}(x_d+\dots+x_{kd})^2 more efficiently by somehow cleverly combining forms so as to achieve the off-diagonal part with less effort. The fact that there are very long sequences with low discrepancy tells us that this will be a delicate task, but we could perhaps try to save only something very small. For instance, we could try to show that the form was still positive semidefinite even after we subtract \sum_n n^{-1}x_{n!}^2. (This would show c\sqrt{\log\log n} growth in the discrepancy, whereas we are tentatively expecting that it should be possible to get \sqrt{\log n}.)

    • Gil Says:

      What would it take, by this method, to show that the discrepency is > 2?

    • gowers Says:

      If I understand correctly from Wikipedia, the Cholesky decomposition would attempt to solve the problem in my previous comment in a greedy way: it would first make sure that all the coefficients of x_1x_i were correct, leaving a remainder that does not depend on x_1. Then it would deal with the x_2x_i terms (with i\geq 2), and so on. If this is correct (which it may not be) then it is not at all clear that it will be an efficient method in the sense I discussed above (though in the particular example I gave it happened to give the same decomposition).

    • gowers Says:

      The answer to Gil’s question is that we’d need to choose the coefficients c_{k,d} to sum to 1 and to be able to rewrite the quadratic form \sum_{k,d}c_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2 as \sum_i L_i^2+\Delta where the L_i are linear forms and \Delta is a diagonal form \sum_i d_ix_i^2 with non-negative coefficients d_i that sum to more than 4. (The 4 is because we then have to take square roots.)

      Indeed, if we can do this, then we see that if x_i=\pm 1 for every i, then the quadratic form we started with takes value greater than 4, so by averaging at least one of the (x_d+x_{2d}+\dots+x_{kd})^2 is greater than 4.

      We know from the 1124 examples that this is not going to be easy, but the fact that we need to go up to 4 is encouraging (in the sense that 1124 is not as frighteningly large a function of 4 as it is of 2).

    • gowers Says:

      As a tiny help in thinking about the problem, it is useful to note that the coefficient of x_mx_n in the quadratic form \sum_{k,d}c_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2 is 2\sum_{d|(m,n)}\sum_{kd\geq m\vee n}c_{kd}. If, following Moses, we write \tau_{k,d} for \sum_{j\geq k}c_{k,d}, then this becomes 2\sum_{d|(m,n)}\tau_{m\vee n,d}.

      It’s a shame that this formula involves the maximum m\vee n, but we might be able to deal with that by smoothing the truncations of the HAPs (as I did in the calculations in the EDP9 post). That is, one could try to prove that a sum such as \sum_r e^{-\alpha r} x_{rd} is large, which implies by partial summation that one of the sums \sum_{r\leq m} x_{rd} is large. This too raises technical problems — instead of summing over the coefficients we end up integrating (which isn’t a problem at all) but the integral of the coefficients is infinite. I call this a technical problem because it still doesn’t rule out finding some way of showing that the diagonal coefficients are “more infinite” in some sense, or doing some truncation to make things finite and then dealing with the approximations.

    • gowers Says:

      One further small remark. The suggestion from the experimental evidence is that \tau_{k,d} has the form C_de^{-\alpha kd}. However, we are not forced to go for the best possible form. So perhaps we could try out \tau_{k,d}=C_de^{-\alpha k} (and it is not hard to choose the c_{k,d} so as to achieve that). Then \sum_{d|(m,n)}\tau_{m\vee n,d} would equal e^{-\alpha(m\vee n)}\sum_{d|(m,n)}C_d. That would leave us free to choose some nice arithmetical function d\mapsto C_d. For example, we could choose C_d=\Lambda(d) and would then end up with \log(m,n)e^{-\alpha(m\vee n)}.

      If we did that, then we would have the following question. Fix a large integer N, and work out the sum S of all the coefficients c_{k,d} such that kd\leq N. Then try to prove that it is possible to rewrite the quadratic form \sum_{m,n\leq N}\log(m,n)e^{-\alpha(m\vee n)}x_mx_n as a diagonal form plus a positive semidefinite form in such a way that the sum of the diagonal terms is at least \omega(S).

      There is no guarantee that this particular choice will work, but I imagine that there is some statement about a suitable weighted average of discrepancies that would be equivalent to it, and we might find that that statement looked reasonably plausible.

    • Moses Charikar Says:

      Since \Lambda(d) is non-zero only for prime powers, your choice of C_d = \Lambda(d) would prove that the discrepancy is unbounded even if we restrict ourselves to HAPs with common differences that are prime powers. Certainly plausible.

      One comment on how large n needs to be for this to prove that the discrepancy is \geq 2. If the trend from the SDP values for small n continues, it will take really really large n for the square discrepancy to exceed even 1 ! The increment in the value from n=2^9 to n=2^{10} was a mere 0.014. At this rate, you would need to multiply by something like 2^{70} to get an increment of 1 in the lower bound on square discrepancy.

    • gowers Says:

      For exactly this reason, we had a discussion a few days ago about whether EDP could be true even for HAPs with prime-power common differences. Sune observed that it is false for complex numbers, since one can take a sequence such as x_n=\exp(2\pi in/6). However, it is not clear what the right moral from that example is, since no periodic sequence can give a counterexample for the real case. But it shows that any quadratic-forms identity one found would have to be one that could not be extended to the complex numbers. But it seems that such identities do not exist: if we change the real quadratic form \sum_{d,k}c_{k,d}(x_d+\dots+x_{kd})^2 into the Hermitian form \sum_{d,k}c_{k,d}|x_d+\dots+x_{kd}|^2, the matrix of the form is unchanged, and the coefficient of x_mx_n becomes the coefficient of \Re(x_mx_n)/2.

      So if my understanding is correct, even if EDP is true for prime power differences, it cannot be proved by this method, and C_d=\Lambda(d) was therefore a bad choice.

    • gowers Says:

      In fact, the reason I chose it was that I had a moment of madness and stopped thinking that \sum_{d|n}\phi(d)=n, because I forgot that it was precisely the same as \sum_{d|n}\phi(n/d). (What I mean is that I knew this fact, but temporarily persuaded myself that it wasn’t true.) So I go back to choosing instead C_d=\phi(d), in which case the quadratic form one would try to rewrite would be \sum_{m,n\leq N}(m,n)e^{-\alpha(m\vee n)}x_mx_n.

    • Moses Charikar Says:

      Another way to see this is that complex numbers can be viewed as 2 dimensional vectors: e^{i \theta} \equiv (\cos \theta, \sin \theta). If EDP does not hold for complex numbers for a subset of HAPs, then it does not hold for vectors for the same subset of HAPs. Hence we cannot get a good lower bound via the quadratic form.

    • Moses Charikar Says:

      One observation about the b_i: If only a subset of the b_i are non-zero, we need to think carefully about where to place these non-zero values. A proof of this form would imply this: Consider any sequence where we place \pm 1 values at locations i such that b_i > 0, but are free to place arbitrary values (including 0) at locations i such that b_i = 0. Then the discrepancy of any such sequence over HAPs is unbounded.

      In particular, this rules out having non-zero values for b_i only at i = n!, because an alternating $\pm 1$ sequence at these values (and zeroes elsewhere) has bounded discrepancy over all HAPs.

      One attractive feature of this approach is that the best lower bound achievable via a proof of this form is exactly equal to the discrepancy of vector sequences (for every sequence length n). But we ought to admit the possibility that the discrepancy for vector sequences may be bounded (even if it is unbounded for \pm 1 sequences). We know at least two instances where we can do something with unit vectors that cannot be done with \pm 1 sequences. Are there constructions of sequences of unit vectors with bounded discrepancy ?

    • Moses Charikar Says:

      This is a minor point, but I think the choice \tau_{k,d}=C_de^{-\alpha kd} actually ensures that the coefficient of x_mx_n is 2e^{-\alpha(m\vee n)}\sum_{d|(m,n)}C_d. Doesn’t change what we need to prove.

  13. Gil Kalai Says:

    Here is a proposed probabilistic construction which is quite similar to earlier algorithms, can be tested empirically, and perhaps even analyzed. Let n be the lenth of the sequence we want to create and let T be a real number. ( T= T(n) will be a monotone function of n that we determine later.)

    You chose the value of x_i, $1 \le i \le n$, after you have chosen earlier values. For every HAP containing i we compute the discrepency along the HAP containing i. We ignore those HAP so that the discrepency is smaller than T. For an HAP so that the discrepency D is larger than T, if the sum is negative we give weight (D/T) to +1 and weight 1 to -1. If the sum is negative we give the weight (D/T) to -1 and weight 1 to 1. (D is the absolute value of the sum.) (We can also replace D/T by a fixed constant, say 2.)

    Now when we chose x_i we compute the prduct of these weights for the choise +1 and for the choise -1 and choose at random according to these products.

    I propose to choose T as something like \sqrt{\log n \log\log n} or a little bit higher. We want to find a T so that typically for every i only a small (perhaps typically at most 1) number of HAP will contribute non trivial weights. We experiment this algorithm for various ns and Ts.

    • Gil Says:

      Actually, when we move from the spacing k to the discrepency inside the intervals, we do pay a price from time to time. And it seems that if you consider n intervals as we do there will be intervals whose discrepency is \sqrt{\log n} larger than expected. This brings us back to the \log n area.

      I still propose to experiment what I suggested in the previos post but would expect that this will lead to examples with discrepency in the order of \log n.

      I see no heuristic arument that we can go below \log n discrepency. But for certain measures of average discrepency the heuristic remains at \sqrt{\log n}).

  14. gowers Says:

    This is a response to this comment of Moses, but that subthread was getting rather long, so I’m starting a new one.

    What you say is of course a good point, and one that, despite its simplicity, I had missed. Let me spell it out once again. If we manage to find an identity of the form

    \displaystyle \sum_{kd\leq n}c_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2=\sum_{m\leq n}b_mx_m^2+\sum_iL_i(x)^2,

    with all coefficients non-negative, the c_{k,d} summing to 1 and the b_m summing to \omega(n), then not only will we have proved EDP, but we will have shown a much stronger result that applies to all sequences x_1,\dots,x_n (and even vector sequences, but let me not worry about that for now).

    To give an example of the kind of implication this has, let us suppose that A is any set such that \sum_{m\in A}b_m\leq\omega(n)/2, then we must be able to prove that EDP holds for sequences that take the value 0 on A and \pm 1 on the complement of A, since for such a sequence we know that

    \displaystyle \sum_{kd\leq n}c_{k,d}(x_d+x_{2d}+\dots+x_{kd})^2\geq \sum_{m\notin A}b_m\geq\omega(n)/2.

    We know that there are subsets on which EDP holds. An obvious example is the set of all even numbers. As Moses points out, the set of all factorials is not even close to being an example. In general we do not have to stick to subsets: the experimental evidence suggests that we should be looking at weighted sets, where the discussion is a bit more complicated.

    An obvious preliminary problem is to try to come up with a set of integers of density zero such that EDP does not obviously fail for sequences that are \pm 1 inside that set and 0 outside. Unfortunately, there is a boring answer to this. If EDP is true, then let n_k be such that all \pm 1 sequences of length n_k have discrepancy at least k. Now take all integers up to n_1, together with all even numbers up to 2n_2, all multiples of 4 up to 4n_3, and so on. The density of this set is zero and it has been constructed so that EDP holds for it.

    However, it may be that this gives us some small clue about the kind of sequence (b_m) we should be looking for.

    • gowers Says:

      Here is another attempt to gain some intuition about what is going on in the SDP approach to the problem. I want to give a moderately interesting example (but only moderately) of a situation where it is possible to remove a diagonal part from a quadratic form and maintain positive semidefiniteness. It is chosen to have a slight resemblance to quadratic forms based on HAPs, which one can think of as sets that have fairly small intersections with each other, but a few points that belong to a larger than average number of sets.

      To model this, let us take a more extreme situation, where we have a collection of sets that are almost disjoint, apart from the fact that one element is common to all of them. To be precise, let us suppose that we have r subsets A_1,\dots,A_r of \{2,3,\dots,r+1\}, each of size q, with |A_i\cap A_j|=1 for every i\ne j such that for every x\ne y belonging to \{2,3,\dots,r+1\} there is exactly one A_i that contains both x and y. Such systems (projective planes) are known to exist, though I may have got the details slightly wrong — I think r works out to be q^2+q+1.

      Now let us consider the quadratic form \sum_i(x_1+\sum_{j\in A_i}x_j)^2. Thus, we are adding the element 1 to each set A_i and taking the quadratic form that corresponds to this new set system. It is not hard to check that the coefficient of x_1^2 is r, the coefficient of x_1x_i is 2q (because for each j there turn out to be exactly q sets A_i that contain j), the coefficient of x_i^2 is q when i\geq 2 (for the same reason) and the coefficient of x_ix_j when 2\leq i<j is 2.

      It follows, if my calculations are correct, that the form can be rewritten as

      (qx_1+x_2+\dots+x_{r+1})^2+

      +(r-q^2)x_1^2+(q-1)(x_2^2+\dots+x_{r+1}^2).

      To me this suggests that a more efficient way of representing some average of squares over HAPs may well involve something like putting together bunches of HAPs that and replacing the corresponding sum of squares by sums of squares of linear forms that have weights that favour numbers that belong to many of the HAPs — that is, smoother numbers.

      One problem is that we may have two numbers m and n that are both very smooth but nevertheless (m,n)=1. But that may be where the bunching comes in. For example, perhaps it would be useful to take a large smooth number n and look at all HAPs that contain n, and try to find a more efficient representation for just that bunch. (That doesn’t feel like a strong enough idea, but exploring its weaknesses might be helpful.)

  15. Alec Edgington Says:

    While wondering about heuristic arguments for EDP I did a very simple calculation which leaves me somewhat puzzled.

    Consider the set of all \pm 1-valued sequences on \{ 1, 2, \ldots, n\} as a probability space, with all sequences having equal probability 2^{-n}. Let M be the event ‘sequence is completely multiplicative’, and K_C the event ‘partial sums are bounded by C’. Then \mathrm{P}(M) = 2^{\pi(n) - n}, and I think that \mathrm{P}(K_C) \sim k_C / \sqrt{N} (for some constant k_C).

    If we were to assume that M and K_C are independent, we would get \mathrm{P}(M \cap K_C) \sim k_C 2^{\pi(n) - n} / \sqrt{n}. Since 2^{\pi(n)} is much bigger than \sqrt{n}, this means we would expect there to be lots and lots of completely multiplicative sequences of length n with discrepancy C.

    What puzzles me is not that the assumption of independence is wrong but just how wrong it is. If EDP is true, then the multiplicativity of a sequence must force the discrepancy to be large in a big way. Of course, we know that it forces f(a^2 n) = f(n), so that the sequence ‘reproduces’ itself across certain subsequences, but this doesn’t seem enough to explain the above.

    • Gil Says:

      The probability that the partial sums are bounded are exponentially small, no?

    • gowers Says:

      I don’t suppose you’ll be satisfied with this answer, but part of the explanation is surely that if a sequence is multiplicative and has bounded partial sums, then it has bounded partial sums along all HAPs. So if that is impossible, then …

      I see that that’s not much more than a restatement of the question. Perhaps one way to gain intuition would be to think carefully about the events “has partial sums between -C and C” and “is 2-multiplicative”, where by that I mean that f(2n)=f(2)f(n) for every n. Do these two events appear to be negatively correlated? (I can’t face doing the calculation myself.) It may be that the negative correlation doesn’t show up at this point, in which case this question will not help.

    • Alec Edgington Says:

      Gil: unless I’m misunderstanding something, the probabiity of the partial sum being within a certain bound goes down as n^{-\frac{1}{2}}, corresponding to the height around zero of the density function of a Gaussian variable with mean zero and variance n. (We actually want the maximum rather than the final height, but I think this behaves similarly.)

      Tim: that sounds like a good idea; I’ll look at that…

    • gowers Says:

      Alec, I think Gil is probably right. Intuitively, if you want to force a random walk to stay within [-C,C], then you will have to give it a nudge a positive proportion of the time, which is restricting it by an exponential amount. In the extreme case where C=1 this is clear — every other move of the walk is forced.

      To put it another way, and in fact, this is a proof, you cannot afford to have a subinterval with drift greater than 2C. But for this to happen with positive probability, the subinterval needs to have size a constant times C^2, so the probability that it never happens is at most something like \exp(-cn/C^2) for some absolute constant c.

    • Alec Edgington Says:

      Sorry, yes, you’re right. I was misinterpreting a result about the maximum value attained (as opposed to the maximum absolute value). That alleviates my puzzlement.

  16. Gil Kalai Says:

    Another technique which is fairly standard and may be useful in the lower bound direction is the “polynomial technique”. Say you want to prove that every long enough sequence has discrepency at least 3. Consider polynomials in the variables x_1,x_2,\dots,x_n over a field with 7 elements. Now mod our by the equation x_i^2=1. We want to show that a certain polynomial is identically zero. The polynomial is: \prod \prod (\sum_{i=1}^mx_{ir}-3)(\sum_{i=1}^mx_{ir}+3), where the products are over all r and all m smaller than n/r. Once we forced the identity x_i^2=1 we can reduce every such polynomial to square free polynomial and what we need to show is that for large enough n this polynomial is identically zero.

    In case the formula wont compile: you take the product of all partial sums -3 times the same sum + 3, over all HAPs.

    (If you prefer to consider polynomials over the field and not mod up by any ideal you can replace the x_i by x_i^3 and add the product of all x_is as another term.)

    We use the fact that if the discrepency is becoming greater than 2 then some of the above sums will be +3 or -3 modulo 7.

    On the one hand, it feels like simply reformulating the problem, but on the other hand, maybe it is not. Perhaps, the complicated product can be simplified or some other algebraic reason why such products ultimately vanish exist. This type of technique is sometimes useful for problems where we can think also of semi definite/linear programming methodsl.

    I also share the feeling that at present Moses’s semidefinite program is the most promising.

  17. Klas Markström Says:

    A while back we were looking at multiplicative functions such that f(3x)=0 for all x, and one question which was raised was wether a function of this type, with bounded discrepancy , must be the function 1, -1, 0, 1, -1, 0, 1, -1, 0,…

    EDP7 — emergency post


    I could not see an intuitive reason for why this should be true, apart from just not managing to construct a counterexample. Has anyone looked more at this?

    I used one of my old Mathematica program to search for a multiplicative function of this type, with f(1)=1, and while I had lunch it found an assignment to the first 115 primes which works up to n=630.

    Here is the values for the primes
    {-1,0, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, -1, -1,
    1, 1, -1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, -1, 1, -1, -1, 1,
    1, -1, -1, 1, -1, 1, 1, 1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, -1, -1,
    -1, 1, 1, 1, -1, -1, 1, 1, 1, 1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1,
    1, -1, 1, -1, -1, -1, 1, -1, -1, -1, -1, 1, 1, -1, -1, 1, 1, 1, -1,
    1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, -1}

    The function for all x is then
    {1, -1, 0, 1, 1, 0, -1, -1, 0, -1, 1, 0, -1, 1, 0, 1, 1, 0, -1, 1, 0, \
    -1, -1, 0, 1, 1, 0, -1, 1, 0, -1, -1, 0, -1, -1, 0, 1, 1, 0, -1, 1, \
    0, -1, 1, 0, 1, -1, 0, 1, -1, 0, -1, 1, 0, 1, 1, 0, -1, -1, 0, -1, 1, \
    0, 1, -1, 0, -1, 1, 0, 1, -1, 0, 1, -1, 0, -1, -1, 0, 1, 1, 0, -1, \
    -1, 0, 1, 1, 0, -1, 1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0, 1, \
    1, 0, -1, -1, 0, 1, -1, 0, -1, 1, 0, -1, 1, 0, 1, -1, 0, 1, 1, 0, -1, \
    1, 0, -1, -1, 0, 1, -1, 0, 1, 1, 0, -1, 1, 0, -1, -1, 0, 1, -1, 0, 1, \
    -1, 0, 1, -1, 0, -1, 1, 0, 1, -1, 0, 1, -1, 0, -1, 1, 0, -1, 1, 0, 1, \
    -1, 0, 1, -1, 0, -1, 1, 0, -1, 1, 0, -1, 1, 0, -1, -1, 0, 1, 1, 0, 1, \
    -1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0, \
    -1, -1, 0, 1, 1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0, 1, 1, 0, -1, 1, 0, \
    -1, 1, 0, -1, 1, 0, -1, -1, 0, 1, -1, 0, 1, -1, 0, -1, 1, 0, 1, 1, 0, \
    -1, -1, 0, -1, 1, 0, 1, 1, 0, -1, -1, 0, 1, -1, 0, 1, -1, 0, -1, 1, \
    0, -1, 1, 0, -1, 1, 0, -1, 1, 0, 1, -1, 0, -1, -1, 0, 1, -1, 0, 1, \
    -1, 0, 1, 1, 0, -1, -1, 0, 1, 1, 0, 1, 1, 0, -1, -1, 0, 1, -1, 0, 1, \
    1, 0, -1, -1, 0, 1, -1, 0, 1, 1, 0, -1, -1, 0, -1, 1, 0, -1, 1, 0, 1, \
    -1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0, -1, 1, 0, -1, 1, 0, 1, 1, 0, -1, \
    -1, 0, -1, 1, 0, -1, -1, 0, 1, 1, 0, 1, 1, 0, -1, -1, 0, -1, -1, 0, \
    1, -1, 0, 1, -1, 0, 1, -1, 0, 1, 1, 0, -1, -1, 0, 1, 1, 0, -1, -1, 0, \
    1, 1, 0, -1, -1, 0, 1, -1, 0, 1, -1, 0, 1, 1, 0, -1, -1, 0, 1, 1, 0, \
    -1, 1, 0, 1, 1, 0, -1, -1, 0, -1, 1, 0, 1, -1, 0, 1, 1, 0, -1, -1, 0, \
    1, 1, 0, -1, -1, 0, 1, -1, 0, 1, -1, 0, -1, 1, 0, 1, 1, 0, -1, 1, 0, \
    -1, 1, 0, -1, -1, 0, -1, 1, 0, -1, -1, 0, 1, 1, 0, 1, -1, 0, -1, -1, \
    0, 1, 1, 0, -1, -1, 0, 1, 1, 0, 1, 1, 0, -1, -1, 0, 1, -1, 0, -1, 1, \
    0, -1, 1, 0, 1, 1, 0, -1, 1, 0, -1, 1, 0, -1, -1, 0, -1, 1, 0, -1, 1, \
    0, 1, 1, 0, -1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0, -1, 1, 0, -1, 1, \
    0, 1, 1, 0, -1, 1, 0, -1, 1, 0, -1, -1, 0, -1, 1, 0, -1, -1, 0, 1, \
    -1, 0, 1, -1, 0, 1, 1, 0, 1, 1, 0, -1, -1, 0, 1, -1, 0, -1, -1, 0, 1, \
    1, 0, 1, -1, 0, -1, -1, 0, 1, 1, 0, 1, 1, 0, -1, -1, 0, -1, 1, 0, 1, \
    -1, 0, -1, 1, 0, -1, 1, 0, 1, 1, 0, -1, -1, 0, 1, 1, 0, -1, -1, 0, \
    -1, -1, 0, 1, 1, 0, 1, 1, 0

    • Klas Markström Says:

      The point of the example of course being that it is distinct from the unique example with discrepancy 1 even for small values of $x$.
      For a finite bound N one will typically be able to construct trivial examples defined on the first N integers simply by changing the values at the primes closest to N, since they are not affected by the multiplicative requirement.

    • Alec Edgington Says:

      One can get at least as far as 886 with a sequence like this of discrepancy 2 sending 5 to +1:

      +-0++0--0-+0-+0++0-+0--0++0-+0--0--0++0-+0-+0+-0+-0--0++0-+0-+0+-0-+0++0--0--0++0
      --0++0-+0+-0+-0--0+-0++0+-0+-0-+0-+0--0++0-+0+-0+-0++0-+0--0--0++0+-0-+0+-0+-0-+0
      -+0+-0+-0-+0-+0-+0--0++0+-0+-0-+0++0--0+-0+-0--0+-0+-0+-0+-0++0-+0++0--0-+0+-0+-0
      -+0++0-+0--0++0--0++0--0-+0-+0-+0-+0+-0-+0+-0+-0--0+-0++0++0--0+-0++0--0+-0++0--0
      -+0-+0+-0+-0+-0+-0-+0-+0-+0--0++0--0++0+-0+-0-+0+-0+-0+-0++0-+0-+0--0-+0-+0+-0+-0
      ++0--0+-0-+0+-0--0++0+-0+-0+-0++0--0+-0+-0-+0++0-+0--0-+0-+0+-0++0--0--0++0--0+-0
      ++0-+0+-0--0++0-+0-+0+-0+-0-+0-+0++0--0--0++0+-0-+0-+0++0-+0-+0--0-+0-+0+-0+-0++0
      -+0--0+-0--0++0-+0+-0+-0++0--0-+0+-0-+0-+0++0--0++0--0--0++0++0--0++0--0-+0-+0+-0
      ++0-+0-+0-+0--0++0--0+-0+-0+-0+-0+-0++0-+0-+0--0++0--0-+0++0--0-+0-+0++0--0-+0+-0
      +-0+-0+-0++0-+0--0++0--0+-0+-0++0--0--0++0-+0-+0--0++0++0-+0--0++0-+0--0+-0+-0++0
      --0--0++0++0--0+-0-+0-+0--0+-0++0++0--0--0+-0++0--0+-0+-0++0+-0--0++0++0--0+
      

      Here the following primes are sent to -1: 2, 7, 13, 19, 23, 31, 43, 47, 53, 61, 67, 73, 83, 97, 101, 107, 131, 139, 149, 151, 163, 167, 181, 191, 193, 199, 233, 239, 271, 277, 281, 283, 293, 313, 317, 349, 353, 359, 397, 401, 409, 419, 421, 431, 439, 443, 457, 463, 467, 509, 523, 547, 563, 571, 577, 587, 607, 613, 619, 631, 643, 647, 661, 677, 683, 691, 701, 709, 751, 787, 811, 823, 827, 829, 839, 859, 863, 883.

      I can’t see much of a pattern, but there seems to be quite a bit of freedom to assign either (+1, -1) or (-1, +1) at twin-prime pairs (3k-1, 3k+1).

    • Alec Edgington Says:

      That search finally terminated — so that is the longest such sequence we can get that sends 5 to +1.

    • Klas Markström Says:

      Alec, could you try that with the value at 7 flipped instead? It would be interesting to see how much further one gets by following the discrepancy 1 function out to different primes.

    • Alec Edgington Says:

      OK, I’ll try that when I get back to my computer this evening.

    • Alec Edgington Says:

      In this case (flipping the value at 7) the maximal length of sequence (or rather, the maximal length that is of the form p-1 — which is what I should have said above too) is 946:

      +-0+-0--0++0-+0+-0+-0-+0++0--0+-0++0--0+-0-+0-+0+-0-+0-+0+-0+-0++0--0--0++0+-0--0
      +-0++0--0++0--0+-0++0++0-+0-+0--0--0++0+-0+-0+-0-+0-+0+-0-+0+-0+-0-+0+-0+-0-+0+-0
      +-0+-0+-0-+0-+0+-0+-0-+0-+0+-0+-0+-0+-0-+0+-0-+0-+0-+0-+0-+0-+0+-0-+0++0--0-+0+-0
      +-0--0+-0+-0+-0++0-+0-+0--0+-0++0-+0--0+-0++0+-0+-0++0--0+-0+-0+-0+-0++0-+0--0+-0
      --0+-0+-0++0+-0++0-+0-+0-+0--0+-0+-0+-0+-0++0--0++0-+0--0+-0+-0+-0--0++0+-0+-0-+0
      --0+-0++0++0-+0-+0--0-+0--0++0-+0++0--0++0-+0--0+-0-+0-+0+-0--0++0+-0++0--0+-0+-0
      --0+-0++0++0--0++0--0++0--0+-0+-0--0++0--0++0-+0-+0++0--0++0--0--0++0--0++0+-0+-0
      +-0+-0-+0--0++0+-0+-0+-0-+0-+0+-0+-0+-0+-0+-0+-0+-0--0++0+-0-+0++0--0-+0+-0+-0+-0
      -+0+-0--0++0+-0++0--0-+0+-0++0--0--0++0--0++0-+0++0--0-+0+-0--0++0+-0+-0--0+-0+-0
      ++0+-0--0++0++0--0--0++0--0++0+-0+-0++0--0+-0+-0+-0-+0++0--0-+0--0++0--0++0++0--0
      -+0+-0--0++0+-0-+0+-0--0++0-+0++0--0-+0+-0++0--0-+0++0--0--0++0-+0+-0--0++0++0--0
       -+0++0--0-+0+-0+-0+-0-+0--0++0-+0--0+-0++0++0--0-+0--0+
      

      (Primes sent to -1 in this example: 2, 5, 7, 13, 17, 29, 37, 41, 43, 59, 67, 71, 79, 83, 89, 109, 113, 137, 139, 157, 167, 179, 191, 197, 211, 223, 227, 229, 251, 257, 269, 277, 281, 293, 311, 349, 353, 359, 379, 383, 389, 401, 421, 431, 443, 457, 467, 479, 487, 491, 499, 521, 541, 547, 557, 563, 569, 577, 587, 599, 617, 619, 641, 647, 653, 683, 701, 709, 719, 761, 769, 773, 787, 809, 811, 857, 859, 881, 911, 929, 937.)

      I’ll kick off a search with 11 flipped, but won’t hold my breath for it to finish!

    • Alec Edgington Says:

      Well, I’ll have to eat my words. It finished almost immediately, having got only as far as 330:

      +-0+-0+-0++0--0+-0+-0--0++0+-0+-0+-0+-0+-0-+0+-0+-0-+0--0+-0+-0++0+-0+-0--0++0+-0
      +-0++0--0--0+-0+-0+-0++0-+0-+0+-0+-0+-0+-0+-0+-0--0+-0+-0+-0+-0++0+-0+-0--0+-0+-0
      +-0++0+-0--0++0+-0-+0+-0--0+-0+-0++0+-0+-0+-0-+0-+0-+0++0-+0--0+-0+-0+-0+-0+-0+-0
      +-0--0++0--0+-0++0+-0--0+-0+-0++0+-0+-0+-0+-0+-0--0+-0++0--0+-0++0+-0+-0++0--0+-0
      --0+-0
      

      (Primes sent to -1: 2, 5, 13, 17, 23, 29, 41, 43, 47, 59, 71, 73, 83, 89, 101, 109, 113, 131, 137, 149, 173, 179, 181, 191, 211, 223, 227, 233, 239, 257, 263, 269, 281, 293, 311.)

      I had assumed that the maximum length would be a monotonic function of the first prime flipped, but evidently not…

    • Alec Edgington Says:

      I’ve now gone a bit further with this, and found:

      min p flipped  max n (n+1 prime)
      5              886
      7              946
      11             330
      13             >= 1380
      17             408
      19             >= 2646
      23             22
      29             >= 546
      31             >= 9906
      37             >= 27508
      41             >= 690
      43             >= 24420
      47             46
      53             52
      59             >= 1326
      61             >= 15072
      

      It seems that flipping a prime of the form 3k+1 is much better than flipping a prime of the form 3k-1. (Perhaps this isn’t too surprising, as it means we don’t immediately require f(3k+4) to change sign to keep the discrepancy at 2, but I’m not sure that fully explains it.)

    • Klas Markström Says:

      That’s an interesting sequence. I would not have expected the different p:s to give such different lengths.

      However the fact that one can find sequences as long a the ones you have found for some primes, suggest that any proof showing that the maximum length is finite for any p will have to be rather indirect, not giving a good bound for the maximum length.

  18. Moses Charikar Says:

    As Tim pointed out earlier, the Cholesky decomposition is greedy and does not necessarily give an efficient decomposition into sum of squares that uses the diagonal sparingly. But I want to propose using it as a proof technique for positive semidefiniteness after we have subtracted out the diagonal terms. If we can prove that the Cholesky decomposition succeeds, this proves that the matrix is positive definite. Put differently, I want to try and figure out what b_i values we can get away with without making the Cholesky decomposition procedure fail.

    Let’s review how Cholesky works. The goal is to represent positive definite matrix A = V V^T such that V is a lower triangular matrix. If V_i is the ith row of V, the only non-zero coordinates of V_i are in the first i positions, and V_i \cdot V_j = A_{ij}.

    The algorithm proceeds thus: V_{11} = \sqrt{A_{11}}. Having determined V_1, V_2, \ldots, V_{i-1}, the coordinates of V_i are determined thus: V_{i,1} is completely determined by the condition V_1 \cdot V_i = A_{1,i}. Next, V_{i,2} is determined by V_2 \cdot V_i = A_{2,i} and so on. Finally, V_{i,i} is determined by the constraint: \sum_{j \leq i} V_{i,j}^2 = A_{ii}. The procedure succeeds if, for all i, in the computation of V_{i,i}, A_{ii}-\sum_{j < i} V_{i,j}^2\geq 0.

    Consider what happens if, for a particular i, we subtract b_i from A_{i,i} and compute the new Cholesky decomposition. Since A_{i,i} is smaller, V_{i,i} will become smaller. This in turn will increase the values of V_{j,i} for j > i. This sets off a chain reaction and decreases V_{j,j} values for j > i. If we subtract too much, we run the risk of the Cholesky decomposition failing. But if it does fail, we know for sure that we have ruined positive definiteness.

    Now suppose we know the Cholesky decomposition of matrix A. This might help us guess what diagonal terms we can safely subtract from A. I’m hoping that the large diagonal V_{i,i} in the decomposition give us clues for where we can hope to subtract b_i (although there are complex dependencies and subtracting a small amount from one location changes a whole bunch of other elements in the decomposition.)

    • Moses Charikar Says:

      Here is a setting I want to think about, with some assumptions on various parameters slightly different from the ones that Tim proposed earlier. I’m going to repeat a few things from previous comments. Recall that we are interested in the quadratic form \displaystyle \sum_{k,d} c_{k,d} (x_d + x_{2d} + \ldots + x_{kd})^2

      As before \sum_{j \geq k} c_{j,d} = \tau_{k,d} = C_d e^{-\alpha kd}. Consider \alpha to be very small. We’ll use this to make some approximations. Say C_d = 1 for all d. (I hope this will turn out to be an easier setting to analyze and understand).

      Note that \sum_{k,d} c_{k,d} = \sum_d \tau_{1,d} = \sum_d e^{-\alpha d} \approx 1/\alpha.

      Consider the symmetric matrix A corresponding to the quadratic form above (i.e. the quadratic form is x^T A x). We hope to subtract a large diagonal term \sum b_i x_i^2 such that the remainder is still positive semidefinite. For m \geq n, the entry A_{m,n} = e^{-\alpha m} d((m,n)) where d(i) = the number of divisors of i.

      First, lets figure out the Cholesky decomposition for \alpha = 0. In this case, A_{m,n} = d((m,n)). This has a nice form which is obvious in retrospect: V_{m,n} = 1 if n|m and V_{m,n} = 0 otherwise. In other words, the nth column is the indicator function for the infinite HAP with common difference n. It is easy to verify that V_m \cdot V_n = d((m,n)) since V_{md} = V_{nd} = 1 precisely when d|m and d|n.

      The next step is to figure out what the decomposition is for this setting of parameters with \alpha > 0 and very small. I want to obtain approximate expressions for the entries of the Cholesky decomposition as linear expressions in \alpha, i.e. by dropping higher order terms from the Taylor series. I’m hoping that the relatively structured form of the matrix will allow us to get closed form approximations for the entries of this decomposition, and further that, it will result in diagonal elements V_{n,n} being relatively high for n with many divisors. If so, this should help us subtract large b_n values for such n.

    • Moses Charikar Says:

      I tried calculating the coefficients of \alpha for the entries of the Cholesky decomposition of the matrix described above. (The constant term for each entry is given by the Cholesky decomposition for \alpha = 0 determined above).The computation of the \alpha coefficients seemed too tedious (and error-prone) to do by hand, so I wrote some code to do it. Here is the output of the code for the first 100 rows of the decomposition. The format of each line of the file is
      i j \beta(i,j)
      where \beta(i,j) is the coefficient of \alpha in the Taylor expansion of V_{i,j}. It seems to match some initial values I computed by hand, so I hope the code is correct. I can’t say I understand what these values are, but it is reassuring to see that for n with many divisors,
      \beta(n,n) seems to be large. On the other hand it seems to be
      -1/2 for prime n. I hope this means that we can subtract out large values of b_n for n with many divisors while maintaining the property that the Cholesky decomposition continues to proceed successfully. To prove that Cholesky works, we will need to show upper bounds on |V_{i,j}| for i > j and lower bounds on the diagonal terms V_{i,i} (for the decomposition of the matrix with the diagonal form subtracted out).

      Before we do that, we need to understand what these coefficients \beta(i,j) are and why the values \beta(n,n) are large for n with many divisors. Any ideas ?

    • gowers Says:

      I’ve just had a look in Sloane’s database to see if there are any sequences that match the values \beta(n,n) but had no luck (even after various transformations, such as multiplying by 2, or multiplying by 2 and adding 1). But it might be worth continuing to look.

    • gowers Says:

      A couple of small observations. I think if we continue along these lines we will be able to get a formula for the diagonal values.

      The observations are that 2^k maps to (k-2)2^{k-2} for every k (so far, that is) and that 3^k maps to (k-3/2)3^{k-1} for every k. I’ll continue to look at powers for a bit and then I’ll move on to more general geometric progressions (an obvious example being 3,6,12,24,48,96) and see what pops out.

    • gowers Says:

      A formula that works when p=2,3,5 and gives the right value for k=0 and k=1 is that p^k maps to ((\frac{p-1}2)k-p/2)p^{k-1}. I suspect it is valid for all prime powers but have not yet checked.

      Maybe this looks more suggestive if we rewrite it as \frac 12 ((1-1/p)k-1)p^k.

    • Moses Charikar Says:

      Very interesting. I’ve put up a file with only the diagonal entries upto 1000 here . It is tedious to look for diagonal entries in the earlier file. I’ve also included prime factorizations in the new file.

    • gowers Says:

      Sorry — had to do something else.

      But I’ve now checked the GP 3,6,12,… and we get the formula f(3.2^k)=2^{k-2}(7k-2).

      Next up, the GP 5,10,20,40,… which gives the formula f(5.2^k)=2^{k-2}(11k-2).

      So I’d hazard a guess that f(p.2^k)=2^{k-2}((2p+1)k-2). Let me do a random check by looking at f(88). If my guess is correct, this should be 2^{3-2}\times 23=46 and in fact it is … 134. Let me try that again. It should be 2^{3-2}(23\times 3-2)=2\times 67=134. So I believe that formula now.

    • gowers Says:

      It’s taken me longer than it should have, but I now see that we ought to be looking not at f(n) but at f(n)/n. Somehow, this is where the “real pattern” is.

      When n=p^k we get f(n)/n=(1/2)((1-1/p)k-1), and when n=2^kp we get f(n)/n=(1/4)((2+1/p)k-2/p)=(1/2)(1+1/2p)k-1/p).

      I think an abstract question is emerging. Suppose we know that, as seems to be the case, for any a and p we have a formula of the kind f(ap^k)=p^k(uk+v). With enough initial conditions, that should determine f. We already have a lot of initial conditions, since we know that f(1)=f(p)=-1/2 for every prime p. Those conditions are enough to determine f(p^k), but they don’t seem to be sufficient to do everything. For that I appear to need to know f at all square-free numbers, or something like that.

    • gowers Says:

      For primes p,q with p<q the formula appears to be

      f(pq)=q(p-1)-1/2. I inducted that from a few small instances, and then checked it on 407=11\times 37. My formula predicts 370-1/2=369.5 and that is exactly what I get.

      In keeping with the general idea of looking at f(n)/n, let me also write this formula as f(pq)/pq=1-1/p-1/2pq.

      A small calculation, which I shall now do, will allow me to deduce the formula for p^aq^b.

    • Moses Charikar Says:

      Ok, I think we should have the diagonal values figured out soon, although understanding why these numbers arise will take more effort. Looking ahead, the hope was that large numbers are an indication that large b_i can be subtracted out from those entries without sacrificing positive semidefiniteness. At this point, I don’t have a clue about how these b_i should be picked. We can experiment with some guesses later.

      Question for later:
      The hope is that for any C, for suitably small \alpha, we can pick b_i such that \sum b_i > C/\alpha. At this point, I haven’t thought through why this should necessarily be the case. Is there a heuristic argument that we have enough mass on the diagonal to pull this through ?

      I am going to have to sign off for a while and get back to this later in the day.

    • gowers Says:

      For n=p^kq with p<q I get

      f(n)=kp^{k-1}(p-1)(q+1/2)-p^k/2

      or equivalently

      f(n)/n=k(1-1/p)(1+1/2q)-1/2q.

      I’ll add to this comment when I get a formula for p^aq^b, which should be easy now.

    • Alec Edgington Says:

      All very interesting. Another thing to note about the \beta(i,j) is that the fractional part is \frac{1}{2} precisely when j \mid i and 4 \nmid j.

    • gowers Says:

      I got the following formula for f(p^aq^b)/p^aq^b:

      \frac 12(1-\frac 1p)a((1+\frac 1q)b+1)+\frac 12((1-\frac 1q)b-1).

      I have tried it out on 225=3^2.5^2 and got the answer f(225)=577.5, which was correct. So I believe this formula too.

      Just to repeat what I am doing here, I am assuming that the function f(n)/n takes geometric progressions to arithmetic progressions and deducing what I can whenever I know what two values are. In this way, knowing f for prime powers and for products of two primes was enough to give me f for all numbers of the form p^aq^b. Sorry — correction — I have assumed that only for GPs with prime common ratio. I haven’t looked at what happens when the common ratio is composite.

      Come to think of it, if f(n)/n takes GPs with prime common ratio to APs, then \exp(f(n)/n) takes GPs with prime common ratio to GPs, which is a pretty strong multiplicativity property. That leads me to think that the eventual formula is going to be quite nice. I hope I’ll be able to guess it after working out f(pqr)/pqr, or perhaps I’ll have to do f(p^aq^br^c)/p^aq^br^c. But first I need some lunch.

    • gowers Says:

      Here’s a slightly more transparent way of writing the formula for f(p^aq^b)/p^aq^b:

      \frac 12(1-\frac 1p)(1+\frac 1q)ab+\frac 12(1-\frac 1p)a+\frac 12(1-\frac 1q)b-\frac 12.

    • gowers Says:

      Just before I dive into the calculations, let me make the remark that the signs are that the formula will be an inhomogeneous multilinear one in the indices. More precisely, I am expecting a formula of the following kind for f(n)/n when n=p_1^{a_1}\dots p_r^{a_r} and the primes p_1,\dots,p_r are in increasing order:

      \frac 12\sum_{A\subset\{1,\dots,k\}}\prod_{j\in A}(1+\frac{c_{A,j}}{p_j})a_j-1.

    • gowers Says:

      I’ve just wasted some time going about this a stupid way, which was to calculate f(pqr)/pqr for several values and try to spot a pattern. But I now see that that was unlikely to be easy, since the dependence on 1/p, 1/q and 1/r is trilinear. The values look quite strange too, but I have a new plan. That plan is to try to find a formula for f(6p^k)/6p^k when p is a prime greater than 3. That should give me some coefficients of a linear function in k. If I repeat the exercise for one or two other small products of two primes, then it should be easyish to spot a formula for the coefficients, and then I’ll have a formula for f(pqr^k), which will be a good start. In fact, it will be more than just a start, as from that and the multilinearity assumption I’ll be able to work out all the rest of the values at products of three prime powers.

    • gowers Says:

      Oh dear, for the first time I have run up against an anomaly in the data that may be hard to fit into a nice pattern. When p is any of 7,11,13,17 or 19 the value of f(6p)/6p is given by the formula 2-(2p-5)/12p, but when p=5 we get instead 2-(2p-7)/12p. But I’ll continue with this and try not to worry about f(30) too much for now. (But I would love to learn that it was wrong and should have been 57.5 instead of 58.5.)

    • Moses Charikar Says:

      I’ve added an even bigger file with diagonal entries upto 10,000 in this directory in case you are looking for more data points to verify guesses for formulae. The directory also contains the C code (cholesky.c) used to generate the lower triangular matrix of coefficients and another piece of code (cholesky2.c) to generate the diagonal entries only. I really hope the code is not buggy !

    • Moses Charikar Says:

      Uh oh … I spotted a tiny problem in the code which could potentially affect the calculation for rows and columns greater than n/2. Except that for some mysterious reason it doesn’t. i.e. the output is exactly the same after fixing the “bug”. Basically, I was not setting the constant term in the Taylor expansion correctly for diagonal entries > n/2. It should have been 1 and was being set to 0 instead. This could potentially affect the calculation of the coefficient of \alpha for all entries in columns > n/2, but apparently it does not. I’m checking to see if there are any other issues.

    • Moses Charikar Says:

      I looked over the code and as far as I can tell, it is correct. I now understand why the “bug” above did not really change anything.

    • gowers Says:

      After further investigation, I no longer think the value at 30 is anomalous, but I have discovered (with much more effort than I thought would be needed, because the pattern is clearly subtler than I thought it would be) the following formula for f(2^kpq)/2^kpq. We know from earlier that f(pq)/pq=1-1/p-1/2pq, so that gives us the value when k=0. If I call that value a_{pq}, then the formula, which is linear in k as we expect, is

      \displaystyle a_{pq}+(1+\frac{p*q}{4pq})k,

      where * is a binary operation that I do not fully understand. Let me tabulate the values that I have calculated so far:

      3*5=19, 3*7=21, 3*11=29,3*13=33,

      5*7=31,5*11=33,5*13=37,

      7*11=43, 7*13=43,

      11*13=67.

      I had to calculate a lot of values before I realized that there is a nice formula for this binary operation except if p and q are consecutive primes. The formula is simply p*q=2(p+q)+1. But when p and q are consecutive we seem to get a little “kick” and the value is larger. In fact, by the time we get to 11*13 the kick is not all that little.

      Anyhow, this observation partially explains the mysterious anomaly at 30, but really it replaces it by a bigger mystery: why should the values be sensitive to consecutiveness of primes?

      Of course, at this stage I haven’t looked at all that many numbers, so I hope that I’ll be able to ask a more precise question in due course. So far, however, I don’t feel all that close to a formula for f(p^aq^br^c)

    • gowers Says:

      Correction: the formula does not give the right value for 7*13 either, so my understanding is worse than I thought. However, in a way that might be good news, since I found the idea of a formula that fails for consecutive primes a bit bizarre.

    • gowers Says:

      At some point I may ask whether somebody can write a piece of code to work out this binary operation that I am working out laboriously by hand. But for now, here are a few more values, which perhaps give us tiny further clues.

      7*17=49, at which point I think it is a pretty safe conjecture that if p*q=2(p+q)+1, then p*r=2(p+r)+1 for all r\geq q.

      11*19=67, so we’re not there yet with 11.

      11*23=69=2(11+23)+1 so now we have got there.

      It looks as though a necessary and sufficient condition for p*q=2(p+q)+1 is that q\geq 2p. What’s more, it looks as though p*q is constant before that. So the right formula should involve maxes and things.

      OK, here is my revised formula: p*q=\max\{6p+1,2(p+q)+1\}. Phew!

      To test the earlier conjecture, let me try 7*19. It works out to be 53, which is indeed what it should be.

      And to test the new conjecture, let me try 13*19. The prediction is 79. What I actually get is … 79. OK, I think I believe this formula now, but it will need quite a bit more slog to get a formula for f(p^aq^br^c)/p^aq^br^c from here. I’ve spent several hours on this: the calculations are certainly routine enough to do on a computer, so I wonder if anyone could take over at this point. For instance, it would be good to give a formula for f(3^kpq). Does it change behaviour according to whether q<3p or q>3p?

    • Alec Edgington Says:

      Here’s a plot of f(n)/n for 1 \leq n \leq 10\,000:

  19. gowers Says:

    These calculations (of Moses in the previous comment) look very interesting and potentially fruitful. Rather than trying to duplicate them, I will continue to do what I can to refine my speculations about what the coefficients c_{k,d} and b_m might conceivably look like.

    As has already been mentioned, an important constraint on the b_m (aside from the essential one that the partial sums should be unbounded) is that if A is a set for which EDP fails, then \sum_{m\in A}b_m<\infty. Here I am saying that EDP fails for A if there exists a sequence (x_n) that takes \pm 1 values on A and arbitrary values on \mathbb{N}\setminus A such that the HAP-discrepancy of (x_n) is bounded. This is a necessary condition on the coefficients b_m, since if \sum_{m\in A}b_m=\infty then we know that EDP is true for A. (Once again, I am not being careful about the distinction between the large finite case and the infinite case.)

    In fact, it is better to think about the complex case, or even the vector case, since the SDP proof, if it succeeds, automatically does those cases as well. So let me modify that definition to say that EDP fails for A if there is a bounded-discrepancy sequence of complex numbers (or vectors) that has modulus (or norm) 1 everywhere on A.

    With this in mind, it is of some interest to have a good idea of which sets A are the ones for which EDP succeeds. A trivial example is any HAP (assuming, that is, that EDP is true). An example of a set for which EDP fails is the complement of any HAP. Let me prove that in the case where the HAP has common difference 15 — it will then be clear how the proof works in general. For any d, let \chi_d be a non-trivial character of the multiplicative group of residue classes mod d that are coprime to d. Now define f(n) to be \chi_{15}(n) if n is coprime to 15, \chi_5(n/3) if (15,n)=3, \chi_3(n/5) if (15,n)=5, and 0 if n is a multiple of 15. Any HAP will run periodically through either all the residue classes mod 15, or all the residue classes that are multiples of 3, or all the residue classes that are multiples of 5, or will consists solely of multiples of 15. Whatever happens, we know that f((15m+1)d)+f(15m+2)d)+\dots+f(15(m+1)d)=0 for every m and every d, and that |f(n)|=1 for every n that is not a multiple of 15.

    As Moses pointed out yesterday (to stop me getting carried away), if A is the set of factorials, then EDP fails for A. That is because every HAP intersects A in a set of the form \{m!,(m+1)!,(m+2)!,\dots\}, which means that we can define x_n to be 0 if n is not a factorial, and (-1)^m if n=m!.

    The main point of this comment is to give a slightly non-trivial example of a set for which EDP succeeds. By “non-trivial” I don’t mean mathematically hard, but just that the set is rather sparse. It is the set of perfect squares. (The proof generalizes straightforwardly to higher powers, and, I suspect, to a rich class of other sets, but I have not yet tried to verify this suspicion.)

    To see this, one merely has to note what the intersection of \{d,2d,3d,4d,\dots\} with \{1,4,9,16,\dots\} is. The answer is that it is the set \{r^2,4r^2,9r^2,16r^2,\dots\}, where r^2 is the smallest multiple of d that is a perfect square. (That is, to get r from d you multiply once by each prime that divides d an odd number of times.) The proof that this is what you get is that a trivial necessary and sufficient condition for d to divide a^2 is that r^2 should divide a^2.

    Now suppose that EDP failed for the set of perfect squares, and let y_n be the …

    Hang on, my proof has collapsed. What I am proving is the weaker statement that EDP fails if you insist that the sequence is zero outside A. Let me continue with that assumption, though the statement is much less interesting.

    Now suppose that EDP fails for the set of perfect squares, and let y_n be the sequence that demonstrates this. Thus (because of our new assumption) y_{m^2} has modulus 1 for every positive integer m, and y_n=0 if n is not a perfect square. But then we can set x_m=y_{m^2} and we have a counterexample to EDP.

    This comment has ended up as a bit of a damp squib, but let me salvage something by asking the following question. Suppose that the vector version of EDP is true and that (x_n) is a sequence such that |x_{m^2}|=1 for every m. Does it follow that (x_n) has unbounded discrepancy?

    This may be a fairly easy question, since the squares are so sparse. (If it is, then the whole point of this comment is rather lost, but a different useful point will have been made instead.) For example, perhaps it is possible to find a sequence that takes the value 1 at every square and 0 or -1 everywhere else and has bounded discrepancy.

    I think there is an interesting class of questions here. For which sets A does there exist a function f of bounded discrepancy that takes the value 1 everywhere on A and 0, 1 or -1 everywhere else? As I did in the end of this comment, one can paste together different HAPs to produce an example with density zero, but are there any interesting examples? I would expect that as soon as A is reasonably sparse (in some sense that would have to be worked out — it would have to mean something like “sparse in every HAP”), it would be possible to use a greedy algorithm to prove that EDP fails. So here is a yet more specific question. Define a sequence (x_n) as follows. For every perfect square it takes the value 1. For all other n we choose x_n so as to minimize the maximum absolute value of the partial sum along any HAP that ends at n, and if we have the choice of taking x_n=0 then we do. Thus, the first few terms of the sequence are as follows. x_1=1 (since 1 is a square), x_2=0 (since to minimize the maximum of |x_1| and |x_1+x_2| we can take x_2 to be 0 or -1 and we prefer 0), x_3=0 (for the same reason), x_4=1 (since 4 is a square), x_5=-1 (since x_1+\dots+x_4=2, so this makes the maximum at 5 equal to 1 rather than 2), x_6=-1, x_7=0, and so on. Does this algorithm give rise to a bounded-discrepancy sequence? If not, is there some slightly more intelligent algorithm that does?

Leave a comment