EDP21 — restrictions on possible proofs

The situation we are now in with EDP is roughly this. We have a statement that would imply EDP and that seems easier to prove because it is a reasonably simple existential statement (there exists a decomposition with certain properties) rather than a universal statement (every \pm 1 function has unbounded discrepancy on HAPs). However, the existential statement seems itself to be hard, despite one’s initial expectation that it ought to be easier. So what do we do?

One tool that we have at our disposal is duality, which is what we used to convert the problem to an existential one in the first place. Now obviously we don’t want to apply duality twice and end up with the original problem, but, perhaps surprisingly, there are ways that applying duality twice could be useful.

Here are two such ways. The first is that you prove that a certain kind of decomposition would be sufficient to prove EDP. Then you argue that if such a decomposition exists, then a more restricted kind of decomposition must also exist. Dualizing again, one ends up with a new discrepancy problem that is different from the original one (though it will imply it). The second way is this: if it is not easy to write down a decomposition that works, then one wants to narrow down the search space. And one way of doing that is to prove rigorously that certain kinds of decompositions do not exist. And an efficient way of doing that is to use duality: that is, one finds a function with low discrepancy on the class of sets that one was hoping to use for the decomposition. Since this class is restricted, solving the discrepancy problem is easier than solving EDP (but this time it doesn’t imply EDP).

We have already had an example of the first use of dualizing twice. In this post I want to give in detail an example of the second.

General results about duality.

Recall that we are trying to find a linear combination f=\sum_i\lambda_iS_i with the following properties, and with c as small as possible.

  1. Each S_i is a “square” in the sense that it is the characteristic function of a set of the form P_i\times P_i, where P_i itself is an interval of positive integers.
  2. \sum_xf(x,x)=1.
  3. If x and y are distinct and coprime, then \sum_af(ax,ay)=0.
  4. \sum_i|\lambda_i|\leq c.

Recall also that since \sum_xf(x,x)=\sum_i\lambda_i|P_i|, an equivalent formulation is to replace conditions 2 and 4 by the condition that

\sum_i|\lambda_i|\leq c\sum_i\lambda_i|P_i|.
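
To make conditions 2–4 concrete, here is a small Python sketch (the helper name and the truncation to [1,N] are mine, not from the post) that checks a candidate combination of squares against them; a single square satisfies condition 2 but visibly fails condition 3.

```python
# Hypothetical checker for conditions 2-4, truncated to [1, N]; a sketch of
# what the conditions say, not part of the post itself.
from math import gcd

def check_decomposition(squares, N=20):
    """squares: list of ((lo, hi), lam) pairs, meaning lam * 1_{P x P}
    with P = {lo, lo+1, ..., hi}."""
    def f(x, y):
        return sum(lam for (lo, hi), lam in squares
                   if lo <= x <= hi and lo <= y <= hi)

    trace = sum(f(x, x) for x in range(1, N + 1))        # condition 2
    l1 = sum(abs(lam) for _, lam in squares)             # condition 4: want l1 <= c
    # condition 3: for distinct coprime x, y the sum along the ray vanishes
    violations = [(x, y) for x in range(1, N + 1) for y in range(1, N + 1)
                  if x != y and gcd(x, y) == 1
                  and abs(sum(f(a * x, a * y) for a in range(1, N + 1))) > 1e-12]
    return trace, l1, violations

# a single square P = {1, 2} with coefficient 1/2 has trace 1 but fails
# condition 3, e.g. along the ray through (1, 2)
trace, l1, bad = check_decomposition([((1, 2), 0.5)])
```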

Now the set of functions that satisfy conditions 2 and 3 is convex — indeed, it is an affine subspace V. And the set L of linear combinations \sum_i\lambda_iS_i such that \sum_i|\lambda_i|\leq c is also convex. If we cannot find such a decomposition, then these two convex sets must be disjoint, which implies that there is a linear functional \phi that separates them. By rescaling, we can ask that \phi should take the value 1 everywhere on V and be less than 1 on every function that belongs to L.

The first of these two conditions on \phi implies that \phi is constant on all rays (that is, intersections of \mathbb{N}^2 with lines through the origin), since otherwise we would be able to find points in V such that the inner product with \phi was whatever we wanted. The second condition implies that c|\sum_{(x,y)\in S_i}\phi(x,y)|<1 for every i, and is also implied by it. Therefore, if no decomposition of the desired form exists then there is a function \phi that is 1 on the main diagonal, constant on rays, and that has discrepancy at most c^{-1} on all squares. The converse is almost trivial.

Now let us see what happens if we modify the requirements of the decomposition as follows. We shall no longer ask for the sets S_i to be squares (so that we can allow, for instance, rectangles or fly traps). We shall insist that every set S we use belongs to some set system \Sigma. We shall also assume that \Sigma has disjoint subsets \Gamma and \Delta such that if S\in\Gamma, then the coefficient of S is required to be non-negative, and if S\in\Delta then the coefficient of S is required to be non-positive.

What happens if we dualize the problem with these new restrictions? This time we replace the set of all combinations \sum_i\lambda_iS_i such that \sum_i|\lambda_i|<c by the set of all linear combinations with that property and the additional property that \lambda_i\geq 0 if S_i\in\Gamma and \lambda_i\leq 0 if S_i\in\Delta. This tells us that \sum_{(x,y)\in S_i}\phi(x,y) is at most c^{-1} if S_i\in\Gamma, at least -c^{-1} if S_i\in\Delta, and at most c^{-1} in modulus for all other S_i. Again, the converse is easy.

Ruling out simple decompositions.

Now let us see how we can exploit this. I would like to obtain a lower bound for the best c one can obtain if \Sigma consists of 2-by-2 and 3-by-3 squares, and fly traps, and if the coefficient of every fly trap is non-positive.

To do this, let us define a function \phi as follows. If x=y, then we take \phi(x,y) to be 1, as we must. If |x-y|=1 and one of x or y is a multiple of 3, then we take \phi(x,y)=0, and otherwise we take \phi(x,y)=-1/2. (So the only way that \phi(r,r+1) can be -1/2 is if r\equiv 1 mod 3.) If |x-y|=2 and one of x or y is a multiple of 6, then we take \phi(x,y)=0 (this is forced since we have already decided that \phi(x/2,y/2)=0) and otherwise \phi(x,y)=-1/2.

That determines \phi on all points (x,y) such that |x-y|\leq 2, and on all multiples of all such points. It is not hard to check that the sum of \phi on any 2-by-2 or 3-by-3 square lies between 1 and 2. So it remains to look at the fly traps.
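
For what it's worth, this check is easy to carry out by machine. Here is a Python sketch of mine, with the ray reduction built in and None standing for the points whose values have not yet been chosen.

```python
from fractions import Fraction
from math import gcd

def phi(x, y):
    """The function defined above on the strip |x-y| <= 2, extended to be
    constant on rays; None marks a point whose value has not been chosen."""
    if x == y:
        return Fraction(1)
    d = gcd(x, y)
    x, y = x // d, y // d                      # reduce along the ray
    if abs(x - y) == 1:
        return Fraction(0) if x % 3 == 0 or y % 3 == 0 else Fraction(-1, 2)
    if abs(x - y) == 2:
        return Fraction(0) if x % 6 == 0 or y % 6 == 0 else Fraction(-1, 2)
    return None

def square_sum(n, w):
    """Sum of phi over the w-by-w square [n, n+w-1]^2."""
    return sum(phi(x, y) for x in range(n, n + w) for y in range(n, n + w))

# the sum over every 2-by-2 and 3-by-3 square lies between 1 and 2
assert all(1 <= square_sum(n, w) <= 2 for n in range(1, 500) for w in (2, 3))
```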

Since we are assuming that the coefficients of fly traps are all negative, our aim is to choose the remaining values of \phi in such a way that the sum on any fly trap is never less than -2. There is a trivially best strategy for doing that, which is to choose \phi(x,y) to be infinity for all remaining points. (This is overkill, but it works.)

If we do that, then what could go wrong? We would have to have a fly trap on which the sum was less than -2, which means we would have some pair of positive integers M,k such that the values of (M,M+1),\dots,(M,M+k) (or the same but with minuses) are all chosen before we put in any infinite values, and at least five of them are -1/2.

The condition that the values are all chosen is equivalent to the condition that every j\leq k is a factor of 2M. Indeed, the value of (M,M+j) is chosen if and only if j/d=1 or 2, where d is the highest common factor of M and j. If j/d=1, then j|M, and if j/d=2, then j/2|M, so j|2M. Conversely, if j|2M, then since j/d and M/d are coprime and j/d divides 2M/d, we must have j/d=1 or 2.

Let us see what we can deduce from this. We know that k\geq 5, or we certainly cannot have at least five -1/2s. Setting j=3 we deduce that 3|M. Setting j=4 we deduce that 2|M. And setting j=5 we deduce that 5|M. Therefore, 30|M. It follows that \phi(M,M+1)=\phi(M,M+2)=0. And from that it follows that k\geq 7, so 210|M. Since 5 is coprime to both 2 and 3, \phi(M,M+5)=0 (as M/5 is a multiple of 6). Therefore, k\geq 8, from which it follows that 4|M, so 420|M.

There is a sort of race going on here. It is possible that \phi(M,M+3)=\phi(M,M+6)=-1/2, but otherwise all \phi(M,M+j) must be 0 up to j=7. So now it follows that k\geq 10. In particular, 9|M, so 1260|M. But this implies that \phi(M,M+3)=\phi(M,M+6)=0, since now M is a multiple of 36. So we’re sort of back to square one.

Once we know that 36|M, what property must j have if there is to be any hope that \phi(M,M+j)\ne 0? If j|M, then we need M/j not to be a multiple of 3, so we need j to be a multiple of 9. And if j is a factor of 2M but not of M, then we need 2M/j not to be a multiple of 6, which it will not be since it is odd. So that is still a possibility, but it requires j to be a multiple of 8.

Going back to our example, we now know that we can restrict attention to multiples of 8 or 9. But by the time we get to five of those, we have passed 16, which tells us that 8|M and that now j needs to be a multiple of 16. And before we know it, we hit 27. And so on.

I won’t try to make that last bit rigorous, but I’m pretty sure that it’s possible to do so, and to prove that the race can never be won. What I mean by that is that for every fly trap we will be able to put in an infinity before we reach five -1/2s. If that is correct, then it proves that we cannot improve on a c=1/2 bound if we use a linear combination of 2-by-2 and 3-by-3 squares and try to cancel it out with some fly traps, all of which have the same sign.

I’ve just spotted that I made a mistake above, which is that a fly trap has two wings, so to speak, which means that for the sum to reach less than -2, we need only four -1/2s. (The diagonal starts you off at 1, and then each -1/2 has the effect of subtracting 1, so we can afford up to three of them.) But I think the conclusions I drew are valid even for this tighter race.

What happens if in addition to fly traps we look at rectangles of width 1? We’ve dealt with ones that intersect the diagonal, but what about a rectangle such as the one that consists of all the points from (M,r) to (M,s) for some r,s?

Now we need five -1/2s. Write the rectangle as the set of points (M,M+j) with j\in[m+1,m+t]. What can we say about the highest power of 2 that divides M if every integer in [m+1,m+t] divides 2M (which, as before, is the condition for all the values to be chosen)? Let 2^r be the largest power of 2 that divides any number in [m+1,m+t], and let 3^s be the largest power of 3. Now let j belong to that interval. How can \phi(M,M+j) fail to be 0?

If j|M, then a necessary condition is that M/j should not be a multiple of 3, since otherwise \phi(M,M+j)=\phi(M/j,M/j+1)=0. But that implies that 3^s|j. If j|2M but j is not a factor of M, then 2^{r} must be a factor of j, since 2^{r-1}|M. So the total number of -1/2s is at most the number of multiples of 3^s that belong to the interval plus the number of multiples of 2^r that belong to the interval. But since any three consecutive multiples of 3^s include a multiple of 3^{s+1} and any two consecutive multiples of 2^r include a multiple of 2^{r+1}, the total number of -1/2s in a rectangle of width 1 that contains no infinity is at most 3.

This is actually a rigorous treatment of the fly trap case as well. It proves that we can’t obtain a decomposition with c<1/2 if all we allow ourselves to use is 2-by-2 squares, 3-by-3 squares, and fly traps and rectangles of width 1 with negative coefficients.
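
Since the argument is now rigorous, it can also be sanity-checked by machine. Here is a Python sketch of mine that, for each M up to 1000, counts the -1/2s in every maximal fully chosen width-1 rectangle and confirms that there are never more than three.

```python
from fractions import Fraction
from math import gcd

def phi(x, y):
    """The function from the post on the strip |x-y| <= 2, extended to be
    constant on rays; None marks an 'infinity' (a point not yet chosen)."""
    if x == y:
        return Fraction(1)
    d = gcd(x, y)
    x, y = x // d, y // d
    if abs(x - y) == 1:
        return Fraction(0) if x % 3 == 0 or y % 3 == 0 else Fraction(-1, 2)
    if abs(x - y) == 2:
        return Fraction(0) if x % 6 == 0 or y % 6 == 0 else Fraction(-1, 2)
    return None

def divisor_runs(n):
    """Maximal intervals of consecutive integers that all divide n."""
    divs = [d for d in range(1, n + 1) if n % d == 0]
    runs, current = [], [divs[0]]
    for d in divs[1:]:
        if d == current[-1] + 1:
            current.append(d)
        else:
            runs.append(current)
            current = [d]
    runs.append(current)
    return runs

# The rectangle {(M, M+j) : j in J} contains no infinity exactly when every
# j in J divides 2M, i.e. when J lies in a run of consecutive divisors of 2M.
worst = 0
for M in range(1, 1001):
    for run in divisor_runs(2 * M):
        count = sum(1 for j in run if phi(M, M + j) == Fraction(-1, 2))
        worst = max(worst, count)
assert worst <= 3   # never four -1/2s, matching the argument above
```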

If we allow ourselves symmetric rectangle pairs, then the above function has discrepancy 3, so the argument no longer works. However, we can improve on that bound by changing the coefficient -1/2 to something slightly closer to zero. If we choose it to be -\alpha, then the worst discrepancy on a 3-by-3 square turns out to be 3-2\alpha and the worst on a rectangle pair is 6\alpha. Equating the two, we can take \alpha=3/8, and that gives us a discrepancy of 2.25, which tells us that we can’t make c any better than 4/9 even with this extra flexibility.

A fatter example?

Suppose we try to extend the above result to include 4-by-4 squares as well. Now we have to give sensible values to points of the form (r,r+3). The simplest way we might try to do that without messing anything up would be to define \phi(r,r+3) to be 0 if r is not a multiple of 3 (in which case r and r+3 are coprime) and \phi(r/3,r/3+1) otherwise. This means that \phi(r,r+3)=0 except if r\equiv 3 mod 9, in which case it is -1/2.

This does not add any -1/2s to the fly traps, since the only -1/2s we have put in the lines y=x\pm 3 are ones that were already there. However, it does change some infinities to zeros. What difference does that make?

Well, the condition now for \phi(M,M+j) not to be infinite is that j|2M or j|3M. So suppose we have an interval [m+1,m+t] on which \phi(M,M+j) is finite, and let 3^s be the highest power of 3 that divides any of the integers in that interval. Then the highest power of 3 that divides M is at least 3^{s-1} (whereas earlier it was 3^s). Let’s suppose that it is exactly 3^{s-1} (since this is the most difficult case). If j|3M but j does not divide M, then 3^s|j, so 3M/j is not divisible by 3, which means that \phi(M,M+j)=\phi(3M/j,3M/j+3)=0. So we must have j|2M. This takes us back to the previous case, except that the highest power of 3 that divides M is smaller. So now from the condition that M/j should not be a multiple of 3, all we can deduce is that 3^{s-1}|j. Unfortunately, if 3^s is the highest power of 3 that goes into any number in an interval, 3^{s-1} can go into up to eight numbers in that interval. So it looks as though this method of fattening the example fails completely. This may be worth looking into in case it suggests a decomposition (though that is asking a lot, since there is no particular reason to suppose that the construction I have just attempted has to be exactly as it was in my attempt).

What next?

At the moment I am trying to solve the problem by means of the following time-honoured technique. First I try to prove the result. Then, when I run into difficulties, I try to disprove it. Then, when I run into difficulties there, I try to prove it again, or perhaps prove merely that the way that I was trying to disprove it cannot work. And so on. The more iterations of this process, the smaller the bearing any success will have on the problem, but if the problem is hard enough then it may be best to start small.

How does that relate to EDP? Well, we would like a decomposition. So we try finding a decomposition with stronger properties. Then, having failed to do so, we see whether we can prove that no decomposition with the stronger properties exists. Fortunately, duality gives us a technique for doing this: we try to find a function with small discrepancy on all the relevant sets. On failing to find such a function, we try to get some sense of why that is difficult, in the hope that it will lead us to a yet more refined class of possible decompositions. And so on.

So the next thing I feel like trying is to generalize the above argument to strips of width 3, 4, 5, etc. about the main diagonal. At some point difficulties will emerge (or if they don’t then we’ll have proved something very interesting). What I hope is that those difficulties will lead us to a construction that improves on c=1/2. I feel more and more that this would be a crucial breakthrough: if we can do that in a theoretical way (rather than just finding a pattern on a computer that we are unable to interpret, though even that would be very interesting) then we will have much more of a chance of generalizing it to arbitrary positive c.

93 Responses to “EDP21 — restrictions on possible proofs”

  1. Alec Edgington Says:

    Here’s a simple observation that’s nevertheless quite handy for ruling out potential functions \phi : \mathbb{N}^2 \to \mathbb{R}. (Tim has already effectively said this, but I’ll state it formally.) Let \hat{\phi} be the corresponding function on the rationals (so \hat{\phi}(m/n) = \phi(m,n)), and let A^+ = \hat{\phi}^{-1}(\mathbb{R}^+) and A^- = \hat{\phi}^{-1}(\mathbb{R}^-). Then A^+ and A^-, despite being disjoint, have the same closure. Indeed, if we had a point in A^+, say, that was isolated from all points in A^-, then by going far enough along the corresponding ray we could find rectangles containing arbitrarily many points on the ray (all with the same positive value) and no negative values. So every point in A^+ is a limit of points in A^-, and vice versa.

    It occurs to me that the Walters function might be useful for constructing examples with restricted subsets. Specifically, we take \phi(m,n) = \lambda(m) \lambda(n). If we consider all rectangles contained in [1,N]^2, this gives us discrepancy (\log N)^2, but perhaps the discrepancy would be bounded on some interesting classes of rectangles, or at least give us some information …

    • gowers Says:

      One can say pretty explicitly exactly what the Walters function disallows, since the discrepancy on the rectangle P\times Q is precisely the product of the discrepancy of \lambda on P and the discrepancy of \lambda on Q. In particular, it tells us that to get a decomposition with constant C we will need to use squares (or fly traps) of width exponentially large in C.
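
      To illustrate (a sketch of my own, assuming \lambda here is the base-3 character-like function from Walters’s example): the sum of \phi over P\times Q factors exactly as the product of the two one-dimensional sums, which is the point of the remark above.

```python
def walters(n):
    """Character-like Walters function: strip factors of 3, then +1 or -1
    according to the residue of what remains mod 3."""
    while n % 3 == 0:
        n //= 3
    return 1 if n % 3 == 1 else -1

def rect_sum(P, Q):
    # sum of phi(m, n) = walters(m) * walters(n) over the rectangle P x Q
    return sum(walters(m) * walters(n) for m in P for n in Q)

P, Q = range(5, 20), range(8, 40)
# the rectangle sum is exactly the product of the two line sums
assert rect_sum(P, Q) == sum(map(walters, P)) * sum(map(walters, Q))
```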

  2. gowers Says:

    I had a vague thought that I have been able to do nothing with, but mention it in case anyone else has an idea. The thought was that perhaps we could divide and conquer in the following way. Instead of trying to find the decomposition all in one step, we could try to identify a class of functions F with the following two properties.

    (i) Functions in F can be decomposed efficiently into squares.

    (ii) The decomposition can be carried out efficiently in terms of functions in F.

    That is, we would begin by making some building blocks (the functions in F) that are more flexible than the original squares. Where I get stuck is that I have no idea what those building blocks might be.

  3. gowers Says:

    Here’s a question that I ask mostly out of idle curiosity and because it can be tested experimentally, but it also has some small motivation that I’ll explain first.

    If we try to prove something using a decomposition into small squares and not so small fly traps, then the result will say something like, “Either we get fairly big discrepancy on one of these small squares or we get a considerably bigger discrepancy on a fly trap.”

    So I’d like to ask whether anything like that might be true for EDP itself. Here is the question that I’ve come up with (which I don’t claim to be equivalent to anything, but just in the same ball park). The long examples we had for EDP had several subsequences consisting of four pluses in a row or four minuses in a row. What happens if we outlaw these? Does the discrepancy grow much faster? Well, it won’t grow much faster because of the character-like examples, but perhaps it is forced to be 3 at a much smaller number than 1125.

    In the light of our difficulties in improving on c=1/2, I think I’d even be interested to know what happens if we outlaw three consecutive entries of the same sign. (It may be that there is some simple construction we can do like taking a known good example and putting it on the even numbers and making the value at 2n-1 always different from the value at 2n, but I don’t at the moment see how to make anything like that work.)

    • Klas Markström Says:

      That should be a simple modification of my old sequence programs. I can try it out later today.

    • Klas Markström Says:

      If we, in addition to HAPs, include all APs with difference 1 and length 3 the maximum possible N for discrepancy 2 is 584. If we fix the sign of 1 to + there are 208 solutions.

      The solutions can be found here
      This is a raw data file from the solver so each line begins with “v 584” followed by the indices, with signs according to the solution.

      I’ll start the solver with APs of length 4 instead, but I suspect that it will have a harder time with that case.

    • gowers Says:

      Thanks for that. I haven’t studied the solutions in much detail, but they definitely seem to exhibit the sort of quasimultiplicative structure we are familiar with. For what it’s worth, I haven’t yet found an example of a number that doesn’t have the opposite sign to twice that number, though I haven’t looked too hard.

    • Klas Markström Says:

      The first position where the solutions in the file differ is at n=219. They then fall into two classes which again split at n=307.

      The solver has found a solution for N=600 with APs of length 4 and is working on N=650 now. But as I suspected it is much slower now. This is only running on a single workstation, no parallel machines involved.

    • Klas Markström Says:

      Some statistics on the quasimultiplicativity properties of the solutions with APs of length 3.

      For all solutions f(x)=-f(2x) for x \leq 198, for some solutions this is true for x \leq 238. This is true for at least 275 out of the 292 possible values of x

      For all solutions f(x)=-f(3x) for x \leq 26
      This is true for at least 156 out of the 194 possible values of x

      For all solutions f(x)=-f(5x) for x \leq 98
      This is true for at least 111 out of the 116 possible values of x

      For all solutions f(x)=f(7x) for x \leq 68
      This is true for at least 76 out of the 83 possible values of x

      For all solutions f(x)=-f(11x) for x \leq 42, for some solutions this is true for x \leq 44
      This is true for at least 50 out of the 53 possible values of x

      For all solutions f(x)=f(13x) for x \leq 26
      This is true for at least 39 out of the 44 possible values of x

    • Klas Markström Says:

      Sorry about the mass posting, but this might actually be more relevant than the statistics my program produced. Out of the 584 values, 550 are equal for all 208 solutions.

      This is the core part of the solutions.
      {1, -2, -3, 4, -5, 6, 7, -8, 9, 10, -11, -12, 13, -14, 15, 16, -17,
      -18, 19, -20, -21, 22, -23, 24, 25, -26, -27, 28, -29, -30, 31, -32,
      33, 34, -35, 36, 37, -38, -39, 40, -41, 42, 43, -44, -45, 46, -47,
      -48, 49, -50, 51, 52, -53, 54, 55, -56, -57, 58, -59, 60, 61, -62,
      63, 64, -65, -66, 67, -68, 69, 70, -71, -72, 73, -74, -75, 76, -77,
      78, 79, -80, -81, 82, -83, -84, 85, -86, 87, 88, -89, 90, 91, -92,
      -93, 94, -95, 96, 97, -98, -99, 100, -101, -102, 103, -104, 105, 106,
      -107, -108, 109, -110, 111, 112, -113, 114, 115, -116, 117, 118,
      -119, -120, 121, -122, 123, 124, -125, -126, 127, -128, -129, 130,
      -131, 132, 133, -134, 135, 136, -137, -138, 139, -140, -141, 142,
      -143, 144, 145, -146, -147, 148, -149, 150, 151, -152, -153, 154,
      -155, -156, 157, -158, 159, 160, -161, 162, 163, -164, -165, 166,
      -167, 168, 169, -170, 171, 172, -173, -174, 175, -176, 177, 178,
      -179, -180, 181, -182, -183, 184, -185, 186, 187, -188, -189, 190,
      -191, -192, 193, -194, 195, 196, -197, 198, 199, -200, -201, 202,
      -203, 204, 205, -206, -207, 208, -209, -210, 211, -212, 213, 214,
      -215, 216, 217, -218, 220, -221, -222, 223, -224, 225, 226, -227,
      -228, 229, -230, 231, 232, -233, -234, 235, -236, 238, -239, 240,
      241, -242, -243, 244, -245, -246, 247, -248, 249, 250, -251, 252,
      253, -254, -255, 256, -257, 258, 259, -260, 261, 262, -263, -264,
      265, -266, 267, 268, -269, -270, 271, -272, -273, 274, -275, 276,
      277, -278, -279, 280, -281, 282, 283, -284, 285, 286, -287, -288,
      289, -290, -291, 292, -293, 294, 295, -296, 297, 298, -299, -300,
      301, -302, -303, 304, -305, 306, -308, 310, -311, 312, 313, -314,
      -315, 316, -318, 319, -320, 322, -323, -324, 325, -326, 328, -329,
      330, -332, 333, 334, -335, -336, 337, -338, 339, 340, -341, -342,
      343, -344, -345, 346, -347, 348, -349, -350, 351, 352, -353, -354,
      355, -356, 357, 358, -359, 360, 361, -362, -363, 364, -365, 366, 367,
      -368, -369, 370, -371, -372, 373, -374, 375, 376, -377, 378, 379,
      -380, -381, 382, -383, 384, 385, -386, -387, 388, -390, 391, -392,
      393, 394, -395, -396, 397, -399, 400, -401, 403, -404, 405, 406,
      -407, -408, 409, -410, 412, -413, 414, 415, -416, -417, 418, -419,
      420, 421, -422, 423, 424, -425, -426, 427, -428, 429, 430, -431,
      -432, 433, -434, -435, 436, -437, -440, 441, 442, -443, 444, 445,
      -446, 448, -450, 451, -452, -455, 456, -459, 460, -462, 463, -464,
      465, -467, 468, 469, -470, 472, 475, -476, 477, -478, 479, -480,
      -481, 482, -483, 484, -485, 486, -487, -488, 489, 490, -491, 492,
      493, -494, -495, 496, 497, -498, 499, -500, 501, 502, -503, -504,
      505, -506, -507, 508, -509, 510, 511, -512, -513, 514, -515, -516,
      517, -518, 519, 520, -521, 522, -524, 525, -527, 528, 529, -530,
      -531, 532, -534, 535, -536, -539, 540, -542, 544, -545, -546, 547,
      -548, 549, 550, -551, 552, -553, 554, -555, -556, 557, -558, 560,
      -561, 562, 563, -564, -565, 566, 567, -568, 569, -570, 571, -572,
      573, 574, -575, 576, -577, -578, 579, -580, -581, 582, -584}

    • Klas Markström Says:

      One of the solver programs, which has been working for a few days now, has found a solution for N=1024, for the case with APs of length 4 and common difference 1 included.
      The solution is here
      The solution is here

    • Klas Markström Says:

      I should also have said that the solver has not shown that this is the maximum N for which such solutions exist.

  4. gowers Says:

    Another small thought. Suppose we think of the problem in the formulation where we are trying to write \delta_1 (the characteristic function of \{1\}, defined on positive rationals) as an efficient linear combination of functions of the form P/Q (where P,Q are intervals, and (P/Q)(x) is the number of ways of writing x as a/b with a\in P and b\in Q).

    One trivially necessary condition is that if we consider the projections of the functions we use where we just set the value at 1 to be zero, then these must be linearly dependent. But even this is quite a hard condition to obtain, particularly if we insist that P and Q are not both small. We can make the condition a bit more combinatorial by looking at the sets P/Q rather than the functions, and ask whether it is possible to find a set of rationals that can be covered nicely by such sets. What do I mean by “covered nicely”? I don’t want to guess a precise definition, so let me just try to indicate the kind of thing I am talking about. It would be very nice, for example, if one could find a set X of rationals that could be written as a union of sets P_i/Q_i in such a way that all the P_i and Q_i are large, and every x\in X belongs to several of the P_i/Q_i.

    This isn’t quite the right condition, because we can achieve it in a boring way by picking sets with big overlaps. So perhaps another way of thinking about it is that we would like to find a set X such that for almost every x\in X we can find several “genuinely distinct” sets of the form P/Q that contain x and are subsets of X.

    Even getting every point into at least two “genuinely distinct” sets is quite challenging. One thing that would help is if we could find (or prove existence of) examples of long intervals P and Q such that no element of P is coprime to any element of Q. Then (going back to the \mathbb{N}^2 formulation) we could have a chance of using P\times Q as a “boundary” rectangle.

    We can in fact achieve this by the Chinese remainder theorem, though how much use this observation is I really don’t know. Suppose we want |P|=|Q|=k. Then for each (u,v) with 1\leq u,v\leq k choose a distinct prime p_{uv}. Then we choose an integer m such that for every u\leq k, m+u is a multiple of \prod_vp_{uv}, and an integer m' such that for every v\leq k, m'+v is a multiple of \prod_up_{uv}. Then for every 1\leq u,v\leq k, p_{uv} divides both m+u and m'+v.
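
    For k=2 this is easy to carry out explicitly; here is a sketch (the four primes are an arbitrary choice of mine) using a standard CRT routine.

```python
# Chinese-remainder construction above for k = 2, with hypothetical prime
# choices p_uv (any four distinct primes would do).
from math import gcd

def crt(residues, moduli):
    """Solve x = r_i (mod n_i) for pairwise coprime n_i by successive
    substitution; returns the smallest non-negative solution."""
    x, n = 0, 1
    for r, m in zip(residues, moduli):
        t = ((r - x) * pow(n, -1, m)) % m   # lift x to satisfy the next congruence
        x, n = x + n * t, n * m
    return x

p = {(1, 1): 5, (1, 2): 7, (2, 1): 11, (2, 2): 13}
# m + u must be divisible by prod_v p_uv, and m' + v by prod_u p_uv
m = crt([-u % (p[u, 1] * p[u, 2]) for u in (1, 2)],
        [p[u, 1] * p[u, 2] for u in (1, 2)])
m2 = crt([-v % (p[1, v] * p[2, v]) for v in (1, 2)],
         [p[1, v] * p[2, v] for v in (1, 2)])
# no element of P = {m+1, m+2} is coprime to any element of Q = {m'+1, m'+2}
assert all(gcd(m + u, m2 + v) > 1 for u in (1, 2) for v in (1, 2))
```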

    Obviously the same argument can be used to give m+u and m'+v a rich set of factors, and there will be additional small common factors around.

    What I’m ultimately hoping for here is to find some squares for which there’s even the remotest chance that they could be used in a decomposition. It’s a bit disturbing not to be able to do even that.

  5. Polymath5 « Euclidean Ramsey Theory Says:

    […] By kristalcantwell There is a new thread for Polymath5 here. The pace seems to be picking up for Polymath5 since the beginning of September there have been […]

  6. Klas Markström Says:

    I suspect that this could be a precision issue but my last run of Alec’s program, on a machine with more RAM than earlier, claims to have found a solution for N=39 with C less than 1/2.

    The coefficients look worryingly small, which makes me suspect that this might be an artefact rather than a real solution, but today I don’t have time to do any further checks. This was a run of the unrestricted rectangle version of the program, so a correct solution should be the best possible decomposition for each value of N.

    Alec, is it possible to increase the precision used in the program?

    • gowers Says:

      I agree that something’s very fishy about the example: it has one large coefficient and all the rest are tiny by comparison, which doesn’t sound ideal if you’re trying to get things to cancel.

    • Klas Markström Says:

      I have started a run of a version of the program where I replaced some of the floating-point numbers in the definitions by integers. I’m not sure if this will help but it is all I had time to do now.

    • Alec Edgington Says:

      Indeed, it’s hard to see where cancellation of the off-diagonal entries in [31,33]^2 is coming from — even allowing for the fact that only coefficients greater than 5 \times 10^{-8} are output. I’m not sure what’s going on here but I think it must be either a bug or a strange precision issue.

    • Klas Markström Says:

      I’ll attempt to run the program again but include even smaller coefficients in the output.

    • Klas Markström Says:

      I too am still a bit skeptical about this solution. Unless Alec finds a bug somewhere I guess the next thing to try would be to modify the program to output an mps file for this problem and use another solver with better working precision, e.g. gurobi.

    • Klas Markström Says:

      The version of the program which includes more coefficients in the output has finished, and the result might begin to explain why there could be both precision issues and other things happening here. The new solution file
      has over 200 000 non-zero coefficients.

    • gowers Says:

      I notice that it’s still overwhelmingly dominated by [31,33]^2, which, now that I think about it, makes it provably incorrect: if one coefficient is more than half the sum of the absolute values of all coefficients, then by the triangle inequality you can’t cancel it.

      But perhaps what you are suggesting is that that one coefficient doesn’t really dominate everything — it’s just that we’re not seeing the whole picture.

  7. Klas Markström Says:

    The first modification did not work due to incompatible data types. I believe I now have a working version with better precision. It seems to be a lot slower but I’ll let it run over night.

    • Klas Markström Says:

      A bit of comment lag now.

      After running for about 1.5 hours the program produced a solution which looks the same as the previous one. So either my modification was not enough to bring the precision up, or that actually is a genuine, if implausible-looking, solution.

      It’s worth noting that the coefficients in the solution candidate are actually of quite different sizes. One is much larger than all the rest, but the others also vary quite a bit among themselves. A quick check showed that the trace of the matrix is 1.0037. It is not 1, but it is close enough to make me wonder if there is actually something real here.

    • Klas Markström Says:

      There are some really large squares in this solution candidate, with not so small coefficients: {1, 39}, {1, 39}, 1.22271*10^-6

    • gowers Says:

      How easy would it be to run a quick spot check and see, for example, whether the sum at the point (31,32) is zero (which looks, from a casual glance, very unlikely)?

  8. Klas Markström Says:

    I’ve done a quick check on that point and the sum does not seem to be zero. At other points the sum is small enough that it might be zero when a few truncated coefficients have been added.

    This looks more and more like a bug of some kind.

    Alec, can you make a version of the general rectangular case which prints an mps file instead of solving it? For N around 39 the mps file will most likely be a few Gb in size.

    • Klas Markström Says:

      I forgot to use “reply”. Feel free to move this to the thread above.

    • Alec Edgington Says:

      Klas, I think you should be able to do that by replacing the line

      x = solvers.lp(c, G, h, A, b)['x']

      and what follows it in the function ‘optim’ by

      prog = dict(c=c, G=G, h=h, A=A, b=b)
      make_mps(prog, 'file.mps')

      (Solving this will give the solution to the linear program, which would then need to be converted to the L_1-minimization solution, as is done in the last part of the ‘optim’ function.)

  9. Klas Markström Says:

    Alec, I’m making an mps file, and some time later today I’ll set up the other solver and see what we get from that.

    Could you make a modified program which reads and converts an mps solution file? I think the chance of me messing up your code when I’m in a hurry is greater than if you make the modification.

    • Alec Edgington Says:

      Klas, I can do that — but won’t get a chance until this evening. As a quick check: the optimum value produced by the linear program should be equal to the optimum value of c. The solution vector should be of the form x_0, \ldots, x_{N-1}, x_N, \ldots, x_{2N} where either x_i = 0 or x_{N+i} = 0, and the non-zero value in each pair corresponds in absolute value to one of the coefficients in the solution. So it should be clear whether you have a genuinely different solution.
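      To illustrate the conversion Alec describes, here is a hedged sketch (my own code, not part of the actual ‘optim’ script; `lp_to_coeffs` is a made-up name): if the LP splits each coefficient into a non-negative pair x_i, x_{N+i}, the signed coefficients are recovered as differences.

```python
# Hypothetical helper: recover the signed L_1-minimization coefficients from
# the split LP solution vector x_0, ..., x_{2N-1}, where at most one of
# x_i, x_{N+i} is non-zero in each pair.
def lp_to_coeffs(x, N):
    # coefficient i is x_i - x_{N+i}; its absolute value is the non-zero
    # member of the pair, matching the description above
    return [x[i] - x[N + i] for i in range(N)]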

    • Alec Edgington Says:

      Oops, I mean x_0, \ldots, x_{N-1}, x_N, \ldots, x_{2N-1}.

    • Klas Markström Says:

      Now, two hours after starting, the program is still writing the mps file, so it will probably take some time before I can even begin to solve it. At the moment the mps file is at 262 Mb.

    • Klas Markström Says:

      Today the program is still writing the mps file. The file is now a bit over 2Gb, and I believe that is approximately half the final size.

    • Klas Markström Says:

      I overestimated the final size a bit; the printout has finished now and I have started the solver. I am using the Gurobi solver.

    • Anonymous Says:

      Gurobi solved the problem in 330 seconds! Even at a quick glance the solution looks quite different. The only non-zero lines in the solution file are
      47_30420 0.0714286
      47_57837 0.142857
      47_83145 0.0714286
      4_106455 0.0714286
      4_127875 0.0714286
      4_210195 0.0714286
      4_222375 0.0714286
      4_473834 0.142857

      Alec, is that enough information for you to parse it into a standard form solution? If you already have a conversion program I can run it on the solution file.

      By the way, gurobi has a direct python interface, so it should be possible to use it instead of cvxopt and avoid the mps files altogether.

    • Klas Markström Says:

      “Anonymous” being me on the wrong computer…..

    • Alec Edgington Says:

      Very fast! Well, the sum of those coefficients is 5/7, so that is the value of c the solution represents. Since this was the optimum also for n=38, the same solution must be optimal for n = 39. I guess the Python solver just couldn’t cope with the size of the problem.

    • Klas Markström Says:

      Yes, I suspect that the cvxopt solver gets precision problems when there is a large number of small values involved.

      Gurobi was not only faster but used a lot less RAM as well, so if we could replace the call to cvxopt with a call to gurobi python module we could probably manage some even larger values of N.

      Gurobi’s presolver also removed 609180 rows and 439 columns, so it was able to make use of some of the structure in the set of constraints.

    • Klas Markström Says:

      I have kept this program running, but since writing an mps file takes several days it is very slow going. The optimum value remains 5/7 at least up to N=43.

  10. Graham Says:

    I have wandered here from John Baez’s Azimuth blog (which is a bit ironic given the purpose of his blog!). I have read quite a lot of EDP posts and pages at the EDP wiki, but there is much more I haven’t, so I may be repeating something you know. It also seems fair to warn you that my maths is quite rusty and I am not familiar with wordpress.

    Suppose that x_1,x_2,... is a completely multiplicative sequence. I want to show that discrepancy C=3 is ‘very unlikely’. Choose N large enough that the density of primes in A=\{1,2,...N\} and B=\{N+1,N+2,...2N\} is less than .02. The idea is to compare the number of choices that can be made for the primes in A (about 2^{0.02N}) with the chance that a random assignment of 1 or -1 to each N-smooth number in B will result in only a small amount of drift.

    If there is positive drift of at least 6 starting with an odd number (x_n+x_{n+1}+...+x_{n+k} \geq 6 for some n and k with n odd), then the discrepancy is more than 3. I consider 4 states (\leq 0, 2, 4, 6) and a Markov chain that approximates the probability that the drift is in one of these states. The state \leq 0 should be interpreted as `resetting the count for drift’, while state 6 is absorbing, since we only need to visit it once to get what we want. In B, at least .98 of the numbers are N-smooth, since N-smooth is the same as composite in this range. Here is a picture.


    where S means N-smooth and p means prime. Take them two at a time, starting on odd n.

    SS SS pS SS SS SS pS SS …

    Then the transition matrix for SS is

    0.75 0.25 0.00 0.00

    0.25 0.50 0.25 0.00

    0.00 0.25 0.50 0.25

    0.00 0.00 0.00 1.00

    and for pS assuming all primes in B get -1

    1.0 0.0 0.0 0

    0.5 0.5 0.0 0

    0.0 0.5 0.5 0

    0.0 0.0 0.0 1

    (I use row vectors on the left, states in the order mentioned.) Given the ratio of p’s to S’s, I reckon adding .96 of the first matrix to .04 of the second and raising the result to a large power will give the long term behaviour of the drift. I haven’t proved anything, but it seems the probability of escaping a positive drift of 6 or more will be about .97^{N/2}. So if this multiplied by 2^{0.02N} is much less than 1, which it is, you would not expect there to be a sequence of length 2N and discrepancy 3.
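    Graham’s heuristic is easy to simulate numerically. The following is my own sketch, not a proof: it uses the 0.96/0.04 mix and the two transition matrices exactly as given above, and estimates how fast the probability of the drift staying below 6 decays per step.

```python
import numpy as np

# States: <=0, 2, 4, 6 (6 absorbing); row vectors act on the left, as stated.
SS = np.array([[0.75, 0.25, 0.00, 0.00],
               [0.25, 0.50, 0.25, 0.00],
               [0.00, 0.25, 0.50, 0.25],
               [0.00, 0.00, 0.00, 1.00]])
pS = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.5, 0.5, 0.0, 0.0],
               [0.0, 0.5, 0.5, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
A = 0.96 * SS + 0.04 * pS

# The long-run decay of the not-yet-absorbed probability is governed by the
# dominant eigenvalue of the substochastic 3x3 block (the per-step factor).
lam = np.abs(np.linalg.eigvals(A[:3, :3])).max()

# Probability of never having reached drift 6 after many pair-steps.
v = np.array([1.0, 0.0, 0.0, 0.0]) @ np.linalg.matrix_power(A, 500)
survival = 1.0 - v[3]
```

    The per-step factor `lam` comes out a little below 1, in the same ballpark as the .97 quoted above, so the survival probability is indeed exponentially small in the number of steps.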

    I don’t think it would be very hard to extend the argument to bigger C, or to make my rough calculations rigorous, unless I’ve made a mistake of course. The big gap is showing that the N-smooth numbers get their signs in a way that is close enough to being random.

    • gowers Says:

      I haven’t had time to read what you’ve written carefully, but let me ask the following question. Would your argument, if successful, also be able to show that a discrepancy of \log N is very unlikely? If so, then it suggests that it will be impossible to make rigorous the probabilistic heuristic you are using, since multiplicative sequences of that discrepancy do exist.

    • Graham Says:

      No, it wouldn’t make a discrepancy of log(n) very unlikely, though I don’t know exactly what it would do. N needs to be around exp(50) for my argument to work when C=3. I roughly and unreliably estimate that N would need to be around exp(300,000,000) when C=101.

    • gowers Says:

      Let me test my understanding of what you are saying by trying to put it in my own words (but with less precision about the details). The basic heuristic you are assuming is something like this. If N is very large, then almost all numbers in [N,2N] are composite. If we now choose values for the primes in [1,N], then we determine almost all the values in [N,2N], and in particular we determine the values on long subintervals of numbers in that range (where “long” means “comparable to \log N“, by the prime number theorem). We can’t cancel out much of whatever drift there might be by using our freedom to choose values at the primes, and if one adopts the heuristic that the values at the composite numbers are random, then the probability that the drift is small enough to be cancellable by the choices at the primes is very small. We then do a union bound over all choices of signs at the primes in [1,N] and get the result that the drift should be more than 3 for every choice.

      Assuming the calculations work out, that is indeed quite encouraging, though there are some fairly serious problems that would have to be overcome if one wanted to base a proof on this kind of idea. One that jumps to my mind is that modelling what happens between N and 2N by dividing numbers into prime and composite seems rather unrealistic, since the degree of compositeness plays an important role in the problem. For example, if we are just looking at drifts in the range from N to 2N, then if p is a prime between N/2 and N, then choosing the value at p determines the values at 2p and 3p and at no other points in [N,2N]. So if it so happens that the drifts near 2p and 3p are in the same direction, then we can improve them both.

    • Graham Says:

      Yes, you’ve got the idea, but I’m not sure about “and in particular we determine the values on long subintervals of numbers in that range”. It seems to me it is the overall proportion of primes to composites that is key, rather than the lengths of the gaps between primes (but I could be wrong). Your second paragraph expands nicely on what I meant by “the big gap”.

    • gowers Says:

      Ah, you picked up on the fact that initially I thought it was the lengths of the intervals that mattered, but during the process of writing my summary I looked back at what you had written and came to understand that that was not the case. I should have deleted those words you rightly object to, but some stupid instinct caused me to leave them in …

  11. Graham Says:

    Some background on how I came up with the idea. I am more of a programmer than a mathematician these days, and I was wondering how feasible it would be to write a program to show that C=3 is impossible for a completely multiplicative sequence. Part of that was estimating (for example) how many of the 2^{25} choices for primes less than 100 could be eliminated purely by looking at drift between 100 and 200. It could be interesting to find out if the probabilistic argument gives good estimates in small cases like this.

    • Mark Bennet Says:

      The EDP6 thread to which the following may be the correct link shows that there is a completely multiplicative sequence of discrepancy 3 with length 3545.


      I’m not sure that this is the best that can be done – there may be longer sequences to find.

      On another general note, if there are no infinite sequences of bounded discrepancy, then somewhere there is a block to a sequence of discrepancy 10 (as an example) – and this would also block discrepancy 3. In fact, if our estimates are crude, we may need (in intuitive language) to find a size-10 blockage to prove that we can block a discrepancy-3 sequence – but there will always be a big enough blockage out there to be found.

    • Graham Says:

      There is an example on the wiki, length over 13,000 with C=3. I was assuming that it was completely multiplicative, not just multiplicative. Is that right?


      Whatever else it may be, my estimate is certainly crude.

    • Mark Bennet Says:

      I’m sure you are right and that multiplicative = completely multiplicative in this context. And I’m not (yet) convinced that this is the longest possible for discrepancy 3. Thanks for linking the example.

    • Alec Edgington Says:

      I’d be very surprised if it were the longest possible, since it was generated from a much smaller sequence by a depth-first search that ran in a couple of seconds with very little backtracking.

    • Mark Bennet Says:

      At the risk of being really obtuse – and because it does not seem immediately obvious … and because the question is inspired by the above …

      Is there an N for which there is an infinite number of N-smooth pairs (r, r+1)?

      Or N-smooth triples (r, r+1, r+2)? … etc

      Or, for fixed N are N-smooth numbers known to become sparse so that every sufficiently large N-smooth number is eventually an isolated point?

      Motivation – it seems to me that one way of expressing why EDP is difficult (and it might be worth gathering a collection of such statements) is that to block a potential sequence of discrepancy r one needs a blockage of size at least linear in r (eg to force a drift of size 2r – maybe quadratic in r, or polynomial, say size R). Such a blockage can contain no freely assignable values (primes).

      Suppose N is chosen to give a reasonable number of blockages of size R which cause problems for discrepancy r – if N grows too fast with r, then the degrees of freedom inherent in N allow all the blockages to be negotiated. And N seems to grow very quickly with r. [The early blockages are the biggest]

      But if there are N large enough to provide an infinite number of N-smooth blocks of size R, then the probability argument ought to go through.

      And if there are no sequences of bounded discrepancy there perhaps ought to be an N large enough to provide sufficient blocks to discrepancy r (given that discrepancy Mr is eventually blocked, however large M happens to be).

      Experimental evidence suggests that even for small r, the numbers grow very quickly.

      This doesn’t seem to say much which isn’t at least implicit in previous posts and insights – but I am trying to get to the root of why this is a difficult problem.

    • gowers Says:

      I can’t remember the precise definition of N-smooth, but I think the answer to your question is basically yes. It is known that almost all numbers of size around n have about \log\log n prime factors. So if I’m not much mistaken, that means that for any \epsilon>0 and for large enough k, the proportion of numbers less than k with about \log\log k prime factors is at least 1-\epsilon. So one should get infinitely many runs of smooth numbers (unless my grasp of what the definition is is completely wrong) of length \epsilon^{-1}.

    • Alec Edgington Says:

      On the other hand, if by N-smooth you mean having no prime factor greater than N, the answer is no. This is Størmer’s theorem: for any finite set P of primes, there are only finitely many pairs of consecutive integers with all their prime factors in P.

      However, what Tim says is perhaps more relevant to EDP. It shows that there are ‘islands of smoothness’ that are arbitrarily big in both ‘length’ and ‘depth’ (where by depth I mean the number of prime factors).

      I think it would be interesting to understand how these islands are distributed, though. If long, deep islands are extremely rare then there might be enough freedom of choice in between to surmount the mass of constraints they impose on low-discrepancy sequences. But if they accumulate quite fast one would expect the opposite.

      One way to get a feeling for this is to revisit the problem we talked about briefly a while ago: whether there are sequences of bounded discrepancy with respect to HAPs whose difference is a prime or a power of 2. It is certainly easier to construct long sequences of discrepancy 2 with respect to these HAPs by depth-first search (I have found one of length 26328). Nevertheless, sooner or later one ‘runs aground’ on these islands of smoothness.

    • gowers Says:

      I’m glad you’ve brought up that problem again, for two reasons. First, I didn’t know you’d investigated it experimentally and am interested to hear that the search does eventually run out. I suppose I’d also be interested to know whether the examples you get still have discernible multiplicative structure. (One might suspect not, since for multiplicative sequences there is no advantage in restricting the set of common differences.) Secondly, it might be interesting to see what difference imposing that extra condition makes on the reformulated question.

    • Alec Edgington Says:

      I’m just regenerating that sequence now and will put it on the wiki when I have. I tried a few variations in the search, such as imposing the condition f(2x) = -f(x), and ‘seeding’ the search in the interval [2^k+1, 2^{k+1}] with minus the already-generated values in [1, 2^k] (in imitation of the Thue–Morse construction). But the longest sequence was obtained by the most naive method, with uniform seeds of +1 and no extra constraints. I haven’t done any analysis on it, however, and it is possible that that will reveal interesting structure, multiplicative or otherwise.

      An idea that’s just occurred to me is to explore the seeming ‘tension’ between having low discrepancy on powers of 2 and having low discrepancy on (for example) 3: the Thue–Morse sequence has minimal discrepancy (one) on power-of-2 HAPs, but very high discrepancy on the 3-HAP (I vaguely recall that it’s linear, but I could be wrong about that). Is this merely a curiosity, or is there a deeper reason for it that could suggest a proof of EDP?

    • Alec Edgington Says:

      Here is the sequence:

    • gowers Says:

      I’m curious to know how much backtracking the program had to do. That is, do you get the impression that it was genuinely stuck here or do you think that the true maximum is much much longer? You talked about running aground on islands of smoothness, which interested me a lot: are you deducing that from the fact that the program didn’t just easily produce more and more terms without ever having to backtrack, or do you have some evidence that these islands really are stopping it from continuing for ever?

    • Alec Edgington Says:

      I don’t know exactly how much backtracking was done — I can find out by running it again with a check for this — but I guess the maximum amount was of the order of 100 rather than 1000 or 10000. The program took about an hour to get this far, and didn’t (as I recall) get any further when left to run for another day. Running it with different seeds generally causes it to stick at different places (usually over 10000). So what I mean by ‘running aground’ is not that it reaches an insurmountable limit, but that decisions made further back mean that the search has, for practical purposes, ‘lost its way’.

      The particular ‘island’ that this solution got stuck at is the region between the successive primes 26321 and 26339, which includes multiples of all primes up to 29 (as well as a multiple of 32). However, one observation, which from limited evidence appears to be characteristic of the places where the program gets stuck, is that the number immediately following (in this case 26329) is not a multiple of any small prime or power of 2 (its factors are 113 and 233).

      Come to think of it, that suggests a simple optimization. When we get to a number of the form p(q-1) where p < q, we should check the sum of the q-HAP up to q(p-1); if it is +2 then we know that the value at pq will have to be -1, so we have to make sure that the sum of the p-HAP up to this point is not -2 — and similarly with signs reversed. (A similar optimization would apply to the original EDP.) I’ll try that next.
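      Alec’s check can be sketched as follows (my own hypothetical implementation of the rule just described, for the discrepancy-2 case; `hap_sum` and `pq_conflict` are names of my own invention):

```python
# x is a dict mapping n -> +-1 for the terms assigned so far; p < q are two
# allowed differences with pq just beyond the current position.
def hap_sum(x, d, upto):
    """Sum of the d-HAP over the assigned terms x_d, x_{2d}, ..., up to 'upto'."""
    return sum(x[m] for m in range(d, upto + 1, d) if m in x)

def pq_conflict(x, p, q):
    """True if the forced value at p*q would push one of the two HAPs past +-2."""
    sq = hap_sum(x, q, (p - 1) * q)   # q-HAP sum just before pq
    sp = hap_sum(x, p, (q - 1) * p)   # p-HAP sum just before pq
    if sq == 2 and sp == -2:          # x_{pq} forced to -1, but p-HAP already at -2
        return True
    if sq == -2 and sp == 2:          # same with signs reversed
        return True
    return False
```

      A depth-first search could call `pq_conflict` before committing to a partial assignment, pruning branches that are doomed at pq without any look-ahead.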

    • Mark Bennet Says:

      I think I was asking both questions – what I was really interested in is whether the ‘islands of smoothness’ could be sufficiently related to one another that it was possible to work with more than one at the same time.

      The intuition being that if there are large islands of smoothness out there, there may be a number of related smaller islands (eg even numbers in large island divided by 2 etc) which place constraints on the sequence when it hits the large island. [Hence the thought that one might find a way of dealing with the same set of primes, since these are controlled by the large island]

      I was thinking especially of completely multiplicative sequences.

    • Alec Edgington Says:

      That optimization really helped. The program got as far as 36578 in the space of a few minutes, then stuck. (I’ll leave it going overnight to see if it gets any further.)

    • Alec Edgington Says:

      Overnight (in fact, in about seven-and-a-half hours) the program got to 60000, and then stopped because that was the limit I’d set for the search. I’ve just started it running again with a limit of 120000.

    • gowers Says:

      I wonder at what point one should start to suspect that this strengthening of EDP is false? The trouble is, almost anything will be consistent with a doubly exponential upper bound with a modest constant. Perhaps it depends on how easy the program is finding it to continue. If after your refinement of the purely greedy algorithm it is basically having no trouble at all, then perhaps it would be worth trying to prove that the algorithm (or some further refinement of it) doesn’t halt. I think that would be a pretty interesting result if we could prove it.

      How might a proof go? I think we would want to define some notion of degrees of freedom and argue that as long as we have enough degrees of freedom we can extend the sequence in such a way that we still have enough degrees of freedom. But to do that we would have to think carefully about islands of smoothness, to be sure that they don’t mess things up. If they do mess things up, then that is of course very good news, so this is a win-win situation: this is my secret motivation for trying to find a counterexample.

    • Alec Edgington Says:

      I had to interrupt the program, but it’s running again now and has reached 78384.

      The optimization for avoiding getting stuck at pq can, I think, be generalized. For example, one should ensure that the triple (sum of r-HAP at (pq-1)r, sum of p-HAP at (qr-1)p, sum of q-HAP at (rp-1)q) does not contain \{\pm 2\} as a subset. Since the current search seems to be sticking at multiples of three primes, I’m tempted to add a further test for this condition — or indeed a more general one — and see if it makes a significant difference …

    • Alec Edgington Says:

      On second thoughts, I think a more useful generalization would be the following: for all p < q and k \geq 1, ensure that \{ \textrm{sum of }p\textrm{-HAP at }p(kq-1), \textrm{sum of }q\textrm{-HAP at }q(kp-1)\}  \neq \{\pm 2\}.

      It isn’t (so far) products of three large primes that are blocking, but small multiples of products of two large primes.

      I’ll try to implement that this evening. The search is still stuck at 78384 = 5 \times 61 \times 257 - 1, which is an example of such a blockage. My hope is that we may be able to push depth-first searches for HAP discrepancy problems quite a bit further than we have done by means of this kind of check, at least when the range of allowed sums is small (so that its extrema are attained quite frequently).

    • Alec Edgington Says:

      I’m now running the primes-and-powers-of-2 search with the new optimization. It needed to have a 10-minute ‘think’ at 78324 (where it anticipated the problem looming at 78385, the next term in the 61-HAP), but happily it did find a way round that, and has now reached 94898.

  12. Klas Markström Says:

    I have a general question regarding the decomposition method. Given a decomposition with a given N and C, what can we say about the length of the longest sequence with a given discrepancy?

  13. gowers Says:

    I have a long plane journey later today. Amongst the questions I hope to get round to thinking about (I have far more of these than I can realistically hope to make progress on, but one can fantasize …) are the following.

    1. Klas’s question above: if the new (or rather, revived) decomposition approach we have been trying makes progress in the sense that we can push down c, what does that information tell us about the original problem? I hope that can be sorted out fairly easily.

    2. How might one go about proving that EDP restricted to common differences that are primes or powers of 2 is false?

    3. In the other direction, can we formulate two incompatible conditions, one that must be satisfied by all sequences that have bounded discrepancy for prime common differences and one that must be satisfied by all sequences that have bounded discrepancy for power-of-two common differences?

    4. Is there a decomposition approach similar to the one we have been trying that would be appropriate for prime/powers-of-two common differences?

    • Mark Bennet Says:

      I’ve been thinking a little about those places where the sequence sums to zero (on HAP with difference 1, for example). These points must come at even values. Can we say anything about the way in which the zero values are distributed? How dense they are? Are there sequences of arbitrary length (in an infinite sequence of bounded discrepancy on some set of HAPs) where the sum of the HAP remains positive (or negative) [think random walk]?

      It just struck me that parity played a large part in analysing discrepancy 2 sequences. What is the strongest parity-type tool for other cases? Bringing in the powers-of-2 suggests parity may be significant.

    • Mark Bennet Says:

      If there is an infinite sequence of bounded discrepancy (on HAPs or some sub-selection of HAPs – this to be understood throughout this comment), then there is a smallest discrepancy d for which such a sequence exists.

      Then there is a bound N on the length of any sequence of discrepancy at most (d-1).

      Now examine the structure of an infinite sequence of discrepancy d. Suppose there are arbitrarily long intervals m < p \leq m+n, of length n, on which the partial sum of the HAP with difference 1 is non-zero (so either greater than zero throughout, or less than zero throughout) …

      Then there is such a sequence of length kN! for arbitrary k, and starting at a multiple of N! gives a sequence in which the discrepancy on the HAP of difference 1 is no greater than (d-1). If it could be shown that there were constraints on the other HAPs with differences up to N/(d-1) there would be a model of sequence starting at 0 (N! or some multiple takes the role of 0, large primes can only add additional constraints), longer than N of discrepancy at most (d-1). So if that could be shown, there would be a bound on the distance between returns to zero on the HAP with difference 1.

      Once these are bounded (if indeed they can be) the sequence breaks up into finite segments between returns to zero. It may be that the assumption that returns to zero are bounded in length leads to a contradiction.

      Or maybe not. Sometimes it seems that either of these ought to be easy, and then difficulties come to mind.

  14. Alec Edgington Says:

    Another optimization idea, though this one only applies to HAP discrepancy problems with respect to difference sets D \subseteq \mathbb{N} such that we can find a big set E \subseteq \mathbb{N} such that E \cdot D \subseteq D. Thus, it would apply to the original EDP (with E = \mathbb{N}^+), but not to the primes-and-powers-of-2 problem.

    Suppose we are doing a depth-first search for discrepancy C using some ‘seed’ sequence \sigma with \sigma_1 = 1 (typically the all-ones sequence, but it could be any). The seed sequence defines a lexicographic ordering on sequences, where x \leq y if and only if x_n = \sigma_n for the smallest n at which x_n \neq y_n (so \sigma is the minimal element in this ordering). I’m considering all these as infinite sequences.

    Our search state is defined by an infinite sequence x (with x_1 = \sigma_1) and an index n representing our position. In this state, we know that any sequence y of discrepancy C satisfies \lvert y \rvert \geq x, where \lvert y \rvert = y_1 y is normalized to have first term 1. Therefore (because E \cdot D \subseteq D), we know that for every k \in E, \lvert T_k y \rvert \geq x, where T_k is the dilation operator (T_k(y)_m = y_{km}). And this is something we can test for cheaply. For each k \in E such that k \mid n, we check that the k-HAP up to n is, when normalized, lexicographically greater than or equal to the first n/k terms of x.

    For example, with EDP, if our search has worked out that we must have x_2 = -1 — perhaps we had to look ahead all the way to 1000 to discover this — then we know right away that x_{2n} = -x_n for all n, without doing any look-ahead. That is an extreme example, but I’d be interested to see what we gain in practice from this test. The main application, I imagine, would be to prove that no infinite sequence with a given discrepancy exists — rather than to find long finite examples.
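    Here is a minimal sketch of that test (my own code, assuming the seed \sigma is the all-ones sequence and E = \mathbb{N}^+, so that the seed ordering is ordinary lexicographic order with +1 before -1; `dilation_ok` is a made-up name):

```python
# x is the current search prefix as a list of +-1 values with x[0] = x_1 = 1.
def seed_key(seq):
    # in the all-ones seed ordering, +1 comes before -1
    return [(1 - v) // 2 for v in seq]

def dilation_ok(x):
    n = len(x)
    for k in range(2, n + 1):
        m = n // k                        # number of complete k-HAP terms
        sub = [x[k * i - 1] for i in range(1, m + 1)]
        sub = [sub[0] * v for v in sub]   # normalize so the first term is +1
        if seed_key(sub) < seed_key(x[:m]):
            return False                  # a dilation is lexicographically smaller
    return True
```

    A branch failing this test can be cut immediately, since any discrepancy-C sequence extending it would have a normalized dilation below the current search position.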

    • gowers Says:

      That’s an interesting line of thought and I plan to ponder it. It also may tie in with a question that occurred to me on my plane journey yesterday, which is this.

      Adrian Mathias proved not only the easy result that the best bound for EDP is at least 2, but the much stronger result that if no HAP ever has a sum that goes above 1, then the sums in the negative direction must be unbounded. Now if I understand correctly, we now know that a sequence of length 2000 (or thereabouts) must have discrepancy at least 3. Can we get from that the stronger result that any sequence such that all HAP sums are at most 2 must be unbounded on the negative side?

      It is certainly true if we have any kind of relationship such as x_{2n}=-x_n whenever the sums are bounded, since then we can argue as follows. If the sums are bounded above by 2 and below by -C, then the sums along HAPs with even common differences are bounded below by -2 and above by C. But since they are also bounded above by 2 we get a contradiction.

      Since this would be a rather nice (and clearly publishable) result, and since we already have results that suggest that EDP counterexamples must have multiplicative structure (or at the very least imply the existence of other counterexamples with multiplicative structure, which might perhaps be induced to have one-sided boundedness too), it looks to me like a good thing to think about.

    • Mark Bennet Says:

      The multiplicative structure does come out here – and essentially this seems to say that if there is a bounded infinite sequence (for any reasonable discrepancy problem) where the first value is 1, then if the value at some prime p is provably -1, the sequence is completely multiplicative with respect to that prime.

      Given a long sequence, it might be worth testing whether the +1 values at small primes were forced (ie if replaced by -1 does the sequence terminate quickly). This would create additional constraints.

      How far are we from showing that if there is an infinite sequence having some discrepancy constraint, there is a completely multiplicative infinite sequence with the same constraint (or of identifying a class of constraints for which this is true)?

    • Alec Edgington Says:

      Indeed, if we can show that f(k) = \epsilon for all discrepancy-C sequences f satisfying f(1) = 1 (for some k and \epsilon = \pm 1), then we know that f(km) = \epsilon f(m) for all such sequences.

      Conversely, if we can eliminate the case f(km) = \epsilon f(m), then we know that f(k) is not forced to be \epsilon. Therefore, for the purpose of proving EDP we can assume, without loss of generality, that f(k) = -\epsilon.

      I see from the wiki page
      that we have eliminated the case f(2m) = f(m), for C=2. If I’m not mistaken this means that to prove EDP for C=2 we can assume without loss of generality that f(1) = 1, f(2) = -1.

      There are a few other results on that page that imply alternative hypotheses. For example, the fact that we’ve eliminated the case f(2m) = -f(m), f(3m) = -f(m) means that we could alternatively assume w.l.o.g. that f(1) = 1, f(2) = -f(3).

      This sort of observation might help us to find a shorter (albeit still computer-assisted) proof that the constant must be at least 3.

  15. dominiczypen Says:

    Just a question coming from a bit of a different angle. Let a \in \{-1,1\}^\mathbb{N}. For a positive integer C we say that a has a C-run if there is n_0 \in \mathbb{N} such that for all 1 \leq j \leq C we have a_{n_0} = a_{n_0+j}.

    Does there exist a \in \{-1,1\}^\mathbb{N} and a positive integer C such that for no positive integer d the sequence (a_{d\cdot n})_{n\in \mathbb{N}} has a C-run?

    • Alec Edgington Says:

      If I understand the question, I think the Walters example works for C = 2. This is the completely multiplicative function defined by a(p) = 1 if p \equiv 1 \pmod 3, a(p) = -1 if p \equiv -1 \pmod 3, and a(3) = 1. Because it’s completely multiplicative it suffices to check the condition for d=1. But this follows because it’s made up of blocks of the form 1, -1, 1 and 1, -1, -1.
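
      This check can be carried out mechanically. A minimal sketch (ours; the function name is arbitrary) encodes the Walters example via the last non-zero ternary digit, which agrees with the completely multiplicative definition above:

```python
def walters(n):
    """Walters's example: completely multiplicative with a(3) = 1 and, for
    primes p != 3, a(p) = +1 if p = 1 (mod 3) and a(p) = -1 if p = 2 (mod 3).
    Equivalently, the sign is determined by the last non-zero ternary digit."""
    while n % 3 == 0:      # a(3) = 1, so factors of 3 do not change the sign
        n //= 3
    return 1 if n % 3 == 1 else -1
```

      Complete multiplicativity means checking d = 1 suffices, and indeed no three consecutive values are ever equal, i.e. there is no 2-run.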

  16. Dominic van der Zypen Says:

    Oh – that’s correct, thanks!

  17. Uwe Stroinski Says:

    Fix an odd prime p and define for prime q a completely multiplicative function
    \mu_p(q)=\begin{cases} \left(\frac{q}{p}\right) & \textnormal{if } q\neq p \\ -1 & \textnormal{if } q=p \end{cases}
    with (\frac{q}{p}) being the Legendre symbol.

    Then, with this notation, \mu_3 is our record-holder for slow growing discrepancy. Since its growth is logarithmic, any proof of optimality, e.g. for completely multiplicative functions, would naturally settle EDP in the respective setting. I cannot handle such a general situation and have therefore considered the simpler problem:

    Is \mu_3 optimal within the \mu_p‘s?

    The idea is as follows: By a recurrence relation one can show that the growth of the \mu_p‘s is solely determined by properties of the Legendre symbol, especially by the difference of the maximal and the minimal value for arguments 0\leq i\leq p-1.

    There are discrepancy results for Legendre symbols, usually reported in terms of the existence of consecutive quadratic residues. (Obviously, three in a row imply discrepancy at least two.) Let me just sketch a proof of such a result. Define T_p(t):=\sum_{i=1}^{p-1}(\frac{i(i^2-1)}{p}) and Q_p(3) as the number of triples of consecutive quadratic residues in 1,…,p-1. Now consider \frac{1}{8}\left(1+(\frac{i-1}{p})\right)\left(1+(\frac{i}{p})\right)\left(1+(\frac{i+1}{p})\right), which is 1 if i-1, i and i+1 are quadratic residues modulo p and 0 otherwise. Summing, \frac{1}{8}\sum_{i=2}^{p-2}\left(1+(\frac{i-1}{p})\right)\left(1+(\frac{i}{p})\right)\left(1+(\frac{i+1}{p})\right) yields Q_p(3), and quadratic reciprocity eventually implies Q_p(3)= \frac{1}{8}(p+T_p(1)-11-4(-1)^{(p-1)/4}) if p\equiv 1 \ (\textnormal{mod }4). (The other case is trivial for our purposes and omitted.)

    The tricky part is now to estimate T_p(1). This is done by observing that the diophantine equation x^2+y^2=4p (p prime) has solutions x=T_p(1) and y=T_p(t) for some non-residue t modulo p. (T_p is constant on residues and on non-residues respectively.) Putting the pieces together we get |T_p(1)|\leq 2\sqrt{p}, and plugging this into our representation of Q_p(3) yields a lower bound on primes such that any Legendre symbol for larger primes contains at least one triple of consecutive quadratic residues and thus has discrepancy at least 2.
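
    The counting identity and the closed form can be spot-checked numerically. The sketch below is ours (function names are arbitrary); it evaluates the Legendre symbol by Euler's criterion and computes T_p(1) as defined above:

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def Q3_direct(p):
    """Number of i in 2..p-2 with i-1, i, i+1 all quadratic residues mod p."""
    return sum(1 for i in range(2, p - 1)
               if legendre(i - 1, p) == legendre(i, p) == legendre(i + 1, p) == 1)

def Q3_product(p):
    """The same count via the (1/8) prod (1 + (./p)) expansion: each product
    is 8 when all three symbols are +1 and 0 otherwise."""
    return sum((1 + legendre(i - 1, p)) * (1 + legendre(i, p)) * (1 + legendre(i + 1, p))
               for i in range(2, p - 1)) // 8

def T1(p):
    """T_p(1) = sum_{i=1}^{p-1} (i(i^2-1)/p)."""
    return sum(legendre(i * (i * i - 1), p) for i in range(1, p))
```

    For small primes p \equiv 1 \pmod 4 one can then confirm both 8 Q_p(3) = p + T_p(1) - 11 - 4(-1)^{(p-1)/4} and the bound T_p(1)^2 \leq 4p.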

    That should already be enough to prove optimality of \mu_3 among the \mu_p‘s. However, what excites me the most is that the idea scales to higher discrepancies. Currently I am considering the fifth-degree polynomial to get discrepancies larger than 2. Unfortunately the above ‘quadratic reciprocity eventually implies’ part of the proof requires a lot of work and I am stuck. In case all this is considered ‘on-topic’ I would, of course, elaborate things further.

    • gowers Says:

      That all sounds interesting. How does it relate to what Kevin O’Bryant was thinking about some way back? It sounds like a similar direction at least. One other immediate reaction is that I don’t understand what you mean by “the idea scales for higher discrepancies”. Can you elaborate on that?

    • obryant Says:

      Roth’s theorem also likely suffices to prove optimality of \mu_3 among \mu_p, as follows. Fix C>0. If p is sufficiently large, then Roth’s theorem guarantees an AP (modulo p, and not hitting 0 mod p) on which \mu_p has discrepancy at least 2C-1. By multiplicativity, the homogeneous AP with difference 1 then must have a drift of 2C-1, whence the discrepancy is at least C. We can get by even without using multiplicativity since every AP modulo p is a drift of a homogeneous AP.

      I haven’t checked the numbers, but I think our proof of Roth’s result is sufficiently explicit that this bounds p enough that brute force can handle the rest. Back in the spring, I did computations showing that there is no Matryoshka sequence that beats \mu_3 with small modulus (if I recall correctly, that means any modulus smaller than 81).

    • Uwe Stroinski Says:

      Interesting point. Is there some kind of duality argument lurking around? In the proof of Roth’s theorem we use Fourier expansion of the AP’s and get a discrepancy result for general sequences, whereas in the above one uses Fourier expansion of primitive Dirichlet characters to get a discrepancy result (with respect to HAP’s). It seems as if one could trade symmetry in the progressions for symmetry in the functions and vice versa.

  18. Uwe Stroinski Says:

    Kevin (in edp13) had a more general approach and was searching for a counterexample among arbitrary (i.e. not necessarily prime) moduli. He discarded prime moduli for (please correct me if I am wrong) heuristic reasons and concentrated on composite numbers. I am indeed heading in a similar direction. The slight twist is as follows:

    We know that \mu_3[\cdot] attains its “maxima” at (3^{2m}-1)/8. For \mu_p[\cdot] we have similar descriptions. The computations are elementary albeit lengthy and I put them in an upcoming comment. Since the descriptions are explicit in terms of properties of the Legendre symbol, we can use known discrepancy results to give lower bounds for the discrepancy of \mu_p.

    To do this, I first thought we would have to generalize some idea of Gauss (this was in my first comment), but there seems to be an easier way, via a classical result of Schur on the discrepancy of primitive Dirichlet characters. By the way, in case there is some open-source version of Schur’s result and its proof I would be grateful for a link.

    In any case, I proceed to post the ‘easy’ stuff on \mu_p.

    • Uwe Stroinski Says:

      As a reminder, let me first show how to get the ‘maxima’ of \mu_3[\cdot ]. The generalization to \mu_p[\cdot ] is straightforward, albeit some work, and is done in another comment.

      Let n=\sum_{i=0}^l d_i p^i with 0<l and 0\leq d_i<p be the p-ary representation of some number n. The following can be established by induction on l: \mu_p[n]=\sum_{i=0}^{l} (-1)^i\mu_p\left[d_i\right]. Observe that
      \overbrace{22\ldots22_3}^{3^{2m}-1}:\overbrace{22_3}^{8}=\overbrace{01\ldots01_3}^{m \textnormal{ times}}. Thus \mu_3[\frac{3^{2m}-1}{8}]=m.

      Furthermore, since \mu_3[0]=\mu_3[2]=0 and \mu_3[1]=1, we have for m\in\mathbb{N} that \max_{i\leq 3^{2m}-1}|\mu_3[i]|\leq m. Thus \mu_3[i]\leq\log_9(8i+1), with equality for i=\frac{3^{2m}-1}{8}.
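
      Both the digit recurrence and the peak values can be confirmed by direct computation. A minimal sketch (ours; function names are arbitrary), with \mu_3 evaluated by stripping factors of 3 (each contributing -1, since \mu_3(3)=-1) and reading the remaining residue mod 3:

```python
def mu3(n):
    """The completely multiplicative mu_3 above: mu_3(3) = -1 and
    mu_3(q) = (q/3) for primes q != 3."""
    s = 1
    while n % 3 == 0:
        n //= 3
        s = -s                           # each factor of 3 contributes -1
    return s if n % 3 == 1 else -s       # Legendre symbol (n/3) for n coprime to 3

def mu3_partial(n):
    """Partial sum mu_3[n] = mu_3(1) + ... + mu_3(n)."""
    return sum(mu3(k) for k in range(1, n + 1))

def digit_formula(n):
    """sum_i (-1)^i mu_3[d_i] over the ternary digits d_i of n,
    with mu_3[0] = mu_3[2] = 0 and mu_3[1] = 1."""
    table = (0, 1, 0)
    total, sign = 0, 1
    while n:
        total += sign * table[n % 3]
        n //= 3
        sign = -sign
    return total
```

      In particular \mu_3[(3^{2m}-1)/8] = m and |\mu_3[i]| \leq m for i \leq 3^{2m}-1 can both be checked over a range.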

    • Uwe Stroinski Says:

      Define the maximum M_p:=\max\{\mu_p[i]:0\leq i< p\}, the minimum m_p:=\min\{\mu_p[i]:0\leq i<p\}, the first appearance of a maximum D_p=\min\{i:\mu_p[i]=M_p,0\leq i<p\} and the first appearance of a minimum d_p=\min\{i:\mu_p[i]=m_p,0\leq i<p\}. Obviously m_p\leq0<1\leq M_p. Since \overbrace{dD\ldots dD_p}^{m \textnormal{ times}} equals (D_p+d_p p)\cdot\overbrace{p-1 p-1\ldots p-1 p-1_p}^{p^{2m}-1}:\overbrace{p-1 p-1_p}^{p^2-1} we have \mu_p[(D_p+d_p p)\frac{p^{2m}-1}{p^2-1}]= (M_p-m_p) m.

      For i:=(D_p+d_p p)\frac{p^{2m}-1}{p^2-1} we obtain \mu_p[i]=(M_p-m_p)\log_{p^2}\left(\frac{p^2-1}{D_p+d_p p}i+1\right)=:K_1(i) as possible maximal values.

      To get the other possible maximal values we observe that D\overbrace{dD\ldots dD_p}^{m \textnormal{ times}} equals (D_p+d_p p)\cdot\overbrace{p-1 p-1\ldots p-1 p-1_p}^{p^{2m}-1}:\overbrace{p-1 p-1_p}^{p^2-1}+D_p p^{2m}, and we have \mu_p\left[(D_p+d_p p)\frac{p^{2m}-1}{p^2-1}+D_p p^{2m}\right]= (M_p-m_p) m+M_p.

      Therefore \mu_p[i]=(M_p-m_p)\log_{p^2}\left(\frac{p^2-1}{p D_p + d_p}i+\frac{D_p+pd_p}{p D_p+ d_p}\right)+\frac{M_p+m_p}{2}=:K_2(i).

      Combining the results yields the upper bound \mu_p[i]\leq \max\{K_1(i),K_2(i)\}.

      To get lower bounds we consider Dd\ldots Dd and dDd\ldots Dd, obtaining \mu_p[i]=(m_p-M_p)\log_{p^2}\left(\frac{p^2-1}{d_p+D_p p}i+1\right)=:L_1(i)
      and \mu_p[i]=(m_p-M_p)\log_{p^2}\left(\frac{p^2-1}{p d_p+ D_p}i+\frac{d_p+p D_p}{p d_p+ D_p}\right)-\frac{M_p+m_p}{2}=:L_2(i) respectively. Hence \mu_p[i]\geq\min\{L_1(i),L_2(i)\}.

      In summary we have |\mu_p[i]|\leq \max\{K_1(i),K_2(i),|L_1(i)|,|L_2(i)|\}. For each prime p equality holds with one of the four options for infinitely many i.
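
      The peak identity \mu_p[(D_p+d_p p)\frac{p^{2m}-1}{p^2-1}] = (M_p-m_p)m can be spot-checked numerically. The sketch below is ours (function names are arbitrary); it uses the fact that, by complete multiplicativity, \mu_p(n) = (-1)^{v_p(n)}(m/p) where n = p^{v_p(n)}m:

```python
def legendre(a, p):
    r = pow(a % p, (p - 1) // 2, p)   # Euler's criterion
    return -1 if r == p - 1 else r

def mu_p(n, p):
    """mu_p(n) = (-1)^{v_p(n)} * (m/p), where n = p^{v_p(n)} * m."""
    s = 1
    while n % p == 0:
        n //= p
        s = -s
    return s * legendre(n, p)

def peak_holds(p, m):
    """Check mu_p[(D_p + d_p*p)(p^{2m}-1)/(p^2-1)] == (M_p - m_p)*m."""
    S = [0]
    for i in range(1, p):                  # partial sums mu_p[i] for 0 <= i < p
        S.append(S[-1] + mu_p(i, p))
    M, mn = max(S), min(S)
    D, d = S.index(M), S.index(mn)         # first appearances of max and min
    i = (D + d * p) * (p ** (2 * m) - 1) // (p * p - 1)
    return sum(mu_p(k, p) for k in range(1, i + 1)) == (M - mn) * m
```

      For p = 3 this reproduces \mu_3[(3^{2m}-1)/8]=m; small odd primes and small m check out the same way.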

  19. Alec Edgington Says:

    I wanted to revisit the notion of ‘quasi-multiplicative’ functions that we were thinking about back in January (here, for example).

    The idea was that long low-discrepancy sequences might be obtained by composing a multiplicative function \mathbb{N}^+ \to G (for some abelian group G) with a (not necessarily multiplicative) function G \to \{\pm 1\}. In particular, the group G = \mathbb{Z}_6 looked like a promising candidate. Later when I analysed the 64 sequences of length 1120 found by Klas, I noticed that they tended to match (up to a change of sign) a particular such function on the 5-smooth numbers. So it seemed interesting to look for long discrepancy-2 sequences of this form.

    The particular examples that arose were all of the form

    f(n) = \pm \sigma(n) \delta_i(\tau(n))

    where \sigma : \mathbb{N}^+ \to \{\pm 1\} is multiplicative, \tau : \mathbb{N}^+ \to \mathbb{Z}_3 is additive (\tau(mn) = \tau(m) + \tau(n)), i \in \mathbb{Z}_3, and \delta_i(j) is defined to be +1 if i=j and -1 if i \neq j.

    From the point of view of HAP-discrepancy, the choice of i \in \mathbb{Z}_3 is arbitrary: f has discrepancy bounded by C on all HAPs if and only if the three functions

    f(n) = \sigma(n) \delta_i(\tau(n)) \quad (i \in \mathbb{Z}_3)

    all have absolute partial sums bounded by C.

    I wrote a program to search for the \sigma and \tau that maximized the length over which the absolute partial sums of these three sequences are all bounded by 2.

    Despite the prominence of sequences like this among long discrepancy-2 sequences, it turns out that, for discrepancy 2, they cannot beat a multiplicative sequence (that is, \tau(n) = 0 for all n). So one can’t get a sequence longer than 246 in this way.
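
    The reduction from HAP discrepancy to the three base sequences rests on the identity f(dm) = \sigma(d)\sigma(m)\delta_{i-\tau(d)}(\tau(m)). A sketch (ours) checking it for one hypothetical choice of \sigma and \tau — the prime values below are arbitrary illustrations, not the ones found in the search:

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    fs, d = [], 2
    while d * d <= n:
        while n % d == 0:
            fs.append(d)
            n //= d
        d += 1
    if n > 1:
        fs.append(n)
    return fs

# Arbitrary illustrative values on small primes; unlisted primes get sigma = 1, tau = 0.
SIGMA = {2: -1, 3: 1, 5: -1, 7: 1, 11: 1, 13: -1}
TAU = {2: 1, 3: 2, 5: 0, 7: 1, 11: 2, 13: 0}

def sigma(n):   # multiplicative, values in {+1, -1}
    s = 1
    for p in prime_factors(n):
        s *= SIGMA.get(p, 1)
    return s

def tau(n):     # additive into Z_3: tau(mn) = tau(m) + tau(n)
    return sum(TAU.get(p, 0) for p in prime_factors(n)) % 3

def f(i, n):
    """f(n) = sigma(n) * delta_i(tau(n))."""
    return sigma(n) * (1 if tau(n) == i else -1)
```

    Partial sums of f_i along the HAP with difference d then equal \sigma(d) times partial sums of f_{i-\tau(d)} along difference 1, which is why bounding the three base sequences bounds every HAP.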

    • Mark Bennet Says:

      This is interesting.

      I have been wondering whether it is possible to prove that if there is an infinite sequence of bounded discrepancy then either G is infinite, or the sequence is completely multiplicative.

      The idea is that each of an infinite number of primes has to be mapped to a member of the group. Take a prime p and look at other primes q – can a finite group consistently control the values at all the pq?

      For G cyclic of order 2 (the completely multiplicative case) it may be possible, though a proof may be difficult.

      Where the order of G is divisible by an odd prime, can this be used to show a contradiction? (Work with a p which maps to an element of G having odd order: can we keep the discrepancy under control? This seems to link a bit with Tim’s questions above and the relationship between the prime 2 and the odd primes.)

      And if G has order a power of 2 – what can be said then?

  20. Christian Says:

    Just a small and not too related remark about dualizing twice, a topic mentioned in Tim’s initial post: years ago Kahn and Seymour proved a fractional version of the Erdős–Faber–Lovász conjecture (or rather: of an equivalent statement about colouring certain hypergraphs), and in their proof they do in fact dualize twice (in a linear programming sense), see:

    “A fractional version of the Erdős-Faber-Lovász conjecture”, Combinatorica 12 (2): 155–160.

    I thought for a while about why this could possibly help them, and my conclusion was that it made their argument much easier to discover. As expected, it can be rewritten in a way that does not dualize at all, and without giving details I would like to indicate a way to do this. First one restricts one’s attention to the case where “all inequalities one has” are in fact equalities, and for that case one runs the (of course still very clever) argument they perform after having dualized twice. Now suppose instead that some of the inequalities floating around are strict; then the subhypergraph of our original hypergraph containing the edges that correspond to the “sharp” inequalities is strictly smaller, so we may apply induction to it, think a few minutes, and then finish easily. Thus the modified strategy we get is just this: make some extra assumptions, solve your problem assuming them, and then get rid of your assumptions using induction.

    At the moment, I cannot see how this could help for EDP, but maybe someone else has an idea.

    • gowers Says:

      Thanks for this interesting comment. We should perhaps also think about whether there are enough people with enough enthusiasm for another serious discussion of this problem. I’d be quite interested myself.

    • Alec Edgington Says:

      I’m still interested in the problem, and would endeavour to join in a new discussion. I haven’t given it any thought lately, having come to the conclusion that it was just too hard! But perhaps the time is ripe to have another go.

  21. Uwe Stroinski Says:

    Recently I have spent some time trying to better understand Terence Tao’s comment https://gowers.wordpress.com/2010/02/02/edp5-another-very-brief-summary/#comment-5836 and I guess that counts as still being interested in the problem.

  23. Dominic van der Zypen Says:

    Is there a \{-1,1\}-sequence x_j with a weaker “discrepancy property” than Erdoes’ problem looks for, namely:

    for all d we have \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n x_{id} = 0?

    • gowers Says:

      Yes, the good multiplicative examples (e.g. define f(n) to be 1 or -1 according to whether the final non-zero digit of n in base 3 is 1 or 2) have partial sums that grow logarithmically.
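
      This is easy to confirm in code. A minimal sketch (ours): the function below takes the sign of the final non-zero ternary digit; its partial sums up to 3^k stay within k, and by complete multiplicativity the same bound controls every HAP, so the averages tend to 0.

```python
def f(n):
    """+1 if the final non-zero ternary digit of n is 1, else -1; the
    completely multiplicative example from the comment above."""
    while n % 3 == 0:
        n //= 3
    return 1 if n % 3 == 1 else -1
```

      Up to 3^8 = 6561 the partial sums never leave [0, 8], so every HAP average |\sum_{i\leq n} f(di)|/n is O(\log n / n).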
