Erdős’s discrepancy problem

I’ve been pretty busy over the last month or so — hence my blog silence — and continue to be busy. But here is one more discussion of a problem that was on my earlier list of possible future Polymath projects. The problem in question is sometimes known as Erdős’s discrepancy problem. This will be a shortish post, because I don’t have much of a programme for solving the problem: if it were adopted, then we would be starting almost from scratch, but that could be an interesting experiment.

Here is the question.

Problem. Is it possible to find a \pm 1-valued sequence x_1,x_2,\dots and a constant C such that |\sum_{i=1}^nx_{id}|\leq C for every n and every d?

With the help of a little terminology we can ask this question in a slightly different way. If \mathcal{A} is a collection of subsets of a set X and f:X\rightarrow\{-1,1\}, define the discrepancy of f to be the maximum value of |\sum_{a\in A}f(a)| over all A\in\mathcal{A}. In our case, \mathcal{A} is the collection of all arithmetic progressions of the special form \{d,2d,3d,\dots,nd\}, and the question is whether there is any function that has bounded discrepancy for this \mathcal{A}. I say “bounded” rather than “finite” because one can define the function \delta(N,f) to be the discrepancy of f with respect to all those sets in \mathcal{A} that are subsets of \{1,2,\dots,N\}. Then the question is equivalent to asking whether there is any function f for which \delta(N,f) is a bounded function of N. The advantage of this formulation is that one can consider more precise questions such as what the best growth rate is. In particular, it allows one to prove results in the other direction. Here is one such result, observed by Mark Walters (and very possibly by several other people).

Theorem. Let x_i=1 if the last non-zero digit of i is 1 when i is written in its ternary representation, and let x_i=-1 if the last non-zero digit is a 2. Then the growth rate of the discrepancy function is logarithmic in N.

Proof. Note that x_{ij}=x_ix_j for any i and j, since the map 1\mapsto 1, 2\mapsto -1 is an isomorphism from the multiplicative group of non-zero integers mod 3 to the group \{1,-1\} under multiplication and last non-zero digits multiply together mod whatever base you’re working in. It follows that the theorem is true if and only if the partial sums \sum_{i=1}^nx_i grow logarithmically.

To see that this is the case, partition the integers from 1 to N into subsets according to which digit is the last non-zero digit. As you go through the elements i of any one of these sets in increasing order, the values x_i alternate between 1 and -1. Therefore, their sum is either 0 or 1. There are at most \log_3N such sets (that are non-empty), so the partial sum in question is at most \log_3N. QED

This example rules out any approach that attempts to prove that the arithmetic progressions \{d,2d,3d,\dots,nd\} are “approximately orthogonal” and to deduce the result from some kind of coordinate expansion, since such results would give bounds more like C\sqrt{N}.

The thought that goes into it also shows that we cannot hope to prove a more general conjecture that replaces the condition that x_i=\pm 1 by a looser condition that asks merely for x_i to be bounded away from 0 on average. Indeed, the sequence 1,-1,0,1,-1,0,1,-1,0,\dots has discrepancy at most 1 on all arithmetic progressions of the given form. However, I do not at this stage know of any counterexample to the following generalization that is designed to avoid this problem.

Problem. Let f:\mathbb{N}\rightarrow\mathbb{R}, let c>0 and suppose that for every d the lim inf of the partial averages n^{-1}\sum_{i=1}^n|f(id)| is greater than c. Can there be a constant C such that |\sum_{i=1}^nf(id)|\leq C for every n and d?

One might want to make f map to [-1,1] to rule out boring counterexamples. The motivation for generalizing the problem in this way would be to try to make it more amenable to analytic techniques, but Walters’s example will always be a serious obstacle.

If I think of more to say about this problem, then I’ll add it. But since this post is quite short, and since this is the last problem I wish to describe in detail for now, let me also discuss my current thoughts about the next Polymath project for this blog.

It has been interesting to find how much my perception of a problem changes when I blog about it. For example, the effort to describe my tentative approach to the P versus NP problem led to my understanding that it couldn’t work, and I now do not regard that as a suitable project (though I would very much like to tackle a more realistic complexity question at some point in the future). I have found the discussion about modelling mathematically the origin of life extremely interesting: I think we are still some way from formulating a problem that is clearly good, so I would still like to wait a few months at least before embarking on a project on that general theme. (However, it might well be that formulating a good problem would be a necessary part of any such project — in that respect it would differ in an interesting way from projects where the aim is to prove a clearly stated conjecture.) I found myself regrabbed by Littlewood’s conjecture as soon as I posted about that, with the result that I now know the answers to many of the problems that I thought (rightly, as it turns out) would be significantly easier than the main problem. The result is that the drawback with that project, that it is a notoriously hard problem, is shown in rather sharper relief. If I felt that that was clearly the most popular choice, then I would certainly be happy to go along with it, but I am not sure it is the most suitable.

The least risky choices, in the sense that they are the most similar in nature to DHJ, would be the polynomial-DHJ-related problem about graph differences or Erdős’s discrepancy problem.

I welcome any thoughts that people might have about this. Once there has been some discussion, I will put up a poll, though I must warn in advance that the process will not be fully democratic, since I will attach more than one vote’s worth of weight to my own enthusiasm and to my perception of whether there is likely to be a nucleus of people who are ready to think seriously about the given problem. One reason I am drawn to the polynomial-DHJ project is that I already know of people who are very interested in that and ready to think about it. If you are keen on one of the other problems in a similar way, then feel free to let me know. If you don’t want to declare yourself publicly on this blog, then you could always drop me an email.

An additional thought about the problem.

I forgot to mention a variant of the problem that occurred to me, which is to look at the complex case instead. That is, one could ask the following.

Problem. Let z_1,z_2,z_3,\dots be a sequence of complex numbers, each of modulus 1. Is it necessarily true that for every constant C>0 there exist d and n such that |z_d+z_{2d}+\dots+z_{nd}|\geq C?

And once one has asked that, one sees straight away that it is not really a complex version of the problem so much as an \mathbb{R}^2 version. So here is a more general variant.

Problem. Let k be a positive integer and let u_1,u_2,u_3,\dots be a sequence of unit vectors in \mathbb{R}^k. Is it necessarily true that for every C>0 there exist d and n such that \|u_d+u_{2d}+\dots+u_{nd}\|\geq C?

I feel it should be easier than the original problem. Obviously, the larger k is, the easier it is to find a sequence that works in \mathbb{R}^k. But if it is not possible to find such a sequence in any dimension, then I think that trying to prove it in a higher dimension ought to be easier because it forces you not to use special and ultimately irrelevant properties of \pm 1 sequences.

It suggests yet another problem, this one a generalization of the original problem but a specialization of the problem in \mathbb{R}^k.

Problem. Let K_1\cup\dots\cup K_r be a partition of \mathbb{N}, let x_1,x_2,x_3,\dots be a \pm 1 sequence, and let C>0. Must there exist some j\in\{1,2,\dots,r\} and positive integers d and n such that |\sum_{i\in P_{d,n}\cap K_j}x_i|\geq C, where P_{d,n}=\{d,2d,\dots,nd\}?

If the answer to this problem is no, then the answer to the previous problem is no as well, since one can take an orthonormal sequence e_1,\dots,e_r and define u_m to be x_me_j, where K_j is the colour class that contains m.

However, I hope that the answer is yes, as is suggested by the Walters example. There, as I commented earlier, you can define a sequence 1,-1,0,1,-1,0,1,-1,0,\dots that has bounded discrepancy, but if you want to make it a \pm 1 sequence then you are forced to assign \pm 1 values to multiples of 3 as well. In fact, that suggests a slight variant of the previous problem that is more explicitly Ramsey-like.

Problem. Let K_1\cup\dots\cup K_r be a partition of \mathbb{N}. Does there exist j\in\{1,2,\dots,r\} such that for every C>0 and every \pm 1 sequence x_1,x_2,x_3,\dots there exist d and n such that |\sum_{i\in P_{d,n}\cap K_j}x_i|\geq C?

The difference between this problem and the previous one is that j is not allowed to depend on the sequence (x_i). It may be that the two problems are equivalent: I haven’t checked.

Another way to make the problem easier by making it more difficult, so to speak, would be to try to prove that there is no infinite sequence with bounded discrepancy even if one restricts to arithmetic progressions with common difference d that is not divisible by any of 2,3,5,7,…,101 or something like that. Then it would be an easy matter to create extremely long finite sequences that did have bounded discrepancy, so in trying to prove that there couldn’t be an infinite sequence, one would not be distracted by small-number difficulties and would be forced to come up with more global arguments. Let me state this formally as yet another problem.

Problem. Let M be a positive integer, let C>0, and let x_1,x_2,x_3,\dots be a \pm 1 sequence. Must there exist d and n such that d is not divisible by any prime p\leq M and such that |x_d+x_{2d}+\dots+x_{nd}|\geq C?

I’m not quite sure whether this is the most interesting problem that results from restricting the possible d. Other possibilities are just to insist that d\geq M or to insist that d is divisible by a prime that is at least M.

About these ads

153 Responses to “Erdős’s discrepancy problem”

  1. Greg Martin Says:

    A Hungarian-umlaut “o” can be generated from Unicode: concatenate the symbols
    & # 3 3 7 ;
    and the browser should interpret it correctly. I’ll try here:
    Erdős

    More information than you want is at http://unicode.org/charts/ – this particular datum is near the bottom of page 14 at the “Latin Extended-A” link (and 337 is the decimal equivalent of the hexadecimal 0151 listed there).

  2. ioannis parissis Says:

    This is a very naive question. But: In the original formulation, the set of vertices is the infinite set \mathbb N and the edges are the arithmetic progressions \{d,2d,\ldots,nd\} for d\in \mathbb N. Then in the reformulation your set of vertices is [N] and the set of edges is the family of AP’s on [N]$? I’m getting a bit confused comparing to AP discrepancy and the Roth, Matousek, Spencer bound \sim N^\frac{1}{4}. Could you explain the differences?

    • gowers Says:

      The difference is that the APs are always of the special form \{d,2d,3d,\dots,nd\}. So it’s definitely not about normal AP discrepancy, but just about these very special ones that start at 0.

  3. Jason Dyer Says:

    Constructively, wouldn’t it work to take the Copeland–Erdős constant and for every digit that is =5 put -1 in the sequence? I don’t know the exact nature of their proof of normality but it likely could be modified for varying values of d.

    • Jason Dyer Says:

      Ok, that got HTML-mangled: for every digit less than 5 put -1 in the sequence, and for every digit 5 or greater put 1 in the sequence.

    • Jason Dyer Says:

      Yes, this works: all you need is a lemma that says that if a number m is normal for k digits (for all k greater than or equal to 1) then if you construct a new number r choosing every dth digit, the new number is normal (only needing for k=1). This is trivial because of the reduction in number of needed digits. Suppose given some d, r is not normal. But that means m is not normal when selecting d digits. Therefore we have a contradiction.

      Copeland and Erdős’s original proof
      actually showed any concatenated increasing sequence of integers will work, not just the primes. They were unable (and I believe it still open) to prove this for any base given the original numbers are written in a fixed base, but that’s irrelevant for our current problem.

    • gowers Says:

      The problem with your suggestion is that the discrepancy in a progression with common difference d will depend on d, whereas the problem looks for an upper bound that is uniform over everything.

    • Jason Dyer Says:

      You’re right. I would give near-zero hope to a constructive proof then.

  4. Gil Says:

    It is an appealing problem. Is there anything known about the analogous problem for linear subspaces of F_q^n. (Even when q=2.) We give every element of an n-dimensional space over a field with q elements a sign; is there a linear subspace with |sum of signs| >M.

    And about the analogous problem for combinatorial lines containing 0: I.e., is for every M there is d and n so that for every signs assined to [d]^n there is a combinatorial line containing the all 0 vector so that |sum of signs over the line|>M.

    (The second problem is stronger than the original one but maybe it is obviously false or knwon to be false.)

    • kristalcantwell Says:

      “It is an appealing problem. Is there anything known about the analogous problem for linear subspaces of F_q^n. (Even when q=2.) We give every element of an n-dimensional space over a field with q elements a sign; is there a linear subspace with |sum of signs| >M.”

      Wouldn’t that be a two coloring of the vector space? Then I think there is a Ramsey theorem that gives a monochromatic space for large enough n whose sum is the number of points in the space and that would be bigger than M.

      “And about the analogous problem for combinatorial lines containing 0: I.e., is for every M there is d and n so that for every signs assigned to [d]^n there is a combinatorial line containing the all 0 vector so that |sum of signs over the line|>M.”

      We could assign each point that is on a combinatorial line containing zero the following coloring if it is all zeros the color corresponding to one, if it has a nonzero coordinate its coordinates must be either zero or one particular value giving the the point the sign corresponding to that value plus one we get a correspondence from this problem to the problem on the integers for d+1. So if there is a working solution for all of the integers there will be a working solution for each d. If there is not a working solution for the integers and it breaks down at n we can assign n+1 to d. So I think this problem is equivalent to the problem on the integers.

    • gowers Says:

      I’m not sure I agree that the problem for combinatorial lines is equivalent. Also, Gil was talking about discrepancy for subspaces that contain 0. But I think you may be right that the Ramsey result (the Graham-Rothschild theorem?) is relevant here. For instance, in \mathbb{F}_2^n if we find a monochromatic affine subspace X, then if it contains 0 we are done. If it does not, then we can add one dimension and obtain a subspace Z that can be partitioned into X and a subspace Y that contains 0 and has the same cardinality as X. Then the discrepancy on at least one of Y and X\cup Y must be large.

      It feels to me as though a more complicated variant of this argument should work for \mathbb{F}_p^n as well, but I haven’t checked.

    • Kristal Cantwell Says:

      I think it is the Graham-Leeb-Rothschild theorem. Here is the corrected version of the paper in which it is proved: http://www.math.ucsd.edu/~sbutler/ron/87_10_categories.pdf.

    • Gil Kalai Says:

      If we replace an arithmetic progression “d, 2d, 3d, …, nd” by a two dimensional array {id+je: where i=1,2,…,n and j=1,2,…,n} is the questionit known? (easy? trivial?)

    • Gil Says:

      (sorry, it is trivial if we set e=1.)

    • gowers Says:

      I don’t know how relevant this observation is, but if one took a counterexample (x_i) to the original problem and defined a function on \mathbb{N}^2 by x(i,j)=x_ix_j, then the sum over any 2-dimensional array of the form \{(id,je):i\leq m, j\leq n\} would be bounded by the square of the bound that one had obtained in the 1D case. However, it’s not clear that this would enable one to find a counterexample for your question. (I haven’t yet worked out why it is trivial when e=1.)

    • Gil Says:

      Sorry, I was wrong to say it is trivial. If we set e=1 then (for some d) in this 2 dimensional AP we have by Szemeredi theorem a whole AP with the same sign. “Clearly” this should suffice I thought. (Similarly to Tims comment above about F_2.) But this is not clear at all. Anyway, the problem for high dimensional AP might be easier than the 1 dimensional problem.

    • Gil Kalai Says:

      the vectors in a combinatorial lines containing the zero vector have the 0 vector and some coordinates replaced by ’1′s ’2′s …’r’s. So if we give all 0-k vectors the sign (-1)^k the discrepency will be bounded by 1. So we need something more than combinatorial lines that will still somehow imply the conjecture for integers.

  5. Jonathan Vos Post Says:

    I hope it’s okay for me to respond to your frame story, rather than just the specific Math question posed.

    I very much admire your writing: “I have found the discussion about modelling mathematically the origin of life extremely interesting: I think we are still some way from formulating a problem that is clearly good, so I would still like to wait a few months at least before embarking on a project on that general theme. (However, it might well be that formulating a good problem would be a necessary part of any such project — in that respect it would differ in an interesting way from projects where the aim is to prove a clearly stated conjecture.)”

    In a selfish way, I’d have wanted a polymath on that to start soon. But I felt compelled to explain how very hard the problem is even to state correctly. I suspect that this comes from Math/Computer Science people having accepted at best “toy world” models of Biology.

    I might add (I’m not sure that this will come out correctly as a drawing through your web form) that the following diagram does NOT commute:

    Cell (Monk’s) ———————> Cell (Organism)
    | |
    | |
    V V
    Cellular Automata —————-> Biology in 2-D + time

    Upper Left to Upper right arrow: The word “cell” for the basic microstructure of life evolved via Robert Hooke coining the term “cell”, … as he described the microscopic structure of cork like a tiny, bare room or monk’s cell.

    John von Neumann and Stan Ulam create the lower left corner of the diagram, to (academically improperly) cite wikipedia: “Stanisław Ulam, while working at the Los Alamos National Laboratory in the 1940s, studied the growth of crystals, using a simple lattice network as his model. At the same time, John von Neumann, Ulam’s colleague at Los Alamos, was working on the problem of self-replicating systems. Von Neumann’s initial design was founded upon the notion of one robot building another robot. This design is known as the kinematic model. As he developed this design, von Neumann came to realize the great difficulty of building a self-replicating robot, and of the great cost in providing the robot with a ‘sea of parts’ from which to build its replicant. Ulam suggested that von Neumann develop his design around a mathematical abstraction, such as the one Ulam used to study crystal growth. Thus was born the first system of cellular automata. Like Ulam’s lattice network, von Neumann’s cellular automata are two-dimensional, with his self-replicator implemented algorithmically. The result was a universal copier and constructor working within a CA with a small neighborhood (only those cells that touch are neighbors; for von Neumann’s cellular automata, only orthogonal cells), and with 29 states per cell. Von Neumann gave an existence proof that a particular pattern would make endless copies of itself within the given cellular universe. This design is known as the tessellation model, and is called a von Neumann universal constructor. In the 1970s a two-state, two-dimensional cellular automaton named Game of Life became very widely known, particularly among the early computing community. Invented by John Conway, and popularized by Martin Gardner in a Scientific American article…”

    Unreached lower right: 4-D Biology. However, the popularity and abstract power of these conceal how un-biological they are. Ulam explained that his goal was to shrink the delta t and delta x, delta y discrete model with difference equations, to a continuous space-continuous time system of differential equations, which von Neumann felt that he alone was capable of solving.

    Naive CS folks ever since have hoped that the 2-D cellular automata (2-D lartice structure + simple rules + time) would give dynamics that yield insight into 3-D cells and organisms + time. But on this planet, cells are intrinsically 4-D and smarter than little boxes with a small number of rules.

    Latest support for this:

    Cells Move in Mysterious Ways, Experiments Reveal
    http://www.sciencedaily.com/releases/2009/12/091216151151.htm

    ScienceDaily (Dec. 17, 2009) — Scientists at Brown University and the California Institute of Technology have for the first time tracked how cells move in three dimensions by measuring the force exerted by them on their surroundings. The scientists’ experiments revealed that cells move in a push-pull fashion, such as probing at depth, redistributing weight at various points and coiling and elongating….

    What they found was cells move in intriguing ways. In one experiment, a cell is clearly shown operating in three dimensions by extending feelers into the gel, probing at depth, as if thrusting a leg downward in a pool. The Brown and Caltech scientists also found that as a cell moves, it engages in a host of push-pull actions: It redistributes its weight, it coils and elongates its body, and it varies the force with which it “grips,” or adheres, to a surface. Combined, the actions help the cell generate momentum and create a “rolling motion,” as Franck described it, that is more like walking than shuffling, as many scientists had previously characterized the movement.

    “The motion itself is in three dimensions,” Franck said.

    Franck’s lab plans to use the new force-measurement technique to examine how the movement of normal healthy cells differs from mutant or malignant ones. “That promises to give us greater insight in how cells’ behavior changes when they become diseased,” Franck said.

    David Tirrell and Guruswami Ravichandran, scientists at Caltech, contributed to the research. The National Science Foundation funded the work.

  6. Jonathan Vos Post Says:

    Cell (Monk’s) ———————> Cell (Organism)
    |……………………………………. |
    |……………………………………. |
    V…………………………………… V
    Cellular Automata —————-> Biology in 2-D + time

  7. manuel Says:

    Here is a reference a remember for this problem, where a particular case is solved.
    Mathias, A. R. D. On a conjecture of Erdős and Čudakov.
    “Combinatorics, geometry and probability” (Cambridge, 1993)

    • gowers Says:

      Would it be possible for you to tell us what the main result of that paper is?

      Ah, I’ve found it now. He proves that if all the sums x_d+x_{2d}+\dots+x_{nd} are at most 1, then they cannot be bounded below. The argument is very short and can be found online here.

    • Mark Bennet Says:

      The maximal sequence for C=1 has length 11, and Mathias builds his counterexample on a contradiction with a sequence of length 12.

      It looks like one of the building blocks for any proof that a sequence with discrepancy bounded above by 2 then it cannot be bounded below would have to use a maximal sequence for C=2 – this would get the total for some subsequence down to -3. There aren’t the same structural features to work with, but other features may emerge which could be used in a proof that this implies arbitrarily small totals.

  8. Alec Edgington Says:

    If it’s still an open question whether C = 2 suffices, that would surely be a good place to start.

    For what it’s worth, this sequence (x_i : i = 0 \ldots 40) achieves C = 2:

    + - + + - + - - + - - - + + + - - + + + - - + - - + - + - + + - + + - - - + - + +

    (Note that the index starts at zero.)

    That’s the longest C = 2 sequence my simple program has found. But I bet there’s scope for more computational trickery.

    Of course, even being able to find arbitrarily long finite sequences that achieve C = 2 would not (as far as I can see) guarantee that any infinite sequence does.

    • gowers Says:

      I think it does guarantee it with the usual diagonal/compactness argument: you just take a sequence of finite sequences with lengths tending to infinity and pass to a subsequence that converges pointwise. Unless I’m missing something, that will give you an infinite sequence with a C that’s at least as small as the largest that you get for the finite sequences.

    • Alec Edgington Says:

      Ah, yes indeed. Thanks.

    • gowers Says:

      By the way, I meant to say that your example was interesting. It made me wonder whether it was better than Walters’s example (bearing in mind that with Walters’s example there is some flexibility: for any power of 3 you can reverse all multiples of that power, and you can compose operations of that type). I haven’t had time to check.

  9. Future polymath project « Euclidean Ramsey Theory Says:

    [...] polymath project By kristalcantwell Here is a link to a post on a possible polymath project. It is Erdős’s discrepancy [...]

  10. Alec Edgington Says:

    (Actually you can append a 1 or -1 to the end of that sequence and it still works. I haven’t found any going beyond 42 though, and I suspect that may be the limit.)

    It occurred to me that the problem could be framed in terms of Banach spaces. Because the points with coordinates \pm 1 are the extreme points of the \ell_\infty unit ball, the question is whether the map

    T : \ell_\infty^{\mathbb{N}^+} \rightarrow \ell_\infty^\mathcal{A}

    defined by

    T(x)(A) = \sum_{i \in A} x_i

    is bounded.

  11. gowers Says:

    Let me quickly try to optimize the Walters example by choosing more carefully the values at powers of 3. To start with I’ll go for x_{3^k}=(-1)^k. If we insist that the function is multiplicative and depends on the last non-zero digit, and that x_2=-1, then that determines everything. It starts
    1+, 2-, 3-, 4+, 5-, 6+, 7+, 8-, 9+, 10+, 11-, 12-, 13+, 14-, 15+, 16+, 17-, 18-, 19+, 20-, 21-, 22+, 23-, 24+, 25+, 26-, 27-. The partial sums so far are bounded between -1 and 2 and the last one is equal to -1. If we therefore repeat the whole sequence then the partial sums between 28 and 53 will be between -2 and 1 with the last one equal to -1. We then take the value of 54 to be +, so we get a partial sum of 0. We then repeat the sequence yet again, but at 81 we have a new power of 3, the value at which we will set to be +. So the partial sums between 55 and 80 will be the same as those between 1 and 26, and the partial sum at 81 will be 1 instead of -1. We should soon get into trouble, since now we are starting all over again, but from 1 instead of 0. Indeed, since the partial sum at 10 is 2, the partial sum at 91 will be 3. But I think we’ve managed to get all the way up to 90 with all partial sums between -2 and 2, and since the sequence is multiplicative, that means that the sums along all APs of the special form are between -2 and 2 as well.

    One might be able to go slightly further by artificially changing the value at 91. In fact, since the partial sums at 7 and 13 are both 1, and therefore not critical, this is definitely possible. Maybe a computer search could take over from here: start with the example that gets you all the way up to 90 and then search for ways of tinkering with it to extend it a bit further. But I suspect it won’t be possible to go all that much further. The fact that one can go as far as this, however, is a good indication of why the problem could be quite hard.

  12. gowers Says:

    Actually, it occurs to me that a better way of tinkering would be to choose the value at 81 to be – instead of +. That way, the next 27 partial sums would all be between -2 and 1 (because the first 27 were between -1 and 2, but now the partial sum at 81 is -1). If we do this, then the partial sum at 81+27=108 will be -2, but we could perhaps try to fix that by choosing the value at 108 to be + instead of -. At this point, we are departing from multiplicativity enough to make various checks necessary, which I can’t face doing right now. Actually, I should also check that taking the value at 81 to be – doesn’t mess things up too much. If the partial sum up to 27 was -1, then the sum over all multiples of 3 up to 81 used to be 1 and we’ve now changed it to -1. But that will cause some problems because we rather needed the partial sum at 27 to be -1, so from the point of view of multiples of 3 we rather needed the partial sum at 81 to be 3. The problem turns out to arise when we get to 111.

    I think, but haven’t quite checked hard enough to be 100% certain, that this gets us up to 110 though.

  13. gowers Says:

    All this suggests that a rather nice preliminary goal for a project of this kind would be to prove that the sums along special APs cannot all lie between -2 and 2.

  14. Ernie Croot Says:

    Of the polymath projects you have listed that you want to try, I think this one and the one on the origins of life are probably the best from the point of view of getting loads of participants (alas, I will not be among them, as I just have too many things to do — and, I would like to think about the Polymath4 project some more). However, this one has the advantage over the origins of life in that it begins with a well-defined mathematical problem.

    One more thing: I wanted to point out that your Theorem above is reminiscent of some beautiful identities of Peter Borwein and Stephen Choi:

    http://www.cecm.sfu.ca/personal/pborwein/PAPERS/P204.pdf

    Their identity is as follows: Let \lambda_3(n) = (-1)^{\Omega_3(n)}, where \Omega_3(n) is the number of distinct prime factors of n that are -1 mod 3, counting multiplicity (Tim, this is your function f(n), I believe). Then, \lambda_3(n) is, of course, completely multiplicative. And what Borwein and Choi prove is the astonishing fact that

    \sum_{j \leq n} \lambda_3(j) = d_3(n),

    where d_3(n) is the number of 1‘s in the base-3 expansion of n. I say `astonishing’ here because one wouldn’t expect there to be such a strange relationship like that between multiplicative functions and partial sums.

  15. Alec Edgington Says:

    This set me wondering about how far a completely multiplicative \pm 1-valued sequence (x_n) defined on the positive integers could continue with its partial sums lying between \pm 2. Such a sequence is determined by its values at primes. This choice gets you as far as 214:

    x_p = -1 \Leftrightarrow p \in \{ 2, 3, 5, 11, 17, 23, 29, 41, 47, 53, 59, 71, 79, 83, 89, 101, 113, 137, 139, 151, 167, 173, 179, 197, 199\}

    • gowers Says:

      That’s quite something. If we take that sequence and use it to define a function f:\{1,2,\dots,210\}\rightarrow\{-1,1\}, then we can define a function F on all the integers by taking F(n) to be f evaluated at the last non-zero digit when n is written in base 211. This will again be a completely multiplicative function and if the sum of your function up to 210 is zero, then the same argument that proves that the growth rate in Walters’s example is at most \log_3n is enough to give a bound here of 2\log_{211}n, which is smaller than \log_3n by a factor of approximately 2.4. (If the sum up to 210 is not 0, then one can try other primes such as 197 or 199.) Again, this illustrates why the problem is hard: if there is no sequence with bounded discrepancy, it is difficult to use small examples to give a hint about how to prove this. Also, there doesn’t seem to be any chance of finding the best possible sequence.

    • Alec Edgington Says:

      By the way, I’ve checked that the furthest you can possibly get with a completely multiplicative sequence having partial sums between \pm 2 is 246. There are exactly five hundred sequences that get that far, but they all trip over at 247!

    • gowers Says:

      For what it’s worth, if you let x_n be 1 if the last non-zero digit of n is 1 or 4 base 5 and -1 if it is 2 or 3, then you can improve the growth function from \log_3n to \log_5n. But that’s not as good as 2\log_{211}n.

    • Alec Edgington Says:

      Of the aforementioned 500 sequences, it turns out that 250 have x_1 + x_2 + \ldots + x_{240} = 0. So with your construction above, these would give a bound of 2 \log_{240} n.

    • Alec Edgington Says:

      Sorry, that should be 2 \log_{241} n.

  16. TonyG Says:

    Brute-force search reveals 359 as the length of the longest sequence having partial sums between +/- 2. Here is an example:
    + + – + – - + – + + – + – - + + – - + – + + – + – - – - + – + +
    - + + – + – + – - + – + + + – - – + + – + + – + – - + + – - – +
    + – + – + – + – - + – + + + – - + – - – + + – + – + + – + – - -
    + + + + – + – + – - + + – - – - + – + + – + – + + – + – - + + -
    - + + – + – - – + + – + + + – + – - – + + – + – - – + – + + + +
    - – - – + + – + + – + – - + + + – + – - + – + – - + + + – + – -
    + – - + – + – - + – + + + – - + – - + + – - + + – + + – - – + -
    + + + – - + + + – - + – - + + + – - + – - + + – - + – + + – - +
    + – - + – + + – - + + + – - + – + – + + – - – - + + – - + + – +
    + – - + + – - + – + – - + + + – + – - + + + – - + – + + – + – -
    + – + – - + – + – + + – + – - + + – - + – + + + – - + – - + + +
    - – + – - + +

    And 11 is the length of the longest sequence having partial sums between +/- 1. So…any sequence of length (2n)!/2 has a partial sum of absolute value >= n, if n is 1, 2, or 3. I sense a pattern!

    • gowers Says:

      That’s interesting. It leaves me curious to know how far it is possible to go with a brute-force search. By brute force you obviously don’t mean just checking all 2^n sequences separately. I presume the point is that most sequences are ruled out fairly early on so you gain a lot by doing a depth-first search.

      Anyhow, the question I now find myself wanting an answer to is whether there is a way of mathematically constructing an example that gives 359, or more generally (2n)!/2. The fact that there are several examples (if I read between the lines correctly) suggests that a linear-algebraic approach might be worth thinking about.

    • TonyG Says:

      >It leaves me curious to know how far it is possible to go with a brute-force search.
      Well, the next one up is 8!/2 = 2160, which is out of reach to my current program, by several orders of magnitude.

      >The fact that there are several examples…
      Yes, if only because the value of x_p for any prime p more than half-way through the sequence has no effect on any of the partial sums.

    • Miguel Lacruz Says:

      You meant to say that 8!/2=20160.

    • gowers Says:

      Looking again at your example, it seems to be + at 2,4,6,8, which would give a sum of 4. Also, the partial sums of your sequence seem to reach 5 pretty quickly. And going further I seem to be able to get up to 20, so in general the sequence is fairly biased towards +. Am I misunderstanding something?

    • gowers Says:

      Ah, it was WordPress’s fault. I’ve put spaces between your pluses and minuses and now they’re all showing up …

    • gowers Says:

      Going back to the question of how your brute-force algorithm worked, I note that not only can the sequence take any value at a prime p such that 2p is greater than 359, but it can also do so at any 2p such that 3p is greater than 359 or at any 3p such that 4p is greater than 359 and x_p+x_2p=0, or at any 4p such that the sequence hasn’t already messed up and 5p is greater than 359.

      The number of such places is fairly large, and makes me wonder whether two raised to that power is larger than the number of operations a computer can reasonably be asked to perform. On the other hand, one can tell the computer in advance not to bother looking at all those possibilities and just to set them all to +. Did you do something like that?

      Actually, on further reflection I realize that it’s not true that the value at a prime greater than 180 can be set arbitrarily, since we care about the partial sums of the entire sequence. So if these partial sums hit both the values 2 and -2 fairly late on, then the values at all earlier primes cannot be changed (or at least cannot be changed without fairly serious consequences later). So most of this comment is nonsense.

    • Alec Edgington Says:

      If the pattern is that simple, would an inductive proof be out of the question? It would be enough to show that, given a sequence of length N (2C+1) (2C + 2) whose sums are bounded by C on all its special APs, there must be a subsequence of length N whose sums are bounded by C-1 on all its special APs…

    • Alec Edgington Says:

      (Just to clarify: the ‘special APs’ of the subsequence are defined relative to its own numbering. If the subsequence happened also to be a special AP of the original sequence, then one could use the same numbering, but that’s perhaps too much to ask.)

    • gowers Says:

      The problem with that approach, I think, is that pretty well any sequence of length N will be a subsequence of the sequence of length N(2C+1)(2C+2), since the latter sequence will be pretty densely packed with 1s and -1s. So I think you need to put some conditions on the subsequence, which in any case seems a good idea because one would want somehow to use the property that the main sequence had.

    • Alec Edgington Says:

      Yes, I didn’t want to be too specific, but was thinking, for example, of chopping the long sequence into (2C+1) (2C+2) contiguous blocks, or considering subsequences consisting of the first N multiples of some n \leq (2C+1) (2C+2). (But I haven’t checked whether either of these has a chance.)

    • gowers Says:

      Yes, something like that would make more sense. It might be worth trawling through TonyG’s example to see what its restrictions to APs look like. If we could find a way of generating sequences of length 359 that did not rely on brute force, it might give useful clues.

  17. gowers Says:

    I’m going to start a new comment, though in a way it is still a reply to the previous one. I’m in two minds about the conjecture that there is an exact bound of (2n)!/2 for the point where you are first forced to get a sum of modulus n. On the one hand, I don’t feel I can completely dismiss a sequence that starts 1, 12, 360. But on the other hand, the fact that the problem has a significantly number-theoretic character, by which I mean that the freedom you have to assign a value to an integer m depends a lot on what the factors of m are, makes it seem unlikely that a formula such as (2n)!/2 would be correct.

    At the moment, I incline towards the view that the sequence 1,12, 360 is probably a coincidence (though perhaps it might be worth looking in Sloane’s database to see whether there are any more number-theoretic sequences that start like that). However, it still seems worth investigating. If it did turn out to be correct, then what could a proof conceivably look like? Is there, for instance, a clever way of associating with each term in a sequence of pluses and minuses a permutation of {1,2,…,2n} in such a way that if the sums over all APs lie between -n and n, then none of these permutations is equal to any other or equal to the reverse of any other (by which I mean what you get when you write the other backwards)? If one could do that with the additional property that none of the permutations in question was the identity or the reverse of the identity then one would be done. But I don’t hold out much hope for this approach: how will the sum of the sequence over all multiples of 17 have anything to do with whether two of the permutations are equivalent?

    • TonyG Says:

      Oh dear. After generating 169,137 sequences of length 359, my latest program found a sequence of length 360:
      + + – + – - + – + + – + – - + + – - + – + + – + – - – - + -
      + + – + + – + – + – - + – + + + – - – + + – + + – + – - – +
      + – - + + – + – + – + – - + – + + + – - + – - – + + – + – +
      + – + – - – + + + + – + – + – - + + – - – - + – + + – + – +
      + – + – - + + – - + + – + – - – + + – + + – - + – + – + + -
      + – - – + + + + – + – - + – + + – + – - + – - – + + + + – -
      + – + – - + + + – + – - + – - + + + – - – - + + + + – - – -
      + + + – + + – + – - + – - + + + – - – + + – + – - – + + + +
      - – + – - + + + – + – - + – - – + + – + + – + – - + – + + -
      - + + – + + – - – - + + + – + + – + – + – + – - + + – - – -
      + + + – - – + + + – + – - + + + – + – + – - + – - – + + – +
      + – + – + – + + – - – + + + – - – + – + + – + – + – + + – -
      There goes my credibility. Sorry if I wasted anybody’s time.

    • Alec Edgington Says:

      I’d also be interested to know how your algorithm works. If you simply keep track of all sequences that work up to n, and let the list grow and shrink as you try appending successive values, the size quickly gets beyond the capacity of my computer (at n = 57 there are 16\,406\,539 sequences). But if you have a specific target, especially one with lots of factors, perhaps you can order the n differently. Suppose you’re interested in sequences on \{ 1, \ldots, 360 \}; you can start by considering all possible values of the x_n where 120 \vert n — which enter into the most APs — and then move on to the x_n where 30 \vert n, then those where 6 \vert n, and finally the rest. But from your last comment (the fact that you went via 359) it sounds like this isn’t what you’re doing.

    • Alec Edgington Says:

      On second thoughts, that woudn’t help much: you’d probably still have millions of sequences (however many there are when you get to 60) to consider at the last stage.

    • TonyG Says:

      The program is very simple — 79 lines of C code. It does nothing clever; it simply maintains, for each r and each d | r, the sum x_d + x_2d + … x_r. This is enough to make the depth-first search fast. It took several hours to find the sequence of length 360. There may be even longer sequences; I will work on this. I can post the current code here if anybody is interested.
      The previous version had a faulty termination condition, which is why I thought 359 was the limit.

    • Alec Edgington Says:

      I see, thanks. So I presume the 169\,137 sequences of length 359 were all dead-ends that it backtracked from before eventually finding the 360 sequence? That’s quite a cull!

  18. Mark Bennet Says:

    I reckon that adding -+++-+-+—++- to the end of Alec’s list gets us up to 374.

    The method I’m using is a semi-manual one – I did brute force search up to 120 manually before inputting and checking Alec’s details.

    Basically it is an Excel spreadsheet in which the dth row of the main table keeps track of the sums of the progression of difference d. I’m able to check the next digit =1 in the nth column, and if this doesn’t work, use -1, and if this doesn’t work backtrack. The table helps by keeping track of which progressions are critical at any stage. With a bit of intelligent design (or a better programming capacity) it would be possible to improve the search to flag entries which are forced, and conflicts in forced entries, which would avoid some time-consuming backtracking.

    and 361ff = -+-++-+—++-+–+++–++ gets to 384

    Of course this method isn’t particularly scalable

    • gowers Says:

      One thing I don’t understand here — if TonyG (I presume you actually meant him rather than Alec) could get to 360 and what you say is correct, and he is using a depth-first search, then why didn’t he come up with these extensions? I don’t know whether it’s you or Tony who should answer this question.

    • Alec Edgington Says:

      Looks like WordPress is up to its tricks again. I assume that it converts double minus signs to en dashes and triple minuses to em dashes.
      So, inserting spaces, your first sequence should be:

      - + + + – + – + – - – + + -

      But I can’t work out what your second sequence should be.

    • Anonymous Says:

      >…why didn’t he come up with these extensions?

      I still thought that 359 was the limit, so I had the program print a message and halt if it got to 360.

  19. Mark Bennet Says:

    Yes I did mean extending Tony’s sequence, not Alec’s.

    Sorry about not understanding what wordpress does. I think the following (which has spaces) does up to 389 – it differs from Tony’s from position 355.

    + + – + – - + – + + – + – - + + – - + – + + – + – - – - + – + + – + + – + – + – - + – + + + – - – + + – + + – + – - – + + – - + + – + – + – + – - + – + + + – - + – - – + + – + – + + – + – - – + + + + – + – + – - + + – - – - + – + + – + – + + – + – - + + – - + + – + – - – + + – + + – - + – + – + + – + – - – + + + + – + – - + – + + – + – - + – - – + + + + – - + – + – - + + + – + – - + – - + + + – - – - + + + + – - – - + + + – + + – + – - + – - + + + – - – + + – + – - – + + + + – - + – - + + + – + – - + – - – + + – + + – + – - + – + + – - + + – + + – - – - + + + – + + – + – + – + – - + + – - – - + + + – - – + + + – + – - + + + – + – + – - + – - – + + – + + – + – + – + + – - – + + + – - – + – + + – + – - + + – + – - + – - + + – + – - + + – + + – + + – + – - + + – - – - +

    • Mark Bennet Says:

      Sorry, I’m also messing up the threading by replying to the wrong thing.
      Obviously, using my crude search I’m not trying to generate every sequence, so I’m not checking hundreds of thousands, just trying to find a sequence which works.
      One disadvantage of this is that I’m not searching for sum=0 in useful places.

    • Alec Edgington Says:

      I’m just running a C program now that will go as far as 1024 if necessary. It’s reached your 389 and is thinking hard. The fact that multiples of 30 seem to be big sticking points for the search may be a useful clue when it comes to proving bounds by methods other than brute force.

  20. gowers Says:

    The following small question occurred to me as possibly a good candidate for the first thing to attempt to prove theoretically, because there is a chance that it is easier than the problem itself. In Adrian Mathias’s paper, he proves that if the sums have an upper bound of 1, then they cannot be bounded below. Might it be possible to prove that if the sums cannot be bounded between -2 and 2, then if they are bounded above by 2, then they cannot be bounded below? That is, I’m asking for a conditional result: assuming that the sums must eventually get above 2 in absolute value, prove the one-sided result that if they are bounded above by 2 then they must get arbitrarily large on the negative side. (Even proving that they must get down to -4 would be interesting, of course.)

    In response to Alec’s 10.35 remark, I’d be interested to know whether the sticking points tend to be arithmetic progressions with small common difference (up to 6), as this multiples-of-30 phenomenon might suggest. If so, one might be able to speed up the search by trying to take care of these small-difference APs first. Of course, one would need to find a “space” of examples that work for the small-difference APs, so that it was then possible to make further adjustments to deal with larger ones.

    • H Matthew Says:

      This oddly was the same thought I had.

      For any sequence of length n there are 2^n potential solutions. However, 1/2 of the solutions are identical to the other half with the exception of a sign change (eg there are ( 2^n/2 solutions and 2^n/2 conjugate solutions). I would agree that Mathias didn’t check his conjugate solution, because it still appears to meet his other constraints (since he only specified an upper bound).

      Now the only question is what does the conjugate solution represent?

      I would think that any conjugate solution to a preferred solution represents a mapping to the negative integers. So the total set of 2^n solutions appears to be mapableto the full set of integers and not just the positive integers (since any solution I map to the positive integers automatically has a conjugate that can be mapped to the negative integers).

      It appears though that any test solution that fails to meet an upper bound will automatically have a conjugate that meets the upper bound, but fails to meet any equivalent magnitude negative lower bound. I have playing with the idea of what it might mean to define a superposition of solutions and conjugate solutions, but I am not sure if that would be useful in solving the conjecture.

    • Mark Bennet Says:

      On sticking points, I’d make the following observations about a depth first style or greedy search (just see how fay you can go).

      There is never a problem choosing a value at any prime p, because this is only in the progressions with difference 1 or p, and the p progression doesn’t have enough elements yet to cause a problem. The primes can be used (with some constraints) to sort out the values in difference 1.

      Similarly 2p only has a problem in difference 1 or 2 (not p) and this is generally easy to fix. However the choice of p and 2p can force a value at 3p [and the same is true for any 'n' of course] – so it is possible to get a fair few forced values in the progression of difference 3.

      4p causes a problem only at difference 1 or 2 or 4 – but as time goes on, other values in difference 4 get forced, and it is these, in part, which seem to be causing problems making progress beyond 389, combined with values at 330/340/350/360 in the difference 10 progression.

      Some other specific issues:

      With the values we have so far 390 is forced to be -1 by values at 130/260.

      As an example of a blockage with higher primes 374 is forced to be -1 in the 77 progression, and this has caused problems because 341 and 352 are easily forced to be -1 too, and this can leave problems with the 11 progression if 363 is not +1

      These blockages with higher primes require more backtracking to fix, and it is likely to be efficient programming to identify constraints in the progressions with larger differences, and cure the smaller ones by search. It looks as though the small differences are causing the problem, but in fact it seems to me that it is the constraints imposed by the larger differences which are more significant to a search.

      As primes become sparser, the proportion of easy values decreases, and there is less flexibility to fix any problems.

  21. gowers Says:

    Here’s an idea for a completely different way of searching for a solution. The “conventional” way would be to extend a given solution number by number and do a depth-first search. One can think of that as doing a search with respect to the standard basis of functions (from {1,2,…,n} to {-1,1}). But a much more natural “basis” for this problem consists of characteristic functions of arithmetic progressions. That’s partly because these are closely related to the statement of the problem, and partly because if you pick out a subprogression of {1,2,…,n} and change all the entries in that subprogression, you don’t have too dramatic an effect on any of the sums, unless the sequence you had was a very bad one to start with, in which case you may well improve it substantially.

    How could one use this to search for good sequences? The idea would be to start with the all-ones sequence, and then add characteristic functions of pretty long APs (by which I mean add them mod 2). There could be a scoring system for how good/bad a given sequence is, and one could look out for APs that improve the sequence as much as possible. For instance, one might start by adding the AP that consists of all odd numbers, since then one would get the sequence – + – + – + … (maybe I should say that I multiply pointwise by the function that is -1 on the AP and 1 everywhere else, but that is equivalent to addition mod 2) which is now beautifully behaved on all APs with odd common difference.

    If one only ever used APs of the form {p,2p,3p,…} then one would always end up with a multiplicative sequence. So the idea would be to allow just a bit more flexibility than that. Perhaps by doing something like this one could end up with a sequence that did pretty well, and one could then go back to a more conventional depth-first search for ways to adjust the sequence to keep it within bounds everywhere. Most of the time one would fail, but the hope would be that this procedure would give one a good enough probability of success for examples to appear from time to time.

    One thing I want to think about is this. If one just randomly adds together a few fairly long APs, will the resulting growth rate be logarithmic? One would have to think a bit about what the precise probability distribution should be, but I think there is the potential here to give a large source of examples that achieve logarithmic growth. That feels to me to be important as a way of coming to grips with the problem — one wants to “understand the enemy”, so to speak. This, incidentally, is another reason that I think looking at the standard basis is not the right thing to do, since a purely random sequence will give you a growth rate of around \sqrt{n} even if you just look at the partial sums x_1+\dots+x_m.

    Having said all that, the numbers we now have suggest to me that the true growth rate is probably slower than logarithmic. So another interesting initial target would be to prove precisely that — in other words, to attempt to construct an example of a sequence where the AP-sums are bounded between -n and n and the length of the sequence is something like n^n rather than C^n.

    • Alec Edgington Says:

      That is a very interesting idea.

      One idea it gave me straightaway was to initialize the search with a multiplicative sequence of length 246 (instead of with the singleton 1). When I tried this and ran the program, it very quickly reached 599. (The old program was still running and still stuck at 389.)

    • gowers Says:

      That’s pretty extraordinary — I would have expected the longest sequence with C=2 to be much shorter than that.

      I think it would be very interesting to take some of these examples and see whether they can be expressed as a mod-2 sum of not too many characteristic functions of APs. Or perhaps they can be mostly expressed like that, but with a few adjustments. However, this may not be an easy computational problem, since the somewhat similar problem of finding a point in a given subspace of F_2^n that is close to a point that’s given to you is in general hard, if I remember correctly. It’s also an important problem in coding theory, for obvious reasons.

    • Alec Edgington Says:

      The idea of searching over an alternative \mathbb{F}_2-basis suggests a transformation of the original problem, in terms of the coordinates of the original sequence in the new basis. If we take the set of APs of the form \{ d, 2d, 3d, \ldots \} as our basis — as seems rather natural — then the question is whether there is a \pm 1-valued sequence (x_n) and a finite C such that for all m, k \geq 1,

      \lvert \sum_{1 \leq r \leq k} \prod_{d \vert mr} x_d \rvert \leq C.

      This doesn’t look particularly friendly, however, as in general not many of the terms factor out of the sum.

      For the record, here is the longest sequence I’ve found so far, of length 669. (This is from the search that started with the long multiplicative sequence. It’s been stuck here a long time.)

      + – - + – + – - + + + – - + + + – - + – + – - + + + – - + – + – - + + + – - + + – - – + – + – - + – + – + + – + – - + + + – - + + + – - + – - – + + – + – - + – + + – + + + – - – + + – - + – + – - + + – - + + – - + – + + + – - + + + – - + – + – + + – + – - + – + – - + + + – - – + + + – + – - – - + + + – - + – + + – - + + – - – + + – - + – + – + + – + – + + – + – - + + + – - + + – - – + – + – - + – + + – + + – - – + + – + + – + + – - – - + – + + + + – - – - + – + + + + – - – + + – - + – - + – + + + – - + – + – - + + + – - + – + + – + – + – - + + + – - + – + + – - – + + – + + – + – + – + – - – - + – + + + + – - – + + – - + + + – - + – + – - + – + + + – - + – - + + + – - – + + – - + + – + – - + + – + + – - – - + + + – - + – + + – - – + – + + + + – - – - + + – + + – - + – + + – - + – + + – + + – - + – - + – - + – + + – + + – + – + – + + – + – + – - + + – - + – + + – - – - + + – + + – - + + – + – - + – + + – + + – + – - – + + – + – - + – - + – + + – + + – - + + + – - + + – - + – - + + + – - – + – - + – + + – - + – + + – + + – - + + + – - + + – - + – + + – - + – + + – + – - – + – - + + – + – + + – - – + + – - + + + – + – - – + + – + + + – - + – + + – - – + – + + + – - + + – - + – - + + – + – + – - + + – - + – + + – + + – - + – - + – + + – - + – + + – + + – - + – - + + + – - – + – + + – + + – - + – + + – + – - – + – - + – +

    • Mark Bennet Says:

      I think the reason that this calculation is hanging is that you quite likely have to backtrack before 536 to have a chance of getting to 670.

    • Alec Edgington Says:

      The problem at 670 is that x_{10} + x_{20} + \ldots + x_{660} = 2 while x_{134} + x_{268} + x_{402} + x_{536} = -2. After reaching 670 the program quickly backtracks to well below 536. After I left it running overnight it had got back to 239. It seems that the multiplicativity of the initial sequence up to 246 gives such a lot of flexibility that it is difficult to rule out numbers in the upper part of that range.

      I’ve tried starting with a few different multiplicative sequences of length 246, and they have all got stuck at 669. At that point they have all had identical sequences (x_{10}, x_{20}, \ldots, x_{660}) = (1, -1, \ldots, 1) and (x_{134}, x_{268}, x_{402}, x_{536}) = (1, -1, -1, -1). (I’m assuming that x_1 = 1.) So it did occur to me that the values on these two APs may be forced, and hence that 670 is the limit — but the evidence is pretty scanty. It’s interesting to note that the 500 multiplicative sequences of length 246 all agree at primes up to and including 67.

    • Omer Angel Says:

      A way to speed up brute force searched (perhaps you’re already implementing this) is to first optimize the order at which you fix entries. For example, there is little point in selecting $x_p$ for a prime p, before 2p is also reached.

      The optimal order may well depend on the length of the sequence one is aiming for. A good order will be one where the sums along many arithmetic progressions are determined as soon as possible.

    • Alec Edgington Says:

      Yes, I had wondered about trying to optimize the order in which the values are tried when searching. It isn’t quite so simple, though, because the value at p does determine a lot of AP sums (those of difference 1 passing through p). I think if you want the maximum number of AP sums to be determined with n numbers, the optimum choice does turn out to be \{ 1, 2, \ldots, n\}. (This determines about n + \frac{n}{2} + \frac{n}{3} + \ldots + 1, or about n \log n, APs.)

      On the other hand, it may be worth changing the order if one is particularly interested in certain APs. For example, I’m interested in whether the values at 10k and 134k are forced for a sequence of length 669 with C=2. So maybe I should be setting those values first in order to eliminate impossible sequences as early as possible.

      On that topic, I’m currently running a version of the program that checks those values whenever it finds a sequence of length 669. By the time it had tracked back to 239 it had found 39\,246\,912 such sequences, all of which agreed at those points. This lends some support to the idea that a proof could focus on them.

    • Alec Edgington Says:

      I have just belatedly noticed a symmetry that could save a great deal of wasted search time. If you have a sequence (x_n) that works (with C=2) up to N, and you have two primes p < q both greater than N/3 (so that the only relevant APs that they enter into are those with difference one), and if x_p and x_q have opposite sign, and the partial sums of the sequence from p to q are all the same sign as x_p or zero, then you can negate both x_p and x_q and the sequence will still work.

      That may sound like a lot of conditions, but I'm pretty sure this observation would reduce the time needed for a complete search by many factors of two. I wonder if there are other symmetries of this nature to be found.

    • Alec Edgington Says:

      There is at least one other symmetry of this kind, related to the first. If p < q are primes between N/3 and N/2, and x_{2p} = -x_{2q}, with all of the partial sums of APs of difference 1 or 2 between 2p and 2q having the sign of x_{2p} or zero, then the sequence obtained by swapping x_{2p} and x_{2q} works up to N if and only if the original sequence does.

      This is less useful than the first, but does allow some further economies in searching.

  22. H Matthew Says:

    A couple of probably obvious observations, but ones that haven’t been stated.

    We can define an inner product for the string of -1′s and 1′s.
    The inner product will be equal to the length of the string n (eg x * x = n).
    The arithmetic progressions are analogous to samples of the original string.
    If the sample interval is d, then then the inner product of the sample is n/d.
    This implies a decreasing upper bound on the sum of the samples as the sample interval d increases.
    This implies that there is a set of constants C, n, d that can satisfy the problem, but C isn’t arbitrary.

    • Mark Bennet Says:

      Some more trivial observations.

      If f(r) has bounded discrepancy for all r , then so does g(r) = f(nr).
      The idea of a completely multiplicative function gives g(r)=f(r) or g(r)=-f(r), but within any viable f there will be concealed a family of related functions. It ought to be possible to identify some closure properties for such a family. Is it possible that such a family is infinite?

      If there is a viable f, we also have f(2r), f(4r), f(8r) etc
      If we have f(r) therefore, we can create a trial function g(2r) = f(r), and then have to fill in the odd values of g (this process could be called expansion).

      Parity considerations suggest that the sum along any AP up to 2r+1 will be odd. If C is odd, 3 for example, and the sequence is good up to 2r, it can be completed to 2r+1 (with either of ±1).

      There are as many APs with odd d as with even d, so we have half the values to play with and half the APs to control (the even ones are all done by f). But there are constraints at the even positions in the odd APs, which are already fixed.

  23. gowers Says:

    Another idea occurred to me for a method that might produce a faster search for examples, but the people who’ve actually done some searching will have a better intuition than I have for whether it would actually work.

    The idea is based on the thought, which has already been expressed above, that multiples of 30 are particularly likely to be problematic because they have lots of small factors, and hence lots of potentially conflicting constraints. To combat this, one could perhaps define a “block” to be a \pm 1-sequence x_1,\dots,x_{30} of length 30 with the property that all AP sums are between -2 and 2, and furthermore x_1+x_2+\dots+x_{30}=x_3+x_6+\dots+x_{30}=x_5+x_{10}+\dots+x_{30}=0. One could then make a big list of blocks (of which I imagine there are quite a lot), noting various properties they have, such as whether the sum x_2+x_4+\dots+x_{30} is 1 or -1.

    Come to think of it, the multiples of 2 are slightly problematic, so another thing one could do is make a list of all pairs of blocks that give you a sequence that works all the way up to 60. But that’s just a detail. The main idea would be to restrict one’s search to concatenations of blocks. I think it could eliminate a lot of repetition from the search, and therefore be much faster. The drawback is that it isn’t clear that all very long sequences are concatenations of blocks as I’ve defined them above, but it seems to me that if the multiples of 2, 3 and 5 are in good shape at each multiple of 30 then it ought to be quite useful.

    There are also a number of variants one could imagine of this basic idea. For instance, having made the list, then instead of doing a completely basic depth-first search, one could do a slightly cleverer one as follows: having chosen blocks B_1,\dots,B_r, then one would see whether any of the next thirty values were forced, and then look only for blocks that took the right values at the forced places. Or one could make a list of mini-blocks that dealt with the next six places and first make a list of mini-blocks that don’t cause trouble and only then search for blocks that start with one of the mini-blocks that had not been ruled out. In other words, one could do a depth-first search, but because each node in the tree would have many children, one could also use simple tricks to rule out all but a few of the children when the sequence started to get long.

    • gowers Says:

      More generally, one could for each block make a table of the following information. For every d and every a between 0 and d one would have an interval I(a,d) of all the values taken by the sums x_a+\dots+x_{a+dh}. So, for example, if you have chosen the first ten blocks, and if x_7+\dots+x_{294}=2, then you will need a new block y_1,\dots,y_{30} such that the sums y_1,y_1+y_8,y_1+y_8+y_{15} etc. all lie between -4 and 0.

  24. gowers Says:

    I’d like to draw attention here to the fact that I’ve added a new section to the original post, with some additional variants of the original problem.

  25. gowers Says:

    A question that interests me, related to the final problem in the updated post above, is this. If we worry only about those progressions \{d,2d,\dots,nd\} such that d has a prime factor greater than M, then how big does M need to be before one can get very long sequences with C=1? For example, if d has to be a multiple of a prime greater than 10 (meaning that if the discrepancy on an arithmetic progression with some other d is large then we don’t care), can we find a sequence of length 500? I imagine this would be a piece of cake to program, especially if you’ve already got a program for looking at C=2. (I’d also be interested to know whether forgetting about a few small d makes sequences with C=2 so easy to find that you can just go on and on and on without getting stuck.)

    • Alec Edgington Says:

      If we only care about APs with a common difference divisible by a prime greater than 10, then with C=1 the longest I’ve found with a quick search has length 131:

      + – - + – + – - + + + – - + + + – - + – + – - + + + – - + – + – - + + + – - + + – - – + – + – - + – + – + + – + – - + + + – - + + + – - + – - – + + – + + – + – + + – + + + – - – + – - – + – + – - – + + – + + – - + – - + + – - + + + + – + – + – + + – + – - + – +

      With M=6 we can do at least 83; with M=4, at least 59. Longer sequences may be possible, but aren’t a doddle to come by.

    • Alec Edgington Says:

      In response to the question about ignoring a few small d in the case of C=2, my initial impression is that it doesn’t make a dramatic difference. If we stop caring about d \in \{2, 3\}, my program sticks at 692; and even if we stop caring about anything divisible by 2 or 3, it sticks at 714. I haven’t run it for long, though.

    • gowers Says:

      That lends support to the pretty plausible hypothesis that when n gets large it’s the APs with larger common difference that really count.

      Another question while I’m still here. Can you get any sense experimentally of what happens if you just look at prime common differences?

      Added two seconds later: oops, that’s a silly question. If you just look at odd primes then you can take the sequence (-1)^n, and if you want 2 in there as well then you can take 1,1,-1,-1,1,1,-1,-1,…

    • Alec Edgington Says:

      Come to think of it, the above numbers (for C=1) must be best possible. For if p is the smallest prime greater than M, then we care about all sub-APs of (x_p, x_{2p}, \ldots, x_{12p}); and one of these must fail.

    • Alec Edgington Says:

      I meant to say ‘if we stop caring about d \in \{1, 2, 3\}’ (not ‘d \in \{ 2, 3\}’).

  26. gowers Says:

    I am unlikely to be able to do much to this blog over the next week, so here’s one final comment. It’s a small observation connected with how one might attempt to prove the result.

    One could summarize the observation as saying that one can do quite a lot of shifting of the APs with respect to which the sequence is required to have bounded discrepancy. To explain this, I’ll pretend that we’re only interested in APs for which the common difference is either 1 or a prime number. A preliminary remark is that if you can prove that the discrepancy on APs of the form \{(m+1)d, (m+2)d,\dots,nd\} is unbounded, then you’ve solved the problem (since if that’s big then the discrepancy is big either on the first m multiples of d or the first n multiples of d). But that means that instead of looking at the interval \{1,2,\dots,N\} we can look at the interval \{M+1,M+2,\dots,M+N\} for any M of our choice. The intersections of the multiples of primes p less than N with that arithmetic progression will, when you subtract M, be shifts of the multiples of p, and by the Chinese remainder theorem we can choose, for each p, what we want the shift to be.

    For any particular choice of shifts, the problem could well be just as hard as the original problem. But it is just conceivable that one could get somewhere by taking a random collection of shifts. Of course, one would have to work out what happens for arithmetic progressions with composite common differences (unless it turns out that there must be unbounded discrepancy on a progression with prime common difference, but that feels rather optimistic).

    • Sune Kristian Jakobsen Says:

      Assume that there is a sequence x_1,x_2,\dots with bounded discrepancy, and that M\equiv a_1 \pmod{m_1}, M\equiv a_2 \pmod{m_2}, \dots M\equiv a_k \pmod{m_k} (a_i\geq 0) has a solution M. Now we can use a shift to prove:

      There exist a sequence x_1,x_2,\dots and a constant C such that for any 1\leq i\leq k and n
      |\sum_{j=1}^nx_{a_i+jm_i}|\leq C.

      This is just the shift that Gowers mentioned. But lets consider an infinite number of congruences M\equiv a_1 \pmod{m_1}, M\equiv a_2 \pmod{m_2}, \dots (a_i\geq 0), such that any finite collection of them has a solution M. Using a compactness argument we can now prove:

      There exist a sequence x_1,x_2,\dots and a constant C such that for any i and n
      |\sum_{j=1}^nx_{a_i+jm_i}|\leq C.

    • Alec Edgington Says:

      A small thought on this idea. Let D be the set of differences in the APs we are interested in. (So D = \{ 1, 2, 3, \ldots \} in the actual problem.) This argument relies on D having the property that any two of its elements are coprime. But such D are, in a rough sense, orthogonal to the ones we are really interested in: namely, those that contain long APs of the form \{ d, 2d, \ldots \}. So any step in the direction of loosening the dependence on such sets would be interesting.

  27. Klas Markström Says:

    I believe this is an example of length 717 for C=2. There are at least 14 different examples of length 700

    {1, -1, 1, -1, 1, 1, -1, -1, -1, 1, -1, 1, -1, 1, -1, 1, 1, -1, 1,
    -1, -1, -1, 1, -1, 1, 1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, -1, 1,
    -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, 1, -1, -1,
    -1, -1, 1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1,
    -1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1, 1, -1,
    -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1,
    1, -1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, 1,
    1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, 1, 1, -1, -1,
    1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1,
    1, 1, -1, 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
    -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1,
    1, -1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, 1, -1,
    -1, 1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, 1,
    -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1,
    1, -1, -1, -1, 1, 1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1,
    1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, 1, -1,
    -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1, 1,
    1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1,
    -1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, 1,
    -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1,
    -1, -1, -1, 1, -1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, 1,
    -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1,
    1, -1, 1, -1, -1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1,
    -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1,
    -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1,
    -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1,
    -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, -1,
    -1, 1, -1, -1, 1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1,
    1, 1, -1, 1, -1, 1, -1, -1, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, 1, -1,
    1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1,
    -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, 1, -1, 1,
    -1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1, -1,
    -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1,
    -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, 1,
    1, -1, -1, 1, -1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1, 1,
    1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1}

    • Alec Edgington Says:

      Wow. And a little modification of that gets us to 727:

      + – + – + + – - – + – + – + – + + – + – - – + – + + + + – + – - + – - + – + – + + – + – - + – + + – + + – - + + – - – - + + + – + + – + – - + – - + + – + – - + + – + – - + + + – + – - – + – + + – - – + – + – + + – + + + – - – + – + + – + – - – + + + + – + – - + – - + – - + + + + – - + – - – + + + + – + – - + + – - + – + – + – - + – + – + + – + – - + + – + + – + – - + – - + – - + – + + + – - + – + + – + + – + – - + – - + – - + + – + + – + – - + – + – - + + + – + – - + + – - + – + + – + + – + – - + – - + – - + + – + + – + – - + + – + – - + – + + + – - + – - – + + + – + + – + – - – + + + – - + – - + – - + + – + – - + + + – + – + – - + + – + – - + – - + + – + + + – - – + + – + – - + – - + + – + + – + – - + + – + – + + – - + + – + – - + – - + + – + + – + – - + + – + – - + – - + + – + + – + – - – + – + + – + – + + – - + – - + + – - + – + + – + – - + + – - – + + – + + + – + – - + – - – + – + + – + – - + + – + + – + – - + + – + – - + + – + + – + – - + – - + + – + + – + – - – + – + – - + – + + + – + + – + – - + + – + – - + – - + + – + – + – - + – + – + + – + – - + – - + + + + – - – + + – + – + – - + + + – + – + – - + + – - – - + – + + + – + – - + – - + + – + + + – - – + + – + – - + – - + + + + – - + – - + + – + + – + – - – + – + + – + – - + + – + – + + – - + – - + + – - + – + + – + + – + – - + – - + – - + + – + – - + + + – + – - + + + – + – - – + + – - – + + – - + – + + + + – - – - + – + + – + + – + + – + – - + – - + + – + + – + – - + – +

    • Klas Markström Says:

      Yes I haven’t really tried to optimize the construction here so please try out some backtracking on truncations of the sequence. When my program have finished n=700 I’ll try n=800. The the first run will probably finish some time tonight.

      I have started out with all admissible sequences of length 25 and then successively eliminated those which could not be extended to length 100, 200,300, 400,500,600. There was a large drop in the number of remaining sequences for n=600, and many for which the program reached it’s time limit.

      This was done by a simple modification of a program I already had so there is probably room for some specialised improvements if one really want’s to push this problem.

    • Mark Bennet Says:

      Well, when I went off for Christmas I had this idea that 719 might be the limit (3!! = 720).

      If we have a long sequence such as this we rely on the APs with differences 2, 3, 4 etc all being extendable. It may be possible to identify limits on the sequence by examining the APs with small differences.

    • Mark Bennet Says:

      Klas

      Do you have a record of your 14 sequences up to 700?

      These could come in pretty handy in analysing the subsequences of long sequences.

      It might then also be possible to identify sequences we could usefully extend which have been filtered out by your program by the time limit.

      Mark

    • Klas Markström Says:

      Mark
      I do not have the full sequences available right now, only the initial stubs which were extended to length 700. But later I can find explicit extensions as well. I have already modified the program so it keeps the full sequence rather than just the stubs.
      When the run had finished it had identified 202 such which are extendable to length 700. There are at least 72 which can be extended to length 798.

      Now I do not work with multiples of 100 any more bur rather number with many factors, and I have begun to extend the remaining stubs to longer lengths than 25.

      My program does not filter anything out due to the time limit. The program keeps all sequences which get a time out as potentially extendable, it only removes those which it finds inextendable.

  28. H Matthew Says:

    It occurred to me that there is a sort of self similarity involved in defining the curve. The problem in some sense is asking whether it is possible to define some sort of self similar curve with respect to constraint C.

  29. Klas Markström Says:

    Here is an example of length 798 for C=2, there is at least one more.
    {1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1,
    -1, 1, -1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1, -1, -1,
    1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, 1, 1,
    -1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, -1, -1, 1, 1, 1, -1, 1,
    -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1,
    1, -1, -1, 1, -1, -1, 1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, 1, 1,
    -1, 1, -1, 1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1, -1, -1,
    1, -1, 1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1,
    -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, 1, 1, -1, -1, 1, -1, -1, 1,
    1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, -1, -1, 1, 1, 1, -1,
    1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1,
    -1, 1, -1, -1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1,
    1, -1, -1, 1, 1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1,
    -1, -1, 1, 1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1,
    1, -1, -1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, -1, -1, 1, 1,
    -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1,
    1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1,
    1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1,
    -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1,
    1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1,
    -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, -1, 1, 1, -1, 1, 1,
    -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, -1, -1,
    1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, 1, 1,
    -1, -1, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, 1, 1, -1, -1, -1, 1, 1,
    -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, 1,
    -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, -1, 1,
    -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1, -1, -1, -1, 1, -1, 1,
    1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1,
    -1, -1, 1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, -1, -1, -1, 1,
    -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, 1,
    1, -1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1,
    -1, -1, 1, 1, -1, -1, -1, 1, -1, 1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1,
    -1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1,
    1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1,
    -1, -1, 1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1,
    -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1,
    -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, 1, -1, -1,
    -1, -1, 1, -1, 1, 1, 1, -1, 1, 1, -1, 1, -1, 1, -1, 1, -1, -1, -1,
    -1, 1, -1, 1, 1, -1, 1, 1, -1, 1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1,
    -1, 1, 1, 1, 1, -1, 1, -1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, -1,
    -1, 1, -1, 1, -1, 1, 1, 1, -1, -1, 1, 1}

    • Alec Edgington Says:

      That is 799, I think? Backtracking from there gets you quickly up to 832:

      + – - + – + + – - + + + – - – + + – + – + – - – + + + + – + – - – + – + – - + + + – + – - + + + – - – - + – + + + + – - – + + – + + – + – - – - + + + – + – - + + – + – - + + – + + – + – - + + – + – - + – - + + – - + + + – - – + – + + – + – + – + – + + – + – - – + – + – - + – + + + + – - – + – - + + – + + – + – - + + – + – - + – - + + + + – - + – - + + – - + – + + – + + – + – - – - + + + – + – - + – - + + – + + – + – - + + – + + – + – - – - + + – - + – + + + – - + – + – - + + – + + – + + – + – - + – - + + – - + + + – - + – - + + – + – - + – + + + – - + – + – + – - + – + – + – - + + – + + – + – - + + – + – + + – - – + – + + + – - – + + – + – - + – - + + – + + – + – + – + – - + – + – - + + + + – - + – - + + – + – - + – - + + + + – - + – - + + – + + – + – - + + – + – - + – - + + – + + – + – - – + + – + – - – + + – + + – - + + – - + – + + – + – - + + – + – - + – - + + + + – - – + – + + – - + + + – - + + – - – - + – + + + + – - – + + – + – - + + – + – - + + – + + – + – - + – - + – - + + – + – - + + – + – - + + – + + – + – - + + – + – - + – - + + – + + – + – - + + – + + – + – - – + + + – - + – - + + – + + – + – - + + – - – - + – + + + – + – - + – - + + – + + + – - – + + – + – - + – - + + – + – - + + – + + – - + + – - – + – + + + + – - – + + – + – - + – - + – + + + – - + – + + – - + – + – - + + + + – - – - + + + – + – - + – - + + + – - + + – - + + – + – - + + – - + – + + – + – - + – - + + – + – - + + – + + + – - – - + – + + – + – - + + + – - – + – + + + – + + – + – + – + – - – - + – + + – + + – + – + – + + – + – - – - + + + + – + – - – - + + – + + + – - – + – - + – + – + – + + – + + + – + – - – + + – - – + + + – - + – - + – + + + + – + – - + +

  30. H Matthew Says:

    Just thinking out loud, please forgive any redundancy:

    Start with a string of length n.

    If p is the probability of a +1 occurring, then q = 1-p is the probability of -1 occurring.

    Constraint C places an upper and lower bound on q.

    qmax = 1/2 – (Cd/2n)
    qmin = 1/2 + (Cd/2n)

    for the initial string, d=1, therefore

    qmax = 1/2 – (C/2n)
    qmin = 1/2 + (C/2n)

    and for any C, as n approaches infinity, qmax and qmin converge at 1/2.

    when d > 1, then as d approaches n,

    qmax = 1
    qmin = 0

    I have some more thoughts on this, but I want to make them more concise before I post them.

  31. H Matthew Says:

    Sorry

    when d > 1, then as Cd approaches 2n,

    qmax = 1
    qmin = 0

  32. Proposal (Tim Gowers): Erdos’ Discrepancy Problem « The polymath blog Says:

    [...] For a description of Erdos’ discrepancy problem and a large discussion see this blog on Gowers’s blog. [...]

  33. H Matthew Says:

    I must be suffering from a fit of dyslexia…please forgive

    when d > 1, then as |Cd| approaches n,

    qmax = 1
    qmin = 0

  34. Klas Markström Says:

    Here is a sequence of length 897 for C=2, no additional optimization done

    {1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, -1, -1, 1,
    1, -1, -1, 1, 1, -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1,
    -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1, 1, 1,
    1, -1, -1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, 1, 1, -1, -1, 1, 1, -1,
    1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
    -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, -1, 1, 1,
    -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, 1,
    -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, -1, -1,
    1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1,
    -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, 1, -1,
    1, -1, -1, 1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1,
    -1, 1, -1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, 1,
    1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1,
    1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1,
    1, 1, -1, -1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, -1, 1, 1,
    -1, 1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1,
    1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, -1,
    -1, 1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1,
    1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1,
    -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1,
    1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, -1,
    1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1,
    -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1,
    1, -1, 1, -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1,
    1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, 1, 1, -1,
    1, -1, -1, -1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1,
    -1, -1, 1, -1, 1, -1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1,
    -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1,
    -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1,
    1, -1, 1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1,
    -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1,
    1, -1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1,
    1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1,
    -1, 1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, -1, 1,
    1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1,
    1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1,
    -1, -1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1,
    -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1,
    -1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, 1, -1, -1, -1, 1, 1, -1,
    1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1,
    1, -1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1,
    -1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1,
    -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1,
    -1, -1, 1, -1, 1, 1, -1, -1, -1, 1, -1, 1, 1, 1, 1, -1, -1, -1, -1,
    1, 1, -1, 1, -1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1}

    • Alec Edgington Says:

      Simple optimization only gets you one further in this case, to 898:

      + – + + – - + – + + – + – + – - – - + + – - + + – + + – - – + + + – + – - + – - + + – + + – + – - + + – - + – + – + + + – - – - + – + + + – - + + + – - + + – + – - + – - + + – + – - + – + + + – - + – + – - + + – + – - + – + + + – - – + + – - – + + + + – - – - + + – + + + – - + – - + + – + – + – - + – + + + – - – - + + – + + – - + + – + – - + – + + – - + – + – - + – + – + + – - + + – - + + + – + – - + – - + + + + – + – - + – - + + – - + – + + – - – + – + + – + – + – - + + + – + – - + – + + – - + – + + – + + – - + – - + – + + – - + + – + – + – - + – + + + – - – - + + – - + – + + + – - – + + – + + – - + + + – - – + + – + – - + – + + – + + – - + – + + – - + – - + – + – - + + – + + – - + + – + – - + – + + – - + – + + – + + – - + – - + – + + – - + – - + – + + – - + – + + – + + – - + – - + + + + – - + – + + – + – - + + – - + – - + + – - – + + – + + – - + + + – - + – - – + – + + – + + – - + + – + – + + – - – + – + + + – - – - + + + – + + – - + – - + – + + – - + – + + – + + – - + – + + – + – - – + – - + – + + + – + – - + – + + – - + – + – + + – - – + – + + – + + – - + – - + – + + + – + – - + – + + – - + – - + – + + – - + + + + – + – - – + – + + – + + – - + – - – + + + – - + – + + – - + + – + – + + – + – - – + – - + – + + – - + – + + – + + – - + – + + – + + – - + – - + – + + – - + + – - – + + + + – - + + – + – - – + – - + – + + + – - – + + – + + – - + – + + – - + + – + – - + – + + – + + – - + – + + – - – - + + – + + – - + + – + – + + – - + – - + – + + – - + – + + – + + – - + – - + – + + – + – - + – + – + – - + – + + + + – - – + + – + – + + – - + – - + – - + + + + – + – - + – + – + – - + – + + + – - + + – - + + – - + – + + – + – + – - – - + – + + – - + – + + – + + – - + – + + – - + + – + – - + + + – - – - + – + + + + – - – - + + – + – + + + – - + – + + – - + – - + – + + – + +

      The sticking point here is that \sum_{i < 29} x_{31i} = 2 while \sum_{j < 31} x_{29i} = -2.

      I'd be interested to know the number of sequences of length 25 that can be extended to 100n, for the n that you have so far (1 \leq n \leq 8?).

    • Mark Bennet Says:

      I can add a 1 to this to get a sequence of 898.

      899 = 29 x 31 and the sequences with difference 29 and 31 are the ones blocking progress to 899.

    • Mark Bennet Says:

      Looking at the subsequences of the length 898 sequence (ie the sequences obtained by picking out the elements rd for fixed d) there are quite a lot of different initial sequences.

      The one early observation which may have some interest (because it involves small primes) is:
      The sequence for d=5 is the negative of that for d=4 except at positions 59 and 61 (up to limit 179)
      The sequence for d=6 is the negative of that for d=4 (up to limit 149)

      Some other connections
      The (shorter) sequences for d=36 and d=43 are identical to the negative of that for d=9 (so far as they go)
      d=18 is the negative of d=10 and d=27

      There seems to be a tendency for the first divergence in these subsequences (for those which are identical or negative for a longer stretch than usual) at primes eg 13, 17. I have not checked, but I have not noticed any divergence at any value which has two or more distinct prime factors.

    • Mark Bennet Says:

      I still think it would be useful to investigate these subsequences in rather more depth, and to understand how far they can be extended. This may be related to Alan’s question about sequences of length 25 which can be extended.

    • Klas Markström Says:

      Alec, since my program has a time limit built in I do not known how many of the length 25 it is possible to extend to a given length. I have some which I know cna be extended and some which run out of time and are then passed on to the next step where they are often easier to eliminate.

    • Mark Bennet Says:

      Re subsequences with differences first occurring at n where n has two or more prime factors.

      The sequence of length 1124 below has subsequences for d=2, d=24 which are negatives of each other up to the 44th term.

      So though prime numbers or prime powers seem to be strongly preferred, they are not forced.

  35. Mark Bennet Says:

    Tim

    Some easy observations, and a question:

    A trivial answer to part of your final question about common difference not divisible by 2, 3, 5 … 101.

    For common difference not divisible by 2: the sequence which is 1 on odd numbers and -1 on even numbers has sum = 1 or 0 on every AP starting at 0 with odd difference. This is at the cost of every progression with even difference being unbounded (sum to nd is -n)

    The sequence which is built starting with 1, and at each stage appending the negative of the sequence to date:

    1
    1, -1
    1, -1, -1, 1
    1, -1, -1, 1, -1, 1, 1, -1

    Has sum -1, 0 or 1 on every progression whose difference is a power of 2.

    So it looks as if it is 2 which causes a problem.

    Is it possible to prove, for example, for C= 2 that there is an infinite sequence which works for differences which are divisible only by 2 and 3 (generalise to other sets of primes, perhaps higher powers of 2 and other values of C)?

  36. Alec Edgington Says:

    For the problem in higher dimensions, it’s conceivable that the answer might depend on the particular norm chosen for the condition on the \Vert u_i \Vert. (For the condition on the AP sums, of course, it makes no difference to the problem which norm one chooses.)

    I’ve done a little investigation using the Euclidean norm for both in two dimensions, and found that one can get at least as far as 47 with C=1. (One can achieve this using points of the form e^{2i\pi r/32}).

    • Alec Edgington Says:

      And with the \ell_\infty norm and C=1, one can get to at least 191 in two dimensions even restricting the points to (\pm 1, 0) and (0, \pm 1).

    • Alec Edgington Says:

      I realized that with the Euclidean norm in two dimensions, vectors placed on a regular hexagon are particularly useful. Using only such vectors, with C=1 we can get as far as 116 (and no further). The following sequence (r_n) achieves this, where u_n = (\cos \frac{\pi r_n}{3}, \sin \frac{\pi r_n}{3}):

      [0, 3, 3, 0, 3, 0, 2, 4, 0, 1, 4, 3, 1, 5, 0, 2, 4, 3, 1, 5, 5, 1, 4, 1, 2, 5, 3, 2, 5, 4, 1, 4, 1, 2, 5, 0, 2, 4, 4, 1, 5, 2, 1, 4, 3, 1, 5, 5, 2, 4, 1, 2, 4, 0, 1, 5, 4, 1, 4, 2, 2, 4, 0, 1, 5, 4, 2, 5, 1, 2, 4, 3, 1, 5, 5, 1, 4, 1, 3, 4, 2, 1, 0, 4, 1, 4, 3, 1, 5, 0, 2, 4, 5, 2, 4, 1, 1, 5, 3, 2, 0, 3, 4, 5, 1, 1, 3, 4, 1, 5, 0, 2, 4, 1, 3, 5]

      By using only these vectors we constrain the sums to a set of seven points: the six points of the hexagon and zero.

  37. Klas Markström Says:

    Here is a sequence of length 1073 of C=2.
    {1, -1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1,
    -1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1,
    -1, 1, 1, -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1,
    1, 1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1, 1, 1, 1, -1, -1, 1, 1, -1,
    1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
    -1, 1, -1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1, 1,
    -1, -1, 1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1,
    1, -1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1,
    -1, 1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1,
    1, -1, 1, 1, -1, 1, -1, 1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, -1,
    -1, 1, -1, -1, 1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1,
    -1, -1, 1, -1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, -1, 1,
    1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1,
    -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1,
    -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, -1, -1, 1, 1, -1, -1, 1,
    -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1,
    1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1,
    1, -1, -1, 1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1,
    -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1,
    -1, 1, 1, -1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1,
    -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1, -1, 1, 1, -1, -1,
    1, -1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1,
    1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1,
    -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1,
    1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1,
    1, 1, -1, 1, -1, -1, -1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1,
    -1, 1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1,
    1, -1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, 1,
    -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1,
    1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1,
    1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, -1, -1, 1,
    -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1,
    1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1,
    -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1,
    -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1,
    -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1,
    1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1,
    -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1,
    1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, -1,
    -1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1,
    -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1,
    -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1,
    -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1,
    -1, -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1,
    -1, 1, 1, 1, 1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1,
    -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1,
    -1, 1, -1, 1, 1, -1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1,
    -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1,
    1, -1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1,
    -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1,
    -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1,
    -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1,
    -1, 1, 1, 1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 1,
    1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1,
    1, -1, 1, -1, 1, -1}

    My main program, which aims for a sequeqce of a specific length, was used to find sequene of length 1008 and then a very simple extrension program was run to find the sequence of length 1073. The sequence of length 1008 seems to have quite a lot of extension so it would be interesting to see what comes out if it is used as the initial sequqnce for a backtrack program. Here is the sequence of length 1008
    {1, -1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1,
    -1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1,
    -1, 1, 1, -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1,
    1, 1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1, 1, 1, 1, -1, -1, 1, 1, -1,
    1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
    -1, 1, -1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1, 1,
    -1, -1, 1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1,
    1, -1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1,
    -1, 1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1,
    1, -1, 1, 1, -1, 1, -1, 1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, -1,
    -1, 1, -1, -1, 1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1,
    -1, -1, 1, -1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, -1, 1,
    1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1,
    -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1,
    -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, -1, -1, 1, 1, -1, -1, 1,
    -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1,
    1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1,
    1, -1, -1, 1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1,
    -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1,
    -1, 1, 1, -1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1,
    -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1, -1, 1, 1, -1, -1,
    1, -1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1,
    1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1,
    -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1,
    1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1,
    1, 1, -1, 1, -1, -1, -1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1,
    -1, 1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1,
    1, -1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, 1,
    -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1,
    1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1,
    1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, -1, -1, 1,
    -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1,
    1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1,
    -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1,
    -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1,
    -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1,
    1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1,
    -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1,
    1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, -1,
    -1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1,
    -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1,
    -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1,
    -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1,
    -1, -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1,
    -1, 1, 1, 1, 1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1,
    -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1,
    -1, 1, -1, 1, 1, -1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1,
    -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1,
    1, -1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1,
    -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1,
    -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1}

    • TonyG Says:

      1124, in no time at all:
      + – + + – - – - + + – + + + – - + – + + – - – + – + + – + -
      - + + – + – - + – - + + – + + – - – + + + – - + – + – - + +
      + + – - + – - + + – - + + + – - + + – + – - + – - + + – + -
      - + – + + + – - + – + – - + + – - – + + – + + + – - – + – -
      + – + + + + – - – - + + – + + + – - + – - + – - + – + – + +
      - + + + – - – - + + + + – - – + + – + – - + – + + – - + – +
      + – + – + – + + – - – + + – + + – - + – - + – - + + + + – +
      - – + – - + + – - + – + + – - – + – + + – + + + – - + + – -
      + – - + – + + – - + – + + – + + – - + – - + – + + – - + + -
      + – + – - + – + + + – - + – + + – - – - + + – - + – + + – +
      + – - + – + + – - + + – + – - + – + – - + + – - + – + + + -
      + – - + – + – - + + + + – - – + + – + – - + – + + – - + – +
      + – + + – - + – - + – + + – - + + – - – + + – - + – + + – +
      + – - + – - + + + + – - – - + + – + + – + + – - + – - + + -
      - – + + – + + – - + – + + – + – - – + – + + – + + – - + – -
      + – + + – - + + – + + + – - – + – + + – + + – - + – - – + +
      + – - + – + – - + + + – + – + + – + – - – + + – + – + + – -
      + – - + – + + – - + – + – + + – + – + – - + – + + – - + – -
      + – + + + – - – + + – + + + – + – - + – + + – - + – + + – +
      - – - + – + + – + + – - + – - – + + + – - + – + + – - + + -
      + – + + – + – - – + – - + – + + – - + – + + – + + – - + – +
      + – + + – - + – - + – + + – - + – - + – + + – + – - + + – +
      + – - + – - + – + + – - + – + + – + + – - + – + + – - + + -
      + – - + – + + – - + – - + – + + – - + – + + – + + – + + – -
      + – + + – - + – - + – + + – - + – + – - + + + – + – - + – +
      + – - + – + – - + + + – - – + + – + + – - + + – + – + + – -
      + – - + – - + – + + – + – - + + + – + – - + – + + – - + – +
      - – + + – - + – + + – + + – - + + – + – - + – - + – + + – +
      + – - + – + + – + + – - + – - + + + – - – + – - + – + + – -
      + + + + – + – - + + – - + – + + – - + – - + – + + – - + – +
      + – + + – - + – - + – + + – - – + + – - + + – - + – + + – +
      + – - + – - + – + + + – + – - + – + + – - + – + + – + – + -
      - + – + – + + – - + – + – - + + + – + – - – + + + – - + – -
      + – + + – - + + + + – + – - – + – + + – + + – - + – - + + +
      - – - + – - + – + + + – + – + + – + – - – + – - + + + + – -
      + – + + – - + – + + – + – - + – + + – - – + – - + + – + – +
      + – + – - – + – + + + + – + – - – - + – + + – + + – - + + -
      + + – + – - + – + + – + – -

    • Mark Bennet Says:

      Working with 1124 sequence

      Some interesting patterns emerging here.

      Subsequence difference 2 is the same as the negative of subsequence 3 up to position 71.

      Subsequence 4 is the negative of subsequence 5 as far as it goes and 6 is the same as 5 up to and including position 175. (4 and 6 – cf 2 and 3), 9 is the negative of 5 up to position 105 and of 40 to position 27 (as far as it goes)

      Subsequence 7 is the same as subsequence 1 up to position 60.

      Subsequence 8 is the negative of 10 up to position 80 and of 12 up to position 88 (cf relationships between sequences 4,5,6). It is the same as subsequence 15 up to position 73, and the negative of 27 up to position 40 (and the same as 1 up to position 47).

      Subsequence 16 is the negative of sequences 20 and 24 (up to 43)

      So it looks as if there may be something going on in the subsequences which is related to the multiplicative structure of the low primes, but which isn’t anything like “completely multiplicative” – and this may be worth some further investigation.

    • Mark Bennet Says:

      Sorry, misread something subsequence 1 is not the same as subsequence 8, closest is 49.

    • Mark Bennet Says:

      There is a strong tendency to subsequences which begin (my numbers based on a simple binary code identify sequences, codes for the negative of sequence n and 511-n) sequence [values of d for which sequence applies]:

      (114) -1, 1, -1, -1, 1, 1, 1, -1, -1 … [2, 16, 21, 22, 25, 30, 36, 37, 39, 46, 57, 81
      (397) 1, -1, 1, 1, -1, -1, -1, 1, 1 … [3, 14, 17, 20, 24, 26, 33, 38, 45, 54, 69
      (181) 1, -1, 1, -1, 1, 1, 1, -1 … [4, 9, 29, 32, 35, 41, 42, 44, 50, 51, 60, 65, 72, 74, 78, 92, 95, 99, 109
      (330) -1, 1 -1, 1, -1, -1, 1, -1, 1 … [5, 6, 28, 31, 34, 40, 43, 48, 52, 55, 63, 66, 67, 75, 76, 90, 103, 108, 111
      (332) -1, -1, 1, 1, -1, -1, 1, -1, 1 … [8, 11, 15, 18, 23, 58, 64, 70, 79, 82, 84, 85, 88, 91, 93, 97, 100, 102, 106
      (179) 1, 1, -1, -1, 1, 1, -1, 1, -1 … [10, 12, 13, 19, 27, 56, 59, 62, 68, 77, 80, 83, 86, 87, 89, 96, 101, 104, 105, 110

      Apparently sporadic values
      (269) 1, -1, 1, 1, -1, -1, -1 -1, 1 … [1, 49
      (242) -1, 1, -1, -1, 1, 1, 1, 1, -1 … [7
      (458) -1, 1, -1, 1, -1, -1, 1, 1, 1 … [47
      (180) -1, -1, 1, -1, 1, 1, -1, 1, -1 … [53, 107
      (331) 1, 1, -1, 1, -1, -1, 1, -1, 1 … [94
      (333) 1, -1, 1, 1, -1, -1, 1, -1, 1 … [61, 112
      (178) -1, 1, -1, -1, 1, 1, -1, 1, -1 … [98
      (182) -1, 1, 1, -1, 1, 1, -1, 1, -1 … [71
      (329) 1, -1, -1, 1, -1, -1, 1, -1, 1 … [73

      Note that 8r tends to belong to the same sequence as r, and that while 1 and 7 belong to sporadic sequences, 8 and 56 belong to the main set.

      If 2r belongs to n, 3r belongs to 511-n

      The tendency of sporadic values to attach to primes, and the curious case of 7. Some of the higher values may change when the sequence is extended.

      The persistence of the six strong sequences suggests further work on these – the analysis above is based on the first nine values in each sequence. It is, of course, possible to analyse further in the same way. But I thought I’d give this to see if anyone spotted any obvious patterns.

    • Mark Bennet Says:

      The six strong sequences exhibit some very regular behaviour when d is multiplied by low primes, with the following cycle structure:

      2 – (114, 181, 332) (397, 330, 179) [square of 5]
      3 – (114, 330, 332, 397, 181, 179) [inverse of 5]
      5 – (114, 179, 181, 397, 332, 330)
      7 – (114, 397) (181, 330) (179, 332) [cube of 5] [13 has the same structure]
      11 – (114) (397) (181) (330) (332) (179) [identity]

      So we appear to be in the cyclic group of order 6.

      This structure could be used in the search for longer sequences.

    • Mark Bennet Says:

      Just to note that taking the coding out a little further, the same structure persists (here the negative sequence for code n is 4095-n). There are a few more sporadic values associated with particular primes, but six main sequences, with a cyclic group structure (note that numbers with the same code relate to the same group elements, so the code elements can be regarded as the elements of the group.

      What happens when we take the sequence for 5 (group generator) or 3 (its inverse) and try to extend these? -ie begin with the sequence f(3r) or f(5r)

      What about taking the sequence for 8 (group identity) – extending f(8r)? (or f(15r)) – how far do these get?

      Or should we be asking different questions? How is this group structure best deployed? Does it apply only to this sequence, or does every long sequence constrained to C=2 have this cyclic group structure (as seems likely)? And how is the structure explained – where does it come from?

      179: 13, 19
      181: 29
      1203: 10, 12, 27, 56, 59, 62, 68, 77, 80, 83
      1204: 53
      1205: 4, 9, 32, 35, 41, 42, 44, 50, 51, 60, 65, 72, 74, 78
      1206: 71
      1266: 7
      1421: 3, 14, 17, 20, 24, 26, 33, 38, 45, 54, 69
      1482: 47
      2674: 2, 16, 21, 22, 25, 30, 36, 37, 39, 46, 57, 81
      2829: 1, 49
      2889: 73
      2890: 5, 6, 28, 34, 40, 43, 48, 52, 55, 63, 66, 67, 75, 76
      2892: 8, 15, 18, 58, 64, 70, 79, 82, 84, 85
      2893: 61
      3914: 31
      3916: 11, 23

    • Mark Bennet Says:

      Passing to subsequences (eg f(8r) in previous post) looks as though it may reduce the number of sporadic sequences (so 7 would not be sporadic any more) and may help to illuminate the question of whether these sporadic subsequences are getting in the way of progress, or whether they are necessary.

    • Mark Bennet Says:

      Note that – it appears (speculating beyond data) that if one of the six key sequences arises, they are all required. For example, each of the six is a subsequence of the [identity] sequence.

      The case of 1, 7, 49 cannot be usefully investigated further (it may be significant) until (?) we have a much longer sequence to show what happens for 343.

    • Mark Bennet Says:

      Actually, it looks as though 7 would be sporadic in f(8r), because 112 is sporadic. It would be interesting to get deeper into this behaviour of 7 (which induces the same group element as 10, 13 …) and to know whether it might be reflected in larger primes too.

    • Mark Bennet Says:

      The sequences belonging to r=8 and r=10 split themselves into two when they are extended to 15 digits, while the others persist – but this involves differences in entries from about the 800th place – and it may be that a successful long sequence would differ in these places …

      … or it might be that the pattern is both necessary and unsustainable that far out (leading to a contradiction).

    • gowers Says:

      Mark, the 181 sequence you wrote down doesn’t have discrepancy 2. I presume that’s just a typo, but what should it be?

      I’ve just realized I can answer that for myself. The final 1 and -1 should be the other way round. If you confirm that you agree with that then I’ll edit your comment.

    • Sune Kristian Jakobsen Says:

      No, he forgot a -1. 181 should be the negative of 330.

  38. Sune Kristian Jakobsen Says:

    Instead of asking for a function from the natural numbers f:\mathbb{N}\to \{-1,1\} (just another way to represent the sequence x_1,x_2,\dots) with bounded discrepancy, we could ask for such a function from the positive rational numbers f:\mathbb{Q}_+\to \{-1,1\}. That is, such that for any positive rational number d and any integer n:

    |\sum_{i=1}^nf(id)|\leq C

    Using a compactness argument, these two questions turn to be equivalent. In other words: Assuming that there is a sequence with bounded discrepancy, we can find a finite sequence, that can be extended in the way Mark described in http://gowers.wordpress.com/2009/12/17/erdoss-discrepancy-problem/#comment-4618 arbitrary many times (and by arbitrary factors).

  39. Mark Bennet Says:

    Here is a possible way of making progress by looking at the subsequences.

    Suppose there is an infinite sequence f(n) for some C (one dimensional case).
    Examine the subsequences f(rd) for fixed d – these will also do for C.
    Suppose that only a finite number of infinite sequences exist for the value C.

    Then, by taking subsequences whose differences are powers of 2, we must find two the same (there are only a finite number).

    Take the smallest sequence with the lower power of 2 as the base sequence. We now have a sequence for C which has a subsequence identical to the original sequence (g(n)=g(nd) for some fixed d and all n)

    We can therefore find the sequence g(n) for which g(n) = g(nd) for the smallest possible d. What can we say about de in this case (eg with the current case of C=2)?

    • Mark Bennet Says:

      See above – it looks as though the C=2 case forces 6 significant subsequences plus some sporadic ones. The six subsequences have a group structure (cyclic, order 6). [NB this is the one dimensional case]

      This structure could do with some explanation – it depends on a map which is nearly a homomorphism from the multiplicative group of positive integers to the cyclic group of order 6 [the sporadic values are exceptions]. The subsequence structure shows that there will be a family of such maps.

    • Mark Bennet Says:

      Ignore that comment about a family of maps.

      Attempts to extend the six basic sequences as far as possible will illuminate the structure further.

    • Mark Bennet Says:

      Note that the primes associated with the sporadic sequences seem to induce permutations of the six basic sequences which are members of the cyclic group (though the data supporting this are thin). It may therefore be possible to associate a group element to each sporadic sequence too.

    • Alec Edgington Says:

      Mark, I wonder (quite speculatively) if some of the patterns you are finding arise because the sequence is ‘trying’ in various places to approximate the Legendre symbol modulo a prime p. This is a promising candidate for bounded discrepancy because it is multiplicative and periodic modulo p, so that the partial sums on all special APs are bounded by a constant depending only on p. It doesn’t actually solve the problem, of course, because it’s zero on multiples of p, but perhaps long sequences can be obtained by judicious placement of 1s and -1s at those multiples.

    • Mark Bennet Says:

      Interesting thought, Alec, but I’m not sure it will work (it may) – because I’d expect some clear distinction between primes of the forms 4n+1 and 4n+3, which doesn’t quite seem to be there. And there is the issue of what is to be done with the prime 2, which seems to me to have a key role, but which I don’t really understand.

      The issues at the back of my mind are:

      1. Without some tractable structure it would seem that an infinite sequence with maximum discrepancy 2 would be impossible to prove. It has seemed several times that a limit may have been reached, and the extent of progress has been surprising every time. There is as yet no explanation of this, in terms of how the blockages in particular cases relate to a deep structure.

      2. If a limit to sequences with maximum discrepancy 2 is eventually reached, then the case C=3 is there to be explored. It seems to me that any limit will then be very large indeed, and certainly beyond my computer resources – and that some understanding of how structure arises in the sequences could give some tools for analysis. As I noted above, there are different parity issues which arise for odd C, and the even subsequence may be very significant.

    • Alec Edgington Says:

      Incidentally, I don’t think one can improve on the \log n bound by using the Legendre symbol modulo p_0, filling in the multiples of p_0 with the Legendre symbol modulo p_1, and so on — at least not without choosing the p_i very carefully. According to the paper of Borwein and Choi referred to above, Schur has shown that the partial sums of the Legendre symbol reach at least \frac{1}{2\pi} \sqrt{p}. (Actually it’s not entirely clear from the remark in that paper whether this is true for all p or just for infinitely many p — can anyone confirm this?)

    • M. S. Smith Says:

      I believe it is true for all p, via a Fourier analysis argument (Parseval’s thm), if my memory serves me.

  40. Sune Kristian Jakobsen Says:

    I think Mark’s observations are very interesting. If I understand him correctly, a sequence of this type without sporadic subsequences would be of this type:

    Let G be a finite abelian (\mathbb{Z}/6 seems to be a good one) and definite a multiplicative function g:\mathbb{Z}\to G and a function h:G\to \{-1,1\}. Now define x_n=h(g(n)).

    Notice the a multiplicative sequence is of this form with G=\mathbb{Z}/2.

    • Sune Kristian Jakobsen Says:

      I hope this one contains fewer typos :s :
      Let G be a finite abelian group (\mathbb{Z}/6 seems to be a good one) and define a function g:\mathbb{Z}\to G such that g(nm)=g(n)+g(m) and a function h:G\to \{-1,1\}. Now define x_n=h(g(n)).

    • Sune Kristian Jakobsen Says:

      I think we should use this to search for long sequences with bounded discrepancy. I order to define a sequence of the above form, you only need to specify G, h for every element in G and g(n) for every prime. A good starting point seems to be G=\mathbb{Z}/6, h(x)=1, for x=0,1,2 and -1 otherwise, and g(2)=2, g(3)=5, g(5)=1, g(7)=3, g(11)=0, g(13)=3.

      I haven’t checked, but I think that if a sequence only have a finite number of different subsequences, then it must have a subsequence of the above form.

    • Sune Kristian Jakobsen Says:

      Here is the proof that if a sequence only have a finite number of different subsequences, then it must have a subsequence of the above form:

      Let x be a sequence with only finitely many different subsequences. Now you can alway find a subsequence y such that every subsequence z of y contains a subsequence identical to y: If this is not true for y=x, there must be a subsequence z of x such that z don’t contain a subsequence identical to x. Now try y=z: This sequence contains fewer different subsequences, so you can only do this a finite number of times. Finally you find a y that works.

      Now I want to show that y is of the above form. Let d_1 and d_2 be two natural numbers such that (y_{nd_1})_{n\in\mathbb{N}}=(y_{nd_2})_{n\in\mathbb{N}}. For any natural number $d_3$ we have (y_{nd_1d_3})_{n\in\mathbb{N}}=(y_{nd_2d_3})_{n\in\mathbb{N}}. In other word: If the “d_1-subsequence” and the d_2-subsequence of the y-sequence are the same subsequence, then for any subsequence z of y, the d_1-subsequence and the d_2-subsequence of z are the same. We now identify the natural number d with the sequence (x_{nd})_{n\in\mathbb{N}}. We want to define a group structure on the subsequences: Define the composition of two subsequences by choosing a natural number that represents them, multiply these numbers, and take the corresponding subsequence. This composition is clearly associative, commutative and has identity and inverse.

    • Mark Bennet Says:

      So the question is whether the sporadic bits are smoothed out in the big picture, or whether they multiply and become uncontrollably complex.

      It is trivial, of course, that the homomorphic image of a group is determined by the images of generators, so the behaviour of primes is fundamental.

      The fact that changing the sign of a sequence does not change the discrepancy pattern has always implied a group element of order 2. It is the existence of an order 3 element which I don’t have a handle on yet. And I conjecture that this would help to identify “pure sequences” rather than the ones we have which may contain sporadic fluctuations. The fluctuations might be necessary, of course, but they would be understood in their proper context.

      I would like to be able to make a conjecture, for example, whether C=3 would give a group element of order 4 (=3+1) or order 5 (the 3rd prime) or something else. An abelian group of order 6, or 30, is necessarily cyclic, but this is not true for order 12 or 24.

    • Sune Kristian Jakobsen Says:

      I’m not sure I understand. If the cyclic group of order 6 works for C=2 it will also work for C>2. Are assuming that the conjecture is true (no sequences have bounded discrepancy) and talking about patterns in long finite sequences for C=2 and C=3?

    • Mark Bennet Says:

      I’m planning for the contingency that C=2 is false.

  41. gowers Says:

    Just got back from my internet-free week. Not much time for anything right this moment, but I have to make one comment: wow! That’s at the breaking of the 1000 barrier. I had assumed that the conjecture was true (that bounded discrepancy is impossible) but I think that can no longer be taken for granted, even if it still seems more likely. I have a few simple observations that I’ll post later.

  42. obryant Says:

    The discussion continues here:

    http://gowers.wordpress.com/2010/01/06/erdss-discrepancy-problem-as-a-forthcoming-polymath-project/

  43. Discrepancy, The Beck-Fiala Theorem, and the Answer to “Test Your Intuition (14)” | Combinatorics and more Says:

    [...] of the form (r,2r,3r,…kr}. EDP was extensively discussed and studied in polymath5. Here is the link of the first post. Here are links to all polymath5 posts.  Here is the link to polymath5 [...]

  44. EDP22 — first guest post from Gil Kalai « Gowers's Weblog Says:

    [...] sequences where grows slowly. EDP was extensively discussed and studied in polymath5. Here is the link of the first post. Here are links to all polymath5 posts. Here is the link to polymath5 wiki. A sequence is called [...]

  45. Thomas Says:

    If you assumed that there exists an infinite sequence such that C=2, could you prove that the sequence had a given sub-sequence? It obviously has all length 2 sequences as sub-sequences, and it has + + – and – - + as sub-sequences, but is it possible to prove that the sequence has other sub-sequences?

  46. Win Five Hundred Or A Million Dollars, By Recursion? | Gödel's Lost Letter and P=NP Says:

    [...] sums stay within . For a proof of this and much more information about the EDP, see this original post by Timothy Gowers, a more-recent post by Gil, and this Polymath [...]

  47. Thomas Says:

    Have we tried to prove that the drift must be at least 4? We could make it easier by trying to prove that the drift is at least 4 or the discrepancy is at least 3. I will work on it.

    • Thomas Says:

      Never mind, I took a look at the 1124 sequence, and the drift of 4 doesn’t happen for the first 100 or so, so it would be really hard to prove it must eventually happen.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.

Join 1,423 other followers

%d bloggers like this: