Erdős’s discrepancy problem as a forthcoming Polymath project

This is an emergency post, since the number of comments on the previous post about Erdős’s discrepancy problem has become unwieldy while I’ve been away enjoying a bit of sunshine in Luxor. My head is full of amazing details from the walls and columns of ancient Egyptian tombs and temples. The main thing I didn’t see, or at least not until I finally saw them sticking up through the mist from my plane window yesterday morning, was the pyramids, since those are near Cairo. However, I didn’t feel too bad about that as my guide book, the Lonely Planet guide to Egypt, assured me that the pyramids never fail to disappoint (which was just one of many little gems of bad writing in that book).

For the first time for ages, I was completely away from all email and internet for a week, but just before I left, the results of the polls, as they then stood, about the next Polymath project were already suggesting that the Erdős discrepancy problem was the clear favourite, so I couldn’t help thinking about it a certain amount while I was in Egypt. I said that the process of choosing the next problem was not going to be fully democratic, but the fact is that I love the Erdős discrepancy problem and am currently somewhat grabbed by it, so I see no reason to go against such a clear majority, especially as it also has a clear majority of people who say that they are ready to work on it.

Now that I’m back, I see that the longest sequence yet found with discrepancy at most 2 in any arithmetic progression of the form $\{d,2d,3d,\dots,nd\}$ has gone up to 1124, and is exhibiting some clear structure. So yet another argument for choosing this particular project is that it has in a sense already begun. I’d like to regard what is happening now as a kind of preliminary stage before the true launch. The fact that such long sequences with low discrepancy can exist, and the fact that they appear to have to have some structure, are two extremely helpful pieces of information, since they place very definite constraints on what any proof could look like. For example, it no longer seems all that likely that the true bound for the best possible discrepancy of a sequence of length $n$ is logarithmic in $n$, and even if it is, the proof will not be some easy induction if the bound is logarithmic to a very high base. And I don’t think one can rule out the possibility that there exists an infinite sequence of bounded discrepancy: at any rate, it seems foolish to stop trying to find longer and longer sequences, since there is almost certainly more that we can learn from them.

There are a couple of terminological decisions I’d like to make collectively before we start properly. Then main one is what we should call an arithmetic progression of the form $\{d,2d,\dots,nd\}$. I’ve never managed to come up with a good answer to this, and the lack of a good phrase for it is annoying when writing about the Erdős discrepancy problem, so any suggestions would be very welcome. [While writing this post, I found a description of the problem that called them homogeneous arithmetic progressions. That seems a pretty good solution to me, so perhaps we should go for that unless someone comes up with a clear improvement.] A more minor decision is how to refer to the problem itself. I think I favour EDP, since three-letter acronyms seem somehow right. The only argument against that is that we talked about DHJ rather than DHJP (or DHJT), but I think that is outweighed by the catchiness of EDP: ED just doesn’t do it for me in the same way. A yet more minor decision is what number to give to this project. I’m pretty sure it should be polymath5 (following on from Terry Tao’s polymath4, which was the search for a deterministic algorithm for finding primes). But before giving it that tag I thought I’d quickly check in case anyone thinks I’m wrong. Another decision I’d like to make, since not having made it clearly yet is another source of slight annoyance when I try to write about the Erdős discrepancy problem, is what counts as a positive answer. I’m not sure what Erdős actually conjectured, but I’d like to assume that the conjecture is that all sequences have unbounded discrepancy. Therefore, if I refer to a positive answer, that is what I mean, and a counterexample is an infinite sequence with bounded discrepancy. Another practical question: is there some way of displaying good examples so that their structure can be more easily seen? At the very least it might be good to put them in tabular form, perhaps with each row of some highly composite length such as 24. If I just see a long line of pluses and minuses and I want to know what the subsequence $x_{23},x_{46},x_{69},\dots$ is like, then it is extremely tedious to find out. I don’t know what WordPress’s support for tables is like, though. If anyone can help me to display the sequence of length 1124 (and others) in a nice way, I’d be extremely grateful.

At some point reasonably soon, I plan to write a post that will describe various approaches to the problem that do not work, in the hope of stimulating a discussion about what kind of approach could conceivably work — which at the moment is not at all clear to me (at least if the conjecture is true). That post, rather than this one, will be the “official launch” of the project, and it is at that point that I shall start working seriously on the problem.

For the benefit of anyone who does not want to wade through well over a hundred comments on the previous post, here is a very quick summary of what we know now that we did not know when I put up the post. The main thing is that it is possible to have extremely long $\pm 1$ sequences of discrepancy at most 2 (meaning that the sum over any homogeneous arithmetic progression has modulus at most 2). But there is more to say. A simple observation that I mentioned in the previous post is that multiplicative sequences, by which I mean sequences $(x_n)$ such that $x_{mn}=x_mx_n$ for every $m$ and $n$, are good candidates for low-discrepancy sequences, since the discrepancy of such a sequence is bounded by the largest size of any of its partial sums, which sort of means that there is less to check.

Of course, that is not a very convincing argument, because the price we pay for having less to check is a huge constraint on the sequence: we are now free to choose its values only at primes and everything else is determined. And the initial experimental evidence quickly showed that multiplicativity was indeed too strong a constraint if one is looking for the longest possible sequence with a given discrepancy, since the longest sequence of discrepancy 2 is much longer than the longest multiplicative sequence of discrepancy 2. However, what the examples given in the comments on the previous post are suggesting is that a weaker form of multiplicativity may well still hold, perhaps in some approximate sense. We can characterize a multiplicative $\pm 1$ sequence as a sequence $(x_n)$ that starts with $x_1=1$ and has at most two distinct subsequences of the form $(x_{dn})$ (one of which will be minus the other — there will be exactly two unless the original sequence is 1 everywhere). The weaker property is simply that there should be a small number of distinct subsequences. The very long examples don’t quite exhibit this, but they do show something a bit like it: if you look at the subsequences of the given form, then there are six beginnings that appear a lot. This has led to a suggestion that another way of building good examples is to take an Abelian group $G$, to choose a function $f:\mathbb{N}\rightarrow G$ such that $f(mn)=f(m)+f(n)$ for all $m$ and $n$, and to compose that with a function from $G$ to $\{-1,1\}$. In the light of what we have seen, a good choice for $G$ appears to be $C_6$. It turns out that if a $\pm 1$ sequence $(x_n)$ has only finitely many distinct subsequences $(x_{dn})$, then it must have a subsequence of this form.

It doesn’t look as though the example of a sequence of length 1124 is of this form for the group $C_6$, but it certainly does seem that we can get some understanding of the sequence (which we do not yet fully have) by looking at it with $C_6$ very much in mind. If the conjecture is false, then it seems possible that to produce an infinite sequence of bounded discrepancy one would start with a long finite sequence produce with the help of a small group, and one would then introduce complications that would eventually be explained by the presence of a much larger group, and one would iterate this process. (That is intentionally vague — I do not know how to make it less so.)

Incidentally, the account of the problem I alluded to above, by Josh Cooper, gives it as an open problem whether a multiplicative sequence can have bounded discrepancy. I have some dim memory of being told by a number theorist once that it couldn’t: does anyone know? The most obvious $\pm 1$-valued multiplicative sequence, the Liouville function, defined to be -1 raised to the power the number of prime factors of n, has partial sums that grow at rate that is, I think, known to be at least in the rough region of $n^{1/2}$ and at most at this sort of rate if and only if the Riemann hypothesis is true. [Added later: I now know that both these statements are correct.] But what about an arbitrary multiplicative sequence? We know from the Walters base-3 sequence that the partial sums can grow quite slowly, so perhaps this question really is also unsolved, in which case it would make an excellent auxiliary project: a strictly easier question that now appears to be highly relevant to the main question.

125 Responses to “Erdős’s discrepancy problem as a forthcoming Polymath project”

1. Sune Kristian Jakobsen Says:

“It doesn’t look as though the example of a sequence of length 1124 is of this form for the group $C_6$
I think it is worth noticing that if you pass to the subsequences $(x_{2n})$ or $(x_{3n})$ you’ll get rid of many (if not all) the sporadic sequences.

2. gowers Says:

Maybe someone can answer the following question, about which I do not yet feel entirely clear, though I might be able to work it out from a closer look at the relevant comments on the earlier post. Is the claim that there are just six (major) ways in which sequences of the form $x_d,x_{2d},\dots$ begin, or are we talking about the entire subsequences? If the former, how far does one typically go before the phenomenon stops and the sequences that start out the same become different?

The reason I ask is that I’m wondering whether one can argue that some kind of group structure is highly likely to appear. The rough reasoning would be as follows. If it is hard, or at least not very easy, to produce long sequences of discrepancy 2, then there are probably not all that many ways that such a sequence can start. But if that is the case, then quite a lot of what has been observed is fairly forced. For instance, here is a simple result: if there were a unique infinite sequence with $x_1=1$ and discrepancy 2, then it would be forced to be multiplicative (since any subsequence of the form $x_d,x_{2d},\dots$ would have to be either the original sequence or minus the original sequence). So it seems that in a certain sense, the more difficult it is to produce examples, the more those examples have to have multiplicativity properties, which suggests that if the conjecture is true, then the best examples should indeed have a multiplicative nature. So, oddly enough, the existence of these interesting examples could be taken as weak evidence that the conjecture is true.

• Mark Bennet Says:

There look to be six major ways in which the subsequences can begin. These are
(d, code, length of subsequence, start of subsequence starting with position 1)

2, 2674, 562, -1, 1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1
3, 1421, 374, 1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1
4, 1205, 281, 1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1
5, 2890, 224, -1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1
8, 2892, 140, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1
10, 1203, 112, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1

The value of d is the smallest for which the particular sequence appears. The code is a binary one: convert -1 to 0 and read the earliest members of the sequence as the least significant digits (not sure this is the best coding, but it was what worked at the time).

Working on longer subsequences is possible, but for higher value of d there are divergences at the tail end, and it is not yet possible to say whether these are necessary. If this were the case the 6 sequences might become 12, for example as each sequence diverges into two possibilities. Given that there are two choices at each stage, it is natural to expect a split into two.

The alternative is that the long sequence needs adjusting to extend it, and once adjusted the regular pattern will persist.

[eg the 15th value in the subsequence for d=56 is at position 840, and previous major extensions to the sequence have involved adjustments that far back]

The existence of the factor 3 in a process which seems to have such a clear binary nature intrigues me. It looks as though the subsequences for d=4 and d=10 have a lot in common, but they are different and behave differently, and diverge further.

One matter worth considering is whether early divergences in the case of d=prime which give rise to some of the sporadic effects eventually disappear (so that eventually the sequence becomes identical with some canonical one).

I am not sure that the sporadic elements do disappear on passing to “regular” subsequences – that is a conjecture, which would require proof.

• Mark Bennet Says:

Working with 1124 sequence

Subsequence difference 2 is the same as the negative of subsequence 3 up to position 71 [71 is a sporadic prime, and this is a significant point since this affects position 213 of the main sequence]

Subsequence 4 is the negative of subsequence 5 as far as it goes and 6 is the same as 5 up to and including position 175. (4 and 6 – cf 2 and 3), 9 is the negative of 5 up to position 105 and of 40 to position 27 (as far as it goes)

Subsequence 7 is the negative of subsequence 1 up to position 60 [corrected from previous comment]

Subsequence 8 is the negative of 10 up to position 80 and of 12 up to position 88 (cf relationships between sequences 4,5,6). It is the same as subsequence 15 up to position 73, and the negative of 27 up to position 40 (and the same as 1 up to position 47).

Subsequence 16 is the negative of sequences 20 and 24 (up to 43)

Most of these divergences are out beyond f(800) except for the one between 1 and 7 – but the main sequence and 7 are sporadic approximating the sequences for 8 and 10 respectively.

So, to answer your question, we could very well be looking at entire subsequences.

• Mark Bennet Says:

Sorry, subsequence 8 diverges from the main sequence in positions 1, 7, 47, 49, 53, 61, 71, 73, 94, 98, 107, 112, 116
[I suggested above that the first divergence was at position 47]

These positions have a strong relationship with the values of d for sporadic sequences.

3. gowers Says:

Thanks for all that information, which I am getting to grips with. Apologies if I repeat things that others have said, but let me do so in case it helps anyone else who might be reading this. Looking at those six sequence starts, one can get a cyclic structure on them by means of the operation of taking every third element. Denoting each sequence by its starting element, this gives us the permutation (2 5 8 3 4 10). From this it is clear that one should expect 2 and 3, 4 and 5, and 8 and 10 to form negative pairs.

It seems a good idea to try to factor this sequence into a map from $\mathbb{N}$ to $C_6$ and a map from $C_6$ to $\{-1,1\}$. Without loss of generality, we can map 3 to 1 (where 3 is a natural number and 1 is the generator of $C_6$). Of course, we also map 1 to 0. And 2 maps to 4, since the 2-sequence is obtained from the 3-sequence by multiplying everything by 3 three times, which translates into adding 1 three times to 1. For similar reasons, 5 maps to 5. By multiplicativity we then know that 4 maps to 2, 8 maps to 0 (which corresponds to Mark’s observation that the 8-sequence is more or less the same as the original sequence) and 10 maps to 3. We can also get those more directly.

It would be nice if someone could produce a table of the values associated with each prime. Just to clarify what I mean here, for each prime p I’d look at the sequence $x_p,x_{2p},x_{3p},\dots$ and see which of the six fundamental sequences it corresponded most closely to. Then I’d define $f(p)$ to be the element of $C_6$ corresponding to that sequence. It would then be nice to write out the whole function that results from these assignments and multiplicativity (where again that is the now the property $f(mn)=f(m)+f(n)$). I’m also intrigued to know whether it connects in any surprising way with Alec’s hexagonal investigations.

One can also see what the map from $C_6$ to $\{-1,1\}$ is by looking at the initial values of the sequences numbered 3,4,10,2,5,8, which should also be the values taken at the first six powers of 3, with the caution that subsequences 8 and 10 differ from what they “should be” in the first position. This gives us the map that takes 1,2,3,4,5,0 to 1,1,-1,-1,-1,1. This is pretty close to what Sune Kristian Jakobsen was saying, though it doesn’t seem to be quite identical.

Yet another question, which might be answerable by brute force. If we insist on producing a sequence by first mapping multiplicatively to $C_6$ and then composing that with the map just defined in order to get a $\pm 1$ sequence, then what is the best we can do? In particular, can we do substantially better than we can do with a multiplicative sequence? (We can do at least as well by just mapping everything to 0 and 3 in $C_6$, which will result in a multiplicative sequence.) Has anyone tried that?

It’s also tempting to be slightly more general and compose a multiplicative homomorphism from $\mathbb{N}$ to $\mathbb{T}$ with the map that takes complex numbers with positive imaginary part to 1 and those with negative imaginary part to -1 (and takes 1 to 1 and -1 to -1). Here is a situation where working with the positive rationals might be rather nicer, since they form a group under multiplication, so we could talk about a character in the usual sense. The idea here would be that each sequence obtained by restricting to a homogeneous progression would be a sort of “rotation” of each other one, so if the original sequence was sufficiently well-distributed then we’d be completely done and have a counterexample.

• Mark Bennet Says:

Here is a table of primes – the sporadic primes seem to behave quite like the regular ones, so I’ve allocated these to the sequence to which they seem to belong. The primes 47, 71, and 73 are conjectural. 7 is a potential counterexample to regularity.

2, 37
3, 17
4, 29, 41 [sporadic: 53, 61]
5, 31, 43, 67, 103 [sporadic ?47]
8, 11, 23, 79, 97 [sporadic: 1, 49, ?73]
10, 13, 19, 59, 83, 89, 101 [sporadic: 7, ?71]

I like the idea of working with the rationals and with characters.

• gowers Says:

Let me tabulate that in a different way: f(2)=4, f(3)=1, f(5)=5, f(7)=3?, f(11)=0, f(13)=3, f(17)=1, f(19)=3, f(23)=0, f(29)=2, f(31)=5, f(37)=4, f(41)=2, f(43)=5, f(47)=5??, f(53)=2?, f(59)=3, f(61)=2?, f(67)=5, f(71)=3??, f(73)=0??, f(79)=0, f(83)=3, f(89)=3, f(97)=0, f(101)=3.

Now I’ll put together a sequence that’s the multiplicative function from $\mathbb{N}$ to $C_6$ that you get by assigning these values to the primes. I’ll arrange it in rows of 10, to make it clearer what each number is the image of. I’ll go up to 50.

0, 4, 1, 2, 5, 5, 3, 0, 2, 3
0, 3, 3, 1, 0, 4, 1, 0, 3, 1
4, 4, 0, 1, 4, 1, 3, 5, 2, 4
5, 2, 1, 5, 2, 4, 4, 1, 4, 5
2, 1, 5, 2, 1, 4, 5, 5, 0, 2

Next, I’ll map that to $\{-1,1\}$ by mapping 0,1,2 to 1 and 3,4,5 to -1.

+ – + + – - – + + -
+ – - + + – + + – +
- – + + – + – - + –
- + + – + – - + – -
+ + – + + – - – + +

Encouragingly, the partial sums of this sequence are well-behaved. So are the sums along multiples of 2, 3, 4, 5 and 7, which implies (since that covers all six possible subsequences) that it gives us an example of a sequence that works up to 50 and for all I know will work quite a bit further.

• Mark Bennet Says:

I reckon that this scheme requires an adjustment (anomalous value) at position 112 (=7 x 16).

• Mark Bennet Says:

And there seems to be an issue with the 7 sequence generally, though it is easy to get out to 182 by altering values at 112 and 122.

The scheme seems to have a long-term bias towards the value -1, so new primes have to come in at +1 more often than not. There also look to be issues with the 61 sequence.

The other previously sporadic primes look to be under control, but the multiples of 2 and 3 tend to blow up, suggesting that there will need to be some variation/anomalous values in these sequences.

• Alec Edgington Says:

Tim, your sequence works all the way up to $121$, if we further set $f(103) = 5$ and $f(107) = f(109) = f(113) = 0$.

• Alec Edgington Says:

One can get rather further with this type of sequence, as follows:

f(2) = 3; f(3) = 3; f(5) = 0; f(7) = 3; f(11) = 0; f(13) = 4; f(17) = 5; f(19) = 0; f(23) = 3; f(29) = 2; f(31) = 5; f(37) = 1; f(41) = 2; f(43) = 5; f(47) = 5; f(53) = 5; f(59) = 0; f(61) = 2; f(67) = 5; f(71) = 5; f(73) = 0; f(79) = 0; f(83) = 0; f(89) = 3; f(97) = 3; f(101) = 3; f(103) = 3; f(107) = 3; f(109) = 0; f(113) = 0; f(127) = 3; f(131) = 0; f(137) = 3; f(139) = 0; f(149) = 0; f(151) = 3; f(157) = 0; f(163) = 0; f(167) = 3; f(173) = 0; f(179) = 3; f(181) = 3; f(191) = 0; f(193) = 0; f(197) = 3; f(199) = 3; f(211) = 3; f(223) = 3; f(227) = 3; f(229) = 0; f(233) = 0; f(239) = 0; f(241) = 0; f(251) = 0; f(257) = 3; f(263) = 0; f(269) = 0; f(271) = 3.

However, one can’t get much further than with a purely multiplicative sequence. In other words, with this particular map $C_6 \rightarrow \{-1, 1\}$, factoring the sequence through $C_6$ is only slightly better than factoring through $C_2$.

• Alec Edgington Says:

Apologies — I made a mistake in my program. It’s quite possible you can get longer sequences. I’ll correct the program and try again!

• gowers Says:

It’s striking that there are large numbers of 3s and 0s there, and in particular that from a fairly early point on every single choice is a 3 or a 0. Do you have an explanation for that?

• gowers Says:

I was writing my comment before seeing your last one, so I’ll wait for the new data since it seems that my question may not apply after all.

• Alec Edgington Says:

Yes, that was my mistake: I thought that for primes $p \geq \sqrt{N}$ (where $N$ is the limit of the search), it only mattered whether $p$ mapped to $1$ or $-1$. Of course, this is only the case for primes above $N/2$.

• Mark Bennet Says:

Alec

For p greater than the square root of N, all the multiples of p up to N will also belong to earlier progressions, which are already determined. For p of any reasonable size these values will determine the class to which p belongs.

• gowers Says:

Mark, I don’t understand what you mean here. If, say, p=89, then in what sense is 178 “already determined”?

• Mark Bennet Says:

My mistake – I was going back to the era of my thinking when the subsequences were ‘given’ rather than constructed.

It would, though, be possible for the value at 178 to be forced because it is in the 2 subsequence, which will be complete up to 176, and if the sum along the 2 subsequence to 176 is 2 or -2 the value at 178 will have to be -1 or 1 respectively.

• gowers Says:

That’s an interesting point, since it could in theory speed up the search for multiplicative examples: after choosing the values at all primes up to n, one would fill in all values that were forced by multiplicativity, and then search for further values that were forced because they lay at the end of a homogeneous AP where the sum was 2 or -2, followed by further values that followed from multiplicativity, and so on until the process stopped. And one would then have a free choice for the smallest prime that had not yet been filled in. It may be that Alec has already been doing something like this, but it may also not have been worth his while. Unfortunately, it seems less likely to be all that helpful for the more sophisticated going-through-$C_6$ examples.

• Mark Bennet Says:

Tim

Looking at the prime 89 we need to know whether it maps to 0, 1 … 5. This is not determined by its first value, so if the value at 178 is forced – and some later values too, these may help, I think.

• Sune Kristian Jakobsen Says:

“It’s also tempting to be slightly more general and compose a multiplicative homomorphism from $\mathbb{N}$ to $\mathbb{T}$ with the map that takes complex numbers with positive imaginary part to 1 and those with negative imaginary part to -1 (and takes 1 to 1 and -1 to -1).”

I will call the above function from the complex numbers to $\{-1,1\}$ for $s$ (s for sign).
Lets call a finite multiset $A$ with elements from $\mathbb{T}$ $C$-balanced (or just balanced) if for any $u \in \mathbb{T}$ the multiset $s(uA)$ contains almost the same number of 1′s and -1′s (so the difference between the number of 1s and -1s is at most C).

What can we say about balanced sets? It is easy to construct a set with sum 0, that is not balanced (choose i and a lot of numbers just below 1 and -1 ) but will every balanced set have sum close to 0?

Lets call a (finite or infinite) sequence $C$-balanced (or balanced) if for any n the multiset of the first n elements in the sequence is C-balanced.

What can we say about balanced sequences?

The motivation for the definitions is, that if we can find a infinite balanced multiplicative sequence, then s taken on this sequence is a counterexample to the conjecture.

• Sune Kristian Jakobsen Says:

To answer my own question: The sum of the elements in a $C$-balanced set is at most $C$. Sketch proof: Assume WLOG that the sum is real and positive. Because the set is C-balanced, we can prove that the number of elements with real part $>x\in (0,1)$ is almost C greater than the number of elements with real part $< -x$. So the sum of the real part of the element with (C+i)th greatest real part and the real part of the element with i'th smallest real part is negative. So the sum of all the real parts is less than the sum of the C greatest real parts.
QED

This implies, that if the partial sums of a sequence is unbounded, it cannot be balanced, and we cannot use it in the way Gowers suggested. So if we can prove that for any sequence that takes values in $\mathbb{T}$ and any C, the sequence has a HAP (homogeneous arithmetic progression) with a sum greater than C in absolute value (e.g. using the approach Gowers suggested here:
http://gowers.wordpress.com/2010/01/06/erdss-discrepancy-problem-as-a-forthcoming-polymath-project/#comment-4758 ) we would have to:

1) Use a different function from $\mathbb{T}$ to {-1,1} or
2) Let $G$ be a non-cyclic group or
3) Abandon the idea of using a multiplicative function from N to a group G and compose it with a function from G to {-1,1}.

• Sune Kristian Jakobsen Says:

“is almost C greater”:
“almost” should have been “at most”

4. Henry Segerman Says:

I don’t know how useful this might be as a means of visualising a sequence, but perhaps this will inspire some further ideas:

http://www.ma.utexas.edu/users/henrys/polymath/fibspiralKlas717.pdf

The number n is plotted at radius Sqrt[n] and angle n*phi, where phi = (Sqrt[5] – 1)/2. The colouring is based on Klas Markström’s sequence of length 717 from http://gowers.wordpress.com/2009/12/17/erdoss-discrepancy-problem/#comment-4646

Red is +1, blue is -1. I’ve also drawn on spirals which hit the numbers d*n, for the cases d=5 and d=83. For d=5 and near the center it is easy enough to follow along and tally the sums. Further out, and for d=83 it isn’t so appealing.

I can tidy up and post the Mathematica code if anyone is interested.

• Alec Edgington Says:

I like the idea of representing a sequence as a spiral, as it may show up interesting patterns. However, I suggest that for normal posts of interesting sequences a format that is easily cut-and-pastable into a program (for analysis) would be best. I’d hope that WordPress tables would be fine for this; otherwise (or if tables prove to be a lot of work to enter), just a convention of putting line breaks after every 24 (or 30) numbers, and ensuring that the numbers align in a fixed-width font (for example, by inserting plus signs as well as minus signs), would work.

5. Chandra Says:

In combinatorial optimization EDP refers to the well-known edge-disjoint paths problem.

6. Kristal Cantwell Says:

I think you are right this should polymath5. Polymath3 is scheduled to start in April of 2010 according to http://gilkalai.wordpress.com/2009/12/08/plans-for-polymath3/#comment-2244.

7. Jason Dyer Says:

“Homogeneous arithmetic progressions” seems to be the accepted term for arithmetic progressions containing 0, see for example this paper on quasi-arithmetic progressions:

http://www.combinatorics.org/Volume_15/PDF/v15i1r104.pdf

(It’s possibly a relevant paper to our problem, since it proves the conjecture false in the case of quasi-arithmetic progressions.)

8. Polymath 5 « Euclidean Ramsey Theory Says:

[...] The topic has been chosen for polymath 5. it is Erdős’s discrepancy problem. See here for more [...]

9. Terence Tao Says:

Hi, just dropping by to confirm that to my knowledge, Polymath5 is not “reserved” by any other project.

Granville and Soundararajan have made some study of the discrepancy of bounded multiplicative functions. The situation is remarkably delicate and number-theoretical (and is closely tied with the Granville-Soundararajan theory of pretentious characters). See for instance http://arxiv.org/abs/math/0702389 .

I am unfortunately so swamped with other work right now that I will probably not be able to contribute in any meaningful manner for at least a month, but I certainly give this project my moral support ;-). Also, one should open up a page for this project on the wiki as soon as possible, it seems like there is a lot to put on there already. I hope someone else other than myself will be willing to take the initiative to do this, otherwise it won’t get done for ages…

• gowers Says:

Thanks for that reference, and also for reminding me about setting up a wiki page, which I had (slightly surprisingly, given what an integral part it was of DHJ) not thought about. I’ll get that done before the project starts properly. If nothing else, it would be good to have a place with some interesting and well-displayed sequences, and other experimental data too.

• Terence Tao Says:

It may also be good to try to have a parallel discussion thread to accompany the research thread, as I can imagine that there are going to be a number of meta-issues to discuss which will involve people (like myself) who are not going to be able to follow the flurry of activity on the research thread.

10. Kevin O'Bryant Says:

In the U.S., there are an insane number of commercials advertising medicines to treat “ED” (google it if you aren’t sure what it is!). So many, in fact, that I suspect we might find trouble with spam filters, and I certainly couldn’t tell anyone that I was working on “ED”.

I vote for “EHD” as the name for the problem, as in Erdos’ Homogeneous Discrepancy problem.

11. Kevin O'Bryant Says:

Is it really more natural to look at the discrepancy in the positive integers than in the nonnegative integers? Including 0, of course, only changes the discrepancy by at most 1, but it doesn’t always change the discrepancy at all. This means that the extremal sequences will be different, and may exhibit more structure. Or less, or the same.

• Alec Edgington Says:

I wondered about this too. In particular, it could be worth investigating sequences of discrepancy at most $2$ in this arena, as they may reveal different structures.

• Klas Markström Says:

Do you man that you want to look at APs which begin 1,d,2d….?

• Alec Edgington Says:

No: APs that begin at zero.

• Mark Bennet Says:

Interesting. This changes the parity considerations, which seem to have a powerful effect.

Assume C=2. Suppose we have a finite sequence f(n) starting with f(0)=1. Take g(2r) = f(r) and try to fill in the gaps. The d=2 sequence is under control, and the d=1 sequence can always be controlled (by parity).

Thus it remains to manage the other sequences, for which there is some inherent flexibility.

• Alec Edgington Says:

If one asks for discrepancy at most $2$ with respect to all zero-based HAPs, one is much more constrained. I believe the longest attainable sequence has length $84$ (in other words, one can’t get beyond $x_{83}$). Here is an example of a maximal sequence:

+1, +1, -1, +1, -1, +1, -1, -1, +1, -1, -1, +1,
+1, -1, +1, -1, -1, +1, +1, -1, +1, +1, -1, +1,
-1, -1, +1, -1, -1, +1, +1, -1, +1, -1, -1, +1,
-1, -1, +1, +1, -1, +1, -1, +1, +1, -1, -1, +1,
+1, -1, +1, -1, -1, -1, +1, -1, +1, +1, -1, +1,
-1, -1, +1, +1, -1, +1, +1, -1, +1, +1, -1, -1,
-1, +1, +1, -1, -1, +1, -1, -1, +1, +1, -1, +1

The set of such sequences is small (there are $432$), so it may be interesting to analyse it for symmetries.

• gowers Says:

Well that one you’ve put up is certainly quite interesting. Here are the first twelve multiples (starting at 0) of 1, 2, 3, 4 and 5:

+1 +1 -1 +1 -1 +1 -1 -1 +1 -1 -1 +1
+1 -1 -1 -1 +1 -1 +1 +1 -1 +1 +1 -1
+1 +1 -1 -1 +1 -1 +1 +1 -1 -1 +1 -1
+1 -1 +1 +1 -1 +1 -1 -1 +1 -1 -1 +1
+1 +1 -1 -1 +1 -1 -1 +1 -1 -1 +1 -1

If you compare these, you find that the multiples of 1 and 4 are very similar, though not quite identical, and the same goes for the multiples of 2, 3 and 5, which are similar to minus the multiples of 1 and 4. In short, the sequence is struggling to be multiplicative, but not with complete success.

In fact, the sequence is pretty close to the Walters example: if you look at integers that are 1 mod 3 then their images are almost all -1, while those that are 2 mod 3 have images that are almost all 1. Similarly, numbers that are 3 mod 9 almost all go to 1 and numbers that are 6 mod 9 almost all go to -1. (In fact, they all do this for a long time but the pattern breaks unexpectedly towards the end.)

• gowers Says:

Potentially, at least, there is a convincing argument for why it is so much harder to find sequences if you define HAPs to start at 0. If, as the experimental evidence is suggesting, we are getting a group structure out of the sequences, and if that group has even order, then we should expect there to be some d such that the values of $x_d,x_{2d},\dots$ are minus the values of $x_1,x_2,\dots$. If we really did have two sequences that were exact negatives of each other, then in order to get $C=2$ with zero included, we need to confine the partial sums of that sequence, if we don’t start at zero, between -1 and 3, and also between -3 and 1. In other words, we need to obtain a $C=1$ sequence in the old sense. Of course, what actually happens is that the sequences aren’t exact negatives of each other near the beginning, but it’s at least believable that including 0 is going to create difficulties.

• Alec Edgington Says:

The first thing to note about the $432$ maximal sequences of zero-based discrepancy $2$ is that they are quite similar! In fact, modulo negation of the whole sequence, they only differ at the points:

$\{ 29, 31, 53, 58, 59, 61, 62, 69, 71, 73, 75, 77, 79, 81, 83\}$

In particular, fixing $x_0$ forces the sequence up to $x_{28}$.

• Alec Edgington Says:

The second thing to note is that we have complete freedom to choose the values at $0$, $29$, $59$ and $83$. With these values set, there are $27$ possible sequences, which differ at the points:

$\{ 69, 71, 73, 75, 77, 79, 81\}$

12. Kevin O'Bryant Says:

Fix an odd prime p, and let f map $[p-1]$ into +1,-1, so that $\sum_{i=1}^{p-1} f(i)=0$. Define the cyclic discrepancy $D_p(f)$ to be the maximum of $|\sum_{i=1}^n f(id \mod p)|$, with the maximum being taken over all n, d in $[p-1]$. Let $D_p$ to be the minimum of $D_p(f)$ taken over all functions f.

Now take an f so that $D_p(f)=D_p$ and so that the worst discrepancy is along the $d=1$ progression.

Then we can make a sequence $a(n)$ over the naturals by setting $a(n)=f(n \mod p)$ if n is not a multiple of p, and $a(n)=f(n/p)$ if n is a multiple of p. This sequence will have logarithmic discrepancy. In particular, it should have discrepancy around $D_p \log_p(n)$.

So then, what is the smallest value of $D_p / \log(p)$? If this quantity goes to 0, then we probably have accomplished (using a compactness argument, but I haven’t done it) a sub-logarithmic discrepancy. My intuition is that it achieves it smallest value at $p=5$, unfortunately.

• gowers Says:

We investigated something pretty like this here and in the surrounding comments. It seems that things are definitely not optimized at $p=5$. (In the example we looked at there, we had an example with $p=211$ and $D_p=2$, achieved with a multiplicative function $f$.)

• Kevin O'Bryant Says:

Ahh, wonderful. That’s exactly what I was suggesting.

I think the analysis of the example there is flawed though, and doesn’t have $D_{211}=2$. For example, the progression with difference 209 isn’t automatically handled by the way we truncated and extended the sequence, so the discrepancy could be much larger.

Specifically, with p=11 and the seed sequence
f=+ + – - – + – + – +
(which has normal discrepancy 3 but cyclic discrepancy 4), we get an F sequence with F(9), F(18), F(27), F(36) being f(9), f(7), f(5), f(3), which are all -. Also F(2), …, F(12) shows a new difficulty.

Am I missing something about that example? It doesn’t seem that it coming from a completely multiplicative f is relevant, since that property won’t survive the truncation and periodic extension.

• gowers Says:

Yes you’re right: I stupidly didn’t spot that mod-211 multiplicativity was needed, which is a much stronger property.

My instinct is now the same as yours. I think $D_p$ probably has to be around $\sqrt{p}$, so one has to look at small $p$ only. The way I would attempt to prove that $D_p\approx\sqrt{p}$ would be to find a Fourier coefficient of about that size and then do some averaging tricks to get from a trigonometric function to a progression mod p. (I’m not using multiplicativity, but I don’t think I need to.)

• Kevin O'Bryant Says:

$D_p$ goes to infinity. Proof: Every AP in ${\mathbb Z}/p{\mathbb Z}$ is a subprogression of a homogenous AP (dilate by the inverse of the difference). Since the van der Waerden number $W(2k+2)$ is finite, for any prime larger than $W(2k+2)$ must have discrepancy at least $k+1$.

Unfortunately, van der Waerden numbers grow to rapidly.

• obryant Says:

Claim: $D_p \geq c p^{1/4}$. Proof: By Roth’s lower bound for the nonhomogeneous discrepancy, there is some AP with discrepancy at least $c p^{1/4}$. Since any AP in $\{1, 2, \dots, p-1\}$ is a sub-AP of a homogeneous mod-p AP (with the same difference), we get $D_p \gg p^{1/4}$.

13. gowers Says:

I have now started, in a very small way, a wiki page for Polymath5. In due course I shall add to it, but anybody else is of course welcome to do so as well. (You have to register, as we’ve had problems with spam, but the registration process is easy.) It would be particularly good to have nice tables of some of the sequences that people have been producing.

I’ve put a link to the wiki from the main page of this blog, so you don’t necessarily have to find this comment again if you want to visit the wiki.

• Jason Dyer Says:

Mainly because I had to kept digging for the link, I added a “annotated bibliography” stub with a link to our (as of the moment) most important paper.

As a gentle request, could we keep the bibliography annotated? One of the issues with the bibliography of polymath1 is it started to become unclear what each particular reference was used for.

14. gowers Says:

A couple of thoughts about multiplicative sequences and the like. First, let me try to describe a not fully precise algorithm for producing good sequences: the idea is that it would be something to investigate theoretically rather than use as a basis for computational experiments, though perhaps the latter could be done too.

To choose a multiplicative sequence, one just chooses its values at each prime. But might there be a sort of semi-greedy algorithm for doing this? The idea would be that, having chosen the values at primes $p_1,\dots,p_k$, which are not necessarily the first k primes, one would then have a look at everything that is implied by multiplicativity, try to identify “areas of danger” and make further choices to alleviate the danger. For example, if one found an interval that contained many more 1s than -1s, one might choose $p_{k+1}$ in such a way that a multiple of $p_{k+1}$ lay inside that interval, and choose the value at $p_{k+1}$ so that the value at the multiple would be -1. At any one moment there might be quite a number of competing constraints, but for large n the constraints would be quite weak (because the set of places where the values had been chosen would be quite sparse). It seems to me at least possible that some cleverly designed procedure could produce multiplicative sequences that do better than the logarithmic growth rate.

To turn that thought into an algorithm, one would need to think carefully about how to ascribe danger levels to intervals. (An extreme example would be something like that if $C=2$ and your choices so far imply that f(p-1)=f(p-2)=f(p+1)=f(p+2)=1, then there’s a super-dangerous interval {p-2,p-1,p,p+1,p+2} containing p, so we’re forced to choose f(p)=-1. But the idea of this quasi-algorithm is to start getting anxious before one’s moves are completely forced. Erdős himself had some results about game strategies based on this kind of idea: one could perhaps attach to each unspecified point in the sequence a number that measured the “pressure” that number felt to be assigned a value, where the sign of the number would tell you whether the value should be 1 or -1. Then one would try to choose values that relieved as much pressure as possible. Forced moves would correspond to infinite pressure, or perhaps just pressure that was so large that it swamped everything else.)

The other idea is that we might be able to get somewhere by aiming first for a weaker, but quite natural, target. The way discrepancy is measured, there is a sharp cutoff: a homogeneous AP reaches the end and then suddenly stops. One could smooth this out as follows. Instead of looking at homogeneous APs, let’s look at functions of the following form. You pick a constant s>0 and a positive integer d. You then define f(nd) to be $n^{-s}$ and set f to be 0 everywhere else. Finally, you define the discrepancy of a $\pm 1$ sequence $(x_n)$ with respect to f to be the sum $\sum_{m=1}^\infty f(m)x_m=\sum_{n=1}^\infty n^{-s}x_{nd}$. One can then ask, for a given sequence $(x_n)$, what the largest possible discrepancy with respect to functions f of the above type for fixed s and varying d. (Here, s relates to something like the length of the homogeneous AP. We think of s as quite small, so $n^{-s}$ starts to get small only when n is exponentially large in 1/s: that is, we can think of the length as something like exp(1/s).)

If $(x_n)$ is a multiplicative sequence, then the function $x_nn^{-s}$ has all sorts of nice properties. Can it be bounded near zero? If not, then I think partial summation shows that the partial sums of $(x_n)$ are unbounded too, in which case the conjecture would be proved for multiplicative functions. But I think it’s quite likely that it can be bounded: if one smooths the progressions like this, then the occasional “accident” can be compensated for, whereas with ordinary homogeneous progressions every accident is fatal.

• Kevin O'Bryant Says:

If $a_n$ is the indicator function of the set A of positive integers, then

$\lim_{s\to 1} \zeta(s)^{-1} \sum_{n\geq 1} a_n n^{-s}$

is exactly the logarithmic density of A:

$\lim_{x\to \infty} (\log x)^{-1} \sum_{a\leq x, a \in A} a^{-1}$.

This leads me to wonder if the meaning of “the occasional accident can be compensated for” may actually be that the the density of `1′s in each progression is 1/2. That is, if we let s go to a pole of the generating function, we lose all information except density. Maybe something more can be salvaged, though, doing as you suggest and taking a fixed s.

15. gowers Says:

Continuing with the second of the above themes, an obvious thing to do would be to write $\sum_n x_nn^{-s}$ as the Euler product $\prod_p(1-x_pp^{-s})^{-1}$. We want that to be bounded. The whole thing is a little bit strange because we don’t have absolute convergence, so we’re relying heavily on cancellation. But that is not too worrying when in a sense the original problem is about cancellation.

I’m not sure whether my intuition is correct here, but it also looks as though there needs to be a fairly heavy bias towards -1 as the value at primes. That’s because when s is small then $(1-p^{-s})^{-1}$ is extremely large.

And now I’m starting to wonder whether the product formula itself is correct: in the absence of absolute convergence it is not as obvious as I unthinkingly took it to be. And it worries me because it seems as though to make the function bounded my best chance is trivially to take $x_p$ to be -1 for every prime p, in which case we end up with the Möbius function.

Formally speaking, $\sum \mu(n)n^{-s}=\zeta(s)^{-1}$, but this isn’t valid near s=0. If we pretend it is, then $\zeta(0)=-1/2,$ which suggests that the Möbius function does in fact give an example for boundedness in this weaker sense.

So there’s a wildly non-rigorous and not obviously correct argument that it is indeed easier to get boundedness when you smooth off the ends of the homogeneous APs. Maybe some number theorist out there can go through the previous paragraphs and tell me what I ought to have said. If by a miracle the argument is correctable, then it doesn’t help much, except to demonstrate that the hardness of the problem lies, in a certain sense, in the sharpness of the cutoff.

• Sune Kristian Jakobsen Says:

I thought the Möbius function had zeros?

As I said above, it might be interesting to consider sequences over $\mathbb{T}$.

Meta comment:
Perhaps we could ask some of the small isolated questions at http://mathoverflow.net/ ? Of course there is a risk that the discussion gets too spread out, but it would make it possible to contribute without reading though lots of comments. (I think I’ve seen someone, perhaps you?, mentioning the same idea, but I couldn’t find it anywhere.)

• gowers Says:

Oops, you’re right. I meant the Liouville function (which is -1 raised to the number of prime factors). Of course, now it is no longer the case that we get the inverse of the zeta function.

A quick glance at Wikipedia tells me that $\sum_{d|n}\lambda(d)$ equals 1 if n is a perfect square and 0 otherwise. If we set $\phi(s)=\sum_n\lambda(n)n^{-s}$, then this tells us that $\phi(s)\zeta(s)=\sum_n (n^2)^{-s}=\zeta(2s)$. Thus, $\phi(s)=\zeta(2s)/\zeta(s)$. This would still seem to suggest that $\phi(s)$ was bounded near $s=0$, though the argument is still so unrigorous that it could well be wrong.

I like the idea of asking questions at mathoverflow, though it would be important to make the answers available here too. A possible code of practice would be as follows. If you have a question that can be understood in isolation from the rest of the discussion and think the answer is known, and if nobody participating here supplies an answer, then you post it at mathoverflow. If someone answers it, then you write a comment explaining the answer if it is short, or else summarizing it and providing a link to the relevant mathoverflow page if it is long.

• gowers Says:

I see that Kevin has put a link on the wiki page to a paper of Borwein and Choi that is relevant. Amongst the results it mentions is that showing that partial sums of the Liouville function are bounded in absolute value by $n^{1/2+\epsilon}$ is equivalent to proving the Riemann hypothesis. I didn’t find any mention of results in the opposite direction.

Perhaps this would make a good question for mathoverflow actually.

• Sune Kristian Jakobsen Says:

Perhaps we should have a polymath or ever polymath5 user, like the 20 questions user http://mathoverflow.net/users/85/20-questions or a polymath tag?

• gowers Says:

When you wrote that, I had already asked the question here. I now have some useful answers. In particular, the Liouville function (-1 raised to the power the number of prime factors of n) has discrepancy $n^{1/2-o(1)}$ infinitely often. I don’t have any answers about general multiplicative functions — this fact and a similar one for $\mu$ depend on the close relationships between these functions and the Riemann zeta function.

• Sune Kristian Jakobsen Says:

16. Kevin O'Bryant Says:

Here’s another (equivalent) form of the problem whose finite approximations enjoy a bit more symmetry.

Let D be a finite set of positive integers (the differences) with least common multiple M, and let N be the set of divisors of M that are also multiples of some element of D (include 0 in N). The discrepancy of a function f (from N to $\pm 1$) is the maximum of

$\left| \sum_{i=0}^k f( id ) \right|$

with the maximum going over all $d\in D$ and all $k\in [0, M/d]$. Denote this discrepancy by $DISC(f,D)$, and denote the minimum of $DISC(f,D)$ over all f by $DISC(D)$.

Note that $DISC(D)$ is bounded as D goes through $\{1\}, \{1,2\}, \dots, \{1,2,\dots, n \}, \dots$ if and only if Erdos’s question has a negative answer.

Understanding $DISC(D)$ for various D may help us understand how some progressions are interacting. Also, the inclusion of 0 in the discrepancy-defining summation seems to introduce some useful symmetry of the “x goes to $M-x$” sort.

17. Kevin O'Bryant Says:

Does anybody know the discrepancy of the Thue-Morse sequence?

• Gil Kalai Says:

Another famous example with low discrepency (when you replace 2 by -1) is the sequence that equals to its own run sequence (Conway’s?) 122112122…

Also (inspired by properties of Morse sequences) we can ask the following strenghening of the original problem. Is it possible to find +- 1 sequence x_1,x_2,… so that $|\sum_{i=1}^n x_{id}i^r| \le C n^r$ for every n
d and r. (Or we can restrict our attention to $0 \le r \le R$ for some fixed R.)

• Jason Dyer Says:

Gil, do you mean Conway’s Look and Say Sequence? I thought that included 3s?

There is a binary encoding version.

• Gil Kalai Says:

Jason, I do not think it is the same but it is a very similar idea.
In the sequence that equals its runs there are only 1s and 2s.

For example the runs in the sequence I wrote above are 122112
Which is the first 6 terms of the sequence and if you want to add more terms to the sequence so that the sequence of runs will capture all 9 terms
you need to add a few terms: 12211212212211
and now this determines even more terms 122112122122112112212 etc.
It is a conjecture that the density of 1s tends to 1/2 and the discrepency is possibly logarithmic. I do not have references and when I tried to google it I got the “look and say” sequence which is little different.

• Gil Kalai Says:

Regarding the question: Is it possible to find +- 1 sequence x_1,x_2,… so that $|\sum_{i=1}^n x_{id}i^r| \le C n^r$ for every n, d and r. It looks that it will be easier to show that this is impossible. On the other hand, is it possible that an example for the original problem is also automatically an example for the more general problem? Do the uge sequences with C=2 also have low value of C’ for the more general requirement?

18. gowers Says:

Here’s a quick proof that it’s infinite. Some kind of quantitative bound can be obtained from the argument too.

The Morse sequence tells you the parity of the number of 1s in the binary expansion. It follows that if you take a number such as 100000001 as your d, then you will get $x_d=x_{2d}=\dots=x_{md}=1$ for a large value of $m$. Indeed, if we take $d=2^k+1$, then we get a discrepancy of $2^k$ appearing inside an interval of length $2^{2k}$ or so. In other words, the discrepancy of the sequence is at least $\sqrt{n}$-like. I suspect that’s an upper bound as well but I don’t know of a proof.

• Alec Edgington Says:

This makes me wonder whether there is a bias towards positive values in the partial sums of the HAPs in this sequence, say for $d = 3$, and even (though it seems unlikely) whether they might be bounded below.

• Sune Kristian Jakobsen Says:

“The Morse sequence tells you the parity of the number of 1s in the binary expansion”
I think it would be more natural to index the sequence such that $x_n$ tells you the parity of the number of 1s in n-1. But you can modify your proof to work in this case to: Like before we choose $d=2^k+1$ but now we find a number n such that $dn\equiv 1 \mod {2^{2k}}$. Now $x_{dn}=x_{d(n+1)}=\dots = x_{d(n+2^k-1)}$, because the high bits of $dn-1, d(n+1)-1,\dots d(n+2^k-1)$ are identical and the low bits contains an even numbers of 1s for the same reason as before.

• gowers Says:

That’s an interesting question of Alec’s, and might be something worth thinking about, even if it’s not likely to have a bearing on the Erdős discrepancy problem itself. But before even that it might be amusing to do some computations. For instance, it could turn out that if you choose a fairly random $d$ then the values at multiples of $d$ are themselves fairly random and therefore the discrepancy is $\sqrt{n}$-like in both the positive and negative directions. This hypothesis seems worth testing experimentally before any attempt at proving it.

• Alec Edgington Says:

Here is one intriguing observation. The partial sums

$\sum_{i

never go below zero for $1 \leq k \leq 12$ and $0 \leq m \leq 10^6$.

• Alec Edgington Says:

It appears that some $d$ (those divisible by a $2^k+1$) show a strong bias towards positive values, while others show a stronger-than-expected neutrality. An example of the latter is $d=23$, whose first $10^7$ partial sums lie between $-112$ and $36$.

(For $d$ a power of two, the partial sums are bounded by $\pm 1$, because $x_{2n} = -x_{2n+1}$ for all $n$.)

• Jason Dyer Says:

An aside related to the Thue-Morse sequence (which might be trivial for everyone, but I just noticed it) is that if we restrict ourselves to considering d to be powers of 2, we get a a discrepancy of 0. The sequence can also be modified to give a discrepancy of 0 or 1 when restricting d to be the powers of any natural number. (Suppose we want powers of p. The sequence would be placed into x_p, x_2p, x_3p … x_n and the remaining values would be an alternating sequence.)

This suggests perhaps we can (cyclically) add different versions of the Thue-Morse sequence to obtain some useful result.

• Joseph Myers Says:

For how partial sums of HAPs in Thue-Morse for d=3 behave, see J. Coquet, A Summation Formula Related to the Binary Digits, Invent. math. 73, 107-115 (1983). There are similar results for other d, but I don’t remember the details or have references to hand.

• Alec Edgington Says:

I could only find the first page of that paper online, but it mentions that Newman proved in 1969 that the sums $\sum_{i all lie between $\frac{1}{20} (3m)^\alpha$ and $5 (3m)^\alpha$, where $\alpha = \frac{\log 3}{\log 4}$; Coquet proves more precise bounds. There are some related results (though no discussion of other $d$) in this paper of Wang:

19. Kevin O'Bryant Says:

I’m still working on optimizing the $\log_3(n)$-discrepancy sequence. In the polymath spirit, I’m recording a first lemma in that direction.

Fix an odd prime p and a function f from $\{1,2,\dots,p-1\}$ into $\{-1,1\}$. Extend f to ${\mathbb N}$ by $f(n)= - f(n/p)$ if n is a multiple of p, and $f(n)=f(n-p)$ if n is not a multiple of p. For any d, k, the discrepancy of f along the HAP $d, 2d, \dots, kd$ is

$\sum_{i=1}^k f(id) = (-1)^t \sum_{r=0}^{\ell} (-1)^r \sum_{i=1}^{b_r} f(id/p^t \bmod p)$

where $p^t$ is the largest power of p dividing d, and $k=b_0+b_1 p+\cdots+ b_{\ell} p^{\ell}$ is the base-p expansion of k.

The next step is to work out the corresponding lemma replacing the odd prime p with any odd m. Then, to do some computation modulo m to find the best seed for the f sequence.

• Kevin O'Bryant Says:

Oops. Also need to assume $\sum_{1\leq i < p} f(i) = 0$.

BTW, this already gives us a new record for infinite sequence with smallest discrepancy. The improvement is due to the minus sign in
$f(n)= -f(n/p)$.

Corollary: there is an f with $\delta(N,f) \leq 1/(\log(25)) \log N+C$, for some constant C.

Proof sketch: Start with p=5 and f taking 1,2,3,4 to 1,-1,-1,1, respectively. In the above formula, for any particular d the inner summation is always in {0,1} or in {0,-1}, depending on d. So half the terms drop out! The worst cases are $d=1, N=k=1+5^2+5^4+\dots +5^{2n}=(25\cdot 5^{2n} - 1)/24$, in which case

$\delta(N,f)=n+1 \approx \frac 12 \log_5(N)$.

• Kevin O'Bryant Says:

Oops again. The part about the inner summation being in either {0,1} or {0,-1} is wrong. The proof seems to work for p=3, and f taking 1, 2 to 1, -1, though, giving $\delta(N,f) \approx \frac12 \log_3(N)$, a slight improvement.

• gowers Says:

While I was in Egypt I had a similar thought (that it should be possible to get from log to base 3 to log to base 9 by alternating the signs at powers of 3) though I also thought that that would improve the wrong base-211 example to a base-$211^2$ one.

If something like this is best possible, then it’s interesting as it gives a bound that is surprisingly far from the best for all functions.

20. Kevin O'Bryant Says:

Recall that $\delta(N,f)$ is defined in terms of the set ${\cal A}$ of all HAPs contained in $\{1,2,\dots,N\}$. We can also define $\delta'(N,f)$ in terms of the set ${\cal A}'$ of all HAPs of length at most N contained in ${\mathbb N}$. The same f that shows it is possible for $\delta(N,f) \leq 1+\log_3 N$ shows that it is possible for $\delta'(N,f) \ll 1+\log_3 N$.

Many of the arguments so far end up caring about the length of the progression instead of its diameter.

21. Sune Kristian Jakobsen Says:

I’ll try to make the group idea more general, so let $(x_n)$ be any sequence.
The natural numbers under multiplication is a commutative monoid. If we identify $d_1$ and $d_2$ if $(x_{nd_1})=(x_{nd_2})$ we get another commutative monoid (I think). If, for every $d_1$ there is a $d_2$ such that $(x_{nd_1d_2})=(x_n)$ (that is: Every HAP has a HAS identical to the original sequence), then every element is invertible, and we have an abelian group.

22. Anonymous Says:

I’ve made the following tree to help visualize the 1124-term discrepancy 2 example. The vertices are the +/- patterns that a HAP starts with, with an edge connecting e1 to e2 if e2 is a continuation of e1. Each vertex is labeled with with the pattern and the number of HAPs that start with that pattern.

For example, the vertex “63 – - +” is connected to “45 – - + +” and to “7 – - + -”. There are 63 HAPs that start “- – +”, and of those 45 continue with +, 7 continue with -, and 9 are unknown (the next term would be after 1124).

http://obryant.wordpress.com/2010/01/09/a-rauzy-tree/

I may be naive, but perhaps it is even possible to work backwords from the distribution seen here to speed up the search. For example, almost all “+ + – -” patterns get extended to “+ + – - + + – +”, so a smart search might try that extension first.

Except for the exact layout of the tree, the process is automated within Mathematica. If anyone would like to see variants, or this kind of picture for different stubs, it’d likely take me less than two minutes.

• Sune Kristian Jakobsen Says:

Could you plot it for $(x_{2n})$ and $(x_{3n})$, where $(x_n)$ is the 1124 sequence? Perhaps also the tree for the sequence with only the first 1124/2 resp. 1124/3 terms of the 1124 sequence, so we can compare trees of sequences of the same length.

• Jason Dyer Says:

Timothy, I went ahead and wrote a computer program to finish your numbering. I can also use it to generate CSV files to make spreadsheet format, but I need to know what you think would be most helpful.

• gowers Says:

Two things. First, thanks for finishing off the numbering, which I had been doing laboriously by hand. Secondly, it might be quite nice to have exactly the same thing (that is, number followed by sign) but arranged in a table. I think a good arrangement of the table would be to have rows of length 24, but the first row should go from 0 to 23 (with the 0 position of the table left blank) so that it is easier to identify at least some of the HAPs.

The other thing is that, like Sune, I am interested in the possibility that by passing to subsequences of the form $(x_{dn})$ we may remove some of the “errors” and get better properties. So it would be very nice to have some of these subsequences displayed too. One that would be interesting is the subsequence $(x_{8n})$, which should be the same as the original but is in fact slightly different. A selection of subsequences that should be the same could be fascinating, especially if a majority vote led to a sequence with good multiplicativity properties.

• Jason Dyer Says:

As an experiment (before I saw the most recent message) I tried converting the table to Google Docs although I did not have a 0 space. When I make a 24 table I will put one. I am however running into a stupid problem — does anyone know how to change column size in Google Docs?

Only displaying the subsequences will also be easy. Could you list all the ones you want specifically?

• Thomas Sauvaget Says:

I’m in the process of writing a code to automatically convert a sequence of + and – into an HTML table with colors.

Here’s the 1124 sequence into a 24-column table. Is that the king of thign you’re looking for?

http://thomas1111.wordpress.com/2010/01/09/erdos-discrepancy-a-program-to-get-html-tables/

• Jason Dyer Says:

Here’s my google docs version of what you were looking for

although I like the color formatting Thomas is using better.

• Sune Kristian Jakobsen Says:

It seems that Thomas’ sequence is wrongly indexed. It first element of the sequence is called 0 in his plot. But it is nice to have the colors.

• gowers Says:

Thomas, that’s a nice display, but at the moment the colour you have assigned to n is the colour you ought to have assigned to n+1 (and zero should have no colour, or a special colour).

• Thomas Sauvaget Says:

Yes Sune and Tim, apologies, I’ve now fixed it and updated the code and table.

• Jason Dyer Says:

It’s also quite easy with the spreadsheet to do the subsequences that are factors of 24, just remove columns.

• Jason Dyer Says:

One more link and I’m going to bed. This is multiples of 2 only of the sequence, all I did was delete columns.

I still think also having the data in HTML format is useful.

• gowers Says:

One thing that jumps out from the data when it is presented in rows of 24 is that the sequence is highly biased with respect to most non-zero residue classes mod 24. I can think of various possible reasons for this, but I somehow can’t express them clearly enough to myself to feel ready to post them. At any rate, it seems to me to be a phenomenon worth investigating.

• obryant Says:

@sune: I’ve given the same tree for various subsequences (the same link still works). I’m letting Mathematica place the vertices, so some of the labels are obscured but the shape of the graphs is more consistent.

• Sune Kristian Jakobsen Says:

Thanks. But I can only find the old tree.

• Kevin O'Bryant Says:

Sorry, I was having problems with “update” hanging, but it’s fixed now.

http://obryant.wordpress.com/2010/01/09/a-rauzy-tree/

23. Sune Kristian Jakobsen Says:

If we let C depend on d, we get a weaker version of the EDP.

This is motivated by the fact that for the sequence -1,1,-1,1,… , we can choose C_n=1 for all odd n, but C has to e infinite for even n.

24. Sune Kristian Jakobsen Says:

@Gowers: Have you decided when to officially launch polymath5?

• gowers Says:

That’s a good question. I wanted to wait till I had done various things that absolutely need doing, but I’m also finding it hard not to think about the problem. I am slightly holding back thoughts about how one might start to build a proof. What about other people? There still seems to be some mileage in looking at the experimental data, but is there some impatience to get going properly, or are people happy to wait a bit longer?

Perhaps a compromise could be this. I have another post ready, which I was going to put up when the number of comments here reached three figures, which it is just about to do. It isn’t the official launch post, but it does contain some more theoretical thoughts. It could perhaps be regarded as a warm-up post, and the one after that could be the official one. But again, I’d welcome any views on this, particularly from people who plan to be serious participants.

• Sune Kristian Jakobsen Says:

Well, I’m impatient, but I won’t have much time the next week. I’ll try to be “an interested non-participant”, as I said I would, but “interested” and “non-participant” is a difficult combination!

• Kevin O'Bryant Says:

As far as I’m concerned, the current level of activity is satisfactory. There’re still about a dozen things I’d like to compute…

25. Alec Edgington Says:

I’ve added a section to the ‘Experimental results’ page about the maximal lengths of sequences with varying upper and lower bounds for the partial sums along HAPs.

If $N(a, b)$ is the maximum length of a $\pm 1$ sequence with partial sums along its HAPs bounded below by $-a$ and above by $b$, and $N_0 (a,b)$ is the corresponding maximum length for a zero-based sequence, then $N_0 (a, b) = 1 + \max (N(a-1, b+1), N(a+1, b-1))$. So I have just listed values of $N(a, b)$.

Sets of sequences that are $(a, b)$-admissible in this sense could be useful objects of study when it comes to building a proof.

• Kevin O'Bryant Says:

Awesome. Thanks for the data (I’d just been dreaming about N(a,b)!), and the formatting!

26. Klas Markström Says:

I had two more examples of length 1008, again not optimized in nay way

{1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, -1, 1, 1, -1, 1, -1, \
-1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, \
1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1, -1, \
-1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, 1, -1, -1, 1, 1, 1, -1, -1, 1, \
-1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, 1, \
-1, 1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, 1, 1, -1, \
1, 1, -1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, 1, -1, -1, -1, 1, 1, \
-1, 1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, 1, \
-1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, \
-1, 1, -1, -1, 1, -1, 1, -1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, \
1, 1, -1, 1, 1, -1, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, \
-1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, -1, -1, 1, 1, \
-1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, \
-1, 1, 1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, -1, 1, 1, 1, -1, -1, 1, \
-1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, 1, \
1, -1, -1, 1, -1, -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, \
-1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, \
-1, 1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, \
-1, 1, 1, 1, -1, -1, 1, -1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, \
1, -1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, \
1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, -1, 1, \
-1, 1, 1, -1, -1, 1, 1, -1, -1, -1, 1, -1, 1, 1, 1, -1, -1, -1, 1, 1, \
-1, 1, -1, 1, 1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1, \
-1, 1, 1, -1, -1, 1, -1, 1, -1, 1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, \
1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, \
1, -1, -1, 1, -1, 1, 1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1, 1, \
-1, 1, -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, \
-1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1, -1, 1, -1, \
-1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, \
1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, \
1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, 1, \
-1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1, -1, 1, \
1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1, \
-1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, \
1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, -1, 1, \
-1, 1, 1, -1, 1, -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, 1, -1, 1, \
-1, -1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1, \
-1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, \
-1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, 1, 1, -1, -1, 1, -1, \
-1, 1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, 1, \
-1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, \
-1, -1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, \
-1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, \
-1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, \
-1, -1, 1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, \
-1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, \
1, -1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, 1, \
1, -1, -1, 1, -1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1, 1, \
-1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, 1, -1, 1, 1, -1, \
-1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, \
-1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, 1, 1, \
-1, -1, 1, -1}

and the second one

{1, -1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, \
1, 1, -1, -1, 1, -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, \
-1, 1, 1, -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, \
1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, \
1, -1, -1, 1, -1, -1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, 1, \
-1, -1, -1, 1, 1, 1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, -1, 1, \
-1, -1, 1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1, 1, -1, -1, \
1, -1, 1, -1, -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, 1, -1, -1, 1, \
-1, 1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, \
1, 1, 1, -1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, \
-1, 1, -1, -1, 1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, \
1, -1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, -1, -1, \
1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, 1, -1, 1, 1, \
-1, 1, -1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, \
-1, 1, 1, 1, 1, -1, 1, -1, 1, -1, -1, -1, -1, 1, -1, 1, 1, -1, -1, 1, \
-1, 1, 1, -1, 1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, \
1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, \
1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, 1, 1, -1, \
-1, -1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, \
-1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, \
-1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, 1, -1, -1, -1, 1, -1, 1, \
1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, \
1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, 1, -1, -1, -1, 1, 1, \
-1, -1, 1, 1, -1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, -1, 1, 1, -1, -1, \
1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, 1, -1, -1, 1, 1, -1, -1, 1, \
-1, 1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, -1, 1, \
1, -1, 1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, 1, \
1, -1, -1, 1, 1, -1, -1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, \
-1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, \
-1, 1, 1, 1, -1, -1, 1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, \
-1, 1, 1, -1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1, 1, -1, \
-1, -1, 1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, 1, 1, -1, \
-1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, \
-1, 1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, \
-1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, \
-1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, -1, -1, \
1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, \
-1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, \
1, -1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1, -1, \
-1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, \
-1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, \
-1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, \
-1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, \
-1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, -1, 1, -1, 1, 1, 1, 1, -1, \
-1, -1, 1, 1, 1, -1, 1, -1, -1, 1, 1, -1, -1, -1, -1, 1, 1, 1, -1, 1, \
-1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, \
-1, 1, -1, 1, 1, -1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, \
-1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, \
1, -1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, \
-1, 1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, \
-1, -1, 1, -1, 1, 1, -1, -1, 1, 1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1}

I’m busy with other things at the moment but once I have done some modifications to my program I hope to be able to push it a bit further too. The computer can keep on working on this even if I am busy writing a paper.

• Alec Edgington Says:

The first of these yields another sequence of length $1124$, which I’ll post on the Wiki.

• gowers Says:

It would be great if someone could do a quick Mark-Bennet-style analysis of this sequence (that is, look at its HAP subsequences and classify them up to close resemblance) to see whether it too has a quasi-multiplicative structure derived from the group $C_6$.

• Thomas Sauvaget Says:

I’ve just made tables for that second 1124 sequence here:

http://thomas1111.wordpress.com/2010/01/09/tables-for-the-second-1124-sequence/

• gowers Says:

Thomas, that’s great, but the 8-sequence is once again shifted by 1.

• Thomas Sauvaget Says:

Oops, indeed, sorry! I’ve now fixed the 8-subsequence tables in both cases.

• Alec Edgington Says:

At numbers congruent to $2$ modulo $6$, the first sequence seems to contain a preponderance of $-1$s, the second a preponderance of $+1$s.

Incidentally, my program is still chugging away; it’s tracked back to $269$ now and found $428\,995\,120$ sequences of length $1124$. This is using the $(p, q)$ optimization trick, so is probably an underestimate by several orders of magnitude of the actual number reachable from the first $269$ members of the sequence.

• Sune Kristian Jakobsen Says:

What is the (p,q) optimization trick?

• Kevin O'Bryant Says:

I’m guessing, based on the format of the output, that you are using Mathematica. Mind sharing the code?

• Alec Edgington Says:

Sune, I was referring to this observation:

http://gowers.wordpress.com/2009/12/17/erdoss-discrepancy-problem/#comment-4645

It means that under certain circumstances during a search you can know that $(x_p = 1, x_q = -1)$ will work if and only of $(x_p = -1, x_q = 1)$ will work, so you only need check one of these.

• Mark Bennet Says:

This is a brief analysis of the second of the 1124 length sequences.

With the same coding as before, the same six common sequences arise. It is also interesting to note the primes 11, 23; 13, 19 which are paired both times in sporadic sequences, and that the sequence appears in some respects more regular than the previous one (the basic sequence has no discrepancy and 7 appears to be regular).

179 11 23
1187 61
1203 1 8 15 18 49 58 64 70
1205 5 6 28 31 34 40 43 48 52 55 63 66
1206 73
1268 67
1421 2 16 21 22 25 30 36 39 46 57
1422 37
2674 3 14 17 20 24 26 33 38 45 54 69
2676 47
2844 59
2889 41 71
2890 4 9 29 32 35 42 44 50 51 60 65 72
2891 53 74
2892 7 10 12 27 56 62 68
3916 13 19

27. obryant Says:

Oddball thought. Every positive integer has a unique representation in the form $\sum_{n=1}^\infty a_n \cdot n!$, where $0\leq a_n \leq n$. In this representation, the tail of every HAP has a nice digital expansion! The HAP with difference d, for instance, has a finite set of possible $(a_1,\dots,a_{d-1})$, and then $a_d,a_{d+1},\dots$ are completely arbitrary.

Here’s one way not to use this, but maybe it can be fixed. Color m according to whether its factorial expansion has $\sum a_i$ even or odd. The discrepancy on each HAP is bounded (if I’m thinking straight), but unfortunately the bounds aren’t bounded. The HAP for 11 isn’t controlled (at least not a priori) until N gets up to 11! or so.

28. Jason Dyer Says:

Random extra source: Erdős brings up the problem in this paper and also mentions the weaker problem which I believe we’ve already brought up: if $f(n) = \pm 1$ is completely multiplicative, then is $\left | \sum_{k=1}^{n} f(k) \right |$ unbounded?

29. Jason Dyer Says:

Looking at the case (by Alec’s notation) of N(1,1), I note it can be proved brute force in that every placement of + or – is forced.

Suppose 6 is +. (Our first placement can be arbitrary, as our bounds are symmetrical). This forces 12-, which forces 3- and 9+, which forces 2- and 4+, which forces 1+ and 8-, which forces 5- and 10+, which forces 7+, and a contradiction when d = 1 and k = 10.

The sequence is thus: + – - + – + + – + + . -

30. obryant Says:

The discussion continues here:

http://gowers.wordpress.com/2010/01/09/erds-discrepancy-problem-continued/

31. Leonid Gurvits Says:

The problem reminded of the following result:

The exists a binary(i.e. 0,1) sequence (b_n, -\infty < n < \infty) such that for any binary
periodic sequence (y_n, n \geq 0, y_{n+T} = y_n)

\lim_{n \rightarrow \infty} \frac{\sum_{0 \leq k \leq N} b_{i+k} \oplus y_k}{N+1} = \frac{1}{2}
uniformly (for a given periodic y_n) on i.

Here a \oplus b is a sum modulo 2.

Apperared in my LAA 1995 paper "Stability of Discrete Linear Inclusion".