## Erdős discrepancy problem, continued

This is another post for the sake of not having too many comments on any single post. I actually wrote a completely different one yesterday, but it contained some theoretical thoughts that are probably better held back until the project starts in earnest. So I’m going to make this post very short indeed.

A quick report on what’s going on right at the moment. We have just been fed some more data: several more sequences of length 1124 and discrepancy 2, some sequences that have HAP-sums bounded between numbers like -1 and 3 instead of -2 and 2, and some nice two-dimensional examples. The Polymath5 wiki has some details about these and other aspects of the problem.

If anyone feels like doing an experiment that I’d be interested to know the result of, they could investigate the following sequence. I want to know whether it is good for anything. It’s the simplest example I can think of of a sequence that appears to exhibit multiplicative behaviour but keeps breaking the pattern. Let $\phi:\mathbb{N}\rightarrow\mathbb{C}$ be the completely multiplicative function that takes every prime to $\exp(2\pi i\alpha)$, where $\alpha=(\sqrt{5}-1)/2$. Then define $x_n$ to be 1 or -1 according to whether the imaginary part of $\phi(n)$ is positive or negative. (Of course, $x_1=1$.) At what kind of rate does the discrepancy of this sequence seem to grow?

Another experimental question: are all the HAP subsequences of the new 1124 examples significantly different from all the HAP subsequences of the old one? In other words, is it reasonable to say that they are completely different examples, or are they more like modifications of HAP subsequences of the original one?

### 104 Responses to “Erdős discrepancy problem, continued”

1. gowers Says:

Here are a few observations based on the data connected with the second sequence of length 1124.

First off, the 8-sequence starts out very similar to the original sequence, but becomes fairly different after a while. This I find interesting. It’s not hard to check it on the page linked to above. (It might be rather nice to have a third display with the 8-sequence directly below the 1-sequence so it was easy to see where they were the same and where different.)

Secondly, here are the starts of the 2-sequence and the 3-sequence. I’ve put them in pairs, for easy comparison. (Thus, the 2-sequence begins + – + + and the 3-sequence begins – + – -.) I’ve done the first 50 multiples. The only ones that are not reverses of each other are at (74,111) and at (82,123).

+ -/- +/+ -/+ -/- +/- +/- +/+ -/+ -/- +/+ -/- +/- +/+ -/+ -/- +/+ -/+ -/- +/+ -/- +/- +/+ -/+ -/- +/+ -/- +/- +/+ -/- +/- +/+ -/+ -/- +/+ -/- +/+ +/+ -/- +/- +/- -/+ -/- +/+ -/+ -/- +/- +/- +/+ -/+ –

So multiplication of n by 3/2 appears to multiply $x_n$ by -1. Peeping ahead a bit, I get the impression that the agreement remains remarkably good.

This suggests another idea for generating good sequences quickly. Perhaps one should artificially impose constraints of this kind (that multiplying by 3/2 reverses the sign) and initially search for sequences with the extra constraints. The fact that the constraints aren’t satisfied exactly would complicate matters, but perhaps one could have some other routine that came into play when one got into difficulties and made a few pairs go wrong. The main point is that the search space would be reduced a lot.

• Mark Bennet Says:

The differences correspond to positions 37, 41, 71, 73 then 352, 353, 355, 356, 368, 372

ie the early entries (position 2 or 3) in the sequences for primes which tend to be on the irregular list plus entries near the end of the sequence.

The sequences for d=20 and d=26 are also remarkably close to that for d=3. Sequence d=14 differs at positions 37, 41, 47, 49, 56 and then more extensively. d=17 differs from d=3 first at position 23.

So this suggests that the search space could be constrained by looking to change these sequences in prime positions.

The sequences for d=5 and d=6 are close to identical. They differ at positions 59, 61, 176, 178, 184, 185, 186 ie again at primes and at late positions.

• Mark Bennet Says:

d=8 differs from d=1 at positions 37, 41, 53, 67, 71, 73, 74, 82, 94, 98, 101, 103, 109, 112, 116 …

2. Sune Kristian Jakobsen Says:

I have moved most of the experimental page on the wiki to subpages:
http://michaelnielsen.org/polymath1/index.php?title=Experimental_results
I have also added a wish list, with experiments we should try.

As I wrote on the talk page http://michaelnielsen.org/polymath1/index.php?title=Talk:The_Erd%C5%91s_discrepancy_problem (by the way, how do you “make words into links” on this blog? html-code?) we should take a look at our definition of quasi-multiplicative. Right now it is

Let us call a $\pm 1$ sequence quasi-multiplicative if it is a composition of a completely multiplicative function from $\mathbb{N}$ to an Abelian group G with a function from G to {-1,1}.

But you can always choose G to be the group of positive rationals under multiplication, and the function to be the inclusion map.
We could say that G had to be finite, but then the sequence you define in you post is not quasi-multiplicative. We could also keep the definition, and notice that all sequences are quasi-multiplicative. Then we could still say things like: Let $(x_n)$ be a quasi-multiplicative with $G=C_6$.

• gowers Says:

That’s a good point. I’m thinking more and more that what we want is either a finite group or a topological group with a particularly simple function. For example, we might ask for the set of points that go to 1 to be open, though I think I want something simpler still.

How might one measure simplicity? The kind of thing I have in mind is that G would have a natural measure on it and if you look at the partition of G (into points that go to 1 and points that go to -1) and let G act on itself, then the images of the cells of the partition should have a good compactness property: for any $\epsilon$ there exists $n$ such that if you take any $n$ images of one of the cells of the partition, then two of those images must differ in a set of measure at most $\epsilon$.

All this is of course inspired by ergodic-theoretic concepts. I’m thinking more and more that if the conjecture is true, then an infinitary approach could be a good one. I’ll say more about this in my next post.

• Alec Edgington Says:

I’d also been vaguely wondering about an ergodic approach to the problem. If we define a family of maps from sequences to sequences by $T_k(\mathbf{x})_n = x_{kn}$, and we let $S_C$ be the set of sequences $\mathbf{x}$ such that the sums $x_1 + \ldots + x_n$ are bounded by $C$, then $k \mapsto T_k$ is an isomorphism between the monoid of positive integers under multiplication and the family $T_k$ ($k \geq 1$), and we are want to know whether $\bigcap_k T_k^{-1} (S_C) = \emptyset$. I was wondering if we could turn these maps into topological ones in a nice measure-theoretic context; but am not at all sure how.

• gowers Says:

My thoughts are similar (and in particular I have also been using the notation $T_k$ for that map), but not quite identical. What I thought we might be able to do is start with just one putative counterexample $x=(x_n)$ and take the closure of the set of sequences $T_kx$ (in the product topology). Better than this would be to start by converting the original sequence $x$ into an example that works over the rationals, so now we would have a map $T_a$ for every positive rational $a$. In particular, the maps would form a group and it would act on a compact space.

What I am hoping is that we would be able to prove some interesting facts about this action that would allow us to conclude that the sequence had some kind of multiplicative structure. A yet wilder hope is that we could end up with a contradiction.

• Sune Kristian Jakobsen Says:

How do you make $T_a$ is well-defined for rational a? That would imply that you could determine the sequence $(x_{dn})$ from the $(x_{2dn})$ and $(x_q)_{q\in\mathbb{Q}}$, without knowing d. Are you assuming some structure on $(x_q)_{q\in\mathbb{Q}}$?

• gowers Says:

Another good point. Let me think about it, because I think it may be possible to pass to an example where the maps $T_a$ are well-defined, even if they aren’t obviously well-defined to begin with. But I’m not sure about this yet.

• Sune Kristian Jakobsen Says:

Of course it is possible if every HAP has a HAP identical to the original sequence, but that seems to be a strong assumption.
I don’t know if you can prove that every HAP has a HAP identical to the original sequence assuming that $T_a$ is well-define, but lets see what we can do:

Assume that $T_a$ is well-defined and that there is a HAP that don’t contain the original sequence. Now “x is a HAP of y and y is a HAP x” is a equivalence relation, and “x is a HAP of y” is a partial ordering on the equivalence classes. If the ordering has a maximal element, we would have a sequence that is a HAP of all its HAPs. I think this would contradict the fact that this is not the case for the original sequence and that $T_a$ is well-defined(?) So there must be a infinite sequence of sequences, such that if n>m then the sequence $x_m$ is not a HAP of $x_n$. I don’t know if we can use this.

• Sune Kristian Jakobsen Says:

This is just an idea. I’m not sure if it works:

We start with a $(x_q)_{q\in\mathbb{Q}}$. Now $T_{1/p}(y)$ might be ill-defined, but we just define it to be one of the possible sequences (I think it should be one that is not a HAP of y if possible), and then we adjust $(x_q)_{q\in\mathbb{Q}}$ accordingly. When we have done this for all sequences y and all primes p, we have a well-defined $T$ (using compactness argument, perhaps?)

• gowers Says:

My idea was the following, but I also haven’t checked that it works. You take a sequence $(x_n)$ and define $(y_q)_{q\in\mathbb{Q}}$ by setting $y_q$ to be the limit of $x_{n!q}$ along some ultrafilter. I think this should have some nice properties, but I haven’t had time to investigate it properly.

3. Alec Edgington Says:

Tim, the discrepancy of your sequence up to $N$ appears at first sight to grow linearly with $N$. Here are the discrepancies for $N = 1000, 2000, \ldots, 10000$:

103, 205, 313, 409, 539, 641, 752, 877, 1013, 1117

This is surprisingly fast growth. For large $N$ the discrepancy seems usually to be attained for $d=1$, and to reflect a bias towards positive values. This must mean that the integer part of $(\sqrt{5}-1) \sum a_i$ is biased towards even numbers as $n \rightarrow \infty$ where $n = \prod p_i^{a_i}$. I’ve no idea why this should be so. (It’s possible that this phenomenon would disappear if one went much higher, as often happens with phenomena related to the distribution of primes. It’s also possible that I’ve made a mistake!)

• gowers Says:

That’s disappointing, but perhaps it is indeed a small-number phenomenon — for instance, there are a lot of primes and they’ll all give positive answers, and unlike with the Liouville function this bias could well take longer to get rid of (because it isn’t as nicely cancelled out).

How about the following alternative suggestion: define $\phi(p)$ to be alternately $\exp(2\pi i\alpha)$ and $\exp(-2\pi i\alpha)$ as you go up the primes? That might share out the bias nice and evenly.

• Alec Edgington Says:

The growth seems even faster now. I get the following discrepancies for $n = 1000, 2000, \ldots 10000$:

167, 374, 535, 697, 899, 1053, 1238, 1405, 1593, 1731

This time the bias is towards negative numbers. So, with the above notation, the integer part of $(\sqrt{5}-1) \sum (-1)^i a_i$ is apparently biased towards odd numbers (where $p_0 = 2, p_1 = 3, \ldots$).

• gowers Says:

Funny — I was expecting the bias to be positive because I’d not spotted that $\sum (-1)^ia_i$ would often be zero. One other possibility might perhaps be to send every prime to $\exp(-2\pi i\alpha)$. But I’m losing faith in this idea now.

• Alec Edgington Says:

Ah, yes, the zeros rather spoil the calculations, because it’s not clear whether the function should be $+1$ or $-1$. I was testing for the imaginary part being strictly positive, so got a lot of dubious $-1$s. If you make the function $+1$ when $\sum (-1)^{a_i} = 0$ then you get a positive bias that is stronger (the discrepancy at $N=10\,1000$ being $2050$).

However, I strongly suspect that the apparent linear growth is a small-number phenomenon.

I’m not sure I understand your last suggestion: wouldn’t this just give the negative of the first sequence, except for $x_1$?

• Alec Edgington Says:

Oops, I mean at $N = 10\,000$.

• gowers Says:

You’re right, that was a stupid suggestion. (I won’t bother to try to explain the confusion that lay behind it.)

4. Jason Dyer Says:

I have combined the sequences and colored.

5. Thomas Sauvaget Says:

I’ve also been computing Tim’s sequence. The results I find are broadly consistent with those of Alec, but are off by a few units. For N=1000 to 10000 I find :
78, 196, 292, 402, 524, 640, 746, 870, 1006, 1116.
I’ve made a plot of my data here:
http://thomas1111.wordpress.com/2010/01/09/discrepancy-of-a-golden-ratio-related-sequence/

I’m probably the one who made the mistake though. Alec: are the first few values given in the table in that link the same as yours, or already different?

• Alec Edgington Says:

Thomas, how do you get all those discrepancies of zero? Or am I misunderstanding your table? (My sequence $x_n$ agrees with yours.)

• Alec Edgington Says:

Ah, I see, you’re plotting the partial sums $x_1 + x_2 + \ldots + x_n$, rather than the maximum of $\lvert \sum_{i \leq r} x_{id} \rvert$ over all $d \geq 1$ and $rd \leq n$. The fact that our sequences look similar for large values confirms the dominance of $d=1$ and the preponderance of positive values.

• Thomas Sauvaget Says:

Yes that’s what I did, now I understand, thank you.

6. Sune Kristian Jakobsen Says:

I have solved the easier problem I suggested earlier today, where we let C depend on d: choose an irrational number $a$ and let $x_n$ be 1 iff the integer part of $an$ is even.

• gowers Says:

Funnily enough, that example appears in the post that I decided to hold back. If you let $a$ be badly approximable, then it gives a sequence that has discrepancy of order $\sqrt{n}$ up to $n$.

7. Gil Kalai Says:

One meta remark: one thing that will be very useful is a summary or discussion of discrepency theory and in particular some techniques which may be relevant to proving the conjecture. Of course it will be ideal if some experts in discrepency will take part in the project.

8. Polymath5: Erdős’s discrepancy problem « The polymath blog Says:

9. Mathematics, Science, and Blogs « Combinatorics and more Says:

[…] (January 2010) Polymath5 devoted to Erdos discrepency problem is launched on Gowers’s blog. […]

10. gowers Says:

Here are a couple of things I’m going to put on the wish list, but I’ll mention them here too, since they’re more likely to get noticed. (By the way, to the person — I’ve forgotten who it was — who asked how to get links into comments, yes, you do indeed just do it in the normal html way.)

1. Take a moderately large $k$ and search for the longest sequence of discrepancy 2 that’s constructed as follows. First, pick a completely multiplicative function f to the group $C_{2k}$. Then set $x_n$ to be 1 if f(n) lies between 0 and k-1, and -1 if f(n) lies between k and 2k-1. Alec has already done this for k=1 and k=3.

2. Search for the longest sequence of discrepancy 2 with the property that $x_n=x_{32n}$ for every $n$. The motivation for this is to produce a fundamentally different class of examples (different because their group structure would include an element of order 5). It’s not clear that it will work, since 32 is a fairly large number. However, if you’ve chosen $x_{32n}$ then that will have some influence on several other choices, such as $x_{4n},x_{8n}$ and $x_{16n}$, so maybe it will lead to something interesting.

• Alec Edgington Says:

1. I should say that I probably haven’t found the longest sequence possible for $k=3$. (I’ll try to revisit this at some point.) I’ll put the $k=1$ data somewhere on the wiki.

2. Just looking at this now. The program is still running and has found a sequence of length $407$. I’ll put this on the wiki too. Sorry, run out of time for now, so here’s a quick cut-and-paste of it:
+ + – – + + – + – – + – – – + + – + – – + + + – + – – + – + + + – – – + – – + – + – + + – + – – + + + – + – – – + + – + – – + + + – + – – + + + – + – – – + + – + – – + + – – + + – + – – + + – – + + + – + – + – – – – + – + – + + – + – – + + + + – + – – + – + – – – + + + – – + – + + – + + – + – + – – + – – – + – + + + + – + – – + – – + – + – + + – – – + + + – + + – + – – – – + + – + + – + – + + – – – – + + + – – + – – + + + + – – – + – + – – + – + – + + – – + + – + + – – + + – + – – + + + – – + + – – + – – + + – – – + + – + – + – – + + + – – + – + + – + + – – + – – + – – + – + + – + + + – – – + + – + + – – + – + + – – – + + – + + – – – – + + + + – – + – – + – + – + – – + + + – – – + + – – + – + + + + – – – + – + + – – + + – – + + – – – + + – – + + – – + – + – + + – + – + – + + – + – + – – + + + – + – – +

• gowers Says:

Some observations about that sequence. Here are the beginnings of its 2-subsequence and its 3-subsequence.

+ – + + – – – + + – + – – + + + – + – – – + + – + – – – + + – + – – + + +

– + – – + + + – – + – + + – – – + – + + + – – + – + + + – – – – + + – – +

So it seems that $T_{3/2}$ takes the sequence to minus itself, give or take the odd error.

$T_7$ appears to be another map that takes the sequence to minus itself. So it’s tempting to conclude that we have indeed got the group $C_{10}$ here. The difficulty is that it is not very easy to detect the presence of 5-torsion, except where it has been shoehorned in. But we should be able to do it by jumping from sparse sequences to equivalent denser ones: basically, we can be confident that we’ve got our group if there are ten classes of sequence starts, divided into five plus/minus pairs.

Let me try to check this. I’ll write [a,b;c,d,e] to mean that the a-sequence and b-sequence are roughly the same, and are roughly minus the c-, d-, and e-sequences, etc. So far I’ve got [1,32;7] and [2;3]. From this I can deduce [4;6] and I know that 4 and 6 don’t repeat anything from earlier, or else $T_2$ would not have order 5 after all. So the next sequence to look at is the 5-sequence, which starts as follows.

+ – + – + + – – – + – + + + –

This appears to be very close to minus the 4-sequence, and hence to the 6-sequence. So I’ll expand [4;6] to [4;5,6].

The 9-sequence should be the same as the 4-sequence, so we get [4,9;5,6]. I’ve still got only three of the five sequence pairs. The 8-sequence ought to give us a fourth class, so let me write out how it begins.

+ + – + – – – + +

That appears to be minus the 10-sequence, as indeed it should since the 4-sequence is minus the 5-sequence. In fact, we’ve got [8,18;10,12] from that. So the next unforced sequence is the 11-sequence, which starts

+ + – + – – – + +

so I think we can safely say that the 11-sequence equals the 8-sequence. So we have [8,11,18;10,12]. Next up is the 13-sequence, which starts (modulo medium probability that I have made a mistake)

– – + – + + + + – – +

which seems to be a new class of sequence, so I hope it will equal plus or minus the 16-sequence. That one starts

+ + – +

so there is some hope, though it’s not yet obviously different from the 8-sequence. I can’t face working out more though, so I’ll wait for someone to produce a nice presentation of the entire sequence, which will make this kind of thing much easier.

• Thomas Sauvaget Says:

I’ve just made a table of Alec’s sequence here:
http://thomas1111.wordpress.com/2010/01/10/tables-for-a-c10-candidate/

I’ll add the subsequences later on.

• gowers Says:

Many thanks. I’ve checked quite a bit further with the help of your chart and the 13-sequence definitely does seem to be minus the 16-sequence.

• Alec Edgington Says:

I’ve just posted a $T_{32}$-invariant sequence of length $417$ on the wiki. (Search still going on.)

• gowers Says:

If your search starts to get stuck, then the next thing to do might be to take the sequence you have so far and revert to the old algorithm — that is, allow backtracking that might mess up the $T_{32}$-invariance slightly. With a bit of luck, that could lead to a significantly longer sequence that was still approximately $T_{32}$-invariant.

• Thomas Sauvaget Says:

I’ve now worked out and displayed the subquences $(x_{8n})=$ to $(x_{31n})$ at the page mentioned above.

The shortest subsquence $(x_{32n})$ has 12 elements, does that mean we should truncate the others there before comparing them? Combining with your previous comment the following 10 groups of similar sequences seem to exist: [1,32;7] [2;3] [4;5,6,28] [8,18,11;10,12] [13,20,24;16,22,25,30] [17,19,27;15;18] [14;21] [23] [26] [29]. Does that sort of make sense?

• gowers Says:

Some further identifications need to be made. For example, the 2-sequence looks the same as the 21-sequence, so we get [2,14;3,21]. Also, the 9-sequence is missing in your list — it goes with 4, so we have [4,9;5,6,28]. Finally, the 17-sequence is minus the 8-sequence, which leads to the group [8,15,18;10,12,17,19,27]. We now seem to have five classes of sequence (as we want) together with three sporadic sequences.

• Mark Bennet Says:

I’d see the sequences

[1,32; 7]
[2, 21; 3, 14]
[4, 9; 5, 6, 28]
[8, 11, 15, 18; 10, 12, 17, 19, 27]
[13, 20, 24; 16, 22, 25, 30]

With 23, 26, 29, 31 unallocated

• gowers Says:

It’s a little troubling that the 26-sequence is unallocated. It ought by rights to be minus the 1-sequence. It does start like the 1-sequence but fairly quickly it stops working. I would very much like to understand these deviations from multiplicity — though it’s not clear what there is to understand.

11. gowers Says:

Another piece of experimental data: the prime factorizations of all the places where the first and second sequence of length 1124 differ, until I couldn’t face continuing.

1, 7, 37, 41, 47, $49=7^2$, 61, 67, $74=2\times 37$, $82=2\times 41$, 101, 103, 107, 109, 191, 193, $262=2\times 131$, 263, 271, $274=2\times 137$, $289=17^2$, $295=5\times 59$, $305=5\times 61$, 307, $319=11\times 29$, $329=7\times 47$, $341=11\times 31$, $343=7^3$, $358=2\times 179$, 359, $361=19^2$, $362=2\times 181$, $377=13\times 29$, 379, 383, $391=17\times 23$, $393=3\times 131$, 397, $403=13\times 31$, 409, $411=3\times 137$.

It’s noticeable that fairly large primes tend to be involved in these numbers, suggesting that when you choose the sequence, you get, as you might expect, a fair amount of flexibility when it comes to choosing the values at numbers with few factors.

12. gowers Says:

One of the things I thought about in Egypt, which is closely related to what I was working out in my previous comment above, was methods of converting sequences of bounded discrepancy into other such sequences. An obvious method is to change them in only finitely many places. Another is to choose an increasing sequence of pairwise coprime integers, pick a subsequence such that the values alternate, and then change everything in the subsequence. The reason that works is that apart from the sequence itself, no HAP contains more than one element of the sequence, so the discrepancy goes up by at most 2.

Are there less trivial ways of changing a sequence and preserving bounded discrepancy? Yes there are, but I don’t think I’ve reached anything like the limit of what can be done. The motivation for this question is to try to develop a notion of “essentially the same” and see whether it suggests a quantitative version that would go some way towards explaining the data about the length-1124 sequences.

Suppose I choose two primes $p with $x_p=x_q=1$ and $x_{pq}=-1$. Suppose I now change all three. The effect will be to add -2 temporarily to the partial sums along the p-sequence and q-sequence, but eventually to add 2 again. However, I’ll also add first -2 and then -4 to the partial sums of the main sequence, getting back to -2. But I can correct for that by having another such triple later on, this time with $x_r=x_s=-1$ and $x_{rs}=1$. I can then take an infinite sequence of such triples.

That’s only very slightly less trivial than the previous example, but at least it shows that there can be infinitely many pairs in the sequence that have common factors.

I’m fairly sure I had a more sophisticated example, but I’ve got to go now so I don’t have time to reconstruct it.

A second motivation for thinking about this is that if we identify a suitable notion of “is essentially the same as” then we will know that any structural property we try to prove about counterexamples will have to be “up to essential sameness”.

• Sune Kristian Jakobsen Says:

Let $(x_n)$ and $(y_n)$ be two sequences with discrepancy at most $C$. For each natural number m define the sequence $((z_m)_n)$ such that $(z_m)_n= y_n$ if n<m and $(z_m)_n=x_n$ otherwise. I think that each $z_m$ has discrepancy at most 3C and $z_1=x$ and $\lim_{m\to\infty} z_m=y$, so in a sense all sequences with bounded discrepancy are the same (if we are allowed to make infinitely many changes)?

• gowers Says:

I don’t have a precise response to what you say, but somehow the difference between what I was doing and this is that my procedures for altering sequences gave new sequences that depended heavily on the old one. (The reason this is not a precise response is that you could artificially choose for each sequence $(x_n)$ a distinct sequence $(y_n)$ to convert it to.)

13. gowers Says:

Here’s another experiment that should be pretty easy to program and might yield something interesting. It’s to look at the how the discrepancy appears to grow when you define a sequence using a greedy algorithm.

I say “a” greedy algorithm because there are various algorithms that could reasonably be described as greedy. Here are a few.

1. For each $n$ let $x_n$ be chosen so as to minimize the discrepancy so far, given the choices already made for $x_1,\dots,x_{n-1}$. (If this does not uniquely determine $x_n$ then choose it arbitrarily, or randomly, or according to some simple rule like always equalling 1.)

2. Same as 1 but with additional constraints, in the hope that these make the sequence more likely to be good. For instance, one might insist that $x_{2k}=x_{3k}$ for every $k$. Here, when choosing $x_n$ one would probably want to minimize the discrepancy up to $x_{n+k}$ if $x_{n+1},\dots,x_{n+k}$ had already been chosen. Another obvious constraint to try is complete multiplicativity.

3. A greedy algorithm of sorts, but this time trying to minimize a different parameter. The first algorithm will do this: when you pick $x_n$ you look, for each factor $d$ of $n$, at the partial sum along the multiples of $d$ up to but not including $n$. This will give you a set $A$ of numbers (the possible partial sums). If $\max(A)$ is greater than $\max(-A)$ then you set $x_n=-1$, if $\max(-A)$ is greater than $\max(A)$ then you let $x_n=1$, and if they are equal then you make the decision according to some rule that seems sensible. But it might be that you would end up with a slower-growing discrepancy if you regarded $A$ as a multiset and made the decision on some other basis. For instance, you could take the sum of $2^k$ over all positive elements $k\in A$ (with multiplicity) and the sum of $2^{-k}$ over all negative elements and choose $x_n$ according to which was bigger. Although that wouldn’t minimize the discrepancy at each stage, it might make the sequence better for future development because it wouldn’t sacrifice the needs of an overwhelming majority to those of a few rogue elements.

The motivation for these experiments is to see whether they, or some variants, appear to lead to sublogarithmic growth. If they do, then we could start trying to prove rigorously that sublogarithmic growth is possible. I still think that a function that arises in nature and satisfies $f(1124)=2$ ought to be sublogarithmic.

14. gowers Says:

Yet another item for the wishlist. I produced by hand the following sequence that has (I think) discrepancy 2 and has the additional property that $x_{2n}=-x_n$ for every $n$. I’d be interested to know what happens if this is used as a “seed sequence” for a backtracking algorithm: does it lead to a much longer sequence $x$ such that $T_2x$ is roughly equal to $-x$?

The sequence (which, amusingly, starts very like the Morse sequence) is this: + – – + – + + – – + + – + – + + – + – – + – – + + – – + + – + – – + – – + + + + – – – + + + – – + – + + – + – – + –

• Alec Edgington Says:

Just seeding the backtracking algorithm with this sequence gets you fairly quickly to a sequence of length 371, but the property $x_{2n} = -x_n$ breaks down immediately (in other words, it’s by no means forced to continue). However, if you insist on it continuing, things get more interesting! You quickly get as far as 580. I’ll let the program run a little longer and then put the result on the wiki.

• Alec Edgington Says:

OK, I’ve posted a sequence of length $584$ satisfying $T_2(x) = -x$ on the wiki.

Interesting that the constraint sped up the search so much. I’d be interested to see if any other multiplicative properties are apparent in it, and perhaps to use these as constraints in a new search with this as a seed.

• gowers Says:

I was expecting constraints of that kind to speed up the search because it hugely reduces the size of the search space. I do wonder whether some mixture of imposing constraints and eventually relaxing them could break the 1124 barrier. I’d also be interested to understand why $C_6$ seems to be the preferred structure if you let the algorithm run on its own. Also, given that good sequences seem to like multiplicative structure, I find it rather mysterious that the constraint was under no pressure to continue.

By the way, if you backtrack with the sequence you’ve just produced, can you get much further or do you quickly get stuck?

• Alec Edgington Says:

Now found one of length $594$. Wiki page updated.

• gowers Says:

Needless to say, it would be wonderful to have this sequence (or whatever holds the record by the time someone gets round to it) displayed in a nice colourful way and its subsequence structure looked at. My guess at the moment, without having looked properly at the sequence, is that there will be six broad classes of HAP-subsequences.

• Mark Bennet Says:

When a group element of order 5 was forced, the primes 7 and 11 found better places in the group structure than in the examples of long sequences previously analysed – particularly 7. It may be that there is some kind of ‘natural’ order for each prime, and that in the absence of forcing the primes 2, 3 and 5 tend to dominate.

If this were true it might be possible to show that increasing C would accommodate more primes with their natural order (with prospects of a proof or counterexample).

• Alec Edgington Says:

Quick answer to Tim’s question about unconstrained backtracking from the long constrained sequence: it’s not particularly productive. It gets to 622 and then sticks for a long time.

• gowers Says:

This is beginning to suggest that having a sort of rough multiplicative structure is very helpful to a sequence, regardless of what that structure is, but that some structures work better than others.

Another interesting question is this. The original 1124 sequence had the property that $T_2(x)$ is roughly the same as $-T_3(x)$. What happens if one does a search imposing that constraint? Does one quickly get to a very long sequence, or does imposing the constraint too strictly mess things up? This is potentially important for understanding what’s going on: is the sequence with $T_2(x)=-x$ shorter because the constraint is imposed more strictly or because it is a less fruitful constraint?

• Mark Bennet Says:

Here is an analysis of the length 594 sequence based on f(2r)=-f(r). This precludes the identity which has been so evident in previous long sequences of discrepancy at most 2 ie f(3r)=-f(2r).

617: 1 4 7 10 16 22 25 28 37 40
873: 19 31 34
1202: 3 12 30
1266: 43
1458: 9 21 26 36
1833: 39
2262: 27
2454: 41
2637: 13 18 33 42 45
2893: 6 15 24
3222: 17 23 38
3478: 2 5 8 11 14 20 29 32 35 44

In another notation this is

[1, 4, 7, 10, 16. 22, 25, 28, 37, 40; 2, 5, 8, 11, 14, 20, 29, 32, 35, 44]
{note that 2 has order 2 in this group}

[19, 31, 34; 17, 23, 38]

[3, 12, 30; 6, 15, 24]
{21 should be here, but isn’t}

[13, 18, 33, 42, 45; 9, 21, 26, 36]

[39; 27]

39, 41, 43 unallocated

• Alec Edgington Says:

Initilailizing the search with the first 1124 sequence and imposing the constraint $T_2(x) = -T_3(x)$, we quickly reach a sequence of length 587, and then stick for a long time. I’ve put that sequence up on the wiki too.

• Alec Edgington Says:

One of length 614 now on the wiki. And I have found one of length 594 that satisfies the 2=5 identity exactly, coming to the wiki soon.

• Alec Edgington Says:

Some observations on the sequence of length 594 satisfying $x_n = -x_{2n} = -x_{5n}$ everywhere:

$x_{3n} = -x_{13n}$ whenever $3 \nmid n$.

$x_{7n} = -x_{9n}$ whenever $7 \nmid n$, except at the end.

$x_{11n} = x_{13n}$ everywhere except at the end.

• Alec Edgington Says:

Sorry, that last should have been $x_{11n} = -x_{13n}$.

I’ve now found (and put on the wiki) a sequence of length 636 satisfying 1=-2=-5 exactly. The first and third of the above identities still hold, but the second does not: there are more exceptions to $x_{7n} = -x_{9n}$, apparently at multiples of 11, 13 as well as 7 (and a few others).

• Mark Bennet Says:

Analysis of Alec’s 636 length sequence with subsequences of difference 2 and 5 both the negative of the main sequence.

617: 1 4 10 16 19 25 31 34 40 46
873: 7 13 22 28 33 37 42
1458: 3 12 30 48
1833: 18 27 45
2262: 9 36
2454: 41
2637: 6 15 24
3222: 11 14 21 26 29 35 39 44
3478: 2 5 8 17 20 23 32 38 47
3661: 43

or
[1, 4, 10, 16, 19, 25, 31, 34, 40, 46; 2, 5, 8, 17, 20, 23, 32, 38, 47]

[7, 13, 22, 28, 33, 37, 42; 11, 14, 21, 26, 29, 35, 39, 44]

[3, 12, 30, 48; 6, 15, 24]

[18, 27, 45; 9, 36]

with 41 and 43 unallocated

This looks highly regular, consistent with the method of construction. The behaviour and grouping of multiples of 3 looks interesting.

• Harrison Says:

“which, amusingly, starts very like the Morse sequence”

I’m not actually convinced that this is that much of a coincidence. After all, if we only look at homogeneous progressions where d is a power of 2, the Morse sequence has discrepancy 1, right? (Right, it’s self-similar.)

So, a question (to which I don’t have any idea what the answer is): Are there analogous 0-1 sequences to the Morse sequence for other prime powers? I guess the idea would be to have the same self-similarity…

• Alec Edgington Says:

I’m not sure if you can naturally generalize to a zero-one sequence, but a generalization that maps to $p$th roots of unity could be defined recursively by
$f(n) = e^{2i\pi n/p} f([\frac{n}{p}])$

• Alec Edgington Says:

I’ve uploaded to the wiki a sequence of length 688 that satisfies 1=-2=-5 and 11=-13 exactly.

• Alec Edgington Says:

And now one of length 854. Mark has pointed out that the 688 one also satisfies $x_{7n} = -x_{11n}$. I haven’t yet checked this for the longer one.

• Alec Edgington Says:

It does mostly except at multiples of 9.

Sadly I now have to go to work! But I’ll post those plots later.

• Mark Bennet Says:

Sequence length 854 is interesting. Subsequence 2 is the same as subsequence 13, which means that the forced sequences are the same. 3 and 7 both split off and disrupt the pattern.

617: 1 4 10 13 16 19 22 25 27 31 34 40 42 46 52 55 58 64
3478: 2 5 8 11 17 20 21 23 26 29 32 38 41 44 50 54 59 62 65

809: 18 37 45
3286: 9 36

873: 7 28 49
3222: 14 35 47 56

1458: 3 12 30 39 48 57
2637: 6 15 24 33 43 51 60 63

1266: 53

• Mark Bennet Says:

Sorry that should have been subsequence 2 the same as 11, the negative of 13 …

One way of interpreting what is happening here is that the constraints are forcing the constrained parts of the sequence to be multiplicative. The unconstrained parts then adjust to reflect the fact that it is impossible to have a fully multiplicative sequence (the behaviour of the primes 3 and 7)

• gowers Says:

Mark, in your analysis here the number 27 appears in an unexpected place: given that 3 is not in the list of numbers that do the same as 2, it should not be on the other side of 9. Is that a mistake, or is it just a bizarre feature of the sequence? I’m trying to work out whether we’ve got a $C_8$ structure going on here but this is confusing.

• Mark Bennet Says:

Tim the 27 subsequence of the length 636 example does look out of place – it is the negative of 9 except in place 19 (it differs from the subsequence for difference 18 only in place 19).

But this is different in the 854 sequence where the 27 sequence moves, and the first difference between the sequences for 9 and 27 is in the 7th place (3 and 7 are the low primes which behave differently)

• gowers Says:

In the 854 example, the 49 sequence is in a curious place. If the 7-sequence is not the same as the 1-sequence, it doesn’t seem right that it should be the same as the 49-sequence.

• Mark Bennet Says:

In the 854 example, the 49-sequence is identical to that for 7 up to position 17 (as far as possible). The two sequences differ from the 1-sequence only in position 9 (and the 7-sequence goes on to differ in position 18, 36, 37 …).

The sequence 9, 18, 36, 37 goes on to include 45 and 53 – then 71, 72 74. Up to 53 these numbers are grouped in the shorter sequences.

I did a double-take with 49. There is some curious structure there, and the forcing technique seems to have eliminated some noise caused by arbitrary free choices in positions which would otherwise be unforced.

• Alec Edgington Says:

It seems that the division into classes may not be so clear-cut. A pattern that is emerging from these forced sequences is the existence of pairs of HAPs that agree when values on some other HAP or set of HAPs are excluded.

15. Thomas Sauvaget Says:

Here are tables for the 584 one, to help spot further structure:
http://thomas1111.wordpress.com/2010/01/10/a-sequence-of-584-elements-with-t2x-tx/

• gowers Says:

A preliminary look is very interesting. I looked at the first sixteen multiples of all odd numbers up to 17. That gave the following exact equalities, where I’ll write $a=\pm b$ to mean that the a-sequence is plus or minus the b-sequence.

2=5=17, 3=-15 (as one would expect if 2=5), 7=-9, 11=-13. In addition, 3 and 13 are pretty similar (and hence 11 and 15 are pretty similar). I still haven’t worked out what the orders are of the transformations that are not of order 2.

• gowers Says:

A possible way of guessing the order of $T_3$ is to look at the values at powers of 3. These are – – – + +, which isn’t all that helpful. If one uses the identity $7=-9$ to look at the values at $3,-7,-21,49,147,-343$ (where by the value at -a I mean minus the value at a) then you can extend this to – – – + + +. So I’m going to hazard a guess that $T_3$ has order 6. But I’ll need to think a bit more about this.

• Alec Edgington Says:

The 2=5 identity is strikingly well observed. We have $x_{2n} = x_{5n}$ for all but 15 of the 116 possible $n$, and for all $n$ up to 56.

• Alec Edgington Says:

In the hope that there might be a simple explanation for the identity 2=5, I tried being awkward and imposing the additional constraint that $x_n = x_{5n}$. No such luck: I got to 356 with both of these constraints. That is the limit, however.

16. TonyG Says:

When analysing examples of long sequences, we should be aware of those elements whose value is flexible. For instance, in the original 1124-sequence, the values of $x_{1109}, x_{1111}$, and $x_{1117}$ can be (separately) changed without invalidating the sequence — the sequence remains valid as long as at most one of these elements is -1. Perhaps we should colour these elements differently?
On a completely separate note, I have looked at the number of sign flips in the HAP’s in this sequence (e.g. + – – + – ++ has 4 sign flips). Here is a list of d,f,r with f = no. of sign flips, r = f / (length – 1):
1, 691, 0.615316
2, 350, 0.623886
3, 226, 0.605898
4, 185, 0.660714
5, 151, 0.677130
6, 127, 0.682796
7, 99, 0.622642
8, 89, 0.640288
9, 83, 0.674797
10, 72, 0.648649
11, 63, 0.623762
12, 57, 0.619565
13, 49, 0.576471
14, 52, 0.658228
15, 46, 0.630137
16, 45, 0.652174
17, 42, 0.646154
18, 38, 0.622951
19, 37, 0.637931
20, 36, 0.654545
Is this hovering about the golden ratio? Or is it just noise?

17. New thread for polymath5 « Euclidean Ramsey Theory Says:

[…] thread for polymath5 By kristalcantwell There is a new thread for polymath5 here. There is also a wiki page for experimental results […]

18. Sune Kristian Jakobsen Says:

@Alec:
What happened to the program you talked about here:
https://gowers.wordpress.com/2010/01/06/erdss-discrepancy-problem-as-a-forthcoming-polymath-project/#comment-4734 ? I think this is the most interesting experiment, because it could tell us if we have to “break to system” all the time. Have you tried to correct it?

• Alec Edgington Says:

Hi Sune,

I did correct it, but it was just a Python script and hadn’t got anywhere after running for some time. It needs to be written in C for speed. I intend to do that (unless anyone else would like to), but haven’t got round to it, what with all the other fascinating things to think about.

19. Kristal Cantwell Says:

I think that it would be a good idea to have a polymath5 tag at Math Overflow I think this was mentioned in the previous thread. It would be useful because there could be a search for that tag that would give a complete list of questions so there could a link from the wiki to all the polymath5 questions at Math Overflow. Otherwise if there were a lot of questions it could become a problem to keep track of them.

• Harrison Says:

I agree that this sounds like a good idea, particularly if we start asking a bunch of questions on MO. Tim asked a question a few days ago about partial sums of multiplicative functions — I’ll leave it up to him whether to retag. Otherwise, I’ll keep my eyes open.

20. gowers Says:

I have a prediction but not the time to check it (which in any case I’d have to do by hand, whereas others here could do it in seconds). The prediction is that if you take two HAP subsequences of the 1124 sequence and look at the partial sums of their pointwise product, then they will grow approximately linearly with a gradient between -1 and 1 that’s a multiple of 1/3. The idea is that this gradient should correspond to the angle between the roots of unity corresponding to the elements of $C_6$. This prediction is obviously correct when the two sequences are either approximately equal or approximately minus each other — it’s the other cases that interest me.

• Alec Edgington Says:

I’m afraid my first few calculations are inconclusive. The sequence may just be too imperfect. But it may be worth plotting a histogram showing the distribution of the gradients to see whether there seem to be peaks at these values.

• Alec Edgington Says:

I do have some interesting plots of these gradients. I’ll upload them and post a link soon.

• Alec Edgington Says:

Here is a plot to give some idea of the situation (apologies for the primitive presentation):

This plots, for the first 1124 sequence $x$, the partial sums of $T_p(x) \cdot T_q(x)$ for primes $p$ and $q$ up to and including $17$.

Short of time now, but I can easily produce similar plots for other HAPs on request!

• Alec Edgington Says:

An interesting phenomenon revealed by those graphs is that, whereas some pairs of HAPs start in synchrony and then get out of step towards the end, others start out of step and get into synchrony later on. We have probably been missing these in our analysis.

• gowers Says:

That is indeed very interesting, and may completely change my view about what is going on. There now seems to be the possibility that sequences like to mess about a bit early on and then gradually get more and more multiplicative, which is almost the opposite of what I thought before.

Now that I’ve got a bit more time, let me explain why I made the prediction that doesn’t seem to be especially backed up by the data you have provided. Suppose that we define a sequence by taking a completely multiplicative function $\phi$ to $C_6$ and composing it with the map that takes 0,1,2 to 1 and 3,4,5 to -1. Suppose now that we compare the d-sequence with the 1-sequence. If $\phi(d)=1$, say, then $\phi(dn)=1+\phi(n)$ for every $n$. So we’d expect that $x_n$ and $x_{dn}$ would be equal two thirds of the time and different for the other third. Moreover, I’m pretty sure we would expect the unequal places to be distributed very evenly, so that we should go one step down for every two steps up, giving a gradient of 1/3. Similarly, if $\phi(d)=2,3,4,5$ we’d expect gradients of $-1/3,-1,-1/3,1/3,$ respectively.

The fact that this does not seem to be happening is one of those bits of bad news that is not really bad news at all, but a sign that something interesting and unexpected is going on.

• Alec Edgington Says:

A caveat to my comment above: it is possible that those high correlations appearing late on may be an artefact of the search algorithm used to find this sequence: for example, by values being initialized to $+1$. it would be worth regenerating the sequence with random initialization values to see if the pattern persists.

21. Harrison Says:

This is a meta-comment; it would be useful for those of us “following along at home” (i.e., haven’t contributed much yet but are optimistic about their chances of being able to contribute later on), and it would likely be useful even for the core group of contributors that has established itself so far, if we had some uniformity in how we analyzed and presented experimental results.

It shouldn’t be that difficult to set up and make available some little bit of software that made it really easy to extract and visualize and compare HAPs (given a text file containing a sequence of +s and -s and some whitespace as input) and various other patterns with a really basic UI, should it? Right now it’s probably not very necessary, but in the future if we start looking at more complex patterns it might become essential. I could probably hack something like this together myself if I had to, but I’m a pretty lousy programmer, plus my semester’s starting this week and I’ll have other obligations…

22. Thomas Sauvaget Says:

Since I haven’t seen it on the wiki, nor apparently discussed in previous posts (I may overlooked it), I’m wondering if listing valid sequences as a function of length has already been discussed. In particular, to what extent could an analogy to prime sieving be made (and to, say, Euclid’s proof that there are infinitel many primes)? I’ve started the list here: http://thomas1111.wordpress.com/2010/01/11/number-of-sequences-as-a-function-of-length/

23. gowers Says:

Thomas, it occurs to me that the following would be easy to produce and very useful. For each sequence, it would be great to have a diagram that displayed a grid of numbers, with the number mn in the mth row and the nth column, together with a plus or minus and a colour, given by the value of $x_{mn}$. Probably going up to about 20 columns and 30 rows would be enough (if the sequence has length at least 600 — one could reduce it for shorter sequences). With such a chart, it would become a much easier matter to compare the values of the sequence at HAPs.

24. gowers Says:

This is one for Alec. I still haven’t given up on the golden ratio (or some other irrational) making an appearance and leading to interesting examples with low discrepancy. My first attempt was a dismal failure, but here’s a method that might work. In fact, it would also be very interesting if it fails to work.

Recently, we have been experimenting with sequences that are subject to constraints, which can be summarized as saying that maps of the form $T_d$ are forced to have certain periodicity properties (or are forced to equal each other). I am interested in forcing $T_2$ to be quasiperiodic, or “periodic with an irrational period”. To achieve this is slightly trickier than it is just to make $T_2x=-x$, so let me try to say exactly what I’m asking for.

Given a point $z\in\mathbb{T}$ we can define a $\pm 1$ sequence by taking $\alpha$ to be the golden ratio and looking at whether the imaginary part of $z\exp(2\pi in\alpha)$ is positive or negative. (Let’s choose $z$ so that it is never zero.) I’ll call this a sequence of period $\alpha^{-1}$. I’m interested to create a sequence subject to the constraint that every subsequence of the form $x_k, x_{2k}, x_{4k}, x_{8k},\dots$ is periodic with period $\alpha^{-1}$.

The difficulty is that there are infinitely many choices for $z$. One could artificially reduce this to two: $z=\pm 1$. I think even this could work. A more complicated alternative would be to choose some small (and probably even) integer k and choose a kth root of unity, optimizing in some way what happens at the first few values. But to begin with just choosing $z=\pm 1$ is very much in the spirit of what you have already been doing.

The motivation for this experiment is that multiplicative structure that is too exact seems not to work. Here is a multiplicative structure that avoids periodicity, and may thereby achieve theoretically what the experimental sequences seem to do. Having said that, I have not detected any evidence in the experimental sequences for this quasiperiodic behaviour. I just think it would be fascinating to see what happens if we try to impose it. Needless to say, there are many variants one could try if the first one doesn’t give good results.

25. Gil Kalai Says:

One question that I find interesting is if there is some notion of lines for $\{1,2,\dots,p-1\}^n$ so that the analogous question for these lines will imply Erdos’ conjecture and also will have a chance of being correct. (Combinatorial lines containing the 0 vector do not work.)

The following notion of lines may be appropriate: a line has p-1 points $x^1,x^2,\dots,x^{p-1}$ so that in every coordinate $j, x^i_j=i \times r_j$, and $r_j$ can be arbitrary numbers between 1 and p-1. (We can assume $r_1 = 1$.)

If we replace $(x_1,\dots, x_n)$ by $\sum x_i t^{n-i}$ where t is large compared to p then such lines are maped to homogenious AP. So the question is if we give every element of {1,2,..,p-1}^n a sign is there such a line with unbounded dscrepency. (Maybe the answer is easily “no” but I dont see it off hand.) Such lines might be handled by some Fourieish arguments.

• Gil Says:

For the reduction to work we do need that the lines will have only s points and that $r_i \times s < p$ for every i. The problem is of interest also for the lines as I defined above.

26. Klas Markström Says:

Alec, what does your program find if you require both that $x_{2n}=-x_{n}$ and $x_{3n}=-x_{n}$ ?

The reason that I ask is that I know that there is at least one sequence which has this property initally and can be extended to a length of at least a few hundred, but I don’t know if the property is preserved in the extensions.

27. obryant Says:

The discussion continues here:

https://gowers.wordpress.com/2010/01/11/the-erds-discrepancy-problem-iii/

28. Terence Tao has solved the Erdős discrepancy problem! | The Aperiodical Says:

[…] Terence mentioned in the abstract, his proof builds on the work of the collaborative Polymath5 project, which took place on Timothy Gowers’s […]