The Erdős discrepancy problem IV

Again, this is just a brief report on what is going on. Plenty of work is still being put into making sure that the main results, questions, ideas, etc. are appearing on the Polymath5 wiki and especially (for the moment) the experimental results page, so looking at that is still a good way to keep up.

Alec came up with a sequence of length 1112 that is fairly different from the sequences of length 1124. See this comment and succeeding ones for information about it. I am particularly intrigued by the diagram showing partial sums of pointwise products of different HAP-subsequences. There is data here that demands to be explained, but the focus of the discussion is elsewhere for now.

Kevin has suggested thinking about how the number of sequences of length n compares with the number of sequences of length m, where by “sequences” I mean sequences with relevant properties such as having discrepancy 2 and perhaps being multiplicative as well. There is some interesting data here too.

The logarithm of the number of multiplicative sequences of discrepancy 2 and length n behaves in a suggestive way. Yet more to chew over.

I myself have started thinking about an algorithm for generating long low-discrepancy multiplicative sequences in the hope of proving that the discrepancy of such sequences can grow more slowly than logarithmically.

Gil has suggested looking at randomized modifications of the problem with a view to obtaining insights about the problem itself.

Alec has looked at the behaviour of Dirichlet series built out of some of the good examples we have.

Finally, I am hoping for good news from Klas in the not too distant future.

105 Responses to “The Erdős discrepancy problem IV”

  1. Jason Dyer Says:

    Timothy, regarding the “intelligent greedy” algorithm, what I’d really like to do is a computer implementation, so I’d like to nail down the exact process.

    From what I gather you’re using two kinds of forcing:
    a.) using a completely multiplicative property
    b.) using prohibited subsequences.

    For b.) then it would be useful to start by generating a list of the prohibited subsequences.

    Regarding a “stupider” version which might be more theoretically tractable, I was thinking of setting the main look-ahead of the sequence at numbers where (given a natural number x, \mu(x) is the corresponding term of the Möbius sequence) \mu(x)^x > 0. (Here in the OEIS.) It’s a distinct-factor rich sequence but also related to something with known properties.

    • Jason Dyer Says:

      I see at the very end of the last thread Thomas has similar thoughts to mine, so instead of coding I might focus instead on some theoretical algorithm analysis, perhaps with a finite-state automata or something along those lines.

    • gowers Says:

      Your description of what I’m doing is right. Let me say a little bit more about the forbidden subsequences that kept coming up. I found that it was useful to put a little tick above a number every time I could prove that the partial sum up to that number had to be zero. (This may be less useful when the range of allowed sums becomes bigger.) For instance, as I said before, if you get the sequence – + + with the initial minus at an odd number 2n+1, then you know that the sum up to 2n must be 0, and you don’t need to know all the values up to 2n in order to deduce that. And of course, if you know what the partial sum is up to some m and you do know all the values from m to m+k, then you know the partial sum up to m+k.

      On the subject of a theoretical analysis, I think it would be interesting, and I also think it could well be possible, to describe circumstances that will force some partial sum to go above C in magnitude. (I’m still talking about completely multiplicative sequences.)

      The kind of thing I mean is this. Let us say that the values at p_1,\dots,p_k force the value at n if p_1,\dots,p_k are the prime factors of n that occur an odd number of times. For example, as I found when building a multiplicative sequence of length 246 by hand, the value at 2 forced the value at 242: once I had decided that x_2=-1 I had no choice but to set x_{242} to be -1 as well.

      Now let’s suppose we are trying to pick the values for all primes less than some M. We could make a list of all values that will be forced by the values we choose. If we did so, then for every n that is forced, there would be an associated set A_n of the primes that force the value at n. If the number of -1s in A_n is odd, then x_n=-1 and otherwise x_n=1. So if we have several intervals with a high density of forced numbers, we could find ourselves in a situation where we have several collections of sets A_n and we need to choose -1s in such a way that for each collection we want to hit roughly half the sets an even number of times and roughly half an odd number of times. And that could turn out to be difficult.
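
      If it helps to make the forcing idea concrete, here is a minimal Python sketch (my own code and naming, not anyone's actual implementation): the set A_n is just the set of primes dividing n to an odd power, and the forced value is the product of the chosen signs over that set.

```python
def odd_power_primes(n):
    """The set A_n: primes dividing n an odd number of times
    (equivalently, the prime factors of the squarefree part of n)."""
    primes = []
    d = 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if e % 2 == 1:
            primes.append(d)
        d += 1
    if n > 1:
        primes.append(n)
    return primes

def forced_value(n, sign_at_prime):
    """Value of x_n in a completely multiplicative +/-1 sequence, given chosen
    signs at the primes in A_n (squared prime factors contribute +1 and drop out)."""
    v = 1
    for p in odd_power_primes(n):
        v *= sign_at_prime[p]
    return v
```

      For instance, 242 = 2·11² has A_242 = {2}, so forced_value(242, {2: -1}) returns -1: the "x_2 = -1 forces x_242 = -1" example above.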

      Of course, proving the existence of these runs of forced numbers could be difficult too. But perhaps at the very least one could guess how much they should occur and try to come up with heuristic arguments that predict how far you need to go before you cannot avoid a discrepancy of C. My suspicion is that you can’t go for ever but that you can do better than exponentially in C.

    • Mark Bennet Says:

      So that involves factoring n as st where s is a square and t is square-free and listing the prime factors of t. It ought to be possible to compute a long list for this quite rapidly.

    • Américo Tavares Says:

      Comment to Mark Bennet, January 14, 2010 at 4:52 pm.

      Just to confirm whether I understand your words in the general context.

      For instance, for the full sequence posted by Thomas Sauvaget in Tables for the second 1124 sequence, we have 1124=4\cdot 281 (n=st, with n=1124, s=4, t=281; 281 is a prime).

    • Mark Bennet Says:


      My understanding is that in your example, Tim is noting that, in a multiplicative sequence, the value at 1124 is determined by that at 281.

      So once the values at a number of primes are known, they force further values in a similar way.

      Stage 1 of an algorithm is to fill in all the forced values.

      Stage 2 is to deal with the implications of that forcing – so that if there are sequences of forced values matching particular patterns, they will imply constraints on any further choices at new primes. (Potentially, later, when we look at more intelligent searches, intervals in which the forced values are relatively dense and match more sophisticated patterns.) Then fill in further forced values (stage 1) and iterate until no more values are forced by constraints, or a contradiction is reached (impossible to continue).

      Stage 3 is then to determine that it is impossible to continue with the current value of C, or to make a free choice of the value at a new prime. It would be possible to do backtracking over free choices to find the optimum choice at each stage, and this may reveal further patterns to be taken into account at stage 2 to make the search more efficient.

    • gowers Says:

      That’s a point (I’m referring to Mark’s first comment above). Having a list might be pretty useful as one could look for promisingly difficult areas and try to use them to prove lower bounds. For example, an initial target might be a proof that the discrepancy of an infinite multiplicative sequence cannot be 3. (The aim here would be to have a proof that doesn’t just do an exhaustive search.)

    • Mark Bennet Says:

      I tried to implement something in a crude way – but overflowed the memory constraints on my laptop. I was aiming at stats up to N=100,000.

      The procedure aimed at factoring n=pqr
      p is the maximal square factor
      q is a product of small primes (I went up to cube root of N)
      r (which I left unanalysed) is then the product of at most two large primes.

      Then analysing q gets you most of the constraints you are worried about.

    • Américo Tavares Says:


      Thank you for your reply (January 14, 2010 at 6:17 pm) and explanation.

  2. Jason Dyer Says:

    This hasn’t gone much of anywhere, but in the Polymath spirit I’ll post what I did anyway.

    I was thinking about how to represent the relationships between forcings, rather than the exact elements, and using the C=1 case as an example I tried a graph structure where one can collapse nodes (see picture). With all the links and cycles drawn in, the graph can eventually collapse into a single node (hence every move in the case C=1 is forced).

    I was hoping to extract some meaning using leaves (which would include primes p such that 2p > n) and cycles, but I haven’t had any luck yet.

  3. Américo Tavares Says:

    Sometimes I post a draft post (or comment) just to make sure I get what I want. Then I copy the HTML code and post it as a comment.

  4. Américo Tavares Says:

    My last comment was intended to

    Jason Dyer, January 14, 2010 at 6:35 pm

    “(Something is breaking in my equation attempts but I have no idea why.)”

  5. Harrison Says:

    This is probably too generalized/abstract/nonsensical to actually be useful for the problem, but it occurred to me just now and I can’t resist mentioning it, just in case:

    Of course the positive integers form a poset under divisibility. A poset is a special kind of category.

    In this category, each morphism m \to n has an associated integer, namely n/m. A completely multiplicative sequence is then equivalent to an assignment of +1 or -1 to each morphism that behaves consistently on commutative diagrams (the sign of a composite is the product of the signs); equivalently it’s a 1-dimensional quiver representation over GF(2).

    If we look at the first step in a quasi-multiplicative sequence, this gives us a 1-dimensional quiver representation (over C) which is also an irreducible representation of an Abelian group.

    Can we study the problem using quiver representations? Like I said, I kind of doubt it (especially since it seems kind of hairy to define HAPs with any sort of generality) but it’s worth throwing out there.

  6. Alec Edgington Says:

    A while back, Tim asked what happens if you take one of the 1124 sequences and search for sequences whose even terms match it. I thought this was an interesting idea, and tried it just now. It doesn’t get far. The maximum length of sequence subject to the constraint was 362. Applying unconstrained search using the resulting sequence got no further (in the time I let it run) than 363. Repeating the experiment with the additional constraint that x_1 = -x_2, the maximum was 314 and the furthest I got with unconstrained search from there was 494. Still, it is quite interesting that it doesn’t work.

    • gowers Says:

      It is certainly interesting. It occurs to me to wonder whether we might have more luck if we tried to choose which T_d and which sign to go for (here you chose T_2 and both plus and minus) with the multiplicative structure of the existing sequence very much in mind. It may be that one would still end up choosing T_2 and minus — I haven’t looked. For example, if T_3 x is roughly -x, then it might make sense to look for a sequence whose third element came from the 1124 sequence.

      But probably the idea just doesn’t work. I think we ought to see if we can think of a plausible reason for this.

    • Mark Bennet Says:

      The 1124 sequence is closest to its 8-subsequence, so that might be worth a try. But the first 1124 sequence has 1 as a sporadic value (ie its initial values are not replicated in any of its own subsequences), and this may provide an explanation for the difficulty.

      I’d try one of the more regular sequences: the second 1124 sequence has more regularity than the first.

    • Alec Edgington Says:

      Curiously, one gets exactly the same distance (363 and 314 respectively) constraining \pm T_2 to be the first 1124 sequence as one does constraining \mp T_2 to be the second 1124 sequence. (Note reversal of signs.) I can’t believe this is a coincidence, but whether it is something deep or a result of how the seed sequences were generated by Klas, I don’t know. Let me try it with the 1112 sequence next.

    • Alec Edgington Says:

      Not surprisingly, given how the 1112 sequence was formed, one gets much further in this case (to at least 998). Even with the ‘wrong’ sign constraint, one can get to 374 (but no further).

  7. Thomas Sauvaget Says:

    A quick update regarding refined greedy algorithms and multiplicative sequences: I’ve now added some source code which I hope is straightforward enough to modify in order to test many ideas. So far I’ve done some runs with a method that compares the sum of partial sums, in the little section here. The sequence I obtained is clearly not sub-logarithmic, unfortunately. I’ll try to test more ideas using Tim’s comments tomorrow.
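
    In case anyone wants a starting point for experiments, here is a minimal Python sketch of the naive greedy idea (my own code, not the program on the wiki): at each prime in turn, keep whichever sign gives the smaller maximum |partial sum| up to N, treating undecided primes provisionally as +1, with no backtracking.

```python
def smallest_prime_factors(N):
    """spf[n] = smallest prime factor of n, for 2 <= n <= N."""
    spf = list(range(N + 1))
    i = 2
    while i * i <= N:
        if spf[i] == i:
            for j in range(i * i, N + 1, i):
                if spf[j] == j:
                    spf[j] = i
        i += 1
    return spf

def greedy_multiplicative(N):
    """Naive greedy completely multiplicative +/-1 sequence of length N."""
    spf = smallest_prime_factors(N)
    sign = {}  # chosen signs at primes

    def value(n):
        # product of the chosen signs over the prime factorisation of n
        v = 1
        while n > 1:
            v *= sign.get(spf[n], 1)  # undecided primes count as +1
            n //= spf[n]
        return v

    def max_partial_sum():
        s, worst = 0, 0
        for n in range(1, N + 1):
            s += value(n)
            worst = max(worst, abs(s))
        return worst

    for p in range(2, N + 1):
        if spf[p] == p:  # p is prime: try both signs, keep the better one
            sign[p] = 1
            with_plus = max_partial_sum()
            sign[p] = -1
            if max_partial_sum() > with_plus:
                sign[p] = 1
    return [value(n) for n in range(1, N + 1)]
```

    This is quadratic-ish and only meant for small N; it is one plausible reading of "greedy", not necessarily the method used for the plots.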

    • Thomas Sauvaget Says:

      I’ve now obtained and plotted some data which go some way in answering Tim’s question 4 of the wish list, and I’ve put it on the wiki

      Namely, the answer seems to be: yes, the partial sums grow at least logarithmically.

    • Alec Edgington Says:

      Thomas, in your last plot, I wonder what would happen if you calculated \ell_s (q) as a weighted sum, giving weight 1/d to the discrepancy-d sequence (in other words, weighting the HAPs according to how often they matter). This would be in line with the spirit of the greedy algorithm.

    • gowers Says:

      Thomas, these plots are interesting, and seem to show that a naive greedy algorithm fails. My guess would be that a naive greedy algorithm is about as good as the Liouville function — that is, giving square-root-like discrepancy.

      That still leaves me curious about the more sophisticated greedy algorithm where at each stage you try to minimize not just the discrepancy so far but some parameter associated with the difficulties you are potentially making for yourself later. (I call this greedy because what I am not allowing is any backtracking — once you’ve made a decision you have to stick to it.)

      I’m not sure I see the point of plotting the partial sums along any of the HAPs other than the sequence itself, since the discrepancy for a multiplicative sequence is equal to the largest partial sum so far (in modulus).

    • Alec Edgington Says:

      Of course: please ignore my last comment then — I was forgetting the functions were multiplicative.

  8. Alec Edgington Says:

    While investigating the Dirichlet series, I noticed that the sequence (y_n) obtained as the Dirichlet inverse of the 974-sequence with nice multiplicative properties has some rather nice ones of its own, evidently closely related to those of the original sequence. For example:

    y_{4n} = 0 for all n

    y_{2n} = y_n whenever n \not\equiv 2 \pmod{4}

    y_{5n} = y_n whenever n \not\equiv 0 \pmod{5}

    (I’ve posted the sequence on the wiki, and plotted it here.)

    I think these properties can be derived from the Dirichlet inversion formula by induction, but haven’t unravelled the details: I’ll attempt to do so. It’s possible that understanding the properties of the Dirichlet inverse might provide an alternative way to generate long low-discrepancy sequences.

    • Alec Edgington Says:

      OK, I’ve checked by induction that if T_p x = \pm x, and y is the Dirichlet inverse of x, then y_{pm} = \mp y_m for p \nmid m, and y_{p^2 m} = 0 for all m. Now I wonder what happens in the case of T_p x = \pm T_q x.
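
      As a quick numerical illustration of this (the code and names are mine, just a sketch): the Liouville function is completely multiplicative with T_p x = -x for every prime p, so by the claim its Dirichlet inverse should satisfy y_{pm} = +y_m for p \nmid m and y_{p^2 m} = 0 for all m. It does; indeed its inverse comes out as the characteristic function of the squarefree numbers.

```python
def liouville(n):
    """Liouville's function (-1)^Omega(n): completely multiplicative, -1 at every prime."""
    v = 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            v = -v
            n //= d
        d += 1
    if n > 1:
        v = -v
    return v

def dirichlet_inverse(x, N):
    """Dirichlet inverse y of the 1-indexed sequence x (x[1] must be 1):
    sum over d|n of x[d] * y[n/d] equals 1 at n = 1 and 0 for n > 1."""
    y = [0] * (N + 1)
    y[1] = 1
    for n in range(2, N + 1):
        y[n] = -sum(x[d] * y[n // d] for d in range(2, n + 1) if n % d == 0)
    return y
```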

  9. Harrison Says:

    A while back, I posted about what I called the “HAP basis.” I finally wrote some code to compute the HAP basis of a given sequence, and have begun to look at the HAP basis of some of our low-discrepancy sequences. Disappointingly, what I was hoping (that they would “look like” bases of completely multiplicative sequences) seems not to be the case in general, but if someone else wants to try to find patterns (and I’ll keep looking), please go for it!

    By the way, I’m considerably less optimistic about this than I was about an hour ago, but I’ll post it anyway. What motivated me to finally write the code was the thought that perhaps long low-discrepancy sequences only had a small number of vectors in their HAP basis — like how completely multiplicative sequences may only contain primes and prime powers in the basis.

    I conjectured to myself that perhaps any sequence whose HAP basis had positive upper density had to have high discrepancy, and even thought briefly about how to go about proving it. (Density increment of some sort?) Again, I’m much less optimistic now that this is in fact something that we can prove (without proving the EDP itself), but it’s possible that it is true, provable, and just happens to give much worse bounds than the “actual” bound. If someone else thinks it could be attackable, please comment!

    • Harrison Says:

      Actually, I just tried something else which makes me more optimistic about the conjecture possibly being approachable, which was to run my program on Alec’s length-1112 sequence. So I think I’ll write a little more about my thoughts on it.

      So, the HAP basis is an alternative basis of the vector space of GF(2)-valued functions on N, namely the characteristic functions of the HAPs. It behaves rather nicely under passing to a HAP-subsequence, and also it’s often possible to change a sequence slightly in this basis without affecting the discrepancy much. It also behaves nicely under truncation, but it’s pretty sensitive to pointwise changes to the sequence.

      If you don’t look at discrepancy, the basis has the same statistical properties as any other basis (duh). In particular a random sequence of length N will decompose into about N/2 basis vectors.

      One curious thing is that multiplicative sequences have highly restricted HAP-basis decompositions; they are sums of characteristic functions of HAPs with prime-power differences. Since multiplicative sequences (and sequences resembling them) seem to be able to have quite low discrepancy, I was led to my conjecture, which is again:

      Let S \subset N have positive upper density, and let x_S be the sequence that decomposes in the HAP basis into exactly the (characteristic functions of) HAPs with common difference in S. Then x_S has unbounded discrepancy.

      This sort of resembles Szemeredi’s theorem, and I wondered if some of the same techniques might not carry over.

    • Harrison Says:

      So how would we go about proving this conjecture? Well, one other thing about multiplicative and even quasi-multiplicative sequences is that passing to a HAP-subsequence doesn’t change the discrepancy much. So I thought, well, maybe we could try a “density increment” argument?

      There are a few obvious problems here, though. First of all, I’m not even sure that we know how to prove the conjecture taking S = N! So the “base case” for density increment would probably be highly nontrivial. Second of all, you can’t partition N with a finite number of HAP sequences (reference: Euclid), which is a major respect in which this problem differs from similar ones. Finally, note that passing to a HAP subsequence isn’t likely to drastically increase the upper density of the basis when you start with a random sequence, so you’d have to introduce yet another “random/structured” dichotomy.

      Still, I hoped that one might be able to construct an argument along the following lines: First, prove the conjecture if the density is greater than some c < 1. (I'm not sure how this would go, even now.) Then, if there's some k for which T_k (x_S) has a denser basis, pass to that subsequence. Otherwise, use the fact that there's no such k to prove that one can add a positive fraction of new HAP basis vectors to obtain a new sequence which again would hopefully have bounded discrepancy.

      This is rather farfetched, but I felt like if it had a chance of working, we'd be capable of making it work. But my hopes were dashed when I ran my program on the first sequence of length 1124 and saw that there were 494 basis vectors in the HAP decomposition — less than the "expected number" of 562, but not a whole lot less. I thought, well, clearly the density of the HAP basis isn't controlling much at this point.

      But then I computed the HAP decomposition for Alec's 1112 sequence — it had only 247 basis vectors! (That's exactly half of 494, by the way, and it's also 1 more than the maximum length of a completely multiplicative sequence. I'm 99% sure that these are both coincidences, but I greatly hope I'm wrong, because that would be fascinating.) That's density about 1/4, which makes me wonder if there are some connections after all.

      I no longer know if density increment is the way to go, though. I wonder if the 1124-sequence can be obtained from the 1112-sequence by pointwise operations and adding extra HAP vectors — actually trying to do this is probably out of my reach right now, but someone else might have an idea. I wish I knew the ergodic proof of Szemeredi; the "natural home" for my conjecture seems to be the infinitary setting, so I'd like to look at it with infinitary tools, but I don't have them in my toolbox. Apart from that… that's all I have to say for now.

    • gowers Says:

      One small remark is that the natural decomposition to look at for an additive combinatorialist would be into a small number of HAP-basis vectors plus some kind of quasirandom error, which would conveniently deal with the difficulty you are facing with changes at a small number of points.

      I’m not altogether clear that a structure/randomness dichotomy approach could be made to work, but I don’t think it can be ruled out. The structured case would be multiplicative (and perhaps quasimultiplicative) sequences, and the random case would be one where you have almost no multiplicative structure at all. There are various ways of making that precise. I put one on the wiki in the interesting-subquestions section. Another, which I think was in a blog comment somewhere, is to say that a sequence has almost no multiplicative structure if the sum of x_ax_bx_cx_d over all quadruples (a,b,c,d) such that ab=cd is small. When the theoretical aspect of the project starts properly, proving that a low-discrepancy sequence must have at least some multiplicative structure is one of the first things I want to think about.

    • Gil Says:

      Harrison’s HAP basis looks very interesting. I could not understand the precise conjecture though. Can you restate?

    • gowers Says:

      Here’s what I understand it to be. I’d be interested to know whether my interpretation is correct. Let’s begin with the beginning of the first 1124 sequence. We’ll then expand it as a mod-2-linear combination of characteristic functions of HAPs. It’s easy to see that this can be done in precisely one way. In fact, I’m going to do it over \mathbb{Z} instead because I think that gives interesting extra information (but you can always reduce mod 2 if you want to).

      The sequence starts

      +1 -1 +1 +1 -1 -1 -1 -1 +1 +1 -1 +1 +1 +1 -1 -1 +1 -1 +1 +1 -1 -1 -1 +1 -1 +1 +1 -1 +1 -1

      Following Harrison I’ll write (d) for the HAP {d,2d,3d,…}. So this sequence is (1) plus

      0 -2 0 0 -2 -2 -2 -2 0 0 -2 0 0 0 -2 -2 0 -2 0 0 -2 -2 -2 0 -2 0 0 -2 0 -2

      which is (1)-2(2) plus

      0 0 0 2 -2 0 -2 0 0 2 -2 2 0 2 -2 0 0 0 0 2 -2 0 -2 2 -2 2 0 0 0 0

      which is (1)-2(2)+2(4) plus

      0 0 0 0 -2 0 -2 -2 0 2 -2 0 0 2 -2 -2 0 0 0 0 -2 0 -2 0 -2 2 0 -2 0 0

      which is (1)-2(2)+2(4)-2(5)-2(7) plus

      0 0 0 0 0 0 0 -2 0 4 -2 0 0 4 0 -2 0 0 0 2 0 0 -2 0 0 2 0 0 0 2

      (Remark: in order to get rid of the -2s at 5 and 7 there I “spoilt” a lot of zeros later on, which suggests that the HAP basis is not always very efficient, just as Harrison found.)

      I think the idea is clear, so I’ll leave it there. I’m basically Möbius inverting. More precisely, if c_d is the coefficient of (d) in this expansion, \phi(s) is the Dirichlet series associated with the original sequence (x_n), and \psi(s)=\sum c_nn^{-s}, then \phi(s)=\psi(s)\zeta(s). Since \zeta(s)^{-1}=\sum\mu(n)n^{-s}, it follows that c_n=\sum_{d|n}x_d\mu(n/d).

      It would be good if someone could do this to a few of the interesting sequences and post the results on the wiki. It would complement nicely Alec’s intriguing computations of Dirichlet inverses.
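
      For anyone who wants to automate this, here is a short Python sketch of the Möbius-inversion computation just described (my own code); on the first 30 terms above it reproduces the coefficients 1, -2, 2, -2, -2 at (1), (2), (4), (5), (7).

```python
def mobius(n):
    """The Mobius function mu(n)."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # d^2 divides the original n
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def hap_coefficients(x):
    """Coefficients c_n = sum over d|n of x_d * mu(n/d), so that the sequence x
    equals the sum over n of c_n times the characteristic function of the HAP (n)."""
    N = len(x)
    return [sum(x[d - 1] * mobius(n // d) for d in range(1, n + 1) if n % d == 0)
            for n in range(1, N + 1)]
```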

    • Harrison Says:

      Hm, I meant to consider {+1, -1} multiplicatively (which is of course isomorphic to GF(2)), so the decomposition would instead proceed like:

      +1 -1 +1 +1 -1 -1 -1 -1 +1 +1 -1 +1 +1 +1 -1 -1 +1 -1 +1 +1 -1 -1 -1 +1 -1 +1 +1 -1 +1 -1

      Which I’ll write in binary (1 for -1, 0 for +1) so that it’s shorter, as:

      010011110010001101001110100101

      Now this is (2) = 010101… plus:

      000110100111011000011011110000

      which is (4) plus:

      000010110110011100001010110100

      which is (5) plus:

      000000110010010100011010010101

      and so on.

      (All of this is addition (mod 2), or multiplication in {-1, 1}.)

      Actually this seems to be giving essentially the same basis as Tim’s decomposition — I’m not too sure why this should be the case, particularly if the sequence isn’t multiplicative, and it may be a coincidence. If they do agree in general (and I could easily be missing something obvious) then probably the computation over \mathbb{Z} should be used since it would seem to give strictly more information.

    • Gil Says:

      OK: So let me see if I understood the conjecture:

      Given a 0-1 sequence x= (x_1,x_2,x_3,\dots) you can write it uniquely as a sum modulo 2 of characteristic functions of HAPs. Let us write x=\sum\{(d): d\in S\}. Here (d) is the characteristic vector of the HAP (d,2d,3d,…).

      Suppose that in this sum the set S has positive density.

      Then there is no constant K such that the partial sums \sum_{i=1}^n x_i are all between -K and +K.

      (Is this the conjecture?)

    • harrison Says:

      Well, not quite, but you have the right idea. First there’s the small issue of x_i being a 0-1 sequence (in your formulation) rather than a +/- sequence; in addition, you need to take HAPs of difference other than 1, or else I’d imagine you can probably construct a counterexample. So replace the last statement by:

      There is no constant C such that the partial sums \sum_{i=1}^n (2x_{ki} - 1) are all between -C and +C.

      And that’s the conjecture.

    • Gil Says:

      I see, so your question is a special case of the entire conjecture. (You look at the discrepancy of all HAPs, namely, you want that there is no constant C so that all the partial sums, for every k and n, are between -C and +C; right?)

      (Also you insist that 1 is not in S?)

      Do you have an example where the stronger statement is false: S is dense and yet all sums

      \sum_{i=1}^n (-1)^{x_i} are bounded between -C and C?

    • Harrison Says:

      Amusingly, I can’t construct a counterexample to your “stronger statement”, but I can easily prove that one exists! Think about the set of sequences obtained by replacing 0 by 01 and 1 by 10. There are too many of these sequences for them to all have small decompositions, but all of them have bounded sums of the form you give.

    • Alec Edgington Says:

      Harrison, I think your construction is essentially the same as Tim’s, and is what I understood too. Alternatively, in product notation:

      x_n = \prod_{d \mid n} c_d

      where c_n = \pm 1.

      The inversion formula is then:

      c_n = \prod_{d \mid n} x_d^{\mu (n/d)}

      I did experiment a while ago with searching over the c_n instead of the x_n, and found no dramatic difference in results. But it is an interesting construction, and perhaps as Tim suggests there may be important information in the coefficients over \mathbb{Z}.
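
      For concreteness, here is a small Python sketch of this multiplicative form of Möbius inversion (my own function names): to_c keeps only the divisors d with n/d squarefree (since x_d = x_d^{-1} for \pm 1 values), and to_x multiplies over all divisors. On the first 30 terms of the first 1124 sequence it picks out (2), (4) and (5) first, matching Harrison’s decomposition above, and the two maps invert each other.

```python
def is_squarefree(n):
    """True if no square > 1 divides n."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

def to_c(x):
    """c_n = product of x_d over divisors d of n with n/d squarefree."""
    N = len(x)
    c = []
    for n in range(1, N + 1):
        v = 1
        for d in range(1, n + 1):
            if n % d == 0 and is_squarefree(n // d):
                v *= x[d - 1]
        c.append(v)
    return c

def to_x(c):
    """Inverse map: x_n = product of c_d over all divisors d of n."""
    N = len(c)
    x = []
    for n in range(1, N + 1):
        v = 1
        for d in range(1, n + 1):
            if n % d == 0:
                v *= c[d - 1]
        x.append(v)
    return x
```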

  10. Harrison Says:

    By the way, I’m looking back over the old threads and this seemed to drop off a while back, but did we ever find a quasi-multiplicative sequence (factoring through C_6) with discrepancy 2 that was significantly longer than the longest completely multiplicative sequence? (I know Alec had a Python script which was meant to compute the longest such sequence, but it was too slow.) This seems to be one of the major things missing from the experimental results page, and I’d really like to be able to take a look at such a thing.

  11. Jason Dyer Says:

    Alec, I’m having trouble getting the depth-first search to work. I’m trying to get the four sequences you report for N(1,2), but I get no sequences. I set N to 41, C_min to -1, and N_TO_COUNT at 41, but the maximum sequence it manages is 11. That’s the only thing I change in the code.

    • Jason Dyer Says:

      For that matter, the N(1,2) sequence on the website doesn’t make sense to me either…

      0 + – – + – + + – + +

      0 1 0 -1 0 -1 1 2 1 2 3

      am I reading this wrong?

    • Jason Dyer Says:

      Never mind. I really should get some sleep. Still can’t get the code to work, though.

    • Alec Edgington Says:

      Jason, you also need to remove the EXCLUDE conditions, either by changing the definition of EXCLUDE to ‘#define EXCLUDE 0’ or by commenting out the text ‘ || (EXCLUDE)’ in the program.

      And another thing I should have mentioned is that init_x should not be longer than N. I neglected to put a check for this in the program.

    • Jason Dyer Says:

      Thanks, I got it working. I also modified the code to display all the sequences (instead of only one) and should note that N(1,2) has only slight differences between the valid sequences:





  12. Harrison Says:

    Alec, I’m writing some code to look at quasi-multiplicative sequences, and I’m attempting to do it — perhaps naively — by modifying your code for completely multiplicative sequences. However, I’m getting stuck at one point interpreting the code:

    xx = [[[False]*n_primes, 1]] # 0th item = specification of primes for which x[p]=-1, 1st item = partial sum

    It seems like xx is meant to contain some data about every low-discrepancy list of the length we’re at; but I can’t for the life of me figure out what data it contains. Can you shed some light on this situation?

    • Alec Edgington Says:

      In that script, xx is a list of representations of good sequences. Each of those representations is a two-member list consisting of, first, a list of booleans corresponding to whether or not each prime is mapped to -1, and secondly, a number representing the resulting partial sum. In other words, xx[m][0][i] is True if and only if sequence m maps prime i to -1; and xx[m][1] is the sum of sequence m.

  13. gowers Says:

    I’m going to post a question over at mathoverflow, because I’m almost certain there’s a good answer to it but I don’t know the answer myself. Here’s the question.

    Suppose you take a \pm 1 sequence and you want to “improve it” by taking pointwise limits of translates. What properties can you guarantee to get in the limit?

    Two examples illustrate what I think should be the extremes. If your sequence is random, then every finite subsequence occurs infinitely often, which means that it is easy to find translates that converge pointwise to any sequence you like. In particular, you can make the final sequence constant, though what interests me is that it is highly structured.

    By contrast, let’s suppose that the sequence is quasiperiodic, or more concretely that you take a real number \alpha and define x_n to be 1 or -1 according to whether the integer part of \alpha n is even or odd. In that case, it is not hard to prove that every pointwise limit is quasiperiodic as well. (Unless I’ve made a mistake, the result is that there must be some \beta such that z_n is 1 or -1 according to whether the integer part of \alpha n+\beta is even or odd.)

    What can be said in general? Can we always “get rid of the randomness” and find a highly structured limit?

    In our case I’m hoping to apply whatever insights people can provide to subsequences of the form x_m, x_{pm}, x_{p^2m},\dots, since if these are highly structured, then that should imply multiplicativity properties.

    Incidentally, this time I tagged the question with polymath5. I agree that doing that is a good idea.

  14. obryant Says:

    Here’s another possible mathoverflow question, if nobody here knows the answer. Depending on the answer, it might illuminate the situation with multiplicative sequences.

    Let S(k) be the set of numbers whose squarefree part has no prime factors larger than k. Does S(k) contain arbitrarily long intervals?

    Anticipating a negative answer, set L(k) to be the length of the longest interval in S(k). Clearly L(k) \geq k, but does it grow linearly?
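    In case it helps with experiments, here is a naive Python sketch (mine, with hypothetical names) that computes the longest interval in S(k) up to a search bound:

```python
def squarefree_part(n):
    # product of the primes dividing n to an odd power
    s, d = 1, 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if e % 2 == 1:
            s *= d
        d += 1
    return s * n  # any leftover factor is a prime appearing once

def in_S(n, k):
    # n is in S(k) iff its squarefree part has no prime factor above k
    s = squarefree_part(n)
    for p in range(2, k + 1):
        while s % p == 0:
            s //= p
    return s == 1

def longest_interval(k, bound):
    # length of the longest run of consecutive members of S(k) up to bound
    best = run = 0
    for n in range(1, bound + 1):
        run = run + 1 if in_S(n, k) else 0
        best = max(best, run)
    return best
```

For instance, up to 100 the longest runs for k=3 are [1,4] (length 4) and [48,50] (length 3, since 48, 49, 50 have squarefree parts 3, 1, 2).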

    • gowers Says:

      One could derive some nonrigorous bounds by assuming that the maximum gap between consecutive primes is (\log n)^2. That would mean that in order to get a long interval of the kind you want, you’d have to go exponentially far in k, at which point you might argue as follows. A number has a positive probability of being squarefree, and typically you expect a number to have about \log\log n prime factors, so in particular it will have a factor of size at least n^{1/\log\log n}. If n=2^k, then that gives us 2^{k/\log k}, which is a lot bigger than k.

      I haven’t checked carefully whether what I’ve just written is a load of rubbish, so it might be. But if not, then it suggests that long intervals, other than the trivial one, will be hard to find.

      Having said that, it might well be enough to have long intervals in which almost all the numbers belonged to S(k). In that case, it would no longer be necessary to go a long way out to avoid primes, since one could afford a few primes. On the other hand, primes aren’t the only thing you need to avoid: for example you don’t like numbers that are products of primes with something small.

  15. gowers Says:

    Added to wish list: take some of the interesting long sequences that we’ve generated (particularly the ones that are not constrained to satisfy certain multiplicative relationships) and calculate the following. If the sequence has length n, then work out the expected value of x_ax_bx_cx_d over all quadruples (a,b,c,d) such that ab=cd and all of a,b,c,d are at most n. For a random sequence one would expect this quantity to be fairly close to zero. For a completely multiplicative sequence it is 1, which is the trivial maximum. For a quasimultiplicative function with a small group it should be a not too small positive constant. My guess is that for the first sequence of length 1124 it will be something like 0.3, but I’d like to know whether that is right.
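    As a baseline check for whoever implements this, a short Python sketch (names mine); grouping pairs by their product means the number of quadruples is the sum of squared group sizes:

```python
def quad_average(x):
    # mean of x_a x_b x_c x_d over all quadruples (a,b,c,d) with ab = cd and
    # 1 <= a,b,c,d <= n; x[0] is an unused placeholder so that x[a] = x_a
    n = len(x) - 1
    by_product = {}
    for a in range(1, n + 1):
        for b in range(1, n + 1):
            by_product.setdefault(a * b, []).append(x[a] * x[b])
    total = count = 0
    for vals in by_product.values():
        s = sum(vals)
        total += s * s            # sum over all pairs of pairs with equal product
        count += len(vals) ** 2
    return total / count

# completely multiplicative example (Liouville's function up to 10): here
# x_a x_b x_c x_d = x_{ab} x_{cd} = 1 whenever ab = cd, so the average is 1
liouville = [0, 1, -1, -1, 1, -1, 1, -1, -1, 1, 1]
```

For a random sequence the same statistic should be close to 0.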

    • Thomas Sauvaget Says:

      Tim, I’m computing it at the moment; I have a preliminary plot here, which I’ll update tomorrow. Indeed, it looks to be around 0.3ish…

    • gowers Says:

      That’s interesting. It would also be interesting (and much quicker) to do a similar computation with Alec’s discrepancy-1 sequence for the complex case. My guess would be that the parameter works out larger in this case.

    • Thomas Sauvaget Says:

      The first computation is now finished (just short of 12 hours on my laptop, at 50% CPU): actually the final value is 0.493215…, so just about midway between complete randomness and complete multiplicativity, with a total of 2 541 424 such quadruples.

    • Michael Says:

      I may have been calculating something different from Thomas. I looked for quartets a<b<c<d<1125 for which ab=cd, and found 952678 of them. For example, there were 435 quartets with ab=30240 because there were 30 pairs with ab=30240, from 27*1120 to 168*180. It took about ten minutes on a 2GHz machine using Matlab.

      The first length-1124 sequence gave a sum of 310378 from 952678 terms, or an average of 0.3258. The second sequence gave a sum of 299842 from 952678 terms, an average of 0.3147.
      The 1112 sequence gave a sum of 377253 out of 930443 terms, an average of 0.4055.

      For the complex sequence, I guess you want to sum x_ax_b times the conjugate of x_cx_d. That gave a sum of 2814 from 5535 terms, or 0.5084.

    • gowers Says:

      Oops, yes I did indeed want conjugates there. As I expected, the parameter is larger, but I have to admit that I thought it would be larger still. But it’s interesting to see this.

  16. Alec Edgington Says:

    I decided to try searching for discrepancy-2 sequences with bounded Dirichlet inverses.

    If we require the Dirichlet inverse to be bounded by 1, we obtain a multiplicative sequence of maximum length (246).

    If we require it to be bounded by 2, we obtain a sequence of length 389. I haven’t looked into the properties of this sequence, but it is interesting that 389 is a frequent sticking-point in searches, so perhaps it represents the limit of some other useful property. I’ll post this sequence on the wiki.

    The search with a bound of 3 on the Dirichlet inverse is still running (and has reached 489).

    • gowers Says:

      Alec, have you tried to work out in what region the Dirichlet series \sum x_nn^{-s} converges if x_n is a sequence of bounded discrepancy? Given the close relationship between the expectation that the partial sums of the Möbius function grow at rate \sqrt{n} and the expectation that the Dirichlet series \sum\mu(n)n^{-s} converges for \Re(s)>1/2 (and many similar facts), it seems at least possible that one could prove convergence for \Re(s)>0. I haven’t thought about this properly, but I’m tempted to see what I can prove using partial summation.

      Actually, now that I’ve written that I suddenly feel much less sure about it: it seems to me that either it will be true for any sequence with bounded partial sums, in which case it isn’t very interesting to us since it isn’t using the discrepancy property, or it would need boundedness on all APs and not just HAPs, which is impossible. But it would still be good to clarify this.

    • Alec Edgington Says:

      Tim, yes, we have convergence for \Re s > 0. For this it is enough that

      \limsup \frac{\log \lvert x_1 + \ldots + x_n \rvert}{\log n} = 0.

      (This is proved in Hardy and Riesz, ‘The General Theory of Dirichlet’s Series’, pp. 6–7.)
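      For completeness, the special case we need follows from partial summation (a sketch, assuming only that S(n) = x_1 + \cdots + x_n is bounded):

```latex
% Abel summation, with S(n) = x_1 + \cdots + x_n bounded and \sigma = \Re(s) > 0:
\sum_{n=1}^{N} x_n n^{-s}
  = S(N)\,N^{-s} + \sum_{n=1}^{N-1} S(n)\bigl(n^{-s} - (n+1)^{-s}\bigr),
\qquad
\bigl| n^{-s} - (n+1)^{-s} \bigr|
  = \Bigl| s \int_n^{n+1} t^{-s-1}\,dt \Bigr|
  \le |s|\, n^{-\sigma-1}.
```

      The boundary term tends to 0 and the remaining sum converges absolutely, giving convergence for \Re(s) > 0.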

      So you are right, this fact doesn’t depend on the discrepancy property in any very useful way (unless we can get some handle on the Dirichlet series defined by the T_k x).

    • gowers Says:

      After reading that I realize that you have already pointed it out …

    • Alec Edgington Says:

      I’ve now analysed the 389-sequence that has DI bounded by 2. Up to 337, it is almost completely multiplicative, but departs at powers of 3 — while remaining multiplicative in the weaker sense. That is:

      x_{2^a 3^b 5^c \cdots} = x_{2^a} x_{3^b} x_{5^c} \cdots

    • Alec Edgington Says:

      And in fact, that is almost as far as you can get with ordinary multiplicativity. The longest discrepancy-2 sequence with ordinary multiplicativity has length 344, and indeed satisfies the above identity.

  17. gowers Says:

    I may be missing something, but it seems to me that the first result of Borwein and Choi stated in their abstract is much simpler than they make it seem. I’d be interested to know whether others agree with me about this.

    Note first that if n is congruent to 2 mod 3, then the number of prime factors (with multiplicity) that are congruent to 2 has to be odd. More generally, this is the case if the last non-zero digit of n in its ternary representation is a 2. So \omega_3(n) equals -1 if the last non-zero digit is a 2, and 1 otherwise. This is our old friend the function that Mark Walters defined. The partial sum of this function up to n can be computed (as I did in my first post) by partitioning the numbers up to n according to where the last non-zero digit is, and in that way one sees that it is exactly what they say, the number of 1s in the base-3 expansion. End of proof!
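    The identity is easy to check by machine; a small Python sketch (my naming, following Mark Walters’s definition as described above):

```python
def walters(n):
    # Mark Walters's function: +1 if the last nonzero ternary digit of n is 1,
    # -1 if it is 2
    while n % 3 == 0:
        n //= 3
    return 1 if n % 3 == 1 else -1

def ones_in_base3(n):
    # number of 1s in the ternary expansion of n
    count = 0
    while n:
        count += n % 3 == 1
        n //= 3
    return count

# the partial sums of walters agree with the count of 1s in base 3
partial = 0
for n in range(1, 201):
    partial += walters(n)
    assert partial == ones_in_base3(n)
```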

    I’d be pretty surprised if their more general “surprizing” formula isn’t equally simple to prove. (An even bigger mystery about the paper is how they were allowed to get away with that Z.)

    • gowers Says:

      Just checked the mod-5 one, and it is indeed equally simple.

      A couple of minutes later: just checked the mod-p one, and again it is equally simple.

    • obryant Says:

      I agree that it is as simple as you make it.

    • gowers Says:

      I’ve just found the published version of the paper, in which they’ve added a coauthor (Michael Coons) and presented the proofs in a way that makes them look less complicated. It looks to me as though they are essentially carrying out the investigation that you were interested in a week or so ago, though I haven’t checked how far they got. I’ll add this reference to the wiki.

  18. obryant Says:

    The following trees represent ways to assign primes in a completely multiplicative sequence. If a leaf is labeled {N,k}, it means that the corresponding assignment has \lvert \sum_{i=N-k+1}^N x_i \rvert equal to the drift.

    drift = 2

    drift = 3

    drift = 4
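    A small Python sketch (mine, not the Mathematica code that generated the trees) of the quantity being tracked: build a completely multiplicative sequence from its values at primes and look at the window sums.

```python
def cm_sequence(prime_signs, N):
    # completely multiplicative sequence: x[n] = x[p] * x[n/p] for the
    # smallest prime p dividing n; prime_signs maps primes to +-1 (default +1)
    x = [0] * (N + 1)
    x[1] = 1
    spf = list(range(N + 1))          # smallest-prime-factor sieve
    for p in range(2, int(N ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    for n in range(2, N + 1):
        p = spf[n]
        x[n] = prime_signs.get(p, 1) * x[n // p]
    return x

def max_drift(x):
    # largest |x_{N-k+1} + ... + x_N| over all windows; a leaf {N, k} in the
    # trees above records a window achieving the stated drift
    N = len(x) - 1
    partial = [0] * (N + 1)
    for n in range(1, N + 1):
        partial[n] = partial[n - 1] + x[n]
    return max(abs(partial[b] - partial[a])
               for a in range(N + 1) for b in range(a + 1, N + 1))

# Liouville's function up to 10: every prime maps to -1
x = cm_sequence({p: -1 for p in [2, 3, 5, 7]}, 10)
```

For this example the maximal window sum up to 10 has absolute value 3 (the window from 2 to 8).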

    • obryant Says:

      drift = 5

    • gowers Says:

      I find these diagrams interesting but I don’t understand them. For example, if I follow the branch that corresponds to the sequence posted by Alec on the wiki in the drift-5 diagram, it stops at 23 with the pair (23,7). What does that mean? (The values of the sequence up to 23 are 1+ 2- 3- 4+ 5- 6+ 7- 8- 9+ 10+ 11+ 12- 13- 14+ 15+ 16+ 17- 18- 19+ 20- 21+ 22- 23- .)

    • Alec Edgington Says:

      By the way, I’ve posted on the wiki a completely multiplicative sequence of length 627 that satisfies C=3. I don’t know whether this is the limit.

    • gowers Says:

      Alec, there are a few further bits of information that would be very useful to know in relation to your C=3 sequence, and you can probably provide them with your computer much more quickly than I can by hand. The main thing that would be good would be a list of the values that the sequence takes at primes. But it would also be interesting to know the largest m such that your sequence up to m can be extended to a completely multiplicative sequence of maximal length with C=2. I ask this because my hand-calculation suggested that if you go as far as you can with C=2 then you are suddenly forced to go up to C=4, so I’m curious to know how far you have to backtrack if what you actually want is C=3.

      I’d also be interested to know roughly how your algorithm works, as it would be pointless to speculate about cleverer algorithms if they aren’t in fact any cleverer.

    • Alec Edgington Says:

      I’ve found one of length 819 now (and put it on the wiki). I’ll do some analysis on it and report back.

      The algorithm is not clever: it’s just an incremental depth-first search. I am intending to try to implement something a bit cleverer, with more look-ahead, this afternoon, and see how far that gets.

    • gowers Says:

      I’m hoping it will get you much further, since that would suggest the possibility of a very interesting result (sublogarithmic growth for a multiplicative function).

    • Alec Edgington Says:

      Yes, that would be exciting; the jury’s still out on that question I think.

      Right, I’ve added a list of the primes that map to -1. This list does not include 7, and therefore the sequence can’t be extended to a discrepancy-2 sequence of maximal length once it gets past 6. In fact, its discrepancy goes to 3 at 75.

    • Anonymous Says:

      @gowers: I think you followed 19- on the tree, whereas Alec’s has 19+. If you take 2,3,5,7,13,17,19,23 to – and 11 to +, then you get

      0 1+ 2- 3- 4+ 5- 6+ 7- 8- 9+ 10+ 11+ 12- 13- 14+ 15+ 16+ 17- 18- 19- 20- 21+ 22- 23-

      and the seven entries ending at 23 are – – – – + – –, which sum to -5.

      Alec’s sequence ends with label {372,5}: all five of 368, 369, 370, 371, 372 are negative.

    • Anonymous Says:

      Alec’s new C=3 sequence (length 819) has drift 5 at 72,73,74,75,76 (all +).

    • gowers Says:

      Yes that’s exactly what I did, so my confusion is now dealt with.

      It occurs to me that another measure of drift would be the maximum difference that you can prove must occur between two partial sums. So if, for example, you have chosen the primes up to n and you find a consecutive block of k integers of which all but j are guaranteed (by multiplicativity) to take the value +1, then the drift is at least k-2j. I wonder how much difference that would make. I also wonder how a greedy algorithm would fare if it tried to minimize the drift at every stage.

      I’m assuming that you calculate the drift of an interval only when the function is defined everywhere in that interval: am I right about that? (What I’m suggesting is that for not-yet-defined values one just assumes that they will end up making the sum in the interval smaller rather than bigger. An advantage of this is that it is a bit more continuous: if you choose the value at a large prime, it is unlikely to affect any drifts by more than 2.)

    • gowers Says:

      The fact that Alec’s sequence takes the value 1 at 7 is a bit of a blow for greedy algorithms. Does it seem that the maximum length of a multiplicative sequence of discrepancy 3 that equals -1 at 7 is much shorter, or is this just an artefact of the algorithm (as might be the case if, say, you always started with x_p=1 and changed it only if you needed to when backtracking — but it would seem more natural to me to start with x_p=-1)?

    • Alec Edgington Says:

      It could well be an artefact of the algorithm. This search actually did start with x_7=-1 and I don’t think it’s tracked back that far. I’ll try using a maximal C=2 sequence as initialization and see what happens then…

    • Alec Edgington Says:

      Started with x_7 = +1, I mean.

    • Alec Edgington Says:

      The good news for greedy algorithms is that you don’t have to change the values at any primes less than 71 if you want a long discrepancy-3 sequence. I’ve found one of length 545 (now on the wiki) that agrees at primes up to 67 with the maximal discrepancy-2 sequences, and that retains discrepancy 2 up to 243.

      But this seems to contradict your earlier observation, so maybe one of us has made a mistake?

    • gowers Says:

      I hope that if one of us has made a mistake then it’s me, as I wasn’t very pleased about the sudden jump from 2 to 4. My justification for that assertion was here. If I have a moment, I’ll check it again.

    • obryant Says:

      Yes, I only calculated the drift of intervals that are completely determined. The calculation took less than a minute for drift=5. The computation for drift=6 ran for 10 hours and still was only about 40% done. The easy 40% (biased toward +’s and away from -‘s), actually.

      The code isn’t optimized, and actually is running in Mathematica, so the timing isn’t interesting. The ratio of time is, however. The COMBINATORIAL EXPLOSION happens between drift=5 and drift=6.

      Perhaps interesting is that the longest branches of the trees seem to be of the alternating sign sort. Sometimes two plusses followed by two minuses.

    • Alec Edgington Says:

      Tim, in your analysis did you skip 244 (which maps to +1)? I’m also not sure why you say the sum to 241 must be at most 1 if we are allowing discrepancy 3.

  19. New thread for polymath5 « Euclidean Ramsey Theory Says:

    […] this there is now a third thread for polymath5 see here. Let me update further there is now a fourth thread. There is also a polymath5 tag on Math […]

  20. Matemáticos colaboram num projecto aberto de investigação conjunta através do blogue do Professor Timothy Gowers « problemas | teoremas Says:

    […] 16.01.10: in the same spirit, The Erdős discrepancy problem is under way on Professor Gowers’s blog, which is already on its fourth post about this problem. […]

  21. gowers Says:

    I have just tried a little computational (but simple enough to do by hand) experiment on the first 1124 sequence, and am absolutely fascinated by the result. Indeed, I now think there’s a real chance that we will be able to understand what this sequence is and perhaps even define an infinite extension of it.

    The experiment was very simple. I took the numbers n=1,2,3,4,... in turn and for each n I wrote out the values at n, 2n, 4n, 8n, 16n,\dots. Here is a table of the results. I start with what n is and then give the values. I have deliberately shifted the rows by varying amounts (I hope it will come out nicely when I submit the comment, but if not then perhaps someone like Thomas can produce a nicer table). Added later: sorry, it hasn’t come out all that well, so a proper table would be great.

    2: – + – – + – – + – +
    3: + – + + – + + – +
    4: . + – – + – – + – +
    5: . – + + – + + – +
    6: . – + + – + + – +
    7: . . – + – + + – + –
    8: . . – – + – – + – +
    9: . + – – + – – + –
    10: . . + + – + + – +
    11: . . – – + – – + –
    12: . . + + – + + – +
    13: . . + + – + + – +
    14: . . . + – + + – + –
    15: . . – – + – – + –
    16: . . . – + – – + – +
    17: . . . + – + + – + –

    What I find remarkable about this table is just how perfect it is. The only anomalous value is x_7 (which will come as no surprise to Mark). If you changed that, then everything would be a shift, or minus a shift, of the sequence that you get for 2. (Perhaps I should have started with the row at 1.) Also, the row for 2 looks to me very like a “periodic sequence with irrational period”. Could it be the golden ratio? Also, it is notable that the amount you shift is closely related to the size of the number. So if we count the row I didn’t put (starting with 1) as the basic row, then rows 2 and 3 are shifted by 1, rows 4,5,6,9 are shifted by 2, rows 7,8,10,11,12,13,15 are shifted by 3, and rows 14,16,17 are shifted by 4. (The shifts at powers of 2 obviously have to be what they are.)

    In the Polymath spirit, I am mentioning this straight away rather than trying to work out exactly what is going on and reporting back. One question I’d like to answer is whether the sequence + – + – – + – – + – + is compatible with a period that is related to the golden ratio. I very much hope so as I haven’t quite let go of that little fantasy. (That’s the sequence of values at 1,2,4,8,16,32,64,128,256,512,1024.) Another is whether everything continues to work wonderfully well if you look at powers of 3, though the sequences will be a bit short. Here’s what the values are at 1,3,9,27,81,243,729: + + + + – – -. And at twice those numbers: – – – + + +. Hmm, it’s looking pretty good!

    • gowers Says:

      This also suggests that a modification to an experiment I suggested earlier could produce lovely long sequences. The earlier idea was to fix a sequence for powers of 2, given by a “periodic sequence with golden-ratio period” and then to insist that the values at d,2d,4d,8d,… were always given by either this sequence or minus this sequence. It now looks as though what I should have suggested was that the values were always given by a shift of this sequence, and that the shift associated with n should be roughly \log_2n. With hindsight, this makes much better sense, since we shift by exactly \log_2n at powers of 2.

      So there’s a new item that’s jumped right to the top of my wishlist. Alec, can you get any super-long sequences this way?

    • gowers Says:

      One other tiny thought is that the “anomaly” at 7 might actually be that the amount of the shift is unexpectedly large, or that it corresponds to a small phase shift that is just enough to tip the value at 7 from + to -.

    • gowers Says:

      I’ve now found a golden-ratio-related sequence that agrees with the values of the 1124 sequence as you go along powers of 2, starting at 1. I’m getting excited — sorry, can’t help it.

    • Harrison Says:

      You’ve probably realized this already, but since you didn’t say it: If the sequence were perfectly multiplicative all the shifts would be 0 (and the rows would be periodic of period 1 or 2.) If the sequence were random, we wouldn’t expect any sort of structure like this to emerge.

      I’m not entirely convinced that this is a phenomenon that’s really unique to the 1124-sequence — it just seems to encapsulate some of the multiplicative structure in a rather nice way.

    • Harrison Says:

      I looked at the 1124 sequence using powers of 3 instead of powers of 2, and the analogous subsequence appears to begin:

      + + + + – – – + …

      With the following shifts:

      n = 1: Shift of 0
      n = 2: Shift of 1
      n = 3: Shift of 1
      n = 4: Shift of 2
      n = 5: Shift of 2
      n = 6: Shift of 2
      n = 7: Shift of 0 (?!)
      n = 8: Shift of 3
      n = 9: Shift of 2
      n = 10: Shift of 3
      n = 11: Shift of 3
      n = 12: Shift of 3
      n = 13: Shift of 3

      This is of course a very small sample but it does in fact look as though the shifts may be of size log n.

  22. Alec Edgington Says:

    Interesting — there’s definitely a pattern here! If we write \theta_k = x_{2^k}, and (to a good approximation) x_{2^k (2m+1)} = \epsilon_m \theta_{k + s_m}, where \epsilon_m = \pm 1 and s_m is the shift (m \geq 0), then the \epsilon_m seem to have quite low discrepancy too, as far as you’ve calculated.

  23. Alec Edgington Says:

    I calculate, for 0 \leq m \leq 10:

    \epsilon_m = +1, -1, -1, -1, +1, +1, -1, +1, -1, -1, +1

    s_m = 0, 1, 2, 3, 2, 3, 3, 3, 4, 3, 4

    • Alec Edgington Says:

      To continue a bit further, for 0 \leq m \leq 20:

      \epsilon_m: +1, -1, -1, -1, +1, +1, -1, +1, -1, -1, +1, +1, +1, -1, +1, -1, -1, +1, +1, +1, +1

      s_m: 0, 1, 2, 3, 2, 3, 3, 3, 4, 3, 4, 3, 4, 3, 5, 5, 4, 5, 4, 4, 5

      Apart from one anomalous value in the 3 sequence (the one starting at 7) and one in the 16 sequence (the one starting at 33), the agreement is perfect up to here.

    • gowers Says:

      Alec, I’d like to draw a distinction between serious anomalies and anomalies that are potentially explicable. I should say first what I now think the sequences may be. It seems to me that you get them by continually adding the golden ratio (where for convenience I’ll define this to be (\sqrt{5}-1)/2) and assigning a + if the integer part changes and a – if it doesn’t, or the other way round. That’s equivalent to adding the golden ratio mod 1 and making your answer depend on whether you’re in a golden-ratio-sized interval or not.

      If that’s the case, then each sequence will either be biased towards 1 or towards -1. If the former, then it should never have two -1s in a row. So I’d be interested to know whether the sequence for 16 has this property (that it doesn’t have both two 1s in a row and two -1s in a row).

      Come to think of it, I can work this out for myself, so here goes. I get + – + + – -, which means that, disappointingly, it seems to be a serious anomaly rather than a mild one. (However, the anomaly first shows up at 1056, so maybe there are other examples that are OK.)

    • gowers Says:

      In general, I now think we should be looking for a sequence defined as follows. (It may be necessary to modify this definition — this is a first attempt.) For the ith prime p_i we associate a sign \epsilon_i=\pm 1 and a real number \alpha_i. Then if n=p_1^{r_1}\dots p_k^{r_k} we calculate the number \alpha(n)=r_1\alpha_1+\dots+r_k\alpha_k mod 1. Next, we work out the product \epsilon(n)=\prod \epsilon_i^{r_i}, and we set f(n) to be \epsilon(n) if \alpha(n) belongs to a golden-ratio-sized interval and -\epsilon(n) otherwise.

      For this to work, it’s quite important that the signs Alec was calculating above (but renamed so that I define \eta_{2n+1} to be what he calls \epsilon_n) should be completely multiplicative. To my great delight, they seem to be.
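      To make the proposed definition concrete, here is a Python sketch (the names and the sample data are mine, a first attempt rather than a settled construction):

```python
import math

PHI = (math.sqrt(5) - 1) / 2  # the golden ratio as defined in the thread

def f(n, prime_data):
    # prime_data maps each prime p_i to the pair (epsilon_i, alpha_i);
    # alpha(n) = sum r_i alpha_i mod 1 and epsilon(n) = prod epsilon_i^{r_i},
    # and f(n) = +-epsilon(n) according to whether alpha(n) lands in the
    # golden-ratio-sized interval [0, PHI)
    alpha, eps = 0.0, 1
    for p, (e_p, a_p) in prime_data.items():
        while n % p == 0:
            n //= p
            alpha += a_p
            eps *= e_p
    assert n == 1, "n must factor over the listed primes"
    return eps if alpha % 1.0 < PHI else -eps

# sanity check: with every alpha_i = 0, alpha(n) is always 0 and f reduces to
# the completely multiplicative sequence epsilon(n)
data = {2: (-1, 0.0), 3: (-1, 0.0), 5: (1, 0.0)}
```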

    • gowers Says:

      I meant to say in the previous comment that I think there’s a good chance of getting somewhere by making the wild guess that every \alpha_i is a small integer multiple of the golden ratio. From that it should be possible to guess what each \alpha_i is, and we can guess what the sign is just by seeing whether the corresponding sequence (when you multiply by powers of 2) has more 1s or more -1s. So one ought to be able to take the 1124 sequence and extract from it the formula that generates it — unless of course my wild guess is wrong.

    • Harrison Says:

      That definition, except for the part where we fix the size of the interval at \phi, is a generalization of a quasi-multiplicative sequence, right? In a quasi-multiplicative sequence, we take all the \alpha_p to be rationals (with bounded denominators) and… hmm. I feel like there’s a small gap that may still need to be bridged — in quasi-multiplicative-land \epsilon_p depends on \alpha_p but f(n) depends only on where \alpha(n) is and not on any of the signs at the primes.

      I still think we can, and almost certainly should, unify the two concepts. I have a sneaking suspicion that there’s some really deep, but rather well-understood, number-theory stuff lurking beneath the surface — adelic analysis? I don’t know enough about that stuff to be sure — and I’d love to have a clearer view of what it is.

  24. Miodrag Milenkovic Says:

    Apologies if this has been considered before, I’m still trying to catch up with the whole conversation. Here’s a simple argument to suggest that there are \approx 3^{n/2} (3/4)^{(n/2)\log(n/2)} length-n sequences of discrepancy 2. The argument is faulty, but I don’t know if fatally so.

    There are 4\cdot 3^{n-1} \pm 1 sequences of length 2n such that \lvert \sum_{i=1}^{2n} x_i \rvert \leq 2 and 2\cdot 3^n of length 2n+1, so there are \propto 3^{n/2} of length n. There are 2^n sequences in total, so the probability that a random \pm 1 sequence of length n survives the sum \leq 2 test is p(n) \approx (3/4)^{n/2}. Looking at the x_{2i} subsequence of our sequence, the probability that it survives the test is p(n/2). (This is where the argument turns faulty, because I don’t know how correlated the events \lvert \sum_{0< i \leq n/d} x_{d i} \rvert \leq 2, 0 < d < n are. I don’t know if it kills the argument. Ignoring that and pressing on…) Continuing with the rest of the subsequences we get that the probability that the sequence has discrepancy \leq 2 is \prod_{1<k<n}p(n/k) \approx (3/4)^{(n/2)\log(n/2)}. Multiplying by the total number 2^n of \pm 1 sequences (and including the k=1 factor p(n)) we get that there are \approx 3^{n/2} (3/4)^{(n/2)\log(n/2)} length-n sequences of discrepancy 2.

    A similar argument suggests that there are \approx 2^{n(1 - (1/2)\log(n/2))} sequences of discrepancy 1, i.e. none for large enough n.
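    The exact counts for the d=1 test alone are easy to verify with a small dynamic program (a sketch of mine, not part of the argument above):

```python
def count_bounded(n, C=2):
    # number of +-1 sequences of length n whose partial sums all lie in
    # [-C, C] (the d = 1 HAP test only); counts maps a partial-sum value to
    # the number of valid prefixes ending at that value
    counts = {0: 1}
    for _ in range(n):
        nxt = {}
        for s, c in counts.items():
            for step in (1, -1):
                t = s + step
                if abs(t) <= C:
                    nxt[t] = nxt.get(t, 0) + c
        counts = nxt
    return sum(counts.values())

# matches 4 * 3**(n-1) at even lengths 2n and 2 * 3**n at odd lengths 2n+1
```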

    • Miodrag Milenkovic Says:

      I apologize for the LaTeX mistakes, this is my first time posting LaTeX in WordPress. I assumed I’d be put in a preview/fix mode, but I was wrong, and now I don’t see a way to edit it.

      I meant to say that there are 4 x 3^{n-1} of +-1 sequences of length 2n and 2 x 3^n of length 2n+1. The norm brackets were meant to be absolute value brackets…

    • gowers Says:

      I think there is likely to be a problem with probabilistic arguments, since it seems to me that the events “survives the d-HAP test” are likely to be negatively correlated. Somehow, the easiest way to survive the tests seems to be to be periodic with a different period (for instance, the sequence 1 -1 1 -1 1 -1 1 -1 … passes all tests when d is odd), so it feels as though if you pass one test it will increase the chances that you will fail another. This is of course a completely non-rigorous argument but for now I find it convincing.

  25. Miodrag Milenkovic Says:

    I think that another interesting experimental direction would be to explore the structure of the trie of C=2 sequences by breadth-first search instead of depth-first search; has anybody done that? I expect the width of the tree would overwhelm the memory pretty early on, so some judicious sampling would be necessary. The idea is to see whether the sequences are statistically similar to samples from some sort of stochastic process. I’ll try to run some C code for that and report back.
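    For what it’s worth, here is a tiny Python sketch of that breadth-first enumeration (rather than the planned C implementation); the level widths are the numbers of valid sequences of each length:

```python
def hap_ok(x, C=2):
    # x is a +-1 prefix with x[0] an unused placeholder; check every HAP
    # partial sum that ends at the last element (earlier sums were already
    # checked when earlier levels were built)
    n = len(x) - 1
    for d in range(1, n + 1):
        if n % d == 0:
            if abs(sum(x[d * i] for i in range(1, n // d + 1))) > C:
                return False
    return True

def bfs_levels(N, C=2):
    # breadth-first search of the trie of +-1 sequences whose HAP partial
    # sums all stay in [-C, C]; returns the width of each level
    level = [[0]]
    widths = []
    for _ in range(N):
        level = [x + [v] for x in level for v in (1, -1) if hap_ok(x + [v], C)]
        widths.append(len(level))
    return widths
```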

  26. Sune Kristian Jakobsen Says:

    The idea was mentioned in some of the replies to this comment. I’m not sure if anyone tried it.

    The discussion is continued over here.
