EDP2 — a few lessons from EDP1

I plan to continue my policy of writing a new Polymath5 post every time the number of comments on the current one threatens to go into three figures. Since I won’t always have a substantial post’s worth to say, I shall simply write whatever I do feel like saying and have time for. I can’t guarantee that it will always be a full and accurate summary of the discussion so far.

In this post I want to highlight a couple of ways in which my view of the broad proof strategy explained in the previous post has changed. Let me begin with the multiplicative case of the problem (item 1 of the proof strategy). My proposal for this was to attempt to find clusters of odd-smooth (there is now a suggestion that these should be called odd-friable) numbers, and argue that the constraints that these put on small primes cannot all be satisfied simultaneously. I would like to explain now why I do not think it likely that an argument along these lines can be developed.

The first reason is a remark of Sune’s. He points out that if such a proof exists, then we should see how it applies to the Walters-type examples (where x_n is 1 if the last non-zero digit of n base 3 is a 1 and -1 otherwise). These examples have so much alternation of signs that they demonstrate that you can have an infinite multiplicative sequence such that any subsequence of drift at least C needs to be exponentially long in C. Thus, it seems pretty hard to find “clusters” of a kind that would be helpful.

I find that an important argument, but a small part of me thinks there might conceivably be a way round it. However, that lingering hope is crushed by a second argument, which is derived from my experiences of building long low-discrepancy multiplicative sequences by hand.

To explain this second argument, I’d like to distinguish more explicitly between two kinds of constraint that a multiplicative sequence of low discrepancy must satisfy. At the risk of stating the embarrassingly obvious, it must be multiplicative, and it must have low discrepancy. Why did I bother to say that? Well, let’s suppose that, as happens when one tries to build up a sequence, we are given the values on a certain set of integers. The way my by-hand algorithm works, I then try to write down all other values that I can deduce from these ones by directly using either the multiplicativity condition or the low-discrepancy condition. By “directly”, I mean that I do not allow arguments of the following kind: if you set x_p=1 then you are forced by the following chain of implications to get five 1s in a row over here; therefore, x_p=-1. I would count that as an indirect implication.

So what are the direct implications? I can describe these by explaining how I would program a computer to work them out. I would start by calculating everything that follows from multiplicity. That is, if I have three numbers a,b,c such that ab=c and I know the values at two of them, then I can deduce the value at the third. (In practice, the values I know have most often been those at a and b, but deducing x_b from x_a and x_c is more interesting when you get the chance.)

There are of course infinitely many values that follow from multiplicity, so before I start I fix a large N and only go up to that N.

Once I have made all the multiplicity deductions I can, I then make discrepancy deductions. One useful one is the following lemma: if the discrepancy is at most 2 and x_{2n}=x_{2n+1}, then the partial sum up to x_{2n} is 0. (The proof is that it must be even and for obvious reasons it can’t be either 2 or -2.) From this it follows, for example, that if x_{128}=x_{129}=x_{130}=1, then x_{131}=-1. More generally, if you can work out what the partial sum up to a certain point must be, and if that point belongs to an interval of values that have been decided, then you can work out the partial sums everywhere in that interval — and it is often possible to do that even if you don’t know all the values before that interval.

So the second stage of my procedure is to make all discrepancy deductions of that kind. They then allow me to make more multiplicativity deductions, which in turn allow further discrepancy deductions, and so on.

The point I want to make is this: in practice, I found myself doing several alternations between multiplicative and discrepancy deductions before I had deduced everything I could. I might add that when I ran out of deductions, I would typically find that making a choice for just one more prime would unleash a surprisingly long chain of implications before I next ran out.

What has this to do with the proposal in my previous post? Well, the clusters argument in that proposal seems to be suggesting that for every assignment of values to all the primes up to some m, there will be a cluster of m-odd-friable numbers somewhere such that the values you have chosen will force a drift that is too large (at least 2C, say). But the actual experience of searching for long multiplicative sequences of low discrepancy doesn’t feel like this: the contradiction is reached not after one step, when all you’ve done is explore the consequences of multiplicativity, but after several, when you’ve alternated quite a bit between using multiplicativity and using the low discrepancy. So it now seems to me that what I was asking for before was too strong.

There is a trivial sense in which it wasn’t too strong, which is that the numbers up to m are themselves a cluster of m-odd-friable numbers. But that is reducing the problem to itself, which does not count as a serious proof strategy.

So I now think that the right approach is to try to examine the following question: if you are given a certain set A of values that a multiplicative sequence of discrepancy at most C must take, how many other values do you expect to be able to deduce? Ultimately, one is hoping to be able to make two contradictory deductions: if A is the set of all primes up to m, can we make some kind of guess about how large m needs to be in order to force us to make two contradictory deductions (whatever values we choose at those primes)?

That second question suggests another. Up to now, we have looked for the longest multiplicative sequence of discrepancy at most 2, and Alec has shown that the best possible bound is 246. However, in choosing such a sequence, we probably commit ourselves to a contradiction much earlier. So a more fundamental question (from the point of view of finding a proof that the partial sums of a multiplicative function must be unbounded) is this. What is the largest prime p such that one can choose \pm 1 values for all primes up to p, follow through all the implications of multiplicativity and discrepancy at most 2, and not reach a contradiction? This is still a computational question and I think it should be easy to program and quick to run.

At this point I’d briefly like to highlight an interesting post of Guy Srinivasan. He was interested in the question of when one would expect contradictory constraints to arise in the general problem, and devised a probabilistic model for it. (An example of contradictory constraints for him would be something like that x_3+x_6+x_9+x_{12}=2 and x_5+x_{10}=-2, making it impossible to choose x_{15} without violating the discrepancy-2 condition.) His model appears to agree remarkably well with the actually observed data, so perhaps it can give us some real insights into what is going on. It would be nice if we could use such a model to guess what the right lower bound should be for the discrepancy of a sequence of length n. (Even without a formal proof, it is good to have as precise an idea as possible of what we think ought to be true.) It would be good to try to develop heuristic arguments for the multiplicative case as well. Is there some probabilistic model that would allow us to predict how broad and long the chain of implications described above should be, given the set A where the values have already been assigned?

Now let me move on to step 2 of the proposed proof strategy. Here I think I can claim that there has been some progress, though of a rather modest kind. In this comment and the subsequent one, I proposed an argument for showing that if there is a counterexample to EDP then there is a counterexample such that for every d the sequence x_d,x_{2d},x_{4d},\dots has a well-defined density. Terence Tao confirmed that this does indeed almost certainly follow from well-known dynamical-systems techniques. I haven’t thought about it yet, but I feel fairly confident that one should be able to get the same result simultaneously for all geometric progressions and not just those with common ratio 2. (It seems as though all it would need is a relatively standard generalization to several commuting operators — maybe Terry can confirm this too.)

Why does it help to have well-defined densities along GPs? Well, I don’t know for sure that it does. But I am hoping we could use averaging arguments of the kind introduced in this comment, to produce from a hypothetical counterexample a much nicer example, obtained from averaging along GPs. That brings me to my next point: a discussion of step 3 in my previous post.

Here I think we have something surprisingly simple: I said we should walk before we fly, but in fact it seems to be easy to prove that if you can walk then you can fly. That is, it seems to be easy to prove that if there is a quasimultiplicative example, then there is a multiplicative one. (So although the very long and mostly quasimultiplicative examples that have been generated experimentally are very interesting, they now appear to be a red herring from a theoretical point of view.) The argument is in the comment referred to in the previous paragraph.

I would also like to mention an entirely different proof strategy, which is to modify the problem by shifting the HAPs in order to destroy structured examples, so that all you have to do is analyse much more random-looking ones. A few experiments need to be done before we will know whether this is a promising idea — it seems that we will learn a lot from them whatever they turn out to yield. More details about the modified problem can be found in this comment. [Added slightly later: in the responses to the comment, Sune makes an observation that rather pours cold water on this idea. But the experiment may still be interesting.]

A quick thing to report on the numerical side: if you follow the responses to this comment, you will see that Alec has come up with a completely multiplicative sequence of length 1530, and you can get some idea of how he did it. This sequence holds the current record for the longest sequence yet to have been generated that is of significance to the project. The sequence itself is displayed here.

Finally, I would like to draw attention to a problem I like that ought to be easier than the EDP itself. I also think that if we can solve it then we will have a lemma that could be extremely useful when it comes to “cleaning up” hypothetical counterexamples. Rather than repeat myself, I refer you to this comment and the one after it (and the responses to them). Roughly speaking, a positive solution to this problem would show that there were “few” counterexamples to EDP, which would then tend to show that the operators T_d had to have some kind of stronger compactness property than the trivial one (since after you iterate a few times, you’d have to get a sequence that was substantially more similar to your original sequence than the pigeonhole principle would guarantee). This, coupled with limiting arguments, could enable siginificantly greater tidying up to be done of the original sequence.

104 Responses to “EDP2 — a few lessons from EDP1”

  1. Mark Bennet Says:

    I’d just like to draw attention to a couple of observations.

    The first is the way in which the prime 2 seems to have a different role from the odd primes. Because it is the smallest prime it is the first we tend to deal with, and its influence can easily go unremarked.

    The sequence which is +1 on odd numbers and -1 on even numbers has discrepancy 1 or 0 on every HAP with an odd difference. This sequence is not multiplicative, though it has a very simple structure, and deals with every prime except 2.

    • Mark Bennet Says:

      Sorry – I keep on having doubts about the second, which has to do with the way that a multiplicative sequence has a natural bias towards +1 at lower values – the values on any HAP are the same or alternating. On sequences with difference a prime they are either +1 or alternating. So it takes a composite number difference to have a constant -1 difference along the HAP, and this occurs only if there is a +1 sequence earlier.

      This should mean that the choice -1 is marginally preferred at primes.

      So the assumption of a multiplicative sequence with bounded discrepancy might generate a counting argument.

      But I keep on getting unconvinced.

  2. Jason Dyer Says:

    I have converted Alec’s 1530 sequence to spreadsheet form.

    • Sune Kristian Jakobsen Says:

      Why does it end at 1530, when 1531 is a prime?

    • Alec Edgington Says:

      Because the program starts to backtrack at p_i when it can’t fill in all the numbers from p_i to p_{i+1} - 1. In this respect it ‘failed’ at 1531. The sequence could be extended a little but not as far as the next prime.

  3. Terence Tao Says:

    I wrote up the proof that failure of EDP implies existence of a bounded discrepancy sequence in which all limits exist (and we also have the almost periodicity property) at


    • gowers Says:

      Many thanks for that. Would I be right in thinking that the result you prove there is sufficiently general (since you talk about arbitrary statistics) to show not only that all the individual limits exist, but also that if we take e.g. two GPs of the same common ratio and take their pointwise product, then that too will have a well-defined density? I think that could be a useful property to have.

    • Terence Tao Says:

      Yes, one can make any countable number of “local” statistics converge in this sense ( I wrote something to this effect on the wiki). One can also average over two-dimensional geometric progressions, or more generally on any countable collection of Folner sequences for some multiplicative subgroup of the rationals. One can also use fancier pointwise ergodic theorems to get convergence along sparser sets of shifts (e.g. Bourgain return times theorem), but these are unlikely to be directly useful to the project.

  4. Alec Edgington Says:

    Regarding Tim’s question here about local characterizations of completely multiplicative (CM) functions, I think that in a very weak sense there are none: any finite sequence of \pm 1s is a restriction of some CM function. Suppose the sequence x_1, \ldots, x_n is -1 in k places. Choose k primes p_1, \ldots, p_k all greater than n (so that there’s at most one multiple of each p_i in any shift of the interval [1, n]). Define a CM function f to be -1 at those primes and no others. Then by the Chinese remainder theorem there’ll be some shift of [1,n] whose members are a multiple of one of those primes just at the points that we want to map to -1. Now we can shift this sequence again by any multiple of p_1 \cdots p_k and retain this property. In particular we can shift it by \prod p_i^{a_i} where a_i \in \{1, 2\} and ensure that f maps the shifted interval precisely to the original sequence.

    This is a weak answer, but if the argument is correct it should be possible to put some bounds on the shifts, so that we could say that for shifts greater than some not-too-fast-growing function of n every sequence of \pm 1s can occur.

    • gowers Says:

      Oops, yes. And that means that my “global characterization” was less interesting than it seemed, unless you have additional information about precisely what the shifts are, or some restriction on how far out you’re allowed to go.

      Since in the PAP problem we are not interested in primes that are bigger than the length of the interval (as on each corresponding PAP we have discrepancy at most 1) perhaps it’s enough to look for characterizations of functions that are restrictions of multiplicative functions determined by primes less than that length. But it starts to seem quite unlikely that there will be any useful ones. Yet another possibility, and one that might actually be genuinely sensible, would be to look for characterizations of restrictions of multiplicative functions determined by just the first few primes. This is relevant to the model where you define a nice structured multiplicative function as best you can and then you make ad hoc adjustments to it at larger primes. An example to bear in mind is the Walters example. If you look at shifts of that then you get a function that’s pretty similar to the original function.

      With that kind of thought in mind, I did some little experiments yesterday, taking some of our examples of low-discrepancy sequences and having a look at subsequences such as x_{721},x_{722},x_{723},\dots. One would expect these to have low discrepancy at least to start with, since several of their HAPs are HAPs in the main sequence. (This expectation is all the stronger if the partial sums of those HAPs up to 720 are zero, which they seem to be on the whole.) It might be interesting to have a more systematic look at the discrepancy and multiplicative structure of subsequences of the form (x_{M+n}:n=1,2,\dots) when M is divisible by many small primes.

  5. gowers Says:

    Here’s a tiny but amusing (in my opinion) observation. Take the sequence 1 -1 -1 1 -1 1 1 -1. It coincides with the Morse sequence but I’m thinking of it as our base-3 example: that is, multiplicative with 1 mod 3 going to 1, -1 mod 3 going to -1 and 3 itself going to -1. Because of this multiplicativity, it’s easy to check that if you take this sequence and repeat it over and over again, separating each occurrence with a zero, then you get a sequence of discrepancy 1. The sequence I’m talking about is 1 -1 -1 1 -1 1 1 -1 0 1 -1 -1 1 -1 1 1 -1 0 1 -1 -1 1 -1 1 1 -1 0 … It also has the property that whenever you hit a multiple of 9, the partial sum so far is 0.

    Now let’s take any other sequence of discrepancy C and let’s replace the zeros in the above sequence by successive terms of this discrepancy-C sequence. I claim that the resulting sequence has discrepancy at most C+1. To see this, consider the partial sums along an HAP of common difference d. If d is divisible by 9 then the assertion is trivial. Otherwise, it takes a few steps (either 3 or 9) during which the partial sum deviates from 0 by at most 1, before hitting a multiple of 9, then does the same before hitting twice that multiple, and so on. So the partial sum will always be a sum of a number between -1 and 1 and a partial sum along some HAP of the sequence known to have discrepancy at most C. Hence the result.

    Corollary: there exists a sequence of length at least 9×1124+8=10114 and discrepancy at most 3.

    Moral of corollary: in the general (non-multiplicative) case, we shouldn’t be amazed and impressed by an example unless we can get a discrepancy substantially larger than 10,000, which would appear to rule out brute force and mean that we have to look for clever constructions with perhaps a bit of brute-force adjustment if we want to have any hope of getting anywhere.

    Needless to say, one can generalize the corollary to say that for every C there is a sequence of discrepancy C and length slightly greater than 9^{C-2}\times 1124.

    • Mark Bennet Says:

      This could be interesting in establishing some lower bounds for longest sequences of discrepancy C… the longest sequence of discrepancy 1 has length 1: 9 x 11 + 8 = 107 which is much lower than 1124 – so there is potential room to improve the lower bound.

      There may be some interest in looking at ways of combining sequences, and this one seems to have some interest – what conditions are required on a sequence of length L and discrepancy C and a sequence of length M and discrepancy D which allow them to be combined (in this or other ways) to make a sequence of length LM (or LM + L -1) and discrepancy no more than C+D? In particular what more is required of L than it is multiplicative and has sum zero? Can the conditions on L be weakened if more is said about M?

  6. gowers Says:

    Here’s a proof that the “entropy” version of the question is less interesting than I had hoped. It’s somewhat similar in spirit to the observation in my previous comment.

    Recall that the variant of the problem is the following. Imagine a game in which you are trying to build a sequence of discrepancy C and length n but you have an opponent who is trying to wreck your attempt. The rules of the game are as follows. First, you have to present to your opponent a large subset (meaning of size cn for some fixed positive constant c) A of \{1,2,\dots,n\}. Your opponent then gets to choose all the values of the sequence at points in A. Finally, you get to choose the rest of the sequence. I speculated that for fixed c your opponent would win, provided that n is large enough. More to the point, I speculated that that might be easier to prove than EDP itself (from which it would obviously follow).

    Unfortunately, the latter speculation is wrong. Suppose the set you present is the set of all numbers that are congruent to 1 mod 3. Suppose next that your opponent chooses values of the sequence at all the numbers 1,4,7,10,…. Partition …

    OK, I realized as I was writing that my strategy for choosing the remaining values didn’t work. I now think the opponent has a winning strategy, as I’ll try to explain in the next comment.

  7. gowers Says:

    Here’s an attempt at a strategy for the evil opponent (see previous comment) who is trying to stop me from creating a bounded-discrepancy sequence by choosing inconvenient values at 1,4,7,…

    The strategy is as follows. The opponent divides the interval \{1,2,\dots,n\} into subintervals of equal length. In the first subinterval he (I have arbitrarily decided that this particular opponent is male, but I could just as easily have been facing a female opponent) chooses only 1s, in the second -1s, and in the third he doesn’t care. Then in the fourth he chooses 1s, in the fifth -1s and in the sixth he doesn’t care. And so on.

    Now if I want to defeat this, then I’ve got to make sure that the discrepancy of the subsequence of multiples of 3 (not just the partial sums but the entire discrepancy) is small. So let me do that first. Since small plus small equals small, I can now completely ignore multiples of 3, or equivalently I can take the values at multiples of 3 to be zero, and it won’t affect subsequent calculations by more than the discrepancy of this sequence I’ve placed at the multiples of 3.

    What next? Well, if I just look at the first interval (which, by the way, has unbounded length — we could say about \sqrt{n} for definiteness), then the partial sums have to remain bounded. So that means that for almost every number that’s 2 mod 3 in the first interval I’m going to have to put a plus. Likewise, for almost every number in the second interval that’s 2 mod 3 I’m going to have to put a minus. In the third interval anything could happen. And then in the fourth interval I’m back to pluses, and so on.

    Now choose a random number d that’s pretty near the end of the first interval and equal to 1 mod 3. Then it gets a plus sign from my opponent. But 2d also gets a plus sign — not definitely but with high probability. We are treating the value at 3d as zero, then at 4d and 5d we have plus (the latter with high probability), and so on. So the HAP with common difference d is heavily weighted towards plus.

    I suspect, but haven’t time to check at the moment, that this argument is rather general. That is, if the opponent chooses a long string of pluses, then a long string of minuses, then a long string of pluses, and so on, then there’s not much one can do to win the game. But I’ll have to think about how much it depends on the shape of the set A.

    • Alec Edgington Says:

      An obvious strategy for the ‘good guy’ (the player who chooses A) is to give A as little multiplicative structure as possible. Your first choice would be 1, since you don’t care whether that’s +1 or -1. Then, you’d probably want a bunch of primes; and when you run out of those you’d want a set of numbers with few common factors. I’m struggling to find the right formulation for this, though.

    • gowers Says:

      Yes, that’s a good point. So in the general case, the first step for the opponent is to choose plenty of numbers with plenty of factors. Since almost all numbers do have plenty of factors and A is of proportional size, at least something along these lines is going to be possible, but quite what is not clear.

      Actually, having said that, I’m not 100 percent sure that it’s necessary. The point of the opponent’s strategy is to place certain drifts in various intervals that good guy has to cancel out (which depends only on partial sums of the whole sequence rather than along HAPs). So it could be that even if the opponent is choosing values at numbers with few factors, they force the good guy to counter by choosing awkward values at smoother numbers.

      Now I’m disagreeing with myself again: good guy could identify two sets of numbers with few factors and use values in one to cancel out values in the other. So it’s back to the fact that most numbers have quite a lot of factors. Exactly how to use that fact will need some thought.

    • gowers Says:

      As a first step in the direction of more general A, I’m going to have a go at the obvious “next” case, when A is the set of all integers equal to 1 mod 4. I think that will introduce new difficulties but not too many at once.

      I take that back. It’s too simple because we know that the 4-sequence has bounded discrepancy, so the odd multiples of 2 must be very well balanced, so we can set all values at even numbers equal to 0, and then the proof is much as it was before. So we need some m such that \phi(m)>2 to force a new argument.

      Let us therefore go for m=5: I’ll take A to be the set of all integers congruent to 1 mod 5.

      A simple attempt to generalize bad guy’s strategy would be to have alternating runs of pluses and minuses. Does this work?

      The answer seems to be that it doesn’t work anything like as easily, because good guy now has much more flexibility about how to respond: if f(5r+1)=1 then all that’s needed is to make sure that on average two out of f(5r+2), f(5r+3) and f(5r+4) are -1. And it’s not clear that we can’t make the choice in some clever way that keeps discrepancy down.

      The proof I had in mind was an averaging argument. Speaking loosely (sufficiently so that I am not yet sure whether it can be made rigorous) the idea would be to pick a random d close to m (the size of the intervals) that’s congruent to 1 mod 5 and look at its multiples. Now good guy can easily deal with this by e.g. choosing f(x) to be -1 at numbers congruent to 2 in the second interval, and so on.

      But the averaging argument isn’t finished yet, because I could average not over random d that are congruent to 1, but random d that are coprime to 5. If I do that, then I can’t defeat all these HAPs by choosing values only according to the interval I’m in and what I am mod 5. So I think that some kind of Parseval argument coupled with the strange biases has a chance of working.

      But thinking about it some more, I’m starting to feel pessimistic again. The problem is that there are so many numbers that will not fall into one of these HAPs, and my moves are so much less forced than they were in the mod-3 case, that it is far from clear that we need to choose values that correspond so closely to values mod 5. More thought needed.

    • Sune Kristian Jakobsen Says:

      I don’t have time to think this though, but how about this strategy for the bad guy (it generalize the strategy Tim used against 1,4,7,…, but I’m not sure it works in general):
      Like before we divide {1,2,3,…n} into intervals of equal length. Let x_n be Walters example. Now if the bad guy can decide the value in n and n is in the kth interval, he set it to x_nx_k.

    • gowers Says:

      I can’t see an obvious way of defeating that strategy, but I also think that proving anything is going to be a bit tricky. In this comment I want to try to clarify some muddled thoughts that are floating around in my head. If I fail, I may at least leave something in the mess that somebody else can beat into shape.

      The vague thought is this. We know that “most” numbers have plenty of factors. Now one of the things that slightly hampers us when we think about this problem is that it feels as though different numbers have very different numbers of factors: some are prime, some are smooth, etc. But higher up, this seems to be a less good picture of what is going on. Suppose, then, that we find a region that’s criss-crossed by lots of HAPs, and now almost every number is contained in roughly the same number of HAPs. (In order to achieve this, we might perhaps want to forget about HAPs with small common difference — that is, we could do a sort of “w-trick”.) This is of course very similar to the idea of looking at the shifted problem. The hope here is that the resulting set of numbers would be much better suited to averaging arguments because one would be able to interchange averages and introduce only a small error as one did so.

      A slight generalization of the idea would be to pick several intervals I, I+r, I+2r, I+3r,\dots,I+(s-1)r, where I itself is an interval of length t (and minimal element far bigger than any of r,s,t). Then we would look for HAPs with first element in I+r, second in I+2r, etc. etc. And the idea would be to find lots of these HAPs in such a way that almost every element of the union is contained in roughly the same number. I think something like this may be possible, but would need to check.

      The next stage of the argument would be for bad guy to find such a set that intersected substantially with the set A that good guy puts forward.

      What is the point of doing this? Well, my previous averaging arguments were flawed in that several points didn’t belong to any of the HAPs I was averaging over. Here I’m hoping to have identified a structure inside which the averaging arguments can be made to work.

    • gowers Says:

      OK, here’s an attempt to be a bit clearer about what I was saying just above. Let’s suppose that I have a collection of intervals I_1,I_2,\dots,I_s, all pretty long, all of the same length, and with equal amounts of space between them. (The picture I have in mind is that the minimal element of I_1 is very large, and the spacing between the intervals is a lot larger than the lengths of the intervals.)

      I now make the hypothesis, which I think shouldn’t be too hard to justify (but would require some technical work) that there is a pretty large collection of HAPs, each of which intersects each interval I_j exactly once, such that almost every element of the intervals is contained in approximately the same number of those HAPs. (We know that this won’t be true for all elements — for example, it won’t be true for any prime elements — but replacing “all” by “almost all” should make it OK I think.)

      Now let’s suppose that good guy chooses A to be the set of all numbers that are 1 mod 5, and that bad guy sets those elements to be 1 if they belong to an interval that’s numbered 1 or 4 mod 5 and -1 if they belong to an interval that’s numbered 2 or 3, and otherwise it doesn’t matter.

      How can good guy counter this strategy? Or rather, how can we prove that it’s a winning strategy? To answer this, let us suppose that good guy has chosen the values at all the remaining points, and as usual let’s set the value to be zero at all multiples of 5. Now partition each interval I_j into residue classes mod 5, and let us take note of the densities in each residue class and each interval. Actually, forget the “each interval”. But let’s partition according to the residue class mod 5 and the parity of the interval.

      I now realize I want one more hypothesis. I’m going to assume that any HAP in my collection has a common difference that’s congruent mod 5 to the element where it intersects the first interval. That way, I know, e.g., that if it intersects the first interval at a point that’s 3 mod 5, then it intersects the second interval at a point that’s 1 mod 5, and things like that.

      Now let’s average the drift over all the HAPs in the collection. I can’t quite see it in my head, and may need to go away and do some calculation, but I’m fairly confident that some of these drifts are going to have to be big. For example, if I choose a common difference equal to 1 mod 5, then in the first interval it will get a +1, in the second it will have a tendency to get a +1 (because now a different residue class has been filled up with -1s), and so on. I think after a big averaging, this may get somewhere.

      It occurs to me that every element of an HAP that also belongs to an interval numbered 0 mod 5 must itself be a multiple of 5, which violates one of my assumptions about the system of HAPs. However, I don’t think that matters: all I need is that the numbers are about the same on the non-multiples of 5.

      This needs very careful checking, but I get a good feeling about it. I also believe that residue classes ought to be the best hope for good guy — I’ll try to justify that at some point — so I think it may be possible to prove something here.

    • Alec Edgington Says:

      I think I see one reason why good guy might choose residue classes. If A is a union of a small number of (non-homogeneous) APs a, a+d, \ldots where (a,d) = 1, then it is hard for bad guy to find large subsets of A that are also subsets of HAPs — and these are the subsets that are most useful to him.

    • gowers Says:

      Let me try to get the averaging argument straight. Let the average density of the residue class i mod 5 inside an interval whose number is j mod 5 be a_{ij}. If we average over all HAPs such that d\equiv 1, then we get a_{11}+a_{22}+a_{33}+a_{44}. (Bad guy, incidentally, has fixed it so that a_{1j}=1 if j=1,4 and -1 if j=2,3.) We also know that a_{1j}+a_{2j}+a_{3j}+a_{4j}=0 for each j.

      If we average over the d\equiv 2 HAPs, we get a_{21}+a_{42}+a_{13}+a_{34}. And so on. If we now look at (a_{11}+a_{22}+a_{33}+a_{44})-(a_{21}+a_{42}+a_{13}+a_{43})
      -(a_{31}+a_{12}+a_{43}+a_{24})+(a_{41}+a_{32}+a_{23}+a_{14}) we get …

      OK, I don’t seem to get anything. My good feeling about it has subsided (but it could conceivably come back). So where I am at the moment is that I think it would be interesting to analyse good guys 1-mod-5 strategy, and I also think that the idea of covering a set evenly with HAPs could come in useful, even if I don’t seem to have successfully used it yet.

  8. Alec Edgington Says:

    I tried generated some PAP-discrepancy-2 sequences with random large shifts (between 2^{63} and 2^{64}), letting the program get to a sequence longer then 200 (which usually takes a few seconds), and then plotting the partial sums of pointwise products between HAPs on the resulting sequence. Here are the results, for six sequences, with correlations plotted between the HAPs with differences \{ 1, 2, 3, 5 \}:

    The results are much less striking than in the unshifted case. I’m not sure if there’s much to be learned from this experiment. Remember though that these sequences are relatively short and therefore probably more random: I haven’t got examples of anything like the lengths obtained for unshifted sequences.

    • gowers Says:

      I think to be fair to the shifted case you should be taking partial sums of pointwise products of PAPs rather than HAPs though. And perhaps even that is not quite fair and what one should do, given two differences d and d', is look for starting points m and m' such that the partial sums along products x_{m+rd}x_{m'+rd'} behave roughly linearly. (In other words, not only do you take PAPs but you also don’t insist that they start as near as possible to 0 — after all, why should they?)

      I’m glad to hear that the lengths are not that good. An obvious thought is to try to use one of those formulae for the values at smooth numbers, which gave pretty good results, except that it isn’t clear how to do so. In the PAP context, one could define “smooth” to mean “is contained in many PAPs with small common difference” or something like that. Indeed, perhaps that is what to do: define a sort of fake “prime factorization” by associating with each integer n a k-tuple a_1,a_2,\dots,a_k, where a_i is the largest power of the ith prime p_i such that n belongs to the PAP with common difference p_i. That even allows one to define a weird notion of a GP, but I’m not sure whether it would be good for anything.

    • Alec Edgington Says:

      Yes, I should have taken the sums along PAPs, of course. Doing so is usually not much less random-looking, but occasionally it is. For example, the last sequence on the wiki, of length 257, has a pronounced correlation of -\frac{1}{3} between the PAPs of difference 1 and 2 (the sum of the first 128 terms being -42).

    • Alec Edgington Says:

      I see what you mean about the pseudo-factorizations, but I’m not sure it can work. The a_i will be the powers occurring in the prime factorization of L-n, where L is our big ‘generator’. It would be easy enough to work out these powers for 2, 3, 5 and so on, but there would be many powers of higher primes in there as well — so we wouldn’t find many smooth numbers, if any at all. For similar reasons I don’t think we can get pseudo-GPs on a short segment, since the higher prime powers will fluctuate all over the place. Or am I misunderstanding your idea?

  9. Alec Edgington Says:

    I’ve added a page to the wiki with some quickly-generated sequences having low discrepancy on PAPs, in case anyone has any ideas about how to analyse them.

    One little observation, based on minimal evidence: the lengths at which the search ‘stalls’ seem to depend more on the shift L than on the sequence used to seed the search (without trying to be clever in the choice of seed, that is).

    • Sune Kristian Jakobsen Says:

      If you ran the same program with 0 shift (or should I say shift=1) for the same amount of time, how long a sequence would it find?

    • Alec Edgington Says:

      It’s a bit quicker in the unshifted case: a second or two to 273, then another minute or two to 360. None of the random-shifted sequences has got above 300 yet.

      (By the way, the description on the wiki is inaccurate — I’ll correct it.)

    • gowers Says:

      I’ve had a look at the first of your sequences. Here are a couple of observations.

      1. I haven’t looked that hard, but I have yet to find a partial sum along a HAP that is greater than 2 or less than -2 (even though the condition is for PAPs rather than HAPs). The fact that we have low partial sums along the odd numbers and along all numbers leads us to expect low partial sums along the even numbers, so perhaps this isn’t altogether surprising. In a moment I’ll try multiples of 5 and 7 where there might be more hope.

      2. It is notable that numbers congruent to 6 mod 18 are heavily biased towards -1 and numbers congruent to 12 mod 18 are heavily biased towards -1. So there seems to be some kind of base-3-type structure here. (The sequence along multiples of 18 seems to have low discrepancy, so the sequence along multiples of 6 appears to be OK too.)

      3. Going along multiples of 5 or 7 doesn’t seem to get me anywhere.

      4. Picking a larger prime such as 19 I then start to find some big partial sums.

    • Alec Edgington Says:

      2. I think your numbers may be out by one, since the sequence starts at zero (the multiples of 18 are down the left-most column). Did you mean that the numbers congruent to 5 mod 18 are biased towards +1? There seem to be a few such biases in the other sequences too.

    • gowers Says:

      Sadly, your diagnosis of what I wrote is exactly correct. Let me go back and look at the HAPs.

    • gowers Says:

      OK, I now see that there are huge partial sums along multiples of 3, suggesting that we’ve got some base-3 like structure and are restricting to a subinterval of it (and then making some adjustments, most likely).

    • Sune Kristian Jakobsen Says:

      “These sequences were generated by choosing a large random ‘shift’ L (between 2^{63} and 2^{64}) and requiring that | x_{r_d} + x_{r_d + d} + \ldots + x_{r_d + md} | \leq 2 for all d and m, where 0 \leq r_d < d is the reduction of L modulo d." – the wiki
      Shouldn't r_d be the reduction of minus L?

    • Alec Edgington Says:

      I don’t think so. For example, the first sequence has L = 2 \, \mathrm{mod} \, 3), and it is the partial sums x_2 + x_5 + x_8 + \ldots that are bounded. (Remember the sequence starts at x_0.)

    • Sune Kristian Jakobsen Says:

      But I thought x_{n} the sequences should correspond to y_{n+L} in a unshifted sequence? If you look at x_2,x_5\dots when L=2, you are looking at a sequence with indexes =1 (mod 3).

    • Alec Edgington Says:

      I see: so I was wrong to call L the shift. But the sequence is still (trying to be) a shifted sequence, where the shift L\prime satisfies L\prime = -r_d \, (\mathrm{mod} \, d) for all relevant d — for instance, we could take L\prime = 10\,100! - L — and since the r_d are meant to be random this doesn’t matter statistically.

  10. New Thread for Polymath5 « Euclidean Ramsey Theory Says:

    […] Thread for Polymath5 By kristalcantwell There is a new thread for Polymath5. It now looks like if there is a quasi-multiplicative sequence of bounded discrepancy […]

  11. gowers Says:

    Here’s a first attempt (written straight out of my head, so likely to degenerate into confusion) to guess how long a completely multiplicative sequence of discrepancy C you can get if you start with a character-like example and make little adjustments to it at large primes whenever it strays from the interval [-C,C]. This is inspired by the actual behaviour of the longest multiplicative examples for C=2.

    To do this, I will try to describe an algorithm for generating long multiplicative sequences — but I’ll be working it out as I go along.

    To begin with, let’s take C=3, and let’s take Kevin’s modification of the Walters example, which is the multiplicative function that sends 3 to -1, and all other primes to 1 if they are 1 mod 3 and -1 if they are -1 mod 3. The partial sum of the first n terms of this sequence equals the number of 1s in the ternary representation of n that multiply even powers minus the number that multiply odd powers. For instance, if I take a number like 128 (I just chose that pretty much at random) then I see that it is 3^4+3^3+2.3^2+2.3^0, so the partial sum up to there is 1-1=0.

    The point of interest to us is that not only do the partial sums not get big, but they are also locally approximately constant (that is, not only do they take exponentially long to get to C or -C, but any two terms for which the partial sums differ by C have to be exponentially far apart). To give a feel for this, let me write out the sequence of partial sums for a bit. I’ll split it into groups of 27.

    1 0 -1 0 -1 0 1 0 1 2 1 0 1 0 1 2 1 0 1 0 -1 0 -1 0 1 0 -1
    0 -1 -2 -1 -2 -1 0 -1 0 1 0 -1 0 -1 0 1 0 1 0 -1 -2 -1 -2 -1 0 -1 0
    1 0 -1 0 -1 0 1 0 1 2 1 0 1 0 1 2 1 0 1 0 -1 0 -1 0 1 0 1
    2 1 0 1 0 1 2 1 2 3 2 1 2 1 2 3 2 1 2 1 0 1 0 1 2 1 0
    1 0 -1 0 -1 0 1 0 1 2 1 0 1 0 1 2 1 2 1 0 -1 0 -1 0 1 0 1
    2 1 0 1 0 1 2 1 2 3 2 1 2 1 2 3 2 1 2 1 0 1 0 1 2 1 0
    1 0 -1 0 -1 0 1 0 1 2 1 0 1 0 1 2 1 0 1 0 -1 0 -1 0 1 0 -1
    0 -1 -2 -1 -2 -1 0 -1 0 1 0 -1 0 -1 0 1 0 1 0 -1 -2 -1 -2 -1 0 -1 0
    1 0 -1 0 -1 0 1 0 1 2 1 0 1 0 1 2 1 0 1 0 -1 0 -1 0 1 0 -1

    and there will then follow a repeat of what I have just written, but with everything reduced by 1, and so on and so on.

    Now we are trying to avoid partial sums of absolute value greater than 3. The first time that we hit such a partial sum is at 1+9+81+729=820. What do we do about it? Well, for a multiplicative function we are free to choose the values at primes. Up to now we’ve been choosing \pm 1 according as the prime itself is \pm 1 mod 3. So to get that big partial sum down without doing permanent damage, we would like to find a prime lower than 820 (and because the partial sums are locally approximately constant, it can afford to be quite a lot lower than 820) congruent to 1 mod 3 so that we can change its value from 1 to -1, thereby reducing all later partial sums by 2. Then when the danger has passed, we should choose a prime that’s -1 mod 3 and change its value from -1 to 1, so that the partial sums go back to what they were before.

    To clarify this task, let us see what the danger points are between 729 and 2.729. They occur at all numbers of the form 729+243a+81+27b+9+c+1, where each of a,b,c is equal to 0 or 2. At all these points the partial sum is 4, and everywhere else it is lower. The smallest it can be in this range is -2, which happens at all numbers of the form 729+243+81a+27+9b+3+c, where again each of a,b,c is 0 or 2. This tells us that we can’t just reduce everything by 2, since then we will create some -4s. But we could reduce everything by 2 and then do a second correction to get rid of the -4s in the middle. So we’d like to start by finding p_1\equiv 1 between 729 and 820. Looking at a table of primes, we find that e.g. 809 would do for p_1. I chose it as big as possible so that the later consequences of the choice would be felt as little as possible.

    Before we go any further, we need to think about the fact that we shall change the values at all multiples of 809. For example, the value at 2.809=1618 will change from -1 to 1, the value at 3.809=2427 will change from -1 to 1, the value at 4.809=3236 will change from 1 to -1, and so on.

    So the algorithm would proceed by finding the next point (after these adjustments) where the partial sum goes out of the range [-3,3], which will be at 729+243+27+3=1002 (where it would originally have been -2 but has now been pushed down to -4), finding a prime congruent to -1 mod 3 that’s a bit less than this value — 983 will do nicely — and changing the value there from -1 to 1.

    The feeling I get from this is that as one continues one has to make more and more adjustments to deal with the aftereffects of previous adjustments, until eventually one gets completely stuck. However, I think I have basically just described a fairly simple algorithm that would allow for a bit of backtracking and might very rapidly give extremely long multiplicative examples of discrepancies such as 3 and 4. It would be interesting to know how much better they do than 9^3+9^2+9 and 9^4+9^3+9^2+9, respectively.

    This is pretty similar to what Alec did in the base-5 case, but I’m not sure whether it’s quite identical (I think the nature of the backtracking may be different), and it starts with the base-3 example, which I think could give better results in the discrepancy-3 case.

    A more concise description of the algorithm is as follows. Start with the base-3 sequence. If you find a partial sum that’s wrong, go back to the largest prime below that point that will fix that partial sum. Then iterate. If at some point you find that you cannot fix a partial sum without messing up a smaller partial sum (messing up a larger one is fine because you can correct it later) then you’re stuck. So the first question is how far this very simple algorithm can be expected to go.

    Of course, if you get stuck, then you can backtrack, exploiting the fact that there was in fact a lot of flexibility when choosing the adjustments. So the second question is how you do if you follow the first algorithm and backtrack every time you get stuck.

  12. Alec Edgington Says:

    Here’s another experimental curiosity. Noting the near-linear relations between many of the HAPs in the long sequences of length 1124, I considered, for each of these sequences, the n \times n matrix whose (i,j) entry is x_{ij}, for 1 \leq i, j \leq n and for 1 \leq n \leq 32. If the sequence were completely multiplicative, these matrices would have rank 1.

    It isn’t clear, without exact multiplicative relations, even whether the rank of the matrix is less than n. But it turns out to be much less. For both the 1124 sequences, it seems to grow roughly as n/3.

    I’ve plotted graphs to show how the rank grows with the size of the matrix. The blue graph corresponds to the first 1124 sequence, the green graph to the second:

  13. gowers Says:

    Following on from my previous comment, here is an attempt to guess how long a greedy algorithm with no backtracking might be able to continue in an attempt to find a long completely multiplicative sequence of discrepancy at most C.

    Ultimately, the result is going to depend on the density of the primes, in the following sense. If we start with a sequence that’s pretty good, like the Walters/O’Bryant base-3 sequence, then we can make an adjustment just before it first goes wrong, then another adjustment just before it next goes wrong (given the first adjustment), and so on. And the only thing that can stop us continuing is if we can’t find a suitable prime to adjust. Since the restrictions on that prime, at least at the beginning of the process, are that it should lie in an interval of width comparable to the prime itself, the prime number theorem (or rather a known quantitative version of Dirichlet’s theorem, so that we can get the prime to be 1 or 2 mod 3 according to our requirements at that point) tells us that we will be able to find one. So what we have to decide is how quickly the process degenerates, in the sense that it requires us to make so many adjustments that we cannot simply use the prime number theorem to assure ourselves that it is possible. And a second question of interest is whether we get a substantially better estimate if we allow ourselves the Riemann hypothesis, so that we can find a prime between n-n^{1/2} and n+n^{1/2}. (This isn’t quite what the Riemann hypothesis says, but let’s stick with this oversimplification for now.) Alternatively, we could (and probably should) use unconditional results that give us a prime between n-n^{0.535} and n+n^{0.535}, or we could go the whole hog and assume the Mertens conjecture that the maximum gap between primes is C(\log n)^2. The question that particularly interests me is whether this will allow us just a small improvement on the 9^C-type bound that we get from the initial example, where by small I mean an improved bound that is still exponential in C, or whether we can get a better kind of bound. I’d be delighted even with \exp(C\log\log\log\log C) or something like that, because it would give us a proof that the discrepancy can be sublogarithmic.

    So how can we guess what will happen? I’m going to try to make a start on that by discussing a random model of a slightly different situation. Let’s suppose that we’ve made adjustments at r primes p_1,\dots,p_r, fairly well spaced out but of comparable magnitude, all of size somewhere in the region of m. The effects of these adjustments later on will be a bit like taking a bunch of random variables, some of them equalling 0 or 1 with probability 1/2 and some of them equalling 0 or -1 with probability 1/2. Or rather, if we add up the effects of the adjustments later on, we’ll get something like a random walk where we take a random step roughly r times in every interval of width m. This random walk ought to deviate by around \sqrt{r} most of the time, necessitating new adjustments. I’m not sure how many of these new adjustments will be necessary, but let me guess that it’s about \sqrt{r} (which come in the form of extra steps to the random walk that keep it within bounds). So in every interval of width m we are making \sqrt{r} adjustments.

    But if we now look a bit further ahead, this means that the number of steps in our random walk is going up all the time, as the number of adjustments increases. Being ridiculously vague, let’s say that if we go along by m\sqrt{r} then the density of the steps has doubled.

    If that is correct (which I have no reason to suppose it is) then it takes a further m\sqrt{2r} steps to double again. So we might expect to continue this process c\log m times before we have no chance of finding the primes we need. By that time, the number of steps we have taken is m\sqrt{r}^{c\log m}=m^{c'\log r}.

    Something’s clearly wrong about that, since it’s bigger when r is bigger. So I’ll backtrack a bit and this time assume that the number of adjustments you have to make to an r-step random walk to keep it within [-C,C] is c(C)r. (Does anyone know the answer to this question? It feels as though it could be pretty standard. Perhaps suitable for mathoverflow?) Therefore, in the first interval you make r adjustments, in the next you make r(1+c), and so on. This allows you to carry on for \log (m/r)/c steps, or something like that. And if m=\exp(C) then that’s at most C/c(C) steps.

    Leaving aside all questions of whether there is anything to that argument, I think we can be pretty confident that c(C)\geq m^{-1}. (That is, we expect to make at least one adjustment.) So I don’t see how we can hope to get further than m^2, which is still going to be exponential in C.

    Indeed, I think we can forget most of what I’ve just written and work with a far simpler assumption. In each interval of width m=\exp(C) we’ll have to make at least one adjustment (since the drift of the original sequence is too big for it to be left alone). Therefore, by the time we reach … oh, it doesn’t quite work. I was going to say that by the time we reached m^2 we’d have intervals with m things to worry about from earlier adjustments, but we don’t know that multiples of those will land in this interval. But a slight modification to the argument will, I’m almost certain, show that we cannot hope to get a superexponential bound out of this.

    • gowers Says:

      The more I think about this, the more I think that the algorithm I am trying to understand is likely to be precisely the algorithm Alec used here. If so, then we have some experimental evidence to go on. The point at which the initial seed sequence first hits 4 or -4 is at 5^3+3.5^2+5+3=208, where it hits -4, which makes it entirely expected that the first prime that gets the “wrong” value is 197, which is sent to 1 instead of -1. Now we need to worry about places where it would have hit 2 and will now hit 4. The next such place is at 2.5^3+3.5+1=266, which is why the first prime to be changed from a 1 to a -1 is at 251. And so on.

      So Alec’s algorithm, which is greedy plus backtracking, increases the time taken by the seed sequence to reach \pm 4 by a factor of approximately six (which is nothing like as good as raising it to a power, so my argument above looks rather weak). I’d be interested to know how far it gets before the backtracking starts.

      All this is giving me a more theoretical idea, which I’ll put in a separate comment.

  14. gowers Says:

    This comment is to suggest that we should try to come up with a useful measure of oscillation. Ultimately the hope would be to formulate a stronger version of EDP that would be easier to prove. Let me explain as precisely as I can (which is not all that precisely at the moment) what I mean.

    My last couple of comments have been about taking a character-like sequence, which gives you a nice long example of a multiplicative sequence of low discrepancy, and making little adjustments to it at large primes in order to convert it into an even longer sequence. One might very naively ask why this process can’t just be iterated. That is, why can’t you take any sequence and make adjustments to it and thereby allow it to be extended? The answer is that the character-like sequences have an additional “low-drift” property, which one could describe by saying that if you
    restrict to a short subinterval then the drift will be considerably lower than it is for the sequence as a whole.

    This makes adjustments much easier. For instance, if I’m hoping for a discrepancy of at most C and at some point n the partial sum hits C+1, then if I find a prime less than n where the value is 1, I can change it to -1 and deal with the problem, and as long as it isn’t too much less than n I don’t have to worry about accidentally creating partial sums less than -C, because the low-drift property means that all the partial sums around n will be fairly close to C.

    However, once I start tinkering around with the sequence, making adjustments like this, I start to find that the low-drift property is slightly spoilt, and it gets gradually worse as I proceed, until eventually I find that the sequence is full of short subsequences with maximal drift (that is, where the partial sums at the beginning and end of the subsequence differ by 2C), making any further adjustment impossible (assuming that in these short subsequences there aren’t any primes).

    So here is a slightly more precise version of my vague question. What we are trying to do is prove that the discrepancy of a sequence of length n is unbounded. Can we do it by means of a proof strategy of the following type?

    1. Define some notion of “average local oscillation” or something along those lines.

    2. Prove that if the average local oscillation is at least C then the discrepancy is at least C. (This I would imagine would follow fairly straightforwardly from the definition.)

    3. Prove that a sequence of length n has to have average local oscillation of at least C(n), where C(n) tends to infinity.

    I need to add a further property, which would say something like that the parameter “average local oscillation” is a useful one. For example, the discrepancy itself satisfies 1-3 if EDP is true. What I had in mind was that the average local oscillation should depend much more continuously on n, so that there was a real hope of proving something like that if you double the length of the sequence then you must increase the average local oscillation by a factor of at least 1.01, or something like that, which there is no hope of proving for discrepancy itself. Similarly, one might be able to prove that nice character-like sequences have the lowest possible average local oscillation, so that even though you can get longer multiplicative sequences with the same discrepancy, ultimately they don’t get you anywhere.

    The next question, it seems to me, is how one would go about devising an appropriate parameter. Perhaps one should try to list the properties that this parameter ought to satisfy and then consider the abstract question of whether a parameter with those properties exists.

    • Kristal Cantwell Says:

      For each number k one can get a parameter for total oscillation for k as follows for every time there are k pluses or minuses add one. This can be modified by subtracting one if the k pluses follow k pluses and only adding one when the sign is opposite. if the discrepancy is bounded only a finite number of these is non-zero but it is not clear to me how to combine these parameters to get something useful.

    • Kristal Cantwell Says:

      I think simply taking the total number of sign changes would be a measure of total oscillation, and in regards to trying to count the number of of consecutive numbers of the same sign as in my above post just counting the number of instances of k 1’s that are consecutive ie that begin and end with one. There are two problems I did not forsee in the methods of the above post one is that is that care must be taken not to count k plusses in a row as k-1 plusses etc and two if I count k pluses followed by k plusses as zero I can construct a sequence that oscillates and has all the k parameters mentioned above as zero.

      So taking the total number of sign changes looks good for total oscillation. If for some reason trying to find long oscillations is desired then looking at the number of the same signs in a row between two numbers of opposite sign might be used.

    • gowers Says:

      I would suggest that the total number of sign changes is not quite what is wanted — indeed, in some ways it is the reverse of what is wanted. For example, the sequence + – + – + – + – … has partial sums that are as constant as possible, but lots of sign changes, whereas the sequence + + + + + – – – – – + + + + + – – – – – has fewer sign changes but oscillates more. (Similar examples can be constructed if you are talking about the sign changes of the partial sums, but that would be a strange parameter in the context.)

    • Sune Kristian Jakobsen Says:

      “in some ways it is the reverse of what is wanted”
      Then change the sign;) As you say, the number of sign changes IS an indicator of how much the sequences oscillate, but few sign changes means that the sequence oscillates a lot. But this indicator is not too precise: + + + – + – – – + – + + + – + – – – … and + + – – + + – – … has the same number of sign changes, but the first oscillate more.

      Another measure of oscillation: For each C we find the length of the smallest interval containing n and with drift C.

    • gowers Says:

      I’ll try to make this thought more precise at some point, but what I’m looking for is a more “global” measure of operation (i.e., more like an average than a maximum) that would provide some kind of measure of how hard it is to make adjustments to the sequence in order to keep it bounded. It’s certainly true that if you have a big drift on a small interval, then this should make a bigger contribution to the parameter than a smaller drift on that interval, or (more significantly) the same drift but over a bigger interval. (The reason we care less about big intervals is that we are more likely to be able to find a suitable value to change.) In the multiplicative case, we might consider measuring the “width” of an interval by the number of primes it contains.

      I also have in mind the hope that we could find a proof that was inductive in flavour, so I would imagine that it wouldn’t be possible to change the parameter very much if you just changed the sequence in a small number of places.

      Note that for a proof it is not necessary that every sequence for which the parameter is low has low discrepancy. In fact, I would expect that not to be the case. What is critical is the converse, since we want to prove that all long sequences have high discrepancy by showing that the parameter has to be large.

      In an ideal scenario, we might be able to prove that character-like functions are the ones for which the parameter is smallest.

      I think that if we think hard enough about all these requirements (and quite possibly others) then we may have a chance of coming up with something that’s not obviously useless.

    • Sune Kristian Jakobsen Says:

      Some ways of measuring the degree of oscillation of the partial sums (I’m not sure any of these are useful):

      *The average of the squares of the partial sums: This is not really measuring the oscillation, since a constant high partial sum would give a high value, but perhaps this measure is useful for a greedy algorithm.

      *Calculate the “average” partial sum around a term (the partial sum at terms close to this term counts more than partial sums at terms further away), and see how much this differ from the partial sum at the term (perhaps square this difference and sum over all terms). This is similar to f” for a continuous function, where f is the partial sums.

      *Something like the Hausdorff dimension of the graph: Given a number C, we want to cover the graph of the partial sums with rectangles with high C. These rectangles should have disjoint projection to the x-axis. How many rectangles do we need. How does this number change when C increase?

      The last measure made me wonder: Is it possible to generalize the partial sums to all rational, in a way similar to the way we generalized the EDP? Of course we could, assuming that sequences with bounded discrepancy exist, use compactness to define a integer valued function on the rational, but what I want is a continuous function.

    • gowers Says:

      If my hunch (or perhaps it might be better to call it a wildly optimistic hope) is correct that character-like functions should be best possible for the parameter in question, then one could consider trying to devise a parameter that is particularly good for precisely these functions. For example, the fact that the drift of a character-like function on an interval of length m is proportional to \log m suggests to me that perhaps one should define d(m) to be the average drift over intervals of length m, and then take the parameter to be \mathbb{E}_{k\leq\log_2n}d(2^k). (Here I am defining the drift in an interval J to be the maximum minus the minimum of the partial sums inside J.) For a Walters-type example, d(2^k) is proportional to k, so the value of the parameter will be proportional to \log n, just as we want. Also, if the parameter is at least C, then the discrepancy must be at least C/2, so that’s OK.

      I feel as though this has something going for it, so I’d be interested to know how this parameter for the 1124 sequences compares with the same parameter for the base-3 sequence (the multiplicative one with f(3)=-1). I’m hoping that the multiplicative sequence will be better, even though its maximum drift is worse.

      Having written that last sentence, I realize that I forgot that I was talking about multiplicative sequences here, which is why I was looking only at drifts in intervals and not in bits of HAPs. So I was wrong to suggest comparing the multiplicative sequence with the 1124 sequence. Instead, I would suggest taking the multiplicative sequence of length 1530 and doing a comparison with that. But it might be quite interesting to do the 1124 sequence as well, just out of interest. And it might also be interesting to try to work out what the best HAP analogue of the parameter is. Obviously one can work out the parameter along all HAPs, but what then? Some weighted sum? If so, then what should the weights be? Perhaps for each k we should average the drifts over all HAP segments of length 2^k and average all those averages.

      Going back to the computational side, it would also be interesting to see whether Alec’s algorithm that produced the 1530 sequence tends to increase the parameter at every step (not, of course, including backtracking steps), even as it keeps the maximum down. If so, it would suggest that there was something to prove: the further away you go from a character-like function, the bigger this particular average of the drifts. (Just to be clear what I mean here, you start with the sequence of partial sums, and each time you change the value at a prime you recalculate it and recalculate the parameter. For the parameter to be a promising one, it would at the very least be locally minimized at the original sequence.)

  15. Thomas Sauvaget Says:

    Here is another idea (which may well turn out to be silly, just a quick thought). This idea is to approach the problem “globally”: fix some integer of the form n=m! and some aim like C=2. Start with a string of 0’s of length n. Firstly, convert the first and last 0’s to +1. Now because of that and the constraint C=2 we infer that at place n/2 there must be a -1. Then this implies that at the pair of places (n/4;3n/4) we cannot choose (+1;+1) (otherwise we hit C=3 along the sequence) and also (-1;-1) is excluded (or else we hit C=3 along the \frac{n}{4}-HAP), so only the two other choices (+1;-1) and (-1;+1) are possible, so pick either one. Now consider 4-uplets at places (n/8;3n/8;5n/8;7n/8): some are excluded while others are possible. Go on until reaching the final \frac{n}{2}-uplet (2;\dots ; n-2), or track back if hitting a dead-end and make another possible choice at a previous step.

    Assuming we did reach that final uplet for powers of 2, we next deal with powers of 3. That is we start with some valid choice at (n/3;2n/3), i.e. taking into account the C=2 constraint, and iterate hopefully until reaching a final \frac{n}{3}-uplet (3;\dots ; n-3). Go on with each prime divisor of n, that is the prime divisors of the much smaller m.

    Now because by Stirling’s formula we have n\simeq \sqrt{2\pi m}\left (\frac{m}{e}\right )^m, and that the number of prime factors of m can be controlled as a function of m (for example if m itself is a primorial number), that might be a way to build an exponentially long sequence as a function of few prime factors (i.e. convergence is not ensured at all, but whenever all the final uplets for each prime factor can be attained we have one, but no idea how rapidly i.e. how many dead-ends one might hit in general). In particular this approach is different from the multiplicativity idea and from Tim’s recent 9^C ideas and Alec’s backtracking algorithms (unless I’m seriously confused, which is possible…).

    • gowers Says:

      It’s certainly interesting to think about different orders on the set \{1,2,\dots,n\} when doing a greedy algorithm. I think you may not have said quite what you meant when you said you would go on to the final n/2-uplet (2,4,…,n-2), since n isn’t a power of 2. But in any case you wouldn’t want to go all the way down to that sequence, since to choose the values everywhere on that sequence you have to choose a sequence of discrepancy 2 and length n/2. (What I’m wondering is whether you are forgetting that you need to consider lots of HAPs with common differences that are not powers of 2 even when you are filling in at the powers-of-2 stage.)

      But maybe the general idea could be interesting if you start with the product of the first few primes, so that you don’t go all the way down to the bottom, so to speak. And maybe I haven’t properly understood your suggestion.

    • Thomas Sauvaget Says:

      Oops, you did understand it well and your two objections are of course perfectly to the point, I had overlooked that! I’ll think about it a bit more, but probably not much can be salvaged…

  16. Miodrag Says:

    Another simple search heuristic for general sequences. Let’s associate with each partial HAP sum of the n\/-length sequence – all D = \sum_{k \geq 1} \lfloor n/k \rfloor of them – a different vector in an orthonormal basis in a D\/-dimensional Euclidean space. Also, for each 1 \leq i \leq n, associate with the i\/-th element of the sequence a vector k_i in this basis, with coordinates equal to 1 for the partial sums to which the i-th element contributes, and zeros elsewhere. Then the Erdos sequences are vertices \sum_{1\leq i \leq n} \epsilon_i k_i, (\epsilon_i = \pm 1, 1 \leq i \leq n) of the n-dimensinal parallelepiped embedded in the D-dimensional space, and we are looking for vertices that intersect the D\/-dimensional cube of side 2C+1 centered at the origin. So we can start from, say, vertex k_1 + \cdots + k_n and move along the edges of the parellelepiped, trying to get as close to the origin as possible with the Best-first search, “best” being “closest to the origin” here.

    One of the problems is remembering where we were already in order to avoid cycles, there are potentially a lot of vertices we could visit. Perhaps keeping the list of vertices already visited in some sort of compressed data structure might help. Also, only \lfloor D/(C+1) \rfloor of the coordinates matter.

  17. Alec Edgington Says:

    A few remarks on attempts to use greedy algorithms with limited look-ahead to find long discrepancy-2 sequences.

    I generate a sequence one item at a time, by first generating all possible sequences that extend what I have so far by a further k, and then choosing the next sign by majority vote from the sequences generated.

    For example, it turns out that about two-thirds of length-(k+1) sequences that begin with +1 follow it with a -1. So, I choose -1 as the second element. Then, about three-fifths of length-(k+2) sequences beginning (+1, -1) have a -1 next; so I choose -1 as the third element. Then, about three-quarters of length-(k+3) sequences beginning (+1, -1, -1) have a +1 next; so I choose +1. And so on.

    One thing that strikes me is that the ratios, for the first few terms, do tend to be simple rational numbers (and not too dependent on k as long as it is reasonably large). This suggests the presence of some subtle symmetries in the problem. But the rational numbers quickly become less discernible as one proceeds.

    The bare results are disappointing. How far one gets is not a monotonic function of k: for k = 10, 20, 30, 40, 50 one gets to 104, 77, 124, 164, 144, respectively. With k = 24 one gets to 215.

    But perhaps the most interesting observation is that the sequences produced have a marked base-3 character. They seem to be given by

    x_{3^r m} = \lambda_r (m \vert 3)

    for r \geq 0 and 3 \nmid m, where the first few values of \lambda_r are +1, -1, -1, +1.

    Anyway, what all this suggests is that there may be some measure on the set of low-discrepancy sequences with respect to which ‘almost all’ such sequences are character-like — perhaps even to base 3.

    • gowers Says:

      That was a very interesting idea for an experiment. I suppose the explanation for the non-monotonicity was that there are some bad choices that give you a temporary advantage but doom you in the longer run. I wonder if there might be some variant of what you did, where instead of just looking at how many extensions there are to the next k numbers, you also look at how many extensions there are of the multiples of 2 to the next k of those (i.e., you’re looking further down the even numbers to keep those as healthy as possible), and the multiples of 3, and so on. That should cause the algorithm to choose the values much more carefully at smooth numbers, as one would hope. (I haven’t said on what basis the choice should be made, given that one now has several numbers to work with. It isn’t instantly clear to me.)

      I’ve just deleted a paragraph in which I tried to give some kind of explanation for the character-like behaviour: it wasn’t convincing enough …

  18. obryant Says:

    Has anybody tried generating sequences like this? Any advice before I start coding? Am I missing something that makes it obvious this won’t work?

    I want to generate a discrepancy 2 sequence (call it x) with 1125=25×45 terms. According to these stats, there are 2.2274560 sequences of length 45 and discrepancy 2 (the factor of 2 is because the table assumes the first term is +). Let S be the set of such. I don’t think it’s relevant, but I can’t help but notice that since N(1,2)=41, all of these actually have HAPs that get to both 2 and -2.

    Write [x] for the sequence x restricted to 45 terms.

    Both [x] and of [T_2 x] are in S. Given [x], how many candidates are there for [T_2 x]? There are 22 terms of x that are in both [x],[T_2x], so naively I expect 2^{-22}. |S|\approx 1. Similarly for [T_3 x], except that now we are conditioning on [Tx] and [T_2 x], and on up. By the time we are asking for the possibilities for [T_7 x], twenty-nine terms of it are already determined, so that in most cases there is no consistent possibility for [T_7 x].

    If these naive assumptions are correct this should give us around 5 million seeds for the 1125 computation, each of which guarantees a priori a discrepancy of 2 on every HAP with difference at most 25 and length at most 45. And, except for constructing the list of 2274560 sequences, the only computation to this point has been selecting from S sequences with specific values at specific terms. That is, there should be very little backtracking. This just leaves the problem of testing each seed, which may or may not be fast. But even just having the 5 million possibilities may reveal statistically some structure (for example, do 2 and 3 have different colors in all 5 million possibilities?).

    This raises the question of the structure of the following graph. For the sake of specificity, it has 2274560 vertices (you can guess the labels for the vertices), and connect x to y (a directed edge) if T_2 x is the beginning of y. Does this have many isolated points? If so, then we can discard x as a beginning of a length 90 sequence immediately (such savings at each step in the backtracking algorithm would be wonderful). Does the graph have essentially one big cycle, or many small cliques?

    • obryant Says:

      I suppose the quasi-multiplicative structure that’s been noticed suggests that the graph will probably have many small cliques, and so the naive assumptions I was thinking with are not realistic.

      I could quantify the awfulness of my naivete if I knew how many of the length 45 sequences can be extended to a length 90 sequence.

    • gowers Says:

      I find what you’ve just written very interesting, but haven’t got to the point of understanding why lots of small cliques would be bad news. (I’m sure it’s obvious — I just haven’t been thinking for long enough.)

      One small remark is that the quasimultiplicative structures that are apparently based on irrational rotations suggest to me that there ought to be at least some very long cycles indeed. Since the examples of length 1124 all appear to have this kind of almost periodicity, then perhaps there is some hope. (However, the “rational” example of length 1112 may mess things up — or is there some explanation for why that one breaks down at 1113=3x7x53?)

  19. gowers Says:

    This is a continuation of the discussion in my comment about oscillation and the responses to it.

    First, I’ve got a slight modification of the oscillation measure I proposed towards the end of that discussion. Let’s suppose we have a very very long finite multiplicative sequence and let N be its length. Then I want to define its “oscillation up to m” as follows. Given any interval I, define the Idrift to be the difference between the maximum and the minimum of the partial sums up to numbers in I. Given any integer r, define the average r-drift to be the average I-drift over all subintervals I of \{1,2,\dots,N\} of length r. If I stuck closely to what I did earlier, I would now average the r-drifts over all powers of 2 that are less than m. Instead, just to smooth things out even more, I’d like to take a weighted average of all the r-drifts up to m. If f(r) is the r-drift, then I’ll take (\log m)^{-1}\sum_{r=1}^m r^{-1}f(r). (I don’t think this makes a fundamental difference to what is going on, but it feels more natural.)

    One difference from what I suggested before, which could be useful for technical reasons, is that I stop when m is still a lot smaller than N. That means that for every r\leq m, almost every number less than N is contained in r intervals of length r, so we can ignore annoying edge-effects. (We could even get rid of them altogether by making the sequence infinite, but then we’d need some tricks to define the averages, so in the end long and finite feels like the simplest thing to do.)

    The aim now would be to prove that if you take any multiplicative \pm sequence of length N, then its average oscillation up to m is at least c\log m, with (locally) character-like functions being extremal.

    • gowers Says:

      To give some kind of feel for how this quantity might be useful, let me try to show that making adjustments to a character-like function will not easily decrease its oscillation up to m. For ease of discussion, I’ll take the first Walters example, where the partial sum up to n is just the number of 1s in the ternary representation of n. Since two numbers that differ by at most r agree about where there 1s are in all but the last \log_3r digits and possibly one other digit (this would happen if between the two numbers there happened to be a multiple of a large power of 3 — then the intermediate digits would change from 2s to 0s, not affecting the number of 1s), the r-drift is (roughly) at most \log_3r.

      Now let’s suppose that we want to do something about the r-drift. Our first troublesome point will be at 1+3+3^2+\dots+3^k, where k\approx\log_3r, so we might find a prime p just below that that’s equal to 1 mod 3 and change the value at that prime to -1.

      If that’s all we do, then the value at 2p changes from -1 to 1, the value at 3p from 1 to -1, and so on. The effect on the partial sums is therefore to take a copy of the partial-sums sequence, repeat each term of that sequence p times to form a new sequence (prefacing it with p-1 zeros), and to subtract that from the original partial-sums sequence.

      The effect of this on the s-drifts with s much less than r is negligible, since the sequence we’ve added is constant on almost all intervals of width s. But for longer drifts, we do have some effect.

      Thinking about this, it doesn’t do quite what I expected. In an interval I of length around 3^t, there will be a maximum value that’s about t bigger than the minimum value, and it will occur about once. (I’m talking fairly roughly all the time.) If the start point of the interval is much bigger than p, we can treat it as more or less random whether our adjustment at p will increase the maximum value or decrease it. As for the minimum value, it will typically occur in many places (because we’re taking the “positive” example), so it will be increased in places and decreased in places. Therefore, with probability 1/2 the adjustment will increase the I-drift by 2 and with probability 1/2 it will have no effect.

      In the “negative” example (where 3 goes to -1) there will typically be a solitary maximum and a solitary minimum. In this case, it looks as though the expected change to the difference could be zero. In that case we could perhaps look at the mean-square drift instead of the mean drift. But it’s probably simpler just to look at a slightly longer interval, so that we have two or three maxima and two or three minima. On such an interval, the expected change to the drift will surely be positive.

      Now so far I am being a little unfair, because the idea of adjusting a sequence is that you make further adjustments to compensate for additional defects introduced by earlier ones. But it seems at least possible that for the mean-drift parameter you just can’t get this to work.

      Quite what the exact statement should be I don’t know, since you could take one character-like function and make all the adjustments you need in order to convert it into another. What I think might be needed is a second measure — of how close a sequence is to being locally character-like — so that one could attempt to prove that a sequence that is far from locally character-like must have high oscillation up to m. I’ll think about that in my next comment.

    • gowers Says:

      Can we characterize character-like functions? I think we have to allow slightly more general examples, such as mixed base-3 and base-5 examples. So I would venture the following as an inductive definition (rather than a characterization). A multiplicative \pm 1 function is character-like if there exists a prime p such that the non-residues mod p are periodic, and the subsequence you get at multiples of p is itself character-like. I’d also want to quantify that in some way, so as to make clear that larger p are “worse” than smaller ones.

      The periodicity in fact forces the values at the non-multiples of p to be given by the Legendre character. (This observation is absolutely not new, but there may be readers for whom my spelling it out is helpful.) Indeed, let s be a generator of the multiplicative group mod p. Then the value at s determines the value at all powers of s and hence, by periodicity and the fact that s is a generator, at all numbers that are not multiples of p. If the value at s is 1 then we have to get 1 everywhere, which forces the partial sums to grow linearly — contradicting low discrepancy. So the value at s is -1. But s^k is a quadratic residue if and only if k is even, so the resulting function will be 1 at residues and -1 at non-residues.

      What I don’t yet see is a good way of measuring how close a function is to being character-like. I’m sure it’s easy to measure how close it is to the Legendre character mod p, but that’s not the point, because presumably the Legendre character mod some other prime q will be far from the Legendre character mod p while nevertheless being a very good function from the point of view of average drifts.

      So my next question is exactly that: can we come up with some numerical measure of how character-like a multiplicative function is? An obvious thought is to look at Fourier transforms, since they will detect periodicity. But what is the best way of using them?

    • gowers Says:

      In connection with my previous comment, I’d be interested in general in measuring periodicity in the sequences we’ve found so far, and this is something that would be very easy to do experimentally.

      For example, given the 1124 sequence, it would be interesting to take a few small values of k and look at the average value of x_jx_{j+k}. I’d be particularly interested in small primes p for which this average was close to 1-1/p, which is a sort of trivial maximum, because if it’s any bigger then the partial sums along multiples of p would I think have to grow linearly. It would also be interesting to do this for the long multiplicative sequences, though the results won’t be all that interesting for the sequences that were generated from character-like sequences.

      As well as just the averages, it could be interesting to look at the graph of the partial sums of the sequence x_jx_{j+k} when k takes small values. One might expect that for some values these partial sums start by growing pretty fast, but that later on the “adjustments” start to kill off the character-like behaviour and the growth levels off. This levelling off should somehow be measured in our parameter that is supposed to capture the “distance from character-like functions”.

      Something I find interesting here is the interplay between additive and multiplicative structure. Indeed, that is a fascinating feature of the whole problem.

    • Alec Edgington Says:

      That definition of ‘character-like’ certainly sounds natural. But I’m struggling to think of a reason why it isn’t the case that every completely multiplicative function is character-like, if we can choose the primes freely.

    • gowers Says:

      I’m not sure quite what your misconception is here, but for example the Liouville function isn’t character-like, even remotely. If the “bottom base” of a character-like function is p, say, then the function has to be constant on all residue classes mod p apart from the multiples of p themselves. This is a very special property that very few multiplicative functions have.

    • Alec Edgington Says:

      Sorry, yes, I wasn’t thinking straight!

    • gowers Says:

      Incidentally, the merest glance at Thomas’s table of the first 1124 sequence makes it clear that something is going on mod 24 at least.

    • Thomas Sauvaget Says:

      I’ve just computed the pair averages x_jx_{j+p} for the first 1124 sequence and p=2,3,5. Unless I’ve made a mistake in my code, it doesn’t quite behave as expected, a plot is here http://thomas1111.files.wordpress.com/2010/01/avg1124pairs_2_3_5.png In words: at place 1120 (beyond which p=5 doesn’t make sense) I find for p=2 avg=-0.2392 (while 1-1/2=0.5), for p=3 avg=0.2125 (while 1-1/3=0.666), for p=5 avg=-0.0955 (while 1-1/5=0.8).

      (By the way, this table you mentionned above on google docs was actually produced by Jason.)

    • gowers Says:

      I’m not sure what is “expected” here. I would think that for most p one wouldn’t notice anything interesting, but for some p the correlation might be quite large. From the table, it seems that it’s also worth looking at x_jx_{j+k} for composite k, and perhaps looking at how the average changes as you look at more and more terms.

      But even with all those modifications, it’s not quite clear whether one should expect anything for the general (as opposed to multiplicative) examples.

    • Thomas Sauvaget Says:

      Here is a plot adressing the second part of your question: partial sums of x_jx_{j+k} for small k, again for the first 1124 sequence http://thomas1111.files.wordpress.com/2010/01/avg1124pairs_partsum.png (although I’m not sure I’ve interpreted well what you meant, these are not plots along HAPs, but really the running value of the PS_k(m):=\sum_{j=1}^mx_jx_{j+k} as a funcion of m.)

    • gowers Says:

      Already some very intriguing information there. For example, why do the graphs split into three sets of two very similar curves — 2 and 4, 3 and 6, 5 and 7? To guess the pattern here I’d want to see these curves for a few more values of k, which would be great as it seems as though there is something going on here.

    • Thomas Sauvaget Says:

      Here’s a new plot for k up to 17 http://thomas1111.files.wordpress.com/2010/01/avg1124pairs_partsum_2_17.png and one up to 31 http://thomas1111.files.wordpress.com/2010/01/avg1124pairs_partsum_2_31.png
      There seems to be two main sets in fact.

    • gowers Says:

      Another thing that would be interesting, which I can’t quite read off your diagrams, is a list of which k give positive correlation and which negative. (I’m wondering whether this partition has any multiplicative structure.) I’m also interested to know whether there is any correlation when k=1.

      In general, if y_k is the correlation between x_n and x_{n+k}, then the sequence (y_k) will have bounded partial sums, by an averaging argument: if y_1+\dots+y_m > C, then the average value of x_{n+1}+x_{n+2}+\dots+x_{n+m} is greater than C. If we also had some kind of multiplicative structure, this would be a strong statement. (If the y_k were bounded below, it would be even stronger, but that may be too much to hope for.)

    • Thomas Sauvaget Says:

      I’ve just had a look: yes there is some multiplicative structure, namely all values of k which are a multiple of 3 are positively correlated, while the rest are all negatively correlated (in particular k=1 too).

      Some other thing I’ve noticed is that there are some exact identities between some of these partial sums at the final value 1124 (although since these curves cross before as n grows perhaps the final identities are more of an accident than anything else): denoting them by their indices I’ve seen for k\leq 39 that 5=11=35, 29=31, 15=39, 6=-8, 20=-12. The data and a new plot containing k=1 is here http://thomas1111.wordpress.com/2010/01/25/correlations-of-the-first-1124-sequence/

  20. gowers Says:

    Continuing from my previous comment the theme of oscillation, it seemed worth looking at Alec’s examples of particularly long low-discrepancy multiplicative sequences. In particular, I’ve calculated the partial sums of the maximal discrepancy-2 example that he presents. They go like this:

    0 1 0 -1 0 -1 0 -1 -2 -1 0 1 0 -1 0 1 2 1 0

    1 0 1 0 -1 0 1 2 1 0 1 0 1 0 -1 0 1 2 1

    0 1 2 1 0 -1 0 -1 0 -1 -2 -1 -2 -1 -2 -1 0 -1 0

    -1 -2 -1 0 1 0 -1 0 1 2 1 0 1 0 -1 -2 -1 0 -1

    0 -1 -2 -1 -2 -1 0 -1 0 1 2 1 0 -1 0 1 0 -1 0

    -1 0 -1 -2 -1 0 -1 -2 -1 0 -1 -2 -1 -2 -1 0 1 0 -1

    0 1 2 1 0 1 0 1 0 1 2 1 2 1 0 1 0 1 0

    -1 0 1 2 1 0 -1 0 1 2 1 2 1 0 -1 -2 -1 0 1

    0 -1 0 -1 0 1 0 -1 0 1 0 -1 -2 -1 0 -1 -2 -1 -2

    -1 -2 -1 0 -1 0 -1 0 1 0 1 0 -1 0 1 2 1 0 1

    2 1 0 -1 0 -1 0 -1 -2 -1 -2 -1 0 -1 0 1 0 -1 -2

    -1 0 -1 0 1 0 1 2 1 0 -1 -2 -1 -2 -1 0 1 2 1

    0 -1 -2 -1 -2 -1 0 1 2 1 0 -1 0 1 0 -1 0 -1 -2

    Following Alec, I’ve put them in groups of 19 (because 247=13×19) and started at zero.

    Here for comparison are the partial sums of the base-3 character-like function that’s -1 at 3. These I’ve put in groups of 27 and I’ve gone up to 242 (=3^5-1).

    0 1 0 -1 0 -1 0 1 0 1 2 1 0 1 0 1 2 1 0 1 0 -1 0 -1 0 1 0

    -1 0 -1 -2 -1 -2 -1 0 -1 0 1 0 -1 0 -1 0 1 0 -1 0 -1 -2 -1 -2 -1 0 -1

    0 1 0 -1 0 -1 0 1 0 1 2 1 0 1 0 1 2 1 0 1 0 -1 0 -1 0 1 0

    1 2 1 0 1 0 1 2 1 2 3 2 1 2 1 2 3 2 1 2 1 0 1 0 1 2 1

    0 1 0 -1 0 -1 0 1 0 1 2 1 0 1 0 1 2 1 0 1 0 -1 0 -1 0 1 0

    1 2 1 0 1 0 1 2 1 2 3 2 1 2 1 2 3 2 1 2 1 0 1 0 1 2 1

    0 1 0 -1 0 -1 0 1 0 1 2 1 0 1 0 1 2 1 0 1 0 -1 0 -1 0 1 0

    -1 0 -1 -2 -1 -2 -1 0 -1 0 1 0 -1 0 -1 0 1 0 -1 0 -1 -2 -1 -2 -1 0 -1

    0 1 0 -1 0 -1 0 1 0 1 2 1 0 1 0 1 2 1 0 1 0 -1 0 -1 0 1 0

    I haven’t done any calculations, but just looking at these sequences I’m pretty sure that the oscillations of the second are a lot smaller, even though the total drift is 3-(-2)=5 instead of 2-(-2)=4.

    If someone felt like producing plots of these sequences, it would be very nice. I’m sure there is plenty of interesting information that would jump out of a visual representation.

    • Thomas Sauvaget Says:

      Here is a plot http://thomas1111.files.wordpress.com/2010/01/seqvchar31.png The one noticeable thing is that the 246 sequence does have switches from 2 to -2 (i.e. 4 minuses in a row) while the character-like has more localised oscillations. Or is there more to see?

    • gowers Says:

      That is the kind of thing I’m interested in. Would it be possible to stretch the graph out horizontally and reduce it vertically, and present it over several rows? I think it would be easier to see what was going on.

      Here, by the way, is how the I drift of Alec’s 246-length multiplicative sequence behaves as I runs through all intervals of nine integers.

      3 3 4 3 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3

      3 4 3 3 3 3 3 4 4 4 3 3 3 2 2 2 2 2 2

      2 3 4 4 4 3 3 3 3 3 3 2 2 2 3 3 2 2 2

      2 2 2 2 2 2 2 3 3 3 3 3 3 4 3 3 3 3 3

      2 2 2 2 2 2 2 2 2 2 2 3 3 2 3 3 3 3 3

      3 3 3 3 3 3 3 4 4 4 4 4 3 3 3 3 2 2 2

      2 2 2 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2

      3 3 2 2 2 2 2 3 3 3 3 3 3 2 3 3 3 3 3

      4 3 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3

      3 3 3 4 4 4 4 4 3 4 4 4 4 4 4 4 4 4 3

      4 4 4 4 4 3 3 3 3 2 3

      The nth term in this sequence is the difference between the largest and smallest partial sum between n and n+8 (inclusive, and starting with n=0).

      And here is the same process applied to the base-3 sequence with 3 going to -1.

      2 2 3 3 3 3 2 2 2 2 2 2 2 3 3 3 3 2 2 2 2 2 3 3 3 3 2

      2 2 3 3 3 3 2 2 2 2 2 2 2 3 3 3 3 2 2 2 3 3 3 3 2 2 2

      2 2 3 3 3 3 2 2 2 2 2 2 2 3 3 3 3 2 2 2 3 3 3 3 2 2 2

      2 2 3 3 3 3 2 2 2 2 2 2 2 3 3 3 3 2 2 2 2 2 3 3 3 3 2

      2 2 3 3 3 3 2 2 2 2 2 2 2 3 3 3 3 2 2 2 3 3 3 3 2 2 2

      2 2 3 3 3 3 2 2 2 2 2 2 2 3 3 3 3 2 2 2 2 2 3 3 3 3 2

      2 2 3 3 3 3 2 2 2 2 2 2 2 3 3 3 3 2 2 2 2 2 3 3 3 3 2

      2 2 3 3 3 3 2 2 2 2 2 2 2 3 3 3 3 2 2 2 3 3 3 3 2 2 2

      2 2 3 3 3 3 2 2 2 2 2 2 2 3 3 3 3 2 2

      I find it quite interesting that the first sequence is sometimes very good, with long runs of 2s, but gets worse towards the end, just as one might expect if the adjustments to a character-like function are making the function more and more difficult to keep under control.

    • Thomas Sauvaget Says:

      I’ve done another plot streching things according to your comment. It seems that no two rows for the 246 sequence are alike: http://thomas1111.files.wordpress.com/2010/01/seqvchar3better2.png (click to see in normal size)

    • gowers Says:

      One final request would be to see separate full graphs, one above the other, for the two sequences (that is, not superimposed). I ask that for the minor reason that it is a little bit hard to see the shape of the red curve when it keeps disappearing behind the green one.

    • Thomas Sauvaget Says:

      Here they are http://thomas1111.files.wordpress.com/2010/01/seqvchar3better2c.png

  21. Sune Kristian Jakobsen Says:

    I had an idea on how to prove that multiplicative sequences with bounded discrepancy doesn’t exist, but I found out that (for a trivial reason) it probably couldn’t work. It might inspire someone, so I will post it anyway.

    Assume that we have a multiplicative sequence with bounded discrepancy. Is seems reasonable to assume that the set of terms where the partial sum is at its maximum has positive density. We know that there must be a + at this term and a – after it. Can we use this for anything? Probably not, since we already know that the set of +’s followed by a – must have positive density (perhaps this was a bit too trivial to be posted?).

    However, I still think it might be interesting to look at the density of the set of terms where the partial sum is S. If the sequence has bounded discrepancy, we would probably expect the density to be positive for all possible partial sums S, and given the densities we would be able to calculate the chance of a + at a term given the partial sum up to that term.

    • gowers Says:

      Your comment has provoked the following idea, which I’ll put down here in rather brief form and expand on it some other time if it doesn’t quietly die first. The idea is for another parameter that might be useful, connected with entropy.

      Suppose we have a multiplicative sequence. If it is character-like, then it puts big restrictions on the possible subsequences of various lengths. I haven’t done the calculations, but I think the number of distinct subsequences of length k (by which I mean restrictions to intervals of length k) grows very modestly as a function of k — perhaps even polynomially but as I say I’d need to do the calculation.

      Perhaps it is possible to prove that a multiplicative sequence of bounded discrepancy cannot be significantly longer than a character-like example without the number of subsequences growing much faster, to the point that we’re eventually forced to have long strings of 1s or -1s.

      This is a very vague idea at the moment. I think the point would be to prove that either a sequence has very few distinct restrictions to intervals of length k, in which case it is forced to be character-like, in which case we are done, or it has lots, in which case we are done for a different reason. But this could turn out to fail miserably.

    • Sune Kristian Jakobsen Says:

      For a multiplicative sequence, what can we say about the partial sums at the HAPs? That is, for a fixed d I want to know something about \sum_{k=1}^{nd}x_k (not \sum_{k=1}^{nd}x_{kd} since this is the same as the partial sum, perhaps with the sign changed). In particular, if d is fixed and odd (if it is even all the partial sums will be even), do these partial sums have the same distribution as in the original sequence?

    • Alec Edgington Says:

      This suggests yet another generalization of the problem that may be interesting to think about. What if we are only allowed to build our sequence from some small set of length-k blocks?

      For example, if we have to build it out of the blocks


      then exhaustive search shows that the longest sequence of discrepancy 2 has length 402 (composed of 134 such blocks).

    • gowers Says:

      That’s interesting in the light of the following observation. The sequences you get if you use those two blocks are all sequences you can obtain by taking a sequence S and expanding it by a factor of 3 by inserting + – between any two terms (and starting with + -). What properties must S have for this expansion procedure to work? Well, obviously it needs to have discrepancy at most 2, and if it does then that takes care of all HAPs with common difference a multiple of 3.

      Now let’s consider HAPs with common difference d, where d is congruent to 1 mod 3. The sequences themselves will go + – ? + – ? + – ?, where the question marks denote the values at the HAP with common difference 3d. For this HAP to work, we need that the sum along an HAP with common difference 3d should never equal 2. With this extra condition we’ll be OK.

      A similar argument shows that if d is congruent to 2, then the sum along an HAP with common difference 3d should never equal -2.

      Returning to S, this says that S must have discrepancy 2, but also that for HAPs with common difference equal to 1 mod 3 we need the partial sums between -2 and 1, and for HAPs with common difference equal to 2 mod 3 we need them to be between -1 and 2.

      In a small way, this provides more evidence that low-discrepancy sequences are forced to become more “chaotic” as they get longer (since if they don’t then you get stronger discrepancy conditions such as the one just described).

  22. Sune Kristian Jakobsen Says:

    “From a theoretical point of view, it may be better to define a generalized character-like function to be any completely multiplicative function with the following properties: there exists p such that the value at any non-multiple n of p is λp(n); the sequence (x_{pn})_{n=1}^\infty is a generalized character-like function. In other words, instead of going up by a factor of p every time, one goes up by different primes at each stage.” – the wiki
    I think the sequence has to use the same prime at each stage in order to be multiplicative.

  23. Terence Tao Says:

    I finally managed to eliminate sequences of drift 2:


    The argument is quite combinatorial (with an initial use of passing to subsequences to improve things a little bit). It was lengthier than I anticipated. It relies quite a bit on the constrictive nature of drift 2; it would require quite a bit of ingenuity to push it up even to drift 3. Still, perhaps it may trigger some ideas in someone else…

  24. gowers Says:

    This is a continuation of a sequence of comments, of which this was the previous one, on the subject of trying to prove that multiplicative functions have unbounded partial sums by looking at a weighted average of the drifts at different scales.

    The main gap in what I have written up to now is a proper account of how this parameter might be used in a proof. This comment aims to fill that gap. I don’t mean that I will actually use the parameter or actually prove anything, but I will argue that the parameter is set up in such a way as to make it rather plausible that it could be used in an inductive argument.

    Recall that we have up to now discussed two potential strategies for making longer sequences out of shorter ones. The obvious strategy is simply to take the shorter sequence and try to extend it. But another way (more “quotient-like” than “subspace-like”) is to expand the original sequence by some factor and attempt to fill in the gaps. That is, given a sequence (x_n), one defines y_{kn} to be x_n and then tries to choose values of y_r when r is not a multiple of k. An easy observation is that this can be used to multiply the length of a sequence by about 9 while adding only 1 to its discrepancy.

    Here, then, is something we could hope for. Note that if we have a multiplicative sequence (x_n), then the sequence (x_{3n}) is either the same sequence or minus that sequence. So we can regard (x_n) as (x_{3n}) with the gaps filled in. Now it seems to be hard to fill in the gaps without increasing drifts a certain amount. What we don’t seem to be able to do is show that we are forced to increase the discrepancy (given the existence of the sequence of length 246 and discrepancy 2), but what we do know is that the drift of the (x_{3n}) scale puts pressure on the (x_n)-sequence at three times that scale, and we now have a new contribution at the bottom that should often have the effect of increasing drifts in intervals. (I’m imagining here a much more sophisticated version of the kind of easy observation I was making here.)

    So I think that it is at least vaguely believable that one could prove that if you expand a sequence by a factor of at least t (for some fixed t) then you have to increase the average-oscillation parameter by at least 1, which would imply a logarithmic lower bound for multiplicative functions.

    • Sune Kristian Jakobsen Says:

      If (x_n) is a (multiplicative) \{1,-1\} sequence with discrepancy at most C, and we turn every p’th term into a 0, we get a (if p is prime: multiplicative) sequences with discrepancy a most 2C (using the principle of inclusion-exclusion we can even show that we could do this for any finite number of HAPs). In particular it would be interesting to know if the +,-,0,+,-,0,… sequences is the only multiplicative sequence with x_n=0 for n divisible by 3 and x_n\in \{-1,1\} otherwise with bounded discrepancy.

    • Sune Kristian Jakobsen Says:

      “using the principle of inclusion-exclusion we can even show that we could do this for any finite number of HAPs”
      … and still have bounded discrepancy. It wouldn’t be bounded by 2C, but by 2^nC where n is the number of HAPs.

    • gowers Says:

      “In particular it would be interesting to know if the +,-,0,+,-,0,… sequence is the only multiplicative sequence with x_n=0 for n divisible by 3 and x_n\in\{-1,1\} otherwise with bounded discrepancy.”

      That is a very nice question, and one that just begs to be tested experimentally. What is the longest multiplicative sequence anyone can find that satisfies these conditions, has discrepancy 2, and begins + – 0 + – 0 – – 0, say?

      Maybe I can do that one by hand. Filling in the forced values up to 24, I get

      + – 0 + – 0 – – 0 + ? 0 ? + 0 + ? 0 ? – 0 ? ? 0

      No, there seems to be far too much flexibility there. So does anyone feel like looking into this with a computer search?

    • Sune Kristian Jakobsen Says:

      Let (x_n) be a multiplicative sequence and let (y_n) be the same sequence with every p’th term turned into a 0. Now we have:


      This is how we prove that Walters example is unbounded (with p=3). Perhaps this could be used for a general multiplicative sequence? It would explain why the discrepancy is logarithmic, if it is logarithmic. And if the discrepancy is sublogaritmic, this approach might still work, by letting p depend on C (that is: We want to prove that no multiplicative sequence is bounded by C. Now let p=…).

      A toy problem might be to try to use this idea with p\neq 3 to prove that Walters example has unbounded discrepancy.

    • Sune Kristian Jakobsen Says:

      Of course the formula should be
      (with x_p instead of x_3)

    • Sune Kristian Jakobsen Says:

      “A toy problem might be to try to use this idea with p\neq 3 to prove that Walters example has unbounded discrepancy.”
      That didn’t make sense: If p is a power of 3 it is trivial, if not (y_n) doesn’t have bounded discrepancy.

  25. Terence Tao Says:

    I managed to show (in relation to Jason’s problem) that there is no sequence of bounded discrepancy and of lower discrepancy 1, thus there must be a partial sum that dips below -2. The writeup is at


    The main trick (which I first found by working with the ergodic theory formulation of the problem) is to observe that a bounded discrepancy sequence must have “mean zero” in a suitable sense, by averaging the bound |f(x)+\ldots+f(nx)| \leq C in x and then setting n to infinity. Thus the set E of x for which f(x)=+1 has density 1/2.

    On the other hand, with lower discrepancy 1, f(x) and f(2x) can’t both be -1, thus E and 2E cover the whole space. Since E and 2E both have density 1/2 (using the right sort of notion of density, and working on the rationals rather than integers) we conclude that E and 2E are almost disjoint. Passing to a subsequence one can make E and 2E genuinely disjoint, so f(2x) = -f(x). This allows one to flip lower discrepancy 1 to upper discrepancy 1 as well, and now it is easy to eliminate the sequence (e.g. one can use the non-existence of drift 2 sequences, though it is easier than this.)

    • gowers Says:

      If I understand correctly, that is also the main result of Mathias’s paper. But it looks as though your proof is nicer and may be more amenable to generalization.

    • obryant Says:

      Maybe I’m missing something, but doesn’t working over the positive integers make “density 1/2” part of the argument trivial: Let E be the set of integers x in [1,n] with f(x)=+1$. If S:=f(1)+…+f(n) is bounded independent of n, then S/n \to 0 with n. But S = (+1) |E| + (-1)(n-|E|)= 2|E|-n, so |E| \sim n/2.

    • gowers Says:

      Maybe I’m missing something, but I thought that was Terry’s argument! (Or at least, equivalent to it.)

    • gowers Says:

      Ah, I see now. The point is that 2E has density 1/4 over the integers, but if you pass to the rationals then it becomes “equivalent” to E, so to speak. But for that you have to be a bit careful about what you mean by density.

      But an alternative argument would be to say that E has density 1/2 inside the even numbers, E and 2E are virtually disjoint, and 2E has density 1/2 inside the even integers (because it has density 1/4 and consists just of even integers). Therefore, after passing to a suitable limit to get exact disjointness, if x is any even number, we have f(2x)=-f(x). At least, I think, without really checking, that this should work if you want to stay in the integers.

    • Terence Tao Says:

      One way to think about the ergodic theory formulation is that it rigorously defines for you the notion of a “random” x. (Strictly speaking, this x doesn’t live in either the integers or the rationals, but instead lives in a measure space upon which the rationals act by multiplication, but this is not terribly important for us, and we can imagine that x is rational for sake of discussion.) Furthermore, this notion of a random x is shift-invariant in the sense that qx has the same distribution as x for all fixed rationals q, thus for instance 2x as the same distribution as x; more generally, the joint distribution of a_1 x, \ldots, a_k x is the same as that of qa_1 x, \ldots, qa_k x for any rationals a_1,\ldots,a_k,q. Thus for instance

      Prob( f(x) = 1 ) = Prob( f(2x) = 1 )

      and the averaging argument also gives

      Prob(f(x)=1) = 1/2

      but as f(x) and f(2x) can’t both be -1, we get f(2x) = -f(x) almost surely. But we can always remove probability zero events from the space and get f(2x)=-f(x) everywhere.

    • gowers Says:

      I normally prefer finitary arguments, but I found your recent and very even-handed blog post about the merits and demerits of finitary and infinitary arguments (and in particular the use of nonstandard analysis) very interesting, and right from the start this problem has seemed like one where infinitary methods would be appropriate. And the argument you give above is a rather good illustration of the advantages. It seems clear to me that in this instance one can fairly easily translate the argument into a finitary one. However, the concept of a random x that does everything one wants it to do is clearly an extremely useful one, since it enables one to think about the actual ideas of the problem without being distracted by error estimates. In other words, even if one ends up presenting a finitary argument (and it is debatable whether one would want to) the infinitary view seems to be the right view when one is thinking about the question.

      I have to go, but when I get back I want to apply this remark to the parameter I have been thinking about — it seems that it can be cleaned up in a nice way.

  26. Sune Kristian Jakobsen Says:

    Earlier I asked what we could say about a multiplicative sequence with bounded discrepancy and with x_n=0 if n is divisible by 3 and x_n\in\{-1,+1\} otherwise. In particular it would be interesting to know if it had to look like the character. Inspired by a comment by Gowers and a comment by Tao I tried to do some calculation (I’m not sure if they can be justified):

    First assume that we can talk about the density of 1s along the AP 1,4,7,… Let call this density p. Now we would expect that if we take two numbers =1 (mod 3) the term at the product would be a 1 with probability p^2+(1-p)^2, and (at least if we worked with rationals) this would give a random number =1 (mod 3). The equation p=p^2+(1-p)^2 have two solutions p=1 and p=1/2. This suggest that either the sequence is almost identical to the character, or it is not at all correlated with the character. If this argument works, I think it generalize to all other primes.

    Of course this doesn’t prove that there exist such sequence not correlated with the character. I’m not sure if we should expect that such sequences exists.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: