EDP5 — another very brief summary

I wasn’t expecting to have to write another post quite this soon, so this is another one where I don’t have much to say. Here are a few bits and bobs from the last lot of comments, but there’s plenty more in the comments themselves, and actually I think that it’s not that hard to browse through them now that we have depth-1 threading and quite a lot of the comments are very short.

Johan de Jong came up with a very interesting variant of the problem, in which \mathbb{N} is replaced by the space of polynomials over \mathbb{F}_2. I confess to being a little sad when the problem was solved negatively soon afterwards, as it had looked as though it might be a rather good model problem. However, the solution was nice.

One of the general aims at the moment is to try to show that a multiplicative function of bounded discrepancy must have some kind of character-like behaviour. Terence Tao has come up with an intriguing argument that shows not that but at least something in the right ball park.

Work has continued on a human proof that completely multiplicative sequences must have discrepancy greater than 2. It looks as though the proof will be complete before too long, and not too painful. Some nice tricks of Jason Dyer have come in helpful in reducing the amount of case analysis that is needed.

106 Responses to “EDP5 — another very brief summary”

  1. gowers Says:

    Here is a possibly rather pointless-looking observation. I’ll explain in a later comment why I stumbled on it.

    Suppose we build a \pm 1 sequence as follows. It is somewhat like the way one builds the character-like sequence \mu_3, but there is a twist.

    The first step is to write 1 * -1. That is, we are saying that f(1)=1, f(3)=-1 and we’re not yet sure about f(2). Having done that, we say that this decides the values whenever n is congruent to 1 or 3 mod 3. That is, our sequence will be 1 * -1 1 * -1 1 * -1 … whatever happens.

    Now we fill in two thirds of the gaps by taking minus the sequence we have so far. That is, we take the sequence

    1 -1 -1 1 * -1 1 1 -1 1 -1 -1 1 * -1 1 1 -1 …

    Next, we fill in two thirds of the remaining gaps by using the original sequence again, getting

    1 -1 -1 1 1 -1 1 1 -1 1 -1 -1 1 * -1 1 1 -1 …

    and so on.

    This sequence is not multiplicative and it does not have small discrepancy (for example it is always -1 at multiples of 3). However, it does have partial sums that grow only logarithmically. One proof of that is an obvious inductive argument. Another is to “observe” that this sequence is precisely the sequence \mu_3(1),\mu_3(3),\mu_3(5),\dots, and we know that the partial sums of \mu_3 grow logarithmically along the integers and also along the even integers. (The word “observe” was in inverted commas because in fact I took the odd terms of \mu_3 and worked out this alternative way of generating the sequence.)

    Since these are not very difficult properties to obtain, I’d better explain in a new comment why I looked at this sequence.

  2. gowers Says:

    I had an idea in this comment here and it is not letting go of me. It’s the usual story: it doesn’t work in a completely obvious way, but neither have I found some counterexample that would crush a lemma I needed, or something like that. In this comment I’ll describe the basic thought, and in subsequent comments I’ll try to explain where I’m at (which is not far) with the details.

    The thought is this. Let f be a completely multiplicative function, and for simplicity suppose that f takes values in \{-1,1\}. We want to get a handle on the mean-square partial sums of f. That is, we want to bound from below the L_2 norms of vectors of the form (F(0),F(1),F(2),F(3),\dots,F(N)), where each F(n) is defined to be the partial sum f(1)+f(2)+\dots+f(n). (I take F(0) to be 0. Putting it in makes certain expressions tidier later.)

    Now I want to write the vector (F(1),F(2),\dots,F(N)) as a sum of other vectors. Let us define G(n) to be \sum_{2m-1\leq n}f(2m-1). In other words, G(n) is the partial sum of the odd terms of the sequence up to n. Then the vector (G(0),G(1),G(2),\dots,G(N)) is equal to

    \displaystyle (0,f(1),f(1),f(1)+f(3),f(1)+f(3),f(1)+f(3)+f(5),\dots)

    and F-G is equal to

    \displaystyle (0,0,f(2),f(2),f(2)+f(4),f(2)+f(4),f(2)+f(4)+f(6),\dots),

    which in turn equals

    \displaystyle f(2)(0,0,f(1),f(1),f(1)+f(2),f(1)+f(2),f(1)+f(2)+f(3),\dots),

    which is f(2) times a “horizontal stretch” of F by a factor of 2.

    We can therefore repeat the trick with the stretched version of F, decomposing it into a stretched version of G (multiplied by f(2)) and an even more stretched version of F. If we allowed ourselves to extend the sequences to infinity, and if we defined \sigma_2 to be the map that takes a sequence (x_1,x_2,x_3,\dots) to the sequence (x_1,x_1,x_2,x_2,x_3,x_3,\dots), then we would be able to say that F=G+f(2)\sigma_2G+(f(2)\sigma_2)^2G+\dots. And indeed, this is true pointwise since the sequence on the right hand side is eventually zero at every n (because of the stretching of the initial 0).

    The thought at the back of my mind is that the more you stretch G, the more you kill off high frequencies, so if r is quite a lot less than s then \sigma_2^rG shouldn’t correlate much with \sigma_2^sG. (There are all sorts of problems with this idea, which I’ll come to, but let me just give the idea.)

    Now if there were no correlation at all between the functions \sigma_2^rG then we would be done. That is because for parity reasons at least half the values of G have absolute value at least 1, so the L_2 norm of G is at least 1/\sqrt{2}, and therefore the L_2 norm of the sum of k of the stretches \sigma_2^rG is at least \sqrt{k/2}.

    While writing this, I did finally think of a “crushing” example. I’ll explain it in my next comment. I don’t yet know how damaging it is.

  3. gowers Says:

    The key to making an approach like the above work would of course be proving that the functions (f(2)\sigma_2)^rG did not correlate sufficiently negatively for the sum to be small in L_2. However, while I was writing the previous comment, it occurred to me to wonder where I was using the condition that the sequence takes values of modulus at least 1. Indeed, suppose we try out the argument on the “proto-Walters sequence” 1 -1 0 1 -1 0 …, which has bounded discrepancy. We know that the argument must fail somehow, but what actually happens?

    Well, in this case,

    \displaystyle G=(0,1,1,1,1,0,0,1,1,1,1,0,0,1,1,1,1,0,0,\dots),

    so it is clear that the functions \sigma_2^rG correlate with each other a lot. Therefore, once we multiply them by f(2)^r=(-1)^r, we divide them into two parts, one very positive and the other very negative. So there is no particular reason to stop the cancelling that happens.

    That’s not quite the same as explaining why the cancellation does happen, which I don’t fully understand yet.

    At this stage, I hope it’s clear what my motivation was for looking at odd values of \mu_3. There is a much nicer decomposition of the partial-sums vector of \mu_3 based on the number 3 rather than 2, but I wanted to see what happened if one split it up the “wrong” way, because if we have to decide what the “right” splitting is, then we’re back to the difficult problem of trying to classify multiplicative functions.

    One final remark. If we know that the odd-partial-sums vector G averages almost zero on all long intervals, then we should be able to get small correlation just because the inner product of a function that averages zero on long intervals with a function that is constant on long intervals (apart from the very occasional step) will be small. So there seems to be a chance of proving a curious result of the following kind. We are given a multiplicative function f. Then either its partial sums are unbounded, or the odd partial sums must have significant bias, in the sense that they do things like staying positive (on average) for long chunks of time. I haven’t worked out the argument in enough detail to know what the precise statement should be,
    but it seems to be saying something like that you either have unbounded discrepancy or you have unbounded “double discrepancy”. (By this I mean that discrepancy is what you get when you integrate once, and double discrepancy what you get when you integrate twice — I’m glossing over the fact that you take just odd values.)

  4. obryant Says:

    An automatic sequence is defined like this: let A={a_1,…,a_r} be a finite alphabet, and extend any function “from letters” to “from words” by f(w_1w_2\cdots w_n) = f(w_1)f(w_2)\cdots f(w_n). Now if the length of f(a_i) is k (the image of each letter has the same length), and the first letter of f(a_1) is a_1, then the sequence of words

    f(a_1),f(f(a_1)), f(f(f(a_1)))), …

    converges in the sense that each word in the sequence is a prefix of the next. Let W be the infinite limit word (with index starting at 0, say). A proper coding is a map from A to the complex unit circle. An automatic sequence is a proper coding of W.

    As far as I can tell, all of our examples of logarithmic growth are automatic (up to a small number of changes). Borwein, Choi, and Coons ask at the end of their paper whether automatic sequences with \sum_{n<x} f(n) = o(x) have logarithmic growth.

    Two things of note: any HAP of an automatic sequence is an automatic sequence. A sequence x_0,x_1,… is k-automatic (k is the length of f(a_i)) if and only if the set of infinite sequences

    \{ (x_{k^\ell n + r})_{n\ge 0} \colon \ell\geq 0, 0 \leq r < k^\ell\}

    is finite.

    Showing EDP for automatic sequences may be a nice toy problem.

    • gowers Says:

      Something I don’t quite understand about the question as Borwein, Choi and Coons ask it is that they seem to suggest that the growth should be logarithmic rather than at most logarithmic. Or does their approximate equality symbol mean that the ratio of the two sides tends to a limit that is allowed to be zero? I suppose that must be what they mean, given that the partial sums of the Morse sequence are bounded. Anyhow, that’s a nice question I agree.

      I’d have thought showing EDP for automatic sequences should be a reasonably straightforward generalization of the proof for the Morse sequence, but that remark is made without having tried it so perhaps there is a difficulty that would take me by surprise if I did.

    • Sune Kristian Jakobsen Says:

      Why isn’t the constant sequence 1,1,1,… automatic?

    • gowers Says:

      The point is that one adds the condition that the partial sum up to n is o(n).

  5. gowers Says:

    Here’s a question that might be worth investigating. I think it is probably too simple to lead to a serious advance, but giving a good answer to it might clarify one or two things.

    Recall from this comment that if F is the partial-sums sequence and G is the odd-partial-sums sequence, then F=G+\tau_2G+\tau_2^2G+\dots, where \tau_2=f(2)\sigma_2. This implies (and was derived from the fact) that F=G+\tau_2F. This gives us an easy way of deriving F from G.

    For example, if we take the troublesome sequence 1 -1 0 1 -1 0 …, then G works out to be 011110011110011110…, as I mentioned above. So now we are in a position to write out the first term in F, which is 0 (because of our convention that the partial-sums sequence always starts at 0). Therefore, \tau_2F starts 00. Adding this to G tells us that F starts 01. Since f(2)=-1, that tells us that \tau_2F starts 0 0 -1 -1. Adding this to G tells us that F starts 0 1 0 0, so \tau_2F starts 0 0 -1 -1 0 0 0 0, so F starts 0 1 0 0 1 0 0 1.

    We can continue this process and prove quite easily by induction that F is 0 1 0 0 1 0 0 1 0 0 1 0 \dots (which, reassuringly, is the correct answer).

    Note that what we are doing here is very like calculating a substitution sequence, except that at each stage we add G. It’s a bit like an “inhomogeneous” substitution sequence.

    The question that interests me is this. We want to prove that F is unbounded. Is there some nice property of G that is necessary and sufficient for this? And is it conceivable that we could prove that this property cannot occur for multiplicative functions with values of modulus 1?

    Here’s a sequence G of some potential interest: we start with 0, then have a 1, then two -1s, then four 1s, then eight -1s, and so on. What is the resulting F if F=G+\sigma_2F? I get 0 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 \dots. In other words, I get (G+1)/2. This shows that G can have quite long stretches of bias and F will remain bounded.

    For comparison, if G is itself the sequence 0 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 \dots, then F works out as 0 1 1 1 2 2 2 2 2 2 2 2  2 2 2 2 3 3 3 3 \dots and looks to me as though it grows logarithmically.

    Putting this together, it seems that G can afford to have oscillations of logarithmic speed as long as they average zero. But so far I’m saying that just on the basis of a quick look at these examples.

    • Terence Tao Says:

      Tim, can you remind me what the definition of the sequences \mu_d, \sigma_d, \lambda_d, etc. are? Or is this on the wiki?

    • gowers Says:

      Terry, \lambda_p (for a prime p) is defined by the conditions that \lambda(n)=\left(\frac np\right) if n is not a multiple of p, \lambda(p)=1, and \lambda is multiplicative. \mu_p is the same but \mu(p)=-1. I defined \sigma_d to be the map that takes a sequence and repeats each term d times, so it’s an entirely different sort of object. Perhaps S_d would be a better choice of notation, and it’s probably not too late to change it.

      Going back to my comment above, I now see that there’s an easy expression for F(n) in terms of G if one goes back to the original equation. If I’m not much mistaken, it is G(n)+G(\lfloor n/2\rfloor)+G(\lfloor n/4\rfloor)+\dots. Turning that around, it says that F will be bounded if and only if G(a_1)+G(a_2)+G(a_3)+\dots+G(a_k) is (uniformly) bounded for all sequences such that a_1=1 and a_{i+1}\in\{2a_i,2a_i+1\}. This, it seems to me, is a rather strong requirement, since one can imagine an “adversary” always attempting to choose a_{i+1} in a way that will make the sum go away from zero, and unless G is pretty structured the adversary would seem to be able to win.

    • gowers Says:

      Sorry, that expression I gave was valid only when f(2)=1. If f(2)=-1 then we get

      \displaystyle F(n)=G(n)-G(\lfloor n/2\rfloor)+G(\lfloor n/4\rfloor)-\dots.

      One other small (but encouraging) observation: when f takes values in \pm 1, the values of G(2x) and G(2x+1) are always distinct. This makes life easier for the adversary (who is trying to make F unbounded, so is actually on our side).

    • gowers Says:

      Back on the proof-crushing side, here’s a thought that seems rather problematic. So far, all I’ve really used is that f(2x)=f(2)f(x), and we know that it’s easy to find sequences with that property and bounded partial sums (such as the Morse sequence).

  6. Terence Tao Says:

    I updated the code to make it take command line input, which should make it easier to use. For instance

    [program name] 2 -3

    would set f(2)=1 and f(3)=-1 and see what one can deduce from this. (I’m adopting the convention that f is odd, so that f(-n)=-f(n).

    It occurs to me that we should look at the long multiplicative sequences that have already been found and see whether they have a fixed value at given primes (e.g. it already seems that they need to be -1 at 2 and 5). If we see such a fixed value, one can presumably eliminate the complementary possibility easily enough.

    • gowers Says:

      Alec Edgington worked out all the sequences of length 246 (there are 500 of them) on his computer, and says that they agree on all primes up to 67. See this wiki page. His algorithm was (I think) a pure backtracking algorithm, so whether it’s safe to conclude that it should be easy to work out by hand what the function is up to 67 is not completely clear to me.

    • Alec Edgington Says:

      The tendency observed so far is for long low-discrepancy multiplicative sequences to send primes congruent to 2 or 3 mod 5 (and the prime 5) to -1, and others to +1. See for example this one on the wiki. So the contrary choices may be worth trying to eliminate first, though it may not be straightforward to do so (witness some of the numbers on this page showing how far one sometimes needs to search to eliminate large drifts).

  7. New thread for Polymath5 « Euclidean Ramsey Theory Says:

    […] New thread for Polymath5 By kristalcantwell There is a new thread for polymath5. One goal that is close to being reached that of deriving a counterexample that is completely multiplicative from any counterexample. Let me update this there is another thread here. […]

  8. Kristal Cantwell Says:

    If a completely multiplicative sequence with discrepancy 2 or less has length more than 246 then I think there is a human proof that f(2) and f(5) are -1. Given this I can show that if f(3) is 1 f(7) is -1.

    Assume that f(3) is 1 and f(7) is 1. We have by the above that f(2) and f(5) are -1. Then this will give the partial sum at 10 the value 2 so f(11) must be -1.

    Since f(12) is 1 the sum at 12 must be 2 and f(13) must be -1.

    Then f(25), f(26),f(27),f(28) and f(30) must be 1 this forces f(31) to be -1.

    Then f(62), f(63),f(64),f(65) and f(66) must be 1 and we have a contradiction.

    So if a completely multiplicative sequence with discrepancy 2 or less has length more than 246 and f(3)=1 then f(7)=-1.

    • Uwe Stroinski Says:

      Being new to the project I first say hello to everybody and then deal with the next case.

      In the context of Kristal’s post I assume f(2)=-1, f(5)=-1. For the sake of contradiction I assume furthermore f(3) = 1. Then Kristal showed that f(7) has to be -1.

      With these assumptions, the partial sum f[18,21]:=f(18)+f(19)+f(20)+f(21) equals -3+f(19). Since f[1,odd] is in {-1,0,1} it follows that f[18,21]>=-2 and thus f(19)=1.

      Similarly we get:

      f[168,171]=3+f(17) f(17)=-1,

      f[34,37]=3+f(37) f(37)=-1 and

      f[74,77]=3-f(11) f(11)=1.

      Since f(9)=f(10)=f(11)=f(12)=1 it follows that f(13)=-1.

      f[50,55]=-5+f(53)>=-2 which cannot be satisfied for f(53) in {-1,1}.

      Hence f(7) cannot be -1 and thus f(3)=-1.

      It remains to show (by hand) that there is no completely multiplicative sequence with discrepancy 2 and f(2)=f(3)=f(5)=-1.

    • Jason Dyer Says:

      I updated the wiki with the arguments so far (Terry’s, Kristal’s, and Uwe’s), but it needs a lot of polish.

    • Uwe Stroinski Says:

      This is a proof that there is no completely multiplicative sequence with discrepancy 2, f(2)=f(3)=f(5)=-1 and f(7)=1.

      I assume f(2)=-1, f(3)=-1 and f(5)=-1. Then

      f[242,245]=-3+f(61) implies f(61)=1.

      For the sake of contradiction I assume furthermore f(7) = 1. Then f[1,10]=2 and thus f(11)=-1.

      We have:

      f[18,21]=-3+f(19) implies f(19)=1,

      f[168,171]=3+f(17) implies f(17)=-1,

      f[22,25]=3+f(23) implies f(23)=-1,

      f[60,63]=3-f(31) implies f(31)=1,

      f[60,65]=3-f(13) implies f(13)=1,

      f[114,117]=3+f(29) implies f(29)=-1,

      f[184,187]=3-f(37) implies f(37)=1,

      f[558,561]=-3+f(43) implies f(43)=1,

      f[40,43]=3+f(41) implies f(41)=-1,

      f[50,55]=3+f(53) implies f(53)=-1,

      f[58,61]=3+f(59) implies f(59)=-1,

      f[92,95]=-3-f(47) implies f(47)=-1 and

      f[132,135]=3-f(67) implies f(67)=1.

      0: 1 -1 -1 1 -1 1 1 -1 1 1|2

      10: -1 -1 1 -1 1 1 -1 -1 1 -1|0

      20: -1 1 -1 1 1 -1 -1 1 -1 -1|-2

      30: 1 -1 1 1 -1 1 1 -1 -1 1|0

      40: -1 1 1 -1 -1 1 -1 -1 1 -1|-2

      50: 1 1 -1 1 1 -1 -1 1 -1 1|0

      60: 1 -1 1 1 -1 -1 1 -1 1 1|2

      70: f(71) -1 f(73) -1 -1 1 -1 1 f(79) -1|-1+f(71)+f(73)+f(79)

      Now f[1,70]=2 and thus f(71)=-1. We have

      f[72,75]=-3+f(73) implies f(73)=1,

      f[88,91]=3+f(89) implies f(89)=-1 and

      f[868,871]=3-f(79) implies f(79)=1.

      70: -1 -1 1 -1 -1 1 -1 1 1 -1|0

      80: 1 1 f(83) -1 1 -1 1 1 -1 1|3+f(83)

      90: 1 -1 -1 1 -1 1 f(97) -1 -1 1|2+f(83)+f(97)

      Now f[1,91]=4+f(83) is a contradiction and thus f(7)=-1.

      It remains to show (by hand) that there is no completely multiplicative sequence with discrepancy 2 and f(2)=f(3)=f(5)=f(7)=-1. If my computer is right, this is much easier.

    • Uwe Stroinski Says:

      This is a proof that there is no completely multiplicative sequence with discrepancy 2, f(2)=f(3)=f(5)=-1 and f(7)=-1 (which is the last case).

      I assume f(2)=-1, f(3)=-1 and f(5)=-1. Then

      f[242,245]=-3+f(61) implies f(61)=1.

      For the sake of contradiction I assume furthermore f(7) = -1.

      We have:

      f[14,17]=3+f(17) implies f(17)=-1 and

      f[34,37]=3+f(37) implies f(37)=-1.

      Sub-case 1: f(11)=-1.

      0: 1 -1 -1 1 -1 1 -1 -1 1 1|0

      10: -1 -1 f(13) 1 1 1 -1 -1 f(19) -1|-2+f(13)+f(19)

      Then f[1,12]=-2 and thus f(13)=1. Now

      f[54,57]=3-f(19) implies f(19)=1.

      10: -1 -1 1 1 1 1 -1 -1 1 -1|0

      20: 1 1 f(23) 1 1 -1 -1 -1 f(29) -1|f(23)+f(29)

      f[1,22]=2 implies f(23)=-1 and then f[1,25]=3 is a contradiction. Thus f(11)=1.

      Sub-case 2: f(11)=1.

      0: 1 -1 -1 1 -1 1 -1 -1 1 1|0

      10: 1 -1 f(13) 1 1 1 -1 -1 f(19) -1|f(13)+f(19)

      We have f(9)=f(10)=f(11)=f(14)=f(15)=f(16)=1 and f(12)=-1. This implies f(13)=-1.

      f[34,39]=3-f(19) implies f(19)=1,

      f[30,33]=-3+f(31) implies f(31)=1,

      f[28,33]=-3+f(29) implies f(29)=1 and

      f[242,247]=-3+f(41) implies f(41)=1.

      10: 1 -1 -1 1 1 1 -1 -1 1 -1|0

      20: 1 -1 f(23) 1 1 1 -1 -1 1 -1|1+f(23)

      30: 1 -1 -1 1 1 1 -1 -1 1 1|3+f(23)

      40: 1 -1 f(43) 1 -1 -f(23) f(47) -1 1 -1|2+f(43)+f(47)

      Since f[1,41]=4 +f(23) we get a contradiction and thus f(11) is not 1.

      Therefore we get a contradiction in the main case (the case f(7)=-1).

      Since f(7)=1 is treated in an earlier post, that completes the proof (by hand) that there is no completely multiplicative sequence with discrepancy 2.

    • Kristal Cantwell Says:

      This step f[558,561]=-3+f(43) implies f(43)=1 goes beyond the range that proves no sequence of discrepancy 2 has length more than 246 So we have no sequence of discrepancy 2 has length more than 560. We know the length is at most 246 from computer proofs. I think this can be fixed we can have a human proof of this as well.

    • gowers Says:

      I think Uwe is just trying to show that there is no infinite sequence of discrepancy 2. However, I am interested in the distinction between the maximum length you can get and the maximum length you can get before there is a fairly easy proof (with no backtracking) that you are doomed. To that end, I’d be interested in the most economical proof that there is no infinite multiplicative sequence of discrepancy 2, even if it doesn’t prove that there is no such sequence of length 247. I even think this is somehow a more fundamental question, which is not to say that they aren’t both interesting.

    • Kristal Cantwell Says:

      I have managed to fix the proof I mentioned earlier. So that it does not go beyond the bounds that prove 246.

      Let me recap the proof up to this point:

      “I assume f(2)=-1, f(3)=-1 and f(5)=-1. Then

      f[242,245]=-3+f(61) implies f(61)=1.

      For the sake of contradiction I assume furthermore f(7) = 1. Then f[1,10]=2 and

      thus f(11)=-1.

      We have:

      f[18,21]=-3+f(19) implies f(19)=1,

      f[168,171]=3+f(17) implies f(17)=-1,

      f[22,25]=3+f(23) implies f(23)=-1,

      f[60,63]=3-f(31) implies f(31)=1,

      f[60,65]=3-f(13) implies f(13)=1,

      f[114,117]=3+f(29) implies f(29)=-1,

      f[184,187]=3-f(37) implies f(37)=1,”

      from this we can get f(85,91) = 5 + f(89) -f(43) which implies that

      however further on there was this: f[868,871]=3-f(79) implies f(79)=1.

      So there remains things to do to reach 246 as the maximal length.

    • Uwe Stroinski Says:

      When I started I tried to maintain the 246 bound. As the proof got longer and longer I got more humble and set the bound to 1000.

      Of course, with enough patience this can be fixed by adding more cases or, as Kristal did when he fixed my last post, by considering longer partial sums. I tried to get away with partial sums of length 3 to keep checking managable since the proof was essentially generated by computer.

  9. Steven Heilman Says:

    Suppose (as suggested before) that we simplify to real valued multiplicative functions f. (The following comment comes mainly from a remark in [Balog, Granville, Soundararajan]). Then [Wirsing 1967] seems to say: first, assume a decay condition on f. That is, suppose \sum_{p\leq x}f(p)\frac{\log p}{p}\sim\tau\log x for some \tau>0. Then the quantity \lim_{x\to\infty} \frac{1}{x} \sum_{n\leq x}f(n) exists, and is given by his Equation (1.6) (by letting \lambda=1 identically, and then setting \lambda^{*}=f).

    Without looking at the (rather technical) proofs, I would be skeptical whether or not the decay condition could be removed (and still retain the same argument).

  10. gowers Says:

    Here’s an attempt to prove that what we are trying to prove is very hard. What I would really like is for somebody to shoot it down.

    Over at mathoverflow, I’ve been trying to see whether there is some intuitive reason for the fact that the partial sums of the Liouville function \lambda grow at least as fast as \sqrt{n} (give or take n^{o(1)}). I get the impression from the responses that there is not an elementary proof of this fact: the proofs all use properties of the Riemann zeta function and a bit of complex analysis to show that there must be a zero with real part at least 1/2.

    It seems pretty clear that this approach is not going to generalize to all multiplicative functions that are not character-like. For instance, it seems that the functional equation is important in the proof alluded to above, and that property of the Dirichlet series associated with a multiplicative function is a rather special one.

    But in that case, it looks as though showing that non-character-like multiplicative functions have partial sums that grow much faster than logarithmically is going to involve solving a known hard problem (that of finding an elementary proof of this fact for the Liouville function).

    Does anyone see a way out of this difficulty? The results of Granville and Soundararajan are an obvious place to look, given that they are generalizing various results to all multiplicative functions. But nobody has yet pointed out some killer result or technique that does what we want.

    If it does turn out to be hopeless to prove fast growth for non-character-like functions, then either the whole problem is hopeless or we must find some unified proof that gives at least logarithmic growth for all functions. I’d be very interested to know other people’s opinions on this.

    • Alec Edgington Says:

      Can one prove, without recourse to the functional equation, that the partial sums of the Liouville function are unbounded?

    • gowers Says:

      I too would be very interested in an answer to that. The feedback from mathoverflow is a little discouraging, but maybe this weaker question has not been thought about so much.

    • Terence Tao Says:

      Here’s an elementary way to show unbounded discrepancy of Liouville.

      Lemma: let g be a {-1,1}-valued completely multiplicative function of bounded discrepancy. Then the (conditionally convergent) sum L(1,g) := \sum_n g(n)/n is non-zero.

      Proof: Observe that the Dirichlet convolution g*1 is non-negative, and at least 1 on square numbers, so

      \sum_{n \leq x} g*1(n) / \sqrt{n} \geq \frac{1}{2} \log x + O(1).

      On the other hand, expanding the LHS using the Dirichlet hyperbola method and the bounded discrepancy hypothesis (which in particular implies that \sum_{n \leq x} g(n)/n = L(1,g) + O(1/x) by summation by parts), one can eventually write the left-hand side as L(1,g) \sqrt{x} + O(1), which leads to a contradiction for x large enough if L(1,g) vanishes. QED

      (This lemma is modeled on the standard elementary proof that L(1,\chi) \neq 0 for quadratic characters \chi.)

      In the case of Liouville, lambda*1 is exactly equal to 1 on the squares and nowhere else. So the above argument gives the asymptotic

      \frac{1}{2} \log x + O(1) = L(1,\lambda) x^{1/2}+O(1)

      which is impossible regardless of whether L(1,\lambda) is non-zero or not.

    • Terence Tao Says:

      To put it another way: the textbook elementary proof of Dirichlet’s theorem tells us that if g is {-1,+1}-valued, completely multiplicative, and has bounded discrepancy, then g(p)=+1 for infinitely many primes p (indeed \sum_p g(p)/p is conditionally convergent, and by working harder (and using complex analysis of course) one can eventually obtain the prime number theorem for g, \sum_{p \leq x} g(p) = o( x/\log x) and maybe even the classical error term). Of course, the Liouville function does not have this property, hence has unbounded discrepancy.

    • gowers Says:

      Many thanks for that, and I’m relieved that such a proof exists: as requested, you have destroyed my informal argument that what we are trying to prove might be out of reach. Here, if anyone is interested, is a link to an online article about Dirichlet’s hyperbola method, which not only tells you what the method is, but also gives an example and describes the circumstances where it is particularly useful. (In fact, it wouldn’t take much work to turn it into at Tricki article …)

  11. gowers Says:

    I’ve just had to take my car to be looked at and then walk back home, which was a perfect opportunity to think about multiplicative functions, but it has left me puzzled. I decided to think about how the partial sums of a random multiplicative function behave. (By the way, I am still using “multiplicative” as a lazy way of saying “completely multiplicative”, since I have no use for the weaker notion at the moment.) To that end, let f be a completely multiplicative function taking values in \{-1,1\}, where the values at primes are chosen randomly (=independently and uniformly).

    What is the expected partial sum up to N? Well, the expected value at any n is 1 if n is a perfect square and 0 otherwise, so we get roughly \sqrt{N} as the expected partial sum. So for rather trivial reasons we have that the expectation of the square of the partial sum is at least N (or rather the largest square less than N), and therefore, by linearity of expectation, that the expected mean-square partial sum up to N is at least N/2 or something like that.

    So far this looks quite reasonable: we seem to have that the mean-square partial sum of a random multiplicative function is rather like the mean-square partial sum of the Liouville function \lambda, confirming our heuristic that \lambda behaves randomly.

    But that is not the sense in which \lambda is random. Indeed, we choose the values of \lambda(p) in as unrandom a way as we can, and the randomness is in its additive properties. I’ll come back to this point.

    Another point is that all I have shown so far is a lower bound for the typical size of a partial sum of a random multiplicative function. What if we try to be a bit more careful and actually get an estimate? What, then, is the expected value of (f(1)+\dots+f(n))^2? To answer this we need to think about the expectation of f(r)f(s) for each r and s. And the answer is that it is 1 if rs is a perfect square and 0 otherwise, since f(r)f(s)=f(rs).

    Now the relation “rs is a perfect square” is an equivalence relation. To form an equivalence class, you take a square-free number t and take all numbers of the form m^2t. The size of this equivalence class will be about \sqrt{n/t}. Also, the square-free numbers are dense, so the number of equivalence classes is cn.

    We therefore know that f(r)f(s) has expected value 1 if r and s belong to the same equivalence class and 0 otherwise. It follows that the expectation of (f(1)+\dots+f(n))^2 is the sum of the squares of the sizes of the equivalence classes, which is (to within a constant) \sum_{t=1}^n n/t=n\log n.

    My puzzlement disappeared as I wrote that — I had made the mistake of thinking that a typical equivalence class had size c\sqrt{n}, which gave a bound of n^{3/4}.

    Let me now try to work out the fourth moment. Given an integer x I’ll define A(x) to be the set of primes that occur an odd number of times in the prime factorization of x. Note that A(xy)=A(x) \oplus A(y) (where I’m writing \oplus for symmetric difference, or sum mod 2, or exclusive or, or whatever you want to call it).

    We’re trying to work out the expectation of (f(1)+\dots+f(n))^4, so we need to think about the expectation of f(a)f(b)f(c)f(d). This will be 1 if A(a)\oplus A(b)\oplus A(c)\oplus A(d)=\emptyset, or equivalently if A(a)\oplus A(b)=A(c)\oplus A(d), and 0 otherwise. So we end up with the expression

    \displaystyle \sum_{A(a)+A(b)=A(c)+A(d)}|A(a)||A(b)||A(c)||A(d)|.

    (Here I am summing over the equivalence classes rather than over the elements.)

    This seems to be a little tricky to estimate, since if you choose A(a), A(b) and A(c), then the size of A(d) depends a lot on how much they overlap. I think I’d better save this for another comment.

    The punchline of this comment was supposed to be the deliberately silly remark that we now had a proof of EDP for almost all multiplicative functions, and all we had to do was deal with the few that remained. So far, I’ve at least shown that it is true for a significant percentage of them (not that I would wish to claim that this was a new result).

    One thing I find slightly mysterious is the way a low-discrepancy multiplicative function manages to deal with the fact that all perfect squares must have value 1. For a random multiplicative function that trivially gave us a lower bound of \sqrt{N} for a typical partial sum, so if we are to avoid that then we need negative correlations between f(r) and f(s) when rs is not a perfect square.

    • Klas Markström Says:

      I guess one way to try to balance the contribution from the fact that perfect squares always give 1 could be to introduce a small bias in the values of the primes, making them slightly more likely to give -1. But then this bias would have to decrease for larger p in order to not give a large negative total sum.

    • obryant Says:

      For the usual simple random walk, we get c\sqrt N for the expected partial sum, but almost all paths get substantively larger than that infinitely often. In particular, the n-th partial sum gets as large \sqrt{2N\log\log N} (asymptotically) for infinitely many N.

      Is there a law of the iterated logarithm for random multiplicative functions?

  12. Sune Kristian Jakobsen Says:

    If we have a multiplicative sequence with bounded discrepancy, we can change the values at multiples of some prime p to 0, and the sequence will still have bounded discrepancy. We could also switch the sign at all multiples of p, but then the sequence wouldn’t be completely multiplicative (it would be – at p^2). But what if we only change the value at terms whose prime factorization contain p to an odd power? This sequence would be multiplicative, but is it possible to show that it can’t have bounded discrepancy?

  13. gowers Says:

    Here’s a thought that probably goes nowhere. Let’s think of our multiplicative function f as defined not on \mathbb{N} but as a \mathbb{F}_2-linear function defined on sets of primes. That is, given a set A of primes, we think of \prod_{p\in A}f(p) as f(A) rather than as f(\prod_{p\in A}p). And another difference is that we don’t even think about numbers that are not square free. Instead, we associate with each set A a number S(A,n), which is the number of positive integers less than n that can be written in the form m^2\prod_{p\in A}p. That is, S(A,n) is the size of the equivalence class associated with A.

    With this notation, we can rewrite (f(1)+\dots+f(n))^2 as (\sum_A f(A)S(A,n))^2.

    How might that help us produce a lower bound? I’m not sure it will, but the basic idea is this. For any particular n, we cannot say what will happen (since we know that some partial sums are likely to be zero, and definitely can be zero). But we will be averaging over n\leq N, and this changes things a bit, because the sizes of the sets S(A,n) will be changing. What’s more, we are now forced to choose the function f once and for all.

    What I find just a little bit encouraging about this is that the squares of the sizes of the equivalence classes are distributed in a 1/x-ish way: it seems to provide a tiny little opening through which logarithmic growth might squeeze. What I find discouraging is there are a lot of singleton As with S(A,n)=1 (namely all large primes). That makes me think of another question, which I’ll put in my next comment.

    • Gil Kalai Says:

      A related question: I find the “square-free” analog of EDP rather appealing. Suppose you consider only square free numbers and associate 0 to all other numbers. How small can the discrepency be? Are there still logarithmic examples? Or pethaps the lower bound becomes power-like. (I must admit I do not know what happens for the basic mod 3 based example and related Matryoshka Sequences.)

    • obryant Says:

      The “squarefree \lambda_3” appears to hit discrepancy k around k^{2.7} pretty steadily for 30<k<64.

      The "squarefree \mu_3" appears to hit discrepancy k around k^{2.8} pretty steadily for 10<k<59.

    • gowers Says:

      Another small remark to add to what I wrote above, but it still probably doesn’t do much. With the rewriting above, we can expand out the bracket and interchange summation to obtain the expression

      \displaystyle \sum_{A,B}f(A)f(B)\mathbb{E}_{n\leq N}S(A,n)S(B,n)

      for the mean-square partial sum up to N. So it’s just conceivable that one could get somewhere by understanding the (positive definite) matrix T_{A,B}=\mathbb{E}_{n\leq N}S(A,n)S(B,n). We’d be wanting \langle f,Tf\rangle to be unbounded. The information we would have (but might not be obliged to use) is that f depends \mathbb{F}_2-linearly on A (in the sense that it takes values in \{-1,1\} and f(A \oplus B)=f(A)f(B)).

    • gowers Says:

      I should add something to the comment just above, because I haven’t fully explained what the (unlikely to work) idea is. Suppose we make a guess at the value of \mathbb{E}_{n\leq N} S(A,n)S(B,n) as follows. Let x be the product of the primes in A and let y be the product of the primes in B. Then S(A,n)S(B,n) is approximately n/\sqrt{xy}, so its expected value is approximately N/2\sqrt{xy}. Therefore, if we write G(x)=\sqrt{N/2x}, it is approximately G(x)G(y).

      If this were exact, it would be bad news because it would tell us that the matrix T_{A,B}=G(A)G(B) had rank 1, and \langle f,Tf\rangle would be nothing other than \langle f,G\rangle^2. So all we’d have to do is find a \pm 1 combination of the G(A) that was bounded, which doesn’t look hard. However, because our estimates were only estimates, what we actually have is an approximation to this very low-rank matrix. Could it be that the real matrix has a much higher rank?

      As usual, there is the problem of understanding why the proof would work for functions with values in \{-1,1\} and not for functions in \{-1,0,1\}. Since I don’t have a good answer to that, I don’t have much faith in this approach, but I thought I’d mention it anyway.

    • Gil Kalai Says:

      a question and a comment: 1) why the 1/x ish way S(A,n) distributes gives hope to log n behavior? 2) maybe you should allow functions with values in -1 0 and 1 and just hope for a clain that unless the function is nonzero on a bounded number of variables (primes) the discrepency is unbounded. is it so?

    • gowers Says:

      My original answer to your question was rather pathetic: the function 1/x sums to log. Let me try to give a better one, but the calculations may well not work out.

      If the product of primes in A is x, then S(A,n) is approximately \sqrt{n/x} and the error will have order of magnitude 1. So the error when we approximate S(A,n)S(B,n) has order \sqrt{n/x}+\sqrt{n/y}, whereas the value itself is close to n/\sqrt{xy}. If we think of the errors as fairly random, then when we add them we should get something like the root mean square of the individual errors. That seems to introduce a term like \sqrt{n\log n} into the picture, which is vaguely promising except that I don’t actually know where I’m going with this line of thought.

      Basically, I don’t even have a heuristic argument at this stage. But you could summarize what I’m trying to do as this: exploit the fact that equivalent integers (that is, ones whose product is a perfect square) take the same value, by exploiting the fact that the sizes of the equivalence classes look ever so slightly random (in that we know roughly how big they are, but if n is random then the errors are distributed in a fairly random way).

      I’m not sure I completely understand your second suggestion. What worries me is the function 1 -1 0 1 -1 0 1 -1 0 … . I don’t want to find myself trying to develop an argument that would prove that that function had unbounded discrepancy.

    • Gil Says:

      I would like to regard the function 1 -1 0 1 -1 0 1 – 1 0 as “depending” on a single prime (p=3), hoping that when a function to {0,1, -1} or even to the unit disk does not depend on a single or a bounded number of primes then it has unbounded discrepency. But I do not know how precisely to define the influence of a prime on a sequence, and if it this hope is realistic.

  14. gowers Says:

    I have an idea for a way that one might try to build a multiplicative function with slow-growing partial sums. If it can be made to work, then it would suggest that the kind of classification we are looking for would be difficult to achieve.

    I’ll start with a simple observation. Suppose you were given a random walk and told that you could change it in every kth place, if you wanted to, and that your aim was to keep it as close to the origin as you could. Here’s how you might go about it. Every k^2 steps you would see what had happened to the partial sums. Typically, they would have drifted by about k, and there would be k places you could change, so you could bring the walk back to the origin.

    OK that’s not quite right, since on average only half the possible changes could help, and there’s a small chance that the walk would have drifted further, and so on and so forth, but some kind of argument like this ought to work, with a slightly worse bound, with high probability at each stage — and one could compensate for the unlucky stages by taking longer chunks of walk from time to time. But the point is that it ought to be possible to get the walk to stay much closer to the origin — I think logarithmic growth should be achievable (with a constant that depends on k).

    Now let us do something a bit similar. This time we choose a random multiplicative function. It ought to behave a bit like a random walk. We then allow ourselves to change it at some, but not all, primes. We could even randomize the changes we made, by taking an interval with several primes in it and randomly choosing the ones we change.

    It may be hard to prove anything rigorously — I don’t know — but heuristically it looks to me as though by this sort of procedure we could create a multiplicative function with a lot of randomness in it (so it would be most unlikely to exhibit character-like behaviour, I would have thought) but with growth less than n^\epsilon.

    • Alec Edgington Says:

      Interesting. It would be good to understand this stochastic process better, and how similar it is (or isn’t) to a simple random walk.

    • gowers Says:

      One way to do that would be to try to run it as an algorithm and see if one can produce lots of unstructured multiplicative functions all with pretty small partial sums. I’m imagining much longer sequences than the ones we’ve looked at so far — perhaps of length more like a million rather than 1000 — so we wouldn’t be displaying them on the wiki. A random such function would have partial sums of order of magnitude 1000, so if one could keep them down to 25 or something like that, then it would be a fairly convincing back-up to the heuristic argument.

    • gowers Says:

      I’ve tried running at as a by-hand algorithm, and the results are interesting, and slightly surprising. My algorithm was this: if the sum up to p-1 is positive then f(p)=-1, if it is negative then f(p)=1, and if it is zero then f(p)=-1. I made the last choice to give the sequence a little help in the fight against perfect squares.

      So far I’ve got up to 155. I got all the way to 45 before I first hit \pm 3 (and in fact I hit -3). I hit 3 at 57. I hit -3 again at 99, 105 and 119. I was feeling pretty good by this point, but then suddenly the sequence went a bit wild and the records started tumbling: 4, 5, 6 at 122, 123, 124, then 7 at 133 and 8 at 136. But fairly soon there were enough -1s to get things back under control again.

      Thomas Sauvaget seems to have programmed something a bit like this (see the end of this page on the wiki) and gets a disappointingly large drift that seems to show that my heuristic arguments are wrong. But I don’t see what would be wrong with them, so I wonder whether I understand properly what he has done. Anyhow, I’d be very interested if someone was prepared to program this very simple greedy algorithm and tell me roughly how fast the partial sums grow. The problem at the moment is that if F(136)=8, then F(n) could just as well be \sqrt{n/2} as \log_2n, so I need more data.

    • Alec Edgington Says:

      Tim, here you go: partial sums plotted up to 10\,000.

    • Alec Edgington Says:

      Perhaps I got the wrong end of the stick, but I thought your heuristic argument required a certain amount of ‘backtracking’: you’d look at the sum up to some value and then change a relatively small number of earlier primes to improve it. This does strike me as interesting, but the difficult question is how to decide which primes to change…

    • Thomas Sauvaget Says:

      Tim, what I had done there was to construct a sequence by filling in multiplicatively and choosing at each undetermined prime p either a +1 or -1 depending on which one minimized a kind of “global sum with maximal forward information”, namely the sum of all the HAP sums up to q (the next prime after p). I had not computed the drift, but the plot shows that indeed there were rather long sequences of minuses in a row.

      I have some time now to quickly code your algorithm, but I apologize I don’t understand what you mean by “the sum up to p-1”: is that just the sum along the sequence, or something else?

    • Alec Edgington Says:

      On second thoughts, perhaps it isn’t so difficult to choose the primes: you only expect to have to change \sqrt{N}, but you’ve got something like N / \log N in the second half of your sequence, so you can easily choose enough without messing about with values at multiples. Hmm…

    • Johan de Jong Says:

      Running your suggestion up to n = 10000 gives:
      Smallest value is -97 happens at 7049
      Biggest value is 140 happens at 8668

      Running your suggestion up to n = 100000 gives:
      Smallest value is -1067 happens at 86255
      Biggest value is 1121 happens at 99977

      Running your suggestion up to n = 1000000 gives:
      Smallest value is -10507 happens at 999991
      Biggest value is 12261 happens at 863579

      I could have made a mistake but the values you mention agree with what I get.

    • Klas Markström Says:

      Maybe one of you could try making a plot of \frac{\log(1+|S_n])}{\log(n)}, where S_n is the sum at n. This would give a rough idea about the rate of growth.

    • Alec Edgington Says:

      It does look curiously regular, like S_n \approx \Re (z n^{1+it}) for some z \in \mathbb{C} and t \in \mathbb{R}

    • Johan de Jong Says:

      You’re going to laugh at me, but I modified my code to put in a +1 at p only if the sum up to p-1 is <-10, and it behaves _much_ better. I just ran this up to 1000000 and the smallest value I get is -62 and the biggest value I get is 96. Coding error? I'll check, but you can try it too…

    • gowers Says:

      I must say I find these results very interesting. First, I do not begin to have an explanation for the rapid growth of the partial sums — it’s as though there is some accidental resonance going on. I’d actually be quite interested to know whether the resulting multiplicative function is approximating some known function with a nice formula.

      As for Johan’s modification, something makes me believe it. There appears to be a message (for which, again, I have not even a heuristic explanation) that if you are trying to control a multiplicative function you shouldn’t panic, particularly when it is negative.

      There are some other variants one could perhaps try. One would be to choose the value of the prime at p to be 1 with probability 2/3 if the partial sum up to p-1 is negative and with probability 1/3 if the partial sum up to p-1 is positive. (Or one could have different probabilities for the two cases.) Another would be to choose random values at say three quarters of the primes and greedy for the remaining quarter. And I’m sure there are several other variants. But I’m excited about Johan’s (which isn’t randomized but is probably not too character-like either), since it suggests that multiplicative functions with slow-growing partial sums don’t have to be all that special.

      Indeed, I’d be interested if Johan could tell us whether there is any sign that his function correlates with arithmetic progressions (other than homogeneous ones of course, where we know that they don’t correlate). If partial sums such as f(5)+f(11)+f(17)+f(23)+\dots are also reasonably small (I don’t mean logarithmic, but I do mean that their growth should be definitely slower than linear), then it would be convincing evidence against all sorts of conjectures one might otherwise be tempted to make. A quick way of judging this might be just to take the Fourier transform and see whether it gets big anywhere.

    • gowers Says:

      In answer to Alec’s question, it’s true that my original suggestion was to be slightly less greedy and modify primes in groups (and that may still be quite a good idea — one possibility is to have groups of size about C(\log n)^2 and change as many as it takes to get the partial sum by the end of the group to be zero (unless you can’t, in which case you get as close to zero as you can).

      However, it then occurred to me that it ought to be the case that adding a few greedy steps to a random walk would dramatically reduce its drift, so a slightly different heuristic argument prompted me to ask about much greedier algorithms, with these unexpected, and to me still mysterious, results.

    • Alec Edgington Says:

      This plot shows the results of three runs up to 10000 with f(p) chosen to be minus the sign of the partial sum so far with probability two-thirds; and \pm 1 with equal probabilities if the partial sum is zero:

      The growth looks a lot more square-root-like.

    • gowers Says:

      Another thought. If the original greedy algorithm gives something that looks suspiciously regular, might it be possible to multiply the function by one of the form n^{it} in order to introduce cancellation into the long stretches of pluses and long stretches of minuses? If we got it right, we might be able to get some very good multiplicative sequences taking values in \mathbb{C}.

    • gowers Says:

      In an effort to understand slightly better these very widely oscillating partial sums, I tried defining a multiplicative function by taking f(p) to be (-1)^k whenever 2^{k-1}<p\leq 2^k. The thinking behind this is that if p is a little below 2^k and q is a little below 2^l, then pq will be a little below 2^{k+l}, and so on, so composite numbers will have a tendency (but only a tendency) to behave similarly to the primes of about the same size.

      I’ve calculated it up to 64 and the results have borne out my expectations. The record partial sums (big and small) are 1 at 1, 0 at 2, 2 at 4, -2 at 8, 4 at 14 (and at 16), -4 at 24 (and at 32), 8 at 44, 10 at 48, 12 at 62 (and at 64), where I have not mentioned records that are instantly broken.

      However, I still can’t come up with a decent explanation for the phenomenon. The problem is that there are edge effects: it is not the case that the function f(x)=\lfloor\log_2x\rfloor satisfies f(xy)=f(x)+f(y). And it would seem that the more factors a number has, the more the edge effects should matter, eventually swamping everything and making the function look random. Perhaps that does indeed happen once we reach the point where almost all numbers have many prime factors that occur an odd number of times.

      Actually, that’s given me an idea for how to explain the phenomenon after all. I’ll do a couple of calculations and then write a comment to say how I get on.

    • gowers Says:

      OK, let’s suppose that we have a completely multiplicative function and we know that up to a certain point N. it correlates very well with the function (1+\cos(A\log n))/2. In other words, it behaves as if one were choosing to set f(n)=1 with that probability.

      Now choose a typical number m that’s a product of two numbers x and y that are less than N. In the absence of any other information, it seems reasonable to assume that the logarithms of x and y are uniformly distributed between 0 and \log m (since the probability that m is divisible by x is x^{-1}).

      On this model, f(m) will be 1 with probability

      \displaystyle (1+\cos(A\log x))(1+\cos(A\log y))/4+(1-\cos(A\log x))(1-\cos(A\log y))/4

      since this is the probability that f(x)=f(y)=1 or f(x)=f(y)=-1.
      But that cancels down to (1+\cos(A\log x)\cos(A\log y))/2. We also know that

      \displaystyle \cos(A\log x)\cos(A\log y)=(\cos A(\log x+\log y)+\cos A(\log x-\log y))/2.

      Since \log x+\log y=\log m and \log x-\log y averages zero (in our model where we’ve chosen two random positive numbers that add up to \log m), the probability that f(m)=1 (randomizing over m as well) is (1+\cos(A\log m)/2)/2.

      That’s not exactly what we wanted, because of the factor 2 inside the bracket. However, it does at least show that there’s a tendency for the biased behaviour that we have up to N to continue, in a slightly weaker form. But I still seem to have the problem that my calculations suggest that the randomness ought to damp down the oscillations, whereas Alec’s plot of what happens with the greedy algorithm shows no sign of damping at all.

    • gowers Says:

      In a nutshell, my difficulty is this. How do we get a \pm 1-valued multiplicative function to mimic the behaviour of a complex-valued multiplicative function such as f(n)=n^{it}=\exp(it\log n)? Alec’s plot seems to suggest that this should be possible somehow, but the obvious method of taking real parts and converting them into probabilities (which is what I was doing above) doesn’t seem to work, which is not altogether surprising given that \cos is not multiplicative. We do have a nice expression for \cos(XY), but it doesn’t give us quite what we want: it gives us a hint of multiplicativity, but it also seems to want to disperse that multiplicativity.

      I suppose it is just about conceivable that Alec’s plot would look different if there were more of it, but at the moment it really does look as though the bias is not dispersing and the peaks of the partial sums are growing linearly.

    • Alec Edgington Says:

      A small observation concerning the behaviour of a random multiplicative function. We know that a standard random walk up to N has a variance of about N. With a multiplicatively-defined random walk, the variance seems to be significantly bigger: it’s about 3500 for sums up to 1000. It looks roughly linear, with a few kinks.

    • Alec Edgington Says:

      I tried mapping p to -1 with probablility \frac{1}{2} (1 - \cos(10 \log p)). The resulting sums, for four runs up to 10000, are shown here:

      They look quite similar to the results of the greedy algorithm.

    • gowers Says:

      If my calculation half way through this comment is correct, then the variance for a random multiplicative function is more like N\log N, which could be exactly what you are observing.

      Those plots are interesting. I feel I must be making a mistake in my heuristic arguments, because they don’t seem to predict this linear growth of the record partial sums. I’m looking for a reason to get rid of that factor of 2 …

    • Alec Edgington Says:

      Ah yes, hadn’t spotted your calculation. That seems quite plausible (with a factor of a half or so). There was indeed some upward curving near the origin which I’d dismissed.

    • gowers Says:

      Here’s another experiment that would interest me. I wonder whether the problem with the greedy algorithm could be thought of like this: when the walk gets close to zero one is pulling it towards zero as fast as one can, but what one should actually be doing is easing off by that stage so that it won’t go zooming off in the other direction.

      With that vague thought in mind (and partly informed by Johan’s experience), one could try a more delicate damping that worked like this. You set yourself a target, such as 50, and aim to keep the partial sums between -50 and 50. But you also don’t want to lose the benefits of randomness, so you use the non-randomness sparingly — only if there seems to be some genuine danger. More precisely, if the partial sum up to p-1 is t and |t|\leq 50, then you let f(p)=-1 with probability (t+50)/100 (so it’s definitely -1 if t=50 and it’s definitely 1 if t=-50 and the probability varies linearly in between). If the partial sum is outside these bounds then you revert to the pure greedy strategy and try to get it back inside.

      I’d be interested to know whether this easing off as you get near zero produces a completely different result. How far does it have to go before you hit \pm 100, for example?

    • Alec Edgington Says:

      Here are four runs of that process taken up to 30000:

      It does seem to be much better controlled. I wonder if one could even come up with an ‘optimal’ function \alpha for generating the sequence by

      \mathrm{Prob} (f(p)=-1) = \alpha (\sum_{m<p} f(n))

    • gowers Says:

      That is very interesting, and I wonder how long this better control would continue, and whether it would work with 50 replaced by 20, etc. etc. It would indeed be interesting to optimize that function, but at the moment I feel as though I’m making stabs in the dark.

      Talking of which, here’s another one. The thought behind it is that the original greedy algorithm didn’t work because the additional kick it gave was accidentally in phase with the partial sums themselves (not that I quite understand why that should be, when it was designed to be precisely the opposite). One way of changing the phase would be to make it depend not on the function itself (that is, the partial sum) but on the derivative (that is, the multiplicative function). A reasonably sensible way of doing that might be this. If you’ve chosen the values of f up to p, then randomly pick an integer r between p-C\log p and p-1 and set f(p) to be -f(r). The idea here is that if the function is growing quite a bit as you approach p then the value at p will have a high probability of cancelling out a little bit of that growth. I’d be curious to know if this had the effect of hugely smoothing out the “random walk”. Part of me says that it probably will and another part of me says that it probably won’t.

      If we define g(p) to be the average value of f near p, then one could generalize your question and ask for the optimal function \beta(F(p),g(p)), where F(p)=\sum_{m<p}f(m).

    • Alec Edgington Says:

      For what it’s worth, setting

      \mathrm{Prob}(f(p) = -1) = \frac{1}{2} (1 + \tanh \frac{1}{2} \sum_{m<p} f(m))

      gives good results most of the time. Here are ten runs up to 30000:

      Two of them get out of hand, but eight stay under control and close to zero. I wonder if they'll all 'go linear' eventually.

      I tried the suggestion of setting f(p) = -f(r) for a random recent r; maybe I just didn't get the parameter right but the behaviour looked fairly random. It ought to be possible to do something along these lines, though.

    • Mark Bennet Says:

      The oscillations of the ‘out of control’ examples look quite regular – are there natural periods for these oscillations – ie predictable values?

    • gowers Says:

      Alec, thanks for running that f(p)=-f(r) idea. It gives me one more idea (after which I may stop — but I can’t quite guarantee it). I think I can understand why f(p)=-f(r) for a random recent r might give fairly random behaviour. The rough reason is this. If the function starts growing in a linear kind of way, then the choices at primes will cease to be random and will try to kill the gradient. So this method should stop linear growth. However, if the behaviour is fairly random, then the choice of f(p) is also pretty random. (Yes, there is some dependence, but we have a lot of dependences anyway.) So this method won’t do anything to defeat randomness, so to speak.

      Now the other method (making f(p) -1 when partial sums are positive and +1 when they’re negative) defeats randomness, but seems to give rise to linear behaviour. So what if we combine the two? That is, what if we just alternate the two methods of choosing f(p)? The idea is that one method would keep things at worst random and the other would then kill off the randomness. In short, one method would produce the randomness that the other one needs for the heuristic argument that lay behind it to work.

      My heuristic arguments here are obviously pretty vague, and I can well imagine that this won’t work at all, but it definitely seems to be worth a try, and I’d have thought you could patch together some bits of existing code pretty quickly.

      On another topic, I am very intrigued by your tanh sequences, and especially by the idea that they may be extremely well controlled but in an unstable way. There are all sorts of phenomena that I’d like to understand better. It feels as though there ought to be differential equations that would predict the behaviour well, but my attempts to do that have so far given obviously wrong predictions.

    • Alec Edgington Says:

      Sure. Here are ten runs of that method up to 30000, where at every other assignment r is chosen randomly from the last 10 \log p numbers:

      As expected the graphs display a mixture of characteristics. Apart from the green one they do seem to be just about under control.

    • gowers Says:

      I have a partial explanation for this linear behaviour that keeps popping up, but need to think about it more. The idea is that the heuristic argument that predicts that choosing f(p) to have the opposite sign to the partial sum up to p-1 should work well depends on the assumption that the partial sums look like a random walk. But if they ever don’t look like a random walk and start moving at a linear rate, then choosing the f(p)s will not have enough effect to defeat that linear behaviour. So instead what happens is that you just hopelessly attempt to slow it down, putting in a long sequence of values of the same sign that set you up for trouble a bit later on.

      Actually, have you tried the very simple adjustment, designed to avoid this rather extreme behaviour, of setting … wait, I’ve just looked back and seen that you have. I’m talking about having probabilities of 2/3 and 1/3 instead of 1 and 0. You gave a link to some pictures that you claimed were much more square-root-like. I’d be interested to know whether that really is the case — that is, if you go a lot further than 10,000, do the deviations get quite a bit bigger?

      Also, one could imagine varying your tanh experiment by sticking a factor of 1/2 in front of the tanh (so that the probabilities vary between 1/4 and 3/4 rather than between 0 and 1). I think that might have the effect of stopping the linear behaviour from ever arising, while maintaining the good control. And if it doesn’t, then perhaps some other constant would.

    • gowers Says:

      Here’s yet another idea for producing a good sequence (my promise to try not to do this was fairly meaningless). If one wants to introduce a phase shift in order to try to stop any resonance, then a simple way of doing it is to define f(p) to be 1 if the partial sum up to \alpha p is negative and -1 otherwise. And of course that too can be made more probabilistic. It would be interesting to know which (if any) of the following three likely possibilities holds.

      (i) Whatever \alpha you choose, you get big growth, but \alpha affects its frequency and amplitude to some extent.

      (ii) Most \alpha have very little effect, but if you get the right \alpha you almost completely kill the growth (with subcases that you make it square-root-like or you make it power-of-log-like).

      (iii) There are a few values for \alpha, notably 1, that are particularly bad, but in general pretty well any \alpha is a big help.

    • Alec Edgington Says:

      Right, an initial play with this suggests that something like (ii) may be the case:

      These are plots up to 5000 for 20 values of \alpha between 0.5 and 1. I’ll see if I can home in on the ‘good’ values…

    • gowers Says:

      I wish I understood what was going on well enough to be able to guess what the best \alpha should be. But the next best thing would be to see what it is experimentally and then try to retrodict (if that’s the right word) that value.

    • Alec Edgington Says:

      Hmm, actually it appears that the best growth arises as \alpha \rightarrow 1, though there is some fairly chaotic variation at other values.

      Maximum absolute value attained by partial sums up to 10000 terms for various \alpha:

      0.1     731
      0.2    1111
      0.3     844
      0.4     639
      0.5    1009
      0.6    1271
      0.7    1324
      0.8     371
      0.9     480
      0.99    181
      0.999   138
      0.9999  140
  15. Ben Green Says:


    Upper and lower bounds of N^{1/2 + o(1)} for random multiplicative functions have been obtained: this is a paper of Halasz. It’s not easy to obtain:

    Halász, G. On random multiplicative functions. Hubert Delange colloquium (Orsay, 1982), 74–96, Publ. Math. Orsay, 83-4, Univ. Paris XI, Orsay, 1983.


    • gowers Says:

      I presume what you mean there is that although it’s easy to get the average behaviour it’s not so easy to show that almost all functions behave roughly like average ones. Is that the point?

    • Ben Green Says:


      That’s right.


  16. Klas Markström Says:

    As I mentioned in one of the earlier post I have kept my program for constructing sequences, not necesssarily multiplicativem running after a restart. The program is now working on sequences of length 1120 and has found a number of them, 63 so far. I have created a page from which you can download the sequences. It is here

    As before these sequences are not necessarily maximal so running one of the backtracking programs to try to extend them would be good.

    Many of these sequences differ already in the first few values, but there are not that many different stubs of length 25 which the program has managed to extend to length 1120 so far.

    • Alec Edgington Says:

      Klas, I’m afraid I can’t get at the sequences from that page: I get error 403 ‘access forbidden’…

    • Klas Markström Says:

      I had forgotten to update the file permissions. It should work now.

      Is see that I also happened to put the raw output files there instead of a cleaned up version. I will fix that too. The current files are fine if you delete the first line and the first two words of the second line.

    • Klas Markström Says:

      Now the files should be readable and only contain the sequences.

    • Alec Edgington Says:

      Thanks for these, Klas.

      The first two that I’ve tried to optimize have stuck at 1124 (and backtracked to the 400s already). I’m starting to suspect that this is the limit! I’ll try some others.

    • gowers Says:

      I was really hoping we’d break the 1124 barrier …

    • Klas Markström Says:

      There are still a number of undecided cases which the computer is working on so I might add more sequences in a few days time.

      Alec, make sure to try at least one from each file number, since they differ quite early on, but the best would of course be to at least see what the maximum extension of each sequence is.

      Comparing them with the analysis done for the first 1124-sequences would also be interesting. I have no idea how different these sequences are from the first few.

    • Alec Edgington Says:

      OK, I’m going through them all now, stopping each run when it’s backtracked below 500 without passing 1124…

    • Alec Edgington Says:

      None got further than 1124. I’ll leave the last one running overnight just in case! In any case there’s a lot to explore in these sequences.

    • Alec Edgington Says:

      There is quite a bit of common structure to these sequences. One can partition [1120] into sets A \subseteq [1120] with ‘primary’ functions \xi_A : A \rightarrow \{ 1, -1 \} such that all the sequences in the list satisfy x \vert_A = \pm \xi_A. One such set is \{ 2, 3, 14, 21, 94, 98, 141, 147 \}. Most impressive is the set

      A_0 = \{ 4, 5, 6, 8, 9, 10, 12, 15, \ldots \}

      which has 729 elements! (Thus all Klas’s sequences satisfy x_4 = -x_5 = -x_6 = -x_8 = x_9 = \ldots = x_{1120}.)

    • Alec Edgington Says:

      The next largest set in the partition is

      A_1 = \{ 148, 152, 164, 166, 172, 173, \ldots \}

      which has 66 elements.

    • Alec Edgington Says:

      I’ve posted the partition, along with prime factorizations of the members of each set in it, on the wiki.

    • Thomas Sauvaget Says:

      Klas, is your program exhaustive, i.e. when it finishes will it mean that it has reached the maximal possible length of C=2 sequences for sure? Also, do you have an estimate of when it might finish under the assumption that 1124 is the maximal length (i.e. how long did it take from 1119 to 1120?)

    • Klas Markström Says:

      Thomas, the program is exhaustive, so given enough time it will determine the maximum possible length of a sequence with discrepancy 2.

      The program does not move forward in simple steps, as from 1119 to 1120, so it is difficult to estimate the remaining running time.

    • Klas Markström Says:

      During the night one more solution of length 1120 was found. I have added it to the list, the name of the file is sol.6119-1-1-1-0-1-0-1-0

  17. A Discrepency Problem for Planar Configurations « Combinatorics and more Says:

    […] in spirit and in methodology to discrepency problems in combinatorics and number theory such as the Erdos Discrepency problem, now under polymath5 attack.  The Kupitz-Perles conjecture seems of a different nature, but I find […]

  18. Sune Kristian Jakobsen Says:

    “Showing EDP for automatic sequences may be a nice toy problem.” – O’Bryant
    I must admit that I don’t have much intuition about automatic sequences, but here is my idea:
    I think that if a sequence x_i is k-automatic and d is an integer, gcd(d,k)=1, all (non-homogeneous) arithmetic progression with difference d looks like the d-HAP in the following sense: If a and n are natural numbers there exists a number N such that x_{a},x_{a+d},x_{a+2d},\dots x_{a+nd}=x_{Nd},x_{(N+1)d},x_{(N+2)d},\dots x_{(N+n)d}. This would imply that if the d-HAP has bounded discrepancy, then any d-AP has bounded discrepancy.
    We know that the AP-discrepancy is unbounded for any sequence, so if we could prove the above for any d (and not only d relative prime to k) we would be done. But perhaps we could generalize the result about AP-discrepancy to:
    For any sequence and any natural numbers C and k we can find natural numbers d and a such that gcd(d,k)=1 and the sequence x_{a},x_{a+d},\dots has discrepancy >C.
    However, I have no idea about how the AP-discrepancy result was proved, so I don’t know how difficult this generalization would be.

    • Sune Kristian Jakobsen Says:

      “For any sequence and any natural numbers C and k we can find natural numbers d and a such that gcd(d,k)=1 and the sequence x_{a},x_{a+d},\dots has discrepancy >C.”
      Do’h, here is a counterexample: +,-,+,-,… and k=2. Actually I was thinking about that example while I wrote the comment, but I didn’t take I too serious, because the k-HAP had unbounded discrepancy. What I should have wrote was: Is it possible to prove that for any C we can find a HAP or a d-AP with gcd(d,k)=1 that has discrepancy >C.

  19. Guy Srinivasan Says:

    Suppose x is a sequence of length N which is completely multiplicative, has discrepancy at most D, has x_N=1, and has \sum_{i=1}^{N-1}x_i=0.

    Define y as y_i=y_{N k + a}=y_k if a=0 and x_a otherwise.

    Then y is completely multiplicative and up to length N^k-1 has discrepancy at most kD.

    Verifying using the sample multiplicative sequence of length 246 with discrepancy 2 (http://michaelnielsen.org/polymath1/index.php?title=Multiplicative_sequences), I checked to find the largest index where the sequence has value +1 and the partial sum just before that is 0, which is at x_{241}. Extending that gives a completely multiplicative sequence of length 1935 with discrepancy 3, and it goes to 241^3 < 112446751 < 241^4 with discrepancy 7.

    It's still multiplicative if y_{N k + 0}=x_k, which helped improve the original Matryoshka sequence. When I extend that way I get to length 1943 while still at discrepancy 3.

    What I'm really hopeful about is iterating this procedure – finding a good constructive way to choose how much to increase the discrepancy and where to pick the length of the next sequence, then constructing "better and better" versions. I thought I had sqrt(log) for a moment, but it didn't work…

  20. Guy Srinivasan Says:

    (x needs to be completely multiplicative mod N to this to work.)
    (which means I don’t know how to iterate it. unless y is magically completely multiplicative mod M for a nice M)

    • Guy Srinivasan Says:

      Yes, the length 1935 “completely multiplicative” sequence is not, because the length 240 sequence is not completely multiplicative mod 241. Is there an enumeration of the 500 length 246 discrepancy 2 sequences somewhere?

    • gowers Says:

      I think Kevin O’Bryant has been down more or less this road, but perhaps you are doing something he wasn’t. In any case, if you have a sequence that is completely multiplicative mod M, then a fairly easy Fourier argument gives you a partial sum of size proportional to the square root of M, so I think there isn’t an easy way of improving Matryoshka sequences. But it’s possible that I am not fully understanding what you are hoping to do here.

  21. gowers Says:

    The thread on simple algorithms for producing (or accidentally failing to produce) reasonably low-discrepancy multiplicative sequences has got so long now that I’m starting a new comment about what the explanations might be for what we are observing.

    It occurs to me now that the apparent linear behaviour may be an illusion, and that the truth may be growth that is very slightly sublinear. My reasoning is as follows. If you define a multiplicative function by defining f(p) to be -1 raised to the power \lfloor\log_2p\rfloor, then you get some very nice behaviour that points in the direction of linear discrepancy. For instance, if p and q are primes and it so happens that pq is only just smaller than 2^k, then it is very likely that \lfloor\log_2p\rfloor+\lfloor\log_2p\rfloor=\lfloor\log_2(pq)\rfloor. So there is a tendency for the pqs to behave very like the primes, at least in the run-up to a power of 2.

    This kind of argument works for products of more than two primes as well, but the more primes you take, the worse it gets. Now a typical integer near n has about \log\log n prime factors (despite anything I might have written to the contrary), and if n is less than a million, then \log\log n looks pretty much like a constant. So I think that the apparent linearity that we see in the examples that blow up may well be a function more like n/\log n that we are just not fully seeing yet. Looking back at some of the plots, there are some signs, though far from conclusive ones, that the gradient is levelling off just a little.

  22. EDP and some first steps in number theory « Random Thoughts Says:

    […] My interest was sparked by Tim Gower’s Polymath project and two comments of Terence Tao (1, 2) on the special case of completely multiplicative functions. Let me state this version of […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: