## EDP3 — a very brief report on where we are

Right at the moment I’d say that Polymath5 has quietened down a little bit but is nevertheless continuing to make progress. There is quite a lot I could write about, but I thought it made more sense to put it on the wiki, so here I shall just briefly draw attention to some recent additions to the wiki (not just by me).

On the experimental side, I am losing track of how new everything is, but a list of the main data can be found at the start of the experimental page on the wiki.

On the theoretical side, Terence Tao has put proofs of some nice facts. One is a very clean proof of Mathias’s result that if the HAP sums of a sequence never go below -1 then they must be unbounded above. The cleanness of the argument comes partly from working with the positive rationals rather than the positive integers and partly from making use of results from topological dynamics. In particular, the latter allows one to talk about a “random rational number $x$” and give a sensible meaning to quantities such as the probability that $f(x)=f(2x)$.

Terry has also proved, by a fairly intricate combinatorial argument, that there must exist a HAP such that the difference between the maximum and minimum partial sums along that HAP is at least 3. (In the terminology we have been using, the drift is at least 3.)

I have created some pages about general proof strategies. These are linked to from a section of the main Polymath5 page. One is a proof strategy that I have already discussed in some detail in recent posts: showing that a counterexample must have multiplicative structure, tidying up that multiplicative structure as much as possible, and eventually showing that there cannot be any examples with multiplicative structure. A newer idea is to define a different parameter, a bit like drift but measured on a number of different distance scales and averaged. More details about this idea and the motivation for it can be found here.

Incidentally, it is easy to find out what the most recent changes to the wiki have been: on the left of each page is an internal link called Recent Changes.

### 104 Responses to “EDP3 — a very brief report on where we are”

1. gowers Says:

I am very interested by Sune’s comment right at the end of the EDP2 thread, so in an effort to understand it better (and while I’m at it move the discussion over to this thread) I will try to put it into my own words.

His comment is about the following question, which emerged naturally from the discussion of how one might prove EDP for multiplicative functions. Let $f$ be a function defined on the positive integers that takes the value 0 at multiples of 3 and $\pm 1$ at numbers that are congruent to 1 or 2 mod 3. As we all know, such functions can easily have bounded discrepancy. Indeed, the function that goes 1, -1, 0, 1, -1, 0, 1, -1, 0, … has discrepancy 1. Sune’s question was simple: is this the only example?

In order to approach this question, he suggests defining $p$ to be the probability that $f(x)=1$ if $x$ is a random integer congruent to 1 mod 3. At first this does not seem to make sense, but we know from Terry that standard results in topological dynamics can be used to make sense of it, at least if we are prepared to work over the rationals. So let us talk freely about random numbers, blithely assuming that they will behave themselves nicely and that what we write can be interpreted coherently.

Sune then considers the probability that $f(xy)=1$ given that $xy\equiv 1$ mod 3. On the one hand, since $xy\equiv 1$, we should expect this probability to be $p$. On the other hand, we can work it out as follows. Given that $xy\equiv 1$, we know that $x$ and $y$ are either both congruent to 1 or both congruent to 2. If the former, then the probability that $f(x)=f(y)$ is $p^2+(1-p)^2$, and if the latter then the same is true. So the probability that $f(x)f(y)=1$ is $p^2+(1-p)^2$. But $f(x)f(y)=f(xy)$, since $f$ is multiplicative, so we appear to obtain the equation $p=p^2+(1-p)^2$, which is a quadratic equation with the two solutions $p=1$ and $p=1/2$. If $p=1$, then $f$ correlates perfectly with the function 1, -1, 0, 1, -1, 0, 1, -1, 0, …, and if $p=1/2$, then there is no correlation at all.

This result is quite plausible. Indeed, in order to define a multiplicative function of the given kind, we have to choose its values at all primes apart from 3. If we choose the value at $p$ without regard to whether it is congruent to 1 or 2 mod 3, then a typical large number will have no particular reason to have an even number of primes that take the value -1 in its prime factorization, or an odd number for that matter. So we do not really expect any correlation.

Having said that, if the set of primes for which we choose the “wrong” value (that is, -1 for a prime that is 1 mod 3 or 1 for a prime that is 2 mod 3) is very sparse, then only a small proportion of numbers will have a “wrong” prime factor. And in fact, this makes one think a little. Suppose we choose the wrong value at just one prime, such as 13. So everything is as before, except that at numbers with an odd number of 13s in their prime factorization we need to change the sign. Since fewer than 10 percent of numbers have an odd number of 13s in their prime factorization, this suggests that the correlation cannot have become all that much worse.

I don’t think that this new function has bounded discrepancy, but neither do I see how to use the bounded-discrepancy condition. I wonder whether there is a slight problem in your argument, which is that it is not clear that the distribution of a random product $xy$ such that $xy\equiv 1$ mod 3 is the same as the distribution of a random $z$ such that $z\equiv 1$ mod 3. Even over the rationals I’m not sure I see that.

• gowers Says:

Just to add to that last point, if you choose a random $z$ near $n$, then on average it has about $\log\log n$ prime factors, and the standard deviation is small.

If you choose a random $x$ and $y$ with product near $n$, then most of the time (I think) they will be between $n^{1/10}$ and $n^{9/10}$, so they will again have approximately $\log\log n$ prime factors, which means that $xy$ will have approximately $2\log\log n$ prime factors and will therefore not be at all typical.

• Terence Tao Says:

Hmm, the issue of whether xy has the same effective distribution as z seems to be a subtle one. I think I can make the ergodic theory formulation give you this, but only if you ask that y be random at a “smaller scale” than x. The intuition now is that there are two ensembles of “random rationals”: a “random rational of moderately large height” y, and a “random rational of extremely large height” x. (One could make this rigorous using nonstandard analysis, incidentally, but never mind that here.) Then I can ensure that xy has the same distribution as x, and that the joint distribution of (x,y) is invariant with respect to fixed rational shifts. But I don’t currently have the ability to make x and y have the same distribution as each other. If one had some sort of “convergence” as one increased the scale then this would not be a problem; typically this happens when there is some sort of “energy” or “index” that keeps increasing as one refines the scale (cf. the standard proof of the Szemeredi regularity lemma). Unfortunately there is not enough translation invariance floating around to make this work (we’re dealing with “pinned” parallelograms 1, x, y, xy, which are not as equidistributed as we need.)

Amusingly, if one works with _three_ random variables x,y,z, with x of humungous height, y of merely enormous height, and z being just of large height, then x, xy, xz, xyz all have the same distribution, which is morally like y, z, yz having the same distribution, especially in the multiplicative case. Maybe this is enough.

• Terence Tao Says:

Note, by the way, that the notion of randomness coming from ergodic theory will look a little different from the “Archimedean” notion of randomness one is used to when considering, say, numbers drawn uniformly from an interval. (It’s a little bit like the difference between uniform measure and equal-slices measure, actually.) The way one should think of creating a “random rational” is as follows:

1. Pick a moderately large number w, and consider all the primes p up to w.
2. Then pick a huge exponent scale M, much larger than w.
3. For each prime p up to w, raise it to a random exponent between -M and M.
4. Multiply them all together. Voila, this is your random rational x.

Note how qx is going to have much the same distribution as x provided that q only involves primes up to w, with exponents much smaller than M. But this distribution is going to be quite different from, say, picking a random integer near N.

• Terence Tao Says:

Actually, I think I can make an energy increment argument to find random shift-invariant independent ensembles x,y,z such that x, xy, xz, xyz have the same distribution, and that y and z have the same distribution, while x has a possibly different distribution. Assuming multiplicativity, one should be able to divide out by x and conclude that y,z,yz have the same distribution, so things should actually be OK.

• gowers Says:

Hmm, I think I may want to take that back. But first we have to work out what it means to extend a function that is zero at multiples of 3 to the rationals. At first glance it looks as though the whole function would have to be zero.

What I think we can do is define a function on the set of all rationals that have no multiples of 3 in either their numerator or their denominator. This set is closed under multiplication, so given any $x$ we can look at the values of $f$ at $x,2x,4x,5x,7x,8x,\dots$. Also, it now looks as though the product $xy$ of a random pair $(x,y)$ should be have the same distribution as a random element $z$. (Having said that, I wouldn’t mind if Terry confirmed it …)

What about my multiples-of-13 example? Here it looks as though we might be OK as well, since for a random rational (in the particular set I am talking about) the bijection $x\mapsto 13x$ tells us that the probability that you have an odd number of 13s in your prime factorization should be 1/2.

• gowers Says:

I should say that Terry’s comments appeared while I was writing my previous one, so I was writing mine without knowing what he said, or even that he was saying anything. I may well want to modify what I wrote once again.

• Sune Kristian Jakobsen Says:

“I don’t think that this new function has bounded discrepancy, but neither do I see how to use the bounded-discrepancy condition.”
I think you’re right, that we don’t use the bounded discrepancy. I thought I used the fact that the density of pluses in 1,4,7…, must be the same as the density of minuses in 2,5,8,… because I used that in a earlier version of my comment.

So if the proof can be formalized, does that mean that any two multiplicative sequence are uncorrelated in this sense?

• Terence Tao Says:

Sune, that sounds quite plausible. Any multiplicative +-1 sequence with at least one -1 should have mean zero (not over the integers, necessarily, but with respect to this funny notion of a “random rational”) because we would have a relation of the form f(qx) = -f(x) for some rational q other than 1.

And the product of two distinct multiplicative sequences would be a multiplicative sequence with at least one -1, so they should be uncorrelated. (This is analogous to how Fourier characters are all uncorrelated with each other.)

2. Terence Tao Says:

I wanted to toss one other half-formed idea I had out there. The bounded discrepancy hypothesis seems to create a tendency for f((n+1)x) to “want” to be of the opposite sign to f(nx), so we expect some negative correlation between these two quantities. But this creates some tension: for instance, f(2x), f(3x), f(4x) all want to be opposite signs to each other, but this can’t happen. (This is already a proof that the drift must exceed 1.) It may be that one has to analyse the correlation matrix

c_{a/b} = E f(ax) f(bx) = 2 Prob( f( ax ) = f(bx) ) – 1

for various a, b. There are various identities and inequalities relating these coefficients. Firstly, the matrix is positive semi-definite in a, b, is circulant with respect to multiplying a, b by a common factor, and equal to c_1=1 on the diagonal. And the drift hypothesis seems to impose further conditions.

For instance, in discrepancy 2, we have that f(x),f(2x),f(3x) can’t all be equal, which forces c_2 + c_3, c_2+c_{3/2}, and c_{3/2} +c_3 to be non-positive.

One could potentially see discrepancy 2 being ruled out by some sort of semidefinite program with this approach…

• gowers Says:

I’m not sure I’m right about this, but let me say it anyway. My impression is that the existence of the sequences of length 1124 would force the semidefinite program (by the way, I have only the haziest of ideas of what a semidefinite program is) to be rather large and complicated, since just plugging in the correlations that we observe in those sequences should give approximate solutions to all the small inequalities such as the ones you’ve written down. But perhaps one could set a computer to the task somehow.

• Terence Tao Says:

Good point. Though one model problem would be to rule out sequences which have discrepancy 2 and have drift 3; this seems to be the first problem out of reach of existing results. I assume the 1124 sequence has drift 4?

A semidefinite program is the matrix generalisation to a linear program. In linear programming one deals with inequalities of the form a_1 \cdot x_1 + \ldots + a_n \cdot x_n \geq b, where the a_i, x_i, b are scalars. In semidefinite programming it’s similar, except that the x_i and b are matrices, and \geq is in the sense of matrices (i.e. the difference needs to be positive semi-definite). It’s not quite a linear program anymore, but it is still convex, and thus reasonably tractable by computer methods.

3. Progress on polymath projects « Euclidean Ramsey Theory Says:

[…] on polymath projects By kristalcantwell There is a new thread for Polymath5. Also the paper “Density Hales-Jewett and Moser numbers” will be […]

4. Miodrag Says:

The Fenwick tree datastructure enables updates and retrievals of partial sums in O(log N) time. I don’t know if the idea could be useful on the theoretical side, but I think it certainly could be on the experimental side. I added the wiki page.

5. Klas Markström Says:

I have been away the last couple of the days so I haven’t been very active here, but the computers have kept on working. I have two different machines working on C=2 sequences now.

From the way the program behaves I suspect that there are no C=2 sequences with length over 2000, but this is in no way rigorous statement and there might be a surprise later on.

There are some subcases which are surprisingly difficult for the program so there might be some sequences of length similar to what we have already seen but with a somehow different structure.

Answering an earlier question about the algorithm I am using, the brief answer is that the program is based on the UmSat constraint-solver. A prototype version of that solver is available from my webpage. It is based on a modern SAT-solver using restart and learning, but with more constraint types than just boolean clauses.

6. Terence Tao Says:

I did some Fourier-analytic computations (building on my previous comment), and managed to reduce the EDP to a similar problem which is purely about completely multiplicative functions, though unfortunately I have to consider complex-valued such functions. Namely, if one can show that for any complex completely multiplicative function $g: {\Bbb N} \to S^1$, one has the lower bound

$\frac{1}{N} \sum_{n=1}^N |\sum_{j=1}^n g(j)|^2 \geq \omega(N)$ (*)

where $\omega(N)$ goes to infinity as N goes to infinity, then this implies that all {+1,-1} sequences have unbounded discrepancy (indeed it implies more generally that any complex sequence with magnitude bounded above and below has unbounded).

The proof is not difficult; one works with the type of box that a random rational lives in, applies Plancherel, and follows one’s nose. The details are at

http://michaelnielsen.org/polymath1/index.php?title=Fourier_reduction

(*) is moderately (but not ridiculously) stronger than the claim that the partial sums of any completely multiplicative complex function are unbounded, which we believe to be true anyway. So if we believe (*), we’ve reduced the problem just to the study of completely multiplicative functions, at the cost of having to deal with an l^2 average in N (and also to get lower bounds that are uniform in N).

• obryant Says:

If we set $x_i$ so that $g(i)=\exp(2\pi i x_i)$, then trigonometry reduces

$\frac 1N \sum_{n=1}^N \left| \sum_{j=1}^n g(j) \right| ^2$

to

$\frac 1N \sum_{i=1}^N \sum_{j=1}^N (n+1-\max\{i,j\})\cos(2\pi (x_i-x_j))$

which further reduces to

$\frac 1N \left( \binom{N+1}{2} + 2\sum_{1\leq i < j \leq n} (n+1-j) \cos(2\pi (x_i - x_j)) \right)$.

• obryant Says:

A little further manipulation (and trusting Mathematica) then gives (with $r(x) = \arctan(\sqrt{x})/\pi$):

$\omega(1) = 1$

$\omega(2) = 1/2$ (at $x_2=1/2$)

$\omega(3) = 1/2$ (at $x_2=r(15), x_3 = r(5/3)$)

$\omega(4) = 11/24$ (at $x_2=r(91/5), x_3=r(65/7)$)

(this value required some guesswork from me, so is less trustworthy than the other values)

$\omega(5) = 21/50$ (near $(x_2,x_3,x_5)=(r(299/21),-r(161/39),r(299/21))$)

• gowers Says:

Kevin, can you explain what you are doing to derive these values? You seem to be going from an abstract sequence to a concrete one. Or is there a missing comment in my moderation queue? I’ll go and have a look.

• gowers Says:

Sorry, I take that back. I had forgotten the definition of $\omega(n)$ and thought that you were defining a good multiplicative sequence.

• obryant Says:

Oops, I was working with g(4)=1, when it should merely be $g(4)=g(2)^2$. So $\omega(4)$ and $\omega(5)$ might be even lower.

• obryant Says:

$\omega(4) =1/2$ (at $x_2=x_3=1/2$)

$\omega(5)\approx 0.48013382$ (at $x_2= -0.46648767, x_3 = 0.43477207, x_5= -0.25466086$)

$\omega(6) \approx 0.48347858$ (at $x_2 = 0.46664253, x_3 = -0.44401727, x_5= 0.378109$)

• Terence Tao Says:

These are interesting numerics! The initial descent is discouraging, but it does seem to turn the corner from 5 to 6, so optimistically one can hope that it’s all uphill from here. Of course we know from the completely multiplicative sequence of length 246 that omega(n) is not going to exceed 2 for a long, long, time…

7. obryant Says:

Iif instead of taking $x_i$ to be in $[-1/2,1/2]$, we set $x_i = \arctan(y_i)/\pi$ so that, assuming only that $y_i,y_j$ are real,

$\cos(2 \pi (x_i - x_j))$

is equal to

$\frac{(1-y_i+y_j+y_iy_j)(1+y_i-y_j+y_iy_j)}{(1+y_i^2)(1+y_j^2)}$

and the problem is algebraified.

8. gowers Says:

I am very excited about Terry’s lemma, partly because it brings into sharp focus some of the tantalizing hints we were getting from the 1124 examples that a complex structure was hiding behind our $\pm 1$ examples, but more importantly because it may well be a sufficiently good realization of one of the steps in our hoped-for proof strategy: to reduce the problem to the multiplicative case.

At first I was worried that the statement one would be required to establish might be too strong and be false for character-like functions. My faulty reasoning was that the partial sum looks as though it goes above $C$ with probability that is exponentially small in $C$, so the partial sums could have a finite average. But I was confusing the drifts of the partial sums with their actual values. Let me now say why I think that character-like functions will cause no problems, by considering the familiar example where $f(3)=-1$ and for all other primes $f(p)=1$ if $p\equiv 1$ mod 3 and $f(p)=-1$ if $p\equiv 2$ mod 3.

To analyse this, let us pick a huge $N$ and then take a random number $n$ between $0$ and $3^{N-1}$. To calculate the partial sum up to $n$ we first write $n$ in base 3. Then the partial sum up to $n$ is equal to the number of 1s that multiply even powers of 3 minus the number that multiply odd powers of 3. For instance, if $n=1022121$, then we have 1s multiplying $3^0$, $3^2$ and $3^7$, so we have two even powers and one odd and the partial sum up to $n$ is therefore 1.

Now the ternary expansion of a random $n$ between $0$ and $3^{N-1}$ is a (uniformly) random sequence of 0s, 1s and 2s of length $N$. Therefore, what we obtain is the difference $X-Y$, where $X$ and $Y$ are distributed binomially with mean $N/6$. This gives us a random variable of mean zero that will be approximately normal and will have standard deviation $c\sqrt{N}$. Therefore, the partial sums will tend to have size roughly $c\sqrt{N}$ and will hardly ever go much above that. In particular, their variance (which is what Terry’s formula actually gives) is linear in $N$.

In other words, for this example we seem to have that $\omega(n)$ grows logarithmically, just as we would like. (It’s logarithmically because $N$ is logarithmic as a function of $3^N$.)

• Alec Edgington Says:

I also find this result extremely interesting. It doesn’t seem like too much of a strengthening of EDP to be plausible. It is a strengthening in two respects: in its generalization to complex numbers, and in the taking of mean values. But I think that the former is not as big a step as it seems, since extremal sequences will likely be constrained to have values in some finite set, making the problem essentially finite-combinatorial again, albeit with a rather more complex set of allowed transitions than the simple $\pm 1$ transitions on $[-C, C]$ that are allowed in the EDP. (At least, this is what my earlier experiments suggested.) And as for the second strengthening, I find it hard to imagine a multiplicative sequence whose (squares of) partial sums are unbounded but bounded in the mean, since one would think that the places where they were allowed temporarily to get big would tend to propagate ahead at multiples of themselves, and so have positive density.

9. gowers Says:

I have already written, in particular on the wiki, about how the discrepancy itself is not necessarily the best parameter to work with if one is trying to find a lower bound for the discrepancy. In the context of multiplicative functions, I would rephrase this by saying that the maximum partial sum is not necessarily the best parameter to work with if one is trying to find a lower bound for the maximum partial sum.

An obvious question now is whether the mean-square partial sum is going to be better. A priori it seems to have a lot going for it, since mean squares tend to be a lot easier to deal with than maxima, and this particular mean square does appear to be unbounded (at least if one believes that character-like functions are in some sense extremal). To make further progress, I think we should attempt to understand what happens to the mean-square partial sum if we take a multiplicative function, expand it by some factor and rotate accordingly, and fill in the gaps. (The early gaps will tend to be forced.)

Hmm … this is feeling rather promising, but I have to go for an hour or so.

10. gowers Says:

Back again, and here is a preliminary thought about how Terry’s parameter might form part of a proof. It’s a bit half-baked at the moment, and I can’t yet see whether it just needs to stay in the oven a little longer or whether the recipe is wrong, if you’ll excuse the overstretched metaphor.

We are interested in a multiplicative function taking values in the unit circle in $\mathbb{C}$, and would like to show that the average value of $|f(1)+\dots+f(n)|^2$ is unbounded. More precisely, we would like to show that the average of those partial sums as $n$ goes from 1 to $N$ is bounded below by $\omega(N)$, where $\omega$ is a function that tends to infinity.

Let’s imagine that we are trying to prove by induction that $\omega(N)$ is logarithmic in $N$. Then a natural strategy would be to try to show that $\omega(2N)$ is bigger than $\omega(N)$ by some positive constant. Now I’ll be slightly sloppy. The inductive hypothesis should say something like that the value of a “typical” squared partial sum $|x_1+x_2+\dots+x_n|^2$ is $c\log n$. What does that tell us about the value of a “typical” squared partial sum $|x_1+x_2+\dots+x_{2n}|^2$?

A natural thing to do here is to split the partial sum up into its odd part and its even part. If we do so then we get

$\displaystyle |x_1+x_3+\dots+x_{2n-1}|^2+|x_2+x_4+\dots+x_{2n}|^2$
$\displaystyle +2\Re((x_1+\dots+x_{2n-1})(\overline{x_2}+\dots+\overline{x_{2n}})).$

The first of these terms is forced to have an average that is at least some positive constant, since each time you add a new $x_i$ you change the partial sum in modulus by at least 1. (So, for instance, you could take the partial sums in pairs and argue that the average of two successive ones is always at least $1/4$.) The second one is equal to $|x_1+x_2+\dots+x_n|^2$ by multiplicativity, and therefore by induction is at least $c\log n$. So we will be done if we can somehow argue that the third term, the one involving products of even and odd terms in the sequence, averages zero in some sufficiently strong sense.

Is there any chance of that? One might argue that the even and odd partial sums ought to be negatively correlated, but I’m not sure whether that is really the case.

It’s quite possible that I am trying to prove something much too strong here, namely that if you take any multiplicative sequence, then the mean square of its partial sums is always a step bigger than the mean square of its partial sums along even numbers (or equivalently along the first half of the numbers). I can well believe that this is true for extremal cases, but it may be too much to ask for it to be true in all cases. If so, then perhaps one really is forced to develop a more detailed parameter such as the oscillation measure I tried to develop here. However, the fact that the above decomposition gives a logarithmic term plus a constant term does seem promising enough to be worth thinking about further.

• gowers Says:

In case the above argument seems so vague as to be utterly useless, let me give one of the motivating ideas behind it. It’s that it is rather similar to how one proves that character-like functions have logarithmic growth. For instance, let’s take the base-3 example with $f(3)=-1$ again. This time, instead of expanding by 2 we’ll expand by 3. So we take a typical squared partial sum $(x_1+\dots+x_{n})^2$ and we split it into its 0-mod-3 part, its 1-or-2-mod-3 part and the cross term. The 0-mod-3 part gives us a contribution of $c\log(n/3)$, the 1-or-2-mod-3 part gives us a contribution that alternates between 0 and 1, so averages 1/2, and the cross term is of the form

$\displaystyle (x_3+\dots+x_{3m})(x_1+x_2+x_4+x_5+\dots+x_k),$

where $k$ is either $3m-1$, $3m+1$ or $3m+2$. If it is $3m-1$ or $3m+2$ then we get a contribution of zero. Otherwise, we get a contribution of $(x_3+\dots+x_{3m})$.

Hmm, we’re in a slightly strange situation here. I think if we take $N$ to be a power of 9 then we can argue that the average partial sum $(x_3+\dots+x_{3m})$ is zero and we’re done. But if, for example, I’d taken the Walters example with $f(3)=1$, then all the partial sums would have been positive and they would have had a substantial average like $c\log N$. Ah, but that’s OK as it’s only negative averages that concern us.

That leaves me not that much the wiser about whether an argument along these lines is feasible. But note that a character-like function would appear to have to have partial sums with non-negative averages. (I think the average, over a suitable range, is positive if $f(p)=1$ and zero if $f(p)=-1$, as we have just observed in the base-3 case.)

• gowers Says:

Of course, we now face a familiar problem, which is that if we are talking about a general multiplicative function, then it’s not clear what the right expansion factor should be. So either we need an argument that just expands by some fixed number like 2 or 3 (so if e.g. the function was character-like to base 5 then it would still work, even though not as well as it would have if one had expanded by 5) or we need to do something like first identifying the correct “base” and then expanding by that factor. In the complex case, it’s not altogether clear that there will be some integer base, so the second strategy could be a bit of a challenge.

• Jason Dyer Says:

So we will be done if we can somehow argue that the third term, the one involving products of even and odd terms in the sequence, averages zero in some sufficiently strong sense.

This is very handwavy based on my brute force experiments, but note this is obviously true for C=1, and to some extent the sequence “wants to” be constrained by C=1. That is, I can hand-run a C=2 algorithm by assuming a C=1 algorithm and tweaking only when forced.

• Jason Dyer Says:

To put my thoughts more mathematically (but equally hand-wavy), given a sequence bounded by a particular C, consider the sums for particular n for all d that are factors. Call the number of sums equal to abs(C) to be the danger level of that n. (This is closely related to drift, but unless I am misunderstanding drift only gives the maximum possible without counting the number of ways of doing it.)

A long sequence attempts to minimize the danger level for all n, because any danger level above 0 (equivalently, if I understand the definition, n with a drift of C) causes at least one forced placement in the sequence. Minimizing the danger level of C = 2 requires the sequence be confined to such an extent that it “acts like” C = 1 in a way that would likely provide your average you desire.

11. gowers Says:

Let me consider a question I raised earlier. Given a multiplicative function $f$ (with complex values) let us define $\omega(f,N)$ to be $N^{-1}\sum_{n=1}^N|f(1)+\dots+f(n)|^2$. Thus, Terry’s function $\omega(N)$ is the minimum over all multiplicative $f$ of $\omega(f,N)$. We would like to prove that $\omega(2N)$ is always at least $\omega(N)+c$ for some positive constant $c$. What I would like to know is whether there is any chance of proving the stronger result that $\omega(f,2N)$ is always at least $\omega(f,N)+c$. This feels like too much to hope for, so I am going to spend this comment trying to find a counterexample.

It seems to me that the natural way to do this is to exploit the fact that if we are choosing the values of a multiplicative function up to $2N$, then we have complete freedom of choice for all primes between $N+1$ and $2N$, so it feels as though we should be able to keep the partial sums in the second half of the function “artificially small”, so to speak. Indeed, it seems quite possible in principle that we might manage to cook up an example where $\omega(f,2N)$ is actually smaller than $\omega(f,N)$.

Indeed, here is how it might be done. Let’s take the original Walters example $f(n)=\lambda_3(n)$, where $\lambda_3(3)=\lambda_3(3m+1)=1$ and $\lambda_3(3m+2)=-1$. Let us take that function out to $N=2.3^k$ and then try to modify it.

The partial sum up to $n$ equals the number of digits equal to 1 in the ternary expansion of $n$. Therefore, up to $3^k$ the average partial sum will be $k/3$ or so, and the same will be true between $3^k$ and $2.3^k$ (but in fact it will be greater by 1 than the first average. Moreover, the variance will itself be around $2k/9$ in both cases (using the formula $np(1-p)$ for the variance of a binomial random variable with parameters $n$ and $p$), so the mean square will be about $k^2/9$ (since $2k/9$ is much smaller than this for large $k$).

But if we modify this function by changing the first $k/6$ 1s at primes beyond $3^k$ to -1s, then we will reduce the mean partial sum after $3^k$ from around $k/3$ to around zero, keeping the small variance. So it looks very much as though the mean square partial sum goes down from $k^2/9$ to more like $k^2/18$.

Naturally enough, we will pay a price for this later (if we try to extend our new function), but that isn’t much comfort, because we were trying to find an inductive proof and the inductive step has failed.

This suggests to me (but I am ready to revise my opinion in the light of further remarks people might have) that working directly with the mean square partial sum will not be straightforward. Basically, the problem is another familiar one, which is that the best examples for any given length seem to be ones where you take a very structured example (such as a character-like function) and mess about with it at large primes or small multiples of large primes. The trouble with these examples is that they cannot be continued, but they do tend to mess up proofs.

It may be that we can get round this problem by using highly infinitary methods, since then there would not be a notion of “messing around with the last few primes”. That is, perhaps we can pass to some nice limit and then use a clean inductive argument.

I was about to write that my oscillatory parameter did not seem to have the same difficulty, but I think that perhaps it does. Indeed, there are quite a lot of primes to use between $3^k$ and $2.3^k$, so it ought to be possible to make the partial sums in the second half far more constant. For instance, one could divide it into a fairly large number of ternary chunks and make the average zero in each chunk by messing about with primes at the beginning and end of the chunk.

This feels to me like a technical problem, however, since one can only mess around with primes in the second half of a function, so to speak. I would be surprised if there were not some dodge for getting round it. (However, it might be that we can mess about enough to get the mean-square partial sums to grow sublogarithmically. I’ll think about that in my next comment.)

12. gowers Says:

I’ve thought once again about whether messing around with primes can be used to get sublogarithmic growth, and, not for the first time, convinced myself that it can’t (or at least, can’t in any simple-minded way). I’ll try to give an idea of why I think that, but I don’t have the time to give proper details.

The basic idea is to take the ternary function $\lambda_3$ (the character-like one with $f(3)=1$ — I am just using this as an example but I think the argument applies more generally) and to reduce its average partial sums by changing the values at appropriately selected primes.

By “appropriately selected” I mean that if the main interval has length $3^k$, then we choose some suitable $r and split it up into intervals of length $3^r$. Now the “base” value in each of these intervals will depend on all the ternary digits of the number apart from the last $r$ of them, so the thought is that we could subtract this base value (by changing a few primes just before the interval starts) and end up with a function that varies between $0$ and $r$ instead of between $s$ and $s+r$, where $s$ is the base value.

All we have to do to achieve this is find $s$ primes that are just below the relevant multiple of $3^r$. If we assume Cramer’s conjecture, we can do this even when $r$ is $o(k)$, but even with that huge assumption we run into trouble, as I hope to show.

The problem is that if we make a few changes at some prime, then we will also be making changes at all multiples of that prime, so there will be effects that need to be undone. And if $r$ is less than $k/2$, so that $3^r$ is less than the square root of $3^k$, then by the time you get half way along, you will be needing to find more primes just to undo the effects of earlier primes than there are integers. It may be that the effects of the earlier primes will cancel out to some extent, but one will still be left with a lot of correcting to do, and any hope of a simple-minded proof will have evaporated.

So it seems to me that there is no hope of getting anything like this to work unless $r$ is at least $k/2$, which reduces the growth by only a factor of 2.

I’m beginning to think that whatever the right parameter is, it should measure not just the partial sums up to $N$ (averaged in whatever way you like) but also in some sense the difficulty of continuing the function further.

13. gowers Says:

One more quick thought. Here’s another way of seeing that the base-3, $f(3)=-1$ character-like function has partial sums with mean squares that go up logarithmically. I’ll consider the mean square of the partial sums up to $9^k$.

We can write the function as a sum of the following functions:

1 -1 -1 1 -1 1 1 -1 0 1 -1 -1 1 -1 1 1 -1 0 …

0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 -1 0 0 0 …

and so on. The partial sums of these sequences are

1 0 -1 0 -1 0 1 0 0 1 0 -1 0 -1 0 1 0 0 …

000000000111111111000000000-1-1-1-1-1-1-1-1-1 000 …

which form an orthogonal system of $k$ sequences, all of the same size in $L_2$, resembling a system of Haar functions. (The squares of their norms, if you look at averages, are 4/9.)

The partial sum of the entire sequence is the sum of these partial sums, so by orthogonality, the $L_2$ norm of the sequence of partial sums (which is what we are measuring) is $4k/9$. Note that this is, as it should be, the sum of the variances of the (independent) random variables $X$ and $Y$ I defined in an earlier comment, which were both $2k/9$.

The obvious question that arises out of this is whether we can get somewhere in general by trying to find a decomposition of the sequence of partial sums. We do not need this decomposition to be orthogonal — just that there shouldn’t be too many negative correlations.

It seems to me that Sune’s ideas, which I wrote about above, could be relevant here, though I don’t see a direct connection. It’s just that they too concern multiplicative functions that have non-negative correlations. What is clear (though this was pretty obvious anyway) is that we are in very good shape if we can find some $k$ such that for every $m$ the sum $g(km+1)+\dots+g(k(m+1)-1)=0$, where $g(x)$ is the partial sum of the non-multiples of $k$ up to $x$, and we might still be in good shape if we could find a usable approximation to it.

Note also that a similar analysis applies to the “positive” Walters function, except that now all the functions are positive, and therefore positively correlated (and indeed, sufficiently correlated that the mean-square partial sum is quadratic in $k$ rather than linear).

14. Terence Tao Says:

I will try to write a more detailed version of this on the wiki later, but there is a slight variant of the Fourier reduction which is a little bit more efficient and which may be slightly better for some arguments.

To make the arguments below we have to work in a big “box” of rationals and then take the limit (or perhaps one could also use the adeles), or use the general theory of Fourier analysis on LCA groups; let me ignore this issue, so that I can blithely take the Fourier transform on the rationals ${\Bbb Q}$ (with the discrete topology). The Pontryagin dual $\hat {\Bbb Q}$ of ${\Bbb Q}$ is the space of completely multiplicative functions $g: {\Bbb Q} \to S^1$.

If $f: {\Bbb Q} \to \{-1,+1\}$, then by Plancherel, the Fourier transform $\hat f: \hat {\Bbb Q} \to {\Bbb C}$ has l^2 norm 1. Thus, $|\hat f|^2$ defines a probability measure on $\hat {\Bbb Q}$, i.e. it defines a concept of a random completely multiplicative function $g: {\Bbb Q} \to S^1$.

The fact that $|f(x)+\ldots+f(nx)| \leq C$ then translates (after Plancherel to the fact that

${\Bbb E} |g(1)+\ldots+g(n)|^2 \leq C$. (*)

Thus, g is a random completely multiplicative function whose partial sums remain bounded in (square)-expectation. Thus, in order to show EDP, we need to know not only that deterministic completely multiplicative functions have unbounded partial sums, but that random ones do as well, in the mean square sense.

Now if we average (*) from 1 to N we conclude

${\Bbb E} \frac{1}{N} \sum_{n=1}^N |g(1)+\ldots+g(n)|^2 \leq C$

and we now see that if we can bound the expression in the expectation for all g by some $\omega(N)$ going to infinity, then we would be done. But the point is that we don’t necessarily have to average (*) from 1 to N, and may wish to do something else instead.

My feeling is that we should now work on the model problem of whether a (deterministic) completely multiplicative function taking values in {-1,+1} has unbounded discrepancy. The hope is that any proof of this can (a) extend to complex-valued functions, and (b) extend to probabilistic multiplicative functions, and if we can pull off both (a) and (b) then we are done. Perhaps one model sub-problem here would be to prove _by hand_ (rather than by computer) that completely multiplicative functions taking values in {-1,+1} have discrepancy larger than 2.

An amusing side remark: if we assume that f ignores squares, in the sense that $f(q^2 x) = f(x)$ for all x, then the Fourier analysis lets one reduce g to take values in {-1,+1}. So if we can’t pull off (a) we still get a significant special case of EDP.

• Terence Tao Says:

I wrote this up briefly on the same page as before,

http://michaelnielsen.org/polymath1/index.php?title=Fourier_reduction

So EDP now reduces to establishing the bound

$\sup_n {\Bbb E} |g(1)+\ldots+g(n)|^2 = \infty$

for all probabilistic completely multiplicative functions $g: {\Bbb Q} \to S^1$. This generalises EDP for completely multiplicative functions in two ways: by allowing g to take values in S^1 rather than {-1,+1}, and by letting g be probabilistic rather than deterministic.

• Alec Edgington Says:

As far as I understand it, this notion of a random $g$ depends initially on a choice of two numbers ($N \ll M$), which then go to infinity. I’d like to get a better feeling for this distribution. For example, can we say anything in relation to EDP about the variances of $g(1) + \ldots + g(n)$ when $g(p)$ is uniformly distributed on $S^1$ independently for each prime $p$? (I expect this is too much to hope for.)

• gowers Says:

A very quick remark is that I think it is possible to prove by hand that a completely multiplicative function to {-1,1} has discrepancy greater than 2. But the proof I have in mind is essentially running a brute-force algorithm in a sufficiently clever way for it to be feasible by hand. This is the algorithm that I described in my EDP2 post. If I were allowed to write a proof by first describing that algorithm with complete precision and then saying things like, “If we now try f(11)=-1 then following through the implications we eventually find that the partial sum at 103 must be -3,” then I think there would be sufficiently little backtracking for the proof to fit on a page or two. If I had to describe all the implications (having to tell you the values that I had assigned to various numbers at various stages of the process, then unassigning some but not all of them as I backtracked, etc. etc.) then it would become unreadable.

In other words, I think that hand computation is feasible for this, and even a description of the hand computation, provided that the reader is prepared to believe what I say when I say that such and such a choice forces, after several steps, such and such a contradiction.

It’s even conceivable that each time a contradiction was obtained, one could go back and find a “minimal path” to that contradiction and produce a just about readable proof that went into full detail. But it would be quite tedious to do that.

I do think that one natural approach to the problem is to analyse this algorithm, because the amount that is forced really is pretty large. It could be that with a few number-theoretic assumptions one could prove that it always terminated with a contradiction. But my feeling at the moment is that that would be a difficult line of attack, so I’d like to be convinced that more theoretical approaches cannot work before putting a lot of effort into it.

Having said all that, if the general feeling is that it would be useful to have a proof of this kind on the wiki that completely multiplicative functions to {-1,1} must have discrepancy at least 3, then I’m prepared to do it. But perhaps you/Terry are/is looking for a proof that is more than just a description of what a semi-cleverly programmed computer would do.

• gowers Says:

One supplementary remark, aimed at programmers out there. In the light of various theoretical, and even algorithmic, considerations, I think that in some ways if you fix a discrepancy like 2, instead of asking for the longest multiplicative sequence of that discrepancy, it is more natural to ask for the longest multiplicative sequence for which there is no easy proof that it cannot be extended to an infinite sequence of discrepancy 2.

There are various notions of “easy proof” one can use here. The simplest is that you simply fill in all the values that are determined by multiplicativity at higher numbers and then see if you can find a string of five pluses or five minuses. A more sophisticated notion would allow deductions such as this. If you can find an even number n such that the values between n and n+7 are all determined and are + + – + + – + +, then the discrepancy must be at least 3. (Proof, if the discrepancy is at most 2, then the partial sum up to n must be 0, since it has to be even and cannot be 2 or -2 because of the pluses at n+1 and n, respectively, and that implies that the partial sum at n+7 is 3.) A yet more sophisticated notion would allow deductions such as this. If you can find an even number n such that the values between n and n+6 are determined and are + + – + + – +, then the value at n+7 must be – (by the above argument). But n+7=rs and we already know the value at r, so that gives us the value at s. And we then use information about the value at s to deduce more values, and so on.

I suspect that the longest completely multiplicative sequence that is not “doomed to have discrepancy greater than 2” has length a lot shorter than 246. But I’d be very interested to have that hunch confirmed.

• Alec Edgington Says:

In this context it may be natural simply to consider the drift. Call an interval $I \subseteq \mathbb{N}$ $N$-smooth if no number in $I$ has any prime factor greater than $N$. To prove EDP for multiplicative functions it is obviously enough to show that for any $C$ there is an $N$ such that any multiplicative $\pm 1$-valued function has drift greater than $C$ on some $N$-smooth interval.

Is this likely to be true? I’m not sure: I think so; but it would be interesting to do some computations to get a feel for it.

• Alec Edgington Says:

Ah, a moment’s thought shows that it’s no stronger than the multiplicative EDP, since $[1, N]$ is $N$-smooth. So the question is just whether one can say anything useful about the bound ($N$ as a function of $C$).

15. Terence Tao Says:

This is a rather frivolous comment, but optimising the discrepancy of a multiplicative function up to N has to me the feel of the type of slightly shady accounting one uses to balance the books in one fiscal year by introducing all sorts of borrowing gimmicks that then cause all sorts of difficulties in the next year. One can try to tamp those down by further uses of gimmicks, but at some point the whole house of cards seems to always fall apart. But to prove this sort of thing rigorously we need some sort of sophisticated accounting standards that take into account all the “borrowing” from the future one is doing. We’ve mentioned entropy in the past, and at a heuristic level at least one could imagine that something like entropy could be used to measure how many degrees of freedom we’ve already sacrificed to balance the books up to length N, and project when one should run out of room to maneuvre altogether.

• Terence Tao Says:

Anyway, if we employed a smoother weight, e.g. exponential weighting $\frac{1}{N} \sum_{n=1}^\infty e^{-n/N} |g(1)+\ldots +g(n)|^2$, then perhaps artefacts coming from the mathematical equivalent of the end of the fiscal year would be ameliorated.

The other thing that occured to me is that the expression here is basically quadratic in the g(p)’s individually (but not jointly), and so one can optimise in each independent phase g(p) easily. One can imagine using iterative methods to then numerically optimise these sorts of quantities, though unfortunately the situation is not fully convex and we may not have a guarantee that our numerical algorithm converges to the true optimum.

• Mark Bennet Says:

Isn’t there a natural coding of the information implicit in a fully multiplicative function through the Dirichelet series, which factorises by multiplicativity into a part which is known through choices at particular primes and a part which is unknown and yet to be chosen.

It is unclear to me that this codes the information in any useful way to tell whether it is messed up too much further out.

16. Steven Heilman Says:

Via the Fourier Reduction Lemma, we are curious if

$\frac{1}{N}\sum_{n=1}^{N}n^{2}| \frac{1}{n}\sum_{j=1}^{n}g(j) |^{2}\geq\omega(N).$

Applying Weyl’s criterion to the inner sum suggests an approach to this problem: understanding the possible equidistribution properties of $g(j)$. More specifically, a way to prove/disprove the above inequality is: consider whether or not a completely multiplicative $g$ has $\{Arg(g(j))\}_{j=1}^{n}$ “well equidistributed” (mod 1) as $n\to\infty$. For the inequality to hold, $\{Arg(g(j))\}_{j=1}^{n}$ cannot equidistribute “too quickly”.

17. Sune Kristian Jakobsen Says:

At the beginning of this project I noticed that if we have a quasi-multiplicative* {-1,1} sequence with bounded discrepancy, there must exist a multiplicative sequence over $S^1$ with bounded discrepancy. I think it would be of some interest to show the opposite (that given a multiplicative sequence over $S^1$ with bounded discrepancy, we can find a quasi-multiplicative {-1,1} sequence with bounded discrepancy) since that would prove that we don’t lose anything by going from {-1,1} to multiplicative $S^1$ sequences.

*Here a quasi-multiplicative sequence is a sequence of the from $x_n=s(f(n))$ where $f:\mathbb{N} \to S^1$ is a multiplicative function, and s is the function that sends 1 and numbers with positive imaginary part to 1 and all other elements in $S^1$ to -1. The multiplicative $S^1$-sequence I talk about in the above paragraph is of course f(n).

Would $\sup_n {\Bbb E} |g(1)+\ldots+g(n)|^2 = \infty$ imply EDP for general $S^1$-valued sequences? and is it equivalent to this problem?

• Terence Tao Says:

This is a good question! I think the statement

$\sup_n {\Bbb E}|g(1)+\ldots+g(n)|^2 = \infty$ (*)

is in fact equivalent to EDP in Hilbert spaces, i.e. that if $f: {\Bbb N} \to S$ takes values in the unit sphere of a Hilbert space then it has unbounded discrepancy (generalising discrepancy to Hilbert spaces in the obvious manner). This variant of EDP was implicitly raised by Tim back in his original blog post on this problem.

The proof that (*) implies Hilbert EDP follows the same lines as sketched in the Fourier reduction page on the wiki. (L^2-based Fourier analysis extends extremely painlessly to Hilbert-space valued functions.)

In the converse direction, given an example to (*), we let $\mu$ be the probability distribution of the random multiplicative function g, let H be the Hilbert space $L^2(d\mu)$, and let $f: {\Bbb N} \to H$ be the tautological function that assigns to each natural number n the function $g \mapsto g(n)$, which lies in the unit sphere of H.

I’ll add this observation to the wiki shortly.

18. obryant Says:

$\omega(N)$ is the infimum of an easily expressed rational expression with real variables.

To wit, we want to handle

$\omega(N,g) := \frac1N \sum_{n=1}^N \left|\sum_{k=1}^n g(k)\right|^2$

where $g$ is multiplicative, i.e., g(mn)=g(m)g(n), and g takes values in the complex unit circle.

Setting $g(k)=e^{2\pi i x_k}$, the multiplicativity of g becomes $x_{mn}=x_m+x_n$. We wish to handle:

$\omega(N,g) = \frac 1N \left( \sum_{1\leq i , j \leq n} (n+1-j) \cos(2\pi (x_i - x_j)) \right)$

Setting $y_k = \tan(\pi x_k)$ (and not being concerned about the possibility $y_k=\infty$), we want to handle

$\omega(N,g) =\frac1N \sum_{i,j}^N \frac{(n-j+1) \left(y_i y_j+y_i-y_j+1\right) \left(y_i y_j-y_i+y_j+1\right)}{\left(y_i^2+1\right) \left(y_j^2+1\right)}$

where the multiplicativity of g is expressed through

$y_{mn} = \frac{y_m+y_n}{1-y_my_n}$

by the identity expanding $tan(A+B)$. The denominator $\omega(N,g)$ is $N \prod_{k=1}^N (1+y_k^2)$ before any of the “multiplicative substitutions”, and

$N \prod_{p\leq N} (1+x_p^2)^{v_p(N!)}$

where $v_p(N!)$ is the largest exponent of p that divides $N!$, after all of the multiplicative substitutions (although some of those factors may cancel into the numerator). I haven’t found any nice expression for the numerators.

The point is that $\omega(N)$ is the infimum of a rational expression with real variables. The denominator of that expression is nice, and perhaps the numerator is also. I would’ve guessed that Mathematica would chew through minimizing such things quickly, but that isn’t the case. How hard is it to infimize a polynomial of several variables over the reals?

• obryant Says:

$n+1-j$ and $n-j+1$ should be $n+1-\max\{i,j\}$.

• Klas Markström Says:

Kevin,
are sure you have typed the expression correctly? I just ran a quick numerical minimization and for N=4 I get a value which is smaller than 0.5.

Doing this numerically will not determine the exact values but it will give correct upper bounds.

• obryant Says:

With the (n-j+1) factor replaced by n+1-max{i,j}, it works.

Numerically optimizing over all y_1, y_2, y_3, y_4, Mathematica finds

$\omega(4,g) \geq 5/12$.

Multiplicativity says $y_1=0, y_4 = 2y_2/(1-y_2)^2$, and using these to eliminate y_1, y_4, Mathematica symbolically (as opposed to numerically) returns:

$\omega(4,g) \geq 1/2$

• Klas Markström Says:

Then our values agree, I had a free $y_1$

With $y_1=0$ my Mathematica 7’s Minimization function finds the following numerical upper bounds for the first few values

1 1
2 0.5
3 0.666667
4 0.5
5 0.6
6 0.483479
7 0.503304
8 0.479167
9 0.447818
10 0.460806
11 0.472545
12 0.468074
13 0.585487
14 0.710418
15 0.581781
16 0.609378
17 0.526271
18 0.821528
19 0.555702
20 0.544255
21 0.636351
22 0.638662
23 0.620485
24 0.593714
25 0.580997
26 0.619162
27 0.580855
28 0.594115

• obryant Says:

What’s so disappointing about Klas’ numbers is that for N=3, they are already so bad. Setting $y_2 = -\sqrt{15},y_3=\sqrt{5/3}$ yields the upper bound $\omega(3)\leq 1/2$. I’m pretty certain that in fact that is equality: $\omega(3) = 1/2$.

As Terry noted, we can’t expect to really see the increase in $\omega$ unless we can get N up into the thousands. With numerical minimization giving answers that are off by a factor of 4/3, though, this isn’t feasible, I think.

What would be of great interest, though, is if there is a g (expressed through y_2, y_3, y_5, …) that is consistently extremal.

• Alec Edgington Says:

I think Tim showed here that the base-3 character-like function mapping 3 to $-1$, which uses $\pm 1$ only, gives $\omega(9^k) \leq {4}{9} k$, which seems to be about twice the values in this list. Perhaps a simple modification using the extra dimension could bring it down by a factor of two? (Even such a modified sequence presumably wouldn’t be extremal, because it’ll always be possible to apply tweaks towards the end of an initial segment, but on this admittedly rather meagre evidence it does seem as if it might be pretty close.)

• Alec Edgington Says:

Sorry, that should have been $\frac{4}{9} k$, of course.

• Alec Edgington Says:

Oops, I was miscalculating there! The real character-like function is actually much closer to these values already. (But we’d need more data to draw any real conclusions.)

19. gowers Says:

Suppose we decide that we want to deal with the EDP for multiplicative functions (taking values in {-1,1}). There seem to be two sorts of examples with low discrepancy. On the one hand there are functions that behave rather like random $\pm 1$ sequences, such as the Liouville function. On the other hand, there are functions that are character-like, where you choose $p$ and the value of $f(p)$, and for all other primes $q$ you set $f(q)=1$ if $q$ is a quadratic residue mod p and -1 otherwise. (Let me try to render the Legendre symbol: $f(q)=\left(\frac qp\right)$.)

These two types of example behave very differently, which makes me wonder whether in order to prove a general result about multiplicative sequences one might need to classify them somehow. In an ultra-ideal world, one might be able to say that either the function is “disordered”, in which case you get the typical square-root growth (or at least pretty fast growth), or it is “ordered”, in which case it has to be somewhat character-like, which allows one to analyse it more explicitly.

The question one then faces is how to do this classification. As a very small step in the right direction, I want to enlarge slightly the class of functions that deserves to be called character-like. The motivation for this came from thinking about what happens to a character-like function if you change it at just one prime. At first I thought that this would mess up the function completely, but I now see that that is not the case at all.

Let us write $\lambda_p$ for the completely multiplicative function that takes $p$ to 1 and $q$ to $\left(\frac qp\right)$ when $(p,q)=1$. Let’s also write $\mu_p$ for the same thing but with $p$ going to -1. Now let’s suppose that we take either $\lambda_p$ or $\mu_p$ and change it by altering the value at one particular prime $q$ and leaving the values at all other primes unchanged. What effect does this have on the function and its partial sums?

First, the effect on the function. What we are doing is multiplying it (pointwise) by the completely multiplicative function $\nu_q$ that takes the value -1 at q and 1 at all other primes. This function deserves to be called character-like as well: it’s just that at numbers coprime to $q$ we take the trivial character rather than the Legendre character. Since it is has a strong periodicity property (that is, it is periodic except at multiples of $q$, which themselves are periodic except … etc.), when we multiply $\nu_q$ by $\lambda_p$ or $\mu_p$ we should get a function that has some kind of periodicity property with period $pq$. And indeed we do. On all numbers coprime to $pq$ we get periodicity (with period $pq$), and more generally if we fix $a$ and $b$ and look at all numbers of the form $p^aq^bm$ such that $(m,pq)=1$, we get a sequence (with gaps) that looks just like the sequence along numbers coprime to $pq$. (Or rather, we get plus or minus that sequence.)

These sets partition $\mathbb{N}$, and at most $\log_pn \log_qn$ of them intersect the set $\{1,2,\dots,n\}$. This tells us that the partial sums grow at worst like $C_pC_q\log_pn\log_qn$, or in other words proportional to $(\log n)^2$, with a constant of proportionality that one expects to go up like $\sqrt{pq}$ or so.

More generally, if we do things like multiplying $\lambda_p$ by $\lambda_q$, we will get other functions that have pretty good log-squared type behaviour. So we do not have a straight dichotomy between log on the one hand and square root on the other.

I have not tried to find a deep connection, but I cannot help being reminded of Littlewood’s problem about minimizing the $\ell_1$ norm of the Fourier transform of the characteristic function of a set of integers of size $n$. There, it turns out (highly non-trivially) that the smallest possible norm is $C\log n$, achieved when the set is an arithmetic progression. As for “random” sets (under various sensible notions of randomness), they give square-root-like behaviour. But again there isn’t a log/root dichotomy because you can take two-dimensional progressions (which from a certain point of view — I won’t go into it right now but some people reading this will know what I mean — are rather similar to products of two one-dimensional progressions) and get bounds of $C(\log n)^2$.

What is known about Littlewood’s problem is actually slightly stronger. Suppose you have a real-valued (actually, maybe it can be complex-valued as well) function $f$ and a set $A$ of integers of size $n$, such that $|f(n)|\geq 1$ for every $n\in A$ and $f(n)=0$ for every $n\notin A$. Then $\|\hat{f}\|_1\geq C\log n$. So there’s a wild idea: can anyone take a completely multiplicative function and build out of it a function that’s large on some set and zero outside that set, and relate the partial sums of the first function to the $\ell_1$ norm of the Fourier transform of the second?

• gowers Says:

I have one small additional remark to follow on from my previous comment, which is that if you take a character-like function such as $\lambda_p$ or $\mu_p$ and multiply it by a product $\prod_{q\in Q}\nu_q$ for a sufficiently sparse set $Q$ of primes, then you can get slow growth rates that are nevertheless faster than any power of $\log n$. The idea is that you start with $\lambda_p$ itself, which gives you logarithmic growth. You let it chug along for a while, but you then multiply by $\nu_{q_1}$ for some large prime $q_1$. At this point the growth becomes proportional to $(\log n)^2$. You then let that chug on for a while (a very long while if you want) before multiplying by $\nu_{q_2}$ for some $q_2$ that’s much bigger than $q_1$, and so on.

20. Jason Dyer Says:

Rough thoughts, but following the polymath spirit:

Just to continue on the danger level definition above (given a sequence bounded by a particular C, consider the sums for particular n for all d that are factors. Call the number of sums equal to abs(C) to be the danger level of that n.) it’s useful to separate into what I’ll call positive danger level (sums equal to C) and negative danger level (sums equal to -C), noting that if for some n both the positive and negative danger levels are greater than zero, then the discrepency must be larger than C at n + k where k is one of the factors of n.

This suggests some sort of pigeonhole principle-like argument where one takes a point n with a high postive danger level and a point m with a high negative danger level such that (letting k_1, …, k_i be the factors of n and l_1, …, l_j be the factors of m) there exists some a and b where n + k_a = m + l_b such that the positive and negative danger levels are greater than 0.

For example, in the C = 1 case, note in the forced sequence + – – + – + + – + how the 5th term has a negative danger level of 2, forcing 6 and 10 to be +, and how the 9th term has a positive danger level of 2, forcing 10 or 12 or 18 to be -. In this case, 10 is the forced term, so that’s where the sequence “breaks” and has both positive and negative danger levels greater than zero, causing a future term (12) to make a discrepency of 2. Unfortunately, this isn’t quite pigeonhole (taken entirely generally, 12 and 18 could be the forced terms off of the 9th term) but it seems plausable to prove somewhere in the sequence the forcing has to happen.

This argument so far has ignored the multiplicative aspects entirely, which when added should cause even more forcings to happen (in hopefully a general enough way for the pigeonhole argument to work out).

• Jason Dyer Says:

Sorry, I meant that there exists some a and b where n + k_a = m + l_b that is forced to be both positive and negative if a discrepancy of C is to be maintained.

• Jason Dyer Says:

Quick example of what I’m meaning: given a C, and a positive danger level function $L^{+}$ and a negative danger level function $L^{-}$, if there exists a prime p with a $L^{+} \geq 2$ such that there is a prime 2p-1 with $L^{-} \geq 2$, then the discrepancy is greater than C.

This is because each prime p has p+1 and 2p as the only forced positions and each prime 2p-1 has 2p and 4p-2 as the only forced positions, so the position 2p would require both a + and a – to stay within the discrepancy.

21. Sune Kristian Jakobsen Says:

I have written a page on the wiki about (an algorithm very close to) Tim’s algorithm for finding long multiplicative sequences with bounded discrepancy, or proving that they don’t exist. I hope it is possible to make all the tricks Tim uses by following the step in the algorithm.

I just realized that I didn’t formulate is as an algorithm but as a set of legal step, but you could turn it into an algorithm: Just write a program that make as many step from “using multiplicity” as possible, then “using bounded discrepancy” and alternate between these two, until it is stuck. The guess, and continue by alternating between using multiplicity and discrepancy.

The reason a wrote the page was that I wanted to discuss what Tim called an easy proof, or more generally: how complicated a proof of this type is. So here are some possible definitions of “simple proof”s of the fact that a given sequence cannot be extended to an infinite sequence of discrepancy C (the sequence could be the empty sequence so we are measuring how complicated it is to prove that no sequence has discrepancy bounded by C):

The proof uses only multiplicativity and then discrepancy (so we don’t use multiplicity after discrepancy, and we don’t guess).
The proof alternate between using multiplicativity and discrepancy (we could measure the complexity by how many times it alternate)
The proof uses all three types of reasoning, but with a limitation on the complexity of the proof used when you guess. Eg:
After a guess you can only use multiplicativity once and then discrepancy before you reach a contradiction.
After a guess you can’t take another guess before you have reach a contradiction.
The proof only used at most G nested guesses.

• Alec Edgington Says:

It would be particularly interesting to know, of all these possibile notions of a ‘simple proof’, which is the weakest that still looks like it may be ‘good enough’. My first thought when Tim raised the question was to use only multiplicativity to get unbounded drifts on intervals, but I don’t think this is good enough: one needs high concentrations of odd-smooth numbers for the proof to work, and these are rare. (To get unbounded drifts, one doesn’t need contiguous blocks of pluses or minuses: one just needs a bit more than half of the same sign on a sufficiently long interval; I’m not quite sure this is impossible, but it still requires a high enough density of odd-smooth numbers in enough places, and feels unlikely.)

Using multiplicativity followed by bounded discrepancy, your first option, looks more promising. I notice that we can easily use it to show, without looking beyond $9$, that if the function has discrepancy $2$ it must take $2$ to $-1$. This is a good start.

• Sune Kristian Jakobsen Says:

“I notice that we can easily use it to show, without looking beyond 9, that if the function has discrepancy 2 it must take 2 to -1.”
I think + + – + – – – + + is an example of a length 9 multiplicative sequence with discrepancy 2, so your proof shouldn’t be possible.

• Alec Edgington Says:

Oops, you’re right, it isn’t!

• Sune Kristian Jakobsen Says:

By the way: Remember, that if you prove $x_2=-1$ by assuming that $x_2=1$ and then reach a contradiction, you are actually guessing (using the definition I wrote on the wiki). I don’t think it is possible to conclude that a multiplicative sequence can’t have discrepancy 2 without guessing. Still the very strict definitions of simple proof is relevant to the problem Tim wrote about.

Here is a proof (without guessing) that a multiplicative sequences can’t have discrepancy 1:

M: 1+, 4+, 9+, (that is: using multiplicativity we get $x_1=1, x_4=1, x_9=1$)
D: 2-, 3-, 10-
M: 5+, 6+

I think that the number of guesses (and nested guesses) needed increases rapidly (exponentially, perhaps) with C.

• gowers Says:

I like Sune’s economical formulation of the algorithm on the wiki. I suppose one additional small remark could be that one should fix an $N$ beyond which one will stop looking for implications.

Out of curiosity I want to see how long it takes to prove that f(2)=-1 if f has discrepancy 2. Sune’s values are completely forced, since the pluses at 1,2,4 imply minuses at 3 and 5 (by discrepancy) and pluses at 8 and 9 (by multiplicativity), and then at the next round a minus at 6 (by multiplicativity), which forces a minus at 7 (by discrepancy). In fact, that minus at 7 didn’t need the minus at 6: from the values up to 5 we know that the partial sum up to 6 is 0 or 2, so discrepancy will force 7 to get a minus regardless of the value at 6.

Continuing, we get minuses at 10, 12 and 14, and pluses at 15 and 16, by multiplicativity. Let’s write what we have so far. I’ll group it in tens for ease of reading. An “at” symbol denotes a value not yet chosen. Note that the partial sum up to 10 is 0.

+1 +1 -1 +1 -1 -1 -1 +1 +1 -1
@ -1 @ -1 +1 +1

This tells us that at least one of 11 and 13 go to +1. I’d like to avoid backtracking if I possibly can, so let me continue filling in the values I know.

+1 +1 -1 +1 -1 -1 -1 +1 +1 -1
@ -1 @ -1 +1 +1 @ +1 @ -1
+1 @ @ -1 +1 @ -1 -1 @ +1

There’s just too much that is not forced there, so it feels as though a guess is needed. (I’m not sure about that unless I check further.) So let’s try sticking in f(11)=+1. We get

+1 +1 -1 +1 -1 -1 -1 +1 +1 -1
+1 -1 @ -1 +1 +1 @ +1 @ -1
+1 +1 @ -1 +1 @ -1 -1 @ +1
@ +1 +1 @ +1 +1 @ @ @ -1

By parity, the partial sum up to 30 is either 0 or 2. By discrepancy, it follows that the values at 31 and 34 must be -1 and the partial sum up to 30 must be 0. By multiplicativity, the value at 17 must also be -1. Also, the partial sum up to 36 is 2, so the value at 37 is -1. Filling these in gives us

+1 +1 -1 +1 -1 -1 -1 +1 +1 -1
+1 -1 @ -1 +1 +1 -1 +1 @ -1
+1 +1 @ -1 +1 @ -1 -1 @ +1
-1 +1 +1 -1 +1 +1 -1 @ @ -1

Unfortunately, I have to do something else. So I leave this comment half finished as a demonstration that to prove that f(2)=-1 one has to look further ahead than 40, and probably has to do at least one backtrack (but I haven’t established that the backtrack is necessary — I may just not have looked forward enough).

• Terence Tao Says:

I started a page for the wiki for Tim’s argument at

http://michaelnielsen.org/polymath1/index.php?title=Human_proof_that_completely_multiplicative_sequences_have_discrepancy_at_least_2

starting from f(2)=+1 and filling in as many values as are forced. Maybe someone can see the simplest way ahead from here. I’ve tried to fix a standardised notation that is easy to work with.

• Terence Tao Says:

I managed to start with f(2)=+1 and end up filling in a sequence of length 100 with discrepancy 2, so one really has to look quite far ahead to get a contradiction. The solution is on the above mentioned page.

I found one useful tool was to divide the integers into “blocks”, with a “cut” between n and n+1 whenever f(1)+…+f(n) was provably equal to zero. Then the sum of the f’s inside each block had to be zero, and the discrepancy within each block had to be at most 2.

Note that whenever f(2n) was known to equal f(2n+1) in a discrepancy 2 sequence, then there must be a cut between 2n and 2n+1, because f(1)+…+f(2n) is even and is thus either 0, 2, or -2, and the latter two possibilities are then excluded. This observation leads to a lot of cuts which simplifies the analysis a lot. (The discussion on the wiki page is not optimised for this, because I didn’t realise this trick until much later.)

• Terence Tao Says:

An amusing observation: if we have f(n)=f(n+1) for a complete multiplicative sequence of discrepancy 2, then f(4n^2+4n)=f(4n^2+4n+1) and so f(1)+…+f(4n^2+4n)=0. Thus, for instance starting from the assumptions of f(2)=+1, we get a cut at 120, 168, 287, and 960. We can also iterate the map n -> 4n^2+4n as many times as we wish, so in principle we have some control on f(n) for arbitrarily large n.
No idea how to use this information though.

• Terence Tao Says:

Ah, here’s one way to use this information. Starting with what one can deduce from f(2)=+1, one has

f(15)=+1, f(16)=+1, f(18)=+1

which shows that f(17), f(19) can’t both be +1. Similarly we have

f(168)=+1, f(169)=+1, f(170)=-f(17), f(171)=f(19)

which shows that -f(17),f(19) can’t both be +1. This gives a somewhat non-trivial proof that f(19) = +1 assuming only f(2)=+1!

• Terence Tao Says:

Er, I meant f(19)=-1, of course.

• Terence Tao Says:

Another unusual deduction (these are computer assisted, by the way – I got a computer to reduce f(n) for all n < 1000 based on the data I know), again starting from the hypothesis f(2)=+1 and discrepancy 2.

Suppose f(59)=+1. Then f(118)=f(120)=f(121)=1 and f(119)=-f(17), so f(17)=+1. Also, f(57)=f(59)=f(60)=1 and f(58)=f(29), hence f(29)=-1.
Next, f(15)+…+f(22) = f(11)+2, hence f(11)=-1; and then f(1)+…+f(13) = f(13)-2, so f(13)=+1. But then f(1)+…+f(23)=f(23)+2, so f(23)=-1. But then f(1)+…+f(29)=-3, contradiction.

Thus we must also have f(59)=-1.

• Terence Tao Says:

Gah. It’s hard to make further progress now… the game board now looks like a random 3SAT. If anything is to come of this, I think one will have to restrict attention to various “islands” of very smooth numbers; the general sea of relations here looks structureless.

• Terence Tao Says:

Suppose a, b solve Pell’s equation 2a^2-1 = b^2. If f(2)=2, then we have f(2a^2) = f(2a^2-1) = +1, and f(2a^2-2) = f(a-1)f(a+1). The four numbers f(2a^2-2),…,f(2a^2+1) can’t all be +1 (for parity reasons), so we conclude that at least one of f( 2a^2+1 ) and f(a-1)f(a+1) must be -1.

Again, don’t know how to use this… but perhaps some more general version of Pell’s equation could be helpful here? Unfortunately, probabilistic heuristics suggest that we are unlikely to get more than two adjacent near-squares by this method.

• Jason Dyer Says:

I started but didn’t finish extending the multiplicative values, so I posted what I put in so far on the talk page.

• Jason Dyer Says:

Quick deduction (I am dropping using f, I hope you don’t mind):

109 is clearly + (otherwise there’s a sequence of 5 -s).

Consider 101 and 103.

Both can’t be + because the sum at 103 would reach 3.

Both can’t be – because then at 105 the sum would reach -3.

Therefore one has to be + and the other -, so the sum is 0 immediately before 105 and 0 immediately before 107.

Since the sum is 0 before 107, 107 must be + (otherwise the sum is -3 at 111).

• Jason Dyer Says:

I have extended deductions up to 159. I was able to use the pair-of-primes-can’t-be-both-the-same deduction again, and then an interesting deduction where since 71 and 73 have to be alternating so do 142 and 146 (locating another spot where the sum is 0).

• Jason Dyer Says:

Some quick checking indicates deductions to the end in the same manner should go smoothly, but I need to take care of all my other work obligations for today so someone else is welcome to finish things off.

• Terence Tao Says:

Ack, I made a sign error halfway through the calculations. This is not a good way to proceed – it’s too unstable. One could imagine having fun with a sort of “Minesweeper” type applet where one makes various assumptions and the computer assistant helpfully propagates the information along the natural numbers using multiplicativity, discrepancy, and cuts until a contradiction is reached and one “blows up”, in which case one records that part of the tree as being eliminated. I wonder if this would be hard to code…

• Jason Dyer Says:

You are meaning when you were finishing the proof? Is anything on the wiki wrong?

• Terence Tao Says:

Well, there are two errors I spotted so far (still on the wiki): firstly, after deducing f(41)=-1 I wrote f(82)=+1 which was a mistake, it should be -1. Also the proof I gave before that f(59)=+1 was incomplete, because it is not immediately contradictory that f(57)+…+f(60) is equal to 4.

But perhaps more seriously, every error discovered earlier in the tree tends to mess up a lot of delicate arguments later down the tree, and updating the game board by hand may also introduce its own errors. I’m thinking that to do this properly, we will need computer assistance (or at least computer verifiability) of some sort. Of course at some point this may devolve into just replicating the computer proof that no completely multiplicative discrepancy 2 sequences exist, so we somehow need to do this in a way that preserves our ability to see patterns and gain intuition as to how such a search might scale to higher discrepancies.

One does have a sense though that this sort of brute-force combinatorial checking pretty soon runs up against a “combinatorial explosion” in which there are not enough short relations between f(p)’s, and one eventually has to solve a general k-SAT problem, which does not look tractable. One may have to look for “islands” of structure in which one can do something, or perhaps work with some averaged statistic of the f(n)’s. (Passing to subsequences no longer seems to help in the completely multiplicative case, one just gets back the original sequence up to a sign change. But perhaps other averaging may help.)

• Jason Dyer Says:

The 82 error is easily fixable, just change the signs of 82 and 83 (and the work we have so far hasn’t used them past that).

I did use f(59)=+1, though. Is the proof fixable or do I need to go back to that point and assume only that 118 and 122 are alternating?

• Jason Dyer Says:

…and then I do a ‘gah’ of my own as I see that your logic at 79 depends on 82, so the fix isn’t as easy as I thought.

• Terence Tao Says:

Yeah, this is the kind of thing that made me realise that maybe doing everything by hand here is not the optimal strategy.

One can imagine though that if we could semi-automate this type of deduction tree process for, say, tackling the full discrepancy 2 problem, we could make it into some sort of online “game” where players could compete to prune as much of the search tree as they can by applying computer-verifable (and computer-recorded) deduction rules. That would be quite an amusing type of proof. (The next step would be to encode this game into a CAPTCHA, and let the spammers solve the problem for us…)

• Terence Tao Says:

One observation here: the notation I have been using has been following a “1SAT” approach, in which I identify all the information available about f(n) for a single value of n: +, -, or ?, and also to identify similar information for partial sums f(1)+…+f(n).

Jason’s arguments indicate that it may be more profitable to take a “2SAT” approach in which one looks at information about pairs f(n),f(m), e.g. that they can’t both be +1, or that they are equal, etc. This information can be displayed by a kind of graph with the integers n, m as vertices and each piece of 2SAT information encoded as an edge. These edges can propagate in various ways but it is combinatorially reasonable to explore them.

The thing to avoid is having to deal with “3SAT” information involving f(n), f(m), f(r) for three different values of n, m, r (e.g. that these three values can’t simultaneously be +1). Trying to chase down the logic here is famously known to be an NP-complete problem, the decision trees grow exponentially fast. One could almost formalise Tim’s notion of an “easy proof” as one which only uses 1SAT and 2SAT type arguments.

[One has to make an exception for identities such as multiplicativity f(nm)=f(n) f(m), which strictly speaking is a 3SAT statement, but one which doesn’t require splitting into cases and so is still combinatorially manageable.]

• Sune Kristian Jakobsen Says:

“Jason’s arguments indicate that it may be more profitable to take a “2SAT” approach in which one looks at information about pairs f(n),f(m), e.g. that they can’t both be +1, or that they are equal, etc.”
I’m not sure I agree (but I haven’t read all you arguments in Human proof … , so I might have missed something). To be more precise, my claim is that Jason’s arguments don’t tell anything, the algorithm described here couldn’t have told us. But of course Jason’s arguments might be easier for a human to find. (I don’t think they are easier for a computer).
Here is a part of the table, where we want to prove that 107 is +:

-|? + ? – – + ? – + 100-109
– – – ? + – + – – + 110-119

Here is how is would have done it: PPS (possible partial sum) at 100 is {0}, at 101: {-1,+1}, 102 {0,2}, 103 {-1,1}, 104 {-2,0}, 105 {-1}, 106 {0}. This tells us that the partial sum at 106 is 0.
PPS at 112: {-2,0,2}, 111: {-1,1}, 110 {0,2}, 109 {1}, 108 {0}, 107 {1}, this tells us that 107 is +.

• Sune Kristian Jakobsen Says:

Now I see why Jason’s arguments might be more powerful. I don’t think I can do this using the algorithm, without guessing.

• Mark Bennet Says:

The case f(2)=f(37)=+1, f(13)=-1 ends with a sequence of five +1s at values 119, 120, 121, 122, 123.

22. obryant Says:

Every multiplicative function has a drift of at least 5.

I’ve put on the wiki a table of prime assignments and (for each assignment) an interval that must have a substantial drift. As a consequence, we now know that every multiplicative function must have a drift of 5.

For the drift 6 computation, my computer crashed twice, my daughter logged me out to tend to her virtual horse, and we had a power outage. Perhaps an infinite sequence of drift 6 is traveling through time to prevent its discovery.

• Alec Edgington Says:

This is interesting. Was there a theoretical or heuristic justification for choosing the cut-off $N= p^2 +1$, or was this just a convenient way to keep the computations within bounds? Do you think that if you increased this cut-off you could get away with searching fewer primes?

• obryant Says:

I initially was going to $2^{18}$, but it was taking too long, and the results for smaller drifts indicated that 3p or 4p was all that was needed, so $p^2+1$ was a way of going “more than enough”. We know of one sequence, though, where it isn’t more than enough, and that’s the mod 9 multiplicative sequence. Not coincidentally, that sequence (and ones that agree with it at most primes) is where the drift 6 computation stalls out.

• obryant Says:

A more interesting collection of data would be this. For each interval, find all assignments for the values at primes that guarantees a large drift on that interval. In particular, which intervals allow you to get large drift with a large proportion of assignments?

For example, the interval [31,32] has drift 2 with probability 1/2 (both 2 and 31 need to be plusses, or both minuses), and that’s as large a probability as can be for an interval of length 2 and a drift of 2.

23. gowers Says:

I have a number of observations and also a lot to do, so let me give one or two and save the rest for later.

First, I wanted to think again about what might happen with an algorithm such as the one we are considering for multiplicative functions, but I wanted to think about the case of largish $C$, so that nothing that happens for small numbers is allowed to be used any more. How might a proof go?

It’s tempting to reason as follows. Suppose the values have been chosen at all primes up to $n$. Then if we pick a number $A(n)$, let us estimate how many values between $nA(n)$ and $n(A(n)+1)$ have been determined by multiplicativity. To do that, we could think about how many have not been determined. A very crude estimate would be to subtract $n/\log n$ for the primes, $n/2\log n$ for numbers of the form $2p$, and so on up to the point where we have chosen the values at the primes. That gives us that the number of points where we can still affect anything is at most $(n/\log n)(1+1/2+\dots+1/A(n))$, or about $n\log(A(n))/\log n$. Therefore, if $A(n)=n^{o(1)}$, we have lots and lots of intervals of unbounded length $1/o(1)$ where all the values are determined.

At first it is hard to believe that we could possibly control the drifts in every single one of these intervals, and one might guess that the behaviour inside them was fairly random, and therefore that one could get a proof that way. But then one thinks of the character-like examples — this is a point that Sune made some time ago — and they seem to ruin all attempts at proofs. The reason is that the drifts in intervals become highly correlated if a function is character-like.

All this convinces me more and more that we are going to be forced to prove a kind of inverse theorem for multiplicative functions, which says that if we cannot get a proof to work by arguing that the function is like a random function, then it has to be character-like (in some possibly generalized sense). When I get the time, I will give a few suggestions of the kind of way that such a statement might be made precise.

24. Sune Kristian Jakobsen Says:

I gave some possible definitions of simple proof, motivated by a comment by Tim where he asks for “the longest multiplicative sequence for which there is no easy proof that it cannot be extended to an infinite sequence of discrepancy 2”. So I want to talk a little about how to show that no easy proofs exist (that a given sequence can’t be extended and still have discrepancy C).

By the way: I think we should have a name for these proofs using the algorithm, so we don’t confuse them with general proofs.

Here is a proof that no infinite multiplicative sequence can have discrepancy 1:

M: 1+, 4+, 9+,
D: 2-, 3-, 10-
M: 5+, 6+

This is minimal in two ways. The first is in the largest number we consider: 10. It is not possible to show that no infinite sequences can have discrepancy 1 by only looking at numbers less than 10. To show this, I only need to find at multiplicative sequence with length 9 and discrepancy 1:
+ – – + – + + – +
The other way, in which this is minimal is in the number of alternation. It is not possible to reach a contradiction by only considering M,D,M:
After considering M first time, we can only know that perfect squares must be send to +.
After M,D, we only know that n is send to + if it is a perfect square and to – if it is on the form (2m)^2-1 or (2m+1)^2+1. So all the number we know must be send to + are 0 or 1 mod 4, and the numbers send to – are 2, 3, or 7 mod 8. Because all the numbers =0 (mod 4) we know are send to + are squares, and therefore have 2 to an even power in its factorization, we can’t reach a contradiction by using only multiplicativity.

This was a proof that we needed some number of alternations between M and D, but how do we proof that we need to guess? We can proof this for the case: Proof that an infinite multiplicative sequence can’t have discrepancy 2.
Using M we know that all perfect squares are send to +. We can’t use M to conclude anymore because no matter what value we assign to any term, we can extend the sequence to a multiplicative sequence: If we assign a +, we simply choose +,+,+,…, and if we try to send n (a non-square) to -, we choose a prime with an odd power in n’s factorization, and send it to -, the rest of the primes is send to +.
And we can’t use D, since no matter what what value we assign a term, we can extend the sequence to a sequence with all partial sums bounded by 2 in absolute value. (I won’t give a proof). This shows that we have to guess.

In general, if we have an infinite sequence over {-,+,?} and we want to show that we can’t reach a contradiction without guessing, we can do it this way: Substitute some of the “?” with “+” or “-” (the result doesn’t have to be reachable by M and D), and show that it doesn’t contain any contradictions, and that neither M not D can give you any more terms.

How do you show that we need nested guesses? or just more than one guess?

• Jason Dyer Says:

When there are multiple sequences generated, one of the popular ways of having more than one is to have two primes between sums of zero, like this:

(sum of zero) . . . prime . . . prime . . . (sum of zero)

such that the primes have to be alternating sign, but it is unknown which is which. The zero-block phenomenon suggests to me that a guess is neessary here if we’re only thinking in terms of guessing in single positions.

What made my deductions above more powerful was I took both primes as a group and tried all 4 possible guesses (+ +, – -, + -, – +) so that it narrowed down to two (+ – and – +) which was sufficient to find where the zero block ended.

25. Terence Tao Says:

I had a quick thought: if we assign values to a completely mult. function randomly, we can try to use the Lovasz local lemma to show that short drifts, at least, are all bounded with nonzero probability, because most of the short drifts are going to be independent of each other. Have not tried to make this quantitative yet…

Another unrelated thought, more of an amusing nature than anything else: a completely multiplicative function g: N \to \{-1,1\} with bounded discrepancy behaves sufficiently like a Dirichlet character that one can mimic a number of classical elements in classical analytic number theory. For instance, one can show that $L(1,g) := \sum_n \frac{g(n)}{n}$ exists and is non-zero (e.g. by computing the sum $\sum_{n,m: nm < N} \frac{g(n)}{\sqrt{nm}}$ using the Dirichlet hyperbola method, taking advantage of the fact that the Dirichlet convolution 1*g is non-negative, and at least 1 on square numbers). This also implies that $\sum_p \frac{g(p)}{p}$ converges. In fact I think one can even mimic the full proof of the prime number theorem in AP to show that $\sum_{p, thus g(p) is unbiased. (It is tempting to conjecture a GRH for bounded discrepancy mult. functions, so that the error would be $O_\epsilon(N^{1/2+\epsilon})$, but presumably the easiest way to prove such a GRH is to show that bounded discrepancy mult. functions do not exist. 🙂

• gowers Says:

If one is concerned just with the existence of the function, then one does pretty well just with partial summation. We have that

$\displaystyle \sum_n g(n)n^{-s}=\sum_n (n^s-(n+1)^s)\sum_{m=1}^ng(m)$

which is bounded above by $C\sum_n|n^{-s}-(n+1)^{-s}|$. If $s$ is real, then this gives us a uniform bound. In general, a reasonably good estimate for $n^{-s}-(n+1)^{-s}$ is $sn^{-(s+1)}$, which will have a sum that converges absolutely when $s$ has real part greater than 1.

It’s tempting to think about the Euler product formula, but that doesn’t obviously equal what it should when $s$ has real part less than 1. But what you write about being able to say things about $\sum_pg(p)/p$ makes me start to wonder whether it is after all possible to do something clever and say that the Euler product formula is valid if interpreted properly.

• Terence Tao Says:

Well, to sum anything that decays slower than $\sum_p g(p)/p$, one would need a zero-free region for $L(s,g) = \sum_n g(n)/n^s$ to the left of s=1. I think there is a good chance that the classical zero-free region for Dirichlet L-functions can be extended to this case (basically anything about characters which doesn’t go through the functional equation should be OK) but clearly there are well-known difficulties in going much beyond this.

• Anonymous Says:

A theorem of Wintner says that for a completely multiplicative function f with |f(n)|=1, the limit of $1/x \sum_{n < x} f(n)$ is zero if and only if $\sum_p |1-f(p)|/p$ diverges.

Alec noted earlier that $\sum x_n n^{-s}$ converges for $Re(s)>0$ if

$\limsup \frac{\log |x_1+ \dots + x_n}{ \log n} =0.$

26. gowers Says:

I think what I was saying about taking a character-like function and changing it at just one prime has some relevance to the question of what can be done with just multiplicativity and small discrepancy. Suppose we take the function that I am now calling $\mu_3$ (multiplicative, takes 3 to -1, and other primes to $\pm 1$ according to whether they are $\pm 1$ mod 3). And suppose we write this function out all the way up to $9^k$ and now consider what we can deduce. I’m hoping I can actually say, in this instance, exactly what follows from multiplicativity and discrepancy. Let’s suppose we are aiming to keep the discrepancy below k+1 (so our first danger point is coming up fairly soon, at $9^k+9^{k-1}+\dots+9+1$).

Multiplicativity will allow us to fill in lots more values, and every value filled in will be given by $\mu_3$, obviously enough. Now by the time we get anywhere near a partial sum of $\pm k$ again, there will have been lots of gaps, so unless I am much mistaken we cannot use the discrepancy condition to get us any more information. So it seems that we are forced to guess. What happens if we guess? Well, if we choose the correct value at some prime $q$ then I’m pretty sure that the same argument will apply: just fill in what you can using multiplicativity, after which you’re stuck once again.

But it’s worse than that. I think even if you fill in the wrong value at $q$, the drifts are all so small that at this stage there is no chance of the algorithm using the discrepancy condition to see that you’ve made the wrong choice. Indeed, it seems to me that one has to make a number of guesses before one will run into direct problems.

So it looks to me as though it might be possible to prove rigorously that no “intelligent brute force” algorithm will be able to avoid a lot of backtracking, which in turn suggests to me that to analyse such an algorithm theoretically (and thereby prove that multiplicative functions have unbounded discrepancy) may be a hopeless task.

27. Terence Tao Says:

Incidentally, the Fourier reduction is quantitative enough that any lower bound of the form

$\sup_n {\Bbb E} |g(1)+\ldots+g(n)|^2 > C^2$

will imply that functions $f: {\Bbb N} \to \{-1,+1\}$ have discrepancy greater than C. In particular if one could get $\omega(N)$ greater than 4 for some N, one could totally eliminate discrepancy 2 sequences. This may eventually be doable, but I don’t think our current combinatorial approach to studying discrepancy of multiplicative functions is there yet.

• Sune Kristian Jakobsen Says:

Would this give us any information on how long a discrepancy 2 sequence could be?