EDP6 — what are the chances of success?

I thought it might be good to have a post where I tried to take stock and assess the chances that Polymath5 will actually result in a solution to the Erdős discrepancy problem. But before I talk about that, I want to make the point, which I have made before, that solving the problem one sets out to solve is not a necessary condition for the attempt to count as worthwhile. There is easily enough material on the blog now for it to be possible to base a publishable paper on it, even if quite a lot of that paper ends up being a combination of a survey about the problem and a report on the interesting phenomena that have become apparent as a result of computations, some of which we understand reasonably well and some of which are still rather mysterious.

I am not saying this in order to hint that I think that it might be time to give up. At the moment it feels to me as though we still have momentum, so my enthusiasm is not flagging. What is mathematical momentum? I might define it as follows: an assault on an open problem has momentum if one’s understanding of the problem is continuing to develop. Right now I feel as though with each passing day I understand things about EDP that I did not understand before — and I presume that applies to everyone else too. And for as long as that is the case, one can never quite be sure that the problem won’t suddenly yield.

After Terry’s reduction of the problem to one about multiplicative functions, we have devoted almost all our attention to them. (Almost, but not quite: Klas and Alec are still trying to find a sequence of discrepancy 2 and length greater than 1124. It looks as though 1124 may be the upper bound though.) At this stage, I think it is fair to say that the main thing we are doing is playing around in multiplicative-function space, trying to get a feel for what it looks like. But this playing around has a serious purpose, as I shall now explain.

In my EDP4 post, I wrote about how there seem to be some very different kinds of multiplicative functions, which have unbounded partial sums for very different kinds of reasons. On the one hand you have functions like the Liouville function $\lambda$, which behaves fairly like a random function and therefore has partial sums that grow like $n^{1/2+o(1)}$ (if you believe the Riemann hypothesis, and at least that fast unconditionally). On the other, there are functions like $\lambda_3$ and $\mu_3$, which have a great deal of additive structure (as well as their multiplicative structure) and give rise to logarithmic growth. Because of this, it felt as though one might be forced to classify multiplicative sequences to some extent. Some of them would be “similar to character-like functions” and would have unbounded partial sums for similar reasons to the reasons that apply to $\lambda_3$ and $\mu_3$, while others would be “not similar to character-like functions”, which would cause them to behave sufficiently randomly to have unbounded partial sums for that reason.

It is still conceivable that some kind of strategy like that could work, but certain difficulties with it have come to light. At one point, I had the following serious anxiety about the approach. The only proof I knew of that the partial sums of the Liouville function $\lambda$ were unbounded went via the Riemann zeta function and some very special properties of $\lambda$. Therefore, in order to prove EDP it looked as though it would be necessary to find an entirely new, and highly generalizable, argument that applied to $\lambda$. And this looked as though it might be a known hard problem in number theory.

Fortunately, Terence Tao was able to supply a much simpler proof that these partial sums were unbounded. His proof relies on the fact that the Dirichlet convolution of $\lambda$ with the constant function 1 is the characteristic function of the squares, which is a very special property of $\lambda$, so Terry’s proof does not look like the massively generalizable argument that one would need, but at least it is reassuringly elementary. It might still be worth trying to find new ways of understanding this fact, though.

However, I have another worry about the strategy of splitting multiplicative functions into those that are character-like and those that are not. It is that I now believe that there are probably multiplicative functions that have no obvious additive structure but that still have slow-growing partial sums. (For now let me say that the partial sums are slow-growing if they are bounded above by $n^{o(1)}$, but I think one might be able to get a small power of $\log n$.) The evidence for this comes from thinking about algorithms for choosing the values at primes in a way that is designed to keep the partial sums small. The obvious algorithm is a greedy one: if the partial sum up to $p-1$ is positive then let $f(p)=-1$, and if it is negative then let $f(p)=1$, and do what you like if it is zero. However, computations suggest that this fails spectacularly. I don’t know if anyone has a clear understanding of why this should be — I certainly don’t.

Equally mysterious is a fascinating discovery of Johan de Jong: that if you change your criterion slightly and choose $f(p)=1$ if and only if the partial sum up to $p-1$ is less than -10, you do far better. I would love to have an explanation for this. I don’t care whether it is rigorous as long as it is convincing.

Those two links are part of a sub-conversation that starts here. I think that by this point is fairly safe to say that the experiments are telling us that multiplicative functions with slow-growing partial sums do not have to be all that structured. It is still possible that if you insist that the growth is at most logarithmic then your function is forced to be similar to a character-like function, but it also seems clear that we are not going to be able to prove big growth when the function is not similar to a character-like function.

To me this suggests that we may have to look for a unified proof after all. I do not feel 100% confident about this, but it is where my own energies are concentrated for the time being.

113 Responses to “EDP6 — what are the chances of success?”

1. gowers Says:

Here I am carrying on from where I left off at the end of the previous run of comments, where I was trying to work out why certain algorithms for producing multiplicative functions behave as they do.

An alternative explanation for the failure of a greedy algorithm to produce a sequence with slowly growing partial sums (and in fact, it seems to do a lot worse than a purely random choice) is that one’s first-order choices (that is, values assigned to primes $p$) lead to second-order choices (that is, values assigned to products $pq$ of two primes) and higher-order, and if a certain kind of oscillation starts (looking a bit like $x\sin(\log x)$) then these higher-order effects can reinforce each other in ways that you don’t want.

And yet, the heuristic argument behind the algorithm suggested that if you have a fairly random looking walk and are allowed to change it in a few places, then you should be able to keep it within much tighter bounds than a normal random walk. The difficulty seems to be to apply some kind of greedy algorithm without destroying the randomness.

Now let’s suppose you have a long interval in which you have chosen $f(p)$ to be 1 with probability $\alpha$. If you pick a random pair $(p,q)$ of primes from this interval, then you will get 1 with probability $\alpha^2+(1-\alpha)^2=1/2+2(\alpha-1/2)^2$, and -1 with probability $1/2-(\alpha-1/2)^2$. This shows that if $\alpha$ is bounded away from 0 and 1, then the higher-order effects disappear exponentially quickly. (Well, it doesn’t quite show that instantly, but they do.)

This suggests that it is possible to have one’s cake and eat it. The first-order effects should be enough to cancel out most of the drift, provided that drift is reasonably random, and the decay of the higher-order effects should be enough to ensure that the drift remains reasonably random. But the question that remains is what balance to strike.

For that reason, I would be interested in the following experiment: try out several different values of $\alpha$ for the probability that $f(p)$ has the opposite sign to $\sum_{m. So far Alec has tried $\alpha=1$ and $\alpha=2/3$. My guess is that the optimal $\alpha$ is not in fact a constant but a function of $p$ that tends to 1/2. The thinking there would be that if you are going for a discrepancy of something like $(\log n)^4$, and if the walk is locally pretty random, then you don’t need to make all that many corrections in order to keep it small. And that argument becomes stronger as $(\log n)^4$ increases, so you let $\alpha$ tend to 1/2.

It doesn’t look easy to optimize over a function, but perhaps one could experiment with a function like $\alpha(p)=1/2+1/\log_2p$ and see what happens. Looking at the runs with a constant probability of 2/3 one does get the impression that they are not looking all that random — there seem to be some long runs of rapid growth. So there is some reason to hope that letting the choices become gradually more random as you go along would help.

Another way of doing this, which I would expect to be more or less equivalent, would be to choose every $k$th prime between $2^k$ and $2^{k+1}$ using the greedy approach, and make all other choices random.

One final remark: I think there’s a chance that something like this could work, but that one would have to go up to a rather large $N$ before it became clear that it had worked. (One needs an $N$ such that $C(\log_2 N)^2$ is clearly smaller than $c\sqrt{N}$. If we take $C=2$ and $c=1/2$, this tells us that $N$ should be a lot bigger than $2^{16}=65536$. So one might want to go up to $10^{10}$ or so. Is that still feasible? If not, then one could try to guess from the shapes of the curves, e.g. looking at them on a logarithmic scale, where power-type behaviour would show up as linearity.

• MArk Bennet Says:

Looking at various plots, would it be of interest to set the next prime to
-sign(sum(f(p)/p)) [counting 1 as a ‘prime’ for this purpose] – the 1/p factor in theory weights the effect of the prime p.

2. Johan de Jong Says:

Random experimental comment: There is a completely multiplicative function whose discrepancy is at most 3 up to 3545. Namely, take \mu_3 and flip signs at the primes 709, 881, 1201, 1697, 1999, 2837, 3323, 3457.

• Thomas Sauvaget Says:

This greatly improves the previous record, you might want to add at least the description on the wiki here http://michaelnielsen.org/polymath1/index.php?title=Multiplicative_sequences#Case_C.3D3

I’d like to ask: did you try many other sign flips? And did you optimize some quantity while searching, or was this really a purely random find as your comment suggests?

• Anonymous Says:

I would be very interested to know the longest multiplicative sequence of discrepancy 3.

• Alec Edgington Says:

Not only does this substantially beat the previous record, but initializing a depth-first search with Johan’s sequence gets us up to 13186.

• Thomas Sauvaget Says:

Alec, that’s great! While Johan is away, a few more comments to put this into perspective: I’ve reproduced and played a bit with this example, and in fact $\mu_3$ without any sign flips has discrepancy 4 at 3545, first reaching C=4 at 820. Doing only the first of Johan’s sign flips (the one at 709) allows to reach C=4 for the first time at 1002. With only the first two flips at 709 and 881 it reaches C=4 for the first time at 1306. I haven’t looked further.

• Thomas Sauvaget Says:

Regarding Alec’s new record I’ve added some comments on the wiki about its sign flips. I remember that Tim had mentionned that non-multiplicative sequences of discrepancy 3 and length greater than 10114 existed so that only “substantially longer” such sequences should impress us, but now we have an example which is both longer than 10114 and multiplicative.

• MArk Bennet Says:

One very obvious feature of the length 13186 sequence is the way in which values are heavily correlated with residues mod 3

Equivalent to 1 mod 3 – value almost certainly +1
Equivalent to -1 mod 3 – value almost certainly -1
Equivalent to 0 mod 3 – value approx random?? – though there is some highly patterned behaviour in the multiples of 3 taken mod 60

This can be seen because Alec has conveniently set out the table in the wiki in lines of length 60.

• gowers Says:

One would of course expect this, given that the sequence is derived from $\mu_3$ and exploits the good properties of that sequence.

Alec, would it be possible to have another display in rows of 81, so that it was easier to see to what extent the sequence deviates from $\mu_3$?

• Thomas Sauvaget Says:

I’ve now created a separated page on the wiki with further analysis http://michaelnielsen.org/polymath1/index.php?title=Discrepancy_3_multiplicative_sequence_of_length_13186
In particular, while there are few flips + to -, there a lot of – to + flips, so much so that about 50% of the primes got their sign flipped! So the deviation from $\mu_3$ is quite large, and is quite evenly distributed too starting already for very low primes.

• Thomas Sauvaget Says:

Oops, I take back the bit about there being many – to + flips. In fact there are none (that looked wrong and it was, silly bug in my program), apologies! So in fact very few sign flips, and the sequence is very close to $\mu_3$.

• Alec Edgington Says:

I’ve changed the wiki page so that it now displays the sequence in 81 columns instead of 60.

• Alec Edgington Says:

By the way, I’ve put on the wiki my C program for performing depth-first search for multiplicative sequences, if anyone wants to play with it.

• Alec Edgington Says:

(Just looked at the program and realized it contains some unnecessary, and memory-consuming, code left over from a previous incarnation — now deleted.)

3. Klas Markström Says:

Here is something that I am curious about. The liouville function is the multiplicative function defined by setting $\lambda(p)=-1$ for all primes $p$ and we know that its partial sums grow as $n^{0.5}$.

What about the function defined to be $(-1)^i$ at the ith prime? This is also non-random but I would expect cancellations from the alternating signs to give fairly low discrepancy.

• Klas Markström Says:

After a bit of thought “fairly low” should probably not be anything stronger than possibly a lower power of $n$ than 0.5.

4. Ernie Croot Says:

My vague, intuitive thoughts about the discrepancy problem and the techniques people have tried so far are that one needs a method that gets away from working only with small prime factors of the numbers being considered. More precisely: From experience with trying to prove things about the distribution on average of smooth numbers over short intervals (such as appears in my paper “Smooth Numbers in Short Intervals” — which by the way was recently improved upon by Kaisa Matomaki), I know that it is not easy to get the type of equidistribution results one would like (e.g. most intervals $I \subseteq \{1,...,n\}$ of length $\log n$, say, have the about the right number of $n^c$-smooth integers) by thinking one can just divide and multiply by very small primes (which preserves smoothness) in the hopes of getting the right sort of mixing behavior. Another approach I know about that involves playing with small prime factors is Hildebrand’s “Theory of Stable Sets” — it gives “lower bound” estimates (i.e. it shows there are lots of pairs of smooth numbers a certain distance apart), but alas there is no “local-to-global phenomenon” one can exploit to pass from there to equidistribution. (Incidentally, you may want to look at Hildebrand’s stable sets — it might give you some new ideas to play with, though I doubt it will ultimately lead anywhere for the discrepancy problem.)

So, what I think is that one will not easily be able to say anything useful about the discrepancy of $\sum_{m \leq n} f(n)$, based mostly on considering the primes up to $\log n$, say. This was the intuition behind that note I posted about three days ago on looking at multiple sequences $late n_{i,j}$, and considering their “large” factors; though, maybe it is not exactly the right way to go.

I would love to be pleasantly surprised. If someone can prove new things about discrepancy using “small prime information”, I will definitely make me rethink what I know about smooth numbers, and I will *definitely* try to see if it can somehow be used to get new results on smooth numbers in short intervals.

And now I must leave polymath for a good long while, in order to get other projects completed…

• Ernie Croot Says:

I guess I mean $\max_{n^{1/2} < N < n} | \sum_{m \leq n} f(m)|$, in place of $\sum_{m \leq n} f(n)$.

• gowers Says:

I suppose a problem we could think about is this: if $f$ is a function taking values in $\{-1,1\}$, and $C$ is a real number, must there exist a HAP $P$ with maximal element $m$ such that the sum along $P$ is at least $C$, and the common difference is a product of primes that are at least $2(\log m)^2$. And there may be other problems of a similar nature, that attempt to force one to consider large primes.

At some point (it would take a while to dig it out) I had a thought that is in a similar spirit to your matrix thing, which was to try to find a collection $H$ of HAPs and some intervals $X_1,\dots,X_k$ with the following properties.

(i) Each progression in $H$ intersects each $X_i$ in a singleton.

(ii) Almost every element of $X_1\cup\dots\cup X_k$ is contained in roughly the same number of HAPs from $H$.

How might one achieve that? Well, one could pick integers $K$, $L$, $M$ and $N$ such that 1 is much smaller than $K$ is much smaller than $L$ is much smaller than $M$ is much smaller than $N$, and one could take the intervals $[NL,NL+K]$, $[(N+1)L,(N+1)L+K]$, … , $[(N+M)L,(N+M)L+K]$.

Now choose a typical integer $x$ belonging to the first interval $[NL,NL+K]$. It will tend to have some prime factors less than $N^{1/\log\log N}$, so it will have quite a lot of small, but not too small, factors. In particular, if we choose all our parameters carefully, I hope that we might manage to show that it will have several factors that are pretty close to $L$. (Quick number-theory question: is it true that almost all numbers of a certain size have several factors reasonably close to some large number $L$?) If that’s the case, then HAPs with such factors have a good chance of intersecting each interval in a singleton.

Why would such a bunch of HAPs be useful to us? Er … I haven’t really thought about this. I see that one could defeat the whole lot by simply choosing plus signs in the first interval, minus signs in the second, and so on. But that would be pretty disastrous for other reasons: we need to choose roughly equal numbers of pluses and minuses in each interval. I think what I’d be hoping is that the arrangement of points and lines would be sufficiently quasirandom for there to be no hope of bounded discrepancy.

Before going any further, it is surely worth looking at some of the more abstract results about discrepancy (e.g. ones about connections between discrepancy and VC dimension) in order to see if there are some general classes of configurations such that if we could find a configuration of the right kind then it would solve the problem.

• Sune Kristian Jakobsen Says:

“must there exist a HAP P with maximal element m such that the sum along P is at least C, and the common difference is a product of primes that are at least $2(\log m)^2$
I’m not sure I understand you correctly. If you don’t allow the difference to be even, +,-,+,- … has bounded discrepancy.

• gowers Says:

You understand me perfectly — it was the question that was stupid. I keep making that silly mistake. It would be good, though, to think of a question that somehow stopped one from thinking about smooth numbers. Perhaps one could insist that the common difference has at least one large prime factor.

5. Various Polymath Projects « Euclidean Ramsey Theory Says:

[…] Polymath Projects By kristalcantwell There is new thread for Polymath5. Also the paper “Density Hales-Jewett and Moser numbers” has been submitted to the […]

6. Johan de Jong Says:

Function field case revisited. I am going to use a little algebraic geometry to say what I want to say here, but I will summarize the meaning at the end of this comment.

Let E be an elliptic curve over a finite field k. In this setting the primes correspond to the closed points x of E. The integers correspond to the effective Cartier divisors D on E. Let \chi : \pi_1(E) –> {+1, -1} be a character which is nontrivial on the geometric fundamental group of E. In this case we set f(x) = \chi(Frob_x) and f(D) = \prod f(x)^{n_x} if the divisor D is D = \sum n_x(x). In this situation the l-adic etale sheaf associated to \chi has zero geometric cohomology groups and we obtain from the Frobenius trace formula that

\sum_{d|n} \sum_{x, deg(x) = d} d f(x)^{n/d} = 0

This in turn means that the Zeta function

L(E, f, s) = \sum_D f(D) |D|^{-s}

is equal to 1! You can also conclude this directly from the vanishing of cohomology if you know Grothendieck’s cohomological interpretation of L-functions. Hence we see that

\sum_{D, deg(D) <= x} f(D) = 1

for every integer x.

OK, so this means that in the function field case we can even find completely multiplicative sequences which are "arithmetic", i.e., they are akin to Dirichlet or Hecke characters, which have bounded discrepancy. The reason for this is that you can find such characters whose L-function has no zero's or poles whatsoever! Note that the function field of E is not a purely transcendental extension of the underlying finite field k and so this is not really completely analogous to the (most interesting) case of multiplicative functions defined on the integers.

In the next three comments I want to ask three questions I have which are vaguely related to this but are about the number field case.

7. Johan de Jong Says:

First question: If K is a number field and \chi is a Hecke character for K (e.g. with conductor 1) with values in {+1, -1} does L(\chi, s) necessarily have a zero s with Re(s) > 0?

8. Johan de Jong Says:

Second question: Assume that f is a completely multiplicative function with values in {+1, -1} defined on the integers with bounded discrepancy. Define L(f, s) = sum_{n >= 0} f(n)n^{-s} for Re(s) > 0 (somebody explained that this was defined I think). Then is it true (or provable) that L(f, s) is never zero for Re(s) > 0? (I really do not know anything about estimates, so…)

9. Johan de Jong Says:

Third question: If is a completely multiplicative function with values in {+1, -1} defined on the integers with bounded discrepancy then do there exist (or can you prove there exist) finitely many primes p_1, …, p_r so that if f_{p_1, …, p_r} is the function obtained from f by flipping signs at p_1, …, p_r then f_{p_1, …, p_r} does NOT have bounded discrepancy?

As usual, sorry if these questions were already posed previously.

• gowers Says:

Have you thought about whether it might be the case that all choices of finitely many primes give infinite discrepancy? In other words, are bounded-discrepancy multiplicative sequences “isolated” in a certain sense?

• Alec Edgington Says:

Sune asked almost the same question here (for the case of a single prime).

10. Alec Edgington Says:

Continuing the analysis of Klas’ set of (now 64) sequences of length 1120 and discrepancy 2, and our earlier search for patterns in long low-discrepancy sequences, which I still find fascinating: let $\theta$ be the sequence

$+1, +1, -1, +1, +1, -1, +1, +1, -1, +1, +1, -1, +1, -1, \ldots$.

(The first value shown is $\theta(0) = +1$.) I do not have a formula for this sequence — that last value rather spoils the regularity. But every one of Klas’ sequences satisfies, up to a global change of sign,

$f(n) = (-1)^{b+c} \theta(a+b-c+3)$

for every $n = 2^a 3^b 5^c$, $4 \leq n \leq 1120$ except for $n = 625 = 5^4$.

• gowers Says:

An obvious point, but let me be explicit about it anyway: although that sequence isn’t periodic, it could still be a Sturmian sequence based on some number that’s close to 1/3.

• Alec Edgington Says:

Yes indeed. But I am getting a little hint that that ‘anomalous’ value (which corresponds to the value of the sequence at 1024) may be part of the reason why one cannot extend the sequence very much beyond here. Specifically, I’m searching for sequences that are constrained to agree (up to a sign) with Klas’ sequences on each of the $250$ sets of the partition I defined here. (The numbers above all belong to the largest set of that partition.) In other words, I’m restricting the search to the 250-dimensional subspace of $\mathbb{F}_2^{1120}$ spanned by the characteristic functions of those sets multiplied by any one of Klas’ sequences. (Incidentally, this is relevant to the good-guy-bad-guy game: it seems that the good guy cannot present to the bad guy a set containing more than one member of any of those sets.)

Anyway, what I observe is that the search (unless it’s initialized with values that I know get it to 1120) gets stuck at 1026: just beyond that ‘anomalous’ value.

So a ‘human’ proof that no infinite sequence has discrepancy 2 might proceed by showing first that such a sequence would have to satisfy the above for the regular (3-periodic) $\theta$, and then obtain a contradiction. But this is still pure speculation.

• Alec Edgington Says:

Incidentally, 23 of Klas’ sequences satisfy the formula for all $n = 2^a 3^b 5^c$, $1 \leq n \leq 1120$.

11. Ian Martin Says:

Following on from Johan’s comment, I tried setting f(p)=1 if and only if the partial sum up to p-1 is less than -2, as opposed to -10. (Obvious aside: this is the smallest possible move in Johan’s direction from the original proposal of setting f(p)=1 iff the partial sum is less than 0, since partial sums to p-1 are even.)

It turns out that for N<10^6, this is enough to keep the partial sums below 70: here is a plot.

• jc Says:

I find the oscillations in this plot truly remarkable (and many of the others from the previous thread, though in those cases the amplitude of the oscillations grew and things “got out of control”). Perhaps I missed it, but is there an explanation for this somewhere?

• gowers Says:

jc, I don’t think we have a full explanation by any means, but one can make a sort of guess for why it might be the case. The rough idea would be that if you react greedily to how things are at the moment and put in quite a lot of plus signs in interval $I$ and quite a lot of minus signs in interval $J$ (say), then you’ll put quite a lot of minus signs in the interval $I+J$. That can have the effect of creating resonances: in parts of $I+J$ the partial sums might well be negative and you would be reinforcing them. The natural scale to look at it on is logarithmic, and indeed we do see that the oscillations are like $\sin(\log x)$ rather than like $\sin x$.

Ian, thanks a lot for that very interesting plot — I could stare at these things for hours. One obvious further experiment to do in this direction is to choose the value of $f$ randomly when the partial sum is zero (or to alternate between 1 and -1, or something along those lines). I have a feeling this has not yet been tried, though some very similar things have. It seems just possible that the slight bias towards -1 was what nudged the original greedy sequence into its out-of-control behaviour.

• gowers Says:

Also, how far would it be reasonable to go with calculations like this? I’d be curious to know how it behaved if one went up to 10^8, for example.

I’d also be interested to know whether it is stable. For instance, suppose you start by setting $f(p)=1$ for all primes up to 30 (or 50, or whatever). Does this algorithm (that is, the one that Ian tried out) then get kicked into bad linear behaviour, or is it powerful enough to overcome its initial handicap and get the walk under control?

• Ian Martin Says:

I fear my current code (Mathematica running on a laptop) won’t be able to get that much further in a reasonable amount of time.

This plot extends the previous one: it gives the partial sums up to just over 2.5 x 10^6. (That takes us past a potentially dangerous long run of composite numbers between the primes 2,010,733 and 2,010,881.) The new max and min are 68 and -76 respectively.

• Ian Martin Says:

On issue of stability, here is a plot of the same algorithm applied after setting f(p)=1 for the first 10 primes, and here is the corresponding plot after setting f(p)=1 for the first 30 primes.

• Alec Edgington Says:

Those last two plots are fascinating: particularly the second, which nicely shows the instability manifesting at regular periods. Among many questions it raises is: for a randomized algorithm, would an instability like this arise with probability one at some point, and so kill off any hope of sublogarithmic growth? Or perhaps the instability would correct itself eventually but not before another one came along…

• gowers Says:

Wow, thanks for those. I have another request, which is to see some of those plots on a logarithmic scale. More precisely, I’d like to see the y axis stay as it is and have the x axis logged, so that the oscillations would no longer slow down, and various phenomena near zero would no longer be squashed up.

I too am fascinated by the apparent stability in at least one of those plots.

12. obryant Says:

This may well be the dopiest post in a long time, but here’s a new proof that there isn’t a infinite discrepancy-1 sequence. Technically, it isn’t a new proof, but it’s at least a new way of thinking of the old proof.

Let D be a finite set of positive integers and L=LCM(D), and let V be the set of divisors of 2L that are also multiples of some element of D. To have discrepancy 1, a function f has to take (2k-1)d and 2kd to different values. We should consider the graph with vertex set V and an edge between (2k-1)d and 2kd for all $d \in D$ with $2kd \leq 2L$. If the graph has an odd cycle, then such an f is impossible!

With D={1,2,3}, we find L=12, and the graph has a 3 cycle: 9, 12, 10, 9.

With D={2,3,5}, we find L=60, and the graph has the 3 cycle: 15,18,20,15.

With D={1,4,5}, we find L=40, and the graph has the 3 cycle: 35, 36, 40, 35.

With D={3,5,7}, the graph is bipartite. In fact, with D={3,5,k}, the graph seems to be bipartite unlesss (unless and only unless) k=2.

Note that we are only using the d-HAPs with d in D.

13. Terence Tao Says:

I had the following outline of an idea for how to reduce to the character-like case.

A Dirichlet character, in addition to having bounded discrepancy, has two other features of note. Firstly, it is completely multiplicative, which implies in particular that there are only finitely many multiplicative dilates of itself. Secondly, it is periodic, which means that there are only finitely many additive translates of itself.

Now, we’ve had for a long time the idea that we could start with a bounded discrepancy function f and improve it by taking multiplicative dilates and taking limits thereof. One can view the Fourier reduction as a kind of realisation of this idea (we use a multiplicative Fourier transform to create multiplicative structure).

But I think one can also improve a bounded discrepancy function by taking _additive_ translates and limits, as follows. Suppose we have a function f (not necessarily multiplicative) of bounded discrepancy. Now take translates $f(n+n_j)$, where the $n_j$ are a sequence which is increasingly divisible in the sense that for every natural number d, one has n_j a multiple of d for sufficiently large j. (e.g. $n_j=j!$ is an example of an increasingly divisible sequence). Observe that $f(n+n_j)$ will have bounded discrepancy on more and more HAPs as $j \to \infty$, and if one takes a subsequence limit one thus recovers a bounded discrepancy function again.

I have therefore this very vague idea of alternately applying multiplicative reductions and additive reductions to keep reducing some sort of discrepancy-type quantity of a sequence, until one reaches an energy plateau, at which point one must have both multiplicative and additive structure – and must thus be character-like.

Step one in this program is the multiplicative Fourier reduction that we already have. Step two would be some sort of additive Fourier reduction, but I don’t yet know what this would be.

• gowers Says:

Here’s an attempt to argue that what you suggest is unlikely to work. I’m playing devil’s advocate here because obviously I hope that it could work. (I briefly toyed myself with the idea of additive reductions, but had not had the idea of using increasingly divisible shifts so always got hung up on the fact that there seemed to be no reason whatsoever for the discrepancies of shifted sequences to be bounded).

The problem I have at the moment is that there seems to be no reason for the limit sequence to have any additive structure. If $m$ is quite a bit smaller than $n$, then I don’t know much about the factorization of $n!+m$, apart from the fact that it shares a lot of factors with $m$, so I don’t really have a way of relating $f(m)$ to $f(n!+m)$. So it feels as though one could get a fairly arbitrary sequence of low-discrepancy functions in this way and not much information about its limit.

In a separate comment I’ll return to the side of the angels.

• gowers Says:

The other day I was idly thinking about EDP and observed that if you have a low-discrepancy multiplicative function on $\mathbb{N}$ then you can extend it to a low-discrepancy multiplicative function on the Stone-Cech compactification of $\mathbb{N}$. But I then couldn’t see any use for this fact (though it has the amusing consequences that if $\alpha$ is an idempotent ultrafilter and $f$ is an arbitrary completely multiplicative function to $\{-1,1\}$, then $f(x)=1$ for $\alpha$-almost every $x$, and that $f(\alpha\beta)=f(\beta\alpha)$ even if $\alpha$ and $\beta$ are arbitrary ultrafilters that do not commute).

Here’s a slightly different (but in fact not that different) idea, prompted by yours above. Let $f$ be a completely multiplicative low-discrepancy function to $\{-1,1\}$ and let $\alpha$ be an idempotent ultrafilter. Define a new function $g(x)=\alpha-\lim_yf(x+y)$ (the limit of $y$ along $\alpha$ of $f(x+y)$). What do we know about $g$?

Well, an easily verifiable property of idempotent ultrafilters is that for any $n$ we have $n|y$ for $\alpha$-almost every $y$. Therefore, as we take the above limit, we can pick some large $n$ and assume that $n!|y$. This implies that $g$ has low discrepancy.

Now the above argument uses nothing like the full strength of the idempotency of $\alpha$, so it leaves one with a tiny hope that $g$ might have more properties. Is it multiplicative? Almost certainly not necessarily, but let me see what happens. We have $g(uv)=\alpha-\lim_yf(uv+y)$, whereas $g(u)g(v)=\alpha-\lim_y\alpha-\lim_zf(u+y)f(v+z)$. Now $f(u+y)f(v+z)=f(uv+vy+uz+yz)$, so we’d want to argue that the limit of that over $y$ and $z$ was the same as the previous limit. I don’t see any instant reason for that.

• Sune Kristian Jakobsen Says:

“I briefly toyed myself with the idea of additive reductions, but had not had the idea of using increasingly divisible shifts so always got hung up on the fact that there seemed to be no reason whatsoever for the discrepancies of shifted sequences to be bounded”
I thought of that as a special case of this comment from 29/12, but I probably should have pointed out this special case. Back then I wondered if EDP is equivalent to the similar problem on $\hat{\mathbb{Z}}$, the profinite completion of $\mathbb{Z}$.

I don’t understand the concept of Stone-Cech compactification to well, so this might be a stupid question: Is there any relation between Stone-Cech compactification and the profinite completion?

• Sune Kristian Jakobsen Says:

Oops, I didn’t read the comment from 29/12 before I linked to it. Now I see that this is not a special case, but it is similar in spirit.

• gowers Says:

A difference between the Stone-Cech compactification and the profinite completion is that addition in the former is not commutative. However, given any set of congruences such that any finite subset of them is simultaneously satisfiable, one can find an ultrafilter that satisfies them all. (Basically, the congruences give you a filter, which you can extend to an ultrafilter.) So maybe there are things you can do with one that you can do with the other.

• gowers Says:

Here’s a wild idea: to use van der Waerden’s theorem. Of course, in a sense the whole point of this problem is that you can’t just apply van der Waerden. The problem is that the progression you get is not homogeneous and therefore tells you almost nothing.

But what if, by means of lots of shifts and limits, one could create a family of bounded-discrepancy sequences, together with one further limit sequence, such that arbitrary APs in the limit sequence would tell you about HAPs in at least one of the family of sequences? Perhaps there is an argument along these lines that fails only if the original sequence is sufficiently character-like. (The reason it would fail in the character-like case is that the family would be too small.)

I haven’t even begun to think about whether anything like this could be done, so probably there is an easy argument that it has no hope of working.

• gowers Says:

It might be worth adding to my earlier “pessimistic” comment that it doesn’t directly engage with Terry’s suggestion, since he is suggesting taking some kind of (additive) Fourier transform rather than passing to a limit of shifts. However, I still worry that there may not be enough (provable) structure there to exploit.

• Terence Tao Says:

It’s true that by taking additive shifts and taking limits, one loses the multiplicative structure. But one gains something interesting instead… a _non-zero_ value of the function at zero. In principle this should decrease the discrepancy by 1, allowing for some sort of induction on discrepancy argument. (Or if one uses the right sort of ultrafilters, one could hope to skip the induction step and jump directly to a “discrepancy = discrepancy+1” type of contradiction.)

(It is at this point that we finally begin to diverge away from the near-counterexamples provided by characters, which will vanish at all sufficiently divisible numbers and so do not provide this decrement by 1. Note also the compatibility with the log divergence exhibited by Sune-type examples.)

At present I can’t quite make this strategy work, due to the distinction between drift and discrepancy, which naively causes me to lose a factor of 2 in the discrepancy in addition to subtracting 1, which of course puts me further behind rather than ahead. However I have some hope of being able to use the Fourier reduction to get some symmetry of f around n_j, which may help resolve this. Note that if the discrepancy of f is D, n_j is increasingly divisible, and f is locally even around n_j (in the sense that for fixed h, f(n_j-h)=f(n_j+h)), then any subsequence limit of f(n+n_j) has discrepancy at most D-1 (compare the HAP partial sums f(d)+…+f(n_j-md) to f(d)+…+f(n_j+md)). This doesn’t work directly when one is locally odd instead, unfortunately, and the Z/2 Fourier reduction I had in mind will give one of the two (or more precisely, a vector-valued combination of both – one is replacing f(-n) and f(n) with the vectors ((f(-n)+f(n))/2, (f(-n)-f(n))/2) and ((f(-n)+f(n))/2, (f(n)-f(-n))/2) respectively, to make a vector-valued function of with the same discrepancy but with an even first component and an odd second component). I’ll keep fiddling with this idea.

• Sune Kristian Jakobsen Says:

“At present I can’t quite make this strategy work, due to the distinction between drift and discrepancy, which naively causes me to lose a factor of 2 in the discrepancy in addition to subtracting 1”
I think you can get rid of the factor 2 by considering upper and lower discrepancy instead of just the discrepancy. Then you can decrease the discrepancy unless the partial sum of f is the same up to all but finitely many of the shifts used in the convergent subsequence of sequences. This might be useful. Unfortunately these shifts must go rapidly to infinity, and I don’t think we have any control over which subsequence we are passing to.

• obryant Says:

Not quite connected, but not entirely unconnected to the function described in this post, which I’ll repeat here.

Every positive integer $k$ has a unique representation in the form $k= \sum_{n=1}^\infty a_n n!$. Coloring $k$ according to whether $\sum_{n=1}^\infty a_n$ is even or odd yields a function with:

For each $d\geq 1$ there is a constant $C_d$ such that for every $\ell\geq 1$

$\left|\sum_{i=1}^{\ell} f(id)\right| < C_d.$

The natural problem here is to think about $C_d$, but unfortunately it grows rather rapidly.

14. gowers Says:

I set out a possible approach to EDP in this comment above. The approach could well fail miserably — I haven’t thought about it hard enough to have much confidence that it is a sensible idea — but for now let me pretend that it doesn’t. It suggests the following related problem. Suppose we have some measurable functions $f_1,\dots,f_n$ defined on $[0,1]$ and taking values in $\{-1,1\}$. Can we find an arithmetic progression $x+d,x+2d,\dots,x+nd$ such that $|f_1(x+d)+\dots+f_n(x+nd)|\geq C$?

Obviously this would be under the assumption that $n$ is sufficiently large in terms of $C$. Equally obviously, the answer is no, since one can just define $f_j(x)$ to be $(-1)^j$ for every $x$. But what if we insist that each $f_j$ averages zero?

As I wrote that I realized that it too was an easily solved case: one can set $f_j(x)=(-1)^j$ if $x\leq 1/2$ and $(-1)^{j+1}$ otherwise. So I still haven’t got the assumption right. What I think I want is this: if you convolve any $f_j$ with an interval of width $\epsilon/n$ then you more or less kill it. In other words, let’s assume that each $f_j$ averages zero, or very nearly zero, on every short interval. What happens then? Will some kind of easy averaging argument (over all $x$ and all small $d$) give us a non-trivial lower bound?

By way of comparison, let’s ask a similar question in $\mathbb{Z}_N$. This time we’ll assume only that $f_1,\dots,f_N$ average (almost) zero and take values in $\{-1,1\}$. What is the expected $L_2$ norm of $T_df_1+T_{2d}f_2+T_{Nd}f_N$, where $T_df(x)=f(x+d)$? I’ve just checked offline and the mean square seems to be $N$. What I mean by that (I say this to clarify what my normalization is) is that a simple averaging argument in $L_2$ shows, if my calculations are correct, that there must be some $x,d$ such that $f_1(x+d)+f_2(x+2d)+\dots+f_N(x+Nd)$ has absolute value at least $\sqrt{N}$.

If that is correct (and I should write it out carefully before claiming to be sure about this) then the answer to the previous problem will be similar in the case that the functions are periodic with very small period, and I don’t believe that the periodicity would be playing a genuine role in the proof.

So here, very tentatively, is a proposal. Find a bunch of HAPs of the kind described in my previous comment and argue (by means of some kind of quasirandomness property) that they imitate this model problem well enough to give us unbounded discrepancy.

Just to be clear, this would be an argument that, if it worked, would work directly on the general problem and make no use of multiplicativity. I feel that it’s on a similar wavelength to Ernie’s suggestion, though his was designed to apply to completely multiplicative functions.

• gowers Says:

My allusion to quasirandomness above was a bit cryptic. Here, in slightly more detail, is what I mean. Given two copies of the interval $\{1,2,\dots,n\}$ one could define a bipartite graph by joining $x$ to $y$ if and only if $x$ and $y$ are close. A “quasirandom imitation” of that graph would be one that behaved like a random collection of edges from it. The hope would be that if we joined $x$ in one interval to $y$ in another interval if and only if both $x$ and $y$ belonged to one of our chosen HAPs, then we would get something quasirandom in this sense. One difficulty, it occurs to me now, is that this graph would necessarily be pretty sparse, since it’s a subset of the graph where you join $x$ to $y$ if and only if $x-y$ divides $x$.

But maybe that’s not too bad. Also, maybe it isn’t necessary to have disjoint intervals. One just takes one big interval $[N,N+M]$ with $N$ much larger than $M$. Then, given $x,y$ in this interval …

Hmm, the problem I have is that the probability that $x-y$ divides $x$ is roughly $(x-y)^{-1}$. (My reasoning here is that you head back from $x$ to zero in steps of $x-y$ and you need to hit zero exactly.) So the number of neighbours of $x$ in an interval of width $M$ is … ah, not quite as bad as I thought. It will be about $\log M$. (That fits with other things we know about how many factors a typical $x$ will have.)

So I have the vague hope that this graph (join $x$ to $y$ if $x-y$ divides $x$) is sufficiently quasirandom that one can do the kind of second-moment argument that appears to work for the easier problem discussed in the comment just above.

• gowers Says:

I think I’ve got a version of these ideas that’s sufficiently precise to be worth wikifying. I’ll do that soon.

• gowers Says:

I have now created a wiki page with a sketch of a new approach to the problem. The good news is that it is reasonably precise. The bad news is that it has a very small chance of working. But even if it fails, it may provoke other ideas of a similar general nature. Ultimately, it is intended as a response to Ernie Croot’s remark, which I agree with, that one should try to leave small primes behind and look a long way out at where behaviour is more typical. There are lots of non-rigorous steps, but it feels as though if they could be justified it would not need one to say anything much about the distribution of smooth numbers.

• Alec Edgington Says:

This sounds like a good step since for the general problem it does seem necessary to think in terms of large primes and forget as much as we can about smooth numbers. Tim, on that page you say that the standard deviation of the number of prime factors of $N$ is $\sqrt{\log n}$; is this a typo for $\sqrt{\log \log N}$, or are you defining $n = \log N$?

• gowers Says:

Alec, I’ll have a look in a moment. But there’s a small screw loose in my brain that keeps making me say log in this context when I mean log log, so it’s almost certainly a typo.

More importantly, I had what a typical backgammon-playing Cambridge mathematician might call a co-Archimedes moment an hour or so ago. While in the bath, I realized that the approach set out on the wiki has a serious flaw, which is that it fails the Sune test. More precisely, if it worked, then it would produce a segment of a HAP of length L on which the drift would be $\sqrt{L}$. But we know that that is false for character-like functions.

However, it may be possible to salvage something from the wreckage by looking at the “proof” and seeing where it breaks down, or rather, by looking at the model and seeing what is unrealistic about it. And that I think I can do. The whole idea of the model is that the function $G(x,y)$ defined to equal 1 if $x-y$ divides $x$ and 0 otherwise should be a quasirandom approximation of the function $G(x,y)=|x-y|^{-1}$. A consequence of this, if it was true, would be that if you took any large set $A$ then the sums $\sum_{x,y\in A}G(x,y)$ and $\sum_{x,y\in A}|x-y|^{-1}$ should be roughly the same. But that is completely untrue. For example, if $A$ is the set of all integers congruent to 1 mod 3, then the first sum is zero and the second is not.

Under those circumstances, what can we hope for? Well, it seems promising that the natural example that shows up the flaw in the argument should be a residue class modulo a small prime. It suggests that it might be possible to make the proof work if one added an extra assumption concerning the function $f$ (something like that it did not have good correlation with any residue class modulo a small prime). And if we had that, then we would have made progress on the non-character-like case of EDP.

I’m busy now, but when I get a moment I’ll add something to the wiki page elaborating on this comment.

• gowers Says:

I have now added to the wiki page about a different approach to EDP an explanation of why the approach doesn’t work, and some speculations about how it might nevertheless be useful for pinning down low-discrepancy functions a bit further. This new, sadder but wiser argument aims to show that all counterexamples to EDP are somewhat character-like in the sense of correlating strongly with some AP with small common difference. Perhaps a lemma like that, combined with some additive Fourier reduction, could get us somewhere.

• Klas Markström Says:

Thinking about the hypergraph which describes EDP give rise to some natural questions.

One class of hypergraphs which trivially have unbounded discrepcancy are the hereditary hypergraphs, i.e. the hypergraphs which have the property that any subset of an edge is an edge.

The hypergraph describing the EDP belongs to another family, the hypergraphs which are closed under taking initial segments. These are hypergraphs with the integers as vertex set and with the property that for any $k$ the first $k$ vertices of an edge also form an edge.

Within this class there are both hypergraphs with bounded and unbounded discrepancy. As an example of the first type consider the hypergraph with all finite intervals containing 1 as edges, for the other take the hypergraph with all subsequences of the integers as edges.

A less “dense” example with unbounded discrepancy can be constructed as follows. Partition the integers into intervals $I_k$, each with length $k$, then for each $k$ construct $2^k$ hyperedges which contain all integers not in $I_k$ and one of the subsets of $I_k$.

So coming back to the EDP-hypergraph we may ask, what are the natural, large subgraphs which have bounded discrepancy? In the other direction, are there some natural small sets of edges which we can add to the EDP-hypergraph which provably give a hypergraph with unbounded discrepancy?

• obryant Says:

If you add in edges for all progressions, you have unbounded discrepancy by Roth, and if you add in the quasi-progressions ($latex \lfloor k \alpha \rfloor, k=1,\dots,L) you get unbounded discrepancy by Beck. 15. Gil Kalai Says: Some thoughts regarding the post and some further comments on this thread. (1) The problem of classifying low-discrepency functions (say with logarithmic growth) with values in +-1 (or even in T) while very interesting looks harder than EDP. Maybe it is possible to come up with a conjecture on the nature of bounded functions f from Z to the complex numbers (with 0 values allowed) with bounded discrepency. (Lets drop the condition for the function to be multiplicative) We know that periodic functions (under some additional simple conditions which force some 0 entries) have bounded discrepency and also that the sum of a bounded number of functions with bounded discrepency have bounded discrepency. Is there some converse, namely that “essentially” these are all the bounded functions from Z to C of bounded discrepency? (2) Is there hope to characterize functions to C with bounded discrepency in the function field case? (3) Regarding the problem of classifying low-discrepency functions with values in +-1, is this a fair description of matters? (a) we have some examples (fairly similar to each other) of logarithmic discrepency. (b) All of those have large correlation with a character. (c) there is some hope that some algorithms will lead to low discrepency sequences of a different nature but there is no strong evidence one way or the other. • gowers Says: I think that’s a fair assessment of where we are at (if you are talking about completely multiplicative functions). • Alec Edgington Says: I find Gil’s first question interesting. Clearly the answer is ‘no’ in general: the space $D$ of bounded-discrepancy sequences with values in $\mathbb{C}$ contains $\ell^1$ as a subspace (indeed, with the obvious ‘discrepancy norm’ on $D$, namely $\Vert x \Vert = \sup_{d,m} \lvert \sum_{i the inclusion map $\ell^1 \rightarrow D$ has norm 1). It might be interesting to consider $D / \ell^1$, though. It does feel as if periodic functions should have some distinguished role to play in this space. • Gil Kalai Says: Alec, the naive question is if every function of bounded discrepency is up to density-zero entries the sum of bounded number of periodic functions. I am not sure if what you said exclude this possibility (but I suppose it does) since I do not understand what is the inclusion map from $\ell_1$ into $D$. (I fail to see the $\ell_1$ subspace in $D$.) • Alec Edgington Says: Gil, I just mean that any sequence in $\ell_1$ (for example, $x_n = 2^{-n}$) has bounded discrepancy, but most of these aren’t sums of periodic functions. Perhaps you mean something different by $D$? 16. Sune Kristian Jakobsen Says: Meta: My RRS reader haven’t been able to read feeds from the wiki’s recent changes since the day before yesterday. Do anyone else have this problem? 17. Alec Edgington Says: A question (maybe for mathoverflow if no-one here knows the answer). Is there a completely-multiplicative function $\pm 1$-valued (or even $\mathbb{T}$-valued) function $f$ such that the (analytic continuation of the) Dirichlet series $\sum_n f(n) n^{-s}$ has all its poles strictly to the left of the imaginary axis? I note, for example, that the function derived from $\mu_3$ has poles at $(2n+1) i \pi / \log 3$ ($n \in \mathbb{Z}$). 18. Thomas Sauvaget Says: Here’s an idea which I think hasn’t been tried out yet (please correct me if I’m wrong). There’s a well-known sequence of intervals of consecutive integers of the form $CS_n:=[n!+2; n!+n]$, of length $n-1$. Let’s focus on the multiplicative case and fix a target discrepancy $C$ (although one could also consider a direct approach to EDP). I’m wondering whether such intervals might force long strings of pluses (or minuses) whatever the multiplicative function one starts with. That is, whether one could obtain an upper bound on length as a function of $C$ using precisely this family of consecutive integers. So, I’m wondering whether the prime factors that are used to construct the integers in some $CS_n$ might for “most of them” be in common with those used in many other $CS_{m_1}$ , …., $CS_{m_k}$,…(perhaps choosing only some clever values for $n$). If so, then maybe the constraints on each factor to be either + or – might imply that some $CS_{m_k}$ would necessarily have a long string of pluses. I’ll investigate this numerically later today (i.e. lists of factors and of constraints). For example, we know that for C=2 the longuest multiplicative sequences have length 246, but could it be that say $CS_7:=[5042; 5047]$ or $CS_8:=[40322; 40328]$ provide an upper bound anyway? (This is a re-post, sorry if this appears twice). • Thomas Sauvaget Says: (I really meant: intervals of consecutive composite integers of course.) • gowers Says: I think this approach is yet another that fails the Sune test: we know that for $\mu_3$ there will not be any long strings of pluses or minuses. • Alec Edgington Says: I think you may have the problem that you have no control over the large prime factors in that interval. For example, in the interval $[40322, 40328]$, as well as the prime factors less than 8, the primes 11, 13, 17, 47, 593, 823, 1613, 20161 and 13441 all enter as factors to odd powers. This gives a lot of freedom in the choice of signs. But perhaps I have missed something. • gowers Says: I suppose that, strictly speaking, it is conceivable that one could prove the existence of a long string of pluses and minuses by assuming that the discrepancy of the sequence was bounded (and aiming for a contradiction). But such a proof would have to be an argument that worked for a bounded-discrepancy sequence and not for $\mu_3$. So, for instance, it would not be enough merely to assume complete multiplicativity. • MArk Bennet Says: The detailed workings so far in low discrepancy cases seem to show that it is having to control the discrepancy in a number of such intervals simultaneously which creates the constraints. There are likely to be more constraints on low-value primes than on higher values, but quite often the patterns which are established deal with the low value issues, and it is higher value primes where the contradictions actually arise. It would take a very long interval for the information within the interval to create mutual constraints between a variety of primes. If the interval is not long enough then it seems to me that the information has to come from ‘outside’. 19. Mark Bennet Says: One of the questions which spins round my mind is whether we are any closer to a good estimate of growth rates. From the multiplicative examples (minimal data) and from the work on discrepancy 2 sequences it seems that it is possible to find sequences of discrepancy C (or multiplicative sequences of discrepancy C) which are longer than current known means of estimating would suggest. In particular we have no means yet of showing whether or not (minimum discrepancy) C is sublogarithmic in (sequence length) N. It is also interesting that good multiplicative examples to date are based on mod 3 behaviour [I got into mod 5 thinking one of my earlier posts]. If there is a persistent dependence on small prime factors in good examples, this might indicate a logarithmic lower limit, which might be provable (eg for the multiplicative case). 20. Gil Kalai Says: The type of “fantasy” argument (not so far away from what was suggested by people already), I would hope for is something like this: You consider a function f and now you look at f(1),…,f(N) where N is a highly divisible number. (So it divides all primes and prime powers less than some M.) The intention in words is this: If a large chunk of f (in terms of l2-norm) can be written as a sum of priodic functions with periods which divides N then we want to decompose f according to these different periods. We want to bound the discrepency from below as a function of how these periods “spread out” leading to the conclusion that small discrepency implies a small period (and hence a lot of 0 values). If a large chunk of f does not behave like periodic with respect to periods that divides N then we want some other argument to show that the discrepency is large. (Maybe even power-large) An offhand (maybe clumsy and wrong) way to express mathematically this intention is as follows: You write the discrete Fourier transform of f and consider only Fourier coefficients $\hat f(t)$ where t divides N. You divide to 3 cases: a) when a substantial$\ell_2$-norm of f is on these coefficients. b) When these coefficients accounts for a negligebale part of the$\ell_2\$ norm of $f$.

In case a) I would like to have some inequality like $D > \sum \hat f^2(t) \log t$, (the sum is over divisors of N) which will say that unless the support is on a bounded number of periods, the discrepency goes to infinity.

(Maybe the weight for t should depends more delicately on its arithmetic of t, like if t=p^s where p is a prime the weight should be like sp rather than s log p).

In case b) I would like a seperate argument that the discrepency goes to infinity (maybe even like $\sqrt n$).

(Remark: of course sums of peridic functions is periodic as well, but it may be useful to consider decomposition to periodic functions with small periods and to base a lower bound on the discrepency on such a decomposition.)

• gowers Says:

I am starting to feel fairly confident that something along the lines of b) can be done. When I get a spare moment, I shall do a little calculation that will either nip this optimism in the bud or vindicate it. I’ll report back on the result.

• Gil Says:

One thing that may come handy and perhaps useful for for a), is a Fourier description of some l-2 measure for the dicrepency, (probably just for HAP of periods at most M) in terms of the Fourier coefficients ( probably just $\hat f(t)$ where t is at most M).

• gowers Says:

OK here’s a fairly simple Fourier calculation that suggests that it certainly ought to be possible to prove something for functions $f$ such that $\hat{f}(\alpha)$ is small when $\alpha$ is close to a rational with small denominator.

To keep things reasonably general, let $H$ be a set of HAPs. We would like to get a lower bound for $\sum_{P\in H}|\sum_{x\in P}f(x)|^2$. We can write this as $\sum_{P\in H}|\langle f,P\rangle|^2=\sum_{P\in H}|\langle\hat{f},\hat{P}\rangle|^2$.

Given that it is easy to calculate the Fourier transforms of HAPs, and given that they will tend to be small away from rationals with small denominator, this should tell us a lot about functions $f$ that satisfy an assumption such as the above. However, I have a feeling that the devil is in the detail here: at the moment it looks as though the entire sum is going to be small, whereas what I wanted was for there to be a “trivial” contribution (which can be seen on the non-Fourier side) that is not cancelled out. It’s not yet obvious where that trivial contribution is coming from on the Fourier side. (In particular, $\hat{f}(0)$ has no reason to be large.) I think the answer will come from a comparison of the above sum with one over a wider class of non-homogeneous APs. Then all one has to do is show that the difference is small.

21. Ian Martin Says:

Here is a plot of the greedy algorithm I tried before, but with the x-axis plotted on a log scale, as requested. It’s really quite striking.

For comparison, the improved Walters example $\mu_{3}$ looks like this (and the original $\lambda_{3}$ like this).

I also put the two up on one webpage here – maybe that makes it easier to compare them against each other.

• gowers Says:

It might be good to get a better estimate, but a crude estimate that I get from the graph is that the multiplicative period (by which I mean what you need to multiply by to get through one oscillation) seems to be about 3/2. I’m not sure whether that lessens the mystery.

I would very much like to understand this example. One thing it might be good to do is take some kind of average in order to smooth it out and try to see if there is some curve that fits it very well. For instance, one might consider replacing f(x) by the average of f(y) for all y such that 3x/4\leq y\leq 4x/3 (or one could take some smoother cutoff by taking a weighted average where the weights decay to zero at the ends of the interval.

What I would hope is that the “true period” of the curve would then jump out, that one could divide by a trigonometric function, and that one would then get some monotonic function that in turn could be guessed. The result might be a conjecture not about the maximum behaviour of the function but about the underlying trend, so to speak. With your knowledge of financial mathematics, you may know exactly what bag of tricks to use.

• Ian Martin Says:

The multiplicative period is also 3/2 on the original greedy algorithm (which sets f(p) equal to 1 iff the partial sum to p-1 is less than zero) – here’s the plot.

• Ian Martin Says:

The standard greedy algorithm* starts out by setting 2, 3, 5, and 7 to -1. I tried “seeding” the algorithm by switching some of these -1s to +1s, but then letting the algorithm do what it wanted subsequently.

Let me represent the seeds with binary numbers:
0 0 0 0 means that 2,3,5,7 are all -1
1 1 1 1 means they’re all +1
0 0 0 1 means that 2,3,5 are -1 and 7 is +1
etc.

My hope was that there might be some obvious pattern in the multiplicative periods of these different sequences that would shed some light on the 3/2 phenomenon.

The results don’t look particularly conclusive, unfortunately:
0000 (3/2)
0001 (3/2)
0010
0011
0100 (5/2)
0101 (5/2)
0110
0111 (5/4)
1000
1001
1010
1011
1100
1101
1110
1111

Towards the bottom of the list, the type of behaviour seen here starts to emerge. The numbers in brackets are my estimates of what the multiplicative period looks like, at least initially. But in most of these cases it’s hard to come up with a reasonable estimate because (it looks as though) different frequencies start to overlap.

* which sets f(p)=1 iff partial sum to p-1 is less than -2.

• Ian Martin Says:

Tim, I tried doing something along the lines you suggested. Given the long list of partial sums, I broke it up into windows of geometrically increasing size—[1, x], [x, x^2], [x^2, x^3], etc.—and took the average over each window, to give a smoothed version looking like this. (I chose a dilation factor x close to 1, in fact equal to 1.03, because the pictures above suggested that there might be some fairly high frequencies to be picked up.) I hit the result with a discrete Fourier transform, which showed a strong peak corresponding to the 3/2 multiplicative period.

Doing the same thing (smoothing followed by discrete FT) with the other sequences, the multiplicative periods look like
0000: 3/2
0001: 3/2
0010: 15/13
0011: 15/13
0100: 2.4 (and maybe 7/6, 3/2)
0101: 2.4 and 1.3
0110: nothing
0111: 1.3
1000: 13/11
1001: 9/5
1010: 5/4
1011: not sure (9/8?)
1100: 8/7
1101: 8/7

I write 2.4 and 1.3 because it’s hard to distinguish between 7/3, 5/2 and 12/5, or between 9/7 and 13/10.

Not sure whether this is going anywhere, having said all that.

Incidentally, if you look at the peaks of the original function (0000) and “divide out” by something that looks like $\sin (2\pi \log n/\log 1.5)$, the result looks to be growing at something like log to base 1.1. But here I really am just eyeballing it…

• Alec Edgington Says:

It may be worth studying how the oscillations in the partial sums of the Liouville function come about. According to wikipedia they arise from the first non-trivial zero of the Riemann zeta function, but I couldn’t find a reference for this.

22. gowers Says:

I’d like to ask a general question. It’s basically come up already, but I’d still like to try to make it as explicit as I can. As Ian’s plots make particularly clear, greedy algorithms tend to lead to good control, but with regular oscillations. And it seems that the oscillations are what stop the control from being even better than it is.

I’m fairly sure that what causes the oscillations is that the after-effects of trying to “correct” the walk near n are felt at 2n, 3n, and so on, and these can accidentally reinforce the walk somewhere else. So the question is whether one can refine the greedy algorithms in some simple way so that they do not cause these oscillations.

But first, let me clear up something that keeps bothering me, and perhaps others. One might think that if $p$ belongs to an interval in which $f$ is predominantly positive, and we correct the walk by setting $f(p)=-1$, then this will not just be the right thing to do at $p$ but it will have beneficial consequences later on, since (if $f(2)=-1$) there will be a negative bias around 2p and we will be making $f(2p)$ positive, and so on.

One thing that’s wrong with that argument is that we are often setting $f(p)=-1$ even when $f$ is predominantly negative near $p$. We do this when the partial sum up to $p-1$ is positive, but the walk is on the way down.

So it could be that the greedy algorithm is a little too greedy, and should start backing off a bit when victory is in its sights. I made one or two guesses about algorithms that would take into account not just the partial sum near $p$ but also the predominant value near $p$, but so far they have not come to much. (Alec tried one or two out for me, with underwhelming results.) But there still seems to be plenty of room for experiment here.

Perhaps one idea would be a superimposition of lots of greedy algorithms at different scales. By that I mean that, given $p$, one would work out for each $k$ the partial sum $f(p-k)+\dots+f(p-1)$, weight it by $k^{-1}$, add up all these quantities, and let $f(p)$ have the opposite sign to the total.

This is yet another stab in the dark, but there is something natural about it that makes me think it might flukily do something good and damp down all those oscillations. If it doesn’t, I still think that there should be some way of damping them.

The motivation for all this is still to find an algorithm that looks as though it gives miraculously good control. If it seemed to be logarithmic I’d be very pleased (as it would almost certainly disprove all sorts of natural conjectures about functions having to be character-like), and if it seemed to be sub-logarithmic then obviously I’d be ecstatic.

• Thomas Sauvaget Says:

I’ve just tried this superimposition idea, but it unfortunately doesn’t seem to have a nice behaviour. I’ve only run it until length 14000 but already got in logscale this plot.

• gowers Says:

In the light of that I want to make doubly clear that the main point of my previous question was not whether that particular idea would give a suitable damping mechanism (though I did have my hopes) but rather whether there is some way of getting rid of the oscillations. Alec’s remark about the Liouville function is interesting in this respect, since I’m getting the impression that we need to have a better understanding of why the oscillations occur if we want to make intelligent guesses about how to get rid of them.

A general idea that could perhaps underlie an algorithm is that if the partial sums are on their way down anyway then there is no need to help them along. That to me suggests an algorithm that looks ahead a bit. For instance, if you’ve chosen the values up to $p-1$, then perhaps you could calculate the partial sum up to 2p, ignoring the places where $f$ is not yet defined, and choose $f(p)$ accordingly.

Here’s an idea that probably won’t work, but I can at least explain the motivation behind it. It’s a different sort of greedy algorithm that keeps the partial sums as small as possible but insists that they are all positive. The way it does this is as follows. It starts by setting $f(p)=1$ for all primes $p$. It then looks at 2 and sees whether it can change $f(2)$ while keeping all partial sums positive. Finding that it can, it changes $f(2)$ to -1. It then can’t change $f(3)$ (or the sum up to 3 would be -1) so it doesn’t. It can change $f(5)$ to -1 so it does that.

It seems possible that this could result in the function $\lambda_3$, in which case it works but doesn’t give anything interestingly new. If that happens, then one could make a few “wrong” choices to start with.

The reason for doing it is that if the algorithm knows that it mustn’t go below zero, then it will be very careful to slow things down if the partial sums seem to be decreasing to zero too fast. In other words, it might lead to a natural damping tendency.

• Thomas Sauvaget Says:

Regarding Alec’s question, a few simple observations. Wikipedia says that the Dirichlet series for the Liouville function is a ratio of zeta values, namely $\sum_{n=1}^{+\infty}\frac{\lambda (n)}{n^s}=\frac{\zeta(2s)}{\zeta(s)}$, so that indeed when $s$ is not a Riemann zero the sum stays finite, meaning that the number of integers that have an odd number of prime factors is comparable to that of those which have an even number. And when $s=\frac{1}{2}+it$ is a nontrivial Riemann zero then the sum diverges, so that one type of integer decomposition dominates, and the Liouville function then correlates well with something like $\sqrt{n}\cos(t)$. Now of course the Liouville function doesn’t give us enough information for EDP, as a number with an even total number of prime factors could still be given value +1 or -1. But nevertheless such a case covers in particular numbers of the form $n=\Pi_ip_i^{2\alpha_i}$ which we do know must be assigned to a +1. So the oscillation coming from nontrivial Riemann zeros in the Liouville partial sums seen on wikipedia may still transfer to EDP. Does that make sense?

As for the numerical aspect, yes I understood many ideas should be tested, for instance I had looked at the forward idea but it didn’t work well. I’ll give a go at your latest “positive constraining” one and try others too, and report the results.

• Alec Edgington Says:

I found this interesting page about the summatory Liouville function:

http://demonstrations.wolfram.com/UsingZetaZerosToComputeASummatoryLiouvilleFunction/

The formula that determines its oscillations is

$\sum_{r

where $\rho_k$ are the zeta zeros on the critical line. This is apparently proved using Perron’s formula, which in turn relies on information about the Dirichlet series defined by the Liouville function. I don’t know if some approach like this could be useful to us.

• gowers Says:

I had a little try at the positive constraining algorithm, just to check that it didn’t give $\lambda_3$. I’m not 100% sure about this, but I think the first place where it differs from $\lambda_3$ is at 37, where it seems to be possible to set this function to equal -1 instead of 1. So it’s just conceivable that this could give us better behaviour than $\lambda_3$, which would be very interesting given that its partial sums remain positive. However, the evidence so far is pretty flimsy, to put it mildly.

• Thomas Sauvaget Says:

I have a question about this: the first values I find up to 12 are: sequence 1+,2-,3+,4+,5-,6-,7+,8-,9+,10+,11-,12+ ;
partial sums:1,0,1,2,1,0,1,0,1,2,1,2. So one could be tempted to pick 13-, but by multiplivativity we have 14-,15- which rules out 13- or else the sum at 15 becomes negative. Could anyone confirm this? If so, I have a code ready which includes the look-ahead up to the next prime, but the resulting discrepancy growth is large.

• gowers Says:

Your values agree with the ones I calculated. (So far I’ve got up to 40.) But I’m not quite sure we’re doing the same thing: the idea I wanted to pursue involves looking ahead infinitely far, so to speak. (In practice it would involve picking a large $N$ and at each stage choosing the smallest prime $p$ that’s bigger than the last prime you changed to -1 and that you can change to -1 without any of the partial sums up to $N$ going negative — after you have put in all the changes at multiples of $p$.)

23. Thomas Sauvaget Says:

I see, so here’s a plot of a short run up to 2500 here. Just to check, the values it finds up to 40 are: 1+ 2- 3+ 4+ 5- 6- 7+ 8- 9+ 10+ ; 11- 12+ 13+ 14- 15- 16+ 17+ 18- 19- 20- ; 21+ 22+ 23- 24- 25+ 26- 27+ 28+ 29+ 30+ ; 31+ 32- 33- 34- 35- 36+ 37- 38+ 39+ 40+ .

I’ve launched a run up to $10^5$ but it loks like it’ll take a while to complete.

• gowers Says:

According to my calculations the value at 17 should be -1. I think that is right, because the function would then agree with $\lambda_3$ as far as 17, so there shouldn’t be any proof that positivity implies that f(17)=1.

24. gowers Says:

I’ve suddenly realized that not only was I unclear about precisely what the positive-constraining algorithm was, but that the rough idea I had in mind doesn’t work. Here, however, is one that does.

The idea is that, as usual, you define $f$ on each prime in turn. Each time you do so, you work out all the consequences that follow just from multiplicity.

We can think of the process as follows. We start with the sequence 1 0 0 0 … At the next stage, we set f(2)=-1 and work out the consequences of that, which gives us the sequence 1 -1 0 1 0 0 0 -1 0 0 … . Note that the partial sums of this sequence are all positive. At the next stage we are obviously forced to choose $f(3)=1$, which gives us a sequence that starts 1 -1 1 1 0 -1 0 -1 1 0 0 1 0 … . At the stage after that, we have to choose f(5)=1, because otherwise the partial sum up to 8 would be negative. (Here I depart from my previous approach, where I could have set f(5)=-1 and compensated for it by setting f(7)=1. But I’m not quite sure whether I have an algorithm that does that kind of compensation systematically.) So now we have a sequence that begins 1 -1 1 1 1 -1 0 -1 1 -1 0 1 0 0 1. I don’t know without going off and doing some calculations, but I think it is probably safe at this point to let f(7) be -1. And so on.

At each stage, it is provably possible to continue the algorithm without backtracking. Indeed, if all the partial sums of the sequence so far are non-negative and you set f(p)=1, then the partial sums along the multiples of p are all non-negative, and all multiples of p previously took the value zero, so no partial sum has been decreased. So the idea is that you try out f(p)=-1. If it leads to a negative partial sum somewhere then you set f(p)=1 and continue.

The unintelligent nature of this algorithm leads me to think that its partial sums may grow pretty fast, but I’m still interested to know, just in case they don’t.

• Mark Bennet Says:

Since the value at p influences 1/p of the later values, I still think 1/p weights are worth exploring in various forms.

• Ian Martin Says:

I had a quick try by hand and think I have managed to get up to 100 using a rather vague approach that I think corresponds what what you were suggesting. The obvious problem is that there is no obvious control on the amount of backtracking that may be required; I proceeded on the assumption that there would be enough room to get out of trouble, and only ever needed one layer of backtracking (on maybe three or four occasions). Here’s what I got:

{1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1,
-1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1,
1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1}

When you say that the rough idea you had doesn’t work, Tim, do you mean that you’ve convinced yourself that the backtracking is an issue, or just that you haven’t convinced yourself that it’s not?

• Thomas Sauvaget Says:

Based on many tests with my programs, it seems that this algorithm like the previous ones depend quite sensitively on precisely which large value $N$ one chooses to sum up to. If I run this new one with $N=50$ it does set f(7) to -1, but with larger values like $N=500$ it now chooses f(7)=+1. Here’s a plot for $N=5482$ which shows oscillations.

• Ian Martin Says:

Sorry, I posted my comment below in the wrong place. And I should clarify that my hand calculations were for the informal algorithm that Tim suggested here.

• gowers Says:

A quick answer to Ian’s question. I was trying to define a greedy algorithm, where by that I mean an algorithm that doesn’t backtrack. But the informal idea I described does require one to backtrack. So I ended up not being quite sure what the algorithm even was.

I get the impression from Thomas’s plot that the genuinely greedy algorithm I then suggested gives, as expected, not a very good result. So perhaps it’s worth going back to the original idea, as Ian was doing above. Here is what I had envisaged. Instead of setting all as yet unspecified values to zero when evaluating future partial sums, one sets them to 1. The problem I have with this is that when I fix the value of $f(p)$, then I will definitely increase some of the later partial sums (because however I do it I’ll change some 1s to -1s and I won’t change any -1s to 1s) so I don’t have any reason to suppose I can keep them all positive. But perhaps some limited backtracking would be enough to deal with this problem when it arises, as Ian suggests. I’m interested to see that the sequence Ian has produced below is better than $\lambda_3$ (as far as it goes).

• Ian Martin Says:

The algorithm I was trying was:

1) Set p = -1 unless this pushes the partial sum below zero;
2) Fill in composite numbers myopically (ie, no look-ahead) until either
2a) you reach a prime (go to step 1); or
2b) the series goes negative, in which case backtrack and change the sign of the last prime p which was set equal to -1 to a +1 (then go to the beginning of step 2 again).

On the down side, there’s no obvious limit to the amount of backtracking that may be required. On the up side, I didn’t get close to getting in trouble when I was generating the first few elements of the sequence.

I’m going to be busy for a while now, so probably won’t get the chance to do anything more systematic on this for a while – in particular I would have liked to check that the hand calculations below can be relied on…

• gowers Says:

I see. I think that gives the same output as the less greedy algorithm I suggested. And it may well be that if only a limited amount of backtracking is needed then the looking forward that I suggested would not save time.

I think the output of this algorithm is the first sequence in the lexicographical order (counting -1 as earlier in the alphabet than 1) with all its partial sums non-negative. So there is a certain greedy flavour to it (in the sense that each time a new prime p comes along, it will set f(p)=-1 unless that is provably not a valid continuation of the sequence so far).

• Ian Martin Says:

Here’s a plot of what the algorithm generates up to 40,000 or so. (If you extend the plot to the right by a thousand or so, setting all primes to +1, then the sequence heads reassuringly upwards, so I don’t think there are lookahead issues.)

• gowers Says:

As usual I’m fascinated every time I see one of these plots. For the function to be an interesting one theoretically, we’d like the growth to be subexponential as a function of $\log n$ (so that it’s subpolynomial as a function of $n$). At the moment, it’s not terribly clear: it looks to me as though it could be exponential, but it could also be quadratic. It might be clearer if one plotted both axes on a logarithmic scale, since then the question would be whether the underlying curve is concave (good news) or not concave (bad news). I’m feeling a little pessimistic though: it looks as though a typical value of the partial sum up to $n$ is about $n^{1/4}$.

• Ian Martin Says:

Here you are. Looks like potentially good news …. I’ll set my computer calculating overnight and hope for a more conclusive plot tomorrow.

• Ian Martin Says:

By the way, the hand calculations I posted earlier were almost right, though there was one mistake (a prime unnecessarily assigned +1) towards the very end.

• gowers Says:

I’m not sure I agree that that log-against-log plot is good news. It looks to me as though it is growing roughly linearly — I don’t see any sign of the gradient levelling off. It still looks like $n^{1/4}$ growth, but perhaps the picture will change when there’s more data.

25. Ian Martin Says:

In light of Thomas’s comment this may be irrelevant, but I might as well put up my hand calculations up to 164:

{1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1,
-1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, 1,
1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1,
-1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, 1, 1, 1, -1, -1, -1,
-1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1, -1,
1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, -1,
1, 1, -1, 1, 1, -1, -1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1,
-1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1,
-1, -1, -1, 1, 1}

Thomas, if you put this as input into your program does it go negative even if all subsequent primes are set to 1?