EDP8 — what next?

It’s taken noticeably longer than usual for the number of comments on the previous EDP post to reach 100, so this is perhaps a good moment to think strategically about what we should do. Individual researchers continually have a choice — whether to take a break from the problem and work on other, possibly more fruitful, projects or to tackle that blank page head on and push towards a new level of understanding — and I see no difference with a Polymath project.

I would be interested in the views of others, but my own feeling is that there is still plenty to think about here. There has been a certain amount of talk about Fourier analysis, and that still feels like an insufficiently explored avenue. A good preliminary question, it seems to me, is this. Suppose that {}f is a quasirandom \pm 1-valued function defined on \{1,2,\dots,N\} for some large N, in the sense that all its Fourier coefficients are small. Must there be some HAP along which the sum has absolute value at least C? If so, how quasirandom does {}f need to be? What I like about this question is that I think it should be substantially easier than EDP itself. It could be that a simple calculation would solve it: my attempts so far have failed, but not catastrophically enough to rule out the possibility that they could succeed next time. It also seems a pertinent question, because the functions we know of with very low discrepancy have some very high Fourier coefficients. (I don’t really mean Fourier coefficients, so much as real numbers \alpha such that \sum_{n=1}^n e(\alpha n) has very large absolute value.) Therefore, proving that low discrepancy implies a high Fourier coefficient would be a result in the direction of proving that these examples are essentially the only ones, which would solve the problem.

Even if we knew in advance that there was no hope of solving EDP, there are still some questions that deserve more attention. As far as I can tell, nobody has even attempted to explain why some of our greedy algorithms appear to give a growth rate of n^{1/4} for the partial sums. I would love to see just an informal argument for this.

I am also interested in (without yet having thought much about) a class of examples introduced by Kevin O’Bryant. These are sequences of the form f(n)=(-1)^{\phi(n)}, where \phi(n)=\sum_{i=1}^k\lfloor\alpha_i n\rfloor. Kevin also considers the case where k is infinite but the \alpha_i tend to zero so that the sum is always finite. Sune has pointed out that every sequence is of this form, but also makes the point that this may still be a good way to think about the problem. At the time of writing, I don’t have a lower bound of n^{1/2} even in the case when k=3. If that is not just me being stupid, then it raises the possibility that with an infinite k one might be able to get a growth rate of n^{o(1)} in an interestingly new way. (A first step would be to get n^{c(k)} with a constant c(k) that tends to zero as k tends to infinity.)

Another line of enquiry that Sune and Gil have thought about is to come up with toy problems that may be easier than EDP. One of Sune’s suggestions is interesting in that he has variants for which there appear to be counterexamples. This sheds light on EDP itself, because it shows that any proof of EDP would have to distinguish between the real problem and the variants.

I think it might be a good idea if we made a conscious decision that we would all focus on one aspect of the problem. For instance, we could choose a sub-problem with a nice statement and temporarily think of that as the “real” problem that we’re working on. But it might also be a bad idea: perhaps what we’ve done up to now, just letting people comment and others follow up on the comments if they find them sufficiently interesting, is the most efficient process. Incidentally, if you think you had a pretty good idea that got forgotten about, it’s not a bad idea to repeat it from time to time (as various people have done already, both in this project and in the DHJ project).

Anyhow, if anybody has thoughts about how we should proceed, then I’d be very pleased to hear them. In the absence of any suggestions to the contrary, I’ll think about the Fourier question and about trying to come up with interesting new multiplicative functions with low discrepancy.

While I’m at it, here’s a thought about such functions, which could turn out to be a special case of Kevin’s thought. (It might even be the basic idea that led him to his thought.) Our best examples so far (not including finite ones found by brute force) have been character-like functions, which are based on some prime number p. Could one have something that is a bit like a character-like function but based on an irrational number? The idea would be to take a Sturmian sequence of some kind but to throw in a few zeros. (One way of doing this would be to have two fairly large intervals mod 1 and one small one, and an irrational number \alpha, and make f(n) 1 if \alpha n lies in one of the large intervals, -1 if it lies in the other, and 0 if it lies in the small interval. Then one could do something about the zeros that was somehow based on the original sequence. It might end up being quasimultiplicative rather than multiplicative. I haven’t even remotely checked whether that was a non-crazy idea, but perhaps something vaguely similar could work, and perhaps it might do better than our existing character-like examples. The experimental evidence (by which I mean the 1124 sequences) certainly suggests that there ought to be some theoretical construction that does better than \log_9n. Perhaps a useful exercise would be to write a character-like function in O’Bryant form and then try to generalize from that.

Another little thought, this time in the direction of toy problems, is this. (It may be a special case of one of Sune’s suggestions — I haven’t checked.) Let us say that an infinite discrete set X of positive real numbers is a set of pseudointegers if it is closed under multiplication, and if the growth rate X\cap (0,n] is roughly n. I don’t know how rough it would have to be. One could then ask EDP for X: a HAP would be defined as a set of the form x_iX_r, where X_r is the set of the first r elements of X and x_i is some element of X. It’s just conceivable that one might be able to find an X for which EDP is false, in which case it would tell us that a proof of EDP in \mathbb{N} would have to distinguish between \mathbb{N} and X.

I believe that something like this has been done for the Riemann hypothesis but I can’t remember the details. Terry, if you’re reading this, you were the one who told me about the RH example — it would be great if you or anyone else could remind me what the precise statement is and give me a reference so that I can add them to this post. [It has now been provided: see the first comment below.]

102 Responses to “EDP8 — what next?”

  1. Harald Says:

    Hi –

    This is in answer to your request for a reference to work on the Riemann hypothesis for pseudointegers. You are referring to Beurling´s work.
    See also

    MR2208947 (2006j:11131)
    Diamond, Harold G.(1-IL); Montgomery, Hugh L.(1-MI); Vorhauer, Ulrike M. A.(1-KNTS)
    Beurling primes with large oscillation. (English summary)
    Math. Ann. 334 (2006), no. 1, 1–36.

  2. gowers Says:

    Let me try to think about Kevin-style examples. I find it rather difficult to exact calculations straight on to the computer here, so to start with I’ll just describe the rough idea I have in mind.

    We could look at \mu_3 in a different way as follows. We let \alpha=2/3 and we start by taking the sequence f(n)=(-1)^{\lfloor\alpha n\rfloor}. This goes 1, -1, 1, 1, -1, 1, 1, -1, 1, 1, -1, … We see that what is messing up the partial sums along HAPs is that this sequence is constantly 1 on multiples of 3, so we decide to correct that by using precisely the sequence we started with (or minus it) — and then we iterate.

    What I’d like to know is whether we can do something similar for an irrational \alpha, and whether there is any chance that by some magic we would end up with a multiplicativity property.

    To make it easier to think about, perhaps one should go for a complex sequence. Here’s how one might define it. Start with a function f(n)=e(\alpha n) for some suitable irrational \alpha. The partial sum along the HAP with common difference r will be bounded, but the bound will depend on r and will be worse, the closer r\alpha is to an integer. So we could pick the first r for which we find the bound to be unacceptably bad, choose an irrational \beta (perhaps even the same as \alpha), define a function g(n) to be e(\beta\lfloor n/r\rfloor), and replace f(n) by f(n)g(n). The vague thought behind this is that g is constant on intervals of length r, so with any luck g doesn’t mess up the partial sums along HAPs with common difference less than r (but this is far from a complete argument — it’s by no means clear that we do have this “luck”), but it corrects the problems when the common difference is r.

    Actually, perhaps this is more Morse-like than character-like. In fact, the Morse sequence can be expressed rather easily as a Kevin-type sequence: just take \alpha_i=2^{-i}.

    Just to finish these rambling thoughts, I wonder what happens if we express numbers in Fibonacci base and then take their parities. (I think this may be the same as taking \alpha_i=\phi^i, where \phi=(\sqrt{5}-1)/2.) I may as well give it a quick go. Here are the Fibonacci-base expressions of the first few integers. 1, 2, 3, 1+3, 5, 1+5, 2+5, 8, 1+8, 2+8, 3+8, 1+3+8, 13, 1+13, 2+13, 3+13, 1+3+13, 5+13, 1+5+13, 2+5+13, 21.
    The corresponding sequence is -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, -1. It seems to behave quite promisingly, but this is much too small a sample to be sure.

    Actually, forget it. It’s obviously just as bad as the Morse sequence and for exactly the same reason, since if n is a large Fibonacci number then the HAP with common difference n+1 gives you a long sequence of 1s.

    What this seems to suggest is that taking one of Kevin’s sequences with the \alpha_i shrinking too geometrically is likely not to be all that good. But maybe there is a chance of something if the ratios of successive terms are not so regular.

    • Sune Kristian Jakobsen Says:

      “We could look at \mu_3 in a different way as follows. We let \alpha=1/3 and we start by taking the sequence f(n)=(-1)^{\lfloor\alpha\rfloor}. This goes 1, -1, 1, 1, -1, 1, 1, -1, 1, 1, -1, …”
      There is no n in your formula for f(n). Do you mean f(n)=(-1)^{\lfloor n\alpha\rfloor} or f(n)=(-1)^{\lfloor (n+1) \alpha\rfloor-\lfloor (n) \alpha\rfloor}?

    • gowers Says:

      I meant the former and have now changed it. But the latter could work too perhaps.

    • Sune Kristian Jakobsen Says:

      But that don’t give 1,-1,1,1,-1,1,…

    • gowers Says:

      Damn, sorry — I meant \alpha=2/3 and have now changed that too.

    • Sune Kristian Jakobsen Says:

      “if n is a large Fibonacci number then the HAP with common difference n+1 gives you a long sequence of 1s”
      I don’t think this works. Eg. 2(n+1) and 3(n+1) will have an odd number of 1s in the Fibonacci base (dots means string of zeros):
      n+1: 00100… 001
      2(n+1): 01001… 010
      3(n+1): 10001… 100
      4(n+1): 10101… 101

      However, if n and m are large Fibonacci-numbers with n much larger than m, then the HAP with difference n+m starts with a lot of 1s:
      n+m: 00100…00100 … 00
      2(n+m): 01001…01001 … 00
      3(n+m): 10001…10001 … 00
      4(n+m): 10101…10101 … 00
      and so on.

  3. Sune Kristian Jakobsen Says:

    Your pseudointeger suggestion is similar to the one I suggested, but it is not a special case. E.g. we could choose X=\mathbb{N}\setminus \{2\}, but this is not possible in the model I suggested.

    It might be, that if you require the set X to have unique factorization, your suggestion is a special case of mine. I want to break the specialization down to two steps (you can take any of the steps without taking the other):

    First you want the pseudointegers to be reals numbers. This implies that if x<y, there is a n\in\mathbb{N} so that x^{2n+1}<y^{2n} but this this not the case in the model I suggested (e.g. this is not the case in my "counterexample" where the sequence of primes grows fast). Your model is what I called "an ordering that can be defined by logarithms". If you want to make your model more general in this respect, you could use infinitesimals.

    Secondly you wanted |X\cap (0,n]| to be roughly n. I think this could be formulated in the model I suggested in this way: By n\in\mathbb{N} I mean the nth smallest element in l_0. Now for fixed k we want |\{i\in l_0|i\leq kn\}| to be linear in n.

    I'm almost sure that I have prove that the "counterexample" I gave here (the second example) has discrepancy 1, but I haven’t wrote down a proof yet. I might do that later this weekend.

  4. Sune Kristian Jakobsen Says:

    Semi-meta (semi because this is not about polymath generally or the way the blog/wiki works, but about the “what next?” question)

    Tim suggested that we discussed what to do now, so I will try to start this discussion.

    I must admit that I’m not as optimistic about the project as I have been, but I was more pessimistic a couple of days ago. It seems that there have been more comments the last couple of day, and now we have something to think about again. (I can’t help noticing that posts asking “should we take a break from this problem” are unlikely to appear when the numbers of comments per day is low, as long as new posts are only made when the number of comments in the last post reach 100.)

    I still think that there is good reason to continue working on this problem. Right now I’m interested in the different toy problems of EDP. In particular, it would be interesting to find a set of “psudointegers” (not necessarily in the way Tim defined them) where we can prove the EDP. As long a everyone has something to study, I think there is a reason to continue.

    I’m not sure it is a good idea to decide that everyone should work on the same subproblem. It seems to me that these kind of decisions are impossible (like deciding that a polymath-project shouldn’t start yet. Sorry!) I think we should just let “evolution” make sure that the most interesting ideas and problems survive.

    Before the project began 9 people said they would be “an active and enthusiastic participant with relevant knowlege”. I said I would be “an interested non-participant” and as I understand Tim didn’t vote. Who are the 9 people, and how many of them are participating/participated?

    • Thomas Sauvaget Says:

      I also had ticked the “interested non-participant” box, as I anticipated that many experts would contribute — it seems that polymath5 is perhaps not known widely enough (or not appealing enough, as in “it’s a well-known impossibly difficult problem” ?).

      As for toy problems, I’m not sure I can contribute much theoretically, but for instance I’d be happy to explore the Liouville function more (there are relevant papers by Cassaigne et al in Acta Arithmetica) — although not until mid-march due to other commitments.

  5. gowers Says:

    Here’s another vague thought that will probably come out hopelessly confused when I try to write it down. Let’s transform the multiplicative problem about pseudointegers into an additive problem (just for convenience). So now we have a set X of non-negative real numbers that includes zero, is closed under addtion, and has the property that the size of X\cap [0,C] is about e^C for each C. An additive function on X is a function f:X\rightarrow\{-1,1\} such that f(x+y)=f(x)f(y) (so it’s an additive function from X to the Abelian group of order 2). We now want to find X and f in such a way that \sum_{x\leq C}f(x) is a bounded function of C.

    Without the growth condition on X, this is trivial: we simply take X to be the set of all non-negative integers and we define f(x) to be (-1)^x.

    As I write, I see that I want to add an extra condition to the definition of X: it should be generated by an additively independent set (of “pseudoprimes”). Otherwise we get strange multiplicities coming in.

    So now a possibly nice preliminary question is whether we can build up a sequence \alpha_1,\alpha_2,\dots of generators in such a way that the pseudo-Liouville function f(\sum_in_i\alpha_i)=(-1)^{\sum_in_i} has bounded partial sums. I have to go for a while but may return to this question unless someone has posted an answer to it by the time I get back.

    • gowers Says:

      It seems that here we have another question that can be investigated algorithmically: we simply choose our \alpha_i and hope to cancel out any largeness in the partial sums. (It might be necessary to make f(\alpha_i)=1 for some i, however, unless one looks ahead.) But the big difference between this and the usual problem is that now we have complete control over where the primes fall: the only constraint is on how dense they are. So it’s reasonable to hope that we might be able to get better bounds for this problem than one can get for EDP itself.

      I don’t really mean “hope” exactly, because if we can find better bounds, then it seems to rule out various proof techniques for EDP (as Sune has already pointed out in a very similar context). But in another sense I do mean “hope” because it would be an interesting result in itself.

      Just to add one remark about greedy algorithms here, one can clearly use a greedy algorithm to produce a sequence that keeps the partial sums small. The issue, however, is whether one can do so in a way that doesn’t cause the growth rate of the function |X\cap [0,C]| to be too large.

    • gowers Says:

      I haven’t actually done it, but I think it ought to be possible to come up with an example where the growth rate of |X\cap [0,C]| is e^C to within a multiplicative constant (which is less accurate than one would like, but a start).

      The rough idea would be as follows. For convenience I’ll go for a growth rate of 2^C instead, as we can always rescale later. As our first generator we choose \alpha_1=1 and set f(\alpha_1)=-1. By convention our sequence includes \alpha_0=0 as well, so the growth rate is exactly what we want it to be at 0 and 1. At 2 we want to have had four numbers in X but so far all we have is three: 0, 1 and 2. So let’s choose \alpha_2 to be some real number that is very slightly smaller than 2 and irrational (so that there are no additive relationships between \alpha_2 and \alpha_1 — I won’t keep mentioning this). Since f(0)+f(1)+f(2)=1-1+1=1, we may as well let f(\alpha_2)=-1.

      So the first few elements that we have so far put into X are 0, 1, \alpha_2, 2, 3, 2\alpha_2, 4, 5, 3\alpha_2, and so on (though eventually the pattern will be broken because eventually r(2-\alpha_2) will be greater than 1). The partial sums therefore go 1, 0, -1, 0, -1, 0, 1, 0, -1, 0, -1, 0, 1, 0, -1, 0, -1, 0, … until the pattern breaks. (This is periodic with period 6 because when we get to 4 we have the same pattern of integers and multiples of \alpha_2, including parity, and there are six numbers less than 4 in the sequence as it is so far.)

      However, this is not good enough because we need to have eight numbers by the time we reach 3, and so far we have only five. The partial sum up to 3 is -1, so we should choose three numbers \alpha_3,\alpha_4,\alpha_5 just below 3 and choose their values to be 1, -1 and 1, respectively.

      Now I claim that at least to start with we can keep the partial sums between -1 and 1 very easily. It will start to get more complicated when the next integer after n\alpha_2 is no longer 2n, but if we make new terms of the sequence close enough to the ones they are trying to “correct”, then difficulties can be made to arise arbitrarily infrequently, which leads me to believe that a greedy algorithm along these lines might well be able to keep the partial sums bounded, and give a growth rate that at every C is between 2^{C-1} and 2^C. And if that is indeed possible, it can probably be refined quite a lot: perhaps even to give a growth rate of 2^C(1+o(1)).

  6. Sune Kristian Jakobsen Says:

    I want to prove that the sequence I defined here over some “pseudointegers” has discrepancy 1. To recap, as pseudointegers I’m using l_0=l_0(\mathbb{N}_0), the set of sequences taking values in the non-negative integers that are 0 from some point, with the following ordering: Define f((a_1,a_2,a_3,\dots))=a_1+2a_2+4a_3+\dots. Now a>b if f(a)>f(b) and if f(a)=f(b) we use the lexicographical ordering. So the smallest elements is (dots means: the rest is 0):
    (0,…), (1,…), (0,1,…), (2,…), (1,1,…), (3,…), (0,0,1,…), (0,2,…), (2,1,…), (4,…)
    The sequence I define on these pseudointegers is simply the pseudo-Liouville function: x_a= (-1)^{a_1+a_2+\dots}.
    First of all, I want to show by induction that for all n>1, \sum_{\{i|f(i)=n\}}x_i=0. This is true for n=2 and 3, so lets assume that it is true for all k<n. We divide the set of \{i\in l_0| f(i)=n\} into two set: The set with i_1\leq 1 (the parity of i_1 is the same as that of n) and the set with $i_1\geq 2$. But we take an i in \{i\in l_0|f(i)=n,i_1\geq 2\}, and subtract 2 from i_1, and we get a element in \{i\in l_0| f(i)=n-2\}. This function is bijective and doesn't change the parity of the sum of the terms in i, so by induction hypotesis \sum_{\{i\in l_0|f(i)=n,i_1\geq 2\}}x_i=\sum_{\{i,f(i)=n-2\}}x_i=0. We have \sum_{\{i\in l_0|f(i)=n,i_1\leq 1\}}x_i=\sum_{\{i|f(i)=\lfloor n/2\rfloor \}}x_i=0 using a left shift. This gives \sum_{\{i,f(i)=n\}}x_i=0 and by induction we get \sum_{\{i,f(i)\leq n\}}x_i=0

    This is not enough: We need to show that the partial sums \sum_{i,f(i)=n,i<j} are also bounded by 1. This can be done in exactly the same way.

    • Sune Kristian Jakobsen Says:

      You should read my next comment and my reply to that, before you read this.

      I will try to answer my first questions for this set of pseudointeger. The question is: For fixed i\in l_0 what is the density of pseudointegers divisible by i?

      Lets try to find the numbers b_n=|\{i\in l_0|f(i)=n\}|. We have b_0=1,b_1=1,b_2=2,b_3=2 and we can get the recursion formulas b_{2n+1}=b_{2n} and b_{2n}=b_{2n-1}+b_n. This gives us for n>1 b_n=2a_{\lfloor n/2\rfloor}, where a_n is A033485 given by a(n) = a(n-1) + a([n/2]), a(1) = 1.

      Lets look at the subset of these pseudointegers that are not divisible by the first prime. We know that if i is not divisible by the first prime then f(i) is even. Furthermore we know that |\{i\in l_0|i_1=0,f(i)=2n\}|=b_n (we get that by using a left shift). So if \frac{b_n}{b_{2n}}\to 0 as n go to infinity (this seems to be a reasonable assumption, I haven’t proved it) the density of pseudointegers divisible by the first prime is 1. From this comment we know (or at least think we know) that if this density of pseudointegers divisible by x is in (0,1) for any x, then it will be in this set for all x. Since this is not the case for the first prime, this can’t be the case for any of the pseudointegers in this set. I think it is possible to show that the density is 1 for all x\in X.

  7. Sune Kristian Jakobsen Says:

    Two questions I think are interesting for any set X of pseudointegers (again let n denote the nth lowest pseudointeger, and let p_n denote the nth lowest pseudoprime):

    Fix k\in X. What is the density of the pseudointegers divisible by k? In particular, does it have a density and if so is it in (0,1) or in {0,1}?
    How fast does \pi^{-1}(n)=\{i\in X|i\leq p_n\} grow?

    It would also be interesting to find out how the two questions relate: If we know that the pseudointegers divisible by 2 have some density in (0,1), what can we say about the second question?

    If X\subset (0,\infty) is a set of pseudointegers as Tim defined them in the post, and it has a set of independent generators and \lim_{n\to\infty}\frac{X\cap (0,n]}{n}\to 1 we can solve the first question. The density of numbers divisible by x\in X is \lim_{n\to\infty}\frac{xX\cap (0,n]}{X\cap (0,n]}=\frac{1}{x}. Here x means the real number, and not the integer that represents its position in X. What can we say about the second question?

    • Sune Kristian Jakobsen Says:

      I forgot the “|”s around the sets in the limits.

    • Sune Kristian Jakobsen Says:

      Lets say that a set X (actually an ordered set) is a set of pseudointegers if it is the set l_0(\mathbb{N}_0) with an ordering such that a\geq b,c\geq d\Rightarrow ac\geq bd and for any a in X the set of elements less than a is finite. Here multiplication of sequences in X is just termwise addition.

      Suppose we have a pseudointeger x\in X such that the the density of pseudointegers divisible by x d_x=\lim_{n\to\infty} \frac{|\{kx\leq n|k\in X\}|}{n} is in (0,1). Lets define the logarithm in base x: For y\in X we define \log_x(y)=sup \{a/b|x^a\leq y^b\}. Now it should be possible to prove some basic formulas:

      \forall y\in X: \log_x(y)\in (0,\infty)
      a/b\geq log_x(y)\Rightarrow x^a\geq y^b
      a/b\leq log_x(y)\Rightarrow x^a\leq y^b

      I want to define the natural logarithm by: \log(y)=-\log(d_x)\log_x(y), where log(d) is the (good old) logarithm of the density of the pseudointegers that are divisible by x. All this gives us a map \varphi: X\to (0,\infty) defined by \varphi(y)=e^{\log(x)}. This map should fulfill:

      \varphi(y)\leq \varphi(z)\Leftrightarrow y\leq z

      But in general this map is not injective, so we don’t have \varphi(y)<\varphi(z)\Leftrightarrow \varphi(y)<\varphi(z). Now I think it is possible to show that the density of pseudointegers divisible by y is given by d_y= 1/\varphi(y). In particular we have d_y\in (0,1) for all y in X and the functions \varphi and log doesn't depend on the x we chose. Furthermore we get \lim_{n\to\infty}\frac{|\{\varphi(x)\leq n|x\in X\}|}{n}=1

      Disclaimer: I haven't proved everything I claimed in this comment, but I think it should be easy.

    • Sune Kristian Jakobsen Says:

      There is something wrong somewhere: I couldn’t make this work for the set of odd numbers. It should be possible to go though the “proof” with this example in mind and find the flaw, but I will go to bed now. I will look at it tomorrow, if no one else has found the flaw.

    • Sune Kristian Jakobsen Says:

      I think I have found the flaw. In general we don’t get \lim_{n\to\infty}\frac{|\{\varphi(x)\leq n|x\in X\}|}{n}=1 but \lim_{n\to\infty}\frac{|\{\varphi(x)\leq n|x\in X\}|}{n}=d for some real number d (can this be >1?). And now d_y=d/\varphi(y). Of course the formula \forall y\in X: \log_x(y)\in (0,\infty) doesn’t hold for y=1, but it should hold for all other y.

      If X is a set of pseudointegers as Tim defined them in the post, then \varphi must be the inclusion map: \log\circ\varphi must be a monotone solution to Cauchy’s functional equation and if X \cap(0,n] should be of almost the same size as \varphi(X)\cap (0,n], there is only one solution that works.

      I will try to state and proof all the remarks from these comments on the wiki later, and give a summary when I’m done.

  8. Kristal Cantwell Says:

    I have been looking at the human proof that a multiplicative sequence with discrepancy less than or equal to 2 has length at most 246. In particular the case where f(2)=-1 and f(5)=1. The output from Uwe Stroinski prover is on the wiki and I think I can use it to deal with this case. f(2) = -1; f(3) = -1; f(5) = 1; f(7) = -1; f(11) = 1; f(13)= -1; f(17)= -1 and f(37) = 1.

    From this we get that a f(220)=f(221)=f(222)=f(224)=f(225)=1. This means that there is a cut at 220 and the sum at 220 is zero and the sum at f(225) is at 3 or more which is greater than two thus the discrepancy is greater than 2 and we have dealt with this case. If this works there is still a lot of work to be done on the wiki in polishing the result but it looks like a human proof of this result is possible.

    • Uwe Stroinski Says:

      That should be correct. The machine didn’t find this because I have excluded “long” partial sums like f[220,225] because of performance issues.

    • Uwe Stroinski Says:

      A ‘polished’ version of case 1 in the Wiki.

      Assume f(2)=1. We derive f(3)=f(5)=f(7)=f(19)=-1 e.g. like in the Wiki.

      Assume f(11)=-1. We need 5 more values.

      * f[12]=-2 implies f(13) = 1,
      * f[32,35]=3+f(17) implies f(17) = -1,
      * f[30,33]=3+f(31) implies f(31) = -1,
      * f[132,135]=3+f(67) implies f(67) = -1;
      * f[22]=-2 implies f(23) = 1

      Since f(62)=f(63)=-1 we have a cut at 62. f(64)=1,f(65)=-1,f(66)=1. Thus f[66]=0. Furthermore f(67)=f(68)=f(69)=-1 implies f[69]=-3. Hence f(11) cannot be -1.

      Assume f(11) = 1. We need 6 more values.

      * f[74,77]=-3+f(37) implies f(37) = 1
      * f[34,37]=3+f(17) implies f(17) = -1
      * f[152,155]=-3-f(31) implies f(31) = 1
      * f[184,187]=-3+f(23) implies f(23) = 1
      * f[23]=2+f(13) implies f(13) = -1
      * f[28]=-2 implies f(29) = 1;

      Now f[37]=3 and thus f(11) cannot be 1. In conclusion: f(2) has to be -1.

  9. Polymath5 « Euclidean Ramsey Theory Says:

    […] By kristalcantwell There is a new thread for Polymath5. The pace seems to be slowing down a […]

  10. Alec Edgington Says:

    Although EDP is evidently very difficult, I feel that we haven’t yet discovered quite why it is so difficult — or at least, I haven’t. For this reason alone I feel like pushing on at least a bit longer, though I must admit that demands of work and family don’t leave me a great deal of time at the moment (I ticked the middle box in the poll, perhaps optimistically).

    It seems that one reason for the difficulty of the problem is that it seeks to relate quite directly the additive and multiplicative structure of the integers; but this hardly feels like sufficient excuse for failure! Why should sums of multiplicative functions be so intractable? Is the difficulty on the level of the Riemann Hypothesis? If so, I’d like to see why, even if only roughly.

    I’m not sure whether we have even come up with a heuristic reason, based on a plausible probabilistic model, why the conjecture should be true or false. I do feel this should be possible. The fact that any such argument would have to contend with the character-like examples seems to be part of the difficulty, however.

    Another question that I think is worth exploring further experimentally is whether there is any non-trivially random way of generating multiplicative functions that exhibit logarithmic growth.

    • Guy Srinivasan Says:

      For any random assignment of +-1 to a finite set of primes, can’t we get logarithmic growth by finding a large prime p such that all of the assigned -1s are quadratic nonresidues mod p and all of the assigned +1s are quadratic residues mod p and assigning the rest of the primes accordingly?

      Or is it hard to find such a p.

    • gowers Says:

      One can do that with the help of quadratic reciprocity and the Chinese remainder theorem, but I think Alec would not want to count that as non-trivially random.

    • Alec Edgington Says:

      It is interesting that that can be done, but Tim is right, I wouldn’t count it as non-trivially random because it amounts to choosing a character-like function modulo a single random prime.

      A less ambiguous question would be: are there uncountably many \pm 1-valued multiplicative functions whose partial sums grow logarithmically? This isn’t quite what I had in mind (for instance, the answer is trivially ‘yes’ if we generalize to \mathbb{T}-valued sequences, whereas the question about randomly-generated sequences should apply there too), but it may be interesting in itself.

      Here is another question in a similar vein. Imagine an infinite game where two players construct a multiplicative sequence by taking turns to assign values to primes. Good guy wants the final sequence to have logarithmic growth; bad guy wants to thwart him. Initially, all primes are assigned zero. Each player can, on his turn, assign \pm 1 to any prime that is still assigned zero. The final sequence is the pointwise limit of the sequences obtained after a finite number of turns. Is there a winning strategy for good guy? What if we allow good guy to assign K primes on his turn (but bad guy only one)?

    • gowers Says:

      That’s a nice question. Here’s an idea that it might perhaps be possible to develop into a proof that bad guy wins. The strategy would be simply to pick the smallest prime that has not had a value assigned and choose a value randomly. Bad guy will then get to choose random values on a set of primes of density 1/2 (not true density, but something along those lines), which ought to be enough to produce square-root growth.

      As it stands, this isn’t a proof, since good guy gets to choose values that depend on earlier values that bad guy has chosen. To understand this dependence, another question might be this: suppose bad guy chooses values at every other prime, and then good guy gets to choose the rest (knowing what bad guy has chosen). Can good guy get logarithmic growth? One could even ask this question in the case where bad guy chooses 1 every single time. What makes this hard for good guy is that the irregularities in the distribution of the primes will probably make it impossible to produce character-like functions. On the other hand, a free choice at every other prime feels as though it might be somehow comparable to a free choice at all primes.

  11. gowers Says:

    Here are a few thoughts about the Fourier question discussed in the post. We want to show that if f is a function from \{1,2,\dots,N\} to \mathbb{C} satisfying certain conditions, the most important two of which are that \sum_x|f(x)|^2=N and \max_x|\hat{f}(\alpha)|\leq cN (for some c that may turn out to have to depend on N), then there exists a HAP P such that |\sum_{x\in P}f(x)|\geq C. This is provided that N is sufficiently large.

    One way of proving such a result is to show that the sum in question is large on average. Since the extra generality could turn out to be helpful, let us consider a weighted average. For each d and m such that dm\leq N, let P_{m,d} be the HAP \{d,2d,\dots,md\}, let w_{m,d} be a non-negative weight, and suppose that the weights sum to 1. Then it will be enough to show that \sum_{m,d}|\sum_{x\in P_{m,d}}f(x)|^2\geq C^2.

    We can write \sum_{x\in P_{m,d}}f(x) as \langle P_{m,d},f\rangle (identifying P_{m,d} with its own characteristic function). If we do this, then we see that by Parseval’s identity it is equal to \langle \hat{P_{m,d}},\hat{f}\rangle. Therefore, the sum we are interested in is \sum_{m,d}w_{m,d}|\langle\hat{P_{m,d}},\hat{f}\rangle|^2, which is \sum_{m,d}w_{m,d}\int_{\alpha,\beta}\hat{f}(\alpha)\overline{\hat{f}(\beta)}\hat{P_{m,d}}(-\alpha)\hat{P_{m,d}}(\beta)d\alpha d\beta. Interchanging summation and integration, we see that this is \int_{\alpha,\beta}\hat{f}(\alpha)\overline{\hat{f}(\beta)}K(\alpha,\beta)d\alpha d\beta, where the kernel K(\alpha,\beta) is equal to \sum_{m,d}w_{m,d}\hat{P_{m,d}}(-\alpha)\hat{P_{m,d}}(\beta).

    So now we are trying to show that \langle \hat{f},K\hat{f}\rangle is unbounded, using certain properties of \hat{f} and K. What are these properties?

    Well, we are assuming that \sum_x|f(x)|^2=N and \max_x|\hat{f}(\alpha)|\leq cN. Since we are trying to find a lower bound for a quantity involving f, it might seem a bit peculiar to use an upper bound on the L_\infty norm of \hat{f}, but this is not completely ridiculous, since in conjunction with our knowledge of \|\hat{f}\|_2 it gives us a lower bound for \|\hat{f}\|_1. So there is something to use here. (Roughly speaking, it tells us that \hat{f} cannot be too concentrated.)

    The kernel K(\alpha,\beta) is obviously positive definite (by which I really mean non-negative definite) since it is a sum of kernels that are trivially positive definite. It’s also Hermitian. So it can be “diagonalized”. We could either discretize everything and do a genuine diagonalization, or apply whatever spectral theorem is appropriate here. Instead of thinking about what that would be, let me simply assume that there is some orthonormal basis (u_i) of eigenvectors such that Ku_i=\lambda_i, where \lambda_i\geq 0.

    So we would like to choose the weights w_{m,d} (or just forget about the weights and give everything equal weight) in such a way that every function \hat{f} with the properties above has an expansion \sum_i\mu_iu_i for which we can show that \sum_i\lambda_i|\mu_i|^2 is at least C^2. This splits naturally into two problems: understand which kernels K have this property, and try to produce such a kernel from the ingredients we have at our disposal.

    Is there some intuitive reason that one hopes would get this to work? Here is one small thought. Let us consider the difference between the sum \sum_{j=1}^mf(jd) and the sum \sum_{j=1}^mf(jd+r) for some r. The first of these sums is \int \hat{f}(\alpha)\hat{P_{m,d}}(\alpha)d\alpha, and the second is \int \hat{f}(\alpha)\hat{P_{m,d}}(\alpha)e(-r\alpha)d\alpha. So the difference is \int\hat{f}(\alpha)\hat{P_{m,d}}(\alpha)(1-e(-r\alpha))d\alpha. Actually, I’ll need to think about this. I was hoping that f wouldn’t be able to tell the difference between P_{m,d} and the shifted P_{m,d}, but I haven’t yet worked out what the right calculation is (if it is even correct at all).

    • gowers Says:

      I forgot to make the point that one reason I am interested in this Fourier question is that it avoids the annoying difficulty of having to use the fact that the function is \pm 1-valued. I am asking the question for a more general class of functions, and do not have to worry about 1, -1, 0, 1, -1, 0, … and functions like that, since they have large Fourier coefficients.

      Arising from the discussion above is a question I find quite interesting, though at the moment I don’t even begin to see how to answer it. Consider the linear map defined on functions f:\{1,2,\dots,n\}\rightarrow\mathbb{R} defined by the formula f\mapsto\sum_P\langle f,P\rangle P, where the sum is over all HAPs. I would like to know what the eigenvectors of this map are like. To make the question easier, I’m happy to allow any reasonable choice of weights w_P (I chose w_P=1 for all P in the expression I wrote). I would also settle for an approximate eigenvector, by which I mean a function f that is roughly proportional to its image. But I should also make clear that I want not just one eigenvector but the whole basis.

      If we had that basis, then we would have an amazing handle on mean-square sums along HAPs. However, the eigenvectors look as though they may be quite strange, because a lot depends on how smooth a number is.

    • Gil Kalai Says:

      That’s a very intersting question. It is still a bit vague for me. Just to make it entirely concrete here are some little questions:

      1) What precisely do you mean by \hat f(\alpha) and what precisely is \alpha.

      2) Do you need to consider all \alphas or just those corresponding to small periods (= perhaps(??) the alphas which are rational number with small denominators).

      3) Does a positive answer (to the question in the first paragraph “we want to “…) even just to +1 -1 valued functions imply EDP?

    • gowers Says:

      Here’s an attempt to answer your questions.

      1) \alpha is a real number between 0 and 1 and \hat{f}(\alpha) is \sum_n f(n)e(\alpha n), where e(x) is shorthand for e^{2\pi i x}.

      2) I don’t know the answer to this. I think it is convenient to consider all periods in calculations, so that one can use things like Parseval and the inversion formula, but it may be that the assumption about f should be the weaker one that \hat{f}(\alpha) is small whenever \alpha is close to a rational with small denominator (which would be nice because it would prove that if a sequence had bounded discrepancy then it would have to have bias on an AP with small common difference).

      3) I’m not sure I fully understand this question, but I’ll answer what I think you are asking. If one could prove a very good result in the direction of what I am asking, then probably the best one could hope for is to show that if the discrepancy is at most C then there is some Fourier coefficient of size at least \eta(C)n. But I don’t see how to get from that to EDP. It seems like a step in the right direction, but it is a lot weaker than being character-like. But perhaps one could combine this with known results to get something stronger. For instance, if we now assume that the function is multiplicative, and if we got linear growth of partial sums along some AP, then we would perhaps be able to apply results of Granville and Soundararajan and somehow mimic the proof that works for character-like functions.

    • gowers Says:

      A reasonable reaction to my comment at the beginning of this thread would be “What is the point of the Fourier analysis?” I wasn’t very clear about that when I wrote the comment, but I think I have clarified it a bit in my mind now.

      First of all, I’d say that the question about diagonalizing the linear map f\mapsto\sum_P\langle f,P\rangle P is the real question here. If we fully understood its eigenvectors and eigenvalues, then it should be possible to start thinking about why every function in the class we are interested in had to have a reasonable projection on to the subspace where the eigenvalues were not too tiny. Now it could be that looking at everything through the lens of the Fourier transform already takes us some way towards the diagonalization. That is, it could be (and there is some plausibility to what I am saying — it’s not just a wild hope) that the kernel K(\alpha,\beta) is already somewhat closer to being diagonal than the matrix we have in physical space. So it could be that finding the diagonalization would be easier if we started with K(\alpha,\beta).

      That said, in physical space we have the advantage of being able to write down an expression that’s easy to understand. Let me do that. The xy entry of the matrix is equal to the number of homogeneous arithmetic progressions that contain both x and y. The common difference d of such a progression has to be a factor of (x,y), and the number of times d counts is the number of multiples of d that belong to the interval [\max\{x,y\},n]. (Here is where allowing weights could be useful: it would allow one to smooth off the nasty boundary effects.)

      That gives us a slightly unpleasant quantity to work with, but we can at least see that it is large when (x,y) has plenty of small factors.

      This seems to be inviting us to define everything in terms of prime factorizations. Perhaps with the right system of weights we could get a matrix that could be explicitly diagonalized.

    • Gil Kalai Says:

      Something quite natural we may try is to try to represent the discrepency along each HAP by some new auxiliary function depending on the original function. For example, we can try the following.

      Define g_r(rd)=f(r)+f(2r)+f(3r)+\dots+f(rd) and g_r(x)=0 if x is not of the form dr. The functions g_r(x) reflext the discrepenct along HAP. The 2-norm of g_r is a measure of the discrepency. If f is completely multiplicative then these functions (up tp spacing and a sign) are in some sense the same.

      The question is: Is there a simple way to express the Fourier coefficients of g_1 and more generally of g_r in terms of the Fourier coefficients of f. (If we cannot find a useful expression perhaps we can at least give some bounds.)

    • Gil Kalai Says:

      And thanks for the answers, Tim.

    • Alec Edgington Says:

      I don’t know if this is anything more than a curiosity, but if one uses the weight function w_{m,d} = \mu(d) (so allowing negative weights, which may defeat the whole point of the exercise), then the (x,y) entry of your matrix is approximately

      (n - \max(x,y)) \frac{\phi((x,y))}{(x,y)}

      when n is large compared to \max(x,y) (one can write down a slightly more complicated exact formula), where \phi is the Euler totient function.

      Perhaps we really want a weight function that decays with m, so that the coefficients behave nicely as n \rightarrow \infty.

  12. gowers Says:

    Here I want to try to pursue the thoughts from this comment.

    First, let me make the observation (made in different guises several times already) that if the partial sums of f are bounded, then \sum_{n=1}^\infty e^{-\alpha n}f(n) is bounded by the same bound for every \alpha > 0. This follows easily from partial summation. So it would be enough to find some \alpha such that that infinite sum is at least C.

    Now in fact a more accurate bound that comes from partial summation is that the infinite sum is at most Ce^{-\alpha}, so it would be sufficient to prove that

    \displaystyle \int_0^\infty|\sum_{n=1}^\infty e^{-\alpha n}f(n)|^2d\alpha > C^2\int_0^\infty e^{-2\alpha}d\alpha,

    which equals C^2/2.

    Expanding out the modulus squared and interchanging summation and integration gives us \sum_{m,n=1}^\infty f(m)f(n)\int_0^\infty e^{-\alpha(m+n)}d\alpha, which equals \sum_{m,n}f(m)f(n)/(m+n).

    Now that was just looking at the partial sums of f, but we also want to look at the partial sums along HAPs. The corresponding expression we get if we sum along multiples of d and take this particular weighted average is (obviously) \sum_{m,n}f(md)f(nd)/(m+n). So we could throw in some weights w_d and obtain a sum, of which we want a lower bound, of \sum_d w_d\sum_{m,n} f(md)f(nd)/(m+n). But now we want to express that in terms of x and y. That is, we change variables so that x=md and y=nd. For each x and y that gives us a term \sum_{d|(x,y)}w_d/(x/d+y/d)=\sum_{d|(x,y)}dw_d/(x+y).

    I find that an encouragingly pleasant expression. One possible option would be to take w_d=1/d, in which case the xy entry of the matrix would be d((x,y))/(x+y), where d((x,y)) is the number of factors of (x,y), or in other words the number of common factors of x and y. Then the sum of the weights would be \log n, so we’d need to beat that with our lower bound.

    I’ve slightly glossed over a problem there, which is that this approach seems more naturally suited to functions defined on all positive integers, but if we do that then we end up with \sum_dw_d=\infty. I don’t immediately see what to do about this.

    • gowers Says:

      I’ve been trying to decide whether it’s better to use the e^{-\alpha n} that I used above or whether n^{-s} is better. It’s a difficult one because e^{-\alpha n} is more natural for additive problems and n^{-s} is more natural for multiplicative functions, but we have to deal with both. Anyhow, let me try to see how the discussion would have differed if I’d used n^{-s}. Then the partial summation bound would have been C.

      After expanding out the modulus and interchanging summation and integration I would have got \sum_{m,n}f(m)f(n)\int_0^\infty (mn)^{-s}ds. The integral equals \int_0^\infty e^{-s(\log m+\log n)}ds=1/(\log m+\log n), so I’d have got \sum_{m,n}f(m)f(n)/(\log m+\log n). Looking ahead, that means the matrix element at (x,y) would have been \sum_{d|(x,y)}w_d/(\log(x/d)+\log(y/d)), which doesn’t simplify as nicely, though one can of course replace the denominator by \log x+\log y-2\log d.

      I have to go, but I think I may have more to say on this later.

  13. Kristal Cantwell Says:

    “Even if we knew in advance that there was no hope of solving EDP, there are still some questions that deserve more attention. As far as I can tell, nobody has even
    attempted to explain why some of our greedy algorithms appear to give a growth
    rate of n^{1/4} for the partial sums. I would love to see just an informal
    argument for this.”

    If for the integers 1 to 2n a greedy algorithm is used to minimize the
    discrepancy then in the region n to 2n the choice of sign of any prime has no effects because the next multiple of the prime is outside the region. So the primes in this region would have a damping effect. I am not sure what this effect would be but I wonder if it would account for the above phenomena or for that matter what numerically the damping effect of these primes would be.

  14. gowers Says:

    Here’s a question that probably has an easy answer: what happens if we change EDP by requiring the common difference of the HAP to be a prime power?

    We have already observed that we can find bounded discrepancy if we insist on a prime common difference: the sequence 1,1,-1,-1,1,1,-1,-1,… is an example with discrepancy 2. However, this is bad on multiples of 4. If we try to correct it on all powers of 2 in the most obvious way then we will end up with something like the Morse sequence, which will almost certainly fail to be an example. (A sufficient condition for it to fail is that there are infinitely many prime powers of the form 2^n+1, but that is far from necessary.)

    It seems to me that there are unlikely to be enough constraints on a sequence to stop a greedy algorithm doing pretty well, but I’ve only just come up with the question so I’m not sure how to go about this. Note that going for a multiplicative function is not such a great idea because you need the partial sums to be bounded, just as before, and therefore you need a counterexample to the stronger EDP rather than this weak version.

    The reason I thought of this question was that I was wondering about a good system of weights w_d in the expression that occurs near the end of this comment. It occurred to me that taking w_d=\Lambda(d)/d (where \Lambda is the von Mangoldt function) would give rise to a nice expression, since \sum_{d|m}\Lambda(d)=\log m. However, if that turned out to be useful it would I think prove that one could get unbounded discrepancy with prime power differences (since \Lambda is zero except on prime powers) and there’s no point in trying to prove that if it is false.

    Alec, these thoughts were partly a response to your suggestion of taking w_d=\mu(d) — the fact that \mu is so often negative is indeed a problem for what I was trying to do, so this was an attempt to come up with the nicest non-negative arithmetical function I could think of. So I’m now quite interested in the matrix with xy entry equal to \log((x,y))/(x+y).

    • Sune Kristian Jakobsen Says:

      I don’t know if you can use this answer, but EDP fails over \mathbb{T} if we only look at HAPs with primepower difference. E.g. take x_n=e^{2\pi i n/6}.

    • Mark Bennet Says:

      The Morse sequence is well investigated, but it struck me that the related sequence which allocates +1 to numbers with an even number of zeros in their binary representation, and -1 when the number of zeros is odd might have some interestingly different properties. It allocates the prime 2 the value -1, and 2 behaves multiplicatively (f(2n)=-f(n)).

      This may have been explored already, but I was wondering how it performed on the prime power HAPs.

    • gowers Says:

      Sune, I don’t see how to convert your example into a \pm 1 example. Indeed, in the \pm 1 case it is clear that no periodic example can work, because if m is the period and 2^k is the largest power of 2 that divides m, then the sequence along the HAP with common difference 2^k is periodic with period r=m/2^k, which is odd. This means that the growth rate of the partial sums along this HAP is linear (with gradient at least 1/r).

    • gowers Says:

      Thinking about it further, I realize that I don’t have an answer to the even simpler question of whether EDP holds if you insist on a common difference that is either a prime or a power of 2. My instinct would definitely be to try to find a counterexample, but the proof that it cannot be periodic still works — indeed, it works just for powers of 2 — so it might be that while counterexamples exist, it is difficult to find an explicit one (as opposed to producing one using a greedy algorithm, say).

      If by some miracle EDP were to hold for this smaller set of common differences, then it might turn out to be easier to prove the stronger result, since one would have much more control on how different HAPs intersected. Who knows, perhaps the best growth rate for EDP is \log n and the best for this version is \log\log n. If so, then it would be pretty hard to discover anything experimentally. I might see if I can work out by hand how long a sequence you can get with discrepancy at most 1 in this version.

    • gowers Says:

      That last question turned out to have a boring answer: a discrepancy of at most 1 along HAPs with common difference a power of 2 forces the sequence to be the Morse sequence, which then goes wrong at 12 if you look at multiples of 3.

    • Gil Says:

      If I vaguely understand you correctly then the weights will imply some result of the form:

      For every sequence f(1),f(2),…f(N)
      \sum _d w_d C_d is large (namely tending to infinity when N tends to infinity) where C_d = 1/N \sum_{n=1}^N (f(1)+...+f(n))^2.

      And you asked what can we say about the support of the weights. This would imply how much can restrict the differences in the HAP so that bounded discrepency is already impossible. (E.g., are prime powers suffice.)

      Another related question is this: It looks that when we start with one sequence we can replace it by other sequence (based on blocks depending on the first sequence) so that the discrepency for HAP with “small periods” is forced to be small. So another question is what do such block constructions imply, or suggest, regarding the weights w_d.

    • Sune Kristian Jakobsen Says:

      An even weaker version of EDP: Use only HAPs where the difference is either 3 or a power of 2.

    • Sune Kristian Jakobsen Says:

      This turned out to be easy: Let the even terms be the Morse-sequence. Now we know we have bounded discrepancy for the powers of 2 >1. We can choose value at an odd multiple of 3, 3(2n-1) to be minus the value at 3(2n). Now we also have bounded discrepancy for the HAP with difference 3. Since we have bounded discrepancy along 2,4,6,… and have only fixed the value at 1/3rd of the odd numbers, we can also keep the partial sums bounded.

    • gowers Says:

      What happens if one takes the Morse sequence and repeats every term three times? Obviously the HAP with common difference 3 is OK. The HAP with common difference 2 goes 1 -1 -1 -1 1 1 -1 1 1 1 -1 -1 … In other words, it replaces each 1 in the Morse sequence by 1 -1 -1 and each -1 by -1 1 1. I haven’t checked, but my guess is that other powers of 2 behave similarly.

      In fact, thinking about it for a few seconds longer, I’m pretty sure that it’s true in general that for each power of 2 you get a block x of length 3 such that the sequence along that HAP is x -x -x x -x x x -x -x x x -x etc.

      So here’s a question in the opposite direction. Let’s suppose that d_1,d_2,\dots is a sufficiently fast-growing sequence. Does it follow that we can find a sequence of \pm 1 with bounded discrepancy with respect to HAPs with common difference d_i for some i?

      A first case would be where d_n\geq\alpha^n for some \alpha>1. A spirit-of-Erdős conjecture (with the difference that I haven’t thought about it enough to rule out that it is a stupid problem) could be that there is always a counterexample if \sum_n d_n^{-1}<\infty. That is carefully designed not to conflict with the 2^n-or-prime question.

    • gowers Says:

      Just seen your example after typing in mine. I prefer yours …

    • gowers Says:

      Maybe your general technique does the 2^n-or-prime question as well. One could take the Morse sequence at multiples of 8, or something like that, and then … hmm, maybe I don’t see how to make this work. The rough idea was supposed to be that it would then be fairly easy to deal with HAPs with odd common difference. But I don’t actually see why that would be easy, because the example that works for odd HAPs, namely 1, -1, 1, -1, … would keep getting disrupted by the Morse bits.

    • Sune Kristian Jakobsen Says:

      “A first case would be where d_n\geq\alpha^n for some \alpha>1. A spirit-of-Erdős conjecture (with the difference that I haven’t thought about it enough to rule out that it is a stupid problem) could be that there is always a counterexample if \sum_n d_n^{-1}<\infty. That is carefully designed not to conflict with the 2^n-or-prime question."

      I think this is false if EDP is true. Notice that if we look at if we have a sequence with bounded discrepancy along the HAPs with difference resp. d, 2d, 3d, … nd, we can pass down to the HAP with difference d and get a sequence with bounded discrepancy along the HAPs with difference 1,2,3,…n. So assume that your conjecture is true. Choose a very fast growing sequences, a_1,a_2,a_3,\dots. If the sequence grows fast enough, we should be able to find a sequence with bounded discrepancy along HAPs with difference a_1,a_2,2a_2,a_3,2a_3,3a_3,\dots. Passing down to the HAP with difference a_n gives you a sequence with bounded discrepancy along HAPs with difference 1,2,\dots n. Compactness gives an counterexample to EDP.

  15. gowers Says:

    To try to get some intuition about what the spectral decomposition of a matrix such as A_{xy}=\log((x,y))/(x+y) might conceivably be like, I’m going to imagine a more general situation. Just before I say what that is, let me point out what this matrix has going for it: it is positive definite, and pretty big on the diagonal (the latter because the highest common factor of x and x tends to be a lot bigger than the highest common factor of x and y).

    As I write, I suddenly realize that a cleverer choice of w_d than \Lambda(d) might have been \phi(d)/d. That way we would get A_{xy}=(x,y)/(x+y), which is rather nicer.

    Anyhow, my model situation is this. We have a finite set X and some subsets X_1,\dots,X_k and we define A_{xy} to be the number of i such that both x and y belong to X_i. I’d like to know what the eigenvectors of this linear map are like when there aren’t too many sets. (The rank of the matrix will be at most k so there will be a big eigenspace with eigenvalue 0. I’m interested in the rest.)

    We know that this matrix will have a non-negative eigenvector. It may not be easy to say exactly what it is, but perhaps we can say something about its properties. For instance, we know that it will be a linear combination of the characteristic functions of the X_i. If it is \sum_i\lambda_iX_i, then its image is \sum_i(\sum_j\lambda_j|X_i\cap X_j|)X_i, so we know that \sum_j\lambda_j|X_i\cap X_j| is proportional to \lambda_i. In other words, we are interested in the matrix B_{ij}=|X_i\cap X_j| and its eigenvectors.

    To try to understand this better, let’s consider a very simple case, where k=2, X_1=\{1,2,\dots,n\} and X_2=\{1,2,\dots,m\} for some m=\alpha n. Then we are looking at the matrix with entries 1, \alpha, \alpha, \alpha. The characteristic polynomial is x^2-(1+2\alpha)x+\alpha^2=0, which has roots (1+2\alpha\pm\sqrt{1+4\alpha})/2. If we choose a convenient value of \alpha such as 5/16, then we have \sqrt{1+4\alpha}=\sqrt{9/4}=3/2 and 1+2\alpha=13/8, so the two roots are 25/16 and 1/16. The eigenvector with eigenvalue 25/16 has entries 1 and u such that 1+5u/16=25/16, so 5u/16=9/16 and u=9/5. I feel far from confident that this is correct, and I’ve lost track of what I was hoping to get out of it.

    The basic thing I’d like to do is try to understand how the ways that the X_i intersect affects the shape of the eigenvectors. Does anyone know anything about this?

    • Thomas Sauvaget Says:

      Probably this subproblem would make a good question for mathoverflow and its large algebro-geometric readership, it might even be a classical subject for them.

    • Ian Martin Says:

      In case it helps, here is a picture of the elements of the positive eigenvector of the matrix (A_{xy})_{1\leq x,y \leq 1000} whose elements are (x,y)/(x+y), plotted on a log-log scale.

    • gowers Says:

      Another gorgeous diagram! I think it would be very useful also to have a few values tabulated — up to 200, say — so one could get an idea of how the value of the eigenfunction depends on the prime factorization of the number itself.

    • Ian Martin Says:

      In haste… as can be seen in the previous picture, the elements of the eigenvector decay at rate \sqrt{n}. This picture shows what happens if you scale up the nth element by \sqrt{n}. Points on the lower boundary occur at primes. Points on the upper boundary occur at highly composite numbers, though the precise pattern is not clear.

    • Ian Martin Says:

      Incidentally, when I say that the elements of the eigenvector decay at rate \sqrt{n}, I’m speaking very loosely: it looks as though the running minimum decays at roughly \sqrt{n}, but it’s hard to say exactly what’s going on with the running max.

    • Sune Kristian Jakobsen Says:

      Perhaps it would be better to plot only the points and don’t connect them with lines?

    • Ian Martin Says:

      Here you are, Sune.

    • gowers Says:

      It’s very clear that there is some partitioning of the integers going on here. Are you able to present it to us in some easily digestible form so we can try to work out the pattern? Then it might be possible to make a guess about what the eigenvector actually is (or perhaps just to guess an approximation to it).

      Another thing that would be very interesting indeed would be to work out what the eigenvector with the next largest eigenvalue is. I don’t know if you’re using a package that can do that quickly and easily.

    • Ian Martin Says:

      I’m using Mathematica. I’m wiki-literate, otherwise I’d create a page and put the first few thousands entries there. Let me try to put the first 100 entries here–I hope it won’t look too bad:

      {1., 2.11102, 2.81802, 3.8597, 3.61534, 5.75823, 4.00197, 6.35191, \
      6.03905, 7.26692, 4.32967, 10.1464, 4.39645, 7.9459, 9.39487, \
      9.59086, 4.46601, 11.859, 4.49177, 12.5566, 10.1864, 8.44294, \
      4.50222, 16.0052, 8.32411, 8.50826, 10.6031, 13.529, 4.50517, \
      18.0482, 4.51182, 13.4171, 10.6934, 8.53287, 12.3541, 19.9551, \
      4.50549, 8.5356, 10.7248, 19.353, 4.49567, 19.2631, 4.48884, 14.0761, \
      18.173, 8.4719, 4.48422, 23.0056, 9.0444, 15.624, 10.6619, 14.0592, \
      4.46045, 19.8037, 12.7455, 20.489, 10.6336, 8.36961, 4.43931, \
      29.6064, 4.43621, 8.3527, 19.1655, 17.4809, 12.6807, 19.7686, \
      4.40838, 13.8927, 10.4793, 22.7692, 4.41709, 29.8178, 4.40162, \
      8.25771, 19.2753, 13.8107, 13.3357, 19.6322, 4.38885, 27.0662, \
      15.6337, 8.18873, 4.38498, 30.9959, 12.4535, 8.14757, 10.2706, \
      20.783, 4.35536, 32.9945, 13.2172, 13.5572, 10.231, 8.10037, 12.3841, \
      30.3699, 4.32922, 16.3485, 19.3677, 24.9019}

    • Ian Martin Says:

      Ouch – that wasn’t great. And, obviously, I meant I’m *not* wiki-literate above. I’ve put the Mathematica notebook here if anyone else wants to have a look.

      Regarding the various strata that are visible in the picture:

      The bottom one is the primes
      The next up is 2 x primes
      The next one up, which is sparser, is the primes squared
      The next up is 3 x primes
      The next up is 5 x primes

      Then there is some bunching up

      Then the primes cubed (very sparse)
      Then 2 x primes squared


      I should reiterate for the benefit of people reading these comments rapidly that what I’m plotting here is not the elements of the eigenvector, but these elements scaled up at rate \sqrt{n}. (Here’s what the unscaled picture looks like.)

    • gowers Says:

      I may be able to work it out from what you’ve given, but what I was really hoping for was this. Looking at your diagram with the points, they seem to belong to very nice curves (or at least many of them do). I’d be particularly curious to know which integers belong to which curves. Precisely what the value of the function is interests me less (for now, though eventually I’d be very interested to know what the curves are too). When I get time, I’ll have a look at the 100 values you’ve provided and see if I can extract a small set of “continuous” functions from it.

    • gowers Says:

      Ah, while I was writing that comment you gave me just what I wanted to know.

    • Ian Martin Says:

      Here’s a plot of the second eigenvector of (A_{xy})_{1 \leq x,y \leq 3000}, with no \sqrt{n} scaling and on a log-vs-linear scale for obvious reasons.

      No time to investigate what’s going on here further, unfortunately, but I’ve updated the Mathematica notebook in case anyone wants to play around with it.

    • Ian Martin Says:

      It occurs to me that perhaps the second eigenvector plot is harder to interpret than the first eigenvector plot: I think some of its qualitative features may be more sensitively dependent on the cutoff (ie, on the dimension of the matrix: 3000 in these examples).

  16. gowers Says:

    Let me try to think about the first eigenvector by trying to think what happens if we start with the constant function f(n)=1 and repeatedly apply A to it. So at the first stage we have Af(n)=\sum_m (m,n)/(m+n). Already I’m not quite sure what’s happening. In fact, I’m worse than not quite sure, since if we sum to infinity then this diverges (since (m,n) is always at least 1).

    Obviously, one could try to deal with this by summing to some large N instead. But that feels as though it would introduce ugly cutoff effects. (On the other hand, I think that’s what Ian has been doing and his pictures look as though they may have quite pleasant descriptions.) So instead I want to think about whether one can make sense of the ratio of f(a) and f(b). I’ll start with the case where a is a prime. Then \sum_m (a,m)/(a+m) goes to infinity like \log m+(a-1)\log m/a. The way I worked that out was to say that if we pick a random integer m then we add roughly 1/m (when m is large and therefore roughly equal to a+m, multiplicatively speaking) and with probability 1/a we add a further (a-1)/m.

    So the rate of growth associated with a is (2-1/a)\log m. So I’m going to deal with this by just dividing through by \log m at the end.

    That was just for a prime. What if a is twice a prime? Let’s write a=2p. This time our growth rate will be \log m(1+1/2+(p-1)/p+p/2p), which, after we divide by \log m, gives us 3-1/p=3-2/a.

    At this point I want to approximate, by taking a to be fairly large. So if a is a prime I map it to 1, and if a is twice a prime I map it to 5/2.

    And what is happening in general? For this I’ll do the calculation differently. For every factor d of a I’ll get a contribution of d with probability a^{-1}\phi(n/d). (Check: when a=2p we get (2p)^{-1}(1.(p-1)+2.(p-1)+p.1+2p.1) which is approximately 3. (Honesty compels me to say that I had in fact made a mistake earlier and got 5/2, so that check was worthwhile.) And what is \sum_{d|n}d\phi(n/d)?

    I don’t know whether that has a nice interpretation, so instead I’ll try applying the matrix again. So now we obtain a function g(x)=\sum_y\frac{(x,y)}{x+y}y^{-1}\sum_{d|y}d\phi(y/d). Following what I did last time I’ll just forget about dividing by x+y and I’ll take an average over all y. But I don’t see anything nice coming out of this.

    What about a qualitative description? It definitely seems that the smoother a number is the better it is going to do. So one might guess that there is some measure of smoothness \sigma and that \sum_y\frac{(x,y)}{x+y}\sigma(y) is proportional to \sigma(x). This would be saying something like that you can get a reasonable idea of how smooth a number x is by seeing how smooth other numbers are that have big common factors with x.

  17. Ian Martin Says:

    Let me correct the small typo in my post above: if n is a prime, then Af(n) \approx (2n-1) \pi/n^{3/2}.

    If n is twice a prime, then following the same approach gives Af(n) \approx (5n/2-3)\pi/n^{3/2}. And if n is a prime squared, then Af(n) \approx (3n-2\sqrt{n})\pi/n^{3/2}.

  18. Ian Martin Says:

    D’oh! I think I made the same mistake you made earlier, which caused me to get a 5/2 when I should have had a 3 in the case n=2p. The corrected version for this case is Af(n) \approx (3n-3)\pi/n^{3/2}.

    In general, if n=pq for distinct primes p, q, we have Af(n) \approx (2p-1)(2q-1)\pi/(pq)^{3/2}, assuming I haven’t made any more silly mistakes.

    So if n is the product of twin primes p, p+2, Af(n) can be as large as about (4n-4\sqrt{n}) \pi / n^{3/2}; in the typical case in which p \ll q, Af(n) is quite a bit smaller.

    • Ian Martin Says:

      Follow-up question: is there an obvious function \xi(m) such that (i) \sum_{m} \frac{ (n,m) }{ n+m } \cdot \frac{ \xi(m) }{ \sqrt{m} } simplifies somehow and (ii) \xi(m) is constant over primes p, constant over twice-primes 2p, … , over squared primes p^{2}, … in such a way that it replicates the layers visible in the plot, which also seem to be coming out of these calculations?

    • gowers Says:

      That does look like a good question. I’m going to add the condition that it is smallest for primes, next smallest for twice-primes, and so on, and have a look at the Wikipedia page on arithmetic functions to see if anything jumps out at me.

    • gowers Says:

      Nothing did, but an idea occurred to me, which was that perhaps \xi is of the form \xi(n)=\sum_{d|n}\rho(d) for some function \rho. If so, then we’d be able to find \rho by Möbius inverting \xi: that is, \rho(n)=\sum_{d|n}\mu(d)\xi(n/d). It’s a stab in the dark, but do we get anything interesting if we Möbius invert the function \xi that we’re given experimentally?

    • Ian Martin Says:

      Here’s a plot of the Mobius inversion of the experimental \xi using the 2000×2000 matrix A_{xy}.

      There are some visible patterns: for example, the clearly defined upper stripe is the primes, and the most obvious stripe below is the twice-primes.

  19. H Matthew Says:

    I haven’t read this for a while, and I’m sure this has already been mentioned, but the sequence is made out of 14 letter alphabet with distinct combination rules. Each letter represents a 4 digit sequence that contains no more than three 1’s or three -1’s.

    I’ll think about this some more, as I am sure the algorithm’s that people are running are probably already doing something similar, and I saw someone had already done some analysis using 4 digit binaries. What I am thinking though is that one could determine whether or not the there is a rate of growth in the number of permissible letter combinations. If there isn’t and it only goes to extinction (or one unique solution) that would be an interesting result itself I think.

  20. H Matthew Says:

    One could also view the problem as reducing to a sequence of movements on a unit hypercube. It would be interesting to see whether the movements take on any patterns, or rather if the movements are restricted to edge, face or volume movements.

  21. H Matthew Says:

    I suppose one could also ask in terms of the rotations and reflections.

  22. gowers Says:

    I’m still trying to make a decent guess in response to Ian’s question. I’ve tried several things and not got very far yet. But maybe some useful clues could come from the ratios that are there in the data. For instance, it looks as though \xi(2p)/\xi(p) tends to a limit as p tends to infinity (along the primes). So this gives us some number \rho(2) to think about.

    I’m still puzzled about why squares of primes should behave as they do. It seems that being a square of a prime is worse than being three times a prime, and much worse than being a pq for primes p and q of comparable size.

    One thing I’m about to look at, without much hope of a miracle: is the eigenvector a multiplicative (as opposed to completely multiplicative) function?

    • gowers Says:

      Answer: it seems not.

      A few experimental questions that interest me are as follows.

      (i) What does the restriction of the graph of f to numbers of the form 4p look like?

      (ii) More generally, if you pick a large prime p and look at the ratios f(mp)/f(p) for small values of m, then what does that look like as a function of m?

      (iii) Does that function have more or less the same shape as \xi itself?

      (iv) What does \xi look like at integers of the form pq with p and q primes of roughly the same size?

    • Ian Martin Says:

      I’m entering a busy phase but will try to have a look at these at some point. (Given the addictiveness of this, I’ll probably get round to it fairly soon…)

      Regarding the experimental evidence on prime squares vs thrice primes: isn’t this what we see happening based on the approximations here and here? For example, Af(n) \approx 10\pi/(3 \sqrt{n}) if n is a thrice prime and Af(n) \approx 3\pi/\sqrt{n} if n is a prime squared.

      Admittedly, these approximations are (i) upper bounds and (ii) upper bounds on the behaviour of the first iterate of f(n), rather than the eigenvector/fixed-point, but it seems an encouraging sign.

      Incidentally, I think it’s also possible to provide lower bounds for this first iterate. For example if n=p a prime, then [(n-1)\pi+4n \arctan \sqrt{n}]/(2n^{3/2}) \leq Af(n) \leq (2n-1)\pi/n^{3/2}, which looks like 3\pi/(2 \sqrt{n}) \leq Af(n) \leq 2\pi/\sqrt{n} if n is large. I haven’t yet done this in the cases n=pq or n=p^{2}, but will try to do this at some stage soon unless you think this is a dead end?

      Finally, another experimental observation: it also looks as though the primes cubed are worse than being twice a prime squared.

  23. Jason Dyer Says:

    I’ve been thinking about a graph-theoretical generalization of the problem, which unfortunately doesn’t include the multiplicative feature.

    Define an infinite set of nodes a_1, a_2, ... such that given a node a_i there is a directed edge labelled i from a_{i(n)} to a_{i(n+1)} for all n\in \mathbb{N}.

    Give each node a value 1 or -1. Consider a traversal starting at a node moving along edges with the same label a finite number of times; then the discrepancy of a given traversal is the absolute value of the sum of all nodes visited. Asking if it is possible for the discrepancy to be bounded is equivalent to our problem.

    What’s interesting is the graph doesn’t have to follow the number system. Have a set of nodes (finite or infinite) a_1, a_2, ... such that there are labelled directed edges, but allow the edges to be arbitrary. Then there are interesting questions like; given a discrepancy d, what is a minimal graph (in terms of nodes and/or edges) where the discrepancy is greater than d? What conditions cause the discrepancy to “break”? My hope is we can isolate certain graph theoretic properties which cause the logic to break and translate those into number theoretic properties we can search for (perhaps very rare ones, but ones we can prove exist nonetheless).

    • Jason Dyer Says:

      Sorry, I should add it’s only equivalent to our problem if we only allow traversals to start at the root node of a particular set of labelled edges.

    • gowers Says:

      Your generalization is in a similar spirit to a generalization that Terry suggested a few weeks ago, but it seems to be more general still. It would be very nice if one could come up with a graph-theoretic conjecture that was significantly more combinatorial than EDP but still hard.

    • Jason Dyer Says:

      While it’s fun to think about the completely general version of the structure (for example, cycles of odd degree cause unbounded discrepancy), for the sake of the original problem it might be useful to set some conditions:

      Given a set of nodes a_1, ... a_n:

      1. One labelled set of edges connects a_i to a_i+1 for all i from 1 to n-1.

      2. For any edge from a_i to a_j, i is less than j.

      3. Given a set consisting of all labelled edges of a given label, any traversal starting from the root node of the set can traverse the entire set of edges.

      4. No set of edges is a subset of another set (this prevents having arbitrary start points).

      Given these conditions, the minimal set of nodes where the discrepancy must be greater than 1 is 4.

      Proof: Consider every possible set of edges using 3 nodes.

      set A: 1->2->3
      set B: 1->3

      Then a_1 = 1, a_2 = -1, a_3 = -1 has a discrepancy of 1.

      However, consider these sets with 4 nodes:

      set A: 1->2->3->4
      set B: 1->3
      set C: 1->4

      Then no combinations of +1 and -1 (set A constrains the possibilities to + – + -, – + + -, + – – +, and – + – +) avoids having a discrepancy of 2 in some set of edges.

  24. gowers Says:

    I want to try once more to explain why I’m interested in the eigenvalues and eigenvectors of the matrix A_{xy}=(x,y)/(x+y), partly for my own benefit.

    The starting point is a wish to prove EDP by means of a suitable averaging argument. That is, instead of proving that there exists a HAP P such that |\sum_{x\in P}f(x)|\geq C, one attempts to prove that the sum is large on average.

    There is a great deal of flexibility in how one interprets this, since we can apply a monotonic function to the sum before taking the average, and we can also introduce weights. To be more explicit, suppose that \phi is some strictly increasing function. Then x\geq C if and only if \phi(x)\geq\phi(C). Therefore, if H is a set of HAPs, and to each P\in H we assign a weight w_P in such a way that \sum_{P\in H}w_P=1, then a sufficient condition for there to exist a HAP with |\sum_{x\in P}f(x)|\geq C is that \sum_{P\in H}w_P\phi(|\sum_{x\in P}f(x)|)\geq \phi(C).

    The most obvious choice of function \phi is \phi(x)=x^2, since this is gives us a positive-definite quadratic form in f and potentially makes available to us the tools of linear algebra.

    We can think of this quadratic form as follows. Let \Lambda be the linear map from \mathbb{R}^n to \mathbb{R}^H that’s defined by \Lambda f(P)=\sum_{x\in P}f(x), and let \Delta be the diagonal map from \mathbb{R}^H to itself that has matrix \Delta_{PP}=w_P. Then the quadratic form takes the column vector f to f^T\Lambda^T\Delta\Lambda f. It ought to be useful if we could diagonalize this quadratic form, which is equivalent to finding n functions g_1,\dots,g_n such that \sum_Pw_P\langle f,P\rangle^2=\sum_{i=1}^n\langle f,g_i\rangle^2. The most obvious way to try to do that is to take the symmetric n\times n matrix \Lambda^T\Delta\Lambda and find an orthonormal basis of eigenvectors. But it occurs to me while writing this that there might be other useful ways of diagonalizing the quadratic form. Or maybe one would be happy to settle for more than n functions g_i, as long as they were somehow “nice”. It is of course always open to us to choose the weights to try to make them nice.

  25. Sune Kristian Jakobsen Says:

    This might be a stupid question, but I’ll ask it anyway: Does there exist a C\in \mathbb{N} such that for any n\in\mathbb{N} there is a finite {1,-1}-sequence x_1,x_2,\dots,x_n such that for any d\in\mathbb{N}:

    |\sum_{i=1}^{\lfloor n/d\rfloor}x_{id}|\leq C

    That is, we don’t care for partial sums, but only for HAPs that goes all the way to n.

  26. gowers Says:

    Something else that would help a lot in the process of seeing whether any diagonalizing-of-matrices approach could work is to think about what any proof that \langle f,Tf\rangle is large could possibly look like. For example, it is obvious that if the smallest eigenvalue of T is c>0, then \langle f,Tf\rangle\geq c\|f\|_2^2, since the best you can do is choose an eigenvector with eigenvalue c. However, we also know that that is not how a proof of EDP could look, since there are functions f such that \|f\|_2 is large and the discrepancy is small, such as our old friend 1, -1, 0, 1, -1, 0, 1, -1, 0, …

    What I have been suggesting is an intermediate result, in which we assume not just a lower bound on \|f\|_2 but also an upper bound on \|\hat{f}\|_\infty. How might that help?

    Well, if we knew that the only eigenvectors with very small eigenvalues were very trigonometric in shape, then we might be getting somewhere, since then the only way for \|f\|_2 to be large without f correlating with a trigonometric function would be for f to have a reasonable projection on to the space generated by eigenvectors with larger eigenvalues.

    It’s quite plausible that low-eigenvalue eigenvectors are trigonometric in nature, since periodic examples seem to give us a rich supply of low-discrepancy sequences (and hence sequences that are mapped to something small by our matrix). So this might be something that’s not too hard to investigate.

    Why isn’t this just restating the problem? I think it’s because I’d hope that we could find such a big supply of functions that are mapped to small things that we would “run out of the small part” of the spectral decomposition. It might not be necessary to find the eigenvectors. Sorry to be a bit vague here. I think the next thing to do is see what the matrix does to trigonometric functions.

    • gowers Says:

      Hmm … immediately after submitting that I realized it was very close to what I was trying to do earlier with the continuous kernel. But that doesn’t stop me wanting to think about it.

  27. Sune Kristian Jakobsen Says:

    I promised that I would write something about pseudointegers on the wiki and give a summary. I have written the page now, but I haven’t proved as much as I had hoped. Here is what I have found out:

    For a pseudointeger x\neq 1= (0,0,\dots) we look at the density of pseudointegers that are divisible by x. In one of the models where EDP fails this is 0 for all x\neq 1 and in another this is 1 for all x. I have proved, that if it is 0 for one x, it is 0 for all x\neq 1. If it is 1 for a x\neq 1, it must be 1 for all x.

    It turns out, that for a set X of “pseudointegers” as I defined it here (@Tim: sorry for stealing the word) you can find a (non necessarily injective, but non-trivial) function \varphi: X\to \mathbb{R} with \varphi(xy)=\varphi(x)\varphi(y) and x\leq y \Rightarrow \varphi(x)\leq\varphi(y). It is defined such that if the density of integers divisible by x is d_x\in (0,1), then d_x=1/\varphi(x). If furthermore the limit

    d=\lim_{n\to\infty}\frac{|\varphi(X)\cap (0,n]|}{n}

    exists and d\in (0,1) we know that the density d_x must exists.

    I think it is a good idea to think of pseudointegers as subsets of R, but as Tim noticed here it gives some difficulties (if \alpha<2 there exists n such that n\alpha<2n-1), so it might still make sense to use the more general definition.

  28. Gil Kalai Says:

    I thought that maybe it will be a good idea to try even superficially some orthogonal ideas (at least to my own, it is so not easy keeping track).

    First, a remark: It looks that the only definite result in the positive direction is about functions correlated with a character and I would be happy if somebody can explain informally, “why” this result is true, and does it “explain” the observed log-n discrepency.

    So I thought about probabilistic heuristical ideas towards sequences with low discrepency.

    How to construct prove using a random method that sequences with low discrepency exist? or even to estimate roughly the number of such sequences? Heuristically, if we want to keep the discrepency on a AP below K we need to impose conditions on intervals of length about K^2 or a little less. (If we assume that x_1+...+x_m=0 then we can be pretty sure that the partial sums are below \sqrt m or so.)

    Altogether on all HAP we have nlogn/(K^2) conditions; and each one represents an event whose probability is roughly 1/K, so a rough estimate of the overall probability for a sequence to satisfy all these conditions is (1/K)^nlogn/K^2 which should be larger than 2^-n.

    So if K is little more than sqrt log n it looks good.

    There are now good news and bad news:

    The bad news is that we do not have independence. There are methods to prove the existence of rare events even if there is some slight dependencies (e.g., Lovasz Local Lemma) but you still need much more independence.

    The good news is that it seems that the correlation is positive. So if you already assume that sums of x_is is zero (or close to zero) for certain entervals in some HAP then it is more likely that such sums are zero when you take sums along intervals in other HAP.

    But the good news are not that good in the sense that I see no (even heuristically) reason to believe that this positive correlation will replace the 1/K with something larger than 1/2 and this will not make a big difference. (But this deserves more thinking.)

    What about general AP (just as a sanity test). Here, the n log n is replaced by n^2 so K indeed is expected to be larger. It would be interesting to compare the discrepency results for AP (Roth?) with what the probabilistic heuristic gives to get some intuition.

    If such a heuristic can show that the probability for a sequence with low discrepency (say polylog n) is sufficiently large, I see no reason why such low discrepency will force some periodic behavior.

    • Sune Kristian Jakobsen Says:

      How was the cN^{1/4}-bound for general APs proved?

    • gowers Says:

      I find this line of thought interesting, and I would certainly be very interested indeed in a proof that there were many examples with logarithmic discrepancy.

      One thing I didn’t understand was your assertion that the correlations were positive. For example, it looks as though by far the easiest way to make the discrepancy small along all HAPs with odd common difference is to take a periodic sequence such as 1, -1, 1, -1, … or 1, 1, -1, -1, 1, 1, -1, -1, … which suggests a negative correlation with low discrepancy along HAPs with common difference a power of 2.

      It could be that this heuristic argument breaks down if one looks only at logarithmic discrepancy, but I still don’t see any evidence for positive correlation. Indeed, the rough picture I had in mind was that there was negative correlation, in that in general it seems that periodicity tends to help a lot with some common differences and be a big problem for others. Or perhaps I mean something a bit more complicated, like positive pairwise correlation but with negative correlations coming in when one tries to ensure many events all at once. (If EDP is true then something like this seems to be the case, since we probably do have reasonable pairwise correlations but we know that we can’t get all the events to hold simultaneously. But again I’m talking about bounded discrepancy rather than logarithmic discrepancy here.)

    • gowers Says:

      Sune’s question is one I want to know the answer to as well, but I’ve been too lazy to get hold of the paper.

    • Gil Kalai Says:

      Well, if one drunk person goes at each time-period left and right with probability 1/2 and his lazy twin brother makes precisely the same moves on odd time-periods and stays still on even times periods, then the probabilities for them to end very near to the origin are positively correlated. So I think positive correlation for the kind of events I considered is roughly the “first order” behavior. (I think you are right that there may be subtle negative correlations floating around.)

      If my heuristic idea is roughly ok (and it is quite possible I made a serious mistake there), then I would expect that log n discrepency sequences (perhaos even logn^{1/2+epsilon)) will be a rare but positive-probability events, (the number of such sequences will still be exponential), so I would not believe some subtle negative correlation will interfere (of course, proving this is a different matter).

      There is some little hope that positive correlations will somehow allow constant-discrepency sequences.

      This also suggests that finding in an arbitrary +-1 sequence an interval in an HOP with large discrepency using a pobabilistic argument is something worth trying. But the big obstacle here (as before) would be how to distinguish a +-1 sequence from a 0 +1 -1 sequence.

    • gowers Says:

      A general question that occurred to me was this. Suppose you have a finite set X and a collection \mathcal{A} of subsets of X of even cardinality. Under what conditions can you find a function f:X\rightarrow\{-1,1\} such that \sum_{x\in A}f(x)=0 for every A\in\mathcal{A}?

      My guess is that this problem would be NP-complete in general, so the best one could hope for is a sufficient condition for such a function to exist. The motivation for the question is that one way of restricting the growth along a HAP is to insist, fairly frequently, that the sum is zero. This gives us a collection \mathcal{A} as above, and the question is how “dense” we can make it.

      A small remark is that if we take the linear relaxation (that is, we allow f to take arbitrary real values) then we always have the trivial solution of setting f to be identically zero, and we have non-trivial solutions only if the characteristic functions of the sets A\in\mathcal{A} are linearly independent. If we decide that the sum should be zero at every kth number along each HAP, then the HAP with common difference d gives rise to around n/dk linear conditions, so the total number of conditions we need is (n/k)(1+1/2+\dots+1/(n/k)), which (for smallish k) is around n\log n/k. This suggests that there is something natural about the \log n bound. (Of course, it is possible that there will be dependencies amongst the various conditions, but at first glance they look fairly independent to me.)

    • Gil Kalai Says:

      That’s quite interesting! I like this linear algebra idea.

      A few remarks:

      1) Actually we do not have to determine \cal A in advance: we only need to assume we can partition the HAP to intervals of length proportional to k (say between k/100 and 100k) so that the equations will be satisfied. (We have a considerable freedom to chose the equations that will still to the discrepency being likely small.)

      2) The linear algebra consideration can go both ways. If you can show using linear algebra that you cannot have the zeros of the sums on all HAP being too “close by” (say uniformly below some constant) this will come close of proving EDP.

      (And as a meta comment: trying other techniques like linear algebra methods or polynomial methods or probabilistic methods seem quite reasonable for EDP. unlike DHJ i dont think we have a propri much knowledge which tricks will be relevant.)

      3) Strangely we need to understand how such conditions on HAP with large periods depend on conditions for HAP with small periods; Both in terms of “statistical dependence” (my heuristic comment) and in terms of “linear algebra dependence” (your comment).

      4) Also the problem of trying to get by stochastic methods low discrepency sequence has a feeling like random walks with various stochastic tools to make the walks frequently return to zero. (Simultaniously for all HAP). It seems we discussed such “biased” random walks before but I dont remember when.

      5) Anyway these heuristics taking also with the linear algebra concerns still suggest that discrepency sqrt log n might be possible.

    • gowers Says:

      One further remark. It is initially tempting (or at least it was for me) to conjecture that if you choose many intervals along many HAPs, then you are forced to choose a linearly independent set of intervals. However, I have just realized that the usual example, 1, -1, 0, 1, -1, 0, … shows that that cannot be true, since along every HAP it has a zero sum either always or every third term. The conclusion is that if we take all sets of the form \{md,(m+1)d,(m+2)d\} then they (or rather their characteristic functions) are not linearly independent (because they all live in the subspace orthogonal to 1, -1, 0, 1, -1, 0, …).

      But I think that raises an interesting question. If we choose a large set of small segments of HAPs, then must the annihilator of that set consist of very structured functions (e.g. functions that are periodic with period d for some small d and zero at all multiples of d)? Since this is a linear algebra question, it might be more approachable that EDP itself, but it also seems close enough that an answer to it could be helpful. Even if that is not the case, I like the question.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: