It is not at all clear to me what should now happen with the project to solve the Erdős discrepancy problem. A little while ago, it seems that either the project ran out of steam, or else several people, at roughly the same time, decided that other things they were doing should take higher priority for a while. Perhaps those are two ways of saying the same thing.
But they are not quite the same: as an individual researcher I often give up on problems with the strong intention of returning to them, and I often find that if I do return to them then the break has had an invigorating effect. For instance, it can cause me to forget various ideas that weren’t really working, while remembering the important progress. I imagine these are common experiences. But do they apply to Polymath? Is it possible that the EDP project could regain some of its old vigour and that we could push it forward and make further progress? Is it conceivable that it could go into a different mode, where people contributed to it only occasionally? (A problem with that is that one of the appeals of a polymath project is checking in to see whether there are new ideas, or whether people have reacted to your comments. It is not clear to me that this appeal works if it is substantially slowed down.)
Anyhow, as Terry Tao might put it, this situation can be regarded as an opportunity to add a new datapoint to the general polymath experiment. Recently the following conjunction of circumstances occurred: I found myself on a plane, my laptop battery lasts a fraction of the time it should, and all the films on offer were either unappealing or too appealing (meaning that I’d been looking forward to seeing them but didn’t want to waste them by watching them in aeroplane conditions). I therefore found myself with several hours to think about mathematics. It was just like the old days when I didn’t have a laptop and there would only be a couple of films, both terrible. So I thought about EDP. Now I would like to avail myself of the opportunity to obtain a new datapoint by writing my thoughts, such as they were, down in a post and seeing whether anyone feels like joining or rejoining the discussion. I have plenty of questions, some of which may be fairly easy: I hope that will be a good way of tempting people.
A brief reminder of some of the ideas we have been thinking about.
Since one of the aims of this post is possibly to attract newcomers to the project, let me briefly recap one or two things, including a statement of the problem. We define a homogeneous arithmetic progression, or HAP for short, to be a set of the form $\{d, 2d, 3d, \dots, md\}$ for some pair of positive integers $d$ and $m$. Given any sequence $\epsilon_1, \epsilon_2, \dots, \epsilon_N$ taking values in $\{-1, 1\}$, we define its HAP-discrepancy, or discrepancy for short (since in this context we will almost always be talking about HAP-discrepancy and not any other kind of discrepancy) to be the maximum of $|\sum_{i=1}^m \epsilon_{id}|$ over all HAPs $\{d, 2d, \dots, md\} \subset \{1, \dots, N\}$. Erdős’s discrepancy problem is the annoyingly simple-sounding question of whether it is possible to find arbitrarily long $\pm 1$ sequences with bounded discrepancy. That is, does there exist a constant $C$ such that for every $N$ there is a $\pm 1$ sequence of length $N$ and discrepancy at most $C$?
When we started work on the problem, my perception was that a major difficulty was that the problem relied heavily on the sequence taking values in $\{-1, 1\}$. There is a very simple example that appears to demonstrate this: the sequence $1, -1, 0, 1, -1, 0, 1, -1, 0, \dots$. It is easy to check that the discrepancy of this sequence (even the infinite sequence) is 1, and yet it takes values of modulus 1 two thirds of the time. This is a pity, because the usual analytic methods tend to give results that can be generalized from purely combinatorial statements (typically involving sequences that take just two values) to more general analytic ones (typically involving sequences that satisfy some analytic condition such as a norm bound).
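Since small examples like this are easy to test by machine, here is a quick brute-force check of the claim. The function name and the 1-based indexing convention are my own, not anything from the project; treat it as a sketch for experimentation.

```python
def hap_discrepancy(x):
    """Max of |x_d + x_{2d} + ... + x_{md}| over all HAPs {d, 2d, ..., md}."""
    N = len(x) - 1          # x[1..N] hold the sequence; x[0] is a dummy entry
    best = 0
    for d in range(1, N + 1):
        s = 0
        for n in range(d, N + 1, d):
            s += x[n]
            best = max(best, abs(s))
    return best

# The example above: 1, -1, 0 repeated.  Two thirds of its entries have
# modulus 1, yet no HAP sum ever leaves {-1, 0, 1}.
N = 300
x = [0] + [[0, 1, -1][n % 3] for n in range(1, N + 1)]
print(hap_discrepancy(x))  # 1
```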
A more general question.
One of the main pieces of progress to come out of the original discussion was Moses Charikar’s observation that it is possible to attack the problem analytically after all. The trick is to use a weighted norm that tends to give more weight to numbers when they have more factors. For instance, a multiple of the example mentioned above shows that there is no hope of proving that the discrepancy is unbounded for every real sequence $(x_n)$ such that $\sum_{n \le N} x_n^2 \ge cN$. However, there appears to be no reason in principle that one could not prove that the discrepancy was always at least $(\sum_{n \le N} c_n x_n^2)^{1/2}$ for a system of weights $(c_n)$ such that $\sum_{n \le N} c_n$ is unbounded. If one could do that, then it would imply EDP, since for $\pm 1$ sequences we have $x_n^2 = 1$ for every $n$.
There has been quite a bit of discussion about how one might be able to find such a set of weights and prove the result, but so far we have not succeeded. Rather than repeating what the possible methods of proof might be, I would like to suggest focusing on the problem of choosing the weights, because it seems to me that there is plenty of opportunity here to get further insights into the problem.
To repeat: in order to prove EDP we need to find a system of weights $(c_n)$ such that $\sum_{n \le N} c_n$ is unbounded above and such that every real sequence $(x_n)$ has discrepancy at least $(\sum_{n \le N} c_n x_n^2)^{1/2}$. The problem I would like to focus on for a while is this: find a good candidate system of weights. That is, find weights $(c_n)$ that do not obviously fail to work. What I hope is that if we look at several systems of weights and manage to show that they do not work (by constructing increasingly clever sequences $(x_n)$) then we will eventually settle on a system of weights that does work. If we can do that, then we will have split the problem into two and solved one of the two parts. And we have ideas about how to do the other part. (Roughly speaking, we want to find a way of decomposing a diagonal matrix: the two parts of the problem are to choose the diagonal matrix and to find the decomposition.)
Which systems of weights are definitely bad?
For the remainder of this post I want to make a start (or rather, a restart, since much of what I say can be found in the earlier discussions) by thinking about what is wrong with certain systems of weights. It will be convenient to rephrase the problem slightly: let us ask for a system of weights $(c_n)$ such that the discrepancy of an arbitrary real sequence $(x_n)$ is always at least $\omega(N)\bigl(\sum_{n \le N} c_n x_n^2 / \sum_{n \le N} c_n\bigr)^{1/2}$ (where $\omega(N)$ tends to infinity as $N$ tends to infinity). That way we care about the weights only up to a scalar multiple, so we do not have to normalize them carefully in advance.
The sequence $1, -1, 0, 1, -1, 0, \dots$ shows that we cannot take $c_n = 1$ for every $n$. (I have already said this above, in a very slightly different way.) Indeed, its discrepancy is 1, but $\sum_{n \le N} c_n x_n^2 / \sum_{n \le N} c_n$ would be around 2/3 if we took $c_n = 1$ for every $n$, and so the ratio of the discrepancy to $\bigl(\sum_{n \le N} c_n x_n^2 / \sum_{n \le N} c_n\bigr)^{1/2}$ is bounded.
The same sequence disposes of many other systems of weights. For example, suppose we were to choose each $c_n$ randomly to be 1 with probability $p$ and 0 otherwise. Then with very high probability $\sum_{n \le N} c_n$ would be close to $pN$ and $\sum_{n \le N} c_n x_n^2$ would be close to $2pN/3$. Therefore, once again we would have a ratio that is close to 2/3, which does not tend to zero.
Indeed, any system of weights that is not concentrated in the multiples of 3 will have the same defect, for precisely the same reason. To put that more precisely, suppose that $\sum\{c_n : n \le N,\ 3 \nmid n\} \ge \delta \sum_{n \le N} c_n$. Then with $(x_n)$ as above, we have $\sum_{n \le N} c_n x_n^2 \ge \delta \sum_{n \le N} c_n$, while the discrepancy of the sequence is 1. Thus, for the weights to yield a proof of EDP we need $\sum\{c_n : n \le N,\ 3 \nmid n\} = o\bigl(\sum_{n \le N} c_n\bigr)$.
This simple observation can be generalized in many directions. For instance, it is easy to prove in a similar way that we need $\sum\{c_n : n \le N,\ 9 \nmid n\} = o\bigl(\sum_{n \le N} c_n\bigr)$. It is also easy to prove in a similar way that we need $\sum\{c_n : n \le N,\ 5 \nmid n\} = o\bigl(\sum_{n \le N} c_n\bigr)$, this time using the repeated pattern $1, -1, -1, 1, 0$. A general message from this argument is this: the weights must be strongly concentrated on numbers with many factors. And when one thinks about this, it is not too surprising: the numbers with plenty of factors are precisely the numbers that belong to many HAPs and therefore the numbers $n$ for which it should be in some sense difficult to incorporate a large real number $x_n$ into a sequence with low discrepancy (because there are likely to be many constraints on $x_n$).
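To illustrate the kind of generalization involved, here is a check of a lifted version of the period-3 example: a sequence that vanishes only on multiples of 9 and still has discrepancy 2. (This particular construction is my reconstruction of the sort of example meant here, so treat it as a sketch.)

```python
def chi(n):                       # the period-3 pattern 1, -1, 0
    return [0, 1, -1][n % 3]

def x(n):                         # lift the pattern one level
    if n % 3:
        return chi(n)             # n not divisible by 3
    if n % 9:
        return chi(n // 3)        # n divisible by 3 but not by 9
    return 0                      # n divisible by 9

def hap_discrepancy(f, N):
    best = 0
    for d in range(1, N + 1):
        s = 0
        for n in range(d, N + 1, d):
            s += f(n)
            best = max(best, abs(s))
    return best

N = 500
print(hap_discrepancy(x, N))                         # 2
print(sum(1 for n in range(1, N + 1) if x(n) == 0))  # zeros only at multiples of 9
```

Since this sequence has modulus 1 except on multiples of 9, any system of weights not concentrated on multiples of 9 fails for the same reason as before.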
But let us stay for the moment with elementary observations rather than thinking about what sequences there are that are concentrated on numbers with many factors. We know that for every $k$ the proportion of the total weight of the $c_n$ on values of $n$ such that $3^k \mid n$ but $3^{k+1} \nmid n$ is $o(1)$. One way of achieving that is to make the weights very sparse indeed, and to choose a set $A$ that contains, for each $k$, exactly one element that is divisible by $3^k$ but not by $3^{k+1}$. One could then let $c_n = 1$ if and only if $n \in A$.
If $A$ is the set $\{1, 3, 9, 27, \dots\}$ of powers of 3, then it is easy to find an appropriate sequence $(x_n)$: just let $x_n = 1$ if $n = 3^k$ for even $k$, $x_n = -1$ if $n = 3^k$ for odd $k$, and $x_n = 0$ otherwise. The intersection of any HAP with $A$ is either empty or takes the form $\{3^k, 3^{k+1}, \dots, 3^m\}$, so this sequence has a discrepancy of 1, while $x_n^2 = 1$ for every $n \in A$, so that $\sum_{n \le N} c_n x_n^2 = \sum_{n \le N} c_n$.
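This construction is easy to verify by machine; the brute-force check below confirms that the alternating sequence supported on powers of 3 has discrepancy exactly 1 (a sketch, with my own naming).

```python
def x(n):                       # +1 at 3^even, -1 at 3^odd, 0 elsewhere
    k = 0
    while n % 3 == 0:
        n //= 3
        k += 1
    return (-1) ** k if n == 1 else 0

def hap_discrepancy(f, N):
    # Any HAP meets {1, 3, 9, ...} in a run of consecutive powers of 3,
    # so the alternating signs keep every HAP sum in {-1, 0, 1}.
    best = 0
    for d in range(1, N + 1):
        s = 0
        for m in range(d, N + 1, d):
            s += f(m)
            best = max(best, abs(s))
    return best

print(hap_discrepancy(x, 3 ** 7))  # 1
```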
However, if $A$ consists of an arbitrary number that is equal to 1 mod 3, an arbitrary number that is equal to 3 mod 9, and so on, then many more HAPs can intersect $A$, and I have not come up with a method of constructing bounded-discrepancy sequences with a substantial restriction to $A$. Let me ask this as a formal question. It doesn’t seem to be trivial, but I cannot believe that it is especially hard either.
Question. Let $A$ be a subset of $\mathbb{N}$ such that for every $k$ there is no element of $A$ that is congruent to $2 \cdot 3^k$ mod $3^{k+1}$ and at most one element that is congruent to $3^k$ mod $3^{k+1}$. Must there be a sequence of bounded discrepancy that takes values $\pm 1$ everywhere on $A$?
It is not in fact necessary for the sequence to take values $\pm 1$ everywhere on $A$ in order to show that the characteristic function of $A$ is unsuitable as a system of weights. However, since I expect such a sequence to exist, I have stated the question in this stronger form. Probably one can also ask for the sequence to take values $\pm 1$ or 0 outside $A$.
I should briefly explain why I have not asked the more general question of whether it is possible to prove that no small set can work (as opposed to a set that is forced to be small by the divisibility conditions above). That is because cardinality on its own is not enough: for example, one could take $A$ to be a HAP of some length $m$, and then to find a sequence of bounded discrepancy that takes values $\pm 1$ on $A$ would be equivalent to finding a counterexample to EDP. The idea here is to show that for certain sets $A$ one can exploit the freedom of a sequence outside $A$ to make the discrepancy small.
Nevertheless, there might be a condition of the following kind: perhaps if $A$ has the property that it cannot intersect a HAP of length $m$ in more than $\log m$ elements, then it cannot serve as a system of weights for EDP. And I don’t have any particular reason to think that $\log m$ is the right function here; perhaps one could prove a similar result for significantly faster-growing functions.
A candidate for a good system of weights.
I’d now like to resurrect an idea that I mentioned at some point earlier in the discussion. It’s a suggestion for a sequence of weights that seems to me to have some chance of working. And if it doesn’t work for some obvious reason, then it feels like the kind of idea that could be modified in various ways to get round preliminary objections.
Let me start with two ideas that don’t work. The first is to take $c_n = 1$ for every $n$. We have seen that this doesn’t work: indeed, the proportion of weight it attaches to multiples of 3 is 1/3 rather than $1 - o(1)$.
The second is a feeble attempt to get round this obvious problem with the first. We want to give extra weight to numbers with more factors, so how about defining $c_n$ to be $d(n)$, the number of factors of $n$?
To see that this does not work is slightly harder, but only slightly. Again we shall show that this system of weights does not concentrate on multiples of 3. (Of course, we care about multiples of other numbers too, but if it doesn’t work for 3 then we’re already in serious trouble.) To see this, let us first consider how to estimate $\sum_{n \le N} d(n)$. This we can rewrite as $\sum_{m \le N} \lfloor N/m \rfloor$, since each $m$ contributes 1 to the sum for each multiple of $m$ that is at most $N$. And this we can approximate by $\sum_{m \le N} N/m$, which is roughly $N \log N$. (I won’t bother to analyse how good an approximation that is.)
Now let’s consider what happens if we estimate $\sum\{d(n) : n \le N,\ 3 \nmid n\}$ in a similar way. This time, we can approximate the sum as $\sum\{(2/3)(N/m) : m \le N,\ 3 \nmid m\}$, since if $m$ is a multiple of 3 we get no contribution, and if $m$ is not a multiple of 3, we get a contribution of 1 for each multiple of $m$ that is not a multiple of 3, and there are about $(2/3)(N/m)$ of those. So the sum works out to be around $(4/9)N\log N$. Now this is 4/9 of the entire sum, so the usual sequence $1, -1, 0, 1, -1, 0, \dots$ shows that setting $c_n = d(n)$ does not work.
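Both the exact rewriting and the 4/9 proportion can be checked numerically. The script below is only a sanity check of the estimates (the convergence of the proportion to 4/9 is slow, with an error of logarithmic size).

```python
def divisor_counts(N):
    d = [0] * (N + 1)
    for q in range(1, N + 1):          # q contributes 1 to each multiple of q
        for n in range(q, N + 1, q):
            d[n] += 1
    return d

N = 10 ** 5
d = divisor_counts(N)
total = sum(d[1:])
floors = sum(N // q for q in range(1, N + 1))   # the rewritten form of the sum
non3 = sum(d[n] for n in range(1, N + 1) if n % 3)

print(total == floors)   # True: the rewriting is an identity, not an estimate
print(non3 / total)      # close to 4/9 = 0.444..., approached slowly from above
```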
Nevertheless, we do seem to have achieved something: with this choice of $c_n$ we have given less than average weight to non-multiples of 3. The problem is that we have not gone far enough. Now one way of thinking of $d(n)$ is that it is $\sum_{m \mid n} 1$. A way that one might think of attaching yet more weight to numbers with plenty of factors would be to define $c_n$ to be $d_3(n) = \sum_{m \mid n} d(m)$. This can be rewritten as $\sum\{1 : a \mid b,\ b \mid n\}$. That is, it counts pairs $(a, b)$ such that $a \mid b$ and $b \mid n$. A fairly simple calculation similar to the one done above shows that this function again attaches too much weight to non-multiples of 3, but it is an improvement on the previous function. The constant function 1 attaches weight 2/3 to the non-multiples of 3, the function $d$ attaches weight 4/9, and the function $d_3$ attaches weight 8/27 (or 4/9 of what it would be if it were unbiased).
This suggests, correctly, that we can obtain the desired concentration of weight if we iterate this process an unbounded number of times. This leads to two suggestions for a system of weights. The first is to choose some slow-growing function $k = k(N)$ and take the function $d_k$ (which is defined inductively by the formula $d_k(n) = \sum_{m \mid n} d_{k-1}(m)$, with $d_1$ identically 1). Another is to iterate until a fixed point is reached, though for this the definition must be modified somewhat. We obviously cannot find a positive function $f$ such that $f(n) = \sum_{m \mid n} f(m)$, but we can find a function such that $f(n) = \sum\{f(m) : m \mid n,\ m < n\}$ for every $n > 1$. Indeed, if we set $f(1) = 1$ then the first few values are $f(2) = f(3) = 1$, $f(4) = 2$, $f(5) = 1$, $f(6) = 3$. Let me now just write the sequence: it goes 1, 1, 1, 2, 1, 3, 1, 4, 2, 3, 1, 8, 1, 3, 3, 8, 1, 8, 1, 8, 3, 3, 1, 20, …
This sequence can be found on Sloane’s database. (It’s sufficiently natural that I’d be perturbed if it couldn’t.) I haven’t yet understood everything it says there about it, but there appear to be some facts there that might conceivably be of use.
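The recursion is easy to compute. The sketch below reproduces the values above; the sequence appears to be A074206 in the database, the number of ordered factorizations into factors greater than 1, though that identification is mine and worth double-checking.

```python
def f_up_to(N):
    """f(1) = 1 and f(n) = sum of f over the proper divisors of n."""
    f = [0] * (N + 1)
    f[1] = 1
    for m in range(1, N + 1):           # push f(m) to every proper multiple of m
        for n in range(2 * m, N + 1, m):
            f[n] += f[m]
    return f

print(f_up_to(24)[1:])
# [1, 1, 1, 2, 1, 3, 1, 4, 2, 3, 1, 8, 1, 3, 3, 8, 1, 8, 1, 8, 3, 3, 1, 20]
```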
So here is an obvious question: is there some reason that this function cannot work in a proof of EDP? Let me say once again what this means. For the function $f$ to work in a proof of EDP we would need to be able to prove an inequality that said that for every real sequence $(x_n)$ the HAP-discrepancy is always at least $\omega(N)\bigl(\sum_{n \le N} f(n) x_n^2 / \sum_{n \le N} f(n)\bigr)^{1/2}$. So to prove that it does not work it is sufficient to find a sequence (I’ve been assuming it will be real, but even a vector-valued sequence would do) of bounded discrepancy such that $\sum_{n \le N} f(n) x_n^2 \ge c \sum_{n \le N} f(n)$ for some constant $c > 0$ that is independent of $N$.
One possible reason for its not working might be that it grows too fast, or perhaps has a subsequence that grows too fast. Let me try to explain what might conceivably go wrong.
To do this, we need a more explicit definition of $f$. It can be defined as follows: $f(n)$ is the number of sequences $(n_1, n_2, \dots, n_k)$ such that $1 < n_1 < n_2 < \dots < n_k < n$ and for each $i$ we have $n_i \mid n_{i+1}$, and also $n_k \mid n$. Here we include the empty sequence as a sequence. (An equivalent definition would be to ask for sequences that start with 1 and end with $n$.) It is easy to check that this function satisfies the relation $f(n) = \sum\{f(m) : m \mid n,\ m < n\}$ and the initial condition $f(1) = 1$. And to see how it works in a concrete case, let's take $n = 12$. The possible sequences are (), (2), (3), (4), (6), (2,4), (2,6) and (3,6).
In general, working out $f(n)$ is slightly complicated. If $n = p^k$ for some prime $p$, then we can take any subset of $\{p, p^2, \dots, p^{k-1}\}$ in increasing order, so $f(n) = 2^{k-1}$. If $n = p_1 p_2 \cdots p_k$ is a product of $k$ distinct primes, then for each $j$ there is a one-to-one correspondence between the sequences of length $j$ that we can take and the set of surjections from $\{1, \dots, k\}$ to $\{1, \dots, j+1\}$. Indeed, given a surjection $\phi$, define $m_i$ to be the product of all $p_r$ such that $\phi(r) \le i$, and take the sequence $(m_1, \dots, m_j)$. Conversely, from the sequence we can recover the sets $\phi^{-1}(\{1, \dots, i\})$ and hence the function $\phi$, which has to be a surjection since the sequence is strictly increasing.
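Both counts can be checked by machine. The sketch below compares the recursion against the formula $2^{k-1}$ for prime powers, and against the surjection counts (computed by inclusion-exclusion) for a product of distinct primes.

```python
from math import comb

def f(n):
    """f(n) via the recursion f(1) = 1, f(n) = sum of f over proper divisors."""
    memo = {1: 1}
    def rec(n):
        if n not in memo:
            memo[n] = sum(rec(m) for m in range(1, n) if n % m == 0)
        return memo[n]
    return rec(n)

def surjections(k, j):
    # number of surjections from a k-element set onto a j-element set,
    # by inclusion-exclusion
    return sum((-1) ** i * comb(j, i) * (j - i) ** k for i in range(j + 1))

# Prime powers: f(p^k) = 2^(k-1).
print(f(2 ** 5), 2 ** 4)                                          # 16 16

# Product of k distinct primes: length-j sequences correspond to
# surjections onto a set of size j + 1.
print(f(2 * 3 * 5), sum(surjections(3, j + 1) for j in range(3)))  # 13 13
```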
Unfortunately, there is no nice formula for the number of surjections from a set of size $k$ to a set of size $j$, but the upshot is that we can think of $f(p_1 \cdots p_k)$ as a function that behaves fairly like $k!$.
So how big can $f(n)$ be if $n \le N$? The biggest it can be for a prime power is about $N/2$, since the best possible case is $n = 2^k$, for which $f(n) = 2^{k-1} = n/2$. As for products of distinct primes, the best we can do is roughly $k!$ if we take the product of the first $k$ primes. That product we can sloppily estimate to be about $k^k$, which is around $N$ when $k$ is around $\log N / \log\log N$. By contrast, $k!$ is around $k^k e^{-k}$, which is around $Ne^{-k}$, so in this case $f(n)$ appears to be much smaller than $N$. However, it is also a lot bigger than 1, so it would appear that the weight attached to a big power of 2 does not by any means dominate the sum.
Given that evidence, I find it not quite clear whether we should be happy with the function as already defined, or whether we should consider weighting it in some way so as to make it roughly the same size throughout the range. At this stage it is perhaps enough to say that we have the option of attempting the latter. (I am mindful of the experimental evidence discovered by Moses and Huy, which suggests that the optimal function has a tail that shows some kind of exponential decay. I don't have any kind of theoretical argument for that.)
How might one prove the result we want?
In my previous post on EDP, I suggested various techniques for finding delicate examples. One of them was recursion, which I rather dismissed. However, I am interested in it again now, not least because the function above has a natural recursive definition.
How might that lead to a proof? Well, here’s a suggestion. Suppose we find a non-trivial way of writing the identity matrix as a linear combination $\sum_i \lambda_i P_i \otimes Q_i$, where the $P_i$ and $Q_i$ are HAPs and $P \otimes Q$ is the rank-1 matrix with $xy$-entry equal to 1 if $x \in P$ and $y \in Q$ and 0 otherwise. We know that this will not be a useful decomposition of the identity. However, the formula can be dilated: replacing each HAP $P$ by $mP = \{mx : x \in P\}$, it can be rewritten as $\sum_i \lambda_i (mP_i) \otimes (mQ_i) = J_m$, where the $nn$-entry of $J_m$ is 1 if $m \mid n$ and 0 if $m \nmid n$. It follows that the diagonal matrix that has $f(n)$ down the diagonal (except that its first entry is 0) can be expressed as a sum $\sum_m f(m) E_m$, where $E_m$ is the diagonal matrix that takes the value 1 at all multiples of $m$ apart from $m$ itself, and 0 everywhere else. (To see this, note that $f(n) = \sum\{f(m) : m \mid n,\ m < n\}$.)
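A quick reminder, in code, of why such decompositions are relevant: the quadratic form of $P \otimes Q$ factorizes into a product of two HAP sums, so if $\mathrm{diag}(c) = \sum_i \lambda_i P_i \otimes Q_i$ then $\sum_n c_n x_n^2 \le (\sum_i |\lambda_i|) \cdot (\max_P |S_P(x)|)^2$, which is what converts a decomposition of a diagonal matrix into a discrepancy bound. The check below is a sketch with my own naming.

```python
import random

def hap(d, m):                  # the HAP {d, 2d, ..., md}
    return [d * i for i in range(1, m + 1)]

def quad_form(P, Q, x):         # x^T (P tensor Q) x, straight from the definition
    return sum(x[p] * x[q] for p in P for q in Q)

def hap_sum(P, x):              # the HAP sum S_P(x)
    return sum(x[p] for p in P)

random.seed(0)
N = 60
x = [0] + [random.choice([-1, 1]) for _ in range(N)]
P, Q = hap(3, 7), hap(4, 12)
print(quad_form(P, Q, x) == hap_sum(P, x) * hap_sum(Q, x))  # True
```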
If we have a way of decomposing the identity (for all $N$) then we can subtract the matrix that’s 1 at (1,1) and 0 everywhere else, to obtain a decomposition of the matrix $E_1$. And then we can use the same method to obtain decompositions of all the $E_m$, and finally we can take an appropriate linear combination to obtain the matrix $\sum_m f(m) E_m$, which has $f(n)$ in the $n$th diagonal entry for every $n > 1$. So far, this achieves nothing much, but note that our decompositions involve negative as well as positive coefficients. If we can do it in a way that involves many HAPs with many common differences, then perhaps there is some hope that there will be significant cancellation, so that the decomposition of $\sum_m f(m) E_m$ is more efficient than the decompositions of the individual matrices $E_m$.
I’ll start by briefly pointing out that if the above approach is to work, then it will have to use HAPs that are not “full”. (By a full HAP I mean one of the form $\{d, 2d, 3d, \dots, \lfloor N/d \rfloor d\}$, that is, one that goes as far as it possibly can below $N$.) That is because the sequence that is 1 up to $N/2$ and $-1$ thereafter has bounded discrepancy on all full HAPs. More generally, for a decomposition of a diagonal matrix to work, it must lead to a discrepancy proof for the class of HAPs used in the decomposition, so one should not try to build the matrix out of some restricted class of HAP products unless there is a chance that the theorem is still true for that restricted class.
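That claim about full HAPs is easy to confirm numerically; the check below is a sketch (with $N = 1000$ and the cutoff $N/2$ chosen just for illustration).

```python
N = 1000
x = [0] + [1 if n <= N // 2 else -1 for n in range(1, N + 1)]

# For each d, the full HAP is {d, 2d, ..., floor(N/d) * d}, and its sum
# is 2 * floor(N / (2d)) - floor(N / d), which is always 0 or -1.
full_sums = {sum(x[n] for n in range(d, N + 1, d)) for d in range(1, N + 1)}
print(sorted(full_sums))  # [-1, 0]
```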
A second remark is essentially the same as this recent comment of Gil’s. It’s that we already know that the extremal examples — by which I mean long sequences with low discrepancy — have to have some kind of multiplicative structure. This information ought to help us make our guesses. At the time of writing I don’t have a clear idea of how it would narrow down the search, so in order to pursue this thought, the first thing to do would be to consider exactly that question: if you have information about what the worst sequences are like, how does that affect the possible matrix decompositions?
Could the weights be multiplicative?
Since it’s relevant to that last question, let me mention a system of weights I thought of that doesn’t seem to work, but that might perhaps be salvageable. The idea is to define completely multiplicative weights: that is, weights such that $c_{mn} = c_m c_n$ for every $m$ and $n$.
An instant problem with that idea is that it seems to be hard to concentrate the weights on multiples of 3. Indeed, by multiplicativity we have $\sum\{c_n : n \le N,\ 3 \mid n\} = c_3 \sum_{m \le N/3} c_m$, so for the multiples of 3 to carry almost all of the weight we need either $c_3$ to tend to infinity, which it cannot do since it is a single fixed number, or the sum of the first $\lfloor N/3 \rfloor$ weights not to tend to zero as a proportion of the sum of the first $N$ weights.
The concentration conditions essentially mean that the sequence $(c_n)$ should grow faster than any power of $n$. But if we set $c_p$ to be the same constant $C$ for every prime $p$, as seems quite natural, then we find that the sequence grows at most polynomially (since the biggest value we can get is at powers of 2, and $c_{2^k}$ would be $C^k$, which is a power of $2^k$). So we seem to be forced to take $C$ to be a “constant” that tends to infinity with $N$. But then, if some back-of-envelope calculations are correct (but I’m far from sure about this), the weight of the sequence is too concentrated on large powers of 2. In fact, I think it may even be concentrated at just the single largest power of 2 less than $N$, which is obviously disastrous.
So it looks as though multiplicative sequences are out, but this claim needs careful checking as it could be rubbish. (One way it could be rubbish is if it is possible to salvage something from the idea by having different values at different primes, but at the moment I’m not very convinced by that idea.)
A rather vague general question.
The following question is important, even if hard to make precise: how much does the choice of the weights matter?
Let me at least try to pin the question down a little bit. We know that there are certain simple constraints that the $c_n$ have to satisfy, and also that these constraints narrow down the choice quite a bit. However, they do not seem to determine the weights anything like uniquely. It is quite possible that with a bit more thought we could come up with some more constraints. In fact, that is clearly something we should try to do, so let me interrupt the discussion to ask that more formally.
Question. Are there some other kinds of constraints that the $c_n$ must satisfy?
In relation to this question, it is worth looking at this comment of Sune’s, and the surrounding discussion.
Even if the answer is yes, it is clear that the constraints will not completely determine the sequence $(c_n)$. (Why? Well, given one matrix decomposition that works, one can build other ones out of it in artificial ways. I leave that as an easy exercise.) But do they come close? For example, is there some low-dimensional space such that every system of weights that works is a small perturbation of a sequence in that space?
I don’t really care about that as a mathematical question so much as a strategic question. From the point of view of solving EDP, can we afford to be reasonably relaxed about the choice of the $c_n$ and focus our attention on the decomposition, or will we find that the desired decomposition does not exist unless we choose the $c_n$ extremely carefully?
A summary of what I have said above.
I hope very much that it will be possible to revive the EDP discussion, even if the pace is slower than before. It seems to me that an excellent topic to concentrate on for a while would be the question of how to find a system of weights $(c_n)$ such that the square of the HAP-discrepancy of every real sequence $(x_n)$ is always at least $\omega(N)\sum_{n \le N} c_n x_n^2 / \sum_{n \le N} c_n$, where $\omega(N)$ tends to infinity. Amongst the subquestions of this question are whether the function $f$ I suggested above has a chance of working, “how large” the set of sequences that could work is, and whether there are constraints of a completely different kind from the ones considered so far.
I have considered real sequences, but if a sequence of weights is to work, we also need the inequality to generalize to vector-valued sequences. That is, if $(x_n)$ takes values in a Hilbert space, we need the square of the discrepancy of $(x_n)$, which is now the maximum of $\|\sum_{i=1}^m x_{id}\|$ over all HAPs, to be at least $\omega(N)\sum_{n \le N} c_n \|x_n\|^2 / \sum_{n \le N} c_n$. A further question of interest is whether generalizing to vectors introduces interesting constraints that are not felt if one just looks at scalar sequences.