EDP15 — finding a diagonal matrix

It is not at all clear to me what should now happen with the project to solve the Erdős discrepancy problem. A little while ago, it seems that either the project ran out of steam, or else several people, at roughly the same time, decided that other things they were doing should take higher priority for a while. Perhaps those are two ways of saying the same thing.

But they are not quite the same: as an individual researcher I often give up on problems with the strong intention of returning to them, and I often find that if I do return to them then the break has had an invigorating effect. For instance, it can cause me to forget various ideas that weren’t really working, while remembering the important progress. I imagine these are common experiences. But do they apply to Polymath? Is it possible that the EDP project could regain some of its old vigour and that we could push it forward and make further progress? Is it conceivable that it could go into a different mode, where people contributed to it only occasionally? (A problem with that is that one of the appeals of a polymath project is checking in to see whether there are new ideas, or whether people have reacted to your comments. It is not clear to me that this appeal works if it is substantially slowed down.)

Anyhow, as Terry Tao might put it, this situation can be regarded as an opportunity to add a new datapoint to the general polymath experiment. Recently the following conjunction of circumstances occurred: I found myself on a plane, my laptop battery lasts a fraction of the time it should, and all the films on offer were either unappealing or too appealing (meaning that I’d been looking forward to seeing them but didn’t want to waste them by watching them in aeroplane conditions). I therefore found myself with several hours to think about mathematics. It was just like the old days when I didn’t have a laptop and there would only be a couple of films, both terrible. So I thought about EDP. Now I would like to avail myself of the opportunity to obtain a new datapoint by writing my thoughts, such as they were, down in a post and seeing whether anyone feels like joining or rejoining the discussion. I have plenty of questions, some of which may be fairly easy: I hope that will be a good way of tempting people.

A brief reminder of some of the ideas we have been thinking about.

Since one of the aims of this post is possibly to attract newcomers to the project, let me briefly recap one or two things, including a statement of the problem. We define a homogeneous arithmetic progression, or HAP for short, to be a set of the form P_{d,m}=\{d,2d,\dots,md\} for some pair of positive integers (d,m). Given any sequence (x_1,\dots,x_n), we define its HAP-discrepancy, or discrepancy for short (since in this context we will almost always be talking about HAP-discrepancy and not any other kind of discrepancy) to be the maximum of |\sum_{j\in P_{d,m}}x_j| over all HAPs P_{d,m}. Erdős’s discrepancy problem is the annoyingly simple-sounding question of whether one can find arbitrarily long \pm 1 sequences with bounded discrepancy. That is, does there exist a constant C such that for every n there is a \pm 1 sequence of length n and discrepancy at most C?

When we started work on the problem, my perception was that a major difficulty was that the problem relied heavily on the sequence taking values in \{-1,1\}. There is a very simple example that appears to demonstrate this: the sequence 1, -1, 0, 1, -1, 0, 1, -1, 0, .... It is easy to check that the discrepancy of this sequence (even the infinite sequence) is 1, and yet it takes values of modulus 1 two thirds of the time. This is a pity, because the usual analytic methods tend to give results that can be generalized from purely combinatorial statements (typically involving sequences that take just two values) to more general analytic ones (typically involving sequences that satisfy some analytic condition such as a norm bound).
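Since we will repeatedly want to test candidate sequences and weight systems, a brute-force discrepancy checker is handy. Here is a minimal sketch in Python (the function name and 1-indexing convention are mine, not anything from the project) confirming that the sequence above has discrepancy exactly 1:

```python
def hap_discrepancy(x):
    # x is 1-indexed: x[0] is unused padding.  Returns the maximum of
    # |x[d] + x[2d] + ... + x[md]| over all HAPs {d, 2d, ..., md}.
    n = len(x) - 1
    best = 0
    for d in range(1, n + 1):
        s = 0
        for j in range(d, n + 1, d):
            s += x[j]
            best = max(best, abs(s))
    return best

n = 999
x = [0] + [(1, -1, 0)[(i - 1) % 3] for i in range(1, n + 1)]
print(hap_discrepancy(x))  # prints 1
```

The reason the answer is 1 and not merely bounded: the sequence is the non-principal character mod 3, so the sum along any HAP with difference d is the character value at d times a partial character sum, which is always 0 or 1.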

A more general question.

One of the main pieces of progress to come out of the original discussion was Moses Charikar’s observation that it is possible to attack the problem analytically after all. The trick is to use a weighted norm that tends to give more weight to numbers when they have more factors. For instance, a multiple of the example mentioned above shows that there is no hope of proving that the discrepancy is unbounded for every sequence (x_1,\dots,x_n) such that n^{-1}\sum_ix_i^2\geq 1. However, there appears to be no reason in principle that one could not prove that the discrepancy was always at least (\sum_i b_ix_i^2)^{1/2} for a sequence (b_1,\dots,b_n) such that b_1+\dots+b_n is unbounded. If one could do that, then it would imply EDP, since for \pm 1 sequences we have x_i^2=1 for every i.

There has been quite a bit of discussion about how one might be able to find such a set of weights and prove the result, but so far we have not succeeded. Rather than repeating what the possible methods of proof might be, I would like to suggest focusing on the problem of choosing the weights, because it seems to me that there is plenty of opportunity here to get further insights into the problem.

To repeat: in order to prove EDP we need to find a system of weights b_1,\dots,b_n such that \sum_ib_i is unbounded above and such that every sequence x_1,\dots,x_n has discrepancy at least (\sum_ib_ix_i^2)^{1/2}. The problem I would like to focus on for a while is this: find a good candidate system of weights. That is, find weights that do not obviously fail to work. What I hope is that if we look at several systems of weights and manage to show that they do not work (by constructing increasingly clever sequences x_1,\dots,x_n) then we will eventually settle on a system of weights that does work. If we can do that, then we will have split the problem into two and solved one of the two parts. And we have ideas about how to do the other part. (Roughly speaking, we want to find a way of decomposing a diagonal matrix: the two parts of the problem are to choose the diagonal matrix and to find the decomposition.)

Which systems of weights are definitely bad?

For the remainder of this post I want to make a start (or rather, a restart, since much of what I say can be found in the earlier discussions) by thinking about what is wrong with certain systems of weights. It will be convenient to rephrase the problem slightly: let us ask for a system of weights b_1,\dots,b_n such that the discrepancy of an arbitrary real sequence x_1,\dots,x_n is always at least \Omega(1)(\sum_ib_ix_i^2)^{1/2}/(\sum_i b_i)^{1/2} (where \Omega(1) tends to infinity as n tends to infinity). That way we care about the weights only up to a scalar multiple, so we do not have to normalize them carefully in advance.

The sequence 1, -1, 0, 1, -1, 0, … shows that we cannot take b_i=1 for every i. (I have already said this above, in a very slightly different way.) Indeed, its discrepancy is 1, but (\sum_ib_ix_i^2)/(\sum_ib_i) would be around 2/3 if we took b_i=1 for every i. And the ratio of these two is not unbounded.

The same sequence disposes of many other systems of weights. For example, suppose we were to choose b_i randomly to be 1 with probability p and 0 otherwise. Then with very high probability \sum_ib_ix_i^2 would be close to 2pn/3 and \sum_ib_i would be close to pn. Therefore, once again we would have a quantity that is close to 2/3 for the ratio, which does not tend to zero.

Indeed, any system of weights that is not concentrated in the multiples of 3 will have the same defect, for precisely the same reason. To put that more precisely, suppose that \sum_{3\not |m}b_m=c\sum_mb_m. Then with x_1,\dots,x_n as above, we have (\sum_ib_ix_i^2)/(\sum_ib_i)=c, while the discrepancy of the sequence (x_i) is 1. Thus, for the weights b_i to yield a proof of EDP we need \sum_{3\not |m}b_m=o(\sum_mb_m).

This simple observation can be generalized in many directions. For instance, it is easy to prove in a similar way that we need \sum_{p\not |m}b_m=o(\sum_mb_m) for every fixed prime p. It is also easy to prove in a similar way that we need \sum_{3|m,9\not |m}b_m=o(\sum_mb_m). A general message from this argument is this: the weights b_m must be strongly concentrated on numbers with many factors. And when one thinks about this, it is not too surprising: the numbers with plenty of factors are precisely the numbers that belong to many HAPs and therefore the numbers m for which it should be in some sense difficult to incorporate a large real number x_m into a sequence x_1,\dots,x_n with low discrepancy (because there are likely to be many constraints on x_m).

But let us stay for the moment with elementary observations rather than thinking about what sequences there are that are concentrated on numbers with many factors. We know that for every k the proportion of the total weight of the b_m on values of m such that 3^k|m but 3^{k+1}\not |m is o(1). One way of achieving that is to take n=3^r and to choose a set B that contains, for each 0\leq k<r, exactly one element that is divisible by 3^k but not by 3^{k+1}. One could then let b_m=1 if and only if m\in B.

If B is the set \{1,3,9,\dots,3^{r-1}\}, then it is easy to find an appropriate sequence (x_i): just let x_i=1 if i=3^j for even j, -1 if i=3^j for odd j, and 0 otherwise. The intersection of any HAP with B is either empty or takes the form \{3^j,3^{j+1},\dots,3^{r-1}\}, so this sequence has a discrepancy of 1, while \sum_ib_ix_i^2=\sum_ib_i.

However, if B consists of an arbitrary number that is equal to 1 mod 3, an arbitrary number that is equal to 3 mod 9, and so on, then many more HAPs can intersect B and I have not come up with a method of constructing bounded-discrepancy sequences with a substantial restriction to B. Let me ask this as a formal question. It doesn’t seem to be trivial, but I cannot believe that it is especially hard either.

Question. Let B be a subset of \{1,2,\dots,n\} such that for every k there is no element of B that is congruent to 2\cdot 3^{k-1} mod 3^k and at most one element that is congruent to 3^{k-1} mod 3^k. Must there be a sequence of bounded discrepancy that takes values \pm 1 everywhere on B?

It is not in fact necessary for the sequence to take values \pm 1 everywhere on B in order to show that the characteristic function of B is unsuitable. However, since I expect such a sequence to exist, I have stated the question in this stronger form. Probably one can also ask for the sequence to take values \pm 1 or 0 outside B.

I should briefly explain why I have not asked the more general question of whether it is possible to prove that no small set B can work (as opposed to a set that is forced to be small by the divisibility conditions above). That is because cardinality on its own is not enough: for example, one could take B to be a HAP of length k, and then to find a sequence of bounded discrepancy that takes \pm 1 values on B would be equivalent to finding a counterexample to EDP. The idea here is to show that for certain sets B one can exploit the freedom of a sequence outside B to make the discrepancy small.

Nevertheless, there might be a condition of the following kind: perhaps if B has the property that it cannot intersect a HAP of length m in more than \log m elements, then it cannot serve as a system of weights for EDP. And I don’t have any particular reason to think that \log is the right function here — perhaps one could prove a similar result for significantly faster-growing functions.

A candidate for a good system of weights.

I’d now like to resurrect an idea that I mentioned at some point earlier in the discussion. It’s a suggestion for a sequence of weights that seems to me to have some chance of working. And if it doesn’t work for some obvious reason, then it feels like the kind of idea that could be modified in various ways to get round preliminary objections.

Let me start with two ideas that don’t work. The first is to take b_m=1 for every m. We have seen that this doesn’t work — indeed, the proportion of weight it attaches to multiples of 3 is not 1-o(1).

The second is a feeble attempt to get round this obvious problem with the first. We want to give extra weight to numbers with more factors, so how about defining b_m to be d(m), the number of factors of m?

To see that this does not work is slightly harder, but only slightly. Again we shall show that this system of weights does not concentrate on multiples of 3. (Of course, we care about multiples of other numbers too, but if it doesn’t work for 3 then we’re already in serious trouble.) To see this, let us first consider how to estimate \sum_{m\leq n}d(m). This we can rewrite as \sum_{d\leq n}\lfloor n/d\rfloor, since each d contributes 1 to the sum for each multiple m of d that is at most n. And this we can approximate by \sum_{d\leq n}n/d, which is roughly n\log n. (I won’t bother to analyse how good an approximation that is.)

Now let’s consider what happens if we estimate \sum_{m\leq n, 3\not |m}d(m) in a similar way. This time, we can approximate the sum as (2/3)\sum_{3\not |d}n/d, since if d is a multiple of 3 we get no contribution, and if d is not a multiple of 3, we get a contribution of 1 for each multiple of d that is not a multiple of 3, and there are about 2n/3d of those. So the sum works out to be around (4/9)n\log n. Now this is 4/9 of the entire sum, so the usual sequence 1,-1,0,1,-1,0,\dots shows that setting b_m=d(m) does not work.

Nevertheless, we do seem to have achieved something: with this choice of b_1,\dots,b_n we have given less than average weight to non-multiples of 3. The problem is that we have not gone far enough. Now one way of thinking of d(m) is that it is \sum_{d|m}1. A way that one might think of attaching yet more weight to numbers with plenty of factors would be to define d_2(m) to be \sum_{r|m}d(r). This can be rewritten as \sum_{d|r|m}1. That is, it counts pairs (d,r) such that d|r and r|m. A fairly simple calculation similar to the one done above shows that this function again attaches too much weight to non-multiples of 3, but it is an improvement on the previous function. The constant function 1 attaches weight 2/3 to the non-multiples of 3, the function d(m) attaches weight 4/9, and the function d_2(m) attaches weight 8/27 (or 4/9 of what it should if it were unbiased).
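These ratios can be checked numerically. The sketch below (the sieve code and names are mine) computes the fraction of weight that the constant function, d(m) and d_2(m) place on non-multiples of 3. One caveat: the convergence to 2/3, 4/9 and 8/27 is only logarithmic, so at n=10^5 the computed values for d and d_2 sit noticeably above those limits, though the decrease from one function to the next is already clear.

```python
def ratio_off_multiples_of_3(w):
    # fraction of the total weight sum_{m<=n} w[m] carried by
    # those m that are not divisible by 3
    total = sum(w[1:])
    return sum(w[m] for m in range(1, len(w)) if m % 3 != 0) / total

n = 10 ** 5
d1 = [0] * (n + 1)   # d(m): number of divisors, by sieving
d2 = [0] * (n + 1)   # d_2(m) = sum_{r|m} d(r), sieved from d1
for r in range(1, n + 1):
    for m in range(r, n + 1, r):
        d1[m] += 1
for r in range(1, n + 1):
    for m in range(r, n + 1, r):
        d2[m] += d1[r]

ratios = [ratio_off_multiples_of_3(w) for w in ([1] * (n + 1), d1, d2)]
print([round(r, 3) for r in ratios])
```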

This suggests, correctly, that we can obtain the desired o(1) weight if we iterate this process an unbounded number of times. This leads to two suggestions for a system of weights. The first is to choose some slow-growing function k=k(n) and take the function d_k(m) (which is defined inductively by the formula d_k(m)=\sum_{r|m}d_{k-1}(r)). Another is to iterate until a fixed point is reached, though for this the definition must be modified somewhat. We obviously cannot find a positive function f such that f(m)=\sum_{r|m}f(r), but we can find a function f such that f(m)=\sum_{r|m, r<m}f(r). Indeed, if we set f(1)=1, then the first few values are f(2)=1, f(3)=1, f(4)=2, f(5)=1, f(6)=3, f(7)=1. Let me now just write the sequence: it goes 1,1,1,2,1,3,1,4,2,3,1,8,1,3,3,8,1,8,1,8,\dots.

This sequence can be found on Sloane’s database. (It’s sufficiently natural that I’d be perturbed if it couldn’t.) I haven’t yet understood everything it says there about it, but there appear to be some facts there that might conceivably be of use.

So here is an obvious question: is there some reason that it cannot work in a proof of EDP? Let me say once again what this means. For this function to work in a proof of EDP we would need to be able to prove an inequality that said that for every sequence x_1,\dots,x_n the HAP-discrepancy is always at least \Omega(1)(\sum_{m\leq n}f(m)x_m^2)^{1/2}/(\sum_{m\leq n}f(m))^{1/2}. So to prove that it does not work it is sufficient to find a sequence (I’ve been assuming it will be real, but even a vector-valued sequence would do) of bounded discrepancy such that \sum_{m\leq n}f(m)x_m^2\geq c\sum_{m\leq n}f(m) for some constant c>0 that is independent of n.

One possible reason for its not working might be that it grows too fast, or perhaps has a subsequence that grows too fast. Let me try to explain what might conceivably go wrong.

To do this, we need a more explicit definition of f. It can be defined as follows: f(m) is the number of sequences d_1,\dots,d_k such that 1 < d_1 < d_2 < \dots < d_k < m, d_i|d_{i+1} for each i < k, and d_k|m. Here we include the empty sequence as a sequence. (An equivalent definition would be to ask for sequences that start with 1.) It is easy to check that this function satisfies the relation f(m)=\sum_{d|m, d<m}f(d) and the initial condition f(1)=1. And to see how it works in a concrete case, let's take f(12). The possible sequences are (), (2), (3), (4), (6), (2,4), (2,6) and (3,6).

In general, working out f(m) is slightly complicated. If m=p^k for some prime p, then we can take any subset of \{p,p^2,p^3,\dots,p^{k-1}\} in increasing order, so f(m)=2^{k-1}. If m is a product p_1\dots p_k of distinct primes, then for each l \leq k there is a one-to-one correspondence between the sequences of length l-1 that we can take and the set of surjections from \{1,\dots,k\} to \{1,\dots,l\}. Indeed, given a surjection \phi, define m_i to be the product of all p_j such that j\in\phi^{-1}(\{i\}) and take the sequence m_1,m_1m_2,\dots,m_1\dots m_{l-1}. Conversely, from the sequence we can recover the m_i and hence the function \phi, which has to be a surjection since the sequence is strictly increasing.

Unfortunately, there is no nice formula for the number of surjections from a set of size k to a set of size l, but we can think of f(m) as being a function that behaves fairly like k!.

So how big can f(m) be if m\leq n? The biggest it can be for a prime power is n/2, since the best possible case is n=2^k. As for products of distinct primes, the best we can do is if we take the first k primes. The product of these we can sloppily estimate to be about ((k\log k)!)^{1/\log k}, which is around ((k\log k/e)^{k\log k})^{1/\log k}, which in turn is around (k\log k/e)^k. By contrast, k! is around (k/e)^k, so in this case f(m) appears to be much smaller than m. However, it is also a lot bigger than 1, so it would appear that the weight attached to a big power of 2 does not by any means dominate the sum.

Given that evidence, I find it not quite clear whether we should be happy with the function f as already defined, or whether we should consider weighting it in some way so as to make it roughly the same size throughout the range. At this stage it is perhaps enough to say that we have the option of attempting the latter. (I am mindful of the experimental evidence discovered by Moses and Huy, which suggests that the optimal function has a tail that shows some kind of exponential decay. I don't have any kind of theoretical argument for that.)

How might one prove the result we want?

In my previous post on EDP, I suggested various techniques for finding delicate examples. One of them was recursion, which I rather dismissed. However, I am interested in it again now, not least because the function f above has a natural recursive definition.

How might that lead to a proof? Well, here’s a suggestion. Suppose we find a non-trivial way of writing the identity matrix as a linear combination \sum_i\lambda_i P_i\otimes Q_i, where the P_i and Q_i are HAPs and P_i\otimes Q_i is the rank-1 matrix with xy-entry equal to 1 if x\in P_i and y\in Q_i and 0 otherwise. We know that this will not be a useful decomposition of the identity. However, the formula f(m)=\sum_{d|m, d<m}f(d) can be rewritten f(m)=\sum_{d|m}f(d)g(m/d), where g(r)=1 if r>1 and 0 if r=1. It follows that the diagonal matrix D_f that has f down the diagonal can be expressed as a sum \sum_r f(r)G_r, where G_r is the diagonal matrix that takes the value 1 at all multiples of r apart from r itself, and 0 everywhere else. (To see this, note that \sum_rf(r)G_r(m,m)=\sum_{r|m,r<m}f(r)=f(m) for every m\geq 2.)

If we have a way of decomposing the identity (for all n) then we can subtract the matrix that’s 1 at (1,1) and 0 everywhere else, to obtain the matrix G_1. And then we can use the same method to obtain decompositions of all G_r, and finally we can take an appropriate linear combination to obtain the matrix D_f. So far, this achieves nothing much, but note that our decompositions involve negative as well as positive coefficients. If we can do it in a way that involves many HAPs with many common differences, then perhaps there is some hope that there will be significant cancellation so that the decomposition of D_f is more efficient than the decompositions of the individual matrices G_r.

Further remarks.

I’ll start by briefly pointing out that if the above approach is to work, then it will have to use HAPs that are not “full”. (By a full HAP I mean one of the form \{d,2d,\dots,\lfloor n/d\rfloor d\}.) That is because the sequence that is 1 up to n/2 and -1 thereafter has bounded discrepancy on all full HAPs. More generally, for a decomposition of a diagonal matrix to work, it must lead to a discrepancy proof for the class of HAPs used in the decomposition, so one should not try to build the matrix out of some restricted class of HAP products unless there is a chance that the theorem is still true for that restricted class.

A second remark is essentially the same as this recent comment of Gil’s. It’s that we already know that the extremal examples — by which I mean long \pm 1 sequences with low discrepancy — have to have some kind of multiplicative structure. This information ought to help us make our guesses. At the time of writing I don’t have a clear idea of how it would narrow down the search, so in order to pursue this thought, the first thing to do would be to consider exactly that question: if you have information about what the worst sequences are like, how does that affect the possible matrix decompositions?

Could the weights be multiplicative?

Since it’s relevant to that last question, let me mention a system of weights I thought of that doesn’t seem to work, but that might perhaps be salvageable. The idea is to define weights b_i such that b_{rs}=b_rb_s for every r and s.

An instant problem with that idea is that it seems to be hard to concentrate the weights on multiples of 3. Indeed, either b_3 tends to infinity, or the sum of the first n/3 weights tends to zero as a proportion of the sum of the first n weights. That is because \sum_{3|m}b_m=b_3\sum_{m\leq n/3}b_m, by multiplicativity.

The second condition essentially means that the sequence should grow faster than any power of n. But if we set b_p to be the same for every prime p, as seems quite natural, then we find that the sequence grows at most polynomially (since the biggest we can get is at powers of 2, and b_{2^k} would be C^k, which is a power of 2^k). So we seem to be forced to take C to be a “constant” that tends to infinity. But then, if some back-of-envelope calculations are correct (but I’m far from sure about this), the weight of the sequence is too concentrated on large powers of 2. In fact, I think it may even be concentrated at just the single largest power of 2 less than n, which is obviously disastrous.
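To illustrate the difficulty numerically, here is a quick check under the hypothetical choice b_m=m^\alpha, one natural completely multiplicative family with polynomial growth (so b_3=3^\alpha is a fixed constant). The share of the total weight carried by multiples of 3 stays near 1/3 for every such \alpha, nowhere near the 1-o(1) that concentration requires; the multiplicativity identity \sum_{3|m}b_m=b_3\sum_{m\leq n/3}b_m is verified along the way.

```python
import math

n = 3 * 10 ** 5
results = {}
for alpha in (0, 1, 2):
    # completely multiplicative weights b_m = m^alpha
    b = [float(m) ** alpha for m in range(n + 1)]
    total = sum(b[1:])
    mult3 = sum(b[3::3])   # weight on multiples of 3
    # the identity sum_{3|m} b_m = b_3 * sum_{m <= n/3} b_m
    assert math.isclose(mult3, 3 ** alpha * sum(b[1:n // 3 + 1]))
    results[alpha] = mult3 / total
print(results)  # each ratio is close to 1/3
```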

So it looks as though multiplicative sequences are out, but this claim needs careful checking as it could be rubbish. (One way it could be rubbish is if it is possible to salvage something from the idea by having different values at different primes, but at the moment I’m not very convinced by that idea.)

A rather vague general question.

The following question is important, even if hard to make precise: how much does the choice of the weights b_i matter?

Let me at least try to pin the question down a little bit. We know that there are certain simple constraints that the b_i have to satisfy, and also that these constraints narrow down the choice quite a bit. However, they do not seem to determine the weights anything like uniquely. It is quite possible that with a bit more thought we could come up with some more constraints. In fact, that is clearly something we should try to do, so let me interrupt the discussion to ask that more formally.

Question. Are there some other kinds of constraints that the b_i must satisfy?

In relation to this question, it is worth looking at this comment of Sune’s, and the surrounding discussion.

Even if the answer is yes, it is clear that the constraints will not completely determine the sequence b_1,\dots,b_n. (Why? Well, given one matrix decomposition that works, one can build other ones out of it in artificial ways. I leave that as an easy exercise.) But do they come close? For example, is there some low-dimensional space such that every system of weights that works is a small perturbation of a sequence in that space?

I don’t really care about that as a mathematical question so much as a strategic question. From the point of view of solving EDP, can we afford to be reasonably relaxed about the choice of the b_i and focus our attention on the decomposition, or will we find that the desired decomposition does not exist unless we choose the b_i extremely carefully?

A summary of what I have said above.

I hope very much that it will be possible to revive the EDP discussion, even if the pace is slower than before. It seems to me that an excellent topic to concentrate on for a while would be the question of how to find a system of weights b_1,\dots,b_n such that the square of the HAP-discrepancy of every sequence x_1,\dots,x_n is always at least \Omega(1)(\sum b_ix_i^2)/(\sum b_i). Amongst the subquestions of this question are whether the sequence I suggested above has a chance of working, “how large” the set of sequences that could work is, and whether there are constraints of a completely different kind from the ones considered so far.

I have considered real sequences, but if a sequence of weights is to work, we also need the inequality to generalize to vector-valued sequences. That is, we need the square of the discrepancy of x_1,\dots,x_n to be at least \Omega(1)(\sum b_i\|x_i\|^2)/(\sum b_i). A further question of interest is whether generalizing to vectors introduces interesting constraints that are not felt if one just looks at scalar sequences.

105 Responses to “EDP15 — finding a diagonal matrix”

  1. Gil Kalai Says:

    Let me repeat my last comment from EDP14 which perhaps is better placed here.

In the recent two related approaches of Moses’s SDP and of Gowers’s representation of diagonal matrices we somehow hope that the SDP and LP problem “will do all the work”. (I think this is also the case for the ideas in EDP15.)

However, in earlier discussions we gained some additional information on what an example of bounded discrepancy should look like. In particular, Tao’s argument suggests that a counterexample will have diminishing correlation with every character on every HAP. (Terry proved it for multiplicative sequences but we can expect that this extends to general sequences.)

Maybe it can be fruitful to add the condition of “small correlation with a character on every HAP” to Moses’s SDP or Gowers’s LP, and hope that the SDP will more rapidly give good bounds and perhaps allow us to detect some pattern with such an extra assumption. At the same time, dealing with (say) multiplicative functions with diminishing correlation with every character looks appealing.

Talking about sequences with diminishing correlation with every nontrivial character: suppose that you have a multiplicative sequence with REALLY small correlation with every character (e.g. when you take the correlation for sequences of size n it is as small as n^{-\beta} for some \beta>0). What can you say then about the discrepancy?

    • gowers Says:

      Gil, I am intrigued by what you write (as I said in the post above) but can you say in more detail what it means to add the small-correlation condition to the SDP or LP? I don’t fully understand this.

    • Gil Kalai Says:

Tim, my comment is related to some early result of Terry but maybe I missed or forgot something. If I remember correctly, while the low-discrepancy examples are rather periodic, i.e. they have high correlation with a non-trivial character, there was an argument showing that for the multiplicative case a bounded-discrepancy sequence cannot have substantial correlation with a non-trivial character.

If I “translate” this statement to a general sequence it seems appealing (and perhaps provable) that when a sequence has bounded-away-from-zero correlation with a nontrivial character on some HAP then it has unbounded discrepancy.

So we can try to add the condition that the sequence has a small correlation with every non-trivial character on every HAP to the SDP. Actually, you suggested something similar in EDP8: to start with the case that the sequence is quasirandom. So the question is, now that we are equipped with SDP and LP ideas (EDP10 and EDP12), can we add some sort of quasirandomness condition to these programs.

    • gowers Says:

      Here is a thought about how that might work in practice — but let me know if it is not the kind of thing you had in mind.

      Suppose we wanted to use the decomposition-of-diagonal approach. The idea, roughly, is to write a diagonal matrix with unbounded trace as a linear combination of HAP products with bounded sum of absolute values of coefficients. Then a simple argument shows that every \pm 1 sequence must have unbounded correlation with some HAP. If we want to show this under the additional hypothesis that the sequence does not correlate with a non-trivial character, then that is equivalent to showing that every sequence correlates either with a HAP or with a non-trivial character. So one could try decomposing a diagonal matrix into products not just of HAPs but also of non-trivial characters. This ought to be strictly easier, but still very interesting.

  2. gowers Says:

    Since writing the post above, I have had a small further thought about the function f (the one defined recursively by the formula f(1)=1 and f(m)=\sum_{d|m,d< m}f(d)). It strikes me that that formula depends only on the multiplicities of the primes in the prime factorization of m, whereas from the constraints one might expect a function that also depended on the primes themselves.

    On second thoughts, that’s not what I want to say. In fact, I like the function as it is, but let me say why I thought for a moment that I didn’t like it, and why I have changed my mind (back, I now realize, to what it was before).

    Roughly what do we expect the ratio of \sum_{3\not |m}f(m) to \sum_mf(m) to be? This is asking how much bigger we expect a random f(m) to be if we condition on m being divisible by 3. I think that this increases the expected number of prime factors by some positive constant that would probably be between 0 and 1 (I’d be interested in a rigorous proof of this), and if so then since the expected number of prime factors is around \log\log n, and since f behaves a bit like the factorial of the number of prime factors (it’s different if they are not distinct, but usually they will be mostly distinct) that ought to increase the expected value of f by a factor of around \log\log n. So we do seem to get the small weight on non-multiples of 3. (Note, however, that if these very crude heuristics are correct, then it looks as though this set of weights cannot prove better than a bound of \log\log n.)

    Now if p is bigger, then we do not need the weight to be quite so concentrated on multiples of p (since the sequences we know of that are supported on non-multiples of p and have bounded discrepancy have discrepancy something like c\sqrt{p}, so the larger p is, the less impact such sequences have on the set of possible sequences of weights). But this is good, because it isn’t quite so concentrated on multiples of p.

    Let me try to justify that claim. Some further heuristic arguments in the post suggest that on average f grows more slowly than any power of n. (The rough idea of the proof would be that it is smaller than any power of n for products of distinct primes, even in the worst case, and other numbers such as powers of 2 are just too rare to make a significant difference. All these “facts” are things that could be worth proving rigorously — I wouldn’t expect it to be all that hard.) This means that the ratio of \sum_{m\leq cn}f(m) to \sum_{m\leq n}f(m) will be roughly c unless c is o(1). So if p is bounded above by a constant C, then \sum_{m\leq n/p}f(m) is about (\log\log n)^{-1}\sum_{p|m,m\leq n}f(m), which means that the latter is about …

    Actually, I’m about to prove a nasty contradiction. I’ve had this trouble a few times, so let me say what it is, in case somebody can tell me where the heuristic argument is going wrong. My problem is this.

    1. It looks as though on average f grows more slowly than any power of n.

    2. It ought to follow that for any prime p, \sum_{m\leq n/p}f(m) is approximately p^{-1}\sum_{m\leq n}f(m).

    3. But multiplying a number less than n/p by p ought on average (unless p is fairly large) to increase f of that number by about \log\log n, or at least by some factor that tends to infinity.

    4. But since everything is positive, that is a contradiction, since it would show that \sum_{p|m,p\leq n}f(p) is greater than \sum_{p\leq n}f(p).

    So I need to look more carefully at how f behaves. But can anyone see which of the above highly non-rigorous arguments is wrong?

    • Sune Jakobsen Says:

      I don’t understand 4. What is m?

    • gowers Says:

      Sorry, 4 was a complete mess. It should have read as follows.

      4. But since everything is positive, that is a contradiction, since it would show that \sum_{p|m,m\leq n}f(m) is greater than \sum_{m\leq n}f(m).

    • Uwe Stroinski Says:

      Plotting f seems to indicate that the ‘bad’ candidates are multiples of 24 (e.g. f(192)=256 or f(240)=368) and that n^{-1}\sum_{i=1}^n f(i) grows faster than n^{1/2}. So maybe we should have a closer look at step 1 of the argument.

    • Alec Edgington Says:

      Indeed, it appears that \log ( \frac1n \sum_{m \leq n} f(m) ) / \log n is a (very slowly) increasing function of n. I’ve plotted it up to n = 30\,000 here.

    • Alec Edgington Says:

      (For avoidance of doubt, I don’t mean ‘increasing’ in a strict sense; I mean that the tendency is to increase.)

    • gowers Says:

      OK I’ll try to think of a theoretical argument for that, unless someone else does first. Basically it’s good news though. If I read it correctly, it’s saying that the function grows very slightly quicker than any polynomial, which is necessary if the above contradiction is to be avoided without sacrificing 3 (which I’d rather not sacrifice if possible). But it would also mean that there was no possibility of the whole sum being dominated by a tiny handful of values.

    • Alec Edgington Says:

      It’s not quite saying that f(n) grows faster than any polynomial: it is still possible that \frac1n \sum_{m \leq n} f(m) \sim n^\alpha for some \alpha.

    • Moses Charikar Says:

      Alec, what happens to your plot if you have the x-axis in log scale ? Or is there any other scaling for the plot that suggests the rate of growth of the function ?

    • gowers Says:

      Ah, I didn’t look at the plot and interpreted “very slowly increasing” to mean “tending to infinity very slowly”.

    • Alec Edgington Says:

      Moses: here is a plot of the same function with n on a logarithmic scale.

      If f(n) does grow on average like n^\alpha, we can make a guess at what \alpha should be. (Perhaps this even constitutes a hand-waving argument why it should be so.) We have \sum f(n) n^{-s} = (2 - \zeta(s))^{-1}, which has a simple pole at s = \zeta^{-1}(2) \approx 1.7286. But \sum n^\alpha n^{-s} = \zeta(s - \alpha), which has a simple pole at s = 1 + \alpha. So we might expect \alpha = \zeta^{-1}(2) - 1 \approx 0.7286. This at least appears consistent with the graph.
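This guess is easy to check numerically. A quick sketch (my own, not Alec's computation): approximate \zeta by a truncated sum with the standard tail correction, then bisect, using the fact that \zeta is decreasing on (1,\infty):

```python
def zeta(s, N=10000):
    # truncated Dirichlet series plus standard tail corrections (valid for s > 1)
    return sum(n ** -s for n in range(1, N + 1)) + N ** (1 - s) / (s - 1) + 0.5 * N ** -s

lo, hi = 1.5, 2.0  # zeta(1.5) > 2 > zeta(2)
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if zeta(mid) > 2.0:
        lo = mid   # zeta is decreasing, so the root lies to the right
    else:
        hi = mid
alpha = 0.5 * (lo + hi)
print(round(alpha, 4), round(alpha - 1, 4))  # ≈ 1.7286 and 0.7286
```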

    • Moses Charikar Says:

      Alec, what does your data say about \sum_{m \leq n, p \not |m}  f(m) versus \sum_{m \leq n} f(m) ?

    • gowers Says:

      I’ve just had another attempt at thinking about the growth rate of f and come to a similar conclusion for a different reason. Whether that constitutes convincing evidence I don’t know …

      Here is the argument. We are interested to estimate F(n)=\sum_{m\leq n}f(m), which equals the sum over k of the number F_k(n) of k-tuples (d_1,\dots,d_k) such that 1<d_1<\dots<d_k\leq n and d_1|d_2|\dots|d_k. Let's estimate F_k(n) by thinking about how many such k-tuples have at least one d_{i+1}/d_i equal to t. An upper bound for this is (k-1)F_{k-1}(n/t) (to be a bit cavalier about integer parts), since the maximum ratio can occur in k-1 places, and once you know where it occurs you get the rest by finding a sequence of length k-1 of numbers that divide each other, with the maximum number in the sequence at most n/t. This is an upper bound because it overcounts sequences where t occurs more than once.

      Now F_1(n)=n. So F_2(n) is at most \sum_{2\leq t\leq n}n/t which is roughly n\log n. I would expect this process to stabilize at some point. If we make the assumption that it stabilizes at some power n^\alpha, then we ought to have (if we ignore the factor k on the grounds that it is much smaller than n^\alpha) something like n^\alpha=\sum_{2\leq t\leq n}(n/t)^\alpha, which is roughly n^\alpha(\zeta(\alpha)-1). So we get \zeta(\alpha)=2.

      This is consistent with what Alec says above, since he is looking at the average of f(m) up to n while I am looking at the sum.

    • Alec Edgington Says:

      Moses: is this the kind of thing you mean? These are plots of \sum_{p \nmid m \leq n} f(m) / \sum_{m \leq n} f(m) for n up to 30\,000, for p = 2, 3, 5, 7.

    • Alec Edgington Says:

      Here are the same data with n plotted on a logarithmic axis. From these, I’d say it looks fairly likely that the ratios are tending to zero — in other words, that f is concentrated on multiples of p (for each prime p)…
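A hedged sketch of this computation (mine, not Alec's actual code): sieve f up to 30000 and compare the mass on non-multiples of p with the total mass. The ratios do appear to decay as n grows:

```python
N = 30000
f = [0] * (N + 1)
f[1] = 1
for d in range(1, N + 1):
    for m in range(2 * d, N + 1, d):  # f(m) = sum of f(d) over proper divisors
        f[m] += f[d]

def off_ratio(p, n):
    # fraction of the mass sum_{m <= n} f(m) carried by m not divisible by p
    total = sum(f[1:n + 1])
    return sum(f[m] for m in range(1, n + 1) if m % p) / total

for p in (2, 3, 5, 7):
    print(p, off_ratio(p, 1000), off_ratio(p, N))
```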

    • Alec Edgington Says:

      Oops: link.

    • Alec Edgington Says:

      Even more convincing is a log-log plot.

    • Moses Charikar Says:

      Very convincing indeed !

    • Alec Edgington Says:

      I don’t have time to check this now, but it seems to me possible that an adaptation of Tim’s argument above might show that \sum_{p \nmid m \leq n} f(m) \approx n^{\alpha_p}, where \alpha_p < \zeta^{-1}(2): instead of F_k(n) we would be counting the k-tuples (d_1, \ldots, d_k) such that d_1 \mid \ldots \mid d_k \leq n and p \nmid d_i.

    • gowers Says:

      Alec, many thanks for those plots. Let me now ask once again why they don’t imply a contradiction. (I must, once again, be missing something obvious.)

      Here’s why I think they do lead to a contradiction, even though I know they can’t.

      1. If F(n)=\sum_{m\leq n}f(m) grows like n^{\alpha}, then F(n/3) is approximately 3^{-\alpha}F(n).

      2. If m is a typical number, then F(m) should be o(F(3m)). (I think this is the incorrect statement, but I’ll try to justify it below.)

      3. It follows that \sum_{m\leq n, 3|m}f(m) is much bigger than \sum_{m\leq n}f(m), an obvious contradiction.

      Ah, at the moment that I start to try to justify 2, I think I see where I am going to fail. The crude argument was this: a typical number has about \log\log n prime factors, most of which are distinct; if m has k distinct prime factors then f(m) is something that behaves a bit like k!; therefore, multiplying by a prime should increase the value of f by something like k, which is usually around \log\log n, which tends to infinity.

      However, f behaves rather differently at prime powers: we know that f(p^r)=2^{r-1}. So in this extreme case, if m is a power of p then f(pm)=2f(m), giving us a ratio that does not tend to infinity.

      Now we also know, more or less, that f is concentrated on smooth numbers. So we will expect, for instance, that almost all the measure of f will be concentrated on numbers m such that 3^k|m, for some k that tends to infinity with n (possibly very slowly). I haven’t checked, but it now seems a highly plausible guess — since I don’t otherwise see how to avoid a contradiction — that if a high power of 3 divides m, then f(m) is not o(f(3m)).

      If we want to get a feel for how f behaves, then proving this last fact rigorously would seem to be quite a good idea. If nobody here can do it in the near future, then I’ll post a question on Mathoverflow.

    • gowers Says:

      I now have a nonrigorous argument that does indeed suggest that f(3m) is usually not all that much bigger than f(m) when f(m) itself is large. It feels like a better argument than some of my earlier nonrigorous arguments, in that there might be some hope of making it precise.

      The argument goes like this. A typical m for which f(m) is large will have many small factors. In particular, factors like 2 and 3 will occur with high multiplicity. Let us now define a bipartite graph as follows. The vertices on one side are strictly increasing k-tuples (where k does not have to be the same for each vertex) of numbers (d_1,\dots,d_k) such that each d_i divides the next, d_1 > 1, and d_k = m. The vertices on the other side are the same but with m replaced by 3m. We join (d_1,\dots,d_k) to (e_1,\dots,e_k) if there exists i such that e_j=d_j for every j < i and e_j=3d_j for every j\geq i. (It would perhaps have been nicer to define these sequences by the ratios of successive d_i.)

      The degree of each vertex (d_1,\dots,d_k) is obviously k. In the reverse direction, the degree of (e_1,\dots,e_k) is equal to the number of i such that 3|e_i/e_{i-1}. At this point rigour flies out of the window: it seems reasonable to guess that for the ultra-smooth values of m that we are looking at, and for a typical value of k (if we choose a random e_1,\dots,e_k satisfying the conditions), a constant proportion of the ratios e_i/e_{i-1} are divisible by 3. If that is so, then a simple double-counting argument tells us that the number of vertices on the 3m side is at most a constant times the number of vertices on the m side. Since we sort of know that the conclusion must be true, this feels like a fairly plausible explanation for it. But I'd still like to understand better why so many ratios are divisible by 3.

    • Alec Edgington Says:

      I think that the only modification to Tim’s argument above needed to calculate the exponent \alpha_p in \sum_{p \nmid m \leq n} f(m) \approx n^{\alpha_p} is the replacement of \sum_{2 \leq t \leq n} (n/t)^\alpha by \sum_{2 \leq t \leq n, p \nmid t} (n/t)^\alpha. If the argument is correct then \alpha_p must satisfy (1 - p^{-\alpha_p}) \zeta(\alpha_p) = 2. This gives \alpha_2 \approx 1.38, \alpha_3 \approx 1.55, \alpha_5 \approx 1.65, and \alpha_7 \approx 1.69. I’m not hugely confident about this, but it appears consistent with the graphs.
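These exponents can be checked numerically. A sketch of mine, solving (1 - p^{-\alpha_p})\zeta(\alpha_p) = 2 by bisection (the left-hand side decreases from \infty towards 1 as the argument increases from 1):

```python
def zeta(s, N=5000):
    # truncated Dirichlet series with tail corrections, good for s > 1
    return sum(n ** -s for n in range(1, N + 1)) + N ** (1 - s) / (s - 1) + 0.5 * N ** -s

def alpha_p(p):
    lo, hi = 1.01, 3.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if (1 - p ** -mid) * zeta(mid) > 2.0:
            lo = mid   # left-hand side decreasing: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (2, 3, 5, 7):
    print(p, round(alpha_p(p), 3))  # close to Alec's 1.38, 1.55, 1.65, 1.69
```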

    • Moses Charikar Says:

      This sounds reasonable. Do you see a way to estimate \sum_{p^2 \nmid m \leq n} f(m) and more generally, \sum_{p^k \nmid m \leq n} f(m) ? These estimates would lend confidence to Tim’s assertion that almost all the measure of f should be concentrated on numbers m such that 3^k|m.

    • Alec Edgington Says:

      The only thought I have at the moment is that it may be useful to get some kind of handle on the generalization of f — call it f_k for the moment — defined by f_k(n) = \sum_{a_1 \cdots a_r = n, a_i > 1} {2r+1 \choose k}. Note that f_0 = f. Then I think we have f(p^k n) = f_k(n), since given an ordered factorization of n each new factor p can be combined with one of the existing r factors or inserted in one of the r+1 gaps.

    • Alec Edgington Says:

      Sorry, I’ve just realized that’s wrong. It isn’t the choose function that we want, it’s a more complicated partition function…

  3. Polymath5 « Euclidean Ramsey Theory Says:

    […] Polymath5 By kristalcantwell There is a new thread for Polymath5 here. […]

  4. Sune Jakobsen Says:

    I hope that you’ll succeed at restarting this project. Unfortunately I don’t have much time over the next month (due to exams, the IMO, and the IMC), but I probably can’t resist thinking about the problem from time to time.

    In the post you ask for “system of weights b_1,\dots,b_n such that the discrepancy of an arbitrary real sequence x_1,\dots,x_n is always at least \Omega(1)(\sum_ib_ix_i^2)^{1/2}/(\sum_i b_i)^{1/2} (where \Omega(1) tends to infinity as n tends to infinity).” It is not clear to me if you allow the b_i’s to depend on n (in this case there is no reason to divide by (\sum_i b_i)^{1/2}. This is (1) below) or not (in this case we are really considering an infinite sequence b_1,b_2,\dots . This is (3) below).

    1. The most general version of this approach is to find a sequence (indexed by n) of (longer and longer finite) sequences b_1,b_2,\dots b_n such that any real sequence x_1,x_2,\dots x_n has discrepancy at least (\sum_ib_ix_i^2)^{1/2} and such that the sums of the b’s tends to infinity (or at least is unbounded) as n\to \infty. If this is not possible, Moses’ approach cannot work.

    2. Finding an infinite sequence of finite sequences is complicated, so we want to simplify things. One thing we could do is to require the b_i not to depend on n. In this case we are just using an infinite sequence b_1,b_2,\dots and for each n we only look at the first n terms. However, our experimental results suggest that for fixed n in (1), b_i is large for some large values of i. That is, the size of b_i depends on the size of i/n (and of course the number of divisors of i). That suggests that this cannot work.

    3. One possible solution is the one Gowers uses in this post (dividing by (\sum_i b_i)^{1/2}). Is this a new idea?

    4. Another possibility is to let b be a sequence over the rational numbers in (0,1] or (0,1) instead of over the integers. Here the idea is that you can find a sequence b_1,b_2,\dots b_n to use in (1) by looking at b_{1/n},b_{2/n},\dots b_{n/n} in this approach.

    Any other suggestions?

    Clearly 1 is stronger than 2, 3, and 4 (meaning that if you can find a sequence to use in 2, 3 or 4, you can find one to use in 1), but what is the relative strength of 2, 3, and 4? I think I would prefer to work on 3, but I can’t really justify it.

    About the vector-question: Isn’t it trivial that a weight system works for vectors iff it works for real numbers?

  5. Alec Edgington Says:

    One interesting property of Tim’s function f(n) that is mentioned on Sloane’s database is that its Dirichlet inverse is the sequence 1, -1, -1, -1, \ldots. (So \sum f(n) n^{-s} = (2 - \zeta(s))^{-1}.) Functions with Dirichlet inverse bounded by 1 are a kind of generalization of completely-multiplicative \pm 1-valued functions: see this wiki page. This one has the extra property of being non-negative. I wonder how special that is.
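The inverse property follows directly from the recursive definition: for n>1 the Dirichlet convolution of f with the sequence 1,-1,-1,\dots is f(n)-\sum_{d|n,d<n}f(d)=0. A small numerical sketch (mine):

```python
N = 300
f = [0] * (N + 1)
f[1] = 1
for d in range(1, N + 1):
    for m in range(2 * d, N + 1, d):
        f[m] += f[d]

# h is the claimed Dirichlet inverse: h(1) = 1, h(n) = -1 for n > 1
h = [0, 1] + [-1] * (N - 1)

def conv(n):
    # Dirichlet convolution (f * h)(n) = sum over divisors d of f(d) h(n/d)
    return sum(f[d] * h[n // d] for d in range(1, n + 1) if n % d == 0)

print([conv(n) for n in range(1, 11)])  # the identity: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```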

    • gowers Says:

      Here’s another argument where I get stuck and can’t see what I’m doing wrong. Perhaps you can help out.

      The inductive definition of f is that f(n)=\sum_{d|n}f(d)g(n/d), where g(m)=0 if m=1 and 1 otherwise. Now it seems to me that the normal rate of growth of f is slower than polynomial, with the occasional abnormally high value, but that f(n)\leq n for every n. If that is correct, then the Dirichlet series F(s)=\sum_nf(n)n^{-s} converges at the very least when \Re(s)>2, and almost certainly when \Re(s)>1. (All these assertions are unchecked facts, and given that I’m about to run into difficulties, the chances that some of them are wrong are quite high.)

      Now the inductive definition seems to tell us that F(s)=F(s)(\zeta(s)-1), since the Dirichlet series associated with g is \zeta(s)-1 and f is the Dirichlet convolution of itself with g. But that is ridiculous — it would tell us that F(s) was zero whenever \zeta(s)\ne 2, and hence, since it is analytic, zero everywhere where the series converges.

      I’m obviously making some silly mistake here, but I just can’t see what it is.

    • Alec Edgington Says:

      I think the flaw in your argument is that the inductive definition you wrote down is false for n=1. Taking this into account, the correct relation is F(s) (\zeta(s) - 1) = F(s) - 1, which gives F(s) = 1/(2 - \zeta(s)).

  6. porton Says:

    If your EDP project is stalled, can we start a new polymath project in the near future?

    I think this conjecture of mine is a good candidate for the next polymath project:

  7. David Says:

    I tried to post this in the appropriate thread above, but I couldn’t make it work. So, let me just say that it is not hard to show that \sum_{n\leq X}f(n) \sim c\cdot X^{\alpha} for some constant c, with \alpha=\zeta^{-1}(2). This follows from the fact that 1/(2-\zeta(s)) has no poles in \mathrm{Re}(s) \geq \alpha aside from s=\alpha, hence the Wiener–Ikehara Tauberian theorem applies. To see my statement about poles, note that for any fixed real \sigma > 1, the function \zeta(\sigma)- | \zeta(\sigma+it)| is nonnegative, and vanishes only at t=0.

    • Alec Edgington Says:

      Ah, thanks. Now I am much more confident that \sum_{p \nmid m \leq n} f(m) \sim c n^{\alpha_p} where (1 - p^{-\alpha_p}) \zeta(\alpha_p) = 2. Let f_p(n) be f(n) if p \nmid n and zero if p \mid n. Then for all n > 1, f_p(n) = \sum_{d \mid n, d < n, p \nmid \frac{n}{d}} f_p(d) = \sum_{d \mid n} f_p(d) g_p(\frac{n}{d}) where g_p(n) is 1 if n > 1 and p \nmid n, and zero otherwise. So, writing F_p(s) = \sum_n f_p(n) n^{-s}, we have F_p(s) - 1 = (\sum_n g_p(n) n^{-s}) F_p(s) = ((1 - p^{-s}) \zeta(s) - 1) F_p(s), so that F_p(s) = (2 - (1 - p^{-s}) \zeta(s))^{-1}. Now if my understanding is right, we just need to check that F_p(s) has no poles to the right of our \alpha_p.

    • David Says:

      Alec, this will indeed always work – F_p(s) is pole-free to the right of \alpha_p and only has a pole at s=\alpha_p on the line \mathrm{Re}(s)=\alpha_p.

  8. gowers Says:

    If it is correct that the function f is concentrated at numbers m that are divisible by high powers of small primes, then estimating f at products of distinct primes may not be all that important to us. However, I went to a bit of effort to do so yesterday, so let me report what I came up with. (Incidentally, I’m sure that everything I say in this comment is either known or incorrect.)

    Suppose, then, that m=p_1p_2\dots p_k. Then the sequences we are counting are in one-to-one correspondence with ordered partitions of \{1,2,\dots,k\} into non-empty sets. For instance, given the partition A_1\cup\dots\cup A_r, one sets u_i=\prod_{j\in A_i}p_j and then one sets d_i=u_1u_2\dots u_i for i=1,2,\dots,r.

    Now the ordered partitions of [k]=\{1,2,\dots,k\} into r non-empty sets are in one-to-one correspondence with the surjections from [k] to [r]. The normal way of counting these is to use the inclusion-exclusion formula, applied to the sets X_1,\dots,X_r, where X_i is the set of functions from [k] to [r] that miss i. This tells us that the number of surjections is

    r^k-\binom r1(r-1)^k+\binom r2(r-2)^k-\dots.

    Now this formula as it stands is not much use to us because what we really want to know is the approximate number of surjections, and here we have an exact formula with a lot of cancellation. So let us do some approximating. If we take out a factor r^k then a typical term will be \binom rs(1-s/r)^k. If we then approximate (1-s/r)^k by e^{-sk/r} we end up approximating the whole sum by

    r^k\left(1-\binom r1 e^{-k/r}+\binom r2 e^{-2k/r}-\dots\right),

    which, by the binomial theorem, equals r^k(1-e^{-k/r})^r.

    As a quick sanity check, let us see what happens in the extreme cases r=k and r=1. In the first case we get k^k(1-1/e)^k. This is not very good, because we should be getting k!, which is more like k^ke^{-k}. And it’s even worse when r=k+1 when we should get zero but in fact get a non-zero answer.

    Leaving those problems aside for now, let’s see what happens when r is small. Then we get almost exactly r^k, since e^{-k/r} is much smaller than 1/r (and therefore (1-e^{-k/r})^r is almost exactly 1). This is what we would hope, since almost every function from [k] to [r] is a surjection when r is small.

    Continuing to pretend that the first estimate was a good one, let us think about how to maximize r^k(1-e^{-k/r})^r for given k. Clearly if r is small enough for the second factor to be approximately 1, then we want to increase it. To see roughly where the maximum occurs, let us take logs. That gives us k\log r + r\log(1-e^{-k/r}). At the maximum, e^{-k/r} will be reasonably small (if k is large) so we can probably approximate \log(1-e^{-k/r}) by -e^{-k/r}, which gives us k\log r-re^{-k/r}. To maximize that let’s differentiate, which gives us kr^{-1}-(1+kr^{-1})e^{-k/r}. For that to be zero, we need e^u=1+1/u, where u=k/r. Unfortunately, the solution of that equation is at u\approx 0.8, that is, at r\approx 1.2k, which is outside the allowed range r\leq k. But perhaps that is not really unfortunate and is just telling us that the number of surjections is biggest when k=r.

    Another sanity check: how many surjections are there from [k] to [k-1]? This is not too hard to calculate because we know that precisely one value must be taken twice, and all the others must be taken once. The number of ways of choosing the value to be taken twice is k-1. The number of ways of choosing the two preimages of that value is \binom k2. The number of ways of choosing the rest of the function is (k-2)!. So it looks as though the number of surjections is (k-1)!\binom k2, which is definitely bigger than k!=(k-1)!k.

    I’ve got myself into a muddle once again. A possible explanation is that the number of surjections is maximized when r=(1-o(1))k. And in fact, I have a suggestion of why that might be. Perhaps the number is maximized when r is as big as possible such that almost all functions from [k] to [r] are surjections. Indeed, I find this pretty plausible. The probability that a function is not a surjection is at most r(1-1/r)^k, or about re^{-k/r}. That is small as long as r is much smaller than e^{k/r}, or \log r is much smaller than k/r. This tells us that r can go up to about k/\log k (which is not k(1-o(1)) but I now don’t believe that any more). The number of surjections we get should then be roughly r^k. Except that that is (k/\log k)^k, which is a lot smaller than (k/e)^k, which is roughly k!.

    My sanity checks seem to have resulted in insanity, so I’d better stop this rambling comment and see if I can find the answer somewhere, or if anyone knows it. (As I said at the beginning, it must surely be known.)
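Since the muddle here is entirely about approximations, it may help to have the exact numbers to hand. A sketch (mine) that evaluates the inclusion-exclusion count directly; summing over r recovers the value of f at a product of five distinct primes, and the final line locates the maximizing r empirically for k=20:

```python
from math import comb, factorial

def surjections(k, r):
    # inclusion-exclusion over which of the r values are missed
    return sum((-1) ** s * comb(r, s) * (r - s) ** k for s in range(r + 1))

print(surjections(5, 5) == factorial(5))               # True
print(surjections(5, 4) == factorial(4) * comb(5, 2))  # True: the sanity check above
print(sum(surjections(5, r) for r in range(1, 6)))     # 541 = f(p1 p2 p3 p4 p5)
print(max(range(1, 21), key=lambda r: surjections(20, r)))  # empirical argmax for k = 20
```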

    • gowers Says:

      I’ve tried posting a question on Mathoverflow to see whether anyone has any idea. So far, some relevant information has been given but no definitive answer yet.

    • Moses Charikar Says:

      Here is how I had tried to calculate this earlier.

      We want the value of f(m) for m = p_1 p_2 \ldots p_k. This is a function g(k) where k is the number of prime divisors of m. Then it is not hard to see that g(n) satisfies the recurrence relation g(n) = \sum_{i=0}^{n-1} {n \choose i} g(i). The first few values of g() are g(0)=1, g(1) = 1, g(2) = 3 and so on.
      Let’s rewrite the recurrence relation in the following way:

      \displaystyle \frac{g(n)}{n!} = \sum_{i=0}^{n-1} \frac{g(i)}{i!(n-i)!}.

      Now define a(n) = g(n)/n!, then a(n) satisfies the recurrence:

      \displaystyle a(n) = \sum_{i=0}^{n-1} \frac{a(i)}{(n-i)!}.

      Consider the generating function a(x) = \sum_{n \geq 0} a(n) x^n. Then, from the recurrence relation for a(n), we have the following condition: a(x)-1 = a(x) (e^x -1). In order to verify this, we need to check the coefficient of x^n on both sides of the claimed equality. Assuming this is correct, we get: a(x)(2-e^x) = 1 which implies a(x) = 1/(2-e^x). So a(n) should be the coefficient of x^n in the Taylor series of 1/(2-e^x) around 0. Let’s try to get an expression for this coefficient.

      \displaystyle a(x) = \frac{1}{2-e^x} = \frac{1}{2(1-e^x/2)} = \frac{1}{2}\sum_{t=0}^{\infty} \left(\frac{e^x}{2}\right)^t = \frac{1}{2} \sum_{t=0}^{\infty} \frac{e^{tx}}{2^t} = \frac{1}{2} \sum_{t=0}^{\infty} \sum_{n=0}^\infty \frac{(tx)^n}{2^t n!}.

      So a(n), the coefficient of x^n in this expression ought to be
      \displaystyle \frac{1}{2 n!}\sum_{t=0}^\infty \frac{t^n}{2^t}. Recall that the actual function we wanted was g(n) = n! a(n). Hence we get:

      \displaystyle g(n) = \frac{1}{2} \sum_{t=0}^\infty \frac{t^n}{2^t}.

      I have no idea why this expression is an integer for positive integers n, except that it must be if the calculations are correct. If this is correct, then we can get an approximate expression for g(n) by looking at the largest term in the summation. t^n/2^t is maximized for n \log t - t \log 2 = 0 which means t \approx n \log n, so g(n) \approx (n \log n)^n.

      As a sanity check, going back to the generating function a(x), I plugged the expression 1/(2-e^x) into sage and computed the first few terms of the Taylor series expansion. This yielded

      \displaystyle \frac{1}{2-e^x} = 1 + x + \frac{3}{2}x^2 + \frac{13}{6}x^3 + \frac{75}{24}x^4 + \frac{541}{120}x^5 + \ldots

      The numerators of the fractions in the expression above are the values of g(n) = n! a(n), and they seem to match the values obtained from the recurrence relation for g(n). Also, plugging the sequence 1,1,3,13,75,541 into Sloane’s database, you get the sequence A000670.
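Moses's closed form is easy to test against the recurrence; here is a quick sketch (mine, with made-up function names):

```python
from math import comb

def g_rec(n_max):
    # g(0) = 1, g(n) = sum_{i < n} C(n, i) g(i)
    g = [1]
    for n in range(1, n_max + 1):
        g.append(sum(comb(n, i) * g[i] for i in range(n)))
    return g

def g_series(n, terms=300):
    # g(n) = (1/2) * sum_{t >= 0} t^n / 2^t  (truncated; the tail is negligible)
    return round(0.5 * sum(t ** n / 2 ** t for t in range(terms)))

print(g_rec(5))                          # [1, 1, 3, 13, 75, 541]
print([g_series(n) for n in range(6)])   # the same values
```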

    • Moses Charikar Says:

      I made a mistake in estimating g(n) from the expression

      \displaystyle g(n) = \frac{1}{2} \sum_{t=0}^\infty \frac{t^n}{2^t}.

      The term t^n/2^t is maximized for n = t \log 2, i.e. t = n/\log 2, and this gives the approximation \displaystyle g(n) \approx \left(\frac{n}{e \log 2}\right)^n.

    • gowers Says:

      That’s interesting. Since \log 2 < 1, it is (as it has to be) bigger than n!\approx (n/e)^n, but it is a very similar kind of function. I like the argument too.

    • Alec Edgington Says:

      There is a function \phi(k,m) such that \phi(k,m) = f(p^k m) for all primes p not dividing m. We have \phi(0,m) = f(m), and we now know a bit about the growth of this, but it would be useful to understand the growth of \phi(k,m) more generally. This would allow us, for example, to test whether some easily-defined sequences with low discrepancy can actually ‘break’ f as a candidate for a proof of EDP, or not.

      Note that \phi(k,m) depends only on the multiplicities in the prime factorization of m. Here are some simple cases: \phi(k,1) = 2^{k-1} for k \geq 1 (and \phi(0,1) = 1); \phi(k,p) = (k+2) 2^{k-1}; and in general when m is a product of distinct primes, \phi(k,m) is described by Sloane’s A098348. (Click on the keyword ‘tabl’ towards the bottom of that page to see the sequence formatted as a table: row r shows \phi(k,m) as a function of k when m is a product of r distinct primes).

    • gowers Says:

      Richard Stanley has now given a useful answer to my Mathoverflow question. It is very similar to Moses’s answer above.

  9. gowers Says:

    One question I would very much like to know the answer to is this. Suppose we choose an “f-random” integer between 1 and n. What is its prime factorization like? By an f-random integer I mean that we choose the number m with probability f(m)/\sum_{t\leq n}f(t). What I suspect is that such a number will, with fairly high probability, be divisible by large powers of all small primes.

    We now understand f at prime powers and we understand it pretty well at products of distinct primes. Perhaps the next case is products of two prime powers. I’ve made a start thinking about this, and in the Polymath spirit I’m going to report on my half-formed thoughts rather than trying to push them to a conclusion first.

    Let’s take the number 2^a3^b. If we consider the ratios between successive terms of the sequences we are counting, these all take the form 2^{a_i}3^{b_i} with a_i and b_i non-negative integers and at least one of a_i and b_i not zero. So we need to count the number of pairs of sequences (a_1,\dots,a_k) and (b_1,\dots,b_k) such that a_1+\dots+a_k=a, b_1+\dots+b_k=b, and at least one of a_i and b_i is non-zero.

    Given such a sequence, we can form a 23-sequence as follows. For each i in turn we write a_i 2s followed by b_i 3s. It is clear that every 23-sequence with a 2s and b 3s can be created in this way (e.g. we could make every single ratio be either 2 or 3), so the number of sequences is at least \binom{a+b}a. However, this creation is very far from unique. For example, the sequence 2233232 can be split up as 2, 233, 23, 2 (leading to the four ratios 2, 18, 6, 2) or as 22, 33, 2, 3, 2 (leading to the five ratios 4, 9, 2, 3, 2). So what additional information do we need to get from the 23-sequence to the ratio sequence? Obviously, we need to know where the prime factorization of one ratio ends and the prime factorization of the next ratio begins. To do this, we need to add an extra symbol such as |.

    Now it’s not quite as simple as just adding |s wherever we feel like it, because between two |s you are never allowed to go from a 3 to a 2. Actually, the words “between two |s” were redundant there. I think the rule is that we can take any sequence of 2s, 3s and |s, provided 3 is never followed by 2. To test that, let me pick a fairly random such sequence: 23|33|2|223|233|3|2. That corresponds to the sequence of ratios 6, 9, 2, 12, 18, 3, 2. And in the opposite direction I take the ratios and form the sequence in the obvious way.

    I’ve missed out one further condition, which is that you are never allowed two |s in a row. So the question becomes the following. How many 23|-sequences are there with a 2s and b 3s such that 3 is never followed by 2 and | is never followed by |, and such that the sequence is not allowed to begin or end with |? This sounds to me like a pretty standard problem in enumerative combinatorics, probably treatable with generating functions. I’ll have a think about it myself.

    • gowers Says:

      A small further observation, which lends some support to the idea that f should be concentrated on numbers with many small prime factors, but not so many that they are all identical.

      First, observe that to get a valid sequence of 2s, 3s and |s, one can take a sequence of 2s and 3s with a 2s and b 3s and insert |s into it. The only rules about the insertion process are that you should insert at most one | into each gap (and nothing before or after the sequence) and that every time the sequence goes from a 3 to a 2 you must insert a |. That means that the number of possible insertions is 2 raised to the power the number of places in the sequence where you do not go from a 3 to a 2.
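This insertion rule can be tested directly against the recursive definition of f: summing 2 to the power of the number of free gaps, over all arrangements of a 2s and b 3s, should give exactly f(2^a3^b). A brute-force sketch (mine):

```python
from itertools import permutations

def f_sieve(n_max):
    f = [0] * (n_max + 1)
    f[1] = 1
    for d in range(1, n_max + 1):
        for m in range(2 * d, n_max + 1, d):
            f[m] += f[d]
    return f

def count_by_bars(a, b):
    # sum over arrangements of a 2s and b 3s of 2^(number of gaps not going 3 -> 2):
    # forced gaps (3 followed by 2) must get a bar, every other gap optionally does
    total = 0
    for w in set(permutations('2' * a + '3' * b)):
        free = sum(1 for x, y in zip(w, w[1:]) if not (x == '3' and y == '2'))
        total += 2 ** free
    return total

f = f_sieve(2 ** 4 * 3 ** 4)
print(all(count_by_bars(a, b) == f[2 ** a * 3 ** b]
          for a in range(1, 5) for b in range(1, 5)))  # True
```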

      Let’s consider the number m=2^a3^a. Then there must be at least a-1 places in the sequence where you do not go from 3 to 2 (with equality for the sequence 32323232…) and usually rather more than this. So f(2^a3^a) is at least \binom{2a}a (for the number of 23-sequences) times 2^{a-1} (for the number of possible insertions per sequence), which is around 2^{3a} up to polynomial factors. That is, f(6^a) is at least 8^a or so. This is significantly bigger than m, from which we learn that to maximize f(m)/m (or, perhaps more interestingly, \log(f(m))/\log m) we definitely do not want to take m to be a power of 2.

      Needless to say, the estimates just made can be improved, but even now they are good enough to make this point.

    • gowers Says:

      A further quick thought. If we take m to be 2^a3^b5^c, what are the valid sequences that result? This time we make sequences out of 2s, 3s, 5s and |s. The rules are that 2 never follows 3 or 5 and 3 never follows 5, and otherwise they are the same as before. Obviously this generalizes.

  10. gowers Says:

    Going back to sequences of 2s, 3s and |s, here are some more calculations that give quite a good idea of how many there are.

    What I’m going to do is count something slightly different. I’ll count all length-n sequences of 1s, 2s and 3s (with 1 playing the role of |) such that 32 and 11 never appear. Let’s call this number f(n), and for i=1,2,3 let f_i(n) be the number of sequences of length n that end with i. Then we have the following recurrence relations: f_1(n)=f_2(n-1)+f_3(n-1), f_2(n)=f_1(n-1)+f_2(n-1), and f_3(n)=f_1(n-1)+f_2(n-1)+f_3(n-1). Writing x_n for the vector (f_1(n),f_2(n),f_3(n))^T, we find that x_1=(1,1,1)^T and x_n=Ax_{n-1}, where A is the matrix \left(\begin{matrix} 0&1&1\\ 1&1&0\\ 1&1&1\\  \end{matrix}\right).

    The largest eigenvalue of this matrix is, if my calculations are correct, a cubic irrational that lies between 2 and 3. So the growth rate of f(n)=f_1(n)+f_2(n)+f_3(n) is \alpha^n for some \alpha between 2 and 3.
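    The eigenvalue can be pinned down numerically: expanding \det(A-\lambda I) gives the characteristic polynomial -\lambda^3+2\lambda^2+\lambda-1, whose largest root is about 2.247. A short sanity check (pure Python, bisection plus the recurrence itself):

```python
def largest_eigenvalue():
    """Largest root of det(A - xI) = -x^3 + 2x^2 + x - 1 for
    A = [[0,1,1],[1,1,0],[1,1,1]], located in (2,3) by bisection."""
    p = lambda x: -x ** 3 + 2 * x ** 2 + x - 1
    lo, hi = 2.0, 3.0          # p(2) = 1 > 0, p(3) = -7 < 0
    for _ in range(80):
        mid = (lo + hi) / 2
        if p(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f_count(n):
    """Number of length-n sequences over {1,2,3} avoiding '32' and '11'."""
    f1, f2, f3 = 1, 1, 1       # length-1 sequences ending in 1, 2, 3
    for _ in range(n - 1):
        f1, f2, f3 = f2 + f3, f1 + f2, f1 + f2 + f3
    return f1 + f2 + f3

alpha = largest_eigenvalue()
assert 2 < alpha < 3
# consecutive ratios of f(n) converge to alpha very quickly
assert abs(f_count(31) / f_count(30) - alpha) < 1e-6
```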

    Now the n here is not the same as the combined number of 2s and 3s in the sequence. However, we can get a feel for this by calculating the eigenvector that corresponds to the largest eigenvalue. It will tell us the proportions of 1s, 2s and 3s in a typical sequence (since we can think of these proportions as being the distribution after n steps of a random walk, which will converge very rapidly to the stationary distribution). So if we want a+b to be about n then we can multiply our previous n by (1-\theta)^{-1}, where \theta is the proportion of 1s in a typical sequence. (Of course, this isn’t exact, but it’s close.)

    By working through the technicalities here, I think one can get pretty good bounds for f(2^a3^b) when a and b are in their typical proportions. And I think we can also say that they are in their typical proportions f-almost all the time.

    • gowers Says:

      Actually, I want to modify one thing I said there. What we care about is not the value of a+b but the value of a\log 2 + b\log 3, which needs to be less than \log n. However, the remark about the typical ratio of a and b still stands, I think, so to work out the typical values of a and b subject to the condition that 2^a3^b\leq n we can use the ratio of a to b and the inequality a\log 2 + b\log 3\leq \log n.

    • gowers Says:

      On second thoughts, I’m worried about what I just said. The eigenvector tells us about the ratios of the number of sequences ending in a 2 and the number ending in a 3. It does not tell us about the ratio of the number of 2s to the number of 3s in a typical sequence. Indeed, that should tend to 1, since the reverse of a sequence that works gives us a similar sequence but with 23 forbidden instead, so symmetry seems to preclude any bias.

      But that observation doesn’t sit very well with the inequality a\log 2+b\log 3\leq \log n. We ought to be punished to some extent for choosing 3s, because that will cause the length of the sequence to go down. I’ll discuss this a bit more in my next comment.

  11. gowers Says:

    In this comment I want to suggest a (probably standard) method for estimating the number of sequences of the kind I’ve been considering above, but somehow forcing the ratio of the number of 2s to the number of 3s to be approximately some prescribed value. As I commented here, if you take the uniform distribution on sequences of length n that satisfy the conditions, then you should get about the same number of 2s as 3s. So how can one change this?

    One way is to count sequences with weights. Suppose we weight a sequence by u^av^b, where a is the number of 2s and b is the number of 3s. Now let f(n) be the total weight of sequences of length n, and split f up into f_1+f_2+f_3 as before. Our new matrix is \left(\begin{matrix} 0&1&1\\ u&u&0 \\ v&v&v\\ \end{matrix}\right).

    I can’t see in my head what to do next. Either I’ll have to go away and think about this, or someone reading this will know, or be able to work out, how the argument should go.
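    In the meantime the weighted transfer matrix is easy to play with numerically. Here is a sketch that computes the total weight via the matrix recurrence and checks it against brute-force enumeration for small n (u and v are the weights attached to each 2 and each 3):

```python
from itertools import product

def weighted_total(n, u, v):
    """Total weight u^(#2s) * v^(#3s) over length-n sequences on {1,2,3}
    avoiding '32' and '11', via the transfer matrix [[0,1,1],[u,u,0],[v,v,v]]."""
    f1, f2, f3 = 1, u, v       # length-1 sequences ending in 1, 2, 3
    for _ in range(n - 1):
        f1, f2, f3 = f2 + f3, u * (f1 + f2), v * (f1 + f2 + f3)
    return f1 + f2 + f3

def brute_force(n, u, v):
    """Direct enumeration of the same weighted count."""
    total = 0
    for s in product('123', repeat=n):
        w = ''.join(s)
        if '32' in w or '11' in w:
            continue
        total += u ** w.count('2') * v ** w.count('3')
    return total

for n in range(1, 8):
    assert weighted_total(n, 2, 3) == brute_force(n, 2, 3)
```

    Taking u = v = 1 recovers the unweighted count; tilting u/v away from 1 is one concrete way to force the prescribed ratio of 2s to 3s.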

  12. Alec Edgington Says:

    From the inductive definition of f, it follows that the values of f_{ab} = f(p^a q^b) (for distinct primes p, q) are given by the formulae f_{a0} = f_{0a} = 2^{a-1} (a \geq 1), f_{00} = 1, and for a,b \geq 1, f_{ab} = \sum_{i \leq a, j \leq b, (i,j) \neq (a,b)} f_{ij}. This is Sloane’s A059576, called there the ‘summatory Pascal triangle’. I don’t quite understand the principal definition stated on that page, but there are a few formulae that may be helpful in calculating the asymptotics. The generating function is given as (1-z) (1-w) / (1-2w-2z+2zw).
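    The stated recurrence is easy to check against the inductive definition of f directly; a small Python sketch building the table f_{ab} and comparing it with a divisor-sum computation of f(2^a 3^b):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(m):
    """f(m) = number of ordered factorizations of m into parts > 1."""
    if m == 1:
        return 1
    return sum(f(d) for d in range(1, m) if m % d == 0)

def table(A, B):
    """f_{ab} from the recurrence f_{ab} = sum of f_{ij} over i <= a,
    j <= b with (i,j) != (a,b), and f_{00} = 1."""
    t = [[0] * (B + 1) for _ in range(A + 1)]
    t[0][0] = 1
    for a in range(A + 1):
        for b in range(B + 1):
            if (a, b) != (0, 0):
                t[a][b] = sum(t[i][j] for i in range(a + 1)
                              for j in range(b + 1) if (i, j) != (a, b))
    return t

t = table(3, 3)
assert all(t[a][b] == f(2 ** a * 3 ** b) for a in range(4) for b in range(4))
assert all(t[a][0] == 2 ** (a - 1) for a in range(1, 4))   # f_{a0} = 2^{a-1}
```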

    • Alec Edgington Says:

      Here is the calculation for the generating function of f_{a_1 \ldots a_r} = f(p_1^{a_1} \cdots p_r^{a_r}) (p_i distinct primes). Writing \mathbf{a} = (a_1, \ldots, a_r) and \mathbf{a} \leq \mathbf{b} if and only if a_i \leq b_i for all i, for \mathbf{a} \neq \mathbf{0} we have f_\mathbf{a} = \sum_{\mathbf{b} < \mathbf{a}} f_\mathbf{b}, so that 2f_\mathbf{a} = \sum_{\mathbf{b} \leq \mathbf{a}} f_\mathbf{b}, and 2f_\mathbf{0} = 2. In terms of the generating function F(z_1, \ldots, z_r) = \sum_\mathbf{a} f_\mathbf{a} z_1^{a_1} \cdots z_r^{a_r}, we can note that the right-hand side is a convolution with \prod (1 - z_i)^{-1}, and so write 2F(z_1, \ldots, z_r) = 1 + (1-z_1)^{-1} \cdots (1-z_r)^{-1} F(z_1, \ldots, z_r), which implies that F(z_1, \ldots, z_r) = (2 - (1-z_1)^{-1} \cdots (1-z_r)^{-1})^{-1}.

    • gowers Says:

      That’s nice. I feel as though if I were more used to generating functions (or perhaps just if I sat down and thought for a while) then I’d be able to get from here to a good answer to the question I mentioned earlier: what does the prime factorization of a typical f-random number look like? More precise questions are as follows.

      1. If you choose a random m from \{1,2,\dots,n\} with probability f(m)/\sum_{r\leq n}f(r), then what is the expectation of the sequence (a_1,a_2,\dots) such that m=2^{a_1}3^{a_2}\dots?

      2. How concentrated are the a_i about their means?

      3. Roughly how quickly do they decay with i?

      A decent answer to 1 should yield a decent answer to 3. My hunch, and hope, is that the answers to 2 and 3 are that they are pretty concentrated and that they decay pretty fast. The reason I’d like that is that it would make much more plausible my unproved assertion that if you take a typical m and a typical sequence d_1,\dots,d_k of increasing divisors of m, then a positive fraction of the d_i will be divisible by 3.

      It occurs to me that this question could be investigated experimentally. I don’t know how large an n one would need to take in order to get results that were genuinely informative, and the numbers would be large, so high-precision arithmetic (or some judicious approximations) might be needed. But just to see a large bunch of numbers generated according to the above probability distribution could give us a very good feel for the function f I think.

    • Moses Charikar Says:

      Minor observation: using Alec’s generating function (assuming the generalization to infinitely many variables z_1,z_2,\ldots), and substituting z_i = (p_i)^{-s} gives an alternate proof of the fact that \sum f(n) n^{-s} = (2 - \zeta(s))^{-1}. (Alec had mentioned this in a comment above).
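      As a numerical aside, the exponent \alpha with \zeta(\alpha)=2, which governs the growth of \sum_{m\leq n}f(m), is about 1.7286. A quick sketch, approximating \zeta by a truncated sum plus the standard leading tail correction N^{1-s}/(s-1):

```python
def zeta(s, N=5000):
    """Truncated zeta sum plus the leading Euler-Maclaurin tail term
    N^(1-s)/(s-1); accurate for s bounded away from 1."""
    return sum(n ** -s for n in range(1, N + 1)) + N ** (1 - s) / (s - 1)

def solve_zeta_equals_2():
    """Bisection for zeta(alpha) = 2 on (1.5, 2), using
    zeta(1.5) > 2 > zeta(2) = pi^2/6."""
    lo, hi = 1.5, 2.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if zeta(mid) > 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = solve_zeta_equals_2()
assert 1.72 < alpha < 1.74
```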

    • Alec Edgington Says:

      That is an interesting observation, and it generalizes a bit: if x_n is a completely-multiplicative sequence, and \phi(s) = \sum x_n n^{-s}, then \sum_n f(n) x_n n^{-s} = 1/(2 - \phi(s)). Of course if x_n is completely multiplicative then so is x_n^2, so if we formally set s=0 then it starts to look suggestive in relation to the original problem.

      Regarding Tim’s question about the f-random frequencies of exponents of different primes: the distribution is easy enough to compute exactly, and I have done so up to n = 30\,000. These graphs show \sum_{m \leq n, a_p(m) = r} f(m) plotted against r, for p = 2, 3, 5, where a_p(m) is the exponent of p in the factorization of m.
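      The exact computation is cheap to reproduce at a smaller cutoff; here is a sketch (my own sieve for f, not the code behind the graphs) that tabulates f by distributing each f(d) to its multiples and then takes f-weighted mean exponents:

```python
def f_table(N):
    """Sieve for f: f(1) = 1 and f(m) = sum of f(d) over proper divisors
    d of m, filled in by distributing f(d) to multiples in increasing d."""
    f = [0] * (N + 1)
    f[1] = 1
    for d in range(1, N // 2 + 1):
        fd = f[d]
        for m in range(2 * d, N + 1, d):
            f[m] += fd
    return f

def mean_exponent(f, p, N):
    """f-weighted mean of the exponent of p over m <= N."""
    total = weighted = 0
    for m in range(1, N + 1):
        r, q = 0, m
        while q % p == 0:
            q //= p
            r += 1
        total += f[m]
        weighted += r * f[m]
    return weighted / total

N = 30000
ftab = f_table(N)
mu = {p: mean_exponent(ftab, p, N) for p in (2, 3, 5, 7)}
# smaller primes should carry much larger typical exponents
assert mu[2] > mu[3] > mu[5] > mu[7] > 0
```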

    • gowers Says:

      Those are very nice. I realize that I was getting muddled when I said that the numbers would be very large — I was confusing the dependence on m with the dependence on the number of prime factors of m.

      What I find interesting so far is that the graphs for 2 and 3 seem to be exhibiting the kind of behaviour one might hope for — perhaps even approaching a normal distribution, though this is scant evidence for such a strong conjecture.

      I’d still be interested to know the behaviour for much larger n. One way that one might try to analyse it is by devising a suitable random walk through the set of all sequences (d_1,d_2,\dots,d_k), or equivalently through the set of all sequences (r_1,r_2,\dots,r_k) such that each r_i is a positive integer and r_1r_2\dots r_k\leq n. If we could devise it so that the stationary distribution was uniform and the walk was rapidly mixing, then we would be able to sample randomly with very large values of n and thereby get a very good feel for the distribution.

      The sort of walk one might try is this. Suppose you are at the sequence (r_1,r_2,\dots,r_k). Choose a random prime p less than or equal to n, a random number \epsilon from the set \{0,1\} and a random positive integer t less than or equal to \log_2n.

      If \epsilon=0, then

      ……if t\leq k and p|r_t, then divide r_t by p,

      ……else do nothing.

      If \epsilon=1, then

      ……if t\leq k then

      …………if r_1r_2\dots r_kp\leq n, then multiply r_t by p
      …………else do nothing

      ……else if t=k+1 then

      …………if r_1r_2\dots r_kp\leq n, then move to the sequence (r_1,\dots,r_k,p)

      …………else do nothing

      ……else do nothing.

      One not very nice aspect of this algorithm is that the probability of dividing by a prime is rather small at any given stage, so the time taken to reach the stationary distribution is more n-like than \log n-like. I don’t know if there is an ingenious way round this. But perhaps even an n-like algorithm could be run up to values of n of a few million or so.

      Unless I’ve made a silly mistake, the stationary distribution of that algorithm is uniform over all valid sequences. My reasoning is that we can think of it as a multigraph with loops that is regular of degree 2\pi(n)\lfloor\log_2n\rfloor, where \pi(n) is the number of primes up to n.

      I forgot to say that I have no proof about its speed of convergence. However, if all we are looking for is an experiment to try, then we don’t really care whether we have a rigorous proof that it converges rapidly, as long as it seems plausible that it should. (But the latter is important — if there were some bottleneck, then it would lead to genuinely misleading results.)
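      Purely as an illustration, here is a sketch implementation of the walk (my own formalization, not verified to sample uniformly: note for instance that dividing an entry equal to p by p leaves a 1 in the sequence, which this sketch simply tolerates). The one invariant it clearly preserves, and which the test checks, is that the product of the sequence never exceeds n:

```python
import random

def sieve_primes(n):
    """Primes up to n by a simple sieve."""
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return [i for i, b in enumerate(flags) if b]

def step(seq, n, primes, rng):
    """One step of the proposed walk on sequences (r_1, ..., r_k) with
    product at most n: pick a prime p, a bit eps and a position t, then
    divide, multiply, or append p as in the pseudocode above."""
    p = rng.choice(primes)
    eps = rng.randrange(2)
    t = rng.randrange(1, n.bit_length() + 1)   # roughly 1..log2(n)
    k = len(seq)
    prod = 1
    for r in seq:
        prod *= r
    if eps == 0:
        if t <= k and seq[t - 1] % p == 0:
            seq[t - 1] //= p
    else:
        if t <= k:
            if prod * p <= n:
                seq[t - 1] *= p
        elif t == k + 1 and prod * p <= n:
            seq.append(p)

rng = random.Random(1)
n = 10 ** 4
primes = sieve_primes(n)
seq = [2]
for _ in range(20000):
    step(seq, n, primes, rng)
prod = 1
for r in seq:
    prod *= r
assert prod <= n        # the invariant r_1 ... r_k <= n is maintained
```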

    • gowers Says:

      A small thought — is it possible that the distributions in Alec’s graphs (see previous message but one) are Poisson?

  13. Alec Edgington Says:

    I have a conjecture regarding the behaviour of f(n^k) as k \to \infty, which I’ll try to explain.

    Let \omega(n) be the number of distinct prime factors of n, and let \Omega(n) be the number of prime factors of n counted with multiplicity. (For example, \omega(18) =2 and \Omega(18) = 3.)

    The conjecture is that for any n, \lim_{k \to \infty} \frac1k \log f(n^k) = -  \Omega(n) \log (1 - 2^{-1/\omega(n)}).

    In terms of the notation used above for the generating function, this is saying that \log f_{ka_1, \ldots, ka_r} \sim - k \log (1 - 2^{-1/r}) (a_1 + \ldots + a_r), when all of the a_i are positive. This is true for r = 1, and seems consistent with the calculations I’ve done for r = 2. (For example, the conjecture implies that \log f_{k,k} / (2k) \to -\log(1 - 2^{-1/2}) \approx 1.228; we have \log f_{99,99} / 198 \approx 1.210.)

    The form is suggested by analogy of the formula 2 f_\mathbf{a} = \sum_{\mathbf{b} \leq \mathbf{a}} f_\mathbf{b} with the integral formula 2 f(x_1, \ldots, x_r) = \int_0^{x_1} \cdots \int_0^{x_r} f, which has solutions of the form f(x_1, \ldots, x_r) = C e^{\gamma_1 x_1 + \ldots + \gamma_r x_r}; by symmetry all the \gamma_i must be equal.

    The actual factor comes from the fact that the generating function F(z, \ldots, z) = (2 - (1-z)^{-r})^{-1} has a pole at z = 1 - 2^{-1/r}, so that its coefficients must grow as (1-2^{-1/r})^{-k}.

    These are rather vague justifications, but perhaps they can be fleshed out, or perhaps an altogether different proof is possible.
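    The conjecture can be probed numerically. Since 2f_{\mathbf a}=\sum_{\mathbf b\leq\mathbf a}f_{\mathbf b} away from the origin, the full rectangular prefix sum P satisfies P_{ab}=2f_{ab} for (a,b)\neq(0,0), which gives the O(K^2) recurrence f_{ab}=P_{a-1,b}+P_{a,b-1}-P_{a-1,b-1} (this prefix-sum rearrangement is mine, so the small values are checked against the known table):

```python
import math

def f_diag(K):
    """f_{ab} for a, b <= K via prefix sums: P is the full rectangular sum
    of f, so P_{ab} = 2 f_{ab} for (a,b) != (0,0) and P_{00} = 1, giving
    f_{ab} = P_{a-1,b} + P_{a,b-1} - P_{a-1,b-1}."""
    f = [[0] * (K + 1) for _ in range(K + 1)]
    P = [[0] * (K + 1) for _ in range(K + 1)]
    for a in range(K + 1):
        for b in range(K + 1):
            if a == b == 0:
                f[0][0], P[0][0] = 1, 1
                continue
            Pa = P[a - 1][b] if a else 0
            Pb = P[a][b - 1] if b else 0
            Pab = P[a - 1][b - 1] if a and b else 0
            f[a][b] = Pa + Pb - Pab
            P[a][b] = 2 * f[a][b]
    return f

t = f_diag(99)
assert t[1][1] == 3 and t[2][1] == 8 and t[2][2] == 26   # known small values
val = math.log(t[99][99]) / 198
limit = -math.log(1 - 2 ** -0.5)     # conjectured limit, about 1.2279
assert 1.15 < val < limit            # matches the reported 1.210 at k = 99
```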

  14. Moses Charikar Says:

    I wanted to go back to some of the calculations we were doing earlier because I think they might yield some answers about the prime factorization of f-random numbers. In particular, I think we can conclude that the power of a small prime that divides an f-random number is \Omega(\log n).

    I am going to repeat a bunch of things from previous comments of Tim and Alec. Recall that we had estimated F(n) = \sum_{m \leq n} f(m) \sim n^\alpha where \alpha is such that \zeta(\alpha) = 2. Consider F(p,t,n) = \sum_{a_p(m)=t, m \leq n} f(m) where a_p(m) is the largest a such that p^a|m. Note that F(p,0,n) = \sum_{p \nmid m \leq n} f(m) \sim n^{\alpha_p} where \alpha_p is such that (1 - p^{-\alpha_p}) \zeta(\alpha_p) = 2, \alpha_p \leq \alpha. If we determine that F(p,t,n) = o(n^\alpha/\log n) for all t < t_0 then we could conclude that an f-random number is divisible by p^{t_0}. (Since the total contribution to F(n) from numbers m \leq n such that p^{t_0} \nmid m is negligible.) So our goal will be to estimate F(p,t,n).

    Consider an m such that a(p,m)=t. Then p^t|m and p \nmid m/p^t. We can build divisor sequences for m from divisor sequences for m/p^t. (For our purposes, a divisor sequence of m is a sequence d_1,d_2,\ldots d_k such that d_i > 1 and \prod d_i = m.) f(m) is equal to the number of divisor sequences of m, and say f_k(m) is the number of such sequences of length k; also F_k(p,0,n) = \sum_{p \nmid m \leq n} f_k(m).

    A divisor sequence of m (with a(p,m)=t) can be mapped uniquely to a divisor sequence of m/p^t by simply dividing each term of the sequence by the highest power of p that divides it. This is a many-one mapping. Let’s estimate the number of pre-images that a divisor sequence of m/p^t has in this mapping. This number is a function of the length k of the sequence and t. Roughly speaking, we need to distribute t copies of p in the given divisor sequence – each of these copies could either fall between elements of the given divisor sequence or they could multiply existing elements of the sequence. I claim the number of pre-images is roughly {k+t \choose t}2^t – the number of ways of interleaving the t copies of p in the existing sequence of length k is {k+t \choose t} and then each copy has the option of merging (multiplying) with the element immediately to the right of it. That gives the factor 2^t. If one of the copies of p is the last element of the sequence, then it does not have this option of multiplying to the right, hence this is a slight overestimate.

    So f(m) \leq \sum_k {k+t \choose t} 2^t f_k(m/p^t). Hence F(p,t,n) \leq \sum_k {k+t \choose t} 2^t F_k(p,0,n/p^t) (ignoring floors). Note that k, the length of the divisor sequence, is at most \log_2 n, hence F(p,t,n) \leq {\log_2 n +t \choose t}2^t F(p,0,n/p^t) \sim {\log_2 n +t \choose t}2^t (\frac{n}{p^t})^{\alpha_p}. What we really wanted to do was find a value t_0, such that F(p,t,n) = o(n^\alpha/\log n) for all t < t_0. From our estimate, it looks like t_0 = c(\log n)/p should suffice (for suitable constant c).

    If all this is correct we conclude that an f-random number is divisible by p^{c(\log n)/p}.

    • Moses Charikar Says:

      Some parts of my comment above came out garbled. In reading it, replace the last few lines of the second para (beginning with “If we determine that”) with the following text:

      If we determine that F(p,t,n) = o(n^\alpha/\log n) for all t  1 and \prod d_i = m). f(m) is equal to the number of divisor sequences of m, and say f_k(m) is the number of such sequences of length k; also F_k(p,0,n) = \sum_{p \nmid m \leq n} f_k(m).

      A divisor sequence of …

    • Moses Charikar Says:

      uh oh … the intended fix is garbled in exactly the same way. I give up.

    • Alec Edgington Says:

      Did the garbled bit contain less-than or greater-than symbols? WordPress has trouble parsing those inside Latex expressions, so you have to render them as ‘& l t ;’ and ‘& g t ;’ (closing up the spaces) — which, oddly enough, WordPress can parse inside Latex.

    • gowers Says:

      Unfortunately, WordPress garbles not just the output but also the input, sometimes chopping out quite a bit of text, so I’m unable to edit it. But I think Alec’s diagnosis and cure are both right.

      One thing I’ve noticed is that sometimes one can get away with the usual symbols. I have a feeling that the problems really start to arise if you have a less-than symbol followed a bit later by a greater-than symbol. I think WordPress gives much higher precedence to that pairing than it does to dollars, and therefore gets itself into difficulties.

    • gowers Says:

      Something that slightly troubles me about this estimate is that it looks as though a typical f-random number is greater than n. The log of p^{1/p} is \log p/p. The sum of that over p\leq n is roughly the same as the sum of 1/m over m\leq n, since the \log p exactly cancels out the sparseness of the primes. So it seems that the sum tends to infinity. If we stop the sum at some k at a point where it is distinctly bigger than c^{-1}, then the product of p^{c\log n/p} over all p\leq k has logarithm greater than (c\log n)c^{-1}=\log n, so it is greater than n. For n large enough (so that the approximations are valid for all primes up to the constant k) this appears to be a contradiction.

      If I’m not making some silly mistake, I wonder if there is some tweak to your argument that would avoid the problem. (I can’t really tell until we get the missing text …)

    • gowers Says:

      Assuming that the answer to that last question is yes (and the end of Moses’s argument does look at a first glance as though it allows a certain flexibility), then that and Alec’s graphs make me wonder whether the largest power of p that divides an f-random m is asymptotically Poisson with mean c(p)\log n, for some suitable constant c(p) that depends on p only. If so, then it’s tempting to make it genuinely Poisson and see what happens. Normally this would make certain calculations much nicer. I don’t have any indication of how that might apply in our case, so this is a thought to go on the back burner for now.

    • Moses Charikar Says:

      In answer to Tim’s question about the contradiction from the estimate: yes, it does look like I miscalculated in the very end (and there could be other errors too). Let me try to do it a little more carefully. The final lower bound for the power of p was based on this calculation:

      \displaystyle {\log_2 n +t \choose t}2^t (\frac{n}{p^t})^{\alpha_p} < n^\alpha

      So we need:
      \displaystyle {\log_2 n +t \choose t}(\frac{2}{p^{\alpha_p}})^t < n^{\alpha - \alpha_p}

      I used the approximation {\log_2 n +t \choose t} \approx (\frac{e\log n}{t})^t. Also, although I did not calculate this before, it looks like the approximation \alpha - \alpha_p = \Theta(1/p^{\alpha_p}) may be reasonable. Substituting, the condition we need is:

      \displaystyle \left(\frac{2e\log n}{p^{\alpha_p} t}\right)^t < n^{\Theta(1/p^{\alpha_p})}

      The condition should be satisfied for all t \leq c(\log n)/p^{\alpha_p}, so the lower bound on the power of p we get from this argument is
      c(\log n)/p^{\alpha_p}.

    • Alec Edgington Says:

      I’ve taken the computation a bit further (to 150\,000, with p = 2, 3, 5, 7), and also fitted Poisson distributions with the same mean to the data. The plots are here.

      The fit is remarkably good for p = 3, 5, 7; for some reason it is a bit less good for p = 2.

      The mean values \mu_p of the exponents were about 5.840, 2.316, 0.836 and 0.442 for 2, 3, 5 and 7 respectively. To test Moses’s theory, we can calculate p^{\alpha_p} \mu_p / \log n, where \alpha_p satisfies (1 - p^{-\alpha_p}) \zeta(\alpha_p) = 2. The values of this ratio for 2, 3, 5 and 7 are about 1.27, 1.06, 1.01 and 0.99 respectively. Despite the seeming anomaly with p=2, I think these are very encouraging.
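      The exponents \alpha_p are easy to compute by bisection, since removing the Euler factor at p from the product formula makes the left-hand side strictly decreasing in s. A sketch (truncated zeta with the standard tail correction; the test only asserts the qualitative picture, \alpha_2 < \alpha_3 < \alpha_5 < \alpha_7 < \alpha):

```python
def zeta(s, N=4000):
    """Truncated zeta with the leading tail correction N^(1-s)/(s-1)."""
    return sum(n ** -s for n in range(1, N + 1)) + N ** (1 - s) / (s - 1)

def alpha_p(p):
    """Bisection for (1 - p^(-s)) * zeta(s) = 2 on (1.01, 2); the
    left-hand side decreases from very large to below 2 on that range."""
    g = lambda s: (1 - p ** -s) * zeta(s)
    lo, hi = 1.01, 2.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if g(mid) > 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def alpha():
    """Bisection for zeta(alpha) = 2 on (1.5, 2)."""
    lo, hi = 1.5, 2.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if zeta(mid) > 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a = alpha()
aps = {p: alpha_p(p) for p in (2, 3, 5, 7)}
# removing the Euler factor lowers the exponent, less so for larger p
assert aps[2] < aps[3] < aps[5] < aps[7] < a
```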

    • gowers Says:

      Wow, thanks for doing that. Odd about 2, and almost odder about the rest fitting so well. I suppose if one were greedy then one could check that the distributions were independent …

      There do seem to be some ways in which 2 is anomalous for this problem, such as the fact that we can’t use characters mod 2 to create bounded-discrepancy examples. I wonder if in some way that underlies what’s going on. (Having written that, I realize that I was accidentally assuming that there is a deep connection between f and HAP-discrepancy, which we haven’t yet demonstrated.)

    • Mark Bennet Says:

      The prime 2 is a problem in a very simple way. There is a trivial sequence (-1 on odd, +1 on even) which works for every HAP with odd difference. No-one has yet given an example (sorry if I’ve missed it), even for a set of at least two primes including 2, of a sequence which is bounded on every HAP where the difference is only divisible by those primes.

      Maybe there are some clues here as to why 2 blocks some of the otherwise obvious solutions.

    • gowers Says:

      You’re saying that we don’t know, for example, whether every \pm 1 sequence has unbounded discrepancy on a HAP with common difference of the form 2^a19^b? If we don’t have a counterexample to that, then we should probably look quite hard for one.

    • gowers Says:

      Actually, I now think we did have an example for common differences of the form 2^a3^b, but I can’t find where it was in the discussion. The idea was to construct a sequence with the property that f(2n)=f(3n)=-f(n) for every n and to keep the partial sums bounded. To do that, one divides it up into blocks of 6. When you get to a new block of 6 of the form 6n-5,\dots,6n you’ve decided on the value of f at 6n-4, 6n-3, 6n-2 and 6n, so you’ve only got two more choices. I think you can prove if you do this inductive construction properly that the choices you’ve made so far are not all the same, so you can keep the sum over the block down to zero. Or perhaps you do something to make sure you’ll always be OK. In fact, I think that’s it — you also insist that f(3n-2), f(3n-1) and f(3n) should never all be the same. That ensures that the block will be OK, and you can make sure that this condition continues to be satisfied.
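      The construction can be carried out by machine. Below is a sketch (my own greedy formalization of the idea, so the details are not guaranteed to match the original discussion): values are stored only at positions coprime to 6 and propagated by x(2m)=x(3m)=-x(m); in each block of 6 the two free values are chosen to make the block sum zero while keeping every triple (3m-2, 3m-1, 3m) non-constant. Since x(2^a3^bj)=(-1)^{a+b}x(j), bounded partial sums of x then give bounded sums along every HAP whose difference has the form 2^a3^b:

```python
def build(nblocks):
    """Greedy version of the block construction: returns val with
    val(2m) = val(3m) = -val(m), every block of 6 summing to zero, and
    no triple (3m-2, 3m-1, 3m) constant."""
    x = {1: 1}                       # free values at positions coprime to 6

    def val(q):
        # x(2^a 3^b m) = (-1)^(a+b) x(m) with m coprime to 6
        s = 1
        while q % 2 == 0:
            q //= 2
            s = -s
        while q % 3 == 0:
            q //= 3
            s = -s
        return s * x[q]

    for n in range(1, nblocks + 1):
        p1, p2 = 6 * n - 5, 6 * n - 1        # the two free positions
        for f1 in ([1] if n == 1 else [1, -1]):
            if n > 1:
                x[p1] = f1
            det = val(p1 + 1) + val(p1 + 2) + val(p2 - 1) + val(p2 + 1)
            f2 = -(f1 + det)                 # forces the block sum to zero
            if f2 not in (1, -1):
                continue
            # no triple may have three equal entries
            if val(p1 + 1) == val(p1 + 2) == f1:
                continue
            if val(p2 - 1) == val(p2 + 1) == f2:
                continue
            x[p2] = f2
            break
    return val

val = build(500)
N = 3000
s, worst = 0, 0
for m in range(1, N + 1):
    s += val(m)
    worst = max(worst, abs(s))
assert worst <= 5
# sums along HAPs with difference 2^a 3^b are bounded by the same constant
for d in (2, 3, 4, 6, 9, 12):
    s = 0
    for j in range(1, N // d + 1):
        s += val(j * d)
        assert abs(s) <= 5
```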

  15. Moses Charikar Says:

    Yes indeed the WordPress problem seems to be with less-than and greater-than symbols. Here is the correction again. Replace the last few lines of the second para (beginning with “If we determine that”) with the following text:

    If we determine that F(p,t,n) = o(n^\alpha/\log n) for all t < t_0 then we could conclude that an f-random number is divisible by p^{t_0}. (Since the total contribution to F(n) from numbers m \leq n such that p^{t_0} \nmid m is negligible). So our goal will be to estimate F(p,t,n).

    Consider an m such that a(p,m)=t. Then p^t|m and p \nmid m/p^t. We can build divisor sequences for m from divisor sequences for m/p^t. (For our purposes, a divisor sequence of m is a sequence d_1,d_2,\ldots d_k such that d_i > 1 and \prod d_i = m). f(m) is equal to the number of divisor sequences of m, and say f_k(m) is the number of such sequences of length k; also F_k(p,0,n) = \sum_{p \nmid m \leq n} f_k(m).

    A divisor sequence of …

  16. gowers Says:

    I want to add in a thought that’s somewhat orthogonal to the present discussion (no pun intended — why there might even be considered to be a pun will become clear).

    Once we’ve chosen the weight sequence (b_i), our goal is to bound the discrepancy of every sequence (x_i) below in terms of (\sum b_ix_i^2)^{1/2}. So we are living in a Hilbert space with inner product \langle x,y\rangle =\sum_ib_ix_iy_i. One sign that we were on the right track with the function f might be if we managed to find a nice set of functions that were f-orthogonal. And by “nice” I think I almost certainly mean “built out of HAPs in a simple way”.

    A candidate for such a set of functions is functions that alternate 1 and -1 along HAPs with prime common difference. (These can easily be built out of two HAPs, one with common difference p and the other with common difference 2p.) And we can also throw in functions with common difference a power of 2. (Note that we still don’t have an example that shows that we cannot get large discrepancy along a HAP with a difference that is either prime or a power of 2. This is vaguely encouraging.)

    Now obviously there aren’t enough of these functions to build every single sequence. But perhaps we can use these functions to build the “significant part” of every sequence (that is, the part with high f-weight) and extend them to an orthonormal basis (give or take the technicality that the functions may not be exactly orthogonal, but we could choose n to be divisible by the first few primes and a reasonably high power of 2) in some way that doesn’t matter too much since its only function is to deal with sequences of such low norm that they obviously have discrepancy bigger than that norm.

    This is all a bit vague for now, but I think there are some promising aspects to it.

    • gowers Says:

      I’ve just realized that my candidate for a nice system of orthogonal functions fails quite badly. For example, consider the sequences

      0 -1 0 1 0 -1 0 1 0 -1 0 1 …


      0 0 -1 0 0 1 0 0 -1 0 0 1 0 0 -1 0 0 1 …

      The pointwise product of these two is

      0 0 0 0 0 -1 0 0 0 0 0 1 0 0 0 0 0 -1 0 0 0 0 0 1 …

      The f-integral of this sequence is large, because the f-weight of numbers that are 6 mod 12 is much smaller than the f-weight of numbers that are 0 mod 12.

      So how do we create orthogonal sequences? Indeed, how do we create a natural example of a sequence with f-integral that is zero, or at least small? [Added after writing the rest of the post: I recommend skimming rather quickly through this, as I’m not sure it really gets anywhere but it shows a kind of idea that I think could be useful to us if we could get it to work better.]

      One thought that might be worth pursuing is to copy the inductive process that led to the function f, but starting with a different “seed sequence”. Let me try to explain what I mean by this.

      To create f, we start with the constant sequence 111111111…, and more generally, once we have defined f_k, we define f_{k+1}(m) to be \sum_{d|m,d<m}f_k(d). What would happen if we started instead with the sequence -1 1 -1 1 -1 1 \dots?

      Actually, I don’t think that’s going to work, because the inductive procedure is going to tend to lead everything towards a single fixed point. (That doesn’t seem to be quite true, but I think it’s almost true, meaning that the relationship between f(1) and the rest of f can change, but that’s about it.)

      Maybe the thing to do is change the inductive definition to something like g_{k+1}(m)=e^{i\theta}\sum_{d|m,d<m}g_k(d). I think that might have a chance of giving something interesting. I haven’t checked, but the kind of thing I’d be hoping for would be that the non-inductive definition would be a sum over all sequences, and sequences of length k would now have a weight of e^{i\theta k}.

      Except that that’s not very good, because for an f-typical number m one might expect a random sequence of increasing divisors of m to have a length that was fairly uniformly distributed modulo any small integer.

      One final suggestion. Perhaps the way to build functions that are approximately orthogonal in the f-weighted \ell_2 norm is to take functions of the form m^{it} for some t. To be precise, we take the function b(m)=\sqrt{f(m)} (which has norm (\sum_mf(m)^2)^{1/2}) and the candidate for a collection of nicely orthogonal functions is b_t(m)=b(m)m^{it}. The f-inner product of b_t and b_u is \sum_mf(m)m^{i(t-u)}.

      All this is extremely speculative, and therefore 99% likely to be wrong, though I hope that if wrong it might lead to something less wrong. But let me try to justify why one would want to take these functions.

      First of all, if you have an orthonormal basis (u_i) in the usual sense, then \sum_i u_i\otimes u_i equals the identity. The reason we can’t just use this straight off is that we can’t build an orthonormal basis efficiently (that is, with small coefficients) out of HAPs. Now the function f is designed in a HAP-ish sort of way, so there is some hope that it, or its square root, can be built efficiently. And functions like m^{it}=e^{it\log m} can be approximated quite well by functions that are constant on long intervals, since \log m grows slowly. So the functions b_t above look potentially buildable out of HAPs.

      Now the functions m^{it} and m^{iu} are far from orthogonal in the usual sense, again because \log m grows so slowly (so the terms from n^{1-\epsilon} to n will be roughly constant). But it’s not completely obvious to me that they are not approximately f-orthogonal (though in the opposite direction it’s even less obvious, I have to admit). We know that almost all powers of 2 that are less than n are less than n/2, and similarly for more general kinds of numbers such as numbers of the form 2^a3^b. Could it be that the f-measure is actually (and rather counterintuitively) concentrated on fairly small numbers? If so, then perhaps the f-measure “orthogonalizes” the functions m^{it} and m^{iu} (provided of course that t and u are sufficiently far apart).

      I am prepared for this suggestion to be quickly demolished — I don’t have good evidence for the assertion that f-measure is concentrated on small numbers, and a priori the assertion itself is not very plausible. On balance, I don’t think I believe it.

    • gowers Says:

      On second thoughts, I think I should have taken the functions m^{it} and not bothered to multiply them by \sqrt{f(m)}. Some of what I said above wasn’t quite right, even before it started to look unhelpful anyway.

      But let me say slightly more about the thoughts that lay behind the mistake. Suppose we had an f-orthonormal basis u_1,\dots,u_n. The f-orthonormality would tell us that \sum_m f(m)u_i(m)\overline{u_j(m)}=\delta_{ij}, which would tell us that the functions v_i(m)=\sqrt{f(m)}u_i(m) formed an orthonormal basis in the usual sense. From this, if my calculations are correct, it follows that if w_i(m)=f(m)u_i(m), then \sum_iw_i\otimes w_i gives us the diagonal matrix with f down the diagonal.

  17. gowers Says:

    I think I now have a much clearer and more concise version of what I was struggling to say in my previous block of comments.

    Suppose we have an orthonormal basis u_1,\dots,u_n of \mathbb{C}^n, and let D be the diagonal matrix with entries given by f. Then I claim that if we set v_i(m)=f(m)^{1/2}u_i(m) for every i,m, then we have \sum_i v_i\otimes \overline{v_i}=D.

    To check this, let us evaluate the left-hand side at (m_1,m_2). We get \sum_if(m_1)^{1/2}f(m_2)^{1/2}u_i(m_1)\overline{u_i(m_2)}, which (since the columns of a row-orthogonal matrix are orthogonal) equals f(m_1) if m_1=m_2 and 0 otherwise, which is what we wanted.

    How might this observation be useful? Well, for normal AP-discrepancy one can start with a decomposition I=\sum u_i\otimes \overline{u_i}, where the u_i are trigonometric functions. This fails badly for HAP-discrepancy because a trigonometric function can be efficiently split up into a linear combination of APs, but it can’t be efficiently split up into a linear combination of HAPs.

    But if we weight a trigonometric function by multiplying its entries pointwise by \sqrt{f}, then things could perhaps change quite a bit, since the bits that are hard to make out of HAPs will be given much smaller weight than the bits that were easy. So perhaps the decomposition-of-identity proof of Roth can be modified to become a decomposition-of-D proof of EDP.
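    The claimed identity is easy to sanity-check numerically, for instance with the Fourier basis u_r(m)=n^{-1/2}e^{2\pi irm/n} on n points (a sketch; the index shift just lets entry m of a vector stand for the integer m+1):

```python
import cmath
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def f(m):
    """Number of ordered factorizations of m into parts > 1."""
    if m == 1:
        return 1
    return sum(f(d) for d in range(1, m) if m % d == 0)

n = 8
# orthonormal basis u_r(m) = n^(-1/2) exp(2 pi i r m / n)
u = [[cmath.exp(2j * math.pi * r * m / n) / math.sqrt(n) for m in range(n)]
     for r in range(n)]
# v_r(m) = f(m)^(1/2) u_r(m), with entry m standing for the integer m+1
v = [[math.sqrt(f(m + 1)) * u[r][m] for m in range(n)] for r in range(n)]
# sum_r v_r (x) conj(v_r) should be the diagonal matrix with f on the diagonal
for m1 in range(n):
    for m2 in range(n):
        s = sum(v[r][m1] * v[r][m2].conjugate() for r in range(n))
        target = f(m1 + 1) if m1 == m2 else 0.0
        assert abs(s - target) < 1e-9
```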

    • gowers Says:

      Let me say very slightly more about that last suggestion. The idea would be this. We would start with the decomposition I=n^{-1}\sum_ru_r\otimes\overline{u_r}, where u_r(m)=e^{2\pi i rm} for every r and m. As in the proof of Roth’s discrepancy theorem, written out here with an n^{1/8} bound by Kevin O’Bryant, we then decompose each u_r efficiently into characteristic functions of APs. However, in this case instead of trying to make them as long as possible, we give them lengths that tend slowly to infinity.

      As in the Roth proof, there should be plenty of cancellation in the coefficients, which is crucial to the success of that proof and obviously would be here too. But here we face an additional difficulty, which is that most of the APs we have decomposed into are not HAPs.

      What I am hoping, though it may be a rather desperate hope, is that if instead we do something very similar but for the functions v_r(m)=f(m)^{1/2}u_r(m), then the total weight of the non-homogeneous APs that we use will be small enough that we can replace that part of the decomposition by something much more trivial (such as decomposition into HAPs of length 1).

      Everything depends in a crucial way on the numbers, and at the time of writing I don’t have an intuitive idea of how that is likely to go. (Naturally, the most likely outcome is that a first attempt at getting the numbers to work will fail dismally, but if we were lucky then it might suggest a refined approach with a better chance of working.)

    • Moses Charikar Says:

      I haven’t quite figured out where this is headed, but I just wanted to recall a comment that Tim made when he first suggested this particular function as a possible candidate for the diagonal. Consider the matrix whose dth column is the characteristic function of the HAP with common difference d, with the diagonal part removed except for the (1,1) entry. Tim pointed out that f is an eigenvector of this matrix.

      Given this connection, and the fact that this matrix is defined in terms of HAPs, is there any useful information in it for constructing the diagonal representation we are hoping for? For example, does the singular value decomposition of this matrix suggest anything?
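      In case it helps anyone experiment with this suggestion, here is a minimal exploratory sketch (mine) that builds the matrix just described and computes its singular values; it makes no claim about what the SVD actually looks like:

```python
import numpy as np

# Build the matrix whose dth column is the characteristic function of the
# HAP {d, 2d, 3d, ...}, with the diagonal removed except the (1,1) entry.
# (Arrays below are 0-based, so entry (m-1, d-1) corresponds to (m, d).)
n = 50
M = np.zeros((n, n))
for d in range(1, n + 1):
    for multiple in range(d, n + 1, d):
        M[multiple - 1, d - 1] = 1.0
np.fill_diagonal(M, 0.0)
M[0, 0] = 1.0                     # restore the (1,1) entry

# Singular values, largest first, as a first look at the suggested SVD.
sing_vals = np.linalg.svd(M, compute_uv=False)
print("top singular values:", np.round(sing_vals[:5], 3))
```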

    • gowers Says:

      I’ve thought about this a bit further, and this is where I’ve got to.

      In the representation-of-diagonals approach to Roth’s AP-discrepancy theorem, one does a trivial estimate, which results in no improvement to the trivial bound, and then one exploits cancellation to gain on the trivial estimate. Looking at it more closely, I see that the fact that the trivial estimate gives rise to the trivial bound is actually a coincidence: if you take m (the lengths of the APs into which you decompose) to be smaller than \sqrt{n} then you no longer get the trivial estimate but something worse than the trivial estimate.

      Just to clarify, in this proof we are trying to represent the identity as a linear combination of AP products with absolute values of coefficients adding up to less than n. The trivial estimate allows us to do so with absolute values adding up to around n^2/m^2, which just happens to equal n when m=\sqrt{n}. The extra cancellation then gains a factor of (m/n)^{1/2}, which for Roth’s theorem leads to a bound of n^{3/4} for the sum of the coefficients. The discrepancy is obtained by dividing n by this sum and taking the square root, giving the n^{1/8} bound. Unfortunately, when m\leq n^{1/3} it gives us nothing.
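      Since exponent bookkeeping of this kind is easy to slip on, here is a minimal check of the arithmetic in the last paragraph (writing m=n^a and tracking only the exponent of n; the code merely restates the calculations above):

```python
from fractions import Fraction as F

# Write m = n^a and track the exponent of n in each estimate.
def trivial_exp(a):
    """Exponent of n in the trivial estimate n^2/m^2 = n^(2-2a)."""
    return 2 - 2 * a

def improved_exp(a):
    """Exponent after the cancellation gain (m/n)^{1/2} = n^((a-1)/2)."""
    return trivial_exp(a) + (a - 1) / 2

# At m = n^{1/2} the trivial estimate is exactly n ...
assert trivial_exp(F(1, 2)) == 1
# ... the improved estimate is n^{3/4} ...
assert improved_exp(F(1, 2)) == F(3, 4)
# ... and the discrepancy sqrt(n / n^{3/4}) is n^{1/8}.
assert (1 - improved_exp(F(1, 2))) / 2 == F(1, 8)
# At m = n^{1/3} the improved estimate climbs back up to n: no gain.
assert improved_exp(F(1, 3)) == 1
print("exponent bookkeeping consistent with the n^{1/8} bound")
```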

      However, we know that the correct bound for Roth is n^{1/4}, and it did look as though it should be possible to work out the cancellations more efficiently and obtain an improvement to the trivial bound by a factor of m/n rather than just (m/n)^{1/2}. If this is indeed possible, then it would give a bound of n/m for the sum of the coefficients, and a discrepancy bound of m^{1/2}, which makes more sense, since it interpolates correctly between the results we know for m=1 and strongly suspect for m=n^{1/2}. So I now feel that it is becoming a priority to stop being lazy and to try to push the representation-of-diagonals approach through for Roth until it matches the correct bound. (An additional motive for this is that it would be an interesting result in itself — perhaps even interesting enough to publish.)

      If all that can be got to work, here’s how I imagine we go from there. (The word “imagine” is carefully chosen, as the argument I have in mind may well be purely imaginary.) We decompose the identity efficiently into a sum of AP products of length m for some m that tends to infinity very slowly. Note the crucial fact that the common differences of these APs will also be bounded above by a number that tends to infinity very slowly. This gives us a very small but non-trivial AP-discrepancy lower bound. We then modify the decomposition to get a decomposition of the diagonal matrix with f down the diagonal (in the manner described in the comment to which this is a reply). Now some of the APs used in this decomposition will be (segments of) HAPs, and others won’t. But if m is small enough, so that the common differences are very small — only just bigger than constant — then \sqrt{f} should give very much smaller weight, on average, to the non-homogeneous parts. Indeed, the fraction ought to be smaller than 1/m. So I’m hoping we can replace the non-homogeneous APs we use by trivial decompositions into singletons and not suffer for it.

      At the moment, I don’t have a convincing argument that tells me that this approach couldn’t work. So I’m feeling as though we need to start to get a bit technical. Of course, I don’t mean that absolutely everything needs to be proved — at this stage it would be fantastic if we could prove a result conditional on the Roth thing working out and various plausible facts about f being true. Then we would have reduced the whole problem to a couple of technical lemmas.

      Moses, that was an interesting thought and I’ve been thinking about it, but I haven’t come up with anything worth saying in this comment. Perhaps it would be worth working out the SVD of the matrix for some large n just to see what it looks like. A slightly similar thought I had was to take the inner product I defined earlier (in the weighted \ell_2 space using f as weights) and do Gram-Schmidt orthogonalization. But I don’t have a good candidate for the basis that should be orthogonalized.

  18. gowers Says:

    With reference to the programme suggested in my previous comment, I thought I’d mention that I am in the middle of writing a note (under the Polymath pseudonym) about the representation-of-diagonals approach to Roth’s AP-discrepancy theorem. The main differences between this note and Kevin’s contribution to the wiki are (i) that I’ve written it in paper form (in LaTeX) and (ii) that I’m intending to push the argument to give the bound it ought to give: if you decompose into APs of length m then the resulting discrepancy bound should be c\sqrt{m}, and the biggest m you can take is c\sqrt{n}. I’m almost certain I know how to do this, but it will take a small effort to sort out all the details.

    I’m doing this because it is necessary to sort out what happens with small m if there is to be any hope of modifying the proof to give something for HAPs rather than APs.

    • Moses Charikar Says:

      I look forward to the writeup. In reference to your earlier comment here, I’m not sure I follow your intuition for why getting the lengths m of the APs to tend to infinity slowly is useful for getting a result for HAP discrepancy. Maybe I’m being daft (and admittedly I haven’t worked through the details of the Roth proof carefully), but why does having m small ensure that the common differences are small too?

    • gowers Says:

      I hope I haven’t made a silly mistake, but m plays a dual role in the proof. In order to approximate a trigonometric function \omega_r(x)=\exp(2\pi irx/N), I convolve it with (a multiple of) the characteristic function of an arithmetic progression P with common difference d\leq m. For this to work, we choose d such that \|rd/N\| is at most 1/m, and that allows us to take the length of P to be cm. It’s because we need d|P| to be at most N that m has to be at most \sqrt{N}.

      Thus, in brief, m is actually given as the bound on the common differences, and it requires a comparable bound on the lengths for the proof to work.
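      Just to illustrate the pigeonhole step numerically (this sketch is mine, not anything from the write-up): Dirichlet's argument guarantees that for every r a common difference d\leq m with \|rd/N\|\leq 1/m exists, and a brute-force search confirms it.

```python
def dist_to_nearest_int(x):
    """||x||: the distance from x to the nearest integer."""
    return abs(x - round(x))

def good_difference(r, N, m):
    """Smallest d <= m with ||rd/N|| <= 1/m.

    Such a d always exists: among the m+1 fractional parts of
    0, r/N, 2r/N, ..., mr/N, two must fall in the same interval of
    length 1/m, and their gap supplies the required d (pigeonhole)."""
    for d in range(1, m + 1):
        if dist_to_nearest_int(r * d / N) <= 1.0 / m:
            return d

# For every residue r, a suitable common difference is found.
N, m = 1000, 30
for r in range(N):
    d = good_difference(r, N, m)
    assert d is not None and 1 <= d <= m
print("every r < 1000 admits a good difference d <= 30")
```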

    • gowers Says:

      I hope this link will work. It’s very much a first draft, so I’ve been a bit lazy in places (not with the details of the proof, but with the accompanying explanations of what is going on). At some point I’ll update it, or if anyone else feels like doing so they are welcome and I can send the source file.

      I’ve just checked, and the link does work. I haven’t mentioned in the write-up that I used the extra flexibility that is provided by an observation of Moses, which is that it is not necessary to represent a diagonal matrix — it’s enough to find an efficient representation of a matrix D+S, where D is diagonal and S is positive semidefinite. It turned out to be very helpful not to have to make the decomposition exactly equal to the identity. I haven’t thought about how this would affect any generalization to an HAP version with f down the diagonal instead of 1.

      I also have a question: is this proof essentially the same as Roth’s original argument? I don’t know because I haven’t looked at the original argument.

      Update: I’ve noticed that the statement of Lemma 3.1 is wrong (though the proof is a proof of the correct statement). The answer should be 1 rather than 1/N. I’ve tried to replace the file by a corrected version, but it’s failing to work, for reasons that I simply don’t understand. I’ll try some crude fix later.

    • Moses Charikar Says:

      One crude fix to the file update issue would be to use a different filename.

    • gowers Says:

      I was hoping to avoid that, but have given up the attempt. The link is now to the corrected file. (I hope I won’t have to keep doing this every time I update the file.)

    • obryant Says:

      One way to handle a shared tex file is to use scribtex. This still requires some action on your part to share the file, so it isn’t as free-range as a wiki would be. (To get access, I’d need a free scribtex account, and you’d need to add my username to the file’s share list.)

    • gowers Says:

      Kevin, I’ve followed as much of your suggestion as I can, meaning that I’ve got myself a free scribtex account and uploaded the file as it currently is. I’m happy to share the file with anyone who’s interested.

    • obryant Says:

      My scribtex username is “obryant”. I haven’t used scribtex in a while; it looks much more polished now than last summer. It also looks like the number of collaborators might be severely limited. /Something/ like this should be around and useful for the write-up stage, instead of “passing the token” and the wiki.

    • gowers Says:

      Well, you’re added now. If the limited numbers become a problem we can think again about what to do.

    • Alec Edgington Says:

      I’ve just created a Scribtex account — could you add me, please? My username is ‘obtext’.

    • gowers Says:

      I was going to do just that, but I now discover that the limited-numbers issue has kicked in already: if I have a free account then I’m allowed up to … er … one collaborator per project.

    • Alec Edgington Says:

      Yes, it does seem to be a limit of one on the free account, unfortunately. I’ve done a bit of searching and found a couple of other options (latexlab and monkeytex), but I haven’t got either to work. I suppose a partial solution would be simply to post the Latex source on the wiki.

    • gowers Says:

      OK I’ve done that. You can get to it via the following sequence of links: Polymath5 wiki — General proof strategies — representation of diagonals — after which it is clear what to go for.

    • Alec Edgington Says:

      Thanks, that works. I’ve made a minor edit. (In case anyone isn’t aware, by clicking on ‘history’ at the top of the document and selecting two revisions you can see how they differ.)

  19. Polymath5 – Roth’s theorem (again) « Random Thoughts Says:

    […] – Roth’s theorem (again) Tim Gowers has published a new proof of Roth’s theorem (link to Polymath5) as part of the project. He extends previous polymath work from Kevin Obryant […]

  20. Polymath5 « Euclidean Ramsey Theory Says:

    […] see this comment and the responses to […]

  21. Jason Dyer Says:

    I perhaps have missed something, but are we worrying solely about the 3 case? Won’t

    1,1,-1,-1,0,1,1,-1,-1,0 …

    be bounded by 2?
