Gil has a not quite finished third post that will appear soon on this blog. Meanwhile, here are a few thoughts I’ve had recently as I tried to get my brain into EDP mode again.
The approach to the Erdős discrepancy problem that, rightly or wrongly, I found most promising when we were last working on it was to prove a certain statement about matrices that can be shown quite easily to imply a positive solution to the problem. In this post, I’m going to treat that matrix statement as the problem, and think about how one might go about trying to prove it. I’ll give a brief explanation of why it implies EDP, but not of what the possible advantages of the approach are (discussions of which can be found in some of the earlier material).
First I’ll need to recall a couple of definitions and some notation. A homogeneous arithmetic progression, or HAP, is a set of the form $\{d, 2d, 3d, \dots, nd\}$ for some pair of positive integers $d$ and $n$. Let me say that $f$ is a HAP-function if it is the characteristic function of a HAP. Finally, if $f$ and $g$ are functions defined on $\mathbb{N}$, let $f\otimes g$ denote the function defined on $\mathbb{N}^2$ that takes the value $f(x)g(y)$ at $(x,y)$, which can be thought of as an infinite matrix.
Here, then, is the statement that it would be great to prove. What is nice (but also quite daunting) about it is that it is a pure existence statement.
Problem. Prove that for every $c>0$ there exists a real matrix decomposition

$D=\sum_i\lambda_iP_i\otimes Q_i$

with the following properties.

(i) $D$ is a diagonal matrix and $\mathrm{tr}(D)\geq 1$.

(ii) Each $P_i$ and each $Q_i$ is a HAP-function.

(iii) $\sum_i|\lambda_i|\leq c$.
I have stated the problem in this form because (for a reason I will outline below) I am confident that such a decomposition exists. However, it seems to be quite hard to find.
Why would the existence of an efficient decomposition of a diagonal matrix into products of HAP-functions prove EDP? Well, let $\varepsilon=(\varepsilon_1,\varepsilon_2,\varepsilon_3,\dots)$ be a $\pm 1$-sequence and consider the quantity $\langle D\varepsilon,\varepsilon\rangle$. On the one hand it equals

$\sum_n d_n\varepsilon_n^2=\sum_n d_n=\mathrm{tr}(D)\geq 1$,

and on the other hand it equals

$\sum_i\lambda_i\langle\varepsilon,P_i\rangle\langle\varepsilon,Q_i\rangle$.

Since $\sum_i|\lambda_i|\leq c$, there must exist $i$ such that $|\langle\varepsilon,P_i\rangle\langle\varepsilon,Q_i\rangle|\geq c^{-1}$, from which it follows that there exists a HAP $P$ such that $|\sum_{n\in P}\varepsilon_n|\geq c^{-1/2}$. This shows that every $\pm 1$-sequence must have HAP-discrepancy at least $c^{-1/2}$, and since $c>0$ was arbitrary, we are done.
Of course, one can deduce anything from a false premise, so one needs at least some reason to believe that the decomposition exists. Quite a good reason is that the existence of the decomposition can be shown (without too much difficulty) to be equivalent to the following generalization of EDP.
Problem. Show that if $(x_n)$ and $(y_n)$ are any two sequences of vectors in a Hilbert space such that $\langle x_n,y_n\rangle=1$ for every $n$, then for every constant $C$ there exist HAPs $P$ and $Q$ such that

$|\langle\sum_{m\in P}x_m,\sum_{n\in Q}y_n\rangle|\geq C$.
This is a strengthening of EDP, since it clearly implies EDP when the Hilbert space is $\mathbb{R}$ (take $x_n=y_n=\varepsilon_n$). However, it doesn’t seem like such a huge strengthening that it might be false even if EDP is true. At any rate, it seems just as hard to find a counterexample to this as it does to EDP itself. (You might ask why I believe that EDP is true. It’s partly blind faith, but the heuristics presented by Gil Kalai in the previous post offer some support, as does — very weakly — the experimental evidence.)
How does one ever solve existence problems?
Let’s begin by thinking very generally about the situation we’re in: we have to come up with a mathematical object that has certain properties $P_1,\dots,P_k$. (Of course, we could combine those into one property, but it is often more natural to think of the object as having several different properties simultaneously.)
I’m going to list strategies until I can’t (or at least can’t immediately) think of any more that seem to have any chance of success. (An example of a strategy that I think has no chance of success for this problem is to search for a just-do-it proof, the problem being that the constraints appear to be quite delicate, whereas a just-do-it proof works better when one has a number of independent constraints that are reasonably easy to satisfy simultaneously.)
1. Use an object you already know, possibly with some small modifications.
2. Start with examples that work for a certain constant $C$, and use them as building blocks for larger and more complicated examples that work with a larger constant $C'$.
3. Use some kind of duality to prove that if an example does not exist, then some other object does exist, and show that in fact the second object does not exist.
4. Use a computer to search for small examples, try to discern a pattern in those examples, use that pattern to guess how to construct larger examples, and check that the guess works.
5. Strengthen the conditions on the object until there is a unique (or almost unique) object that satisfies those conditions.
6. Weaken the conditions on the object (e.g. by forgetting about some of the properties $P_j$), try to describe all objects that satisfy the weaker conditions, and then reformulate the problem as that of finding an object of that description that satisfies the stronger conditions. [As a very simple example, if you were required to find a positive solution of some quadratic equation, you would solve the equation without the positivity condition and then pick out a positive root.]
7. Assume that you have an object with the required properties, deduce some interesting further properties, find a natural object with the further properties, and hope that it has the original properties as well. [This is a variant of 6.]
How do the general strategies fare when it comes to EDP?
I will discuss them briefly at first, before concentrating in more detail on a few of them.
1. This doesn’t appear to be the kind of problem where there is already some matrix decomposition (or related mathematical object) waiting in the wings to be modified so that it becomes the example we want. Or if there is, the only way we are likely to discover it is by pinning down more precisely the properties we are interested in so that someone somewhere says, “Hang on, that sounds like X.” But I think it is very unlikely that we will discover an example that way.
2. I am definitely interested in the idea of starting with small examples and using them to build bigger examples. However, an approach like this needs plenty of thought: non-trivial small examples seem hard to come by, and the only obvious method for building bigger ones is taking linear combinations of smaller ones. (A trivial small example is to take a linear combination of the form $\sum_i\lambda_iP_i\otimes P_i$ where each $P_i$ is the characteristic function of a singleton. This gives us an arbitrary diagonal matrix, but with the sum of the coefficients equal to the trace, which is no good. To get anything interesting, we need the $P_i$ to be HAP-functions for longer HAPs, but then it is very difficult to get the off-diagonal terms to cancel.)
3. We are trying to write a diagonal matrix as an efficient linear combination of objects of a certain form. That is a classic situation for applying the Hahn-Banach theorem: if there is no such linear combination, then there must be a separating functional. What can we say about that functional?
This approach seems promising at first, but is in fact of no use at all, since if you pursue it, the conclusion you come to is that if there is no such linear combination, then there are sequences of unit vectors $(x_n)$ and $(y_n)$ in a Hilbert space and a constant $C$ such that $|\langle\sum_{m\in P}x_m,\sum_{n\in Q}y_n\rangle|\leq C$ for every pair of HAPs $P$ and $Q$. The reason duality doesn’t help is that we used duality to reformulate EDP as a problem about decomposing diagonal matrices. If we use duality again, we get back to (a generalization of) EDP.
4. We have tried this. Moses Charikar and others have written programs to search for the most efficient decompositions (for this and for a related problem). The results are useful in that they give one some feel for what the diagonal matrix needs to be like, but it does not seem to be possible to obtain a large enough matrix to use it to go as far as guessing a formula. For that, more theoretical methods will be necessary (but knowing roughly what the answer needs to look like should make finding those methods quite a bit quicker).
5. One difficulty is that we are not looking for a unique object: it would be nice if the properties we were trying to obtain determined the matrix and its decomposition uniquely, since then we could hope to calculate them. An obvious property to add is that the object is in some sense extremal (though what that sense is is not completely obvious). Another thing we can do is insist on various symmetries — something that has already been tried. Yet another is to restrict the class of HAPs that may be used: as far as we can tell at the moment, we can get away with HAPs with common differences that are all either prime or a power of 2. (I say that simply because it seems to be hard to find a sequence that has bounded discrepancy on all such HAPs. Gil’s heuristics suggest that the discrepancy should grow like $\log n$ for sequences of length $n$.)
6. What weaker properties might we go for? One idea is simply to try to find a linear combination of products of HAP-functions that leads to a considerable amount of cancellation off the diagonal, even if there are still off-diagonal terms left. This would be interesting because if one naively tries to find a decomposition that works, it seems to be very hard to get a substantial amount of cancellation (except if very small HAPs are used, in which case the resulting decomposition is not very informative). Another (somewhat related) idea is to look for interesting decompositions but with values of $c$ that are not particularly small. Yet another might be to allow a wider class of functions than HAP-functions.
7. I don’t have much idea of how to apply this strategy to the entire decomposition, but there is plenty one can say about the diagonal matrix that one wishes to decompose, and that information, if it could be expressed neatly, would surely be very useful: it seems much easier to try to decompose a specific diagonal matrix than to try to decompose a diagonal matrix that you have to choose first, when the vast majority of diagonal matrices cannot be efficiently decomposed.
The proof that the diagonal decomposition implies EDP actually proves a much stronger result. It gives us a function $b$ with $\sum_n b(n)=\infty$ such that if $(x_n)$ is any sequence of real numbers (not necessarily $\pm 1$-valued) with $\sum_n b(n)x_n^2\geq 1$, then $(x_n)$ has unbounded HAP-discrepancy. To put that loosely, the sequence $b$ cannot correlate with the square of any sequence of bounded discrepancy. That puts very strong conditions on $b$, but so far we have not fully understood what those conditions are. It seems that we should be able to make progress on this, perhaps even adding reasonable conditions that would allow us to determine the function uniquely.
Creating new examples out of old ones.
Suppose that $D_1=\sum_i\lambda_iP_i\otimes Q_i$ and $D_2=\sum_j\mu_jR_j\otimes S_j$ are two such decompositions. Clearly if we take a linear combination of $D_1$ and $D_2$, we get another decomposition of a diagonal matrix, and if the original decompositions were into products of HAPs then so is the new one.
Can we do anything more interesting? Another possibility is to look for some kind of product. Now Dirichlet convolutions are natural objects to look at in the context of EDP, since the Dirichlet convolution $\lambda*f$ can be thought of as follows. You start with a function $f$ defined on $\mathbb{N}$, then you define the dilate $f_d$ to be the function that takes the value $f(m)$ at $dm$ and 0 at non-multiples of $d$, and finally you take the linear combination $\sum_d\lambda_df_d$ of those dilates. If $f$ is identically 1, then this is taking a linear combination of HAPs.
We can’t take a Dirichlet product of two matrices, but we can do something similar: given functions $F$ and $G$ of two integer variables we can take the function

$(F*G)(x,y)=\sum_{ac=x}\sum_{bd=y}F(a,b)G(c,d)$.

If $F$ and $G$ are diagonal (meaning that they are zero at $(x,y)$ unless $x=y$), then the terms of the sum are zero unless $a=b$ and $c=d$, so in this case $F*G$ is diagonal as well, and its diagonal, considered as a function of one variable, is the Dirichlet convolution of the diagonals of $F$ and $G$, also considered as functions of one variable.
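The following small check of that claim is a sketch only (brute-force loops, nothing clever): it convolves two random diagonal matrices and confirms that the result is diagonal, with the Dirichlet convolution on the diagonal.

```python
import random

N = 24

def conv2(F, G):
    # (F*G)(x,y) = sum over ac=x, bd=y of F(a,b) G(c,d)
    H = [[0.0] * (N + 1) for _ in range(N + 1)]
    for a in range(1, N + 1):
        for c in range(1, N // a + 1):
            for b in range(1, N + 1):
                for d in range(1, N // b + 1):
                    H[a * c][b * d] += F[a][b] * G[c][d]
    return H

def diag(f):
    M = [[0.0] * (N + 1) for _ in range(N + 1)]
    for n in range(1, N + 1):
        M[n][n] = f[n]
    return M

f = [0.0] + [random.random() for _ in range(N)]
g = [0.0] + [random.random() for _ in range(N)]
H = conv2(diag(f), diag(g))

for x in range(1, N + 1):
    assert all(H[x][y] == 0.0 for y in range(1, N + 1) if y != x)
    dirichlet = sum(f[u] * g[x // u] for u in range(1, x + 1) if x % u == 0)
    assert abs(H[x][x] - dirichlet) < 1e-9
print("F*G is diagonal, with the Dirichlet convolution on the diagonal")
```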
This operation also respects products in the following sense. If $F=f_1\otimes f_2$ and $G=g_1\otimes g_2$, then

$(F*G)(x,y)=\sum_{ac=x}\sum_{bd=y}f_1(a)g_1(c)f_2(b)g_2(d)$,

which equals $(h_1\otimes h_2)(x,y)$, where $h_1$ is the Dirichlet convolution of $f_1$ and $g_1$ and $h_2$ is the Dirichlet convolution of $f_2$ and $g_2$.
One further obvious fact is that the operation is bilinear in the two matrices that it is applied to.
So far everything is working swimmingly: out of two decompositions of diagonal matrices into linear combinations of products we build another one that has some number-theoretic significance. However, there is a problem: a Dirichlet convolution of two HAP-functions is not a HAP-function except in trivial cases.
How much does that matter? Without doing some detailed calculations I don’t know. Here are a couple of simple observations. Let $f$ and $g$ be two HAP-functions. Then their Dirichlet convolution is a sum of dilates of $g$, one for each point in the HAP of which $f$ is the characteristic function. So we can at least decompose the Dirichlet convolution of two HAP-functions as a sum of HAP-functions. The sum of the coefficients is the number of points in the HAP corresponding to $f$ — though since the situation is symmetrical, we can add dilates of $f$ instead, so it is better to say that the sum of the coefficients can be taken to be the length of the shorter of the two HAPs.
That doesn’t sound very efficient, but we must also remember that when we take the Dirichlet convolution of two diagonal matrices, we obtain a diagonal matrix that sums to the product of the sums of the original two matrices. So there may be some gain there to compensate for the loss of efficiency. That is why I say that more detailed calculations are necessary, a point I will return to at some stage.
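Here is a quick sketch verifying the observation about dilates on one small example (the HAP lengths and differences below are arbitrary choices, not anything canonical):

```python
from collections import Counter

def hap(d, k):
    return {d * j for j in range(1, k + 1)}

def indicator(S, N):
    return [1 if n in S else 0 for n in range(N + 1)]

def dirichlet_conv(f, g, N):
    h = [0] * (N + 1)
    for a in range(1, N + 1):
        for b in range(1, N // a + 1):
            h[a * b] += f[a] * g[b]
    return h

N = 200
P, Q = hap(2, 3), hap(3, 4)          # P = {2,4,6}, Q = {3,6,9,12}
conv = dirichlet_conv(indicator(P, N), indicator(Q, N), N)

dilates = Counter()                   # one dilate x*Q for each point x of P
for x in P:
    for q in Q:
        dilates[x * q] += 1
assert all(conv[n] == dilates[n] for n in range(1, N + 1))
print("convolution = sum of", len(P), "dilates of Q (coefficients sum to |P|)")
```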
What can we say about the diagonal matrix $D$?
As I have already mentioned, if a diagonal decomposition of the required kind exists, then we can place strong constraints on the diagonal matrix itself. Writing $b_n$ for $D_{nn}$, it must have the following property: if $(x_n)$ is any real-valued function defined on the integers such that $\sum_nb_nx_n^2\geq 1$, then the HAP-discrepancy of $(x_n)$ must be at least $c^{-1/2}$. (This follows from the proof that the existence of the decomposition for a diagonal matrix with trace at least 1 implies that every $\pm 1$-sequence has HAP-discrepancy at least $c^{-1/2}$.)
Turning that round, if we have any real-valued sequence $(x_n)$ of discrepancy at most $C$, then $\sum_nb_nx_n^2$ cannot be more than $cC^2$.
A simple example of the kind of conclusion one can draw from that is that the sum of $b_n$ over all $n$ that are not multiples of 3 is at most 1. This follows from the fact that the sequence $1,-1,0,1,-1,0,\dots$ has HAP-discrepancy 1. More generally, if $\sum_nb_n$ is large, then for all small $m$ most of the largeness must occur on multiples of $m$.
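That fact about the $1,-1,0$ sequence is easy to confirm by machine; here is a brute-force sketch (the cut-off 3000 is arbitrary):

```python
def hap_discrepancy(x, N):
    # max over all HAPs {d, 2d, ..., md} with md <= N of |sum of x over the HAP|
    best = 0
    for d in range(1, N + 1):
        s = 0
        for n in range(d, N + 1, d):
            s += x[n]
            best = max(best, abs(s))
    return best

N = 3000
x = [0] + [(1, -1, 0)[(n - 1) % 3] for n in range(1, N + 1)]
print(hap_discrepancy(x, N))  # prints 1
```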
In qualitative terms, this suggests that $b_n$ ought to be concentrated at numbers with many factors, and the experimental evidence backs this up, though not quite as cleanly as one might ideally like. (However, “edge effects” could well account for some of the peculiar features that are observed experimentally, so I still think it is worth trying to find a very aesthetically satisfying diagonal matrix $D$.)
Here is another reason to expect that $b_n$ should be larger for numbers with many factors: they are contained in lots of HAPs. We are trying to find a measure on $\mathbb{N}$ (at least if we insist that $b$ takes non-negative values, which seems a reasonable extra condition to try to impose) such that any function with a large $L_2$-norm with respect to that measure must have a large HAP-discrepancy. What might make it difficult to find a function with large $L_2$-norm and small HAP-discrepancy? It would be that we had lots of constraints on the values taken by the functions. And the natural way that we can place lots of constraints on one value $x_n$ is if $n$ has many factors and is therefore contained in many HAPs.
As an example of the opposite phenomenon, suppose we take an arbitrary sequence and alter its values however we like at all primes. What will happen to its HAP-discrepancy? Well, each HAP with common difference greater than 1 contains at most one prime, so we cannot have changed the discrepancy on any such HAP by more than 2. This illustrates fairly dramatically how little it matters what a sequence does at numbers with few factors.
A first wild guess.
It is tempting to jump straight from this kind of observation to a guess of a function that might work. For example, what if we took some primes $p_1<p_2<\dots<p_k$, a collection $\mathcal{A}$ of subsets of $\{1,2,\dots,k\}$, and let $b$ be the characteristic measure of the set of all numbers that can be written as $\prod_{i\in A}p_i$ for some $A\in\mathcal{A}$. That is, $b$ takes the value $|\mathcal{A}|^{-1}$ on all such numbers and zero on all other numbers.
Does a function like that have any chance of working? Because we have ensured that it sums to 1, we would now be looking for a decomposition with absolute values of coefficients summing to at most $c$ for a small constant $c$. But before we even think about doing that, we should try to find a sequence that is large on many of the products of primes but that has bounded discrepancy. Only if we find it hard to do that is it worth looking for a decomposition.
Let me be more precise about what we are looking for. If we can find a decomposition of $D$ (the diagonal matrix with entries given by $b$) as $\sum_i\lambda_iP_i\otimes Q_i$, where the $P_i$ and $Q_i$ are HAP-functions and $\sum_i|\lambda_i|\leq c$, then for every sequence $(x_n)$, we have two ways of calculating $\langle Dx,x\rangle$. One of them gives us $\sum_nb_nx_n^2$, while the other gives us $\sum_i\lambda_i\langle x,P_i\rangle\langle x,Q_i\rangle$. So if $\sum_nb_nx_n^2\geq 1$, then there must be some HAP on which $x$ has discrepancy at least $c^{-1/2}$.
Therefore, in order to show that this particular $b$ will not do, we need to find a sequence $(x_n)$ such that $x_n^2$ averages at least 1 on the numbers of the form $\prod_{i\in A}p_i$ but has bounded discrepancy.

Let us make our task easier to think about by looking for a sequence that takes values $\pm 1$ on the numbers $\prod_{i\in A}p_i$. Can we choose the values elsewhere in such a way that we cancel out any discrepancy that we might accidentally pick up with those values?
To answer this question, let us think about how HAPs intersect the set, which, since I’m mentioning it quite a bit, I’d better give a name to: I’ll write $E$ for the set of numbers $\prod_{i\in A}p_i$ with $A\in\mathcal{A}$. Let me also write $p_A$ as shorthand for $\prod_{i\in A}p_i$. Any HAP that intersects $E$ must have a common difference of the form $p_B$ for some set $B$. The resulting HAP will contain every $p_A$ for which $B\subseteq A$, which is rather pleasantly combinatorial.
Let’s suppose that we’ve chosen the values everywhere on $E$. What we are trying to do now is choose values off $E$ (not necessarily $\pm 1$) in such a way that any HAP-discrepancy contributed by values in $E$ is cancelled out by a roughly opposite discrepancy in the values outside $E$.
An obvious way to do that is to be fairly greedy about it. That is, we look at HAPs that have a substantial intersection with $E$ and simply choose a few further values on those HAPs, hoping that our choices won’t mess up too many discrepancies elsewhere. Let’s see if we can get something like that to work.
Consider, then, the (infinite) HAP with common difference $p_B$. As already mentioned, this intersects $E$ in every $p_A$ such that $B\subseteq A$. So as we trundle along the multiples $p_B,2p_B,3p_B,\dots$, we find that the partial sums of the $x_n$s that we encounter and have already chosen go up and down, and we would like to choose some further values to ensure that they remain bounded.
At which values of $n$ are we still at liberty to choose the value $x_n$? It must be a multiple $mp_B$ of $p_B$, and two ways of ensuring that that multiple does not lie in $E$ are (i) to make sure that $m$ is also a multiple of some $p_i$ with $i\in B$ (so that $mp_B$ is not square-free) and (ii) to make sure that $m$ is a multiple of some prime $q$ that is not among $p_1,\dots,p_k$.
The question now is whether, if we do that, the value we choose has an effect on lots of other HAPs at the same time. And it looks rather as though it will. Suppose $B$ is quite a large set. Then any multiple of $p_B$ that does not belong to $E$ is also a multiple of $p_{B'}$ for every $B'\subseteq B$, so there is indeed a danger that by choosing a value of $x_{mp_B}$ we are messing up quite a lot of HAPs and not just the one we were interested in. We could of course try to sort out the HAPs with large $B$ first and later correct any damage we had done to smaller sets — but a large set $B$ has a lot of subsets.
This is another situation where the devil is in the detail. Maybe there is enough flexibility that some kind of careful greedy approach would work — perhaps we could even choose the values to be 1 everywhere on $E$ — but so far it seems at least possible that there exists an efficient decomposition of a diagonal matrix with entries that are concentrated on square-free numbers with many prime factors.
It is sometimes convenient to describe arithmetic progressions of the form $\{(m+1)d,(m+2)d,\dots,(m+n)d\}$ as HAPs. Let me do that now, and let me define two HAPs to be adjacent if they are disjoint and their union is also a HAP (in this generalized sense). In other words, two HAPs are adjacent if one is simply a continuation of the other.
If $f$ and $g$ are HAP-functions coming from adjacent HAPs $P$ and $Q$ (in which case I’ll say that they are adjacent HAP-functions), then $(f-g)\otimes(f-g)$ is 1 on the diagonal at all points in the HAP $P\cup Q$, while off the diagonal it is 1 on $(P\times P)\cup(Q\times Q)$ and $-1$ on $P\times Q\cup Q\times P$. If we take lots of products of this form, we can hope to get lots of cancellation off the diagonal, while getting none at all on the diagonal.
Let’s try to pursue this idea in more detail. Suppose we have some kind of probability distribution on functions of this form. That is, we pick, according to some distribution, a random set $B$ and then a random pair of adjacent HAPs $P$ and $Q$ with common difference $p_B$. Now let $f$ and $g$ be their characteristic functions and let $x$ and $y$ be two positive integers. What is the expected value of $(f-g)\otimes(f-g)(x,y)$?
A necessary condition for this quantity to be non-zero is that $p_B$ should divide both $x$ and $y$. So let us condition on this event. If we are clever about the way we choose our probability distribution, we should be able to organize it so that if you choose a random $B$, then typically plenty of its subsets have a similar probability of being chosen. The reason that is potentially important is that if we fix a common difference and pick random pairs of adjacent HAPs $P$ and $Q$ of some given length, then pairs $(x,y)$ that are close to the diagonal will tend to get positive values (because normally if one of them is in $P$ then so is the other, and similarly for $Q$) while pairs that are a little further away tend to get negative values. But if we have many different common differences in operation then we can hope that some of these biases will cancel out: a difference $y-x$ that tends to result in a positive value at one scale might tend to result in a negative value at another scale.
While writing this I have just remembered a useful trick that we were well aware of during the previous attempt at EDP. It is straightforward to show that EDP is equivalent to the same question for the rationals. That is, one wishes to show that for every $\pm 1$ function defined on the positive rationals and every constant $C$ there is a HAP (the definition is obvious) on which the sum of the function has absolute value at least $C$. The nice thing about this formulation is a much greater symmetry: for every positive rational $q$, the map $x\mapsto qx$ is an isomorphism from $\mathbb{Q}$ (considered as a group under addition) to itself, and it takes HAPs to HAPs. This makes it unnecessary to think about numbers with lots of factors. (It is not hard to get into this nice situation using just integers — for example, you can just multiply everything by $N!$ for a very large $N$ and you can divide your numbers by whatever you like, within reason — but it is more natural to use rationals.)
The hope was that if we thought about functions defined on the rationals, then the rather peculiar properties of the diagonal matrix one wishes to decompose might become less peculiar. However, one would still need to think carefully about ratios of numbers.
I’m tempted to continue thinking aloud about these possibilities, but I’ll save that up for the comments.
A second wild guess.
I’m now going to ignore what I’ve just written about working with the rationals and go back to the question of trying to find a natural function that is biased towards positive integers with many factors. What I’d like to do is build a sequence of non-negative functions $b_1,b_2,b_3,\dots$, each better than the last, in the hope of eventually finding one that has (or at least doesn’t obviously not have) the property that every sequence $(x_n)$ such that $\sum_nb(n)x_n^2\geq 1$ has large HAP-discrepancy. (To make this non-trivial, I also need $\sum_nb(n)=1$.)
I’ll start with a function I know doesn’t work: let $b(n)=1/N$ for every $n\leq N$ and 0 for every $n>N$. That fails because the sequence $1,-1,0,1,-1,0,\dots$ has HAP-discrepancy 1 but $\sum_nb(n)x_n^2=2/3$ (at least if $N$ is a multiple of 3). That doesn’t satisfy the condition I said, but it does if we multiply the sequence by $\sqrt{3/2}$.
The problem there was that $b$ has a large sum on non-multiples of 3. To correct that and similar problems, it feels natural to give greater weight to numbers with more factors. But how? Well, let’s do it in a rather naive way and simply weight each number by how many factors it has (and therefore how many HAPs it belongs to). This at least has the advantage of being a natural number-theoretic function: it is (up to a constant) the Dirichlet convolution of the constant function 1 with itself, at least until you get beyond $N$. That is, $d(n)=(1*1)(n)=\sum_{uv=n}1$.
Roughly how big is $\sum_{n\leq N}d(n)$? This sum is the number of pairs $(u,v)$ of positive integers such that $uv\leq N$ (since $d(n)$ is the number of pairs $(u,v)$ of positive integers such that $uv=n$). Counting the number of pairs with $u\leq v$, we can crudely estimate this as $2\sum_{u\leq\sqrt N}N/u$, which is roughly $N\log N$. On the other hand, the Dirichlet convolution with itself of the function that is 1 up to $N$ and 0 thereafter sums to $N^2$ (since each pair of positive integers $(u,v)$ with $u,v\leq N$ contributes 1 to the sum). Since the two functions are equal up to $N$, we find that the vast majority of the second function lies beyond the point where it equals $d$.
But let’s not worry about that. On average, $d(n)$ grows like $\log n$, which is not too fast a rate of growth, so let’s simply take the function $b(n)=d(n)/(N\log N)$ up to $N$ (and 0 beyond). We now want to prove that every sequence $(x_n)$ such that $\sum_nb(n)x_n^2\geq 1$ has HAP-discrepancy at least $C$ (for some $C$ that tends to infinity with $N$). Alternatively, we want to find a counterexample, which will induce us to try to improve the function.
An obvious example to try is the usual one, namely the sequence $1,-1,0,1,-1,0,\dots$. To see whether this works, we need to estimate the sum of $d(n)$ over all non-multiples of 3 up to $N$. That equals the number of pairs $(u,v)$ such that neither $u$ nor $v$ is a multiple of 3 and $uv\leq N$. And that is roughly $4/9$ times what you get without the divisibility condition. In other words, it is roughly $\frac49N\log N$. So to get $\sum_nb(n)x_n^2$ to equal 1 we need to multiply the sequence by $3/2$, which gives it a HAP-discrepancy of $3/2$, a slight improvement on the $\sqrt{3/2}$ that we obtained with a constant function.
This doesn’t prove much — maybe there are better examples — but it suggests in a weak way that taking Dirichlet convolutions results in improved functions. To be clear what I mean by “improved”, I am now looking for a non-negative function $b$ defined on $\{1,2,\dots,N\}$ such that for large $N$, or perhaps even all $N$, if $(x_n)$ is any sequence such that $\sum_nb(n)x_n^2\geq\sum_nb(n)$, then $(x_n)$ has HAP-discrepancy at least $C$. The bigger I can get $C$ to be, the better I consider the function $b$ to be.
If you take repeated Dirichlet convolutions of the constant function 1, then you get the functions $d_k$, where $d_k(n)$ is the number of $k$-tuples $(u_1,\dots,u_k)$ of positive integers such that $u_1u_2\cdots u_k=n$. (The function $d_2$ is equal to $d$.) Suppose $k$ is large, $N$ is very large, and $(x_n)$ is a sequence such that $\sum_{n\leq N}d_k(n)x_n^2\geq\sum_{n\leq N}d_k(n)$. Does it follow that $(x_n)$ has large HAP-discrepancy? More ambitiously, does it have HAP-discrepancy that grows exponentially with $k$, provided that $N$ is large enough?
Back to the rationals.
The following idea is one that feels familiar — I think something similar has come up already. But there’s nothing like a break from a problem to make an old stale idea seem new and fresh, and sometimes the apparent staleness was an illusion. So here it is again.
I’ll begin by creating a function on the diagonal that seems to me to have a sporting chance of being efficiently decomposable into HAPs. Let $G$ (for generating set) be the set of all rationals $a/b$ with $a$ and $b$ integers between 1 and $m$, and let $\nu$ be the $k$-fold multiplicative convolution of the characteristic function of $G$ with itself. That is, $\nu(x)$ is the number of ways of writing $x$ as $\frac{a_1}{b_1}\cdot\frac{a_2}{b_2}\cdots\frac{a_k}{b_k}$ with all the $a_i$ and $b_i$ being integers between 1 and $m$.
Before I think about how one might go about decomposing the diagonal “matrix” (that is, a function defined on $\mathbb{Q}_{>0}\times\mathbb{Q}_{>0}$ that is zero at $(x,y)$ unless $x=y$) with diagonal entries $\nu(x)$ into products of HAP-functions, I want to see whether it looks hard to find a function $f$ such that $\sum_x\nu(x)f(x)^2\geq\sum_x\nu(x)$ but $f$ is of bounded HAP-discrepancy.
To have any chance of that, we need to understand a bit about $\nu$. We can get that understanding by representing every rational as a product of (possibly negative) powers of distinct primes. For the rationals where $\nu$ is non-zero, the primes are all at most $m$, so we can think of $\nu$ as a function defined on a lattice of dimension $\pi(m)$ (the number of primes less than or equal to $m$). Then $\nu$ is the $k$-fold convolution of the characteristic function of $G$ regarded as a subset of this lattice. If $k$ is large enough, then this convolution will look like a large and spread-out Gaussian, which will have the potentially useful property that it is approximately invariant if you multiply (or divide) by an integer between 1 and $m$.
If $\nu$ has that property, does it rule out some kind of variant of the $1,-1,0$ example? A key feature of that example is that the function in question is zero on multiples of 3, but in $\mathbb{Q}$ there is no such thing as a multiple of 3 (or rather, everything is a multiple of 3 so the concept ceases to make sense).

But that argument is far from conclusive. How about defining a function $f$ as follows: given a rational $a/b$ in its lowest terms, let $f(a/b)$ be 0 if either $a$ or $b$ is divisible by 3, and otherwise 1 if $a\equiv b$ mod 3 and $-1$ if $a\not\equiv b$ mod 3?
Let us think what the values of $f$ are at $d$, $2d$, $3d$, etc., where $d=a/b$ in its lowest terms. If $a$ is a multiple of 3, then all those values are 0 (since $b$ is not a multiple of 3, the numerator of $nd$ in its lowest terms is always a multiple of 3). If $b$ is a multiple of 3 but $a$ is not, then $f(nd)=0$ except if $n$ is a multiple of a high enough power of 3, and restricting to those $n$ reduces us to the case where neither $a$ nor $b$ is a multiple of 3. In that case, if $a$ and $b$ are congruent mod 3, then the values of $f$ go $1,-1,0,1,-1,0,\dots$. If they are not congruent mod 3, then they go $-1,1,0,-1,1,0,\dots$. That last argument worked even if $d$ is not an integer, so we have covered all cases: in every case the partial sums along the HAP are bounded by 1, so $f$ has bounded HAP-discrepancy.
What is the density of rationals $a/b$ such that neither numerator nor denominator is a multiple of 3? I haven’t made the question precise, and that allows me to give two different answers. The first answer is that if we pick the numerator and denominator randomly, then the probability that neither is a multiple of 3 is $4/9$. Also, the probability that they are coprime is $6/\pi^2$, and the probability that they are both coprime and not multiples of 3 can easily be shown by similar methods to be at least an absolute constant. So it seems that our function is supported on a set of rationals of positive density.

But another way of thinking about it suggests that the support of the function has zero density: every rational can be written as a product of powers of primes, and the probability that the power of 3 that we choose is 0 tends to 0 as the distribution spreads out.

The second answer is, I think, the correct one for us, though the question is a bit misleading. The $\nu$-density of rationals for which the “3-coordinate” is zero does indeed tend to zero as $k\to\infty$, so if we regard $\nu$ as our measure, rather than something more additive, then we do appear to have ruled out examples that are similar to the $1,-1,0$ example.
Can we use information about the original problem to help us with the dual problem?
We know that for the integers there is not an efficient HAP-decomposition of a diagonal matrix that is 1 for the first $N$ diagonal entries and zero afterwards. The proof we have is that if such a decomposition existed, then the sequence $1,-1,0,1,-1,0,\dots$ would have to have large HAP-discrepancy, which it doesn’t. But can we give a “direct” proof? I’m not quite sure what I mean by that, except that it should not involve the use of a separating functional. Ideally what I’d like to do is use that example and others like it to work out some rather general constraint that would end up saying a precise version of, “If an efficiently decomposable matrix equals 1 on the diagonal, then it must have quite a lot of stuff off the diagonal,” and ideally say a fair amount about that “stuff”.
The hope would be to find a clear obstacle on the dual side to a direct approach to decomposing the identity matrix, and then to show that when we move to the measure on the rationals defined in the previous section, that obstacle goes away and allows us to do what we wished we could do over the integers. The thing that would make life easier would be the ability to divide anything by small integers.
Actually, maybe it isn’t so important to go for the dual side. Let’s just understand what the existing proof is telling us goes wrong when we try to find an efficient HAP-decomposition. For the decomposition to be efficient, the HAPs we use have to be quite long. And for that to be the case, they must contain roughly as many numbers congruent to 1 mod 3 as they do numbers congruent to 2 mod 3. (They might contain none of either.) Therefore, any HAP product has roughly as many points congruent to $(1,1)$ or $(2,2)$ mod 3 as points congruent to $(1,2)$ or $(2,1)$ mod 3. Let us call these white points and black points, respectively. By roughly I mean that these two numbers differ by at most a constant. In fact, I think they differ by at most 1. But no points on the diagonal are congruent to $(1,2)$ or to $(2,1)$, so if a product of HAPs includes $k$ points on the diagonal that are congruent to $(1,1)$ or $(2,2)$, then it must include at least $k$ more black points than white points off the diagonal.
So, roughly speaking, we can’t build up the non-multiples of 3 on the diagonal without building up a bias towards black points off the diagonal.
Does that matter? Can’t we correct that unwanted bias by adding or subtracting some other HAP products? No we can’t, unless we also cancel out much of the sum on the diagonal.
What rescues us if we deal with rationals instead (or more precisely a portion of the rationals weighted by the function $\nu$)? Also, how easy is it to find an efficient decomposition so that the weight off the diagonal is comparable to the weight on it? I suspect the answer to the second question is that it is not easy, since we have used just the fact that the sequence $1,-1,0,1,-1,0,\dots$ has bounded discrepancy, but there are many other examples.
A variant of the HAP problem.
I stopped the previous section prematurely because I was feeling a bit stuck. Instead I would like to think about a modified problem that may well be easier. EDP asks for a high discrepancy inside some HAP. The set of HAPs has the nice symmetry property that if you dilate a HAP then you get another one. This is particularly good over the rationals, when you can also contract.
Now let me ask a slightly vague question and then attempt to make it precise. Can we prove EDP for a different “base collection” of sets? For EDP I’m regarding the base collection as the collection of all sets $\{1,2,\dots,n\}$. If you look at all dilations of those, you get all HAPs. Since I’m having trouble proving EDP for HAPs, I’d like to try to prove it for a different base collection. Of course, the smaller the collection, and the more closely related it is to the set of HAPs, the better, but for now my priority is to be able to prove something.
I actually have a collection of sets in mind. Let us think of the rationals multiplicatively as an infinite-dimensional lattice, or rather the portion of that lattice that consists of points with only finitely many non-zero coordinates. That is, the point $(a_1,a_2,a_3,\dots)$ corresponds to the rational $\prod_ip_i^{a_i}$, the $a_i$ being integers. I’m not sure exactly what I want to do next, so let me be slightly less precise. If we look at the rationals multiplicatively, as we are doing here, then dilation becomes simply translation. What I’d really like is something like an orthonormal basis, but I’d like it to consist largely of functions that are translates of one another. So far that won’t work, but it looks a lot more hopeful if we allow some kind of “spreading out” (which would be the analogue of taking longer and longer intervals). The most natural type of spreading out is, I would suggest, to take repeated convolutions. So a possible question — not necessarily the exact one I’ll end up asking — is this. Suppose we start with the set $G$ that we considered earlier. Is it true that every function that has large $L_2$ norm relative to the measure defined by (multiplicatively) convolving $G$ with itself $k$ times has a large inner product with some (multiplicative) translate of some iterated convolution of $G$?
This problem should be not that hard since it is purely multiplicative, so we can take logs and make it purely additive. If the answer turned out to be yes, then for any $\pm 1$-function we would get a large inner product with some iterated convolution of $G$. If we could nicely decompose that iterated convolution into HAPs, we would then be done, though that seems like quite a lot to ask.
Hmm … I’ve almost instantly run into trouble. It seems that my “multiplicative version” of EDP is just plain trivially false. Consider the function that is 1 for products of an even number of primes and -1 for products of odd numbers of primes. (This is the completely multiplicative function $\lambda$, with which we are already very familiar.) From a multiplicative point of view, where we think of $\mathbb{Q}$ as an infinite-dimensional lattice, this is putting 1 at “white points” and -1 at “black points”. If we now take any “cuboid”, by which I mean a set of the form $\{\prod_ip_i^{a_i}:b_i\leq a_i<b_i+m_i\}$ (with $m_i=1$ for all but finitely many $i$), then the sum of $\lambda$ on that cuboid is either 0, 1 or -1. I was about to say that I couldn’t see an elegant proof of this, but here’s a fairly quick one. If we remove two adjacent slices from the cuboid, we remove an equal number of white and black squares. “Removing two adjacent slices” means reducing some $m_i$ by 2, provided $m_i\geq 2$. If some $m_i$ is even, then by repeating this operation for that $i$ we can remove all points, which shows that the numbers of white and black points were originally the same. If all $m_i$ are odd, then we end up reducing the cuboid to a set with just one point, in which case we get $\pm 1$.
If we multiplicatively convolve any two cuboids, we obtain a function that can be expressed as a linear combination of translates of a cuboid, so its inner product with $\lambda$ will also be small, as will those of all its translates. (What’s more, those small inner products will themselves alternate in sign as you multiply or divide by primes.) So $\lambda$ will have small inner product with everything you build out of products of GPs using multiplicative convolutions.
It could be fruitful to think about why HAPs have a better chance of giving rise to discrepancy than the kinds of multiplicative sets I’ve just considered. In preparation for that, here is a question to which I know the answer. Let $A_1\subset A_2\subset A_3\subset\dots$ be a nested collection of subsets of $\mathbb{N}$ such that $|A_n|=n$ for every $n$ and $\bigcup_nA_n=\mathbb{N}$. Is it possible for there to be a $\pm 1$ function $f$ and a constant $C$ such that $|\sum_{x\in dA_n}f(x)|\leq C$ for every $n$ and every $d$? In the case that $A_n=\{1,2,\dots,n\}$ for every $n$ this is EDP. But for a general nested collection, it is easy to see that bounded discrepancy is possible for that collection and all its dilates. You just take a completely multiplicative function such as $\lambda$ and choose $A_n=\{a_1,\dots,a_n\}$ in such a way that $\lambda(a_1)+\dots+\lambda(a_n)\in\{0,-1\}$ for every $n$ (and such that the $a_i$ are distinct and every integer is equal to $a_i$ for some $i$). Then the sum $\sum_{x\in A_n}\lambda(x)$ is always either -1 or 0, and multiplicativity ensures that the sum on dilates of the $A_n$ is always either 1, -1 or 0.
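Here is a sketch of that greedy choice carried out by machine, with the Liouville function computed by trial division (fine at this scale; at a finite cut-off the tail of one sign class is simply left over):

```python
from collections import deque
from itertools import accumulate

def liouville(n):
    s = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            s, n = -s, n // p
        p += 1
    return -s if n > 1 else s

N = 10**4
lam = [0] + [liouville(n) for n in range(1, N + 1)]
queue = {v: deque(n for n in range(1, N + 1) if lam[n] == v) for v in (1, -1)}

a, partial = [], 0
while True:
    want = -1 if partial == 0 else 1   # keep the partial sums in {0, -1}
    if not queue[want]:
        break
    n = queue[want].popleft()          # smallest unused integer with lam = want
    a.append(n)
    partial += lam[n]

assert all(s in (0, -1) for s in accumulate(lam[n] for n in a))
print(len(a), "terms; a_n never strays from n by more than",
      max(abs(a[i] - (i + 1)) for i in range(len(a))))
```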
The partial sums of $\lambda$ grow (it is believed) like $\sqrt n$, so if we choose our sets $A_n$ greedily — that is, by taking $a_{n+1}$ to be the smallest positive integer we have not yet chosen such that $\lambda$ takes the value we need — then $A_n$ will contain the first $m$ integers, where $n-m$ is around $\sqrt n$ (at most), and look a bit like a random selection of integers within around $\sqrt n$ of $n$ for the rest of the set. In other words, $A_n$ will be pretty close to the set $\{1,2,\dots,n\}$, so the system of sets $dA_n$ will be pretty close to the set of all HAPs. This suggests that proving EDP is going to be really quite delicate.
In fact, it will be more delicate still, since we can take a better multiplicative function than $\lambda$. If we take the unique completely multiplicative function that takes $p$ to 1 if $p$ is congruent to 1 mod 3, to $-1$ if $p$ is congruent to 2 mod 3, and 3 to -1, then its partial sums grow logarithmically, so we can say something similar to the above but with $n-m$ around $\log n$ rather than around $\sqrt n$.
We can go slightly further with this function. Because it is 1 on the infinite arithmetic progression $1,4,7,10,\dots$ and $-1$ on the infinite arithmetic progression $2,5,8,11,\dots$, we can choose our sets $A_n$ to be intervals up to roughly $n$ plus logarithmically short (non-homogeneous) arithmetic progressions of common difference 3.
Since these sets are cooked up so as to ensure that the function has bounded discrepancy, they might seem not that interesting. But I think they are important, because if we want to find an efficient decomposition of a diagonal matrix into HAP products, we probably need to understand what it is that makes HAPs so much better than HAPs with highly structured tiny logarithmic perturbations at the end. Otherwise we are in danger of taking seriously arguments that would work just as well for statements that we know to be false.
As I write, it occurs to me that we might be able to prove a result that would be quite a bit easier than EDP but nevertheless interesting (and in particular interesting enough to be publishable, unless the answer turns out to be disappointingly easy): that there exists a permutation $\pi$ of $\mathbb{N}$ such that the discrepancy of every $\pm 1$ sequence on the sets of the form $d\{\pi(1),\dots,\pi(n)\}$ is unbounded. We know it is not true for all permutations — even ones that in some sense don’t permute by very much — and we suspect that it is true for the identity permutation but find that very hard to prove. What about if $\pi$ is in some sense “fairly random”? For example, what if we chose a large $N$ and randomly permuted the first $N$ integers. That would give us $N$ sets and all their dilates. Could we prove that the discrepancy on that set system is at least some function of $N$ that tends to infinity? What if we drop the condition that the union of the $A_n$ should be all of $\mathbb{N}$? What if instead we ask that $A_n=\{a_1,\dots,a_n\}$ for some strictly increasing sequence $(a_n)$?
I rather like those questions, which makes this a good place to stop the post — for the length of which I apologize.
September 1, 2012 at 9:03 am |
“Problem. Show that if $(x_n)$ and $(y_n)$ are any two sequences of unit vectors in a Hilbert space, then for every constant $C$ there exist HAPs P and Q such that $|\langle\sum_{m\in P}x_m,\sum_{n\in Q}y_n\rangle|\geq C$.”

This seems to fail if you choose two orthogonal unit vectors $a$ and $b$ and set the first sequence to be constantly $a$ and the second to be constantly $b$.
September 1, 2012 at 9:57 am
Whoops — I missed out the crucial condition that $\langle x_n,y_n\rangle=1$ for every $n$. I’ve now gone back and added it.
September 1, 2012 at 11:48 am
Ah, so they don’t have to be unit vectors?
September 1, 2012 at 12:03 pm
Indeed. I’ll change that too. (When I wrote the post I was lazy and just relied on my faulty memory rather than thinking about the statement or looking it up.)
September 1, 2012 at 10:30 am |
After a good night’s sleep I’m losing faith in the questions with which I finished the post. Let me give a rough reason that I think the answer to the last one (about increasing sequences) is yes, but for an uninteresting reason. What I say won’t quite be a proof, but it will be close. Recall first that for every prime $p$ of the form $4m+3$, there is a $p\times p$ matrix with $\pm 1$ entries such that each row is the previous row translated by 1 mod $p$, and such that the inner product of any two rows is -1. (The matrix is simple to define: $A_{ij}=1$ if and only if $j-i$ is a quadratic residue mod $p$. If you add an extra row and column filled with 1s, you get the Paley matrix of size $p+1$, which has orthogonal rows and columns.) Inspired by that, let us take our sequence $(a_n)$ as follows: we let $p$ be a large prime of the form $4m+3$ and let $a_n$ be $2^{q_n}$, where $q_n$ is the $n$th quadratic residue mod $p$. If we dilate this sequence, then looked at along the powers of 2 what we are actually doing is translating it. If we also had some kind of wraparound (which we don’t), then the result would be a whole lot of translates that were all equal to the constant sequence plus contributions that were orthogonal to one another. In that case, we could simply expand any $\pm 1$ sequence in terms of that orthogonal basis and obtain a discrepancy of around $\sqrt p$. We could then string together a lot of sequences like that for a rapidly increasing sequence of primes $p$.

Actually, I think this argument can be turned into a correct one if we simply repeat the sequence a couple of times so that a segment of it has the orthogonality properties I want, but I haven’t checked the details.

If all that is correct, then we need to impose extra conditions on the sequence $(a_n)$ to make it interesting if we can prove unbounded discrepancy.
September 1, 2012 at 10:42 am
As a PS to that last remark, if it is correct, it also kills off the permutations version, since we can pick a permutation in such a way that $(\pi(1),\pi(2),\dots,\pi(N))$ begins with any sequence we like, and then we can go back and “fill up” much later.
September 1, 2012 at 11:13 am
And a PPS that kills the question off completely. There was no need to use Paley matrices. All we have to do is make sure that for every $n$ and every subset $E\subseteq\{1,\dots,n\}$ there exists $m$ such that for every $j\leq n$, $2^{m+j}$ is a term in the sequence if and only if $j\in E$. In other words, roughly speaking you make sure that the set of powers of 2 you choose contains a copy of every finite set. Then for every $\pm 1$ sequence, to show that it has discrepancy at least $n/2$, pick a window of exponents so late that all subsets of $\{1,\dots,n\}$ have appeared by the start of the window. Then we can dilate the sequence so that it either picks out exactly the 1s in the window or all the -1s. At least one of those will give us a discrepancy of $n/2$.
The remark just above shows that this deals with the permutations version as well.
September 1, 2012 at 11:30 am |
The comment above and its subcomments dispose of the questions in the last paragraph. The following extra conditions may possibly rescue them. In the permutations case, one can insist that $\pi(n)\leq Cn$ for every $n$, and in the sequence case one can insist that $a_n\leq Cn$.

In both cases, the conditions are there to rule out the appearance of long sequences of powers of 2 (or of any number for that matter). I hope they are a good combination: quite weak from a structural point of view, but strong enough to rule out all counterexamples that use the basic set of ideas that showed that the questions as originally formulated were uninteresting.
September 1, 2012 at 12:16 pm |
“The proof that the diagonal decomposition implies EDP actually proves a much stronger result. It gives us a function $b$ with $\sum_nb(n)=\infty$ such that if $(x_n)$ is any sequence of real numbers (not necessarily $\pm 1$-valued) with $\sum_nb(n)x_n^2\geq 1$, then $(x_n)$ has unbounded HAP-discrepancy.”

If the $b(n)$’s sum to infinity, there is a finite subset that sums to at least 1. By setting $x_n=1$ for all n in this set and 0 for all other n, we get a sequence with bounded discrepancy (since it has only finitely many non-zero terms) that satisfies $\sum_nb(n)x_n^2\geq 1$. I don’t know if you meant the last formula to be $\sum_nb(n)x_n^2=\infty$?

What is the simplest non-trivial (that is, $\sum_i|\lambda_i|<\mathrm{tr}(D)$) example we know of a decomposition of a diagonal matrix?
September 1, 2012 at 3:21 pm
I seem to remember that we basically didn’t have a non-trivial example: I think finding one may be a rather hard problem (related to the fact that there are long sequences with small discrepancy even in the $\pm 1$ case and therefore quite possibly even longer ones in the two-vector-valued-sequences case). So if we want to obtain partial results, it might be better instead to try to find interesting decompositions of matrices that are “reasonably well concentrated on the diagonal”, whatever that might mean.
September 1, 2012 at 3:32 pm
I thought there was a computer search for such decompositions? Someone used linear programming to find possible values on the diagonal. Didn’t we have a decomposition of the corresponding diagonal matrix?
September 1, 2012 at 3:42 pm
I think so, but in order to be able to regard the constant as non-trivial we had to be generous in our definition of HAP — allowing sets of the form $\{(m+1)d,(m+2)d,\dots,(m+n)d\}$. I think if you insist on sets of the form $\{d,2d,\dots,nd\}$ then we didn’t find any examples with $\sum_i|\lambda_i|<\mathrm{tr}(D)$. That’s a problem for any approach that might hope to use some kind of product construction, since one would like the examples to get better rather than worse when you take products!
September 1, 2012 at 4:08 pm
Ah. So do we have simple non-trivial examples using the generalized HAPs? Or just using any sets? I can’t think of any examples.
September 1, 2012 at 4:39 pm
My memory is that even with generalized HAPs we basically didn’t have any examples. But I’ll wait to see whether anyone else remembers the situation better. (In theory one could look it up I suppose.)
September 1, 2012 at 2:39 pm |
On the other hand, we know that if $(n_i)$ is a sequence with $n_i\mid n_{i+1}$ for all i, then the function that is $(-1)^i$ at $n_i$ and 0 everywhere else has bounded discrepancy, so we cannot choose $b$ to be supported on e.g. the factorials.
September 1, 2012 at 4:30 pm |
I’ve remembered why the strengthening of EDP that says that the common difference of the HAP can be taken as either a prime or a power of 2 had a chance of being true. (But it seems a bit outrageous, so I would stop short of conjecturing that it is true.) If you want to have bounded discrepancy on all HAPs with common difference an odd prime, then that’s easy. For example, the sequence 1, -1, 1, -1, … will do the job. More generally, so will lots of periodic sequences of even length: if the period is $N$ then you need to ensure that the sum over a full period is 0 (which deals with common differences coprime to $N$) and that for every odd prime $p$ that divides $N$ the number of residues divisible by $p$ taking the value 1 is equal to the number taking the value -1. An example with $N=6$ is $1,1,-1,-1,-1,1$ repeated over and over again.
But no periodic $\pm 1$ sequence can have bounded discrepancy on all HAPs with common difference a power of 2. Indeed, if the period is $N$ and $2^k$ is the largest power of 2 that divides $N$, then the restriction of the sequence to the HAP of common difference $2^k$ is periodic with odd period $N/2^k$, and a $\pm 1$ sequence with odd period has a non-zero sum over each period, so has unbounded partial sums.
The Morse sequence is the natural example of a sequence with bounded discrepancy on all HAPs with common difference a power of 2. But the Morse sequence has unbounded discrepancy on other HAPs. (I can’t remember whether we checked this for primes, but I think we did.)
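Both facts about the Morse sequence are easy to test numerically; a sketch (the cut-off $2^{16}$ is arbitrary):

```python
N = 2**16
morse = [0] + [1 if bin(n).count('1') % 2 == 0 else -1 for n in range(1, N + 1)]

def max_partial_sum(d):
    s, best = 0, 0
    for n in range(d, N + 1, d):
        s += morse[n]
        best = max(best, abs(s))
    return best

for d in (1, 2, 4, 8, 16, 32):
    print('difference', d, '->', max_partial_sum(d))   # stays very small
print('difference 3 ->', max_partial_sum(3))           # grows with N
```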
The fact that the strengthening isn’t obviously false suggests the following question: what sorts of matrices can we make out of products of HAPs of prime common difference? I’ll stop this comment and think about that question in a new comment.
September 1, 2012 at 5:19 pm |
I now want to think about what we can do if we take combinations of HAP products where the HAPs have prime common differences. I’ll start with a rather simple case, where we choose weights $w_p$ for each odd prime and take the sum $\sum_pw_pP_p\otimes P_p$, where $P_p$ is 1 on all multiples of $p$ and 0 everywhere else. The value of this sum at the point $(x,y)$ is $\sum_{p\mid x,\,p\mid y}w_p$. Can we choose the weights $w_p$ in a way that makes this sum “nice”?
One set of weights that springs to mind straight away is to take $w_p=\log p$. In fact, I’d rather take all HAPs with prime power difference and take $w_{p^k}=\log p$. This is using the von Mangoldt function $\Lambda$ (defined to be $\log p$ at powers of $p$ and 0 at all integers that are not prime powers). That is, we are taking the sum $\sum_n\Lambda(n)P_n\otimes P_n$, which is easily checked to equal $\log(x,y)$ at the point $(x,y)$.
The level sets of this function are the sets of the form $\{(x,y):(x,y)=m\}$, if you’ll excuse the confusing notation (the first $(x,y)$ is an ordered pair and the second the hcf of $x$ and $y$). They are of some moderate interest in that pairs on the diagonal tend to have much bigger highest common factors than pairs off the diagonal.
The matrix we have obtained is not diagonal, but we can still ask how efficient the decomposition is. The total mass on the diagonal up to $N$ is roughly $\sum_{x\le N}\log x$, which is around $N\log N$. As for the sum of the coefficients, the number of powers of $p$ up to $N$ is roughly $\log N/\log p$, and that is killed by the $\log p$ term: each power of $p$ carries a coefficient of $\log p$, so the sum over all powers of $p$ is roughly $\log N$, and the total sum is roughly $\pi(N)\log N$, which is around $N$. (I’m not certain that was correct.) So it looks as though we would have an efficient decomposition (the diagonal mass exceeds the coefficient sum by a factor of about $\log N$) if only we could pretend that the off-diagonal terms didn’t exist.
What can we learn from a decomposition of a non-diagonal matrix $M$ into HAP products? If $M=\sum_i\lambda_iP_i\otimes Q_i$, then we can still look at $\langle Mx,x\rangle$ in two different ways. The fact that the sum of the coefficients above was at most around $N$ tells us that if $\langle Mx,x\rangle$ is at least $CN$, then the sequence $x$ must have HAP discrepancy at least $C^{1/2}$. Recalling that $M(x,y)=\log(x,y)$, that tells us that for a sequence $x$ to have bounded discrepancy on all HAPs with prime-power common difference, the quantity $N^{-1}\sum_{m,n\le N}x_mx_n\log(m,n)$ must be bounded.

I’ll stop this comment here, because I have no idea whether that is a difficult condition for a sequence to satisfy.
September 1, 2012 at 5:40 pm
A quick remark about that last question — one could consider attacking it using Moses Charikar’s idea. That is, one would like to try to subtract some mass from the diagonal and still have a matrix that is non-negative definite. Can that be done?
September 3, 2012 at 7:45 pm
Just the obvious remark: since a diagonally dominant symmetric matrix with non-negative diagonal is PSD, one question to ask is how much we can subtract from the diagonal while the remaining matrix is still diagonally dominant.
So instead of a decomposition of a diagonal matrix, it is enough to look for a decomposition of a strongly diagonally dominant matrix, i.e. a decomposition $\sum_i\lambda_iP_i\otimes Q_i$ of a symmetric matrix $M$ where each diagonal entry $M_{nn}$ satisfies $M_{nn}\geq b_n+\sum_{m\neq n}|M_{mn}|$ for some $b_n\geq 0$. The goal is to maximise $\sum_nb_n$ with respect to the coefficients in the decomposition.
September 2, 2012 at 10:49 am |
It occurs to me that a useful thing we might do is describe all matrices that can be efficiently decomposed into products of HAPs of common difference 1. If we understood that properly, then for the primes/powers-of-2 question, we would be in a position where we needed to choose just one matrix (of a kind we understood) for each possible common difference. There would be a similar benefit for EDP itself, though possibly one that would be harder to use.
Let me be clear what I mean by an “efficient decomposition” in this context. I mean a decomposition $M=\sum_i\lambda_iP_i\otimes Q_i$ such that $\sum_i|\lambda_i|$ is significantly less than $\mathrm{tr}(M)$. That is, the sum of the absolute values of the coefficients of the products of HAP-functions is significantly less than the sum along the diagonal of the resulting matrix. If $P_i$ and $Q_i$ are both the characteristic function of the same singleton, then the contribution to the diagonal is $\lambda_i$, so a rough way of thinking about efficient decompositions is to say that we mostly use HAPs of length significantly greater than 1, and the sum of the positive contributions is not cancelled out by the sum of the negative contributions.
If all the HAPs have common difference 1, then one thing we know about the resulting matrix $M$ is that its inner product will be small with any matrix of the form $x\otimes x$, where $x$ is a sequence that has bounded partial sums. Note that this very strongly rules out $M$ being diagonal. However, it doesn’t rule out $M$ living in a band around the diagonal: indeed, if we take a sum of the form $\sum_i\lambda_iP_i\otimes P_i$ where each $P_i$ is an interval of length $m$, then the resulting matrix will be efficiently decomposed and will live in a band of width $m$ around the diagonal.
As an exercise, can we characterize infinite sequences $(a_n)$ that have inner product at most $C$ with all sequences that have partial sums bounded above (in absolute value) by 1? It seems to say something about the amount of oscillation that’s allowed. Indeed, a necessary condition is that for any $n_1<n_2<\dots<n_{2k}$ the sum $(a_{n_1}-a_{n_2})+(a_{n_3}-a_{n_4})+\dots+(a_{n_{2k-1}}-a_{n_{2k}})$ should be at most $C$. In fact, what it seems to boil down to, by partial summation, is that the sum $\sum_n|a_{n+1}-a_n|$ — the total oscillation of the sequence — should be bounded.
The condition for matrices rather than sequences will probably be a bit more complicated, but I think it should be possible to work it out.
September 2, 2012 at 11:38 am |
Come to think of it, maybe the condition for matrices is the obvious generalization of the condition for sequences. Could it be that a matrix $A$ can be efficiently decomposed as a linear combination of products of intervals (of integers) if and only if the sum of all the mixed differences $|A(m,n)-A(m+1,n)-A(m,n+1)+A(m+1,n+1)|$ is small? If $A$ is itself a product of intervals, then that sum evaluates to 4 (a contribution of 1 coming from each corner of the rectangle), so if $A$ is a linear combination of such products then the sum is at most 4 times the sum of the absolute values of the coefficients. So the condition is necessary. But it remains to see whether it is sufficient: can every matrix for which that sum is bounded be decomposed efficiently into products of intervals?
September 2, 2012 at 1:00 pm |
To try to tackle that question, let me do the 1D case first, since I haven’t actually done so. If we have an arbitrary sequence, we can express it as a sum of initial segments of $\mathbb{N}$ using something like partial summation. If the sequence is $(a_1,a_2,a_3,\dots)$, and $[n]$ is the set $\{1,2,\dots,n\}$ (or rather its characteristic function), then the sequence can be written as $\sum_n(a_n-a_{n+1})[n]$.

In fact, that isn’t even remotely clever: we just observe that the sequence that is 1 in the $n$th place and 0 everywhere else can be written as $[n]-[n-1]$ and see what we get.
Can we do something similar in 2D? Let’s write $[m,n]$ for the set of all $(x,y)$ such that $x\leq m$ and $y\leq n$. Then $[m,n]-[m-1,n]-[m,n-1]+[m-1,n-1]$ takes the value 1 at $(m,n)$ and 0 everywhere else.
If we now take a function $A$ in two positive integer variables, then the sum we end up looking at is $\sum_{m,n}A(m,n)([m,n]-[m-1,n]-[m,n-1]+[m-1,n-1])$, which works out as $\sum_{m,n}(A(m,n)-A(m+1,n)-A(m,n+1)+A(m+1,n+1))[m,n]$. In particular, if the sum of the absolute values of the mixed differences is at most $C$, then there is a decomposition of $A$ into products of intervals, with sums of absolute values of coefficients at most $C$.
September 2, 2012 at 3:13 pm |
I was aware, while writing that, that I had done precisely the same calculation when we last worked on EDP. Indeed, I feel myself being drawn inexorably towards an approach we had to the problem back then, which is worrying, because that approach seemed pretty hard to make work.
The observation back then was that if we work in the rationals, then we can take any decomposition of a diagonal map that we might have and average it with a whole lot of dilates of the same decomposition. In that way, we can obtain a decomposition of a diagonal matrix that is approximately invariant under multiplication or division by small positive integers. If we look “locally”, then we can pretend that we have a decomposition of the identity over the rationals (that is, the function defined on $\mathbb{Q}_{>0}\times\mathbb{Q}_{>0}$ that’s 1 when $x=y$ and 0 otherwise).
What this is effectively saying (I think) is that we should be able to take a single matrix $A$ made out of products of intervals and then add up all its dilates. As long as $A$ is absolutely summable, this will give us a well-defined infinite matrix indexed by the positive rationals. Its value at $(x,y)$ will be $\sum_dA(dx,dy)$. So what we’d like is a matrix with the two-dimensional bounded-total-oscillation property from the last comment that sums to zero along all “rays” in $\mathbb{N}^2$ other than the main diagonal. This feels like a much easier thing to look for than a general decomposition of a diagonal matrix into products of HAP-functions, but is it? I’ll think about that in a new comment.
September 2, 2012 at 3:58 pm |
I need to correct what I said above. To work out the value at $(x,y)$ we first write $(x,y)$ as $(wp,wq)$ with $p$ and $q$ coprime positive integers, and then the value at $(x,y)$ is $\sum_dA(dp,dq)$. So that has to be zero whenever $p$ and $q$ are coprime and not both equal to 1.
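In concrete terms, for a finitely supported $A$ one might compute these values as follows (a sketch; `ray_sum` and `value_at` are names of my own choosing):

```python
from fractions import Fraction
from math import gcd

def ray_sum(A, p, q):
    """Sum of A(dp, dq) over d = 1, 2, ... for an N x N array A (1-indexed maths)."""
    N = len(A)
    return sum(A[d * p - 1][d * q - 1] for d in range(1, N // max(p, q) + 1))

def value_at(A, x: Fraction, y: Fraction):
    """Value of the sum of all dilates of A at the rational point (x, y)."""
    w = Fraction(gcd(x.numerator * y.denominator, y.numerator * x.denominator),
                 x.denominator * y.denominator)   # largest rational dividing both
    p, q = int(x / w), int(y / w)                 # coprime pair
    return ray_sum(A, p, q)
```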
This won’t work, but I’d nevertheless like to think about a function of the form
for some complex
. That gives us
, where
is the Riemann zeta function and
. That is completely hopeless, since whether or not it is zero appears to have no significant dependence on the pair
, and in any case
.
What was the thought that led to this hopeless idea? It was roughly this. If you take a function of the form $A(x,y)=e^{2\pi i(x-y)\theta}$ for, say, an irrational $\theta$, then as long as $p\ne q$, the average along all multiples of $(p,q)$ will be 0. Meanwhile, if $p=q$, then the average is 1. However, this function fails to do the job we want, because its total 2D oscillation is infinite.
What I’d like is something similar but with a matrix whose entries are absolutely summable. To prove that the sum along non-trivial rays was zero, I would ideally like something like a multiplicativity property that would allow me to use essentially the same argument for all rays — apart, that is, from the main diagonal where we wouldn’t get zero because we wouldn’t have the oscillation.
There may be a good number-theoretic reason for this being impossible, but suppose we had a completely multiplicative $\pm1$-valued function $\lambda$ such that the Dirichlet series $\sum_n\lambda(n)n^{-s}$ had a zero at some real number $s>1$. What could we say about the function $A(x,y)=\lambda(x)\lambda(y)(xy)^{-s}$? If we summed it along the diagonal, we'd get $\sum_d\lambda(d)^2d^{-2s}=\zeta(2s)$, which would be positive. But if we summed it along multiples of $(p,q)$ for some coprime pair, we would get $\lambda(p)\lambda(q)(pq)^{-s}\zeta(2s)$. This is again hopeless, since again it doesn't in fact distinguish between points on the diagonal and points off it.
September 2, 2012 at 5:01 pm |
On further reflection, it is silly to look at functions of the form $A(x,y)=g(x)g(y)$ with $g$ completely multiplicative, since the sum along multiples of $(p,q)$ will be $g(p)g(q)\sum_dg(d)^2$, which will not pick out the diagonal.
What if $g$ is not multiplicative? Is it too simple to look for a rank-1 function $A=g\otimes g$, say? Then the sum over multiples of $(p,q)$ is $\sum_dg(dp)g(dq)$. For that to work, we would need the dilated sequences $(g(dp))_d$ and $(g(dq))_d$ to be orthogonal to each other whenever $p\ne q$, and we would also want the function $g$ to have bounded total oscillation and to tend to zero. That seems to be equivalent to being absolutely summable.
Is there an absolutely summable sequence $g$ defined on the positive integers such that for any pair of distinct coprime positive integers $p$ and $q$ the sum $\sum_dg(dp)g(dq)$ is zero? It seems like a lot to ask, and I think it might be reasonably easy to prove that no such sequence exists.
September 3, 2012 at 8:31 am
Realized later: having bounded total oscillation and tending to zero is not equivalent to being absolutely summable (which is good news for a number of reasons). For example, if $(a_n)$ has bounded total oscillation and tends to zero and $(b_n)$ is obtained by taking the first $N_1$ terms to equal $a_1$, the next $N_2$ terms to equal $a_2$, and so on, then $(b_n)$ has bounded total oscillation and tends to zero, but if the $N_k$ grow fast enough it is certainly not absolutely summable. Also, any decreasing sequence that tends to zero trivially has bounded total oscillation.
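For instance (an illustrative choice of blocks, $a_k=2^{-k}$ stretched to length $4^k$):

```python
import numpy as np

# Stretch a_k = 2**-k into blocks of length N_k = 4**k (a hypothetical choice).
K = 10
b = np.concatenate([np.full(4**k, 2.0**-k) for k in range(1, K + 1)])

total_osc = np.abs(np.diff(b)).sum()   # stays bounded as K grows
abs_sum = b.sum()                      # grows like 2**K: not absolutely summable
print(total_osc, abs_sum)
```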
I think what was in the back of my mind was that if you want to choose moduli such that choosing even quite random-looking signs results in bounded total oscillation, then you’d better make the moduli have a finite sum.
September 2, 2012 at 5:52 pm |
Although the condition on the sequence is pretty strong, it might be worth at least attempting to obtain it with the help of a greedy algorithm or just-do-it argument. For example, perhaps one could fix some $s$ and take $g(n)$ to be of the form $\epsilon_nn^{-s}$ for some bounded (maybe even $\pm1$-valued) sequence $(\epsilon_n)$. If $s$ was only just bigger than 1, then it is just conceivable that one could keep track of all the sums $\sum_dg(dp)g(dq)$ that are required to be zero, and ensure that they are all tending to zero.
September 3, 2012 at 12:16 am |
How easy is it for a non-trivial sequence $(a_n)$ to be absolutely summable and to have the property that $\sum_na_{dn}=0$ for every $d$? For that it would be enough to find a completely multiplicative $\pm1$-valued sequence $\lambda$ and a real number $s>1$ such that $\sum_n\lambda(n)n^{-s}=0$: we could then take $a_n=\lambda(n)n^{-s}$. And by the intermediate value theorem, it would be enough to find $s$ such that that sum is negative, since it will tend to 1 as $s\to\infty$.
But let’s try a greedyish algorithm. I’ll start with
. To compensate for that I’ll take
. Now I’m fine for
but not so fine for
. To compensate for
I could take
, but now I’ve messed up
, as well as
again. That’s more or less inevitable, but if I keep going like this I’m not going to get an absolutely summable sequence.
So to make things a bit more hopeful, let me start with
and then take
. The advantage of this is that I can correct the
and
problems by taking
. However, let me learn from my earlier mistakes and take
. Now I have problems with
(so far the sum is
),
(so far the sum is
) and
(so far the sum is
). As for the sum of the absolute values, I’ve got up to 1+1/2+1/2+1/4+1/4. I must stop for today, but I have the feeling that pursuing a construction like this I can get an example. Of course, it’s just a 1D example whereas I’m looking for something 2D-ish. But it might be a useful start.
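One could automate this bookkeeping. The sketch below is entirely my own, with an arbitrary update rule, and comes with no guarantee that it converges or keeps the absolute sum bounded: it tracks the sums $S_d=\sum_na_{dn}$ assigned so far and repeatedly spends a small correction on the worst offender.

```python
from collections import defaultdict

N = 10**4
a = defaultdict(float)
a[1] = 1.0
S = defaultdict(float)   # S[d] = sum of a_{dn} over terms assigned so far
S[1] = 1.0

budget_used = 1.0
pos = 2                  # next free position marker
while pos < N:
    # Find the divisor sum that is currently worst.
    d_worst, s = max(S.items(), key=lambda kv: abs(kv[1]))
    if abs(s) < 1e-3:
        break
    # Spend a correction of size |s|/2 at two far-out multiples of d_worst,
    # in the hope (not a guarantee!) of keeping the absolute sum finite.
    for n in (pos, 2 * pos):
        m = d_worst * n
        a[m] += -s / 2
        budget_used += abs(s) / 2
        for d in range(1, int(m**0.5) + 1):     # update every affected sum
            if m % d == 0:
                S[d] += -s / 2
                if d != m // d:
                    S[m // d] += -s / 2
    pos *= 3

print(budget_used, max(abs(v) for v in S.values()))
```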
September 3, 2012 at 9:03 am
I now realize that I don’t need absolute summability — see reply to previous comment but one.
September 3, 2012 at 8:57 am |
Another little thought. Suppose that $s=\sigma+it$ is a zero of the Riemann zeta function. What can we say about the total oscillation of the sequence $(n^{-s})$? We have $n^{-s}=n^{-\sigma}e^{-it\log n}$, so the sequence spirals in towards zero. Speaking roughly, there will be some number $\rho>1$ such that between successive powers of $\rho$ the sequence has done one rotation. So the total oscillation is going to be something like $\sum_k\rho^{-k\sigma}$. In particular, it is finite.
And now, if we sum along multiples of $d$ we get $\sum_n(dn)^{-s}=d^{-s}\zeta(s)=0$.
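Numerically the total oscillation does indeed seem to settle down (a sketch, using the approximate first zero $s\approx1/2+14.1347i$):

```python
import numpy as np

s = 0.5 + 14.134725j                     # approximately the first zeta zero
n = np.arange(1, 10**6 + 1, dtype=float)
seq = n**(-s)                            # n^{-s} = n^{-1/2} e^{-it log n}

osc = np.abs(np.diff(seq)).sum()
print(osc)   # partial totals stabilize, consistent with sum n^{-3/2} behaviour
```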
September 3, 2012 at 9:07 am
Hmm … what I wrote there was a bit fanciful, since the Dirichlet series doesn't actually have a convergent sum: the zeros lie in the critical strip, where $\sum_nn^{-s}$ diverges.
September 3, 2012 at 9:17 am |
If a series sums to zero and has bounded total oscillation, must it converge absolutely? I don't think so. Let's take the series $\sum_na_n$, where $a_n=\pm1/n_k$ for $n_{k-1}<n\le n_k$ and the sign changes at each $n_k$. This series doesn't converge absolutely. The total oscillation is, up to an additive constant, $\sum_k2/n_k$. To ensure that the sums over the blocks, which have absolute value $(n_k-n_{k-1})/n_k$, tend to zero as $k\to\infty$, we need $n_{k-1}/n_k\ge1-\delta_k$ for some sequence $(\delta_k)$ that tends to zero. That is, the $n_k$ must grow more slowly than exponentially. But there is no problem in doing that and getting the sums of their reciprocals to be bounded. For instance, we can take $n_k=k^2$. That gives us a series that converges. Adding an extra term on to the beginning, we can get the sum to be zero.
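Checking the $n_k=k^2$ example numerically (sketch only):

```python
import numpy as np

K = 400
nk = np.arange(0, K + 1) ** 2            # n_0 = 0, n_k = k^2
blocks = [(-1.0)**k / nk[k] * np.ones(nk[k] - nk[k - 1]) for k in range(1, K + 1)]
a = np.concatenate(blocks)

print(np.abs(np.diff(a)).sum())          # total oscillation: bounded (about sum 2/k^2)
print(np.abs(a).sum())                   # absolute sum: grows like 2 log K
print(a.sum())                           # the series itself converges
```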
September 3, 2012 at 3:21 pm |
I’ve just remembered something from the last time we tried EDP, which suggests that one has to be a little bit careful with this approach. It’s that it is not hard to give a non-trivial example of a linear combination
of products of intervals, with absolutely summable coefficients, such that
whenever
and
and
are not both 1. The rough idea is this. First you take the interval
for some large
and multiply it by itself. You now have
1s on the diagonal and
1s off the diagonal. Now pick a pair
of coprime integers that are at most
. Suppose that the sum along multiples of
is
. We can make it zero by picking very long intervals
and
in such a way that
contains
multiples of
, and attaching a coefficient of
to this product. Moreover, we can make
as large as we want, so we can make the coefficient as small as we want.
Once we have made this correction, we can go on and correct other pairs. Moreover, once we have ensured that the sum along multiples of
is zero, we can be careful to avoid those multiples in all future products of intervals that we use — just by making those intervals start a very long way away from 0.
So in a rather trivial way we can find an infinite matrix that’s a linear combination of HAP products with long HAPs and coefficients whose absolute values sum to at most 1, and we can do so in such a way that the sum along the diagonal is at least
and the sum along multiples of
is 0 whenever
and
are coprime and not both 1.
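Here is the correction step in code, as I understand it (my own rendering; the placement of the intervals is one arbitrary choice among many):

```python
import numpy as np

def correction(p, q, m, r, start):
    """Intervals I, J whose product contains exactly r multiples of (p, q),
    together with the coefficient -m/r that kills a ray sum of m."""
    I = (start * p, (start + r - 1) * p)       # multiples d*p for d in [start, start+r)
    J = (start * q, (start + r - 1) * q)
    return I, J, -m / r

# Sanity check on a finite grid: one ray of an all-ones N x N block.
N, p, q = 30, 2, 3
size = 1200
A = np.zeros((size, size))
A[:N, :N] = 1.0
m = sum(A[d * p - 1, d * q - 1] for d in range(1, N // q + 1))   # current ray sum

I, J, coeff = correction(p, q, m, r=100, start=50)
A[I[0] - 1 : I[1], J[0] - 1 : J[1]] += coeff
new_m = sum(A[d * p - 1, d * q - 1] for d in range(1, size // q + 1))
assert abs(new_m) < 1e-9
```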
I remember getting very excited about this, and then realizing that it doesn't prove anything. I need to check why it doesn't imply EDP, since that will make it necessary to impose an extra condition on the matrix $A$.
September 3, 2012 at 4:18 pm |
OK, suppose we have an infinite matrix
, indexed by the positive integers, with the following properties.
(i)
can be written as
, where
and
are characteristic functions of intervals of integers and
.
(ii)
.
(iii) If
and
are coprime and not both equal to 1, then
.
Let me try to deduce EDP from the existence of
. Barring a major surprise, I will fail, but I want to understand why.
Incidentally, there are two reasons that it would be extraordinary if the proof worked. The first is that I realized last time that it didn’t work. But that on its own is not a good enough reason — it’s easy to make mistakes when deciding that something can’t work. A better reason is that there is something too simple about this approach. A solution to EDP ought to involve more work, and I can’t escape the feeling that if an approach like this worked, then it would prove a number of other statements that we know to be false. (However, I don’t have a way of making that hunch precise.)
The first thing I’m going to do is create a matrix
out of
, this one indexed by the positive rationals. To do this, for each rational
define
by the formula
, and then let
. If
is the largest rational that goes into both
and
, then
for a pair
and
of coprime integers. Then
, which is non-zero only if
is an integer (since both
and
are integers and
is an integer combination of them). The values of
for which
is an integer are
, where
takes the values
. These sum to zero unless
, by property (iii) of
, while if
, so
, they sum to a constant greater than
.
What we have here is a way of expressing a multiple of the identity matrix indexed by $\mathbb{Q}^+$ as a linear combination of HAP products with coefficients that are in some sense "small". The sense in which they are small is that each matrix $A_u$ contributes more to the diagonal than it does to the sum of the absolute values of the coefficients, by a factor of at least $C$.
The hope was to use some kind of limiting argument to show that this clean and tidy infinite situation could be approximated by a less clean and tidy large finite situation. It was there that the trouble arose. I’ll start a new comment to think about it.
September 3, 2012 at 4:51 pm |
Let’s suppose I’ve got a
-valued function
defined on
. Ah, here’s where things got a bit hairy. As usual, let’s calculate
. On the one hand, since
is just
times the infinite identity matrix (indexed over
) we get
. On the other hand, if
has discrepancy at most
on every HAP, then the inner product
is at most
for any two HAP functions
and
, so the contribution to
from each
is at most
. That is,
(by property (i) of the matrix
).
So, speaking extremely loosely, on the one hand we get a contribution of
for every positive rational, since the diagonal contribution is
, and on the other hand we get a contribution of at most
for every positive rational, since the sum is
.
It would be great if this were some kind of contradiction when $t^2<c$, but clearly for that we need to go into a lot more detail about how things tend to infinity. I'll turn to that in a new comment.
September 3, 2012 at 5:14 pm |
One easy way to make everything finite is to take for $f$ not a function that is $\pm1$-valued everywhere, but rather a function that is $\pm1$-valued on some very large finite set of positive rationals and zero everywhere else. To give this any chance of working, we'd want the set to be "approximately closed" under multiplication by rationals with small numerator and denominator.
We know how to define such sets, but how big do we want "small" to be in this context? Here is where it gets problematic. One might think that it would be a good idea if numerators and denominators were allowed to go up to some $M$ such that in some useful sense "most of $B$" is defined on the set. But what does "most of" mean here? It can't refer to the sums of the absolute values of the coefficients $\lambda_i$, since we can make all but the first one arbitrarily small.
At this point one notices that even if the coefficients of the HAP products are small, the HAP products themselves are large in some pretty natural norms. For example, the Hilbert-Schmidt norm of $\lambda P\otimes Q$ is $\lambda(|P||Q|)^{1/2}$, which is 1 when $\lambda=(|P||Q|)^{-1/2}$, and we were in a situation where the length of the HAPs we needed to take was proportional to the reciprocal of the coefficient we attached to the product.
September 3, 2012 at 5:28 pm |
I don’t want to throw away these ideas completely, however. Let’s make sure everything is finite by looking at a very large positive integer
, and let’s go back to EDP where the HAPs have to have common differences that are either primes or powers of 2.
I’ll say that a matrix
has total oscillation at most
if
. I’ll also call a matrix
a dilate of a matrix
if
and
is zero unless both
and
are multiples of
.
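In code, the two definitions just given look like this (a sketch, zero-padding at the boundary; the function names are mine):

```python
import numpy as np

def total_oscillation(A):
    """Sum of |A(x,y) - A(x+1,y) - A(x,y+1) + A(x+1,y+1)|, padding with zeros."""
    P = np.pad(A, ((0, 1), (0, 1)))
    return np.abs(P[:-1, :-1] - P[1:, :-1] - P[:-1, 1:] + P[1:, 1:]).sum()

def dilate(A, d):
    """The dilate of A by a factor d: entry A(x, y) moves to (dx, dy)."""
    n = A.shape[0]
    B = np.zeros((n * d, n * d))
    B[d - 1 :: d, d - 1 :: d] = A        # positions that are multiples of d
    return B
```

Conditions (ii) and (iii) below are then statements about `total_oscillation(A_d)` and about the sum of the matrices `dilate(A_d, d)`.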
I would like to find a matrix $A_d$ for each allowable common difference (that is, a prime or a power of 2) such that
(i) each $A_d$ is an $n_d\times n_d$ matrix, where $n_d=\lfloor N/d\rfloor$;
(ii) the sum of the total oscillations of the $A_d$ is at most 1;
(iii) if $B_d$ is the dilate of $A_d$ by a factor $d$, then $\sum_dB_d$ is a diagonal matrix with trace at least $C$.
Now this looks really pretty hard if almost all the $d$ have to be prime, so I would like to try to argue that it cannot be done. If it can't, then we might manage to find a very interesting sequence with bounded discrepancy (for one of the modified problems). If it can, then we're in great shape for EDP.
September 3, 2012 at 7:04 pm |
Here’s the intuitive, but very possibly wrong, reason that it seems to me not to be possible to find a set of matrices of the above form. If each
has low total oscillation, then there ought to be a fairly large rectangle inside which all the
are roughly constant (or rather, roughly constant when both coordinates are multiples of
and zero elsewhere).
Now let’s just look at primes and let’s suppose that the
are actually constant matrices. Then the value of the sum of the
at
can be expressed by a formula of the form
. But that expression just doesn’t look as though it can be anything like constant: for instance, if we restrict attention to small primes, then in any large rectangle we can find pairs
and
such that the highest common factor is a product of all small primes for which
is large and positive, and other pairs that are coprime.
The weaknesses in that argument are that larger primes do exist, and they make things more complicated, and also that, as I noted earlier, one can impose quite a lot of oscillation on, say, the sequence
, while still having a bounded total oscillation. So the large rectangle doesn’t obviously exist.
September 3, 2012 at 10:21 pm |
I’ve just realized something else, which makes the whole discussion more complicated. I was implicitly assuming that we would try to decompose a diagonal matrix as a linear combination of products of HAPs with the same common difference. But if we do that, then for every prime
there are lots of points
that are contained only in products where the common difference is
, since a positive proportion of all points
have highest common factor
.
I think this means that for the primes and powers-of-two problem one basically has to take products of HAPs with different common differences.
September 4, 2012 at 10:53 am |
I want to test that last claim more carefully. Suppose that we cannot represent any diagonal matrix of trace at least $C$ as a linear combination of products of HAP functions where the two HAPs in each product have the same common difference that belongs to a set $D$ and the sum of the absolute values of the coefficients is at most 1. The set of diagonal matrices of trace at least $C$ is convex, as is the set of all linear combinations of products of HAP functions with the properties just specified, so there must be a separating functional $\phi$, which we can regard as another matrix. If $\langle\phi,\Delta\rangle\ge1$ for every diagonal matrix $\Delta$ of trace at least $C$, then every diagonal entry of $\phi$ must be at least $1/C$. But also, no two diagonal entries can be distinct, because then we could add something to one diagonal entry of $\Delta$ and subtract the same amount from another, in such a way as to make the inner product with $\phi$ whatever we like. So the diagonal entries of $\phi$ must all equal some constant, and that constant must be at least $1/C$.
Now let’s think about what it says about
if
for every linear combination of products of HAP functions, where the two HAPs in each product have the same common difference that belongs to a set
and the sum of the absolute values of the coefficients is at most 1. That will be the case if and only if it is the case for each individual product. So we are asking for
whenever
and
are HAPs that both have common difference
, where
is required to belong to
.
This is a kind of 2D version of EDP. If we scale things up, then it asks the following. Do there exist a constant $c$ and a function $f:\mathbb{N}^2\to\mathbb{R}$ such that
(i) for every $n$, $f(n,n)=1$;
(ii) for every $d\in D$ and every pair $P,Q$ of HAPs of common difference $d$, the sum $\sum_{(x,y)\in P\times Q}f(x,y)$ has absolute value at most $c$?
In particular, do these exist if $D$ is the set of all primes or powers of 2? If so, then for that problem we are forced to consider products of HAPs of different common differences.
Maybe this is an interesting problem to add to our collection of EDP-related questions, but I think there may turn out to be a fairly simple example that has the given properties.
September 4, 2012 at 12:44 pm |
Hmm, there is indeed a reasonably simple example. Start with a periodic doubly infinite sequence $(a_n)$ of period 9 with the following properties.
(i) $a_1+a_2+\dots+a_9=0$.
(ii) $a_3+a_6+a_9=0$.
(iii) $|a_n|\le1$ for every $n$.
An example is the sequence where $a_1,\dots,a_9$ are $1,-1,-1,1,-1,1,1,-1,0$. The discrepancy of this sequence along any HAP of common difference that is not a multiple of 9 is at most 1. Now let us define $f(x,y)$ to be $a_xa_y$. Then we certainly have property (i) from the previous comment, at least when $n$ is not a multiple of 9 (a blemish I'll ignore). How about property (ii)? I'm fairly sure we get it with $c$ at most something like 6, for all $d$ that are not multiples of 9, that is.
Here’s a quick proof. For every
and
the sums
and
are both zero. But the sum over
can be decomposed into a sum of these zero sums plus the sum over a product of HAPs of length at most 8. So the discrepancy is bounded (by a constant I can’t be bothered to work out — it’s trivially at most 64 but that can be improved quite a lot; in fact, it’s trivially at most 16; in fact, it’s trivially at most 8, but even that isn’t best possible).
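As a quick check of the example above (a little script of my own; the bound of 1 really is attained and never exceeded):

```python
a = [1, -1, -1, 1, -1, 1, 1, -1, 0]          # a_1, ..., a_9, repeated periodically

def a_(n):                                    # 1-indexed, period 9
    return a[(n - 1) % 9]

worst = 0
for d in range(1, 200):
    if d % 9 == 0:
        continue
    s, m = 0, 0
    for j in range(1, 2000):
        s += a_(j * d)
        m = max(m, abs(s))
    worst = max(worst, m)

print(worst)        # prints 1: discrepancy at most 1 off multiples of 9
```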
This shows that we can’t use products with the same common difference if
consists only of non-multiples of 9. I have a strong suspicion that we can do the same for non-multiples of
for an arbitrary
. All we need is numbers
such that
and such that for every
that divides
apart from
itself we have
. It is trivially possible to satisfy those equations. Just choose the numbers arbitrarily except that each time you reach
for some proper factor
of
make sure that it cancels out the sum
.
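In code, the triangular scheme just described might look like this (my own sketch, with random choices at the non-divisor positions; note that the values at the divisors of $m$ may then exceed 1 in absolute value, which affects only the constant in the discrepancy bound):

```python
import random

def periodic_block(m, rng=random.Random(0)):
    """a_1..a_m with sum over multiples of d equal to 0 for every proper divisor d of m."""
    a = [0.0] + [rng.choice([-1.0, 1.0]) for _ in range(m)]   # a[1..m], arbitrary
    divisors = [d for d in range(1, m) if m % d == 0]
    for d in sorted(divisors, reverse=True):
        a[d] = -sum(a[j] for j in range(2 * d, m + 1, d))
    return a[1:]

a = periodic_block(12)
for d in range(1, 12):
    if 12 % d == 0:
        assert abs(sum(a[j - 1] for j in range(d, 13, d))) < 1e-9
# The resulting periodic sequence then has bounded partial sums along any HAP
# whose common difference is not a multiple of 12.
```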
This shows that if we want to decompose a diagonal matrix efficiently into products of HAPs of the same common difference, and if we insist that the differences all belong to some set $D$, then $D$ must contain multiples of every positive integer (or of every small positive integer if we are thinking more quantitatively).
But we’re still left with a 2D question related to EDP that might turn out to be interesting. Let
be an arbitrary function such that
for every
. Is it true that for every
there must exist HAPs
and
with the same common difference such that
?
If
is forced to be of the form
, then this question reduces to EDP. Since we are looking at a far more general class of functions, and since EDP appears to be “only just true”, it seems highly likely that the answer to the above question is no. I just haven’t seen it yet.
September 4, 2012 at 1:04 pm |
Suppose we restrict attention in the problem just discussed to functions $f$ of the form $f(x,y)=h(x-y)$. Let $P=\{d,2d,\dots,rd\}$ and $Q=\{d,2d,\dots,sd\}$. Then $\sum_{(x,y)\in P\times Q}f(x,y)=\sum_{i\le r}\sum_{j\le s}h((i-j)d)$. We can write this as $\sum_kh(kd)\,\bigl|\{(i,j):i\le r,\,j\le s,\,i-j=k\}\bigr|$. This equals $\sum_kh(kd)\mu(k)$, where $\mu$ is the convolution of the characteristic functions of the intervals $[1,r]$ and $[-s,-1]$.
This gives us quite a strange EDP-like problem. We have a class $\mathcal{F}$ of functions of the form $k\mapsto\mu(k/d)$ (where $\mu$ is a convolution of two intervals as above, $d$ is the amount by which we dilate it, and the value is 0 when $d$ does not divide $k$) and we'd like to find a function $h$, defined on $\mathbb{Z}$, that has a bounded inner product with every function in $\mathcal{F}$. Of course, something has to force $h$ to be "large" (so we can't just take the zero function). At first, the condition appears to be ludicrously weak (all we ask is that $h(0)$ should equal 1), but it isn't weak, because the functions in $\mathcal{F}$ are not bounded at zero.
As ever, we can dualize. It will be impossible to find such a function $h$ if one can write the function $\delta_0$ (which takes the value 1 at 0 and 0 everywhere else) as a linear combination of the functions in $\mathcal{F}$ in such a way that the absolute values of the coefficients can have arbitrarily small sum.
Another way of looking at the dual problem is that we try to find an efficient matrix decomposition into HAP products where the common differences in each product are the same, but instead of asking for the resulting linear combination to be a diagonal matrix with large trace, we ask merely that the matrix has large trace and that all diagonals apart from the main diagonal sum to zero.
September 4, 2012 at 2:58 pm |
Actually, I don’t see that taking HAPs with different common differences helps all that much. Suppose we have our numbers
extended periodically as in the previous comment but one and, as then, we define
to be
. If
and
are not multiples of
then for every
we have
and
. Therefore, if
and
are HAPs with common differences
and
, we can subtract from
a whole lot of zero contributions until we’re left with the sum over a product of HAPs of length less than
. So the discrepancy is bounded.
If that is correct, then we can set
and prove that there is no efficient decomposition of the identity into products of HAPs of prime-power length. That ought to prove that some generalization of EDP is false for such HAPs. I’ll try to work out the details — I’m not quite sure that this is correct.
September 4, 2012 at 4:03 pm |
Looking back, I see that the proof that the nonexistence of a HAP-product decomposition implies the failure of the non-symmetric vector-valued EDP is very simple but also slightly strange. You take a separating functional $f$ as above, thought of as a matrix with 1s down the diagonal. You then take your two sequences of vectors to be the rows of the matrix and the standard unit vector basis. Then we have that $\langle x_n,y_n\rangle=1$ for every $n$ and that $\sum_{p\in P,q\in Q}\langle x_p,y_q\rangle$ is equal to the sum of the matrix values over $P\times Q$. What's odd about this is that the norms of the $x_n$ tend to infinity, so we can't use compactness. So it looks as though we can find, for any $n$, sequences of vectors $x_1,\dots,x_n$ and $y_1,\dots,y_n$ such that $\langle x_m,y_m\rangle=1$ for every $m$ and $\sum_{p\in P,q\in Q}\langle x_p,y_q\rangle$ has modulus at most 2 (say) for every pair of HAPs $P$ and $Q$ with common difference not a multiple of 6. But it doesn't seem to follow obviously from that that we can find infinite sequences with the same property.
September 4, 2012 at 5:47 pm |
Maybe that problem doesn’t exist actually. Let
be the standard basis of
. Suppose we define
to be
and
to be
. (All indices are mod 6.) Then
for every
, exactly as we want. In other words, the periodicity allows us to collapse everything down to six dimensions.
So I’m pretty sure that is a very simple example that shows that the non-symmetric vector-valued EDP relies strongly on taking HAPs of every common difference. Or at least, you have to have multiples of every positive integer amongst your common differences.
That doesn’t answer the question about whether EDP is true for common differences that are primes or powers of 2, but it says that a 6D generalization is false.
It is genuinely unclear to me whether the 6D counterexample should lead one to believe that there is a 1D counterexample. In one dimension there is a parity argument (in any $\pm1$ sequence of odd length the number of 1s cannot equal the number of $-1$s) that has no counterpart in higher dimensions. Maybe that is just a superficial fact that stops a certain example from working. But maybe all examples have to be "essentially periodic" in some sense, in which case there is a genuine distinction between 1D and higher dimensions.
Perhaps we should try to find a complex example …
September 4, 2012 at 10:19 pm |
Hmm … suppose you take the sequence $z_n=\omega^n$ with $\omega=e^{2\pi i/6}$. Then if $d$ is not a multiple of 6, the discrepancy along HAPs (or even APs) of common difference $d$ is at most 2. So the primes and powers-of-2 version of EDP does indeed fail in the complex case (as does the prime-powers version).
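A quick numerical confirmation (sketch only):

```python
import cmath

w = cmath.exp(2j * cmath.pi / 6)

worst = 0.0
for d in range(1, 100):
    if d % 6 == 0:
        continue
    s = 0.0
    for j in range(1, 10**4 + 1):
        s += w ** (j * d)
        worst = max(worst, abs(s))

print(worst)    # approximately 2, attained when d is 1 or 5 mod 6
```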
What I think this (embarrassingly simple) observation shows is that if we want to prove EDP for primes and powers of 2, then we are forced to use in a strong way the fact that the sequences are $\pm1$-valued: we can't use matrix decompositions to prove the result, since they prove more general results that are false.
And what that shows — at least if you believe that matrix decompositions are the most promising approach — is that we are not likely to make EDP easier if we try to prove a stronger result by restricting the common differences that are allowed. It seems to be important to use all common differences.
September 4, 2012 at 11:03 pm |
Backtracking a bit, what about the earlier rather big strengthening of EDP that I was wondering about, if we don't restrict the possible common differences? The question was this. Suppose you have a constant $C$ and a function $f:\mathbb{N}^2\to\mathbb{R}$ such that $f(n,n)=1$ for every $n$. Must there exist HAPs $P$ and $Q$ with the same common difference (but if that's too much to ask, then I'm happy to drop that condition) such that $\bigl|\sum_{(x,y)\in P\times Q}f(x,y)\bigr|\ge C$?
On writing that, I have suddenly had an idea for a counterexample for the same-common-difference version. I'll describe it very roughly and think about it properly tomorrow. The rough idea is to paint $\mathbb{N}^2$ with diagonal stripes of constant width and alternating colours (the colours being 1 and -1), but at an angle whose tangent is a badly approximable irrational. The hope would be that if $P$ and $Q$ have the same common difference, then …
No, I don't think that works. Given any two real numbers $\alpha$ and $\beta$ we can find an integer $n$ such that $n\alpha$ and $n\beta$ are both very close to an integer, and I think that fact can be used to kill off the idea I had.
Actually, I think a counterexample to this can be turned into a counterexample to the non-symmetric vector-valued EDP with the added condition that $P$ and $Q$ have the same common difference. So maybe I believe that the answer is yes (but possibly without the same-common-difference condition). It also feels like a friendlier formulation of the problem, and one where it might be interesting to do some computer searches.
Just in case anyone's reading this (it feels a bit monomathic, so maybe nobody is), here's a search that would interest me a lot. How large a $\pm1$-valued matrix, or even just a matrix with 1s on the diagonal, is it possible to find such that the sum over every product $P\times Q$ of HAPs with the same common difference is at most … well, whatever small number is most interesting? Are there very big matrices where it's at most 2? What about 3?
September 5, 2012 at 6:00 am
Seems like a good problem to try with linear programming. I hope to find the time to try it.
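For what it's worth, here is a minimal sketch of how such a linear program might be set up (Python with scipy, which I'm assuming; all names are mine). The variables are the off-diagonal entries of the matrix together with the bound $t$ to be minimized, and the constraints run over all products of HAPs with the same common difference:

```python
import numpy as np
from scipy.optimize import linprog

N = 12
# Variables: f(x, y) for x != y (diagonal fixed to 1), plus the bound t.
idx = {(x, y): k for k, (x, y) in enumerate(
    (x, y) for x in range(1, N + 1) for y in range(1, N + 1) if x != y)}
nvar = len(idx) + 1                     # last variable is t

A_ub, b_ub = [], []
for d in range(1, N + 1):
    L = N // d
    for r in range(1, L + 1):          # P = {d, 2d, ..., rd}
        for s in range(1, L + 1):      # Q = {d, 2d, ..., sd}
            row = np.zeros(nvar)
            const = 0.0
            for i in range(1, r + 1):
                for j in range(1, s + 1):
                    x, y = i * d, j * d
                    if x == y:
                        const += 1.0   # diagonal entries are fixed to 1
                    else:
                        row[idx[(x, y)]] = 1.0
            # Encode |sum| <= t as: sum - t <= -const and -sum - t <= const.
            for sign in (1.0, -1.0):
                r2 = sign * row
                r2[-1] = -1.0
                A_ub.append(r2)
                b_ub.append(-sign * const)

c = np.zeros(nvar)
c[-1] = 1.0                             # minimize t
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * (nvar - 1) + [(0, None)],
              method="highs")
print(res.fun)                          # smallest achievable discrepancy bound
```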
September 5, 2012 at 8:05 am
That would be great. On further reflection, I think the $\pm1$ version is not so interesting (not that you could attack that with linear programming anyway): it seems to me that a counterexample to EDP could probably be turned into a counterexample to the $\pm1$ version. So I prefer to stick with the condition that the matrix is 1 on the diagonal.
September 5, 2012 at 8:30 am
Do you mean the other way around: a counterexample (= low discrepancy function) to the $\pm1$ version can be turned into a counterexample to EDP? Because the direction you wrote seems clear: just take $f=g\otimes g$ where $g$ is the low discrepancy function.
In fact, you constructed this problem as a weakening of vector-valued EDP. In the vector-valued EDP case the function $f$ would need to be PSD in addition to $f(n,n)=1$ for all $n$. I am not totally clear at what step the weakening came in. We took the dual of vector-valued EDP, and then restricted it more by looking for a decomposition of a diagonal matrix as opposed to a decomposition of a PSD matrix. Taking the dual of the dual with additional constraints resulted in this weaker discrepancy problem, I suppose?
One benefit is that this can be tried with linear programming and hopefully scale a little further than the semidefinite programming search.
September 5, 2012 at 3:56 pm
Let me try to sort out what I meant, or at least should have meant. First, if you have a $\pm1$-function on $\mathbb{N}^2$ of discrepancy at most $C$ (with respect to products of HAPs of the same common difference), then for arbitrarily large $n$ you can find a $\pm1$-function defined on $\{1,2,\dots,n\}$ of discrepancy at most $2C$. That is because if you look at the $m$th row of the 2D function for a suitable $m$ (a multiple of every $d\le n$, say $m=n!$), then any HAP of common difference at most $n$ along that row can be expressed as the difference between two products of HAPs of that same common difference.
Conversely, if you have a low-discrepancy $\pm1$-function on $\mathbb{N}$ then, as you say, its product with itself obviously has at most the square of that discrepancy with respect to products of HAPs.
Looking back at your question, it seems that I did indeed mean what you thought I meant, which is not what I actually wrote.
The one thing that I still haven’t quite seen the answer to is what happens if you look at products of HAPs of the same common difference and the same length, or perhaps even products of HAPs with themselves. I’m not sure how interesting that is, though.