The story so far:
,
and
have been discussing how it might be possible to prove lower bounds for formula or circuit complexity. They have come to some understanding of what a proof could not be like, and in particular they have looked at examples of proof attempts that are ruled out by the result of Razborov and Rudich on natural proofs. But
has proposed basing a proof on the dual of the norm, with
around
Several observations have been made that suggest that this is very unlikely to work, but the approach has not been definitively killed off. In the next stage of the conversation, our characters investigate the approach in more detail, finding further ways to attack it. They end on an optimistic note, but that optimism will be shaken in later instalments of the dialogue.
Incidentally, I’ve decided to let the rate that I post instalments be governed by the statistics: when they suggest that most people who are going to read an instalment have done so, then I’ll post the next one. It’s not an exact science, of course.
*************************************************
Before we dismiss the norm for unbounded
as a useful complexity measure, it still seems worth thinking about the following problem.
Problem. Let and
be two Boolean functions. If
and
, then how big can
be?
I suppose quite a nice feature of that problem is that you can start right at the bottom, with .
Yes indeed. I’ve been thinking about that. I have to say that it’s not obvious to me even that the answer is independent of . A preliminary observation is that if
is a basic function, then
, and if
and
are two different basic functions, then
. Both these facts can be proved by “cheating” and using the fact that
. The Fourier transform of
is
at one point and
everywhere else, while the Fourier transform of
is
at four points and
everywhere else.
Ah, I think I’ve come up with a fairly disastrous example. Choose a positive integer and take
to be a random function that depends just on
and
to be a random function that depends just on
. Then
is a randomish function of
. It’s not hard to show that the
dual norms of
and
are something like
. I’m not sure what the norm of
is, but I think it might be more like
. If so, then we’d be in big trouble.
What makes you think that the norm of is that big?
Well, somehow is a bit like a “tensor product” of
and
, and so the badnesses of
and
feel as though they could multiply together.
But what would the function be that norms ?
Good question. Let’s try to find one. I think that norms itself and
does too. (If not, I can create explicit examples of quadratic phase functions where that is definitely the case.) So we could try taking
as a norming function. Since
and
depend on disjoint sets of variables, it isn’t too hard to prove that
. Just to make things simpler, let’s look at
instead. Then, as we’ve already commented,
, so we want a lower bound for
Now the pointwise product of with
is just
, and similarly the pointwise product with
is
. Also, since
and
are “independent”, the expectation of
is zero. So all the first three terms are small. The last one, since
is a Boolean function, is
. Therefore,
. Since
, we get that
. So my prediction seems to have been correct, unfortunately.
If, as this calculation suggests, pointwise maxima somehow behave more like tensor products than sums, then one might expect to be a formal complexity measure, or at least to have more of a chance of being one. But that’s not very encouraging, since the largest value it can take is not going to be more than
, so there would be no hope of proving superlinear lower bounds.
Well, I can’t say that came as a huge surprise.
The argument seems to apply to any dual norm if the original norm has what one might call the product property: that configurations in a product group are of the form
, where
is a configuration in
and
is a configuration in
. Unfortunately, that’s a pretty common property for these norms to have.
That’s the price you pay for proposing a complexity measure without any thought for why it should do the job.
I think that’s slightly unfair. I did at least think about why it wouldn’t trivially not do the job.
Actually, hang on a moment.
Groan.
No, bear with me for two more minutes. Probably this is nothing, but it occurs to me that we ought to think a little bit harder about the argument above before being certain that it applies to the norm when
.
Why? Where was the dependence on ?
Well, it looks like a merely technical point at first, but I’m starting to wonder. You recall that we found Boolean functions and
for which we could prove that
Now in order for that to be significant, we need at least one of
and
to be bigger than
When
is bounded, that’s easy to achieve: you just pick a random function that depends on a small but bounded number of variables. But when
is unbounded, the number of variables it has to depend on becomes unbounded too. Indeed, a random function that depends on
variables will have
norm about
, so we appear to need
to be about
, which is bigger than
. So here’s a potential reason that the
norm could behave genuinely differently when
passes the
threshold.
Of course, we’re still in the realms of guesswork here, but this suggests we need to think a little further.
Can I interject at this point?
Of course.
Well, doesn’t the argument you’ve just given show that no function has small norm? After all, if random functions have the smallest
norms and you need more than
variables to get them to be small, then you seem to have a proof that all functions have large
norm. Doesn’t that just kill off the idea completely?
Oops. Nor for the first time I feel quite embarrassed. Indeed, the norm of an arbitrary Boolean function is at least
, which becomes constant when
. I was forgetting that when you calculate the
norm then the power you take is exponentially small in
, so
is actually a bigger function than it looks.
Can we stop this conversation now?
OK.
…………….
No, sorry, no, hang on a moment!
All right, but at the next collapse I’m off for some coffee, and I can’t promise you that I’ll be back.
Well, all I wanted to say was that
’s observation doesn’t kill off the approach at all. In fact, the more I think about it, the more I wonder whether we shouldn’t in fact be encouraged by it.
Your optimism really does know no bounds.
Just let me explain what I mean. The point I want to make is that even if all norms lie between
and
(or perhaps
and
or something like that), the
dual norm could still be a useful measure of complexity. It might be that all the norms of low-complexity functions were incredibly close to 1. Perhaps we would find that it was more natural to raise everything to the power
, in which case the largest possible value would be
, which is back to being acceptably large. In short, what I’m saying is that what matters is behaviour within a scale: if the scale looks a bit narrow, then rescale it.
That looks like another desperate manoeuvre, and perhaps it is. But why did I say that it might even be positively encouraging rather than merely not disastrous? The reason is that the argument that the dual of the norm is not a complexity measure really does fail when
is of order
, because then that seemingly innocuous division by 2 becomes much more important. So we’re back in the nice position of having a proposed measure of complexity, and no convincing reason for its not working.
Yes, but you don’t have any convincing reason in favour of its working either.
I fully admit that. All I’m saying is that we need to go back to the phase of trying to attack it.
There seem to me to be two natural lines of attack, and I can’t get either of them to work. The first is to look for a more sophisticated example of a pair of Boolean functions where or
is big given the values of
and
. The second is to try to come up with some low-complexity function
such that
is near to being maximal.
Let me start with the first approach. Now this is a bit strange, because so far, as a result of the division by 2, I haven’t even got an example where I’ve managed to show that is bigger than
But let’s have a look again at the example where
is a random function of
variables and
is a random function of a disjoint set of
variables. As I commented before,
If
is small and
is big, then it seems pretty clear that
is a good choice of function to norm
That is because
is much smaller than any of
,
or
so it is an efficient function to use for providing a lower bound for the
dual norm. But if
is very large, then the norms of all vectors are pretty similar and this argument breaks down. So we could for example consider taking
But what is ? Let’s assume that
and
are both at most
Then
is at most
, so by the triangle inequality it is at most
It follows that
is at least
Unfortunately, this is smaller than
which is the lower bound we know for
and
But that could be because of our inefficient use of the triangle inequality. What, for instance, do we expect
to be when
and
are random Boolean functions of disjoint sets of
variables each?
Hmm … I don’t yet see how to answer this question precisely enough. So let me make a general remark that clarifies a little
’s remark about the dual norm never being big. If is
and
is any function defined on
(not necessarily Boolean), then a trivial upper bound for
is
since if
for every
then the average in the expression for
is also at most 1.
But for large this is also a lower bound. Indeed, if
then when you choose a random
the probability that
and
is
It’s not too hard to prove that this implies that
So as long as
is at least comparable to
the
norm is equal to the
norm up to a constant. It follows that the
dual norm is equal to the
norm up to a constant.
Doesn’t that make it useless as a complexity measure?
No. As I’ve already said, but perhaps not quite in these terms, I am now trying to understand the and
dual norms at a finer level of detail than just equality up to a constant. In fact, if the approach stands any chance of working, I think it will be just this feature that distinguishes it from the
norms for bounded
and may lead to different behaviour.
I still don’t see any particular reason to suppose that for large the
norm of every low-complexity function is large.
OK, you’re moving to the second line of attack, so let’s have a go at that. A first idea is that when is bounded, if we believe the natural-proofs heuristic, then a random formula (I’m still thinking about formula complexity, by the way) gives an extremal example. So it makes sense to try to work out, or at least guess, what
is when
is a random low-complexity function. At the moment I don’t have a clear idea how to go about that. All I would point out is that, at least to start with, the functions one builds up all have norm 1. That’s because
. Therefore, if
and
are polynomial phase functions of low degree (meaning functions like
, with
a low-degree polynomial), then so are
and
.
Sorry to interrupt with a very basic question, but how do you prove that for ? Your proof applied to
only.
Yes, but basically anything you can do for you can do for
. That’s because
, which is one way of stating de Morgan’s laws.
I see. And if , then
, so you’ve got another function of the same kind.
Exactly. So as I say, it takes a while to produce a Boolean function that doesn’t have norm 1. That is slightly disconcerting, because it makes it a little hard to see how an inductive argument would work. But maybe one could come up with a more sophisticated complexity measure that looked at all dual norms at once and picked out the point at which they started to be large, or something like that.
Let me leave that question aside and turn to the question of trying to devise an explicit low-complexity Boolean function such that
is large.
If you recall from way back in the conversation, the natural explicit example of a function with very low norm was a quadratic phase function. I suggested the function
. Now why didn’t I just go for the simpler
? The reason is that if you use a low-rank quadratic form, then you can cut the function up into nice linear pieces: this function, for example, is constant on each of the four subspaces you get by fixing the coordinates
and
. The result is that you don’t get a small
norm after all.
With this thought in mind, it would seem that the natural choice for an explicit Boolean function with almost minimal norm would be a polynomial of degree
that in some sense has “high rank” so that it can’t be built out of lower-degree polynomials.
Now a problem I have when trying to build an explicit polynomial phase function of high degree is that it doesn’t seem to be easy to find one for which I can prove that it really does have a small norm. Incidentally, the unboundedness of
makes a difference here: when
is bounded we can just pick a random collection
of sets of size
, define
to be
(that is a function from
to
), and set
Computing this function will take something like
steps (and can easily be done by a formula of that size too). But that’s not a polynomial if
is unbounded, so we need to think of an explicit example.
For instance, we could try the obvious generalization of what I did in the case and consider the polynomial that you get when
consists of the
sets of the form
, where addition is mod
.
But if we restrict to the subspace , then this polynomial is identically zero. The codimension of this subspace is about
, so the
norm looks to me as though it is at least
, which is very close to 1, since
(Again, this argument does not work when
is bounded.)
That doesn’t really prove much, since I could obviously make some much cleverer choices of , though always with the strong constraint that the resulting polynomial must be of low complexity. But I can also make some much cleverer choices when I try to restrict the polynomial to a set where it is constant: for example, instead of restricting to a linear subspace, I can restrict to a degree-
submanifold of
That’s where I’ve got so far. I have to say that the question I like best out of the ones I’ve just raised (either explicitly or implicitly) is the one about random low-complexity functions and their norms and dual norms. If I could convince myself that a random low-complexity function almost certainly had a fairly large
norm, then I’d be pretty excited.
What you’re saying is reminiscent of a standard proof technique: the method of random restrictions.
What’s that?
Well, some nice lower bounds have been proved as follows. You take a function of low complexity, and you randomly fix a number of its coordinates. You then argue that this hugely simplifies the function — for instance, by making it almost constant. And then you point out that the parity function cannot be simplified unless you restrict all the variables. In this way it has been shown that the parity function has formula-size at least (this is a result of Andreev) and cannot be computed by a circuit of constant depth and polynomial size (this is a result of Hastad).
It seems to me that you are randomly restricting to a -dimensional subspace and hoping that a very slight simplification will occur that encourages your Boolean function to have even parity. It’s not quite the same, since you don’t expect this simplification to be all that dramatic — just faintly detectable.
Hmm. I think it is fairly different, but it’s quite a nice way of thinking about the problem, because if we take a random low-complexity Boolean function and restrict it to a random -dimensional affine subspace, the result ought itself to be some kind of random low-complexity Boolean function.
How do you mean?
Well, let be a random
-dimensional affine subspace and let
be the basic function
. Then there are two possibilities. Either
is contained in one of the two subspaces
or
, or we have two subspaces
and
, and the restriction of
to
is
on the first and
on the second. In other words, the restriction of a basic function is either constant or very similar to a basic function. And with high probability it is the latter, since a
-dimensional subspace will have constant
coordinate with probability
.
Since , this in fact shows that with reasonably high probability all the basic functions will end up non-constant when you restrict them to
. And then when you build up a low-complexity function
using Boolean operations, you do exactly the same to the restriction of
to
.
Does this mean that inside there is a little copy of
? Not at all, because if you look at the intersections of the sets where the basic functions are constant, then you find that they don’t behave as they should. To give a trivial example, the intersection of all the sets
is almost certainly empty.
And now my intuition is starting to desert me again. What I want to understand is what happens when I just build up a function randomly using Boolean operations. Could it be that then the random restriction cleverly manages not to lose information? I think what I mean by that is something like that one should be able to reconstruct the original function pretty well from its restriction to a random -dimensional subspace.
That feels plausible somehow. But if it is true, then why would it help to produce a large norm? One might think it would do the opposite, since all those random restrictions of random low-complexity functions would still look random. It’s not clear where the bias towards even parity would come in. But that thought may be orthogonal to what is really going on: even if a random restriction does indeed look pretty random, we are looking for just a small proportion of subspaces where it isn’t random. In other words, we are interested in a low-probability event.
If I had to evaluate what I’ve heard so far, I think the point where your approach is most vulnerable to attack is one you mentioned just before we started talking about random restrictions. You’ve just mentioned that if it were possible to take a random polynomial of degree then it would have a very low
You then made a rather pathetic effort to come up with an explicit example that would do the job, and were encouraged by the fact that it failed. But perhaps there is a clever method of defining a pseudorandom polynomial of degree
Do let me know if you find one. But I’m having trouble doing so. And bear in mind that there’s an important distinction between explicitly defining a set system that looks like a random collection of sets of size
and defining a low-complexity function
such that
is a polynomial of the form
where
looks like a random collection of sets of size
Also, now that I think about it, there is a better reason that my “pathetic” attempt, as you put it, failed. That is that the polynomial is almost always zero, so the polynomial phase function that results is almost always 1. Indeed, the probability that it is not zero if you choose a random
is at most
which is very small when
is
say. One could cure that by taking a sum of
monomials, but if it’s a small challenge even to get the polynomial not to be almost constant, a lack of quasirandomness that even the
seminorm will pick up, think how much harder it will be to make it quasirandom for the
norm.
Ah, but I’ve now had an idea. It’s a way of building up a random polynomial of degree at most and complexity at most
I’m going to think of it as a function from
to
rather than to
I then allow myself three operations, which are
and
The first two of these are Boolean operations (NOT and AND) and the third, addition mod 2, or parity, is easily built out of Boolean operations.
If you can easily build it out of the other two operations, then why include it?
Because I’m going to insist that it is used a lot. Once I’ve got these operations, I can build up functions rather as one does with circuit complexity: I just take a sequence of functions such that each
is built from earlier functions
and
, or possibly just
, using one of the permitted operations. Because parity can be built easily out of
and
(and I should have mentioned that multiplication is
), the circuit complexity of
is at most
. Also, the degree of
is the maximum of the degrees of
and
if
, and the sum of their degrees if
(and the degree of
if
). Therefore, if I want to keep the degree down, then all I have to do is make sure that I use multiplication only a limited amount.
One way of doing that is as follows. I start by producing a “layer” of polynomially many functions of the form , where the
are basic functions. From these, I form a second layer of polynomially many functions, each of which is a product of two functions from the first layer. Then for the third layer I take sums of functions from the second layer, and so on. I can keep this up for
layers before I will start getting polynomials of degree
My question now is this. If I do that randomly, then do I get a polynomial that is random enough for the corresponding phase function
to have a small
norm? What we’ve done is sufficiently linear that it’s not even obvious that its
norm will be all that small.
That at least I can answer. If you let your first layer consist just of the basic functions, your second layer consist of a random collection of products of basic functions, and your third layer be the sum of all those products, then you have a random quadratic, and I thought you said that random quadratic phase functions had tiny norm.
Oh yes, sorry. So we can be pretty certain that the random polynomials I’ve just constructed will have almost minimal norm for all bounded
But what happens when
gets to
? I can see a problem arising at the last layer. There we would be wanting to take the sum of a whole lot of functions, each of which would typically be a product of two polynomials of degree
. I don’t know, but it sounds to me as though such a polynomial might be rather “simple” and “of low rank”. And it might be that you had to add together a superpolynomial number of “low-rank” polynomials to get a “high-rank” one.
I’m a bit confused here. Ultimately what you want to prove is that all functions of polynomial complexity have large norms (and even small
dual norms). In order to test this hypothesis you are attempting to find a counterexample. And in order to do that you are going to a lot of effort to ensure that your example is a polynomial of low degree. But why bother? All you need is a counterexample. Why not just take a random function of complexity
?
It’s because that would result in a mere rephrasing of the question. It seems almost certain that either the hypothesis is true or a random function of complexity is a counterexample to it. But I don’t know how to argue that random functions of complexity
are so random that even the
norm fails to detect any nonrandomness. So instead of looking for a generic counterexample I am looking for the simplest one. And that is why I am restricting my attention to polynomials of degree at most
If there seem to be reasons for such an example not to exist, it may give us some kind of hint about why the hypothesis could after all be true.
OK, I think we can all agree that this problem doesn’t look completely straightforward. But still, it seems possible that somebody will have a clever idea for producing such a polynomial — indeed, your random polynomial may even be such an idea — and if they do then your dual proposal is destroyed instantly.
I admit that. But it looks to me like a nice problem, unless there’s some construction that computer scientists already know about, in which case I can’t really think of any consolation, other than the weak one that I will have learnt something.
If it failed for that reason, then perhaps you could analyse the failure and come up with a new barrier to proving complexity lower bounds.
I suppose that would be quite nice if it could be done.
I’ve thought of another way of producing randomish polynomials of degree in
variables. Perhaps it might work.
Uh oh. What is it?
I can think of many variants, but I’ll choose one arbitrarily. Pick some that’s at most polynomially large in
and choose a random function
I’m going for a function of polynomial circuit complexity, but if I wanted it to be in P I could replace “random” by “pseudorandom” there.
Now, given a Boolean sequence , form the
matrix
where
In other words, fill up the matrix with the values taken by the sequence
At this point one could just take the determinant, which would be a polynomial of degree
and I don’t see any reason for it to have a large
norm. But you insist on a polynomial of degree
so we have to be slightly cleverer. What I suggest here is to calculate the characteristic polynomial of
and then the coefficient of
in that polynomial. That will be a polynomial of degree
in the variables
and that is my proposed “random looking polynomial of degree
“.
Can we just think about how one calculates the characteristic polynomial in polynomial time?
Well, it’s just the determinant of and calculating determinants can be done in polynomial time, by using row operations.
I’m happy with that when you’ve got a matrix of numbers, but it doesn’t seem quite so straightforward when you’ve got variables involved.
Oh yes …
Right, I’ve done a search on the web, and it seems that there are polynomial time algorithms for calculating the characteristic polynomials of matrices over the reals (often known as “chemical graphs” in the literature). So one could just reduce mod 2.
Oh dear. If that’s right then it does indeed look like a pretty random polynomial of degree I can’t think offhand of a reason that it ought to have “low rank”.
One way to think about that is to see exactly what it is. You put all your s into an
matrix and then you sum up all the determinants of all the
submatrices, without worrying about signs since we’re working in
Each determinant is itself a sum over all products of
entries, where you pick exactly one from each row and each column. Since each product determines the
submatrix it has come from, the entire polynomial is the sum over all products of
entries of the matrix that do not have two entries from the same row or the same column.
I have two remarks here. The first is that it’s not so obvious that the determinant has polynomial formula size, so perhaps there is still some possibility of using the dual norm as a useful formal complexity measure. The other, which I don’t think you will like, is that I can’t think how I’d go about estimating the
norm of your “random-like polynomial”. So if I believe strongly enough that the
norm of a function of polynomial circuit complexity must be large, then I can believe that the very fact that your polynomial is in fact computable in polynomial time implies that it has large
norm.
You must be getting truly desperate to have said that.
I sort of am. But what I’m really saying is that if we don’t manage to come up with two Boolean functions and
such that the
dual norm of
is quite large, given the
dual norms of
and
(all this relative to the very fine scale that we must take, given that the norms of all Boolean functions are the same up to a constant), then I won’t feel I can completely rule out the existence of some inductive argument that shows that an explicit function has superpolynomial formula complexity. Who knows, it might even be the function that
has just thought of.
You’ve switched back to formula complexity again. But earlier you were talking about circuit complexity. You clearly haven’t let go of your dream of proving that PNP, but neither have you given the slightest reason to suppose that the
norm of a function of low circuit complexity must be large. And
has given us an example of a polynomially computable function that it’s very plausible to suppose has an almost minimal norm.
I almost agree with you. It’s just that I don’t see any strong arguments for whether
’s function has a large norm or a small one. Perhaps the fact that determinants are used gives the function some special structure that causes it to have “low rank” in some sense. I feel as though more thought is needed before we accept this example.
In fact, here’s an argument that this determinant construction isn’t all that different from my random polynomials construction. Unfortunately, owing to the annoyances of characteristic 2, it only seems to work when is odd. But in that case recall that we are summing over a particular set of products of
-tuples of entries from the matrix, namely the ones that do not involve two entries from the same row or column. To evaluate that we could take each entry of the matrix in turn, remove its row and column, work out the corresponding degree-
polynomial of the resulting
matrix, multiply it by the given entry, and add. This counts each
-tuple that we want to count
times, which is fine if
is odd.
That argument shows that the polynomial you have defined is a sum of polynomials that are “simple”, in the sense that they are products of a linear polynomial with a polynomial of degree
. So we’re back to the question we had earlier: in order for a degree-
polynomial to have sufficiently high rank for its
norm to be very small, is it enough to take a sum of only polynomially many “simple” polynomials? Given that
depends on
I don’t think we can claim that the answer is obvious.
It may not be much, but we seem to have arrived at a purely mathematical question that would shed quite a lot of light on this whole approach (either demolishing it or giving quite a good heuristic reason in its favour, depending on whether or not a polynomial number of “simple” degree- polynomials can be considered as having high rank).
I don’t know why you keep going on about this. My intuition tells me that you shouldn’t need more than simple polynomials to make a truly complex one.
Can you back that up?
A little, as follows. Consider the following simple polynomial of degree . I take two polynomials
and
of degree
and I define
to be
if
and
if
I can write
as
so
is a simplish polynomial of degree
according to your definition. Now let’s suppose I build random polynomials
like that, one for each coordinate
and define
to be
Then
looks to me like a polynomial that’s “truly of degree
“. In particular, a
-dimensional subspace isn’t going to detect any unusual correlations, as far as I can see, because as you change coordinates you keep changing from one random degree-
polynomial to another. And because we’ve got a simple polynomial for each coordinate, we can’t avoid these changes of coordinate. This isn’t a rigorous proof, but it convinces me for the time being.
Ah, but you haven’t said how to produce all those random polynomials of degree . You can’t just iterate the procedure, because then you’ve got to build each of those polynomials out of
polynomials of degree
and so on, and the computational complexity of the eventual polynomial is
or something. So you have to recycle the lower-degree polynomials (as you clearly do in both my random-polynomials construction and your determinant one).
Agreed, but all I was trying to criticize was your intuition that when is unbounded you need a superpolynomial number of simple polynomials to create one with small
norm. If you want to prove that a polynomial of low complexity cannot have small
norm, it is now clear that you can’t just look back one layer in the way it is constructed, but will have to argue from the entire construction, as you were starting to do just now. And that looks a lot harder.
Unless we could prove somehow that the dual norm was a good complexity measure. Then we’d be done by induction, for formula size at least.
By the way, a small change to my random-polynomials construction makes it look more like your determinant one. You build up layers as follows. If you’ve got a layer that consists of a bunch of polynomially many polynomials of degree
then you form a layer
by taking polynomially many random sums of the form
where
and each
is a polynomial drawn from layer
The advantage of this is that it goes on for
layers rather than
layers, so it looks more random. In fact, it looks more random than your determinant construction, so I’m no longer sure that yours affects my perception of the chances that low-complexity functions have large
norms.
By the way, I take your point about the lemma I was hoping for being highly likely to be false. But my point is that I still don’t find it clear whether we can build up an explicit degree- polynomial with almost minimal
norm.
But that’s still the weakest-looking point of your approach in my view.
Quite possibly.
I don’t know whether I agree. I’ve just done a bit of Googling and got led to a paper by Viola and Wigderson that contains a number of ideas that are quite close to the ideas we are discussing here. Of particular relevance is a theorem that exhibits a polynomially computable function that has low correlation with any degree-$d$ polynomial. But the bound that they get for the correlation is of the form Since functions with very small
norm have very small correlation with degree-
polynomials …
How did you know that? It’s true, but you seem to have picked it up very quickly.
A form of telepathy. Anyway, what I’m saying is that if the result of Viola and Wigderson is state of the art (as it may well be given that it was published in 2008), then it suggests that finding an example to demonstrate that your approach doesn’t work may be a challenging problem, even if it doesn’t work.
I wouldn’t celebrate just yet. It suggests to me that getting the approach to work could be a very challenging problem even if it does work.
I don’t know whether you have any grounds for saying that, but it is obviously an important issue. Let’s suppose that it really is true that the dual norm is small for Boolean functions of low complexity — and as
has pointed out, there is probably no example known that contradicts this.
I have to admit that that is a point in your proposal’s favour.
And let’s suppose also that there exists some function in NP has a large dual norm. There still remains the question of whether proving these two facts is any easier than proving that P
NP.
How can it be easier if it actually implies that PNP?
Oh dear, that’s one of those philosophical questions that keeps coming up. But this one has an answer, which, if you believe that PNP, or even if you don’t know how to prove that P=NP, you have to take seriously. If you want to prove that A implies C, it may be that there exists a statement B such that A implies B easily and B implies C easily, but it may also be that it is hard to think of statement B. (What I mean about P and NP is that it’s easy to see that B works when you’re told it, so what I’m saying doesn’t make sense if P=NP.)
When one is trying to prove that A implies C, one often tries to find a statement B that will have this nice property. If you have some candidate for B, then one obvious thing that can go wrong is that either A doesn’t imply B or B doesn’t imply C. But another thing that can go wrong is that B so obviously implies C that proving that A implies B is no easier than proving that A implies C. So my question is whether, if it is true that the dual norm of every low-complexity function is small, this is easier than proving that P
NP or whether it’s typical of the kinds of problems that are already known to be excruciatingly difficult.
One reason to suspect the latter is that it deals, implicitly at least, with polynomials of degree , and that seems to be the point at which people don’t know how to prove anything. But then again, we wouldn’t be trying to improve the result of Viola and Wigderson: we’d be hoping that it couldn’t be improved! And perhaps if we thought hard enough about the
dual norm we’d be able to say something interesting about the norms of
and
There seems to me to be at least a chance, if a slim one, that this property is an intermediate property of the kind one wants — not something that makes the whole problem easy, but at least something that splits it into two parts that are strictly easier than the original problem.
October 2, 2009 at 10:49 pm |
Hi,
I thought I’d point out a recent survey on correlation bounds for polynomials: http://www.ccs.neu.edu/home/viola/papers/viola-sigact-gf2.pdf.
Let me also take advantage of this to mention a few things:
Exhibiting an explicit function (on n bits) that has small correlation (1/n) with polynomials of low degree (log n) is necessary to prove circuit lower bounds (if for every distribution you can find a low-degree polynomial that correlates with f, then f can be written as a threshold of few such polynomials).
We have no reason to believe that this is hard to do (in particular, Razborov-Rudich is not known to apply to polynomials). In fact, Razborov and Smolensky did prove some correlation bounds for high-degree polynomials. While their bounds are worse than 1/n, there is no excuse like “we cannot say anything about these things.”
Here’s a candidate hard function: divide the input in blocks of sqrt(n) bits. Take Mod3 in each block, take parity of the results.
October 3, 2009 at 8:56 pm |
Dear Emanuele,
I am very interested by what you write (and also by the survey paper you mention, which I discovered and printed out in between writing and posting this instalment). I would like to understand it better, so let me ask you two questions.
I don’t see why it matters if f can be written as a threshold of polynomials of degree logn, because those polynomials don’t seem to me to be obviously computable in polynomial time. But perhaps there is some clever argument here that I don’t know, since the number of polynomials of degree logn appears to be comparable to the number of polynomial-size circuits.
Assuming, however, that it does matter. is the proof of the result you mention in parentheses something like this? I am guessing that “threshold of few such polynomials” means a reasonably general linear combination and not just some majority-like function. (Probably you can get them to be roughly equivalent by repeating polynomials.) Anyhow, if f is not a linear combination, with certain natural properties, of few such polynomials, then the Hahn-Banach theorem should give some kind of linear functional that separates f from all those polynomials and also has some other properties. That, I can imagine, would translate into a distribution with respect to which f does not correlate with any low-degree polynomial.
The problem of trying to obtain improved correlation results for the function you mention looks very interesting. I am particularly interested that you think it is a problem that is highly relevant to circuit lower bounds but potentially easier. For similar reasons I liked Ryan’s problem about threshold circuits. In both cases, I wonder whether they might make good Polymath projects.
October 3, 2009 at 11:00 pm
Hi Tim,
thanks for the interest (and the blog).
I am not sure I completely understand your first question, but let me try to answer anyway.
All I meant is that a polynomial of degree log n has only quasipolynomially many (n^{O(log n)}) terms and so it can be computed by a circuit of size (n^{O(log n)}).
Is the question about the difference between n^{O(1)} and n^{O(log n)}?
If so, indeed the circuit has slightly superpolynomial size. On the other hand, it has a very simple structure (just a threshold of polynomials).
Regarding the proof: indeed one can have a majority-like function by just repeating polynomials (i.e., the weights in the threshold only have polynomial magnitude).
If I understand it correctly, the proof you describe is similar to the one I have in mind.
The paper by Goldmann, Håstad, and Razborov
http://people.cs.uchicago.edu/~razborov/files/pt1.ps (Theorem 10) has a simple proof using the min max theorem: if for every distribution D on inputs one can find a polynomial that correlates well (1/n) with f with respect to D, then by min max there is a distribution on polynomials that for every input x correlates well with f(x). By concentration of measure, the sum of poly(n) independent polynomials has exponentially large correlation with f (and so some choice of the polynomials will work for every input x).
I am absolutely thrilled at the idea of having a polymath project on correlation bounds for polynomials! I would definitely like to be involved.
I think Ryan’s problem about threshold of thresholds would be great too.
I have thought more about polynomials, so my opinion is severely biased, but let me try to comment more on the two problems.
Let me stress right away the obvious thing that I feel I have absolutely no clue of which problem is more interesting or easier.
The threshold of thresholds problem looks “close” to being related to the “natural-proofs barrier:” It seems that it is known how to compute pseudorandom functions using only 4 levels of majority gates (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.27.5547&rep=rep1&type=pdf), and in general, I think, few levels of majorities can do amazing things. On the other hand, polynomials may be “further” away. (Or maybe not? An interesting point here might be trying to see if low-degree polynomials correlate with pseudorandom functions, which might have bearing on this research via an extension of the Razborov-Rudich approach. I tried this a bit but in vain.) Finally, for polynomials we can already say something for high degree, and correlation bounds with respect to the uniform distribution would give pseudorandom generators as a bonus.
Anyway, I think any project on low-level lower bounds would be exciting.
October 4, 2009 at 8:53 am
Yes it was indeed just a question of whether your remark applied to beating
rather than
. I think the proof you refer to must be exactly the kind of argument I had in mind: previous experience suggests that where I say “Hahn-Banach”, computer scientists say “min-max”.
A quick question: what is the best-known explicit function for not correlating with quadratics?
Also, apologies that your comments take a while to appear. For some reason they are being treated as spam (the above one perhaps because it had two links). I don’t know if there’s some way I can mark you as a legitimate commenter — I’ll look into it.
October 4, 2009 at 6:54 pm
Hi Tim,
for quadratics, there is a result by Frederic Green that computes exactly the correlation between parity and polynomials modulo 3 (the input is still {0,1}^n).
( http://mathcs.clarku.edu/~fgreen/papers/quad.pdf ).
To my knowledge, no such exact result is known in any other case. His paper also discusses some of the difficulties in extending his result.
For polynomials modulo 2, I am not aware of any result which improves on the exp(- \alpha n/2^d) bound for any d > 2, I think that it would be exciting to prove any bound that is better than 2^(- n/2^d).
As is well-known, this n/2^d in the exponent comes from the iterated Cauchy-Schwarz approach. The same loss appears in the best known lower bounds in multiparty communication complexity, so I think it would be great if one could understand this better.
October 4, 2009 at 7:03 pm
@Emanuele: Yeah, it’s basically the case [Yao90] that ACC0 is contained in MAJ o MAJ o MAJ. I think Yao showed that bounded-depth circuits with arbitrary AND, OR, NOT, and MOD gates of quasipoly size are computable by MAJ o MAJ o AND_{polylog} circuits of quasipoly size. So the lower bound scene looks grim even once you step up from THR o THR to MAJ o MAJ o MAJ. In that sense, I think THR o THR lower bounds are a bit of a dead end, since even if you get them, you would despair of extending things to the next level. (Still, I think it’s a kind of fun and possibly tractable dead end.) Correlation bounds for log-degree polynomials seems a bit more open-ended to me.
October 4, 2009 at 7:10 pm
Sorry, I still cannot get that paragraph to show right. One last try.
What I meant to say is that alpha is related to constructions of small-bias generators, that for degree two I would guess something can be done using the fact that the polynomials can be diagonalized up to a linear transformation of the variables, and that for degree bigger than two I think that any bound better than alpha = 1 would be interesting.
[I've corrected the first attempt now, and deleted the failed attempt to correct it.]
October 5, 2009 at 9:57 pm
A very minor correction to what you say. I think the bound to beat is
For example, here’s a sketch of how to obtain a bound of
in the case of quadratic correlation. Let
be a random cubic, by which I mean that
for some random collection
of triples. Now let
Then by two applications of Cauchy-Schwarz one can show that
The exponent on the right-hand side is a trilinear function of
plus a function of
and
only. The randomness of
should make the expectation over
zero almost always. Indeed (and this is the bit where I am being sketchy and possibly even incorrect, but I hope not) I believe the probability (over
and
) that it is 1 rather than 0 is pretty close to the trivial minimum of
. If that argument works, it gives
for quadratic correlation and in general it gives
. (It can also be regarded as the natural generalization of the argument that gives
in the linear case.) Sorry, I forgot to say that the above argument works if you add an arbitrary quadratic to
, which is how one deduces something about quadratic correlation.
One might object that a random cubic doesn’t give a uniform family of functions, but I suspect that one could derandomize it somehow.
Another question I wanted to ask. It seems that in the case of the Gauss sum
much better bounds are known than you get from iterated Cauchy-Schwarz arguments. For instance, I think you can get
by using Weil’s results. I would guess that the same is true if you replace
by any polynomial of degree
If so, then it is slightly surprising if there isn’t some analogue for
but perhaps there are good reasons for its being a genuinely different problem. (For instance, the number of quadratics is superpolynomial in
so one might expect correlation results to be harder to prove.)
October 5, 2009 at 11:47 pm
Regarding your correction, there has to be something very simple I don’t see:
?
achieves just
which indeed corresponds to
, right?
Are we thinking of the bound
Then your example
Perhaps some of (my?) confusion comes from the fact that actually
is a bit better than what is stated in some papers, which Cauchy-Schwarz
times to handle degree
.
Anyway, in your argument, shouldn’t be easier to take
to be a truly random function, rather than a random cubic? I would guess that if you can make the proof go through in that case then you can derandomize it (that’s what some of the known bounds do).
Ryan: I was wondering if you have some specific motivation for the cylinder intersection problem you mention. Is it connected to some problem in 3-party communication complexity? (By the way, I think superpoly lower bounds for bounded-depth circuits are due to Ajtai independently of FSS, then Yao improved them to get oracles that separate PH, i.e., superquasipolynomial bounds, I think.)
(I hope the latex will show up right; I wish there was a way to preview comments before posting them.)
October 6, 2009 at 12:10 am
Regarding the question on Gauss sum and Weil’s results, just for the record let me say that I don’t know: I think a few people thought about Weil’s bound but did not see how to make progress on the GF(2) problem.
From a biased computer science perspective, perhaps it may not be too surprising that the GF(2) problem may be different from the problem in larger domains.
October 6, 2009 at 2:04 am
@Emanuele: Oops, yeah, I meant to write Ajtai when I wrote Yao. Thanks. And yes, the motivation for Sergey’s problem is 3-party communication complexity… unfortunately I don’t remember the exact details.
October 6, 2009 at 6:26 pm
Responding to Emanuele’s last comment but one, I wrote something stupid. I think the bound you give for
can be improved by a factor of 2, but what I should have said is that you can take
Or at least, I’m pretty sure of this. So when
you get
which is indeed the bound you get from quadratics. And then the exponent goes down by a factor of 2 each time.
Hang on, I’m being even more stupid than I thought. I keep getting confused between the degree of the polynomial I am thinking about and the degrees of the polynomials I am trying not to correlate with. So you were indeed right all along and
is the correct factor.
I’m not sure I understand your remark about purely random functions: I thought it was fairly obvious that they didn’t correlate, just by a counting argument. But let me check, since I haven’t. The probability of correlating by
should be
and the number of quadratics is at most
So it looks as though we can organize for the best correlation with a quadratic to be about
In other words, within a log factor of the best possible bound. I don’t see how one could derandomize that, since precisely the same argument can be used to show that a random function has small correlation with all polynomially computable functions.
Probably I am misunderstanding what you write, though.
October 7, 2009 at 3:00 pm
Regarding the remark about purely random functions:
, i.e., you want that over the choice of
with high probability the bias over
is small (or something like that). I guess this is easier to prove when you take
to be a purely random function. As long as that’s the only property you need, that can be derandomized. Indeed, if I am not misunderstanding, that’s exactly what some previous bounds do. (Perhaps a minor point is that derandomizing a purely random function may give slightly worse parameters than derandomizing a cubic, say. So there might be some slight advantage in this, or at least a different tradeoff, but I am not sure about this and in any case it seems to have minimal consequences.)
It seems that your argument only exploits a certain “hitting-the-subcube” property of
October 7, 2009 at 3:25 pm
It seems to me that for such a derandomization to work, one would need to have a pseudorandom generator that is indistinguishable from a random generator by any quadratic. But I don’t see how constructing such a generator is any different from solving the original problem. My guess is that we are talking about different problems: perhaps you are talking about correlations like
whereas I am talking about correlations closer to
which would of course be best possible.
Ah, actually, I think I understand. You’re saying that to get
one just uses the fact that a generic cubic should have a small
norm. I agree that that would be easier to do by using random functions and derandomizing them. What I am saying (which doesn’t contradict you) is that the trivial argument that I gave in the second paragraph of my previous comment in this sequence gives a best possible result (more or less) for purely random functions but does not derandomize. So when you said “your argument” you were referring not to that argument but to the one that used iterated Cauchy-Schwarz.
October 3, 2009 at 11:05 pm |
Emanuele: I’m glad you posted and linked to your survey. I was going to do the same!
A collection of short comments:
. Tim: the proof you sketched of Emanuele’s parenthetical is right, I believe. Regarding your question about circuit lower bounds implying correlation bounds, perhaps Emanuele was referring to super-quasipolynomial lower bounds?
. The correlation problem(s) mentioned in Emanuele’s survey are very scary to me — a lot of heavyweights have worked on them since maybe Babai-Nisan-Szegedy in the late ’80s. I’m glad Emanuele is enthusiastic though; it’s true that there’s been tremendous progress on understanding low-degree polynomials lately, and no killer reason why a correlation bound for degree log(n) couldn’t be proved.
. On that note, here is a nice problem Sergey Yekhanin told me (I hope I’m stating it correctly): Show that a cylinder intersection in
of density
has a much larger than
fraction of 2 x 2 x 2 hypercubes. Being a cylinder intersection means that
is present iff
,
, and
all hold, where
are predicates on
.
. Ran Raz has the latest on formula size bounds for the determinant: http://www.wisdom.weizmann.ac.il/~ranraz/publications/Pmultlin.ps
[. On attributions: n^2 formula-size lower bound for Parity was Khrapchenko, I believe, and superpoly size for bounded-depth circuits was first Furst-Saxe-Sipser (and sometimes Yao is credited too).]
October 6, 2009 at 2:53 am |
I think it’s useful to state the following general form of the Goldreich-Goldwasser-Micali / Razborov-Rudich construction:
Under standard assumptions, for every time bound
there is a distributions over functions
such that each function is computable by a formula of size
and such that functions from the family are indistinguishable from truly random functions by probabilistic distinguishers running in time
.
(A sufficient assumption is that factoring
-digit composites of a certain type requires time at least
for a constant
.)
In particular, there is a family of polynomial size formulas such that their
norm (which is computable in time roughly
) has approximately the same distribution as the
norm of truly random functions.
This means that, to avoid the natural proof barrier, it is not enough to have a property that is not computable in time polynomial in
; the property should also be not computable in time
.
Another, more “philosophical,” obstacle is that if one has a property that distinguishes functions of formula size
from random functions, then the computation of the property must not only just take time at least
, but it must also “encode” the computation of a factoring algorithm operating on integers with
digits. More generally, it must encode computations of inverting algorithms for all one-way functions computable by formulas of size of
. This means that the property is already a nearly universal computation, and then exhibiting a specific function in NP which does not satisfy the property, has nearly “all” the difficulty of proving a lower bound for NP.
October 6, 2009 at 10:31 am
I’m very interested in what you write. The first part explains why some of the ideas I was playing with began to seem as though they couldn’t, even in principle, give superpolynomial bounds — which led me to focus more on superlinear bounds (which forced me to think more about circuit complexity). This comes up in a later instalment, but I’ll insert a note with a link to this comment of yours.
As for the second, I’ll think about it. I find this concept of a property that is useless because it is too universal an interesting one. Is there some way of formalizing it (and hence proving results that say that such-and-such a property cannot give a proof because it leads to a simple reformulation of the problem)? Informally, there seems to be a serious barrier there: natural proofs tell you that your property cannot be too simple, and this one tells you that it cannot be too “complex” — except that that’s not quite what I want to say. Somehow the second barrier should rule out things like properties that are defined inductively in terms of Boolean operations.
October 7, 2009 at 3:14 pm
Tim:
I don’t feel confident that there would be something to gain in considering superlinear instead of superpolynomial. I’d think it’s only a matter of optimizing the constructions mentioned in Luca’s comment to obtain that the natural-proofs obstacle applies to linear size circuits as well.
In particular there is a recent result that shows how to construct pseudorandom functions using linear-size circuits (“cryptography with constant computational overhead”). (From a cursory glance there might be a loss in security in their result which does not quite give the optimal connection, I have not checked that carefully, but to me it seems to indicate that linear size circuits are a difficult target already.)
October 7, 2009 at 3:30 pm
I don’t feel confident about that either! I just mean that the arguments I was trying seemed to force me to give up all hope of proving anything better, but in the end even superlinear seemed out of reach too. (In fact, in later instalments I have some speculations about linear-sized pseudorandom functions, which sound pretty similar to what you are referring to: I suggest first applying a clever linear-sized code and then doing a small amount of extra scrambling randomly.)
October 10, 2009 at 10:06 pm
Emanuele, even if you have linear size pseudorandom functions of exponential hardness, it remains possible that a complexity measure computable in time
would distinguish functions of circuit complexity
from random functions. Indeed, the following complexity measure is computable in time
and does perform such a distinguishing:
Then, of course, we are back to Tim’s question of whether the only complexity measures that do the distinguishing are those like the above one which shift all the difficulty of the lower bound proof into showing that a given function has large measure.
October 14, 2009 at 5:36 pm |
Luca, all I meant is that, probably, even proving a superlinear lower bound requires going beyond natural proofs.
But I now see that this comment may have been somewhat off-topic.