This is the second part of a dialogue in which three characters, :), 8) and :|, discuss issues connected with the P versus NP problem. There are now nine parts: if this number continues to increase at a faster rate than I post them, then I may have to increase the rate of posting.
The story so far: has put forward one or two ideas, and 8) has drawn attention to the work of Razborov and Rudich that places very serious constraints on what any proof that PNP could possibly look like. This makes precise an intuition that had been beginning to build up after several failed attempts to find such a proof. then enthusiastically discusses further possible applications of the basic idea of Razborov and Rudich (seemingly unaware that Razborov and Rudich themselves had had similar but much more precise thoughts of the kind), before retiring to think for a while about what a “non-natural” proof might look like.
8) Hello. Got anything for us today?
I’m not really sure. But I have had a few thoughts that I’d like to try out on you.
8) Fire away.
Well, the result of Razborov and Rudich tells us that if we want to define some notion of “structure” or “simplicity” that all functions of low complexity have and some function in NP does not have, then we have a choice. Either almost all functions must be simple, or it must not be possible to decide in polynomial time whether a function is simple. I have to say that I rebel against the first, so I’ve been trying to come up with properties that one might be able to prove something about, despite their not being in P.
If you take the attitude that any property not in P must be “weird” and “complex”, then this seems hopeless. But it’s gradually been dawning on me that there are plenty of properties that we understand perfectly easily even though we don’t have efficient algorithms for determining whether they hold. A pretty obvious class of such properties is NP!
For instance, if the property were “this graph on vertices contains a clique of size ” one can easily imagine proving that it holds for some specific graph, even if we do not have a polynomial-time algorithm for determining whether it holds for a general graph.
8) Yes, but if you allow properties in NP, then you allow the trivial universal properties such as “Has circuit complexity at most .”
Hang on just a moment. Can you explain why that’s in NP?
8) Certainly. If you want to convince me in polynomial time that a certain function from to has circuit complexity at most , then all you have to do is tell me a sequence of functions that you claim is a straight-line computation of . If you want to make it easy for me, you can also tell me what the Boolean operations are that produce each term in the sequence. I will then go away and check that for each of the input strings , if I apply the Boolean operations you have given me then I end up with .
In short, if you tell me how is built up using Boolean operations, it’s easy for me to check that you are telling me the truth. But it is not at all easy for me to work out from the values of what the Boolean operations were that led to those values.
OK, I get the general idea I think.
Hmm, so you’re saying that if I look at properties in NP then I run into the other difficulty—that I am allowing properties that lead to trivial reformulations of the original problem?
8) Well, it sort of looks that way.
Maybe. But it still doesn’t rule out that there might be some property in NP that does the job for us. For instance, perhaps for every polynomial-time computable function on one can find some subset of such that the restriction of to has a property that can be verified in polynomial time. But there might be so many candidates for that it would not be possible to tell in polynomial time whether this was the case. And perhaps it would be possible to prove for some specific function in NP that no such existed.
8) Well, in theory that might be possible I suppose. But the kind of thing you have just proposed won’t do as it stands, because if restricts to a nice function on and restricts to a nice function on some disjoint set , then it might be impossible to say anything about . You’d need some much stronger inductive hypothesis, such as that there were “many” sets for which the restriction of was nice.
Hmm. But then if is the set of all such that restricts nicely to , and is the set of all such that restricts nicely to , then it might be that the corresponding set for was something like . That’s a bit worrying, as the size of the intersection could go down quickly.
But wait. Is it as bad as all that? Let’s suppose that these collections had some kind of positivity property that meant that if has density and has density then has density at least . There are many examples of collections of sets with this kind of property. Then if basic functions had sets associated with them of density , then the density after steps would have to be at least . That sounds bad until one remembers that each set that we are talking about is a collection of subsets of . In other words, we are potentially talking about a set of size here, in which case a density of could be considered large even when is much larger than .
8) So, have you got any suggestion for a property along these lines. You appear to be wanting to say that is “structured” if there is a “large” and “nice” collection of subsets of such that the restriction of to each is “nice”. And you appear also to want it to be the case that if the restrictions of and to are both nice, then so are the restrictions of both and . Good luck!
I see what you mean. In particular, that last requirement is very strong indeed, since it implies that for every that we use, the nice functions on are closed under Boolean operations—or perhaps just approximately closed in some sense. But then, if niceness on is a property in P, it would seem to follow (from natural-proofs type considerations) that almost every Boolean function on is nice. (I admit that I don’t have a precise argument here—I’m just trying to make useful guesses.)
That wouldn’t quite kill off the approach, because it might be that for a typical function, even if the restriction to almost every was nice, it was nevertheless hard to find a nice collection of such sets. But rather than explore this desperate idea I’d like to mention something else I thought of.
8) OK, let’s hear it.
Well, I was trying to think of naturally occurring properties that are in NP and not obviously in P, and I suddenly realized that additive combinatorics can supply us with some.
8) Oh yes. Like what?
Well, an interesting phenomenon in additive combinatorics is that there are certain norms that can be computed in polynomial time, but that have dual norms that seem to be hard to think about except in an NP-ish way.
Whoah, you’re losing me here. What is a dual norm, and what on earth has it got to do with NP?
Sorry, I keep forgetting that many people have an irrational dread of norms. I myself find that I can’t really understand anything until I’ve phrased it in terms of norms. (That’s an exaggeration of course, but it contains a kernel of truth.)
Anyhow, suppose we’ve got a finite set and a norm defined on the vector space of all functions from to or . And let’s suppose also that we have defined an inner product to be . Here, is shorthand for .
We now define the dual norm by the formula
Note the trivial but useful inequality , which holds for any two functions and and any norm .
Now suppose that the norm of a function can be computed in polynomial time. It takes a little bit of effort to say precisely what that means, since we are dealing with real numbers here, but that’s a technicality I don’t want to get into. It’s sort of obvious in any individual case. For example, if I define to be , then that is in P because all I need to do to calculate is arrange the values of in non-increasing order and take the average of the first of them.
Next, suppose that we have a norm that can be computed in polynomial time and we want to prove that for some function . This statement belongs to NP because one can verify it by exhibiting a function such that and : note that checking those two inequalities can be done in polynomial time, the first one by hypothesis and the second one by just working out all those products and taking their average.
That’s all very well, but for the examples I can think of, it seems to be possible to calculate the dual norm in polynomial time. For example, assuming this concept of duality is basically the same as a concept I was once taught about in a functional analysis course, isn’t the dual of equal to , where ? If so, it would seem that at least for spaces the dual space is of the same complexity.
That’s what I thought as well to start with. Here’s a more sophisticated example. A well-known norm in additive combinatorics is the norm. If is a finite group and is a function from to , then the norm of is defined by the formula
Now the direct definition of would be
and if we define it that way then it looks like a function in NP that’s not obviously in P. However, it is an easy exercise to prove that is actually equal to , where is the Fourier transform of . But from that it follows from duality that is equal to , which can be calculated in polynomial time (since the Fourier transform can).
This shows that sometimes one can do a clever transformation, in this case the Fourier transform, that makes a dual norm easy to calculate, even if at first it appears to be hard to calculate.
8) OK then, what’s your point?
My point is that there are norms around where it’s nothing like so obvious how to do this. Here, for example is a problem that I don’t know the answer to. The norm of a function is defined by the formula
Now quite a lot is known about the norm. In particular, if a Boolean function has large norm, then a theorem of Samorodnitsky says that it must correlate quite well with something called a quadratic phase function. But there are around such functions, which is more than polynomially many in , and we don’t know how to express the norm or the dual norm of in terms of some easily calculated decomposition of into quadratic phase functions. So it is not at all obvious whether there is a polynomial-time algorithm for calculating .
8) Are you trying to suggest that the dual of the norm could be a useful complexity measure?
Well, it seems like a long shot, since I thought of it just as an example of a fairly natural function that wasn’t obviously in P, and have no reason to suppose that it really is a formal complexity measure.
Maybe we should think about that. But first I want to tell you another reason that I can’t help being slightly excited about dual norms. Recall that we are trying to define a property of functions that we think of as “simplicity” or “structuredness”. A natural idea would be to define a notion of quasirandomness and say that a function is simple if it isn’t quasirandom. The and norms can be thought of as measures of quasirandomness of a certain type, and they are very useful in additive combinatorics.
However, there is a problem, which is that if and are non-quasirandom, it certainly doesn’t follow that is non-quasirandom. For example, suppose that we divide our group up into two nice sets and . We define to be on and random on , and we define to be random on and on . Then neither nor is quasirandom (their and norms will be big) but is a random function.
However, if we use dual norms, we are in much better shape. We say that is simple if is not too big. Now what does this mean intuitively? It means that is not only not quasirandom, but it doesn’t even correlate with a quasirandom function. So the functions and above are completely ruled out. In general, it seems far more likely that we would end up with a complexity measure. After all, we’ve already got the triangle inequality, which isn’t what we want but does at least resemble it a bit.
So you’re saying that it’s no good using “is not quasirandom” as a definition, but “does not correlate with a quasirandom function” might be OK.
That’s exactly what I’m saying.
There’s something here that’s confusing me. Earlier, you mentioned what you called a trivial inequality, that .
Yes I did.
Well a special case of that is the statement that .
So if is a Boolean function, and therefore takes values everywhere, we get that .
Oh. I think I see where this is going.
So the only way can be small is if is reasonably large. Or to put it another way, if is very small, which is your interpretation of the statement “ is quasirandom,” then must be very large.
Thus, for to be a useful complexity measure, we need to be not all that small whenever is a function of low complexity. But if the norm can be calculated in polynomial time, then the property “ is not that small” is in P. Therefore, given all those results about natural proofs that we were talking about in our earlier conversation, almost all Boolean functions must have the property that is not all that small. That doesn’t instantly imply that cannot be big, because in theory could correlate with a quasirandom function that is not a Boolean function. But it is a bit worrying, since it no longer seems to make sense to describe the norm as small for quasirandom functions if it tends to be big for random Boolean functions.
Damn. I was hoping to use some “magic power of duality”, but the observation you’ve just made shows that even the fact that the predual of some norm is in P causes problems for that norm.
That’s annoying, because dual norms had other virtues.
What were those other virtues?
I just mean the things I’ve already mentioned: that the property of not even correlating with a quasirandom function seems much more likely to be closed under Boolean operations (in some useful sense that one would have to invent). I was also encouraged by the fact that the duals of the norms are known to have what one might call “algebra-like” properties. In particular, a crucial lemma in the proof by Green and Tao that the primes contain arbitrarily long arithmetic progressions asserts that certain special functions in the dual space have pointwise products with dual norms that are not too large. Of course, pointwise products are not the same as pointwise maxima and minima, but if and are functions that take values , then , for example, so there is some connection. The lemma of Green and Tao does not come close to what one would need for the dual norm to be a useful complexity measure, but I found it encouraging nevertheless. But now it seems that the whole idea is completely wrecked.
8) Perhaps we could formalize the difficulty and give a very slight (and very simple) extension of the Razborov-Rudich result. The reason your NP property (having a small dual norm) doesn’t work is that it implies a property in P (having a large norm) that holds for almost no functions. And that implies, by the work of Razborov and Rudich, that it cannot hold for all functions of low complexity. Therefore, your original property cannot hold for all functions of low complexity.
In fact, this is not an extension of the Razborov-Rudich result, since they have a notion of a naturalizable proof, which essentially means a proof that can easily be turned into a natural proof. And that is the case whenever the “simplicity” property you are considering implies another property that can be checked in polynomial time and does not hold for almost all functions.
That brings me to another point I wanted to make. It seems to me that coming up with an NP function that is quasirandom is not a particularly easy task, whereas coming up with an NP function that correlates with a quasirandom function ought to be significantly easier. I have examples up my sleeve to illustrate this, which perhaps I’ll talk about later. But for now I just wanted to point out that this was supposed to be another advantage of dual norms. The odd thing is that the failure of the proof seems to have nothing to do with the part of the argument where you show that some NP function is “complex”.
8) There. I told you you’d end up sadder but wiser.
I find that these feelings of sadness pass very quickly. I’ve just had another thought that might make things work.
8) I’m all ears.
Let me begin by saying what the norm is. It’s the obvious generalization of the and norms. If , then is defined by the formula
Here, is the operation of taking the complex conjugate, and is shorthand for So if your function is real you can ignore the
By the way, is it supposed to be obvious that that formula really does define a norm?
Oh, sorry. No, it’s not obvious, but the proof is not too hard either. It’s a generalization of the proof that the norm is a norm. It’s proved in a similar way, but one uses the Cauchy-Schwarz inequality more times.
Anyhow, the thought that’s occurred to me is that perhaps we’d get something useful if we took to be a slowly growing function of rather than just a fixed constant. Then the number of -tuples is no longer a polynomial in , so the norm no longer appears to be in P.
8) You do realize, I suppose, that there is more to proving a complexity lower bound than just identifying a non-natural property. Do you have any reason to suppose that this one works, or are you just trying to find an approach that doesn’t trivially not work? Given the state of play with the problem, even the second aim has something to be said for it, but if you just make wild guesses like this, then the likelihood is that they won’t actually work.
You’re absolutely right. At this stage I just want to try to think about what a non-natural proof could conceivably look like. But now that I’ve got a candidate, why don’t we try to attack it and show that it doesn’t work?
8) Sounds good to me. And here’s my first attack. What makes you so confident that this norm does not belong to P?
I’ve just explained that. There are about configurations to look at, so if tends to infinity then we have a superpolynomial number.
8) Yes, but do we really have to look at all those configurations? Why couldn’t we take a much smaller random sample and get a good approximation to the norm with high probability? I’m pretty sure that would make the property naturalizable. (Here I’m using the general philosophy that all randomized algorithms can be derandomized — I could go into that, but suffice it to say that if we are assuming that strong pseudorandom generators exist then it shouldn’t be a problem to use randomized algorithms.)
I see. That’s a serious attack. So let’s think about how big the random sample needs to be.
Basically, it needs to be big enough that if there is some non-quasirandomness around, we will find it. So to make it as hard as possible to find, how about if we squash all the non-quasirandomness into a small space? For example, we could choose a function to be entirely random on the whole of , apart from on some -codimensional subspace where it takes the constant value 1.
I can’t be bothered to work out precisely what the norm of this function is, but a woolly argument would be that when we choose a random , we get a contribution of if all of belong to , and on average everything else gives almost zero. The probability that all lie in is , so the norm ought to be something like . This will be large even when , in which case the probability that all lie in is . So in order to pick up any configurations at all that make a contribution to the norm, we would need a random sample of size at least . That would be superpolynomial in if is , or . (I’m taking logs to base 2 here.)
What this seems to show is that, for this approach to have any chance of working, we would have to take to be greater than . But that’s only , so the approach doesn’t seem to be killed off by your random-sampling objection.
8) Maybe not. Actually, I find your argument quite convincing. If all the non-quasirandomness of some function can be hidden in a very small subspace, it could well be quite hard to find that subspace.
I’m not sure I’m quite as convinced. With your example, the function was constant on a subspace. Surely there must be ways of identifying large subspaces on which a function is constant if it is otherwise random. For instance, couldn’t one just take the Fourier transform?
Yes, but that was just one example. Having found that, one could disguise it a bit. For example, instead of making constant on , one could instead make it a polynomial phase function of degree . The estimate above would still go through, and it is much less clear how one would identify a subspace on which an otherwise random function was equal to some arbitrary polynomial phase function of unbounded degree. And one could make it harder still by randomly flipping a few values in order to defeat any clever algebraic tricks.
OK, you’ve bludgeoned me into submission.
By the way, I should make clear that the complexity measure I would then go for is the dual norm of the norm. Although, as we’ve seen, this is not a philosophical advantage, the other advantages of dual norms were enough to make this a good thing to do. Another way I like to think of this: if a low-complexity function cannot be quasirandom, then it ought to be the case that no restrictions of it can be quasirandom, since they too are built up from simple functions by means of a limited number of Boolean operations. This idea of being “non-quasirandom everywhere” is much better captured by duals of quasirandomness norms than by the quasirandomness norms themselves.
Having said all that, I still don’t really believe that I’ve just stumbled on a genuinely useful formal complexity measure. Here’s why.
We already more or less know that for bounded the duals of the norms don’t give us formal complexity measures, because a random formula probably has almost maximal dual norm. Therefore, we must have many pairs of functions such that . Probably in the worst case, the left-hand side is greater than the right-hand side by at least a constant factor depending on .
Suppose we were very lucky and could prove that , for some constant that tended to zero with . One might at first think that one could just choose nice and large and end up with a good enough approximation to a formal complexity measure to be able to prove some serious results. But unfortunately something else is decreasing at the same time: the largest possible dual norm of a Boolean function. Roughly speaking, the th power of this is equal to the probability that a configuration is degenerate, which is of order .
What do you mean by a degenerate configuration?
Well, with the norm, for example, one takes an expectation over quadruples . Associated with each quadruple is a three-dimensional structure that consists of the points Normally, these points are all distinct, but if some of them are equal then we say that the configuration is degenerate.
Anyhow, I’m pretty sure that the largest possible dual norm of a Boolean function is around . So it is not enough for that constant to tend to zero: it has to do so “faster”, in a certain sense, than tends to 1. And there doesn’t seem to be any reason for that to be the case.
8) I’d like to try out another demolition job if I could.
Fine by me, since I’m feeling less and less emotionally attached to this particular idea.
8) Well, if I understand correctly your definition of the norm, there is a simple combinatorial interpretation of the norm of a Boolean function, at least if one ignores the degenerate cases.
Yes, I’m pretty sure there is. But why don’t you explain it?
8) Well, let be a non-degenerate -tuple and let be a Boolean function. Let be the subspace of generated by and let . Then the product that appears in the definition of the norm is if the number of points where is odd, and otherwise. In other words, it measures the parity of inside . So the th power of is the average parity of in a random -dimensional affine subspace of .
Thus, for a Boolean function to have a large norm, one needs the function to have even parity noticeably more often than odd parity, or vice versa.
A simple argument shows that the expectation is non-negative, so if there is going to be any bias of this kind it will have to be towards even parity.
8) OK, forget the “or vice versa”.
Now the parity of the restriction of to an affine subspace is an easily computable function, so if we believe that a random low-complexity is polynomially indistinguisable from a genuinely random , then it would seem to follow that is guaranteed to have even parity almost exactly half the time. And now by linearity of expectation, it seems to follow that the expected parity is small. So even when is unbounded, it seems that the norm of a random low-complexity is small, and therefore that its dual norm is large.
Hmm … it sounds as though we’ve hit a fairly general obstacle here.
Oh, but hang on a moment. There’s a flaw in your argument. You correctly show that the expectation of the th power of the norm of a random low-complexity function will be small, but how small will it be? Polynomial indistinguishability requires only that it should be at most for any fixed . But if we raise this to the power , we get . If we’ve chosen to be , then this is , which is more or less equal to 1.
The point I’m making here is that if is as big as , then even a very tiny bias, smaller than any power of , in the number of affine subspaces where the parity is even, is detectable by the norm. And the polynomial indistinguishability doesn’t tell us that we can’t find bias that’s as small as that.
8) Hmm, I’m going to come back to this point, because I’m still very suspicious of it. And there’s also your own objection to worry about — the one about getting smaller and smaller. In fact, if that looks very small indeed. And if you set then is no longer more or less equal to 1.
Yes, but somehow that feels like an objection to one particular proposal for a complexity measure. And that proposal was, as you say, just a pure guess, with no thought for why low-complexity functions might turn out to have small dual norms. Perhaps some more carefully thought out quasirandomness norm might do better.
Also, I have to say that I don’t want to dismiss the dual of the norm until I have understood rather better why it doesn’t work.