A conversation about complexity lower bounds, continued

This is the second part of a dialogue in which three characters, :), 8) and :|, discuss issues connected with the P versus NP problem. There are now nine parts: if this number continues to increase at a faster rate than I post them, then I may have to increase the rate of posting.

The story so far: 🙂 has put forward one or two ideas, and 8) has drawn attention to the work of Razborov and Rudich that places very serious constraints on what any proof that P $\ne$ NP could possibly look like. This makes precise an intuition that 🙂 had been beginning to build up after several failed attempts to find such a proof. 🙂 then enthusiastically discusses further possible applications of the basic idea of Razborov and Rudich (seemingly unaware that Razborov and Rudich themselves had had similar but much more precise thoughts of the kind), before retiring to think for a while about what a “non-natural” proof might look like.

****************************************************************

🙂 Hi again.

8) Hello. Got anything for us today?

🙂 I’m not really sure. But I have had a few thoughts that I’d like to try out on you.

8) Fire away.

🙂 Well, the result of Razborov and Rudich tells us that if we want to define some notion of “structure” or “simplicity” that all functions of low complexity have and some function in NP does not have, then we have a choice. Either almost all functions must be simple, or it must not be possible to decide in polynomial time whether a function is simple. I have to say that I rebel against the first, so I’ve been trying to come up with properties that one might be able to prove something about, despite their not being in P.

If you take the attitude that any property not in P must be “weird” and “complex”, then this seems hopeless. But it’s gradually been dawning on me that there are plenty of properties that we understand perfectly easily even though we don’t have efficient algorithms for determining whether they hold. A pretty obvious class of such properties is NP!

For instance, if the property were “this graph on $n$ vertices contains a clique of size $n/100$ ” one can easily imagine proving that it holds for some specific graph, even if we do not have a polynomial-time algorithm for determining whether it holds for a general graph.

8) Yes, but if you allow properties in NP, then you allow the trivial universal properties such as “Has circuit complexity at most $m$ .”

😐 Hang on just a moment. Can you explain why that’s in NP?

8) Certainly. If you want to convince me in polynomial time that a certain function $f$ from $G=\{0,1\}^n$ to $\{-1,1\}$ has circuit complexity at most $m$ , then all you have to do is tell me a sequence of $m$ functions that you claim is a straight-line computation of $f$ . If you want to make it easy for me, you can also tell me what the Boolean operations are that produce each term in the sequence. I will then go away and check that for each of the $N=2^n$ input strings $x$ , if I apply the Boolean operations you have given me then I end up with $f(x)$ .

In short, if you tell me how $f$ is built up using Boolean operations, it’s easy for me to check that you are telling me the truth. But it is not at all easy for me to work out from the values of $f$ what the Boolean operations were that led to those values.

😐 OK, I get the general idea I think.

🙂 Hmm, so you’re saying that if I look at properties in NP then I run into the other difficulty—that I am allowing properties that lead to trivial reformulations of the original problem?

8) Well, it sort of looks that way.

🙂 Maybe. But it still doesn’t rule out that there might be some property in NP that does the job for us. For instance, perhaps for every polynomial-time computable function $f$ on $\{0,1\}^n$ one can find some subset $X$ of $\{0,1\}^n$ such that the restriction of $f$ to $X$ has a property that can be verified in polynomial time. But there might be so many candidates for $X$ that it would not be possible to tell in polynomial time whether this was the case. And perhaps it would be possible to prove for some specific function in NP that no such $X$ existed.

8) Well, in theory that might be possible I suppose. But the kind of thing you have just proposed won’t do as it stands, because if $f$ restricts to a nice function on $X$ and $g$ restricts to a nice function on some disjoint set $Y$ , then it might be impossible to say anything about $f\vee g$ . You’d need some much stronger inductive hypothesis, such as that there were “many” sets $X$ for which the restriction of $f$ was nice.

🙂 Hmm. But then if $\mathcal{A}$ is the set of all $X$ such that $f$ restricts nicely to $X$ , and $\mathcal{B}$ is the set of all $X$ such that $g$ restricts nicely to $X$ , then it might be that the corresponding set for $f\vee g$ was something like $\mathcal{A}\cap\mathcal{B}$ . That’s a bit worrying, as the size of the intersection could go down quickly.

But wait. Is it as bad as all that? Let’s suppose that these collections $\mathcal{A}$ had some kind of positivity property that meant that if $\mathcal{A}$ has density $\alpha$ and $\mathcal{B}$ has density $\beta$ then $\mathcal{A}\cap\mathcal{B}$ has density at least $\alpha\beta$ . There are many examples of collections of sets with this kind of property. Then if basic functions had sets associated with them of density $1/2$ , then the density after $m$ steps would have to be at least $2^{-m}$ . That sounds bad until one remembers that each set that we are talking about is a collection of subsets of $\{0,1\}^n$ . In other words, we are potentially talking about a set of size $2^{2^n}$ here, in which case a density of $2^{-m}$ could be considered large even when $m$ is much larger than $n$ .

8) So, have you got any suggestion for a property along these lines. You appear to be wanting to say that $f$ is “structured” if there is a “large” and “nice” collection $\mathcal{A}$ of subsets $X$ of $\{0,1\}^n$ such that the restriction of $f$ to each $X\in\mathcal{A}$ is “nice”. And you appear also to want it to be the case that if the restrictions of $f$ and $g$ to $X$ are both nice, then so are the restrictions of both $f\vee g$ and $f\wedge g$ . Good luck!

🙂 I see what you mean. In particular, that last requirement is very strong indeed, since it implies that for every $X$ that we use, the nice functions on $X$ are closed under Boolean operations—or perhaps just approximately closed in some sense. But then, if niceness on $X$ is a property in P, it would seem to follow (from natural-proofs type considerations) that almost every Boolean function on $X$ is nice. (I admit that I don’t have a precise argument here—I’m just trying to make useful guesses.)

That wouldn’t quite kill off the approach, because it might be that for a typical function, even if the restriction to almost every $X$ was nice, it was nevertheless hard to find a nice collection $\mathcal{A}$ of such sets. But rather than explore this desperate idea I’d like to mention something else I thought of.

8) OK, let’s hear it.

🙂 Well, I was trying to think of naturally occurring properties that are in NP and not obviously in P, and I suddenly realized that additive combinatorics can supply us with some.

8) Oh yes. Like what?

🙂 Well, an interesting phenomenon in additive combinatorics is that there are certain norms that can be computed in polynomial time, but that have dual norms that seem to be hard to think about except in an NP-ish way.

😐 Whoah, you’re losing me here. What is a dual norm, and what on earth has it got to do with NP?

🙂 Sorry, I keep forgetting that many people have an irrational dread of norms. I myself find that I can’t really understand anything until I’ve phrased it in terms of norms. (That’s an exaggeration of course, but it contains a kernel of truth.)

Anyhow, suppose we’ve got a finite set $A$ and a norm $\|.\|$ defined on the vector space of all functions $f$ from $A$ to $\mathbb{R}$ or $\mathbb{C}$ . And let’s suppose also that we have defined an inner product $\langle f,g\rangle$ to be $\mathbb{E}_{x\in A}f(x)\overline{g(x)}$ . Here, $\mathbb{E}_{x\in A}$ is shorthand for $|A|^{-1}\sum_{x\in A}$ .

We now define the dual norm $\|.\|^*$ by the formula

$\displaystyle \|f\|^*=\max\{\langle f,g\rangle:\|g\|\leq 1\}.$

Note the trivial but useful inequality $\langle f,g\rangle\leq\|f\|^*\|g\|$ , which holds for any two functions $f$ and $g$ and any norm $\|.\|$ .

Now suppose that the norm of a function can be computed in polynomial time. It takes a little bit of effort to say precisely what that means, since we are dealing with real numbers here, but that’s a technicality I don’t want to get into. It’s sort of obvious in any individual case. For example, if I define $\|f\|$ to be $\max_{|B|=m}\mathbb{E}_{x\in B}|f(x)|$ , then that is in P because all I need to do to calculate $\|f\|$ is arrange the values of $|f|$ in non-increasing order and take the average of the first $m$ of them.

Next, suppose that we have a norm $\|.\|$ that can be computed in polynomial time and we want to prove that $\|f\|^*\geq C$ for some function $f$ . This statement belongs to NP because one can verify it by exhibiting a function $g$ such that $\|g\|\leq 1$ and $\langle f,g\rangle\geq C$ : note that checking those two inequalities can be done in polynomial time, the first one by hypothesis and the second one by just working out all those products and taking their average.

😐 That’s all very well, but for the examples I can think of, it seems to be possible to calculate the dual norm in polynomial time. For example, assuming this concept of duality is basically the same as a concept I was once taught about in a functional analysis course, isn’t the dual of $L_p$ equal to $L_q$ , where $p^{-1}+q^{-1}=1$ ? If so, it would seem that at least for $L_p$ spaces the dual space is of the same complexity.

🙂 That’s what I thought as well to start with. Here’s a more sophisticated example. A well-known norm in additive combinatorics is the $U^2$ norm. If $G$ is a finite group and $f$ is a function from $G$ to $\mathbb{C}$ , then the $U^2$ norm of $f$ is defined by the formula

$\displaystyle \|f\|_{U^2}^4=\mathbb{E}_{x,a,b}f(x)\overline{f(x+a)f(x+b)}f(x+a+b).$

Now the direct definition of $\|f\|_{U^2}^*$ would be

$\displaystyle \|f\|_{U^2}^*=\max\{\langle f,g\rangle:\mathbb{E}_{x,a,b}g(x)\overline{g(x+a)g(x+b)}g(x+a+b)\leq 1,$

and if we define it that way then it looks like a function in NP that’s not obviously in P. However, it is an easy exercise to prove that $\|f\|_{U^2}$ is actually equal to $\|\hat{f}\|_4$ , where $\hat{f}$ is the Fourier transform of $f$ . But from that it follows from $\ell_p$ duality that $\|f\|_{U^2}^*$ is equal to $\|\hat{f}\|_{4/3}$ , which can be calculated in polynomial time (since the Fourier transform can).

This shows that sometimes one can do a clever transformation, in this case the Fourier transform, that makes a dual norm easy to calculate, even if at first it appears to be hard to calculate.

8) OK then, what’s your point?

🙂 My point is that there are norms around where it’s nothing like so obvious how to do this. Here, for example is a problem that I don’t know the answer to. The $U^3$ norm of a function $f$ is defined by the formula

$\displaystyle \|f\|_{U^3}^8=\mathbb{E}_{x,a,b,c}f(x)\overline{f(x+a)f(x+b)}f(x+a+b)$

$\displaystyle \overline{f(x+c)}f(x+a+c)f(x+b+c)\overline{f(x+a+b+c)}.$

Now quite a lot is known about the $U^3$ norm. In particular, if a Boolean function has large $U^3$ norm, then a theorem of Samorodnitsky says that it must correlate quite well with something called a quadratic phase function. But there are around $2^{cn^2}$ such functions, which is more than polynomially many in $2^n$ , and we don’t know how to express the $U^3$ norm or the $U^3$ dual norm of $f$ in terms of some easily calculated decomposition of $f$ into quadratic phase functions. So it is not at all obvious whether there is a polynomial-time algorithm for calculating $\|f\|_{U^3}^*$ .

8) Are you trying to suggest that the dual of the $U^3$ norm could be a useful complexity measure?

🙂 Well, it seems like a long shot, since I thought of it just as an example of a fairly natural function that wasn’t obviously in P, and have no reason to suppose that it really is a formal complexity measure.

Maybe we should think about that. But first I want to tell you another reason that I can’t help being slightly excited about dual norms. Recall that we are trying to define a property of functions that we think of as “simplicity” or “structuredness”. A natural idea would be to define a notion of quasirandomness and say that a function is simple if it isn’t quasirandom. The $U^2$ and $U^3$ norms can be thought of as measures of quasirandomness of a certain type, and they are very useful in additive combinatorics.

However, there is a problem, which is that if $f$ and $g$ are non-quasirandom, it certainly doesn’t follow that $f\vee g$ is non-quasirandom. For example, suppose that we divide our group up into two nice sets $X$ and $Y$ . We define $f$ to be $-1$ on $X$ and random on $Y$ , and we define $g$ to be random on $X$ and $-1$ on $Y$ . Then neither $f$ nor $g$ is quasirandom (their $U^2$ and $U^3$ norms will be big) but $f\vee g$ is a random function.

However, if we use dual norms, we are in much better shape. We say that $f$ is simple if $\|f\|^*$ is not too big. Now what does this mean intuitively? It means that $f$ is not only not quasirandom, but it doesn’t even correlate with a quasirandom function. So the functions $f$ and $g$ above are completely ruled out. In general, it seems far more likely that we would end up with a complexity measure. After all, we’ve already got the triangle inequality, which isn’t what we want but does at least resemble it a bit.

😐 So you’re saying that it’s no good using “is not quasirandom” as a definition, but “does not correlate with a quasirandom function” might be OK.

🙂 That’s exactly what I’m saying.

😐 There’s something here that’s confusing me. Earlier, you mentioned what you called a trivial inequality, that $\langle f,g\rangle\leq\|f\|^*\|g\|$ .

🙂 Yes I did.

😐 Well a special case of that is the statement that $\langle f,f\rangle\leq\|f\|^*\|f\|$ .

🙂 Yes.

😐 So if $f$ is a Boolean function, and therefore takes values $\pm 1$ everywhere, we get that $\|f\|^*\|f\|\geq 1$ .

🙂 Oh. I think I see where this is going.

😐 So the only way $\|f\|^*$ can be small is if $\|f\|$ is reasonably large. Or to put it another way, if $\|f\|$ is very small, which is your interpretation of the statement “ $f$ is quasirandom,” then $\|f\|^*$ must be very large.

Thus, for $\|f\|^*$ to be a useful complexity measure, we need $\|f\|$ to be not all that small whenever $f$ is a function of low complexity. But if the norm $\|.\|$ can be calculated in polynomial time, then the property “ $\|f\|$ is not that small” is in P. Therefore, given all those results about natural proofs that we were talking about in our earlier conversation, almost all Boolean functions $f$ must have the property that $\|f\|$ is not all that small. That doesn’t instantly imply that $\|f\|^*$ cannot be big, because in theory $f$ could correlate with a quasirandom function that is not a Boolean function. But it is a bit worrying, since it no longer seems to make sense to describe the norm as small for quasirandom functions if it tends to be big for random Boolean functions.

🙂 Damn. I was hoping to use some “magic power of duality”, but the observation you’ve just made shows that even the fact that the predual of some norm is in P causes problems for that norm.

That’s annoying, because dual norms had other virtues.

😐 What were those other virtues?

🙂 I just mean the things I’ve already mentioned: that the property of not even correlating with a quasirandom function seems much more likely to be closed under Boolean operations (in some useful sense that one would have to invent). I was also encouraged by the fact that the duals of the $U^k$ norms are known to have what one might call “algebra-like” properties. In particular, a crucial lemma in the proof by Green and Tao that the primes contain arbitrarily long arithmetic progressions asserts that certain special functions in the dual space have pointwise products with dual norms that are not too large. Of course, pointwise products are not the same as pointwise maxima and minima, but if $f$ and $g$ are functions that take values $\pm 1$ , then $f\wedge g=(f+g+fg-1)/2$ , for example, so there is some connection. The lemma of Green and Tao does not come close to what one would need for the dual norm to be a useful complexity measure, but I found it encouraging nevertheless. But now it seems that the whole idea is completely wrecked.

8) Perhaps we could formalize the difficulty and give a very slight (and very simple) extension of the Razborov-Rudich result. The reason your NP property (having a small dual norm) doesn’t work is that it implies a property in P (having a large norm) that holds for almost no functions. And that implies, by the work of Razborov and Rudich, that it cannot hold for all functions of low complexity. Therefore, your original property cannot hold for all functions of low complexity.

In fact, this is not an extension of the Razborov-Rudich result, since they have a notion of a naturalizable proof, which essentially means a proof that can easily be turned into a natural proof. And that is the case whenever the “simplicity” property you are considering implies another property that can be checked in polynomial time and does not hold for almost all functions.

🙂 That brings me to another point I wanted to make. It seems to me that coming up with an NP function that is quasirandom is not a particularly easy task, whereas coming up with an NP function that correlates with a quasirandom function ought to be significantly easier. I have examples up my sleeve to illustrate this, which perhaps I’ll talk about later. But for now I just wanted to point out that this was supposed to be another advantage of dual norms. The odd thing is that the failure of the proof seems to have nothing to do with the part of the argument where you show that some NP function is “complex”.

8) There. I told you you’d end up sadder but wiser.

🙂 I find that these feelings of sadness pass very quickly. I’ve just had another thought that might make things work.

8) I’m all ears.

🙂 Let me begin by saying what the $U^k$ norm is. It’s the obvious generalization of the $U^2$ and $U^3$ norms. If $f:G\rightarrow\mathbb{C}$ , then $\|f\|_{U^k}$ is defined by the formula

$\displaystyle \|f\|_{U^k}^{2^k}=\mathbb{E}_{x,a_1,\dots,a_k}\prod_{\epsilon\in\{0,1\}^k}C^{|\epsilon|}f(x+\epsilon_1a_1+\dots+\epsilon_ka_k).$

Here, $C$ is the operation of taking the complex conjugate, and $|\epsilon|$ is shorthand for $\epsilon_1+\dots+\epsilon_k.$ So if your function is real you can ignore the $C^{|\epsilon|}.$

😐 By the way, is it supposed to be obvious that that formula really does define a norm?

🙂 Oh, sorry. No, it’s not obvious, but the proof is not too hard either. It’s a generalization of the proof that the $L_2$ norm is a norm. It’s proved in a similar way, but one uses the Cauchy-Schwarz inequality more times.

Anyhow, the thought that’s occurred to me is that perhaps we’d get something useful if we took $k$ to be a slowly growing function of $N=|G|$ rather than just a fixed constant. Then the number of $(k+1)$ -tuples $(x,a_1,\dots,a_k)$ is no longer a polynomial in $N$ , so the norm no longer appears to be in P.

8) You do realize, I suppose, that there is more to proving a complexity lower bound than just identifying a non-natural property. Do you have any reason to suppose that this one works, or are you just trying to find an approach that doesn’t trivially not work? Given the state of play with the problem, even the second aim has something to be said for it, but if you just make wild guesses like this, then the likelihood is that they won’t actually work.

🙂 You’re absolutely right. At this stage I just want to try to think about what a non-natural proof could conceivably look like. But now that I’ve got a candidate, why don’t we try to attack it and show that it doesn’t work?

8) Sounds good to me. And here’s my first attack. What makes you so confident that this norm does not belong to P?

🙂 I’ve just explained that. There are about $N^{k+1}$ configurations to look at, so if $k$ tends to infinity then we have a superpolynomial number.

8) Yes, but do we really have to look at all those configurations? Why couldn’t we take a much smaller random sample and get a good approximation to the norm with high probability? I’m pretty sure that would make the property naturalizable. (Here I’m using the general philosophy that all randomized algorithms can be derandomized — I could go into that, but suffice it to say that if we are assuming that strong pseudorandom generators exist then it shouldn’t be a problem to use randomized algorithms.)

🙂 I see. That’s a serious attack. So let’s think about how big the random sample needs to be.

Basically, it needs to be big enough that if there is some non-quasirandomness around, we will find it. So to make it as hard as possible to find, how about if we squash all the non-quasirandomness into a small space? For example, we could choose a function to be entirely random on the whole of $\{0,1\}^n$ , apart from on some $d$ -codimensional subspace where it takes the constant value 1.

I can’t be bothered to work out precisely what the $U^k$ norm of this function is, but a woolly argument would be that when we choose a random $(x,a_1,\dots,a_k)$ , we get a contribution of $1$ if all of $x,a_1,\dots,a_k$ belong to $X$ , and on average everything else gives almost zero. The probability that $x,a_1,\dots,a_k$ all lie in $X$ is $2^{-(k+1)d}$ , so the $U^k$ norm ought to be something like $2^{-(k+1)d/2^k}$ . This will be large even when $d=2^k/(k+1)$ , in which case the probability that $x,a_1,\dots,a_k$ all lie in $X$ is $2^{-{2^k}}$ . So in order to pick up any configurations at all that make a contribution to the $U^k$ norm, we would need a random sample of size at least $2^{2^k}$ . That would be superpolynomial in $N$ if $k$ is $10\log\log N$ , or $10\log n$ . (I’m taking logs to base 2 here.)

What this seems to show is that, for this approach to have any chance of working, we would have to take $k$ to be greater than $\log n$ . But that’s only $\log\log N$ , so the approach doesn’t seem to be killed off by your random-sampling objection.

8) Maybe not. Actually, I find your argument quite convincing. If all the non-quasirandomness of some function can be hidden in a very small subspace, it could well be quite hard to find that subspace.

😐 I’m not sure I’m quite as convinced. With your example, the function was constant on a subspace. Surely there must be ways of identifying large subspaces on which a function is constant if it is otherwise random. For instance, couldn’t one just take the Fourier transform?

🙂 Yes, but that was just one example. Having found that, one could disguise it a bit. For example, instead of making $f$ constant on $X$ , one could instead make it a polynomial phase function of degree $k-1$ . The estimate above would still go through, and it is much less clear how one would identify a subspace on which an otherwise random function was equal to some arbitrary polynomial phase function of unbounded degree. And one could make it harder still by randomly flipping a few values in order to defeat any clever algebraic tricks.

😐 OK, you’ve bludgeoned me into submission.

🙂 By the way, I should make clear that the complexity measure I would then go for is the dual norm of the $U^k$ norm. Although, as we’ve seen, this is not a philosophical advantage, the other advantages of dual norms were enough to make this a good thing to do. Another way I like to think of this: if a low-complexity function cannot be quasirandom, then it ought to be the case that no restrictions of it can be quasirandom, since they too are built up from simple functions by means of a limited number of Boolean operations. This idea of being “non-quasirandom everywhere” is much better captured by duals of quasirandomness norms than by the quasirandomness norms themselves.

Having said all that, I still don’t really believe that I’ve just stumbled on a genuinely useful formal complexity measure. Here’s why.

We already more or less know that for bounded $k$ the duals of the $U^k$ norms don’t give us formal complexity measures, because a random formula probably has almost maximal dual norm. Therefore, we must have many pairs of functions such that $\|f\vee g\|_{U^k}^*>\|f\|_{U^k}^*+\|g\|_{U^k}^*$ . Probably in the worst case, the left-hand side is greater than the right-hand side by at least a constant factor depending on $k$ .

Suppose we were very lucky and could prove that $\|f\vee g\|_{U^k}^*\leq(1+c(k))(\|f\|_{U^k}^*+\|g\|_{U^k}^*)$ , for some constant $c(k)$ that tended to zero with $k$ . One might at first think that one could just choose $k$ nice and large and end up with a good enough approximation to a formal complexity measure to be able to prove some serious results. But unfortunately something else is decreasing at the same time: the largest possible $U^k$ dual norm of a Boolean function. Roughly speaking, the $2^k$ th power of this is equal to the probability that a configuration is degenerate, which is of order $N^{-1}$ .

😐 What do you mean by a degenerate configuration?

🙂 Well, with the $U^3$ norm, for example, one takes an expectation over quadruples $(x,a,b,c)$ . Associated with each quadruple is a three-dimensional structure that consists of the points $x, x+a,x+b,x+a+b,x+c,x+a+c,x+b+c,x+a+b+c.$ Normally, these points are all distinct, but if some of them are equal then we say that the configuration is degenerate.

Anyhow, I’m pretty sure that the largest possible $U^k$ dual norm of a Boolean function is around $N^{1/2^k}$ . So it is not enough for that constant $c(k)$ to tend to zero: it has to do so “faster”, in a certain sense, than $N^{1/2^k}$ tends to 1. And there doesn’t seem to be any reason for that to be the case.

8) I’d like to try out another demolition job if I could.

🙂 Fine by me, since I’m feeling less and less emotionally attached to this particular idea.

8) Well, if I understand correctly your definition of the $U^k$ norm, there is a simple combinatorial interpretation of the $U^k$ norm of a Boolean function, at least if one ignores the degenerate cases.

🙂 Yes, I’m pretty sure there is. But why don’t you explain it?

8) Well, let $(x,a_1,\dots,a_n)$ be a non-degenerate $(k+1)$ -tuple and let $f$ be a Boolean function. Let $A$ be the subspace of $\mathbb{F}_2^n$ generated by $a_1,\dots,a_k$ and let $X=x+A$ . Then the product $\prod_{\epsilon\in\{0,1\}^k}C^{|\epsilon|}f(x+\epsilon_1a_1+\dots+\epsilon_ka_k)$ that appears in the definition of the norm is $-1$ if the number of points $x\in X$ where $f(x)=-1$ is odd, and $1$ otherwise. In other words, it measures the parity of $f$ inside $X$ . So the $2^k$ th power of $\|f\|_{U^k}$ is the average parity of $f$ in a random $k$ -dimensional affine subspace of $\mathbb{F}_2^n$ .

Thus, for a Boolean function to have a large $U^k$ norm, one needs the function to have even parity noticeably more often than odd parity, or vice versa.

🙂 A simple argument shows that the expectation is non-negative, so if there is going to be any bias of this kind it will have to be towards even parity.

8) OK, forget the “or vice versa”.

Now the parity of the restriction of $f$ to an affine subspace is an easily computable function, so if we believe that a random low-complexity $f$ is polynomially indistinguisable from a genuinely random $f$ , then it would seem to follow that $f$ is guaranteed to have even parity almost exactly half the time. And now by linearity of expectation, it seems to follow that the expected parity is small. So even when $k$ is unbounded, it seems that the $U^k$ norm of a random low-complexity $f$ is small, and therefore that its dual norm is large.

🙂 Hmm … it sounds as though we’ve hit a fairly general obstacle here.

Oh, but hang on a moment. There’s a flaw in your argument. You correctly show that the expectation of the $2^k$ th power of the $U^k$ norm of a random low-complexity function will be small, but how small will it be? Polynomial indistinguishability requires only that it should be at most $1/N^\alpha$ for any fixed $\alpha$ . But if we raise this to the power $1/2^k$ , we get $N^{-\alpha/2^k}$ . If we’ve chosen $k$ to be $C\log\log N$ , then this is $N^{-\alpha/(\log N)^C}$ , which is more or less equal to 1.

The point I’m making here is that if $k$ is as big as $C\log\log N$ , then even a very tiny bias, smaller than any power of $N$ , in the number of affine subspaces where the parity is even, is detectable by the $U^k$ norm. And the polynomial indistinguishability doesn’t tell us that we can’t find bias that’s as small as that.

8) Hmm, I’m going to come back to this point, because I’m still very suspicious of it. And there’s also your own objection to worry about — the one about $N^{1/2^k}$ getting smaller and smaller. In fact, if $k=C\log\log N,$ that looks very small indeed. And if you set $C=1,$ then $N^{-\alpha/(\log N)^C}$ is no longer more or less equal to 1.

🙂 Yes, but somehow that feels like an objection to one particular proposal for a complexity measure. And that proposal was, as you say, just a pure guess, with no thought for why low-complexity functions might turn out to have small dual norms. Perhaps some more carefully thought out quasirandomness norm might do better.

Also, I have to say that I don’t want to dismiss the dual of the $U^k$ norm until I have understood rather better why it doesn’t work.

This entry was posted on September 28, 2009 at 9:41 am and is filed under complexity. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

12 Responses to “A conversation about complexity lower bounds, continued”

Ryan O'Donnell Says:
September 28, 2009 at 6:49 pm | Reply
This might be a good time to point to the paper of Timothy Y. Chow, http://math.mit.edu/~tchow/almost.pdf. Therein he shows that the Razborov-Rudich result provably fails if you relax the “largeness” condition slightly; specifically, if you only insist that a random function have the property with probability at least $2^{-\text{quasipoly}(n)}$ rather than $2^{-\text{poly}(n)}$ .

He gives Salil Vadhan’s short proof of this statement at the top of page 10.
andrej Says:
September 29, 2009 at 7:00 am | Reply
Incidentally, 8) , the calculation of the U^k norm can be derandomized unconditionally using “epsilon-biased” spaces.
Stones Cry Out - If they keep silent… » Things Heard: e87v2 Says:
September 29, 2009 at 3:12 pm | Reply
[…] emoticon’s converse … my guess this isn’t the conversation most people think they’re going to be […]
Rahul Santhanam Says:
September 29, 2009 at 4:35 pm | Reply
Thought-provoking post. Re. the “desperate idea”, it seems that niceness on X cannot be decidable in polynomial time (under a complexity hypothesis). Razborov and Rudich show that any formal complexity measure must be high for most Boolean functions. The same argument seems to carry through to say that if there is any function that is not nice on X, then it follows from the closure (or approximate closure) of niceness under Boolean operations that most functions on X are not nice on X. But then any criterion for niceness on X can be used to distinguish between random functions on X and pseudorandom functions (since pseudorandom functions ought to be nice on X), and hence cannot be decidable in polynomial time.

Perhaps a good test for a complexity measure in general is to check that it’s small for simple functions in NC^1, such as Recursive Majority. Indeed, if a measure is small for a complete function such as Boolean Formula Evaluation or iterated multiplication over S_5, and if it’s “closed under AC^0 reductions”, it’s small for all of NC^1. This would typically be the “easy” direction of a lower bound, the hard part being showing that there’s a function in NP for which the complexity measure is high. Maybe for “interesting” complexity measures, there would be a different tradeoff between these two parts of the argument, where it would now be non-trivial to show that the measure is small for functions with small formula size, but the proof that there’s a function in NP for which the measure is large would be simplified.
- gowers Says:
  September 29, 2009 at 6:14 pm
  The “desperate idea” took that point on board, and was proposing a definition that was something like this: a function f is “simple” if there is a “structured collection” A of “structured sets” X such that the restriction of f to any X in A is “nice”. If f is a typical low-complexity function, then for any individual X one would expect that the probability that f has a nice restriction to X is small (if niceness is in P), for the reason you give. The desperate idea was that for any low-complexity f there was nevertheless a structured collection A of exceptional sets X for which f did have a nice restriction, and that this property was preserved (in some sense) by Boolean operations. I think what I’m writing here is consistent with what you write, so apologies if this clarification is not needed.
  
  I get the general gist of your second paragraph (after looking up what NC^1 is, but I haven’t yet got on to iterated multiplication over S_5 or closure under AC^0 reductions). I like the idea of trying to move the tradeoff.
Rahul Santhanam Says:
September 30, 2009 at 4:43 pm | Reply
Thanks for the explanation; I understand better now.

With regard to the second paragraph, the idea was that complexity measures which are proved to be useful by induction (i.e., by analyzing the effect of each individual AND and OR gate) might turn out to be just as hard to work with as formula size itself. A different way to show that a complexity measure is useful is to prove it’s small for a problem that is complete for NC^1, and then argue that it is small for all of NC^1 by using reductions. Typically, complete problems are complete under some simple form of reduction, such as AC^0 reductions. AC^0 reductions are reductions computable by a polynomial-size constant-depth family of circuits, where the circuits consist of AND and OR gates with arbitrary fan-in. The nice thing about AC^0 is that it’s fairly well-understood as a complexity class (relatively speaking!). Linial, Mansour and Nisan showed that any AC^0 Boolean function is well-approximated in L^2 norm by the Fourier spectrum restricted to co-efficients of weight at most polylog(n). As for completeness, there are some non-trivial (and simple) examples of problems that are complete for NC^1 – Barrington showed that multiplying a sequence of n elements of S_5 is complete, which is quite surprising since this can be done even by a finite automaton. An optimist might consider all this as evidence that NC^1 is a “simpler” and “easier to analyze” class than P.
- gowers Says:
  September 30, 2009 at 5:09 pm
  Exactly the same first sentence as yours!
  
  Let me check that I really do understand what you’re saying—I’m fairly sure I do though. I haven’t checked, but I’m assuming that you focus on NC^1 because it is closely related to the set of functions of polynomial formula complexity, though if I’ve understood the definition correctly (bounded fanin, logarithmic depth, polynomial size circuits) then it doesn’t seem to be obviously identical. (In my head, it feels as though any NC^1 circuit can be expanded out into a polynomial-sized formula, but that some high-depth formulae cannot obviously be rewritten so as to have logarithmic depth.) But I’m also guessing that proving that some function is not in NC^1 would still be of major interest. So then you’re suggesting that to prove that using a complexity measure, it might be a good idea to do it in two stages. First, prove that some complete problem, such as multiplying $n$ permutations of a set of size 5 together (which can be done in NC^1 if you pair them up and multiply, then iterate that process until you’ve got a tree of permutations, each the product of the two before it) has low complexity according to that measure. Then prove that if you compose a low-complexity function with a function of bounded depth, you still have a low-complexity function. You suggest, but don’t quite say, that every function in NC^1 can be AC^0-reduced to the multiplying-permutations function. So then one would be done.
  
  Even if that’s only approximately correct, what you say looks very interesting, and ties in a little bit with things that the characters say in later instalments of the dialogue. (What I mean here is that they consider computational models where you do one explicit function to scramble things up a bit and follow it by a random function from a rather simple model. The composition appears to end up looking a lot more random than either of the two ingredients that make it up.) And whether or not your approach could be made to work, it focuses the mind on a very interesting question: just what is it that is “simple” about iterated multiplication over $S_5$ ?
- Rahul Santhanam Says:
  September 30, 2009 at 5:39 pm
  Actually, NC^1 is equivalent to having polynomial formula size, but there’s a (standard) trick required to see that. A formula can be thought of as a binary tree, and one can always find a node in this tree such that the size of the subtree rooted at that node is between n/3 and 2n/3 (where n is the size of the tree). This allows us to “re-balance” the formula to depth O(log(n)). And yes, you’re right, every function in NC^1 can be AC^0 reduced to the standard complete problems, such as Boolean formula evaluation or multiplication over S_5.
  
  What makes me more hopeful about dealing with NC^1 rather than P is that regular languages are well understood, and AC^0 is somewhat well understood, so one might expect (naively) that it’s also possible to get a handle on functions that are AC^0-reducible to a regular language. In any case, I look forward to future instalments of the dialogue!
Terence Tao Says:
October 6, 2009 at 5:20 am | Reply
This is somewhat tangential to the main post, but I wanted to mention a related open problem, which is to find faster ways to compute $\|f\|_{U^k(Z/NZ)}$ deterministically. For k=2, the FFT lets one compute this quantity in $O(N \log N)$ arithmetic operations. But for k=3, even with the FFT the fastest I can see is $O(N^2 \log N)$ (by using the recursive formula for the uniformity norms); more generally, the fastest algorithm I know of for the U^k norm is $O(N^{k-1} \log N)$ . It may be that an improvement on this could lead to some further understanding on these norms (historically, there has been a reasonable correlation between the ability to quickly compute an expression, and the ability to control that expression).

Of course, by random sampling (Monte Carlo integration, basically), one can quickly get the most significant digits of the U^k norm (assuming f bounded by 1, say), but this says nothing about the circuit complexity of these norms. I suspect though that the $N^{k-1}$ type bound for the U^k norm can be beaten.
- gowers Says:
  October 6, 2009 at 12:32 pm
  A related question that I find interesting is to find a polynomial-time algorithm for the $U^k$ dual norm. I haven’t completely convinced myself that it’s hard, but I certainly don’t see how to do it. In fact, I don’t even see how to do the following (but again I haven’t sweated over it). If you have a function with $U^3$ norm at least $c>0$ , then you know that it correlates with a quadratic phase function on a Bohr set (or a bracket polynomial, or whatever your favourite quadratic structure is — or look at the equivalent problem for $\mathbb{F}_p^n$ and take straight quadratics). There are superpolynomially many such structures, so one cannot find one in polynomial time by brute force. But can one do so by clever techniques?
- luca Says:
  October 7, 2009 at 3:18 am
  Tim’s question is answered in the case of $\mathbb{F}^n_2$ for boolean functions in this paper:
  
  Parikshit Gopalan, Adam R. Klivans, David Zuckerman: List-decoding reed-muller codes over small fields. STOC 2008: 265-274. (link)
  
  They show that if $f: \{0,1\}^n \rightarrow \{0,1\}$ agrees with a degree-2 polynomial $p$ on at least a $1/2 + \epsilon$ fraction of inputs, then:
  
  1) There are at most $2^{O_\epsilon (n)}$ polynomials $p$ having agreement at least $1/2 + \epsilon$ with $f$ and
  
  2) They can all be found in time $2^{O_\epsilon (n)}$ .
  
  Even the first result is non-trivial because there are about $2^{n^2}$ degree-2 polynomials.
- Ryan O'Donnell Says:
  October 7, 2009 at 2:54 pm
  I think Parikshit generalized some of these results to the $\mathbb{F}_p^n$ case, here:
  
  Click to access Quadratic.pdf
  
  although I haven’t read his paper carefully yet…