A conversation about complexity lower bounds, V

Here is the next instalment of the dialogue. Whereas the previous one ended on an optimistic note, this one is quite the reverse: plausible arguments are given that suggest that the $U^k$ dual norm approach is unlikely to give superlinear lower bounds. A few other ideas are introduced in the process.

********************************************************

😐 Can I make a slightly cheeky comment?

🙂 OK

😐 Well, over the years I’ve heard about a number of approaches that people have had to famous problems, including the P versus NP problem, that have the following characteristic. Somebody who works in a different area of mathematics comes in and claims that the right way of solving the famous problem is to reformulate it as a problem in their area. They then make grand claims for the reformulation, and there’s quite a bit of excitement, though also scepticism from those who have thought hardest about the problem, and ten years later nothing has actually happened. I’m usually a little suspicious of these reformulations, unless they feel so natural as to be almost unavoidable. So the fact that you like additive combinatorics and are trying to reformulate the P versus NP problem (or even just the formula-complexity problem) as a problem in additive combinatorics makes me very slightly wary.

🙂 Yes, I was conscious of that myself, although the $U^k$ norms have already been considered by theoretical computer scientists. It’s another reason I don’t want to make grand claims for this approach. I certainly wouldn’t say that the additive-combinatorial perspective is forced, but I do think I can argue that at least some of the ideas — particularly the duality idea — are quite natural. I think the next step is for real computer-science experts to have a look at them. [This has to a large extent already happened in the comments on the previous instalments of this dialogue.] I hope, 8), that you don’t mind my implication that you are not a true expert — I’m very grateful for the comments you’ve made.

8) No no, absolutely not. I admit that there’s a lot I don’t know in this area, and I could easily have missed some serious objection to your proposals.

🙂 Also, I’ve been thinking a bit about your objection that the $U^k$ dual norm is too much of a guess to be taken seriously. I think I have a heuristic argument in its favour.

8) Ah, that’s much more interesting. What is it?

🙂 I can summarize it as follows. I have the following picture of what a random function of circuit complexity (or perhaps formula complexity — let me be vague about this, since we can always think about it properly later if we want to develop this heuristic argument further) looks like. It springs from a result that appeared in a little known paper by Gowers about 15 years ago. In it, he defined a model of random computable functions, though in fact they were rather special functions because they were reversible, and proved that if you choose a function of complexity $m$ from this model, then it will be approximately $k$ -wise independent, where $k$ depends in a power-type way on $m$ and $n.$

😐 You’re losing me a bit here.

🙂 Let me not bother to say precisely what the model is. But to say that it is approximately $k$ -wise independent is to say that if you choose any $k$ Boolean strings of length $n$ and look at the value of a randomly chosen function $f$ of complexity $m$ , then the values that $f$ takes at those strings is distributed almost uniformly amongst all the $2^k$ possible assignments of values to those $k$ strings. In other words, you cannot tell that $f$ is not a purely random function by just looking at $k$ strings of its output.

😐 So in a weak sense Gowers’s random model was a pseudorandom generator.

🙂 In a very weak sense, yes, but still quite an interesting sense.

8) Did you know that if Gowers had waited a decade or so then he could have had a stronger result for free?

🙂 Eh? He told me he had wondered about stronger results, but got nowhere.

8) That’s amusing, because it has subsequently been proved that almost $k$ -wise independence implies pseudorandomness for circuits of constant depth. This result is discussed by Scott Aaronson on his blog.

🙂 So you’re saying that in fact he defined a generator that was pseudorandom for circuits of constant depth, and provably so. I must ask him whether he knows about that.

Anyhow, all I need for this discussion is the original result. Let us take “random computable function” to mean a function that comes from his model. I should add that Gowers was not in fact the first to define reversible computations, though he rather sweetly appeared to think he was. In particular, they are important in quantum computation. But the result about almost $k$ -wise independence was his. And $k$ -wise independence is enough to imply that the $U^d$ norm is typically small, where $d=\log_2k$ . We’ve noted this already.

What follows is plausible guesswork. I’m going to guess that a random function of complexity $m$ is almost $k$ -wise independent for some $k$ that is comparable to $m$ . In fact, I’ll guess that the $k$ you get is almost as big as the trivial upper bound you’d get by counting how many different functions of complexity at most $m$ there are. And I’ll also guess that if you take $k$ larger than the bound for which almost independence holds, then in some sense the distribution is “maximally dependent”.

In other words, what I’m saying is that if you choose a random function of complexity $m$ , then in a certain sense it should be completely random on sets of size up to around $m$ (including on subspaces of dimension $d=\log m$ , which is what we care about when evaluating the $U^d$ norm), and as structured as it can be, given this “local” randomness, on larger sets.

If that is correct, then how might we detect the structure? Well, a fairly natural idea is to look at a $U^d$ norm for some $d$ such that $2^d$ is bigger than $m$ . The hope is that this would pick up the large-scale correlations that are invisible to lower uniformity norms.

This is supposed to give an answer to the following question. If we are presented with lots of random computable functions, then it makes sense to talk about almost $k$ -wise independence. But what does it mean to look at an individual function and say that “it looks like a $k$ -wise independent function but no more”? At first that seems nonsensical. But perhaps if $2^d>k$ , then it could say something like that the restrictions of the function to $d$ -dimensional subspaces are sufficiently related to each other for some correlation to be picked up by the $U^d$ norm. In other words, our attention turns from correlation between functions to correlation between restrictions of the same function.

8) Hmm, that does make me feel somewhat happier with your idea. But it also suggests to me that there might be other ways of exploiting these large-scale correlations. You haven’t given any particular reason to go for subspaces rather than arbitrary sets of size $k$ .

🙂 True, but you have to admit that subspaces are pretty natural objects.

Actually, wait a moment. Now that I’ve gone to the trouble of explaining my heuristic picture of what a random low-complexity function is like, I’m starting to wonder whether I might have stumbled on a two-line proof that P $\ne$ NP.

8) You haven’t, but go on, let’s hear it all the same.

🙂 Well, the idea is closely related to what I was saying just now. I was thinking of a random function of complexity $m$ as being utterly random if you look at any $m$ (or thereabouts) values, and much more structured if you look at more values than this. But I wanted to think about just a single function rather than a probability distribution based on such functions. However, I can convert a single function into a collection of functions in a very simple way: I just look at restrictions of it.

8) I hope there’s going to be more to your argument than looking at random restrictions, because those have been done to death.

🙂 I’m slightly worried about that, I have to admit, because I don’t seem to be using much more. But just hear me out.

8) OK.

🙂 Well, suppose you’ve got a function $f$ of circuit complexity $m$ . We can convert $f$ into a function of $k$ variables by fixing the values of $n-k$ variables and allowing the other $k$ to vary. It is easy to check that if you do that, then the resulting function will have circuit complexity at most $m$ . Now there are at most $n^{Cm}$ functions of circuit complexity $m$ , whereas the number of Boolean functions of $k$ variables is $2^{2^k}$ . Therefore, if $k$ is big compared with $\log m$ , the number of possible restrictions of $f$ to $k$ variables is tiny compared with the number of Boolean functions of $k$ variables.

We have been searching for a difference between random low-complexity functions and purely random functions. Well, here it is: a random low-complexity function has just a limited number of possible restrictions, whereas a purely random function is completely arbitrary.

8) Hmm, I don’t yet see what’s wrong with your argument, but I do have a proof that it’s wrong.

😐 How can one prove that an argument is wrong without seeing what is wrong with it?

8) That’s not mysterious at all. For example, one can show that it implies something false. But that’s not what I’m going to do here. Rather, I’d just like to point out that the property that 🙂 claims distinguishes between random low-complexity functions and purely random functions is a natural property.

🙂 You mean, not only have I proved that P $\ne$ NP, but I’ve also shown how to factorize large numbers?

8) Er, I think not.

🙂 Cool it, I was joking.

8) You had me there for a moment.

😐 Can you explain why this property is natural? In fact, what property are we talking about?

8) I assume that what 🙂 has in mind is to look at the restrictions of $f$ to $k$ variables and simply count how many of them are different. But the number of restrictions is $2^{n-k}\binom nk,$ which is polynomial in $2^n,$ and checking how many of them are different can clearly be done in polynomial time as well.

😐 Ah, thanks. But doesn’t that also show what’s wrong with the argument? After all, if there are only $2^{n-k}\binom nk$ possible restrictions, then the number of distinct functions you can get is not $2^{2^k}$ but $2^{n-k}\binom nk,$ which is far less. Indeed, it’s far less than $n^m,$ our upper bound for the number of functions of circuit complexity $m.$

🙂 Oops, you’re right. To see in an extreme way why the argument is wrong, one could take the case $k=n.$ Then there is only one restriction of $f,$ and the fact that it is one of at most $n^m$ functions has nothing to do with anything. It just points out the familiar fact that the number of low-complexity functions is much smaller than the number of all Boolean functions. How ridiculous that this argument could have fooled me for even a millisecond. Though I did in fact share your feeling that it was too simple to be correct.

But wait a moment.

8) Here we go again …

🙂 Sorry, but there’s something else to check. What if, instead of looking at restrictions to subspaces generated by $k$ of the basic variables, we were to look at arbitrary $k$ -dimensional subspaces? There are far more of these: indeed, if $k$ is much smaller than $n$ , as it is in our case, then almost any $k$ elements of $\mathbb{F}_2^n$ generate a subspace, and those subspaces don’t coincide all that much. So, speaking crudely, there are something like $2^{nk}$ subspaces. And that … damn, that’s still smaller than $n^m$ if $m$ is something like $n^2.$ But perhaps we could still get a superlinear bound this way.

8) I don’t think so, because now it’s no longer the case that the restriction of a function of complexity at most $m$ has complexity at most $m.$

🙂 Oh yes, you’re right. But perhaps it must still be quite special. How many functions can you get by taking a function of $n$ variables of complexity at most $m$ and restricting it to a $k$ -dimensional subspace?

Hmm, one way of thinking about it is this. Each such function is obtained by taking a Boolean string of length $k$ , multiplying it by an $n\times k$ matrix over $\mathbb{F}_2$ , and applying a function of complexity at most $m$ to the result. But this tells us precisely nothing, since the number of $n\times k$ matrices is essentially the same as the number of $k$ -dimensional subspaces, as we were in a sense using the matrices to define the subspaces. I’m beginning to think that the number of possible restrictions of functions of complexity $m$ to $k$ -dimensional subspaces is going to be essentially maximal.

But in a funny kind of way I regard this as a further justification for the $U^k$ norm approach. After all, it’s clear that if you take a function $f$ of low complexity and look at all its restrictions to $k$ -dimensional subspaces, then there must be some relationship between all these restrictions. That’s meant as a trivial statement: there are only $n^m$ such functions, so once you know some of the restrictions you must be able to deduce quite a lot about the others. For instance, if $2^k$ is a lot bigger than $n^m$ and you choose a random restriction to a $k$ -dimensional subspace, then it might come close to determining the entire function.

8) It wouldn’t determine it exactly, because it wouldn’t distinguish between a function $f$ and a function $g$ that equalled $f$ except at one point. And $g$ would have complexity comparable to $f$ (since you’d just have to add an initial step like, “If $x_i=\epsilon_i$ for every $i$ , then $g(x)=-1$ . Otherwise, …”). But it might usually be enough to determine a function $g$ that agreed with $f$ almost all the time.

🙂 That’s fine by me. All I’m trying to argue here is that there must be some relationship between the various $k$ -dimensional restrictions of $f.$ This relationship may be very complicated: for instance, I was hoping that there might not be very many different possible restrictions, but I no longer believe that to be the case. However, the fact that there is a relationship of some kind suggests that there could be correlations of exactly the kind that the $U^k$ norm (or perhaps, say, the $U^{2k}$ norm) is ready to detect.

😐 Can I just ask something here?

🙂 Of course.

😐 It’s that you seem to be focusing on the $U^k$ norm, when earlier you argued, to me convincingly, that one should look at the dual of this norm.

🙂 Ah yes. Let me make clear once and for all what I’m doing here. I’m focusing on the $U^k$ norm, as you say, because if the $U^k$ dual norm of a function is small, then its $U^k$ norm is big, by the trivial inequality $1=\langle f,g\rangle\leq\|f\|\|g\|^*.$ So if my approach is to have any chance of working, it is necessary for the $U^k$ norm of a low-complexity Boolean function to be large. So I may as well use that as a check. However, it is not sufficient, and if there was a hint that the $U^k$ norm was going to work, then I’d move on to looking at the dual norm.

😐 I see. Thanks.

I also have another worry, and in fact it’s a worry about dual norms. If I understand you correctly, your plan is to prove that every low-complexity function has a small $U^k$ dual norm, and some function in NP — for example, the clique-hiding function, has a large $U^k$ dual norm.

🙂 Yes, that’s exactly right.

😐 Now to prove the latter, you need to show that the clique-hiding function correlates with a function with very small $U^k$ norm. And from the sounds of things, you will prove that by actually exhibiting a function $g$ with which it correlates.

🙂 Yes, that’s also correct.

😐 That function $g$ is really a family of functions, one for each $n,$ so let’s call it $g_n.$

🙂 Fine, if it makes you happier.

😐 Now let’s define a property S to be “has good correlation with $g_n.$ ” Your claim is that the clique-hiding function has property S. Also, since $g_n$ has small $U^k$ norm and low-complexity functions are supposed to have small $U^k$ dual norm, the “trivial inequality” $\langle f,g\rangle\leq\|f\|^*\|g\|$ implies that no low-complexity function correlates with $g_n$ . That is, no low-complexity function has property S.

🙂 Indeed.

😐 But checking whether a function correlates with $g_n$ is obviously something that can be done in polynomial time — you’re just working out an inner product. And $g_n$ looks explicit enough for it to be possible to calculate its values in polynomial time too (bearing in mind that this means in time $C^n$ ). So S looks like a natural property that distinguishes between low-complexity functions and the clique-hiding function.

😦 Oh dear. I think you’ve just killed off the whole approach.

8) Er … I think you are both forgetting something. Far be it from me to defend the approach in general, but at least against this attack I can. To violate the theorem of Razborov and Rudich, you need a polynomially computable property that distinguishes between low-complexity functions and random functions. All this property S does is distinguish between low-complexity functions and one single function in NP.

🙂 Oh yes. Phew!

8) Another way of putting it is this. :|’s property S is not a natural property, because although it can be computed in polynomial time (probably anyway), it fails the largeness condition: if you choose a random function, it is incredibly unlikely to correlate with $g_n.$

But :|’s observation is still interesting, since it shows that if the approach works, then it can be formulated in two ways. One way, the dual-norm approach, gives you a property that is non-constructive but large (to use the terminology of Razborov and Rudich) and the other gives you a property that is constructive but not large. But they’re essentially the same argument.

🙂 That’s interesting. I had the feeling that it was much more promising to try to sacrifice constructivity than largeness, but now it seems that there isn’t much difference. I wonder whether one can identify a general class of properties that can be converted in this way, since the argument seemed quite general.

8) It looks as though it ought to happen a lot. Roughly speaking, if your original property R is large but not constructive, then at some point you need to exhibit a function $f_0$ that does not have property R. You won’t be able to do that by applying some nice algorithm to $f_0,$ but you may be able to come up with a proof that uses special properties of your particular $f_0.$ But in that case, you ought to be able to switch from R to a property S, which is roughly defined as “The proof that $f_0\notin R$ does not work for $f$ as well.” This property is implied by property R, and obviously $f_0$ does not have it. It looks pretty likely to be a constructive property, and if it is, then by Razborov-Rudich it cannot be large. (It also doesn’t look large anyway, given that the proof that $f_0\notin R$ used special properties of $f_0.$ )

😐 I have another worry.

🙂 Fire away.

😐 Well, you admitted that you were going to want to prove that the clique-hiding function had a large $U^k$ dual norm by actually exhibiting a function with very small $U^k$ norm that correlates well with it.

🙂 Yes, that’s right.

😐 But I thought you were saying that Viola and Wigderson’s result was state of the art. Is it reasonable to hope to exhibit any function with small $U^k$ norm, let alone one that correlates with the clique-hiding function?

🙂 Hmm, that does sound more challenging than one would ideally like. But it doesn’t require us to improve on the result of Viola and Wigderson (which, you should recall, we very much don’t want to be able to do) since the function with small $U^k$ norm would not be a function in P. In fact, it wouldn’t even be a Boolean function.

8) OK, I think you escape for now. But there’s a worry I had much earlier in our discussions. We didn’t discuss it much then and I think we badly need to.

🙂 I had a vague feeling that there was some serious problem we hadn’t addressed, but I’m afraid I’ve forgotten what it was.

8) Let me remind you then. Some time ago, I came up with the following argument that suggested that the $U^k$ norm couldn’t work. I suggested focusing on just one $k$ -dimensional subspace $X$ . The parity of the restriction of $f$ to $X$ is a function of $f$ that can be computed in polynomial time in $N$ (indeed, far faster than that — it takes time proportional to $2^k$ ). So if you believe in the general heuristic that a random low-complexity function is indistinguishable from a truly random function, then you have to believe that the parity of the restriction of $f$ to $X$ is even almost exactly half the time.

🙂 Oh yes, and I then argued that “almost exactly half” meant within $2^{-\alpha n}$ of $1/2$ , and that even a discrepancy of $2^{-\alpha n}$ could be picked up by the $U^k$ norm (because one raises to the power $1/2^k,$ and $2^k$ is much bigger than $\alpha n$ if $k=C\log\log n.$

8) Right, but the problem with that is that if $k=C\log\log n,$ then everything gets incredibly close to 1. I mean, $X$ will be degenerate with probability at least $2^{-nk}$ (because it will even be a singleton with that probability) so the discrepancy will be at least $2^{-nk},$ so raising to the power $1/2^k$ will make the norm of a Boolean function incredibly close to 1.

🙂 I think this is something we have to think about seriously, but as I said earlier, we shouldn’t get too worried if the norms of all Boolean functions are incredibly close to 1. For instance, we can always look at the $2^k$ th powers of the norm, and if we do that, then you get a discrepancy of $2^{-\alpha n}$ by your argument, which is far bigger than the trivial discrepancy of $2^{-kn}.$

8) So you are hoping that random polynomial-computable functions will have $U^k$ norms $2^{-\alpha n/2^k},$ while random functions will have $U^k$ norms more like $2^{-kn/2^k}.$

🙂 I suppose so.

8) But what earthly reason is there to suppose that that is the case? Here, in my view, is a far more plausible guess: that the $2^k$ th power of the $U^k$ norm of a random function of complexity $m$ decays exponentially with $m$ . In other words, the $U^k$ norm is around $2^{-cm/2^k}.$ If this is correct, then it will be impossible to prove a lower bound that is better than $\null kn$ .

🙂 Hmm … we should think a bit about this. It feels like an approachable question: given a subspace $X$ and a random function $f$ of complexity $m$ (according to some reasonable model), what is the probability that the restriction of $f$ to $X$ is $1$ an even number of times? If it decays exponentially with $m,$ then that will certainly place a big restriction on what this method can hope to achieve. And it is unfortunately quite plausible that it does decay exponentially, because in that model of Gowers we talked about earlier, the bounds for the limited pseudorandomness that he was able to prove went down exponentially with the complexity. Though (i) I’m not sure whether anything like that could work for formula size (his functions had large formula size), and (ii) even a lower bound of $nk$ would be very good news for circuit complexity.

8) Yes, but I thought we’d convinced ourselves that we couldn’t use the $U^k$ dual norm to say anything about circuit complexity.

🙂 That is a question to which I’d like to return.

😐 Sorry to interrupt, but 🙂 seems to have made a weird mistake.

🙂 Always possible. But what is it this time?

😐 Well, you gave a lower bound of $2^{-nk}$ for the probability that a $k$ -dimensional subspace is degenerate, but in your subsequent heuristic arguments you treated it as an upper bound. In fact, the probability that a random $k$ -tuple generates a subspace of dimension less than $k$ is trivially at least $2^{-n},$ since that’s the probability that the first two elements of the $k$ -tuple are equal. So your hope that it might be possible to use the $U^k$ dual norm to get lower bounds of $kn$ doesn’t look realistic to me after all.

🙂 Damn. You’re right of course.

Except …

8) !@£$%^&*()

🙂 Look, as ever what I’m about to say may be rubbish, but it would be silly just to give up without making completely sure that the approach can’t be rescued. And I’ve just had a small thought in that direction. Any objections?

8) Sorry — I wasn’t actually objecting. I just thought that we had now made completely sure. We seem to have an argument that makes it highly plausible that this approach cannot lead to superlinear lower bounds for anything.

🙂 I’m not sure we’ve completely convinced ourselves that that is the case for formula size. But in any case what I want to propose now is a very slight modification of the approach. So far, we’ve been looking at the $U^k$ dual norm, though for the purposes of testing the approach we’ve in fact been thinking harder about the $U^k$ norm (on the grounds that this has to be large for the dual norm to be small, so we can think about whether it really does seem to be large for low-complexity functions).

Now the argument just given suggests that the largeness we can hope for if we have a function of superlinear complexity is smaller than the largeness we trivially get from the degenerate configurations in the calculation of the $U^k$ norm. But why can’t we just look at the non-degenerate ones?

😐 What do you mean? If we did that then we wouldn’t even have a norm!

🙂 Yes, but who says we need a norm?

😐 It seems pretty fundamental to your approach if you ask me. For example, you’ve made a big thing of the merits of dual norms.

🙂 Ah, I should tell you that norms are not fundamental for defining dual norms.

😐 What? Isn’t that a trivially false statement?

🙂 Not at all. The construction we used to define dual norms is a special case of the following more general construction. Given any bounded set $\Gamma$ that spans an inner product space $V$ I can define a norm on $V$ by the formula $\|f\|=\sup_{g\in\Gamma}|\langle f,g\rangle|.$

So my next suggestion is to let $\Gamma$ be the set of all functions defined on $\{0,1\}^n$ such that you get a small result if you take the usual formula for the $U^k$ norm but restrict the expectation to the sequences for which $a_1,\dots,a_k$ are linearly independent.

I’ve chosen a few details of that definition arbitrarily (for instance, perhaps it would be better to take Boolean functions only). However, the point I want to make is that the smallest possible non-zero value of the “non-degenerate $U^k$ norm expression” (which isn’t in fact a power of a norm) looks as though it is around $2^{-nk}$ rather than $2^{-n}.$ So if it goes down exponentially then we’re in with a chance.

8) Wow, you really are prepared to sacrifice your fingernails to remain hanging on to the cliff aren’t you.

🙂 I agree that this variant of the $U^k$ norm does at first look decidedly less natural than the $U^k$ norm itself. But I know of another context where just outlawing degenerate cases helps in a surprising way, and one could argue that if the degenerate cases are dominant, then it makes sense to get rid of the main term and look at the error term.

😐 But won’t it be extraordinarily delicate to look at the error term?

🙂 Not necessarily. All we’re asking is the following question, which seems to me to be not all especially delicate, though it could be tricky. I’m about to repeat myself, because it’s the question I asked before you pointed out my silly mistake to me. You take a $k$ -dimensional subspace $X$ and a random function $f$ of complexity $m$ and ask what the probability is that the restriction of $f$ to $X$ is 1 an even number of times. The hope is to prove a lower bound of $1/2+C^{-n}.$

😐 But for that you need a model of “random function of complexity $m$ “.

🙂 I know. That sounds pretty hard. But we do have the advantage that at this stage we are just looking for a heuristic argument that will answer the question of whether we can hope for a decay that is at most exponential or whether we would in fact get a faster decay. Also, we could try it out on that model that Gowers looked at. That gives a very concrete question.

😐 Can you tell me in more detail how that model works?

🙂 Certainly. It’s a way of defining an easily computable function from $\{0,1\}^n$ to $\{0,1\}^n$ . It works like this. I shall call a function $f$ a basic operation if it is an invertible operation that depends on just three bits of $x$ . For example, the following operation is basic: if $x_1=x_3=x_8$ then change all of $x_1$ , $x_3$ and $x_8$ ; otherwise do nothing. If you just look at the bits in the first, third and eighth places, then this operation interchanges 000 and 111 and leaves the other sequences as they are. Geometrically, you can think of this as dividing up the cube into eight cubes of codimension 3 and permuting them while keeping their internal stucture intact. Fans of quantum computation will be familiar with these operations: for example, the Toffoli gate is one of them.

And now, to define a random computable function one simply takes a random composition of basic operations. The nice thing about this is that it gives you a random walk, and there are lots of clever techniques for bounding the convergence rate of random walks. And what I find promising is that random walks are just iterations of linear maps, so they definitely don’t converge faster than exponentially. At this stage, that’s just a vague heuristic, but it gives me hope.

8) Why?

🙂 Well, let me try to use this model to create a problem that’s relevant to our discussion. We don’t want a function from $\{0,1\}^n$ to $\{0,1\}^n$ but rather a function from $\{0,1\}^n$ to $\{0,1\}.$ So let’s just use the model and at the end just look at the output of the first coordinate.

If we do that, then what we are interested in is the following question. Let $X$ be an arbitrary $k$ -dimensional subspace of $\{0,1\}^n$ . (All we are doing when we make the non-degeneracy assumption is insisting that $X$ has full dimension — it seems pretty innocuous to me.) Now do a random sequence of basic operations. These have the effect of chopping up $X$ — it’s a bit like playing with a huge-dimensional Rubik’s cube and watching a nice little patch of colour get spread all over the place. At first, there will always be an even number of points of the chopped up $X$ inside the subspace $x_1=0$ . But with enough chopping you can get the number to be odd. And the question is, after you’ve been going for a superlinear length of time, how quickly does the probability that the number of points is even tend to 1/2? In particular, is it bounded below by an exponential decay?

If it is, then that would give us reasonable grounds for guessing that perhaps it is impossible to avoid a bias of at least $2^{-Cm}$ if you take a function of complexity at most $m$ .

8) A quick question. Would that property distinguish between random low-complexity functions and purely random functions? How much bias do you expect in the purely random case?

🙂 Help — yes, that’s important. Well, when we choose a random function $f$ we get to choose $2^n$ values. By how much does the average parity of the restriction of $f$ to a random $k$ -dimensional subspace $X$ change if you change a value of $f$ ? Well, the probability that a random subspace contains the sequence for which you’ve just changed the value is $2^{k-n}.$ Oh dear — that shows that one cannot hope for a concentration that’s even stronger than exponential.

8) OK, now your approach really is dead isn’t it?

🙂 Unfortunately it does seem to be. I suppose it’s just conceivable that one could prove that a specific $f$ in NP correlated with a function with better than exponential uniformity. But then we would be looking at a property that did not apply to random functions. In fact, it wouldn’t have either of the two properties required of a natural proof. In particular, it wouldn’t distinguish between random low-complexity functions and random functions, which is what I’d been hoping to do.

8) Just to depress you further, I think we should revisit the question of whether the approach is naturalizable. If it is, then not only will the approach have failed, but it will have done so for an uninteresting reason.

🙂 Hey, that sounds fun.

8) If we go back to my random sampling idea, armed with the new thought that we can’t hope to get smaller than exponential bias in the formula for the norm, then it would seem that we do not need to sample more than exponentially many random subspaces in order to find such bias as there is. And exponentially many subspaces means polynomially many in $N=2^n.$

🙂 I see your point, but I think it’s not really a question of my proof naturalizing, since I was looking for a bias that was even smaller than exponential, but simply of being wrong, because random functions do not have bias that’s as good as that.

I’m sufficiently chastened by my experiences so far that I’m not going to hope that we’re hitting some fundamental new obstacle to proving complexity bounds. I basically agree with you that Razborov and Rudich can explain the difficulties I’ve just had. But I think it might be interesting to understand as well as possible what the consequences of their obstacle are. For instance, if one can prove a reasonably general result that says, “Every proof of such-and-such a kind naturalizes,” then one is not discovering a fundamental new obstacle, but one might still be performing a useful service to people who want to prove that P $\ne$ NP.

Another project I quite like is to find heuristic arguments for the impossibility of certain sorts of proofs. If one is searching for a proof, then a convincing nonrigorous argument that certain kinds of proofs don’t exist can be a very good reason not to look for such proofs. Of course, a certain caution is needed, as heuristics can be wrong, but they are still useful. So, for example, I would like to know what follows from the assumption that random functions in Gowers’s model are indistinguishable from genuinely random even permutations of $\{0,1\}^n$ . More precisely, I’d like to think about that exponential decay that we seemed to encounter. For what functions do we expect that sort of decay? And what properties are ruled out if it occurs?

This entry was posted on October 11, 2009 at 2:40 pm and is filed under complexity. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

7 Responses to “A conversation about complexity lower bounds, V”

luca Says:
October 12, 2009 at 12:16 am | Reply
At some point, the dialog assumes that the number of functions of circuit complexity at most $m$ is upper bounded by $n^{O(m)}$ . The bound, however, should be $m^{O(m)}$ ; heuristically, if we count the number of distinct circuits of size $m$ , then for each of the $m$ gates we need to specify from which of the other $m$ gates the two inputs are coming from. Rigorously, it has been shown that each of the $2^{2^n}$ functions is computable by a circuit of size at most $O(2^n/n)$ .
- gowers Says:
  October 12, 2009 at 8:36 am
  Oh yes — that was careless of me. I had got used to $n^{O(m)}$ because it was correct when $m$ is polynomial in $n$ and then allowed my brain to switch off.
boazbk Says:
October 12, 2009 at 6:11 am | Reply
In the context of counting arguments its probably simplest to think of circuit size as the number of bits it takes to describe the circuit. (Although indeed in this case one should suspect any lower bound that’s smaller than $n \log n$ .)

Regarding your conjecture that random invertible circuits of size S yield pseudorandom functions, I remember asking Sasha Razborov what he thinks about it, and he said that he believes that this or similar constructions can yield pseudorandom functions more than he believes that integer factoring is hard. Indeed I agree with the intuition that in the right model random functions of small complexity are hard to invert and perhaps even pseudorandom (there are also such candidates that are very shallow functions, where each output bit only depends on a constant number of the input bits, though in this case they can only be pseudorandom up to a point).

Indeed, that model of invertible circuits is interesting in a way similar to constant depth threshold circuits (TC0). On the one hand, it’s a very limited model and it seems hard to compute very simple functions in it. On the other hand, it’s rich enough so it plausibly contains pseudorandom functions (in this particular case pseudorandom even permutations) and hence any lower bound for it will have to be “un-natural”. So these models look like a reasonable place to try to first cross the natural proof barrier.

–Boaz Barak
- Gil Kalai Says:
  October 13, 2009 at 2:41 am
  “he believes that this or similar constructions can yield pseudorandom functions more than he believes that integer factoring is hard.” Just to make sure that I follow: these two beliefs are not in conflict but sort of in a similar direction right?
- gowers Says:
  October 13, 2009 at 8:15 am
  Gil, unless I’m missing something the answer to your question is a simple yes. I find this piece of information about Razborov fascinating for two reasons. One is that I completely agree with it but had never heard the view expressed by anyone else. The other is that it shows that the strength of one’s belief in a complexity hypothesis of that kind does not have to be determined by its status as one of the “official” hardness beliefs (since if Razborov’s aren’t then it cannot be compulsory).
- boazbk Says:
  October 13, 2009 at 2:36 pm
  First a disclaimer: this is based on what I remember from 2001, so I may not be representing Sasha’s beliefs perfectly accurately. Let me speak for myself: I’ll be more surprised if there is no way to generate a one-way function using some kind of a simple combination of random local operations (am not committing to Gowers’s particular model) than if there exists a polynomial-time algorithm for factoring.
  
  To answer Gil’s question: both beliefs (factoring is hard, inverting random circuits is hard) imply P different from NP and even that one-way functions exist (and in factoring’s case even public key cryptography). So, in that sense they can be said to be in the same direction.
  
  But there is another sense in which they are in a different direction, which is that they have different reasons why they may be true.
  
  The reason that most people believe that factoring is hard is empirical: people have tried *very* hard for a *long* time, and couldn’t do it, even not come up with a heuristic algorithm that’s just conjectured to work.
  
  Note however that the empirical evidence does include non-trivial algorithms (classical and quantum) that took some time to discover (though the best classical running time hasn’t been improved in about 20 years). Perhaps some people also have number theoretic reasoning why a better algorithm shouldn’t exist- I don’t know of such reasonings but that’s not saying much.
  
  The reason one believe assumptions of the other nature is a combination of both empirical and “philosophical” considerations. The empirical evidence is weaker than factoring, but still one can look at these constructions as similar in nature to problems such as breaking block ciphers, and solving random constraint satisfaction problems (e.g. random 3SAT) that people have worked on.
  
  The “philosophical” considerations are based on a bold assumption that we have some understanding of the nature of computational hardness, and that for such combinatorial problems there is some kind of threshold that either they can be easily solved using known algorithms (e.g. the solution space is convex) or they can’t be solved at all better than brute force. You can say that predictions made by this intuition have been empirically confirmed (e.g. NP hardness, hardness of approximation, (partial) dichotomy theorems).
  
  The philosophical considerations are to some extent reasons why one would believe that an algorithm doesn’t exist, and is not just very very hard to find.
  
  I don’t know if there is a set of “official” hardness beliefs – in cryptography we invent new assumptions all the time. There are just hardness assumptions that have more empirical evidence behind them.
  
  –Boaz Barak
Stones Cry Out - If they keep silent… » Things Heard: e89v1 Says:
October 12, 2009 at 11:55 am | Reply
[…] Some academic questions tackled. Ought/is and complexity. […]