A conversation about complexity lower bounds, VIII

In this next instalment, our characters discuss the norm you get by looking for the best possible correlation with a quadratic phase function. They end up discussing a heuristic argument that might, just conceivably, show that this norm is one of a wide class of norms that cannot possibly give rise to superlinear lower bounds. Along the way they have several thoughts, some of which are quite interesting, some not interesting at all, and some plain wrong. (The more interesting ones are mostly later on in the instalment.)

*******************************

🙂 Last time we met, I came to the understanding that if you build a norm by means of a formula of the form \|f\|=\max\{|\langle f,g\rangle|:g\in\Gamma\}, then there are two properties that \Gamma might have that will give the norm an outside chance of being a useful quasirandomness norm for proving nontrivial lower bounds. The first is that the cardinality of \Gamma should be superpolynomial in 2^n (or else there is a trivial polynomial-time algorithm for working out the norm). The second, which implies the first, but which I prefer to think of as a separate property, is that there should not be some clever polynomial-time way of working out the norm: when \Gamma has superexponential size, any such method would have to exploit special properties of the particular set of functions.

As I see it, if you take \Gamma to be the set of all quadratic phase functions, then you get a set that definitely has the first property and could well have the second. So I want to go back to thinking about this quadratic-correlation norm. Earlier I convinced myself that a random low-complexity function should not correlate with any quadratic phase function. But if for any fixed quadratic phase function I can get only an exponentially small probability of a huge correlation, and if there are superexponentially many quadratic phase functions, then the union bound tells us nothing, and perhaps we need to revisit this statement. Is it conceivable that every function of linear circuit complexity correlates quite heavily with a quadratic phase function?

8) Why should that happen?

🙂 I have absolutely no idea. In fact, I don’t think it does happen. But I’d like a convincing (even if nonrigorous) argument to that effect.

😐 Why do you think it doesn’t happen?

🙂 Basically because there seems to be no reason for it to happen. But here’s what happens when I try to argue that it doesn’t. A first thought is to take a cubic polynomial. But I now see that that gives us two problems: how do you find a suitable cubic that you can calculate in linear time, and how do you prove a bound that’s better than N^{-1/4}? The second question may have an answer but I’d be surprised if people can get down to N^{-1/2+\epsilon}.

😐 Yes, but you said you just wanted a heuristic argument: is there perhaps a plausible guess for what the correct bounds should be for how little cubics can correlate with quadratics (which is just the same as asking how small the average of a cubic phase function can be)? Or a similar question: if you take a random set \mathcal{A} of 1000n triples \{i,j,k\} and define a cubic polynomial to be p(x)=\sum_{\{i,j,k\}\in\mathcal{A}}x_ix_jx_k, then what is the expected size of \mathbb{E}(-1)^{p(x)}?
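Just to make the question concrete, here is a rough sketch of how one might estimate that expectation by sampling. The function name and parameters are mine, and I have used far fewer than 1000n triples, purely for illustration.

```python
import itertools
import random

def random_cubic_bias(n, num_triples, samples=20000):
    """Estimate E_x (-1)^{p(x)}, where p(x) is the sum over a random set
    of triples {i,j,k} of the products x_i x_j x_k, and x is uniform."""
    all_triples = list(itertools.combinations(range(n), 3))
    triples = random.sample(all_triples, num_triples)
    total = 0
    for _ in range(samples):
        x = [random.getrandbits(1) for _ in range(n)]
        p = sum(x[i] & x[j] & x[k] for i, j, k in triples) % 2
        total += 1 if p == 0 else -1
    return total / samples

# e.g. random_cubic_bias(20, 200) -- a much smaller instance than 1000n triples
```

Of course, a sampling experiment like this can only detect biases that are large compared with the sampling error, so it tells us nothing about the exponentially small correlations that actually matter here.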

🙂 I’m afraid I don’t know the answers to these interesting questions. But let me point out that if you want to take a random cubic like that, then you’ll need to take the maximum over all quadratic correlations. That is, you’ll need to prove that the cubic phase function \kappa(x)=(-1)^{p(x)} has some property that causes it not to correlate with quadratic phase functions. And I don’t know of a property that gives really good bounds for that.

So let me turn to the other suggestion, which is to take a random function that’s computable in linear time.

😐 You mean with that clever pre-processing step?

🙂 Ah—as it happens, I’ve thought of a different model that seems better. It doesn’t use the pre-processing step and it doesn’t involve the worryingly low depth that 8) was complaining about.

What you do is this. Let’s call the basic functions and their negatives level-0 functions. First create from the basic functions a collection of 2\alpha n level-1 functions by picking random pairs of level-0 functions and randomly applying \wedge or \vee to them. Here, \alpha is a constant less than 1: I like to think of it as 0.99. From these, create 2\alpha^2 n level-2 functions, again by picking random pairs and randomly applying either \wedge or \vee. Continue in this way, until you’re down to just one function. The total number of functions you create is linear in n (since it is the sum of a GP that starts with n and has common ratio \alpha) but the depth of the circuit is \log n, all the coordinates make a big difference, and so on.
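Just so that the model is completely concrete, here is a rough sketch of it. The representation and the constants are not meant to matter; it is only the shape of the construction that I care about.

```python
import random

def random_layered_circuit(n, alpha=0.99):
    """Sketch of the model: level 0 consists of the n coordinates and their
    negations; each later level applies a random AND or OR to a random pair
    of functions from the previous level, and the level sizes shrink
    geometrically by the factor alpha until a single function is left.
    The circuit is returned as a list of layers of gates (op, i, j)."""
    layers = []
    size = 2 * n                                    # size of level 0
    while size > 1:
        new_size = max(1, int(alpha * size))
        layer = []
        for _ in range(new_size):
            i, j = random.sample(range(size), 2)    # a random pair from the previous level
            layer.append((random.choice('&|'), i, j))
        layers.append(layer)
        size = new_size
    return layers

def evaluate(layers, x):
    """Evaluate the circuit bottom-up on the input bits x."""
    values = list(x) + [1 - b for b in x]           # level-0 values
    for layer in layers:
        values = [(values[i] & values[j]) if op == '&' else (values[i] | values[j])
                  for (op, i, j) in layer]
    return values[0]
```

With \alpha=0.99 the total number of gates is about 2n/(1-\alpha)=200n, which is linear in n, and the number of layers is of order \log n.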

😐 Surely with high probability at least some of the coordinates will be missed out when you go from level 0 to level 1.

🙂 Oh yes. Well, one could deal with that by just making sure that all the functions are used at least once when you go from one layer to the next. Incidentally, one could let the layers increase in size for a short while to begin with before then decreasing geometrically.

8) You’re right, I prefer this model to your previous one, but I have a semi-serious objection to it. By that I mean that it’s a serious objection but I think there will be a way of dealing with it.

🙂 It’s good to know that in advance. What is the objection though?

8) It’s that your final function is obviously not going to be pseudorandom.

🙂 Why not?

8) Because if your model creates pseudorandom functions, then presumably the two functions you get in the penultimate layer will be pseudorandom, and one would hope that they would also be “independent” in some sense (since if not then applying a Boolean operation to them could produce a highly non-random function). But if your final operation is \wedge, say, then applying it to two independent pseudorandom functions gives you a function that takes the value 1 about 1/4 of the time instead of about 1/2 the time. This problem also applies to your earlier model of a random formula. I have to admit that I didn’t notice it myself, but was told about it by Luca Trevisan. He suggested using the basic operations from the Gowers model instead.

🙂 Well, that shouldn’t create too many problems, since we were using them in our previous model of random functions of linear circuit complexity. Let me have a go at defining something.

We start with a string of n bits. We then apply a random sequence of \alpha n basic operations (that is, permutations that act on just three bits at a time, as before). Then we randomly restrict to a set of bits of size \alpha n. And then we repeat.

In fact, I’d rather do 10n basic operations before randomly restricting to a set of size m=\alpha n. Then we can do 10m basic operations on just those m bits before randomly restricting further, to a set of size \alpha^2n bits. And so on until we get down to a constant number of bits, at which point we could just choose one.
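And to be concrete once more, here is a rough sketch of that model too. I have allowed myself an arbitrary random permutation of the eight settings of the chosen three bits; strictly speaking one would use whatever family of basic 3-bit permutations we fixed before, but for the purposes of this discussion it makes no difference.

```python
import random

def random_reversible_model(n, alpha=0.9, c=10):
    """Sketch: repeatedly apply about c*m random reversible 3-bit operations
    to the m currently active bits, then restrict attention to a random
    subset of about alpha*m of them, until only a few bits remain.
    Returns a Boolean function of the n input bits."""
    rounds = []
    active = list(range(n))
    while len(active) > 3:
        m = len(active)
        ops = []
        for _ in range(c * m):
            trio = random.sample(active, 3)     # three of the currently active bits
            perm = list(range(8))
            random.shuffle(perm)                # a random permutation of their 8 settings
            ops.append((trio, perm))
        rounds.append(ops)
        active = random.sample(active, max(3, int(alpha * m)))
    output_bit = active[0]                      # finally, just pick one surviving bit

    def f(x):
        x = list(x)
        for ops in rounds:
            for (i, j, k), perm in ops:
                v = 4 * x[i] + 2 * x[j] + x[k]
                w = perm[v]
                x[i], x[j], x[k] = (w >> 2) & 1, (w >> 1) & 1, w & 1
        return x[output_bit]

    return f
```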

😐 Does that really avoid the problem that Luca Trevisan pointed out?

🙂 Yes. The big difference between this model and the previous one can be summed up as follows: the basic operations take random functions to random functions. If f and g are random functions, then f\vee g will not be random, but if f is a random even permutation of \{0,1\}^n, then the composition of f with one of the basic 3-bit permutations will also be a random even permutation.

If you like that model, then you’ll agree that the following question is quite a good formulation of what I was asking just now: if you generate a function randomly in that way, will it necessarily correlate significantly with a quadratic phase function?

😐 I can see that it might be hard to prove that the answer is no, but don’t you at least have a heuristic argument that the answer should be no?

8) Can I chip in with what is either rubbish or an amusing observation?

🙂 Of course.

8) Well, if you managed to prove that every random linear-complexity function correlated significantly with some quadratic, then, as you point out, you’d have a property that distinguishes between linear-complexity functions and purely random functions. Now Razborov and Rudich tell us that such a property cannot be computed in polynomial time. However, the property in question certainly belongs to NP (since once you know which quadratic phase function you’re supposed to correlate with, it is easy to check that you really do). So in order to get this approach to work, wouldn’t you have to prove that P\ne NP?

🙂 If that argument is correct, then it’s weird. It shows that an attempt to obtain merely superlinear lower bounds would prove superpolynomial bounds for a different problem. I can’t get my head round this.

😐 But I thought you said that Razborov and Rudich’s result was conditional on a strong hypothesis in complexity.

8) Ah yes. The argument I gave would rely on the existence of a certain sort of strong pseudorandom generator, which exists if factoring is hard. So one would have a proof that either factoring is easy or P\ne NP. But that’s trivial: if P=NP then factoring is easy.

🙂 I’m wondering if there is a non-trivial statement one can make here. What your argument would show is that either factoring is easy or the quadratic correlation problem (let me call it QC) cannot be solved in polynomial time. But I don’t see any reason to suppose that QC is NP-complete, so although the intractability of QC would imply that P\ne NP, the converse is far from obvious. So we’d have a proof that the hardness of factoring implies the hardness of QC. Or to put it another way, if we can solve QC then we can use that to factor quickly. That doesn’t seem trivial to me.

Of course, all this is so utterly conditional on statements we don’t know how to prove and don’t even necessarily believe that it is not all that exciting.

😐 Which statements?

🙂 Well, it depends on a proof that linear-complexity functions all correlate with at least one quadratic phase function, which I don’t have any reason to believe, and it then depends on finding a clever algorithm for QC, which I am at best neutral about. (An argument in favour of such an algorithm existing is that it exists for linear correlations—take the Fourier transform—and there are at least some quadratic analogues of linear Fourier analysis. But those analogues are far far weaker than what would be needed, and for correlations as small as the ones that would interest us it is not clear that anything can be said at all.)

😐 You say that you don’t have any reason to believe that every linear-complexity function correlates substantially with at least one quadratic phase function. But so far you haven’t given us any reason to believe the opposite either.

🙂 I know. So far all I have to say in that direction is that a surprising event like a quadratic correlation ought to happen, if it happens, for a reason. And I don’t see a reason. It’s a bit like most people’s belief that \pi is normal because there is no earthly reason to suppose that the digits of \pi would do anything special.

😐 But for that analogy to work, you need to convince me that correlating a bit more than a random function would with just one out of the 2^{cn^2} quadratic phase functions counts as “doing something special”. Perhaps it’s just what one would expect.

🙂 I think I see what you’re getting at, and it’s similar to the discussion we were having earlier. There are n^{Cn} functions of linear circuit complexity, so no probabilistic statements we make can hope to be accurate to better than n^{-Cn}. Furthermore, we actually expect them to be accurate to no better than 2^{-Cn}. If we believe strongly enough that detecting small quadratic correlations cannot be done in polynomial time, then we might argue that there is basically nothing we can say about the probability of such correlations occurring: for a purely random function the probability would be much smaller than exponential, but for pseudorandom functions we cannot argue that any particular event has subexponential probability, so it could be that every linear-complexity function correlates with some quadratic phase function.

But if that were the case, how on earth might it be proved? One approach might be to show that each time you do a basic 3-bit operation, in some sense you multiply by a constant the number of quadratic phase functions that you no longer correlate with. Since there are 2^{cn^2} quadratic phase functions, it might even take quadratically long to end up correlating with none of them.

8) That sounds worryingly optimistic.

🙂 I agree, so let me try to think about what I even meant in that heuristic argument, which as it stands is nonsense of course.

Let f_1,f_2,\dots be a random sequence of functions, each produced from the previous one by using a random 3-bit operation (with occasional random restrictions to a fraction \alpha of the bits, as above). Let us also fix a Boolean function g. My guess is that the correlation of f_r(x)_1 with g(x) will behave a bit like a random walk with drift: that is, it will multiply or divide by a constant amount at each step, but the expectation of the log of the factor is negative rather than zero.
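Here is the caricature I have in mind, just to fix ideas. The numbers are completely invented and are not supposed to reflect the real process; the only feature that matters is that the expected change in the logarithm is negative.

```python
import random

def drifted_trajectory(steps=300, up=2.0, down=0.5, p_up=0.4, start=0.01):
    """One caricatured correlation trajectory: at each step the correlation
    is multiplied by `up` with probability p_up and by `down` otherwise,
    so the expected change in its logarithm is negative."""
    c = start
    for _ in range(steps):
        c *= up if random.random() < p_up else down
    return c

# What fraction of trajectories have failed to shrink after many steps?
trials = [drifted_trajectory() for _ in range(100000)]
print(sum(c > 0.001 for c in trials) / len(trials))   # a very small fraction
```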

Now if one could say that all these random walks (for different quadratic phase functions g) behaved reasonably independently of one another, then one might guess that after Cn steps, an exponentially small fraction of them had done stupid things like never really getting any smaller. But that “reasonable independence” assumption looks decidedly shaky.

8) I think we ought to separate out two ideas here. The first is an attempt to prove superlinear circuit lower bounds by first showing that every function of linear complexity correlates heavily with some quadratic phase function. Of course, then one would have to find a function that did not correlate heavily with any quadratic phase function, which could be hard. Indeed, it might require one to improve on the result of Viola and Wigderson.

But for the discussion we have just been having about the hardness of QC, we didn’t need every function of linear complexity to correlate with a quadratic phase function. All we needed was for a substantial fraction of them to do so. After all, virtually no random functions do, so that would give us a way of distinguishing pseudorandom from random, which in turn would prove that either QC is hard or factoring is easy.

😐 I’m slightly troubled by that last step. Wouldn’t one need to use Razborov and Rudich’s model of random low-complexity functions if one wanted to say anything about factoring?

8) Oh dear, you’re right. I think the conclusion one would reach is not interesting after all. So perhaps one would indeed need to prove that every function of linear complexity had a large quadratic correlation.

🙂 I think it is nevertheless helpful to separate out the two problems. I’d be very interested in a proof that almost every function of linear circuit complexity correlated quite well with at least one quadratic phase function. It would at least show that there are some differences between random low-complexity functions and genuinely random functions. And perhaps it would give a clue about how to change “almost every” to “every”.

😐 Unless it was true for “almost every” but false for “every”.

🙂 Fine, but we’d still have the random/pseudorandom distinction, which would be pretty nice I think.

8) I’d like to give a possible heuristic argument that goes in the opposite direction to yours.

🙂 OK, that sounds interesting.

8) Well, I was convinced by your random-walk-with-drift idea when it came to individual correlations, but I think you are right to be suspicious of the independence.

🙂 Do you have some specific reason for that?

8) Yes. In fact, when I think about it a bit more I realize even the random-walk-with-drift idea could be wrong. I think what was in the back of your mind was that when you apply a basic 3-bit operation, it will usually chop things up, but it might put back together again something that you have already chopped up.

🙂 Something like that. In an extreme case it might even invert the previous basic 3-bit operation.

8) Right, but that happens with probability Cn^{-3}, and it seems to me that in fact the tendency to get more complicated is very strong, and the “stupid” examples will occur with probability n^{-cm} rather than e^{-cm}. And they will be very stupid examples that do things like inverting previous operations a lot.

If that is correct, then perhaps one could prove that every composition of basic 3-bit operations that obeys a few simple rules, such as not permuting the same set of k bits too many times in a row (which would allow cancellation to take place), will end up failing to correlate with g, independently of which quadratic phase function g one has chosen. Perhaps it just becomes “less quadratic”.

🙂 You’re beginning to convince me, or at the very least to weaken my already weak belief that this quadratic correlation idea could work.

But if your heuristic argument is correct, can we turn it into a much more general barrier? After all, it seems to be defeating a proof that is not obviously naturalizable.

8) I think you’re jumping ahead a little fast. To get my counterargument to work one would have to understand what it was about quadratic phase functions that caused “sensible” products of basic 3-bit operations to correlate less and less with them. Only then could one think about whether other classes of functions might have a similar property. But I agree that it would be interesting if we could identify a class of NP properties that were unlikely to be usable in any proof that P\ne NP.

😐 Before you dive into that, could you explain more clearly what you mean by a “sensible” product?

8) I can try. What I’m trying to formulate is a sort of non-triviality condition on products of basic 3-bit operations. For technical reasons I’m going to restrict my attention to just one 3-bit operation, which works like this. You choose an ordered triple (i,j,k) of distinct elements of \{1,2,\dots,n\} and then given an n-bit sequence x you look at the triple (x_i,x_j,x_k). If this equals (0,0,1) then you change it to (1,1,0) and if it equals (1,1,0) you change it to (0,0,1). Otherwise, you leave it alone. For example, if n=5 and (i,j,k)=(2,3,5), then the sequence (1,0,0,0,1) maps to (1,1,1,0,0), while the sequence (1,1,0,1,0) maps to itself.
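In code, and switching to 0-indexed positions purely for the sketch, the operation is simply the following (the asserts check the two examples I just gave).

```python
def basic_op(x, i, j, k):
    """Swap the patterns (0,0,1) and (1,1,0) in positions (i, j, k),
    leaving everything else alone."""
    x = list(x)
    if (x[i], x[j], x[k]) == (0, 0, 1):
        x[i], x[j], x[k] = 1, 1, 0
    elif (x[i], x[j], x[k]) == (1, 1, 0):
        x[i], x[j], x[k] = 0, 0, 1
    return tuple(x)

# the examples above, with (i,j,k)=(2,3,5) converted to 0-indexing:
assert basic_op((1, 0, 0, 0, 1), 1, 2, 4) == (1, 1, 1, 0, 0)
assert basic_op((1, 1, 0, 1, 0), 1, 2, 4) == (1, 1, 0, 1, 0)
```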

I haven’t checked, but I’m fairly sure that these operations are enough to generate all even permutations of \{0,1\}^n, and if they do then they will do so in a rapidly mixing way.

😐 They clearly don’t generate all even permutations of \{0,1\}^n.

8) Oh, really? Why not?

😐 Because the all-1s and all-0s sequences are fixed points of all those transformations.

8) Oh yes, of course. Well, even so I’m pretty sure they generate a very large group—possibly even all even permutations that fix those two sequences. Let’s not worry about that too much for now.

Instead, let us ask ourselves how cancellation can occur when we take a product of these generators. One way is trivial: if cancellation would have occurred even in the free group, then it occurs here. In other words, this is the cancellation you get just by removing inverse pairs. But there is also a less trivial form of cancellation that follows from the fact that the group is finite: we know that if we take all products of m generators, then when m is large enough we must have a lot of equalities. So we can take two products of generators that give you the same permutation and multiply one by the inverse of the other. In that way you get very random looking products of generators that miraculously cancel. However, the miracle depends on the products being very long indeed. To be more precise about it, the number of generators is Cn^3 and the number of even permutations of \{0,1\}^n is 2^{cn2^n}, so to start getting lots of coincidences you need to multiply at least cn2^n/\log n generators together.

However, this observation leads to an intermediate way of obtaining cancellation, which is to take a smallish integer k and look at products of generators that act on some fixed set of k bits. Now if you want to get coincidences, you only need to multiply ck2^k/\log k generators together.

😐 Right, but what’s the point of these observations?

8) I’m coming to that. One might make the bold conjecture that the cancellations I have just talked about are in some sense the only ones that can occur. And in fact the first type (cancelling inverse pairs) is the special case of the second in which you take k=3.

Loosely, one could state the conjecture as follows: the only way of getting the identity as a product of these 3-bit operations is by means of cancellations that arise when you stick around in some fixed set of bits for a long time (relative to the size of that set of bits).

Turning that round, the statement would be that as long as you spread the operations around enough then you are guaranteed to produce a non-trivial element of the group A_{2^n}. And one could also propose a stronger heuristic principle that the function you get is maximally complex, in a sense that I haven’t worked out how to make precise. But a consequence would be that, for example, if \pi is a product of m 3-bit operations that obeys the non-triviality condition, then the correlation of the first bit of \pi(x) with any fixed quadratic phase function decays exponentially.

😐 Don’t you have the problem that the first bit of \pi(x) might not depend on all that many of the basic operations you use?

8) Yes. I think what I mean is something like that the decay is exponential in the number of basic operations that actually make some contribution to the answer. I’ll have to think how to formulate that properly.

Before I do that, I want to make an important remark, which is that two basic 3-bit operations that act on disjoint sets of bits commute with each other. So the “trivial” cancellations I’m talking about don’t necessarily happen between consecutive terms in the product.

Here’s a way I like to visualize it. Let’s imagine we have an array of n dots, one for each bit. We now start applying basic operations. Associated with each operation is a subset of \{1,2,\dots,n\} of size 3. Let A_1,A_2,\dots,A_m be the sequence of all these subsets, and form a graph inductively as follows. The vertices of the graph are ordered pairs (i,u) of integers. You start with the graph G_0, which consists of n isolated vertices (1,0),\dots,(n,0). (These are the n dots I just mentioned.) Then if A_1=\{i,j,k\} you add in vertices (i,1),(j,1) and (k,1), joining all three of them to all three of (i,0),(j,0) and (k,0). That gives you the graph G_1. And you then continue this process. If you’ve got G_{r-1}, and A_r=\{i,j,k\} (in general a different triple, though I reuse the letters), then you add new vertices (i,u),(j,v) and (k,w), choosing for u,v and w the smallest integers that have not yet been used (so u will be the number of s\leq r such that i\in A_s) and joining all of (i,u),(j,v) and (k,w) to all of (i,u-1),(j,v-1) and (k,w-1). Let me also imagine that these edges are directed downwards. And let me say that the index of a vertex (i,u) is i.
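In case it helps to see that construction written down as a procedure, here is a rough sketch (0-indexed, so the dots are (0,0),\dots,(n-1,0)):

```python
def computation_graph(n, triples):
    """Build the layered graph just described: vertex (i, u) is created the
    u-th time index i is used; each operation on a triple of indices joins
    its three new vertices to the three previous vertices of those indices,
    with all nine edges directed downwards."""
    count = {i: 0 for i in range(n)}            # how often each index has been used so far
    vertices = [(i, 0) for i in range(n)]       # the n initial dots
    edges = []
    for A in triples:                           # each A is a 3-element set of indices
        old = [(i, count[i]) for i in A]
        new = [(i, count[i] + 1) for i in A]
        vertices.extend(new)
        edges.extend((v, w) for v in new for w in old)
        for i in A:
            count[i] += 1
    return vertices, edges
```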

Given any set of vertices I can follow the paths down from that set. I call the graph non-trivial if I can’t find some small set of k vertices such that when I follow the paths downwards I go a very long way without involving any new indices. And if the graph is non-trivial, then I say that the product of generators was non-trivial.

One can look at this as a conjecture that there is a sort of pseudo group homomorphism. Whenever I do have a bunch of generators that sticks around in the same place for too long (possibly after commuting them with other generators to get them together) then I’m allowed to cancel them. So I’m imagining a non-existent group where the generators are sets of size 3 and the relations are that disjoint sets of size 3 commute and all (rather than just some) large bunches of sets of size 3 that are subsets of a fixed set of size k cancel.

🙂 Am I right in thinking that you’re introducing these thoughts in order to try to come up with some kind of heuristic principle that would suggest that norms like the quadratic correlation norm will not help us?

8) Yes, I was going to return to that, though I’m not sure I know how to do it.

🙂 Well while you’re thinking about it, let me float another idea, which is that if what you say is correct, then perhaps it could be used to prove lower bounds.

8) Eh? I was trying to use it to show that it was difficult to prove lower bounds.

🙂 Yes, but if what you say is true, then it seems to establish a very strong principle that says that you can’t compute the same function in two essentially different ways. So perhaps one could prove a lower bound by finding an NP function and a method of computing it that didn’t “collapse” when you “mapped it to your imaginary pseudogroup”.

8) Oh. I see what you mean. Of course, I was talking about reversible computations, but I don’t know whether I was relying on that. Perhaps I could have just added some extra “rubbish” bits and argued in more or less the same way.

I’m inclined to take what you say not as a promising avenue for proving complexity lower bounds, but rather as a strong indication that my “bold conjecture” is far too bold. After all, it would seem that it is often possible to compute the same function in radically different ways. For example, to work out a^{6^k} mod p I could start with a, cube it k times and then square that k times, or I could do the squaring first and then the cubing. That looks like two genuinely different ways of working out the function.

It now looks to me as though the word problem in A_{2^n}, with 3-bit operations as generators, is going to be fantastically difficult. Of course, nobody would expect it to be possible to determine in polynomial time whether a given word gave you the identity, but I was hoping for something much less: finding a tidy sufficient condition for a word not to be the identity. But it seems that even this is far too much to ask.

🙂 Are you sure? Wouldn’t it be the case that in order to do sensible computations like raising a to the power 6^k mod p you have to do quite a lot of “sticking around in the same small set of bits”? Perhaps your conjecture is true, and our impression that there are often genuinely different ways of computing the same function is an illusion, because we don’t know of any “non-collapsing” computation techniques. Maybe if you’re very strict about spreading the sets A_i about, then you’ll only ever get very random-looking functions.

8) I see your point. I now don’t know what to believe.

🙂 Neither do I but I think it’s a nice question. Is there some sense in which computation takes place in an “almost free” group? Roughly speaking, this should mean that two circuits cannot give the same function unless they operate on similar sets of bits in a similar order (in so far as the order matters).

😐 I think I have an argument against that.

🙂 Oh really? What is it?

😐 It’s that if you take two unsatisfiable bunches of clauses you could create a function that’s identically zero in two different ways. And you could choose those sets of clauses randomly—if you have a reasonable number of them then they will give you unsatisfiable formulae.

🙂 Oh yes, I was a little careless there. But I think the question still makes sense if we look at basic 3-bit operations rather than AND, OR and NOT gates and consider whether two essentially different products can be equal.

😐 But I thought you said you could simulate AND, OR and NOT gates using basic 3-bit operations.

🙂 You can, but in order to do so you have to introduce extra “rubbish” bits that start out as 0s and end up scrambled in some peculiar way that depends on the computation you carried out. So with your example of two different ways of calculating a function that’s identically zero, the difference would show up in the rubbish bits.

😐 I see.

🙂 So I think the bold conjecture, if we can really call it a conjecture, is still alive, but one has to understand that it is very much a conjecture about reversible computations. Unfortunately, I don’t know enough about combinatorial/geometric group theory to know how to tackle such a problem: that is, a problem where one is trying to prove that a word is non-trivial, using certain features of how the word is put together. Does anyone know of any results of that kind?

8) I don’t.

😐 Neither do I, I’m afraid.

🙂 In that case, can I float past you two more ideas that have occurred to me?

8) Of course.

🙂 Well, you wanted to prove that “sensible” products of basic 3-bit operations would give you functions that became “less and less quadratic”. I interpret that as follows. If \phi_m\dots\phi_1 is a “sensible” product of m basic operations and we write f_r=\phi_r\dots\phi_1, then the correlation of f_r with any quadratic decreases exponentially with r until it becomes almost minimal.

8) I don’t know that I actually believe that, but it’s roughly the hypothesis that I was entertaining in order to cast doubt on your argument that functions of linear circuit complexity might magically correlate with quadratic phase functions.

🙂 The first thought is that it might follow from the “bold conjecture”. I’m going to oversimplify a few details here, but the bold conjecture has as a consequence that no “sensible” product of basic transformations can ever compute a function that can be computed by a “collapsing” product. Since the only computations we can think of tend to involve sticking around for a long time dealing with small sets of bits (though perhaps I am wrong about this — for example, does the PCP theorem give rise to “weird” computations that can be described by means of “sensible” products?), that would imply that no “sensible” product can ever give us any of the functions that we know and love.

8) At least if they can be computed reversibly, which is a big restriction.

🙂 Yes, you’re right. But it suggests that if we could find a reversible way of creating a quadratic polynomial, then maybe it would be reasonable to conjecture that a “sensible” product of basic transformations would automatically not correlate with it.

8) Aren’t you confusing permutations of the cube with Boolean functions defined on the cube?

🙂 Yes, but that was deliberate — it was the detail I was leaving out. I don’t know the best way of formulating a conjecture along these lines, but maybe if one could reversibly create a permutation that was quadratic in each coordinate, or something like that, then one could get the argument to work.

The big problem is that proving the general principle that the 3-bit operations create a group that is in some sense “free apart from the obvious kinds of relations” looks pretty hard. But my other thought was that we could at least have a look at what happens in the linear case, because I think it is reasonably straightforward to see what a basic 3-bit operation does to the Fourier transform of a function.

😐 How do you mean?

🙂 Well, a basic 3-bit operation is a permutation of the cube \{0,1\}^n, but we can also think of it as a linear map that takes functions defined on the cube to functions defined on the cube. We can then change basis from the delta functions at each vertex to the Walsh functions. That is, we see what happens to the Fourier transform.

😐 I think I’d like to see this in a bit more detail.

🙂 All right. Let w_A stand for the function x\mapsto (-1)^{x.A}, where I have written x.A for the parity of x in A. (In other words, it’s 1 if x has an odd number of 1s in A and 0 if it has an even number of 1s.) Now let’s take as an example the basic transformation \phi that acts on the bits x_i,x_j and x_k and interchanges 000 and 111, leaving x fixed if it is not the case that x_i=x_j=x_k. The way that \phi acts as a linear map is via the formula \phi(f)(x)=f(\phi^{-1}(x)), which equals f(\phi(x)) since \phi is its own inverse.

What is the expansion of w_A(\phi(x)) in terms of Walsh functions? The coefficient of w_B in that expansion is \mathbb{E}_xw_A(\phi(x))w_B(x), which equals

\displaystyle\mathbb{E}_x(-1)^{\phi(x).A+x.B}=\mathbb{E}_{x_1,\dots,x_n}\prod_{t=1}^n(-1)^{\phi(x)_ta_t+x_tb_t},

where we are setting a_t=1 if t\in A and 0 otherwise, and similarly for b_t and B.

We can split the product up according to whether or not t\in\{i,j,k\}. To simplify the expression, let us take (without loss of generality) \{i,j,k\} to be \{1,2,3\}. If we do all this, then we obtain the expression

\displaystyle \mathbb{E}_{x_1,\dots,x_n}\prod_{i=1}^3(-1)^{\phi(x)_ia_i+x_ib_i}\prod_{i=4}^n(-1)^{x_i(a_i+b_i)}.

Now we can think of x_1,\dots,x_n as independent random variables, and if we do then the expectation of the product above can be rewritten as a product of expectations. But \mathbb{E}_{x_i}(-1)^{x_i(a_i+b_i)}=0 unless a_i=b_i, in which case it is 1. Therefore, the entire expression is zero unless a_i=b_i for every i\geq 4. This tells us that \langle w_A\circ\phi,w_B\rangle=0 for every B that does not equal A outside the set \{1,2,3\}. (More generally, this will be true for \{i,j,k\} for the same reason.) Another observation is that if |A\cap\{i,j,k\}| is even, then replacing x by \phi(x) does not change the parity of the restriction to A, so in that case w_A(\phi(x))=w_A(x) for every x, which tells us that \langle w_A\circ\phi,w_B\rangle=\delta_{A,B}. If, however, |A\cap\{i,j,k\}| is odd, then a small calculation shows that \langle w_A\circ\phi,w_B\rangle is \pm 1/2 for four of the eight sets B that agree with A outside \{i,j,k\} and 0 for the other four.
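If you would rather check those coefficients by brute force than follow the calculation, here is a small sketch that does so on a small cube, with the triple taken (0-indexed) to be \{0,1,2\}:

```python
import itertools

def walsh(A, x):
    # w_A(x) = (-1)^{parity of x on A}
    return -1 if sum(x[i] for i in A) % 2 else 1

def phi(x, trio=(0, 1, 2)):
    # the transformation in the calculation: swap 000 and 111 on the bits in trio
    x = list(x)
    if all(x[i] == x[trio[0]] for i in trio):
        for i in trio:
            x[i] ^= 1
    return x

n = 4
cube = list(itertools.product((0, 1), repeat=n))
A = (0,)                       # |A ∩ {0,1,2}| is odd, so four coefficients of ±1/2 are expected
for r in range(n + 1):
    for B in itertools.combinations(range(n), r):
        c = sum(walsh(A, phi(x)) * walsh(B, x) for x in cube) / len(cube)
        if abs(c) > 1e-9:
            print(B, c)        # prints (0,) 0.5, then (1,), (2,) and (0, 1, 2) each with -0.5
```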

😐 Wow, that was more detail than I was expecting.

🙂 Well if you want to skip the calculations, just think about the conclusion. What we find is that the effect of \phi on a Walsh function w_A is either to leave it unchanged or to replace it by a linear combination of four Walsh functions w_B, where the sets B are close to A.

8) I simply have to make a remark here. What you have is an orthogonal map that takes the space of all linear combinations of Walsh functions, which we can identify with functions defined on \{0,1\}^n, to itself. Moreover, it’s a particularly simple orthogonal map in that it takes each Walsh function to a linear combination of Walsh functions that correspond to sets that differ only in the places i,j and k. If you identify w_A with the sequence that is 1 in A and 0 otherwise, then what this is saying is that each sequence is mapped to a linear combination of sequences that differ from it only in the coordinates i,j and k. But this is precisely the sort of thing that one does in a quantum computation!

🙂 I thought quantum computations involved complex numbers.

8) They do, but it’s hardly surprising that we don’t get arbitrary unitary maps, since the permutations of the cube give rise to a very small subgroup of the unitary group. It’s just amusing that the Fourier transform of a basic transformation is one of the basic transformations of quantum computation.

😐 Are you suggesting that quantum computation could be useful to us?

8) No I’m not. It was just meant as an amusing observation. Sorry if I’ve got you both excited.

🙂 Don’t apologize — I like it too. But let me get back to what I was saying. Once we see that Walsh functions have a tendency to map to linear combinations of closely related Walsh functions with coefficients equal to \pm 1/2 or 0, it becomes rather obvious that if you apply a typical sequence of basic operations to a Walsh function w_A, then it will spread out and out, preserving its L_2 norm but decreasing its L_\infty norm. Eventually you’ll start getting superpositions and the rate of decrease will become slower, and harder to analyse, but at least to start with it will be exponentially fast.

😐 I think I see what you mean.

🙂 Well, I wasn’t all that precise. But it seems to me that there might be some hope of proving that the decay happens not just for random products of basic 3-bit operations but even for 8)’s “sensible” products.

8) You know, I think you’ve influenced my way of thinking.

🙂 Delighted to hear it, but in what way?

8) Well, very early on in our conversations, you pointed out that even a heuristic argument can have an influence on what proof methods one regards as potentially fruitful. And here, even if we haven’t the faintest idea how to prove the “bold conjecture”, it does seem to rule out certain methods for proving circuit lower bounds. Suppose, for example, that some ludicrously strong version of the principle held, and let \Gamma be some class of nice functions (such as quadratic phase functions). We might attempt to prove lower bounds by showing that every low-complexity function f correlates with some g\in\Gamma. And this could be a highly non-natural property if \Gamma has superexponential size, as it does with quadratics. Nevertheless, it could also be the case that any “sensible” product of basic 3-bit operations would have tiny correlation with every function in \Gamma. The key point here is that we go for “every” rather than “almost every”. To get that, we are arguing that when we pick a random product f of 3-bit operations, the events “f correlates with g” for g\in\Gamma could be strongly dependent. This would be the case if they always held when the product had certain properties, such as being “sensible”.

😐 What exactly are you claiming here?

8) I’m pointing out something very weak, in the sense that it’s supposition piled on supposition. But I think it’s at least possible that there is a very wide class of NP functions that cannot be used to prove circuit lower bounds. In fact, I’m tempted to say that there should be two sorts of NP functions: useless ones, such as “has low circuit complexity”, that lead to trivial reformulations of the problem, and potentially useful ones, such as “correlates with a quadratic”, that don’t work. I’d like to understand this distinction a little better, and then maybe I could formulate a very depressing conjecture that would kill off all ideas you have had so far and all that you are ever likely to have.

🙂 Oh. Well I have a confession to make.

8) What on earth could that be?

🙂 I did in fact have another idea for how to prove superlinear lower bounds. It’s still highly speculative.

8) You mean you don’t have a fully worked out argument? Wonders never cease.

🙂 Ha ha. Just let me try it out on you. I’ve just argued that if you take a linear phase function, a.k.a. a Walsh function, and apply to it a basic 3-bit transformation, then the result lies in twice the convex hull of some other linear phase functions. That means that the decay that we observed in the L_\infty norm could not be too fast.

Now suppose we were able to prove that if you apply a basic transformation to a quadratic phase function then the result could be written more efficiently, in some sense, as a combination of other quadratic phase functions. Then perhaps this would prove that the maximum correlation with a quadratic phase function went down more slowly than it does in the linear case. The decrease would probably still be exponential, but perhaps the rate would be smaller.

If we could establish that, then perhaps we would be able to go on to show that if we considered polynomials of unbounded degree, then we could prove that the maximum correlation went down more slowly than exponential. But there still wouldn’t be too many of these polynomial phase functions, so a random function would have very small correlation with them. This might give us a distinction between functions of linear complexity and random functions.

8) Well, I’ve listened politely to what you say, but it doesn’t sound to me as though it would work. If you want to show that the maximum correlation goes down slowly, don’t you have to prove that if g is a polynomial phase function and \phi is a basic transformation, then g\circ\phi^{-1} has 1-o(1) correlation with another polynomial phase function of the same degree? That doesn’t sound remotely plausible to me.

🙂 Hmm, maybe you’re right. But I think there may be something to the basic idea. Perhaps instead of showing that a single polynomial phase function transforms to a function that can be written efficiently as a convex combination, one would prove that if you do a sequence of basic transformations to a polynomial phase function, then you can write the result in some nice way as a linear combination of polynomial phase functions. I’m not quite sure what I’m proposing exactly, but it feels as though there is a method that could in principle prove non-trivial lower bound results.

😐 Why don’t you at least see what a basic 3-bit operation does to a quadratic phase function?

🙂 I will, but I think I’d better do it in private and report back at our next meeting.

8) I don’t see how this would get past my “depressing conjecture”.

🙂 How about actually making the conjecture before you say that?

8) OK, I’ve got things to think about too. Let me have a quick go before we call it a day.

Let’s suppose we have some NP property we are trying to use. If we want to, we can phrase this in the following way: we are going to call a function f “simple” if there is a Boolean function g that depends on at most linearly many bits such that P(f,g), where P is some property that can be computed in polynomial time. We want to contrast properties of two kinds. A good example of the first kind is this: P(f,g) if and only if g is a quadratic and \langle f,g\rangle is large. And a good example of the second is this: g encodes a circuit of size at most n^2 that computes f.

Both these properties could be thought of as correlation with a function in some class \Gamma. It’s just that in the second case, the correlation has to be so perfect that you actually equal a function in \Gamma. A difference between the two is then that membership of \Gamma is easy to establish in the quadratic-functions case and hard to establish in the low-complexity case.

What I think I might be heading towards here is the possibility that any NP property of the form “correlates with something in \Gamma” for some polynomially-computable property \Gamma cannot work. Or it might be that, but with a stronger restriction on \Gamma that said that it could be computed in a “collapsing” way. But that might apply to all properties in P that we could actually understand.

🙂 So we might still be able to prove lower bounds if we took a simple class \Gamma but asked for a more complicated relation to our function than “correlates with”?

8) I think that would be worth thinking about, yes.

😐 Just before we finish, can I suggest a way that might in principle lead to a more precise notion of “sensible product” or “collapsing product”?

8) That would be great.

😐 Well, I haven’t actually done it, but perhaps one could come up with a presentation of the alternating group A_{2^n} with the following properties. The generators are just the basic 3-bit operations. To each relation one associates a set R\subset\{1,2,\dots,n\}, such that the following properties hold. (They may not be a complete set of all the properties one would need.)

1. A relation associated with the set R is a product of 3-bit operations that operate only on bits that belong to the set R.

2. For each R let G_R be the group generated by all the relations for which the associated set is a proper subset of R. Then G_R is an infinite group, and no relation with index R can be written as a short word in the group G_R.

3. Basic operations that operate on disjoint sets of bits commute.

The rough idea then would be to prove that if you have a product of basic operations that doesn’t stick around for a long time in the same set R, then there is no way of shortening it. So one would have the “no unexpected relations” property that you are talking about.

8) That sounds interesting, but it also sounds pretty hard to come up with a presentation of the kind you are talking about.

🙂 I wonder if we could find a combinatorial group theorist who would be able to comment on whether it is remotely feasible.
