This instalment has a brief discussion of another barrier to proving that P ≠ NP, known as algebrization. I don’t fully understand it, and therefore neither do my characters. (I’m hoping that maybe someone can help me with this.) But even a fuzzy understanding has some consequences, and the characters are led to formulate a simpler (and almost certainly already considered by the experts) problem that has the merit that when trying to solve it one is not tempted by proof techniques that would run up against the algebrization barrier. However, for all the usual reasons, this “simpler” problem looks very hard as well.

**************************************

I’m afraid I’m not yet ready to tell you what basic 3-bit operations do to quadratic phase functions.

8) In that case, can I instead mention something I read that looks relevant?

Sure, go ahead.

8) Well, you may remember my mentioning that Scott Aaronson and Avi Wigderson have a paper in which they introduce *another* barrier to lower bound proofs, which they call “algebrization”. If a proof can be algebrized, then it can’t prove that P ≠ NP.

π What does “algebrized” mean?

8) Unfortunately, my understanding of it is rather hazy. But it refers to a technique where in order to prove complexity results, you try to approximate low-complexity functions by low-degree polynomials over $\mathbb{F}_2$. It turns out to be helpful to look at low-degree field extensions of $\mathbb{F}_2$, so one can talk of results holding in various extensions.

A proof that P ≠ NP would algebrize if you can show that there is a function in $NP^{\tilde A}$ that is not in $P^A$. Here, $A$ is any oracle, and $\tilde A$ is a low-degree extension of $A$.

π What’s an oracle?

8) I’m a bit hazy on oracles too. Basically, it means that you can take some fairly complicated function and assume that your computer takes only one step to calculate it. (The word “oracle” is used for obvious reasons: it’s as though when the computer wants to know the value of this very complicated function, it just goes and asks the oracle.) One then writes $P^A$ and $NP^A$ for the sets of functions you can compute/verify in polynomial time if you have access to the oracle $A$. It is already known that there are oracles $A$ such that $P^A=NP^A$ and other oracles $A$ such that $P^A\ne NP^A$. It follows that any proof that P ≠ NP must not *relativize*: that is, must not remain valid even if you throw in an oracle.

π Why would a proof remain valid for an arbitrary oracle?

8) The kinds of proofs we’ve been thinking about wouldn’t, but there is a very different class of techniques that would. The observation about oracles shows that you can’t hope to prove that P ≠ NP by means of some clever counting or diagonalization argument.

π What do you mean by “diagonalization argument” in this context?

8) Well, one might try to run through all functions in P and construct a clever function in NP making sure that it disagrees with each function in P in at least one place. That’s the kind of thing you do to prove the insolubility of the halting problem. But for proving that P ≠ NP it is known not to work.

π Why is that called “algebrization”?

8) It isn’t. That’s relativization. A proof that P ≠ NP must not relativize, meaning that it must not be valid “relative to an arbitrary oracle”. By the way, you might like to check out a post by Terence Tao about the relativization barrier.

π I definitely would!

8) Anyhow, algebrization is a further twist, I think, where you have *two* oracles, and one is allowed to be a low-degree field extension of the other. Roughly, they show that you cannot prove that P ≠ NP if your proof would also show that $NP^{\tilde A}\not\subseteq P^A$ for every oracle $A$ and every low-degree field extension $\tilde A$ of $A$.

π But surely that’s already included in the relativization result? Just take $\tilde A=A$.

8) Hmm, that is quite confusing I agree. It may explain why I find this concept quite hard to grasp.

Aha, in the paper, which, incidentally, may be found here (or just Google “algebrization”), they say that they are referring to *algebraic oracles*, which they define as oracles that can work out not just the value of a function but also its value in any field extension. Since that doesn’t seem to make much sense for an arbitrary Boolean function, I think it must mean that the oracles are not arbitrary—since then you would surely be right—but only allow you to work out the kinds of functions, such as polynomials, that can be interpreted in a larger field. So their result is stronger in one sense—a proof is ruled out if it yields similar results for a rather small class of oracles—but weaker in another—those “similar results” have to show not just that for every algebraic oracle, but the stronger statement that for every algebraic oracle and every low-degree field extension of

π But surely *every* Boolean function can be written as a polynomial.

8) Well, yes, but the polynomial will depend on which doesn’t really count. Or does it? I’m afraid I’m not sure what the right response is to that question.

π What if you have two different polynomials that take equal values over one field but not over an extension?

8) I don’t think that’s a big problem. You’d just give some algebraic formula to the oracle and the oracle would be able to tell you the answer. It wouldn’t matter if some other algebraic formula happened to agree with it.

I’m sorry my understanding of this is less than perfect, but what I wanted to draw your attention to was the following few sentences from the Aaronson-Wigderson paper:

Can we pinpoint what it is about arithmetization that makes it incapable of solving these problems? In our view, arithmetization simply fails to “open the black box wide enough.” In a typical arithmetization proof, one starts with a polynomial-size Boolean formula $\varphi$, and uses $\varphi$ to produce a low-degree polynomial $p$. But having done so, one then treats $p$ as an arbitrary black-box function, subject only to the constraint that $\deg(p)$ is small. Nowhere does one exploit the small size of $\varphi$, except insofar as it lets one evaluate $p$ in the first place. The message of this paper has been that, to make further progress, one will have to probe $\varphi$ in some “deeper” way.

Ah, I see why you’re saying that. Even if we don’t fully understand the algebrization barrier, we can still be a little sceptical of any proof attempt that makes use of polynomials without also making use of how those polynomials are put together. I think we’ve slightly run up against this thought in some of our earlier discussions actually. It seems to suggest that it is unlikely that we would be able to prove that applying 3-bit operations to arbitrary linear combinations of degree-$d$ polynomial phase functions gave you linear combinations with coefficient sums that were not much bigger. That seems to tie in with my experiences with quadratics.

However, having said that, I’d like to mention that the degree is by no means the only interesting parameter one can associate with a polynomial over $\mathbb{F}_2$.

π Why, what else is there?

There’s also its rank.

π What does the rank of a polynomial mean? I thought ranks were things that linear maps had.

Well, let me first say what the rank of a quadratic is. Suppose you have a quadratic polynomial We can associate with the symmetric matrix that takes the value if and otherwise. What’s more, this is quite a natural thing to do, because you get precisely the matrix of the bilinear form as you will quickly see if you check it. And there is a very close association between quadratic forms and bilinear forms—though not quite as close over as it is over fields of higher characteristic.

We can now define the rank of the original quadratic to be the rank of the associated bilinear form (or equivalently of the matrix). The rank has a direct impact on the norm of the quadratic phase function that you build out of the quadratic. Indeed, if we write for the quadratic and for the Boolean function then we get

Using the fact that addition and subtraction are the same over one then finds that this is Therefore, the fourth power of the norm of is Now for any fixed the expectation over is unless the function is identically zero, in which case it is In other words, you get unless belongs to the kernel, so to speak, of This shows that the norm is equal to the density of the kernel, which is where is the rank. In other words, the rank of is just
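The link between rank and norm can be checked by brute force on small examples. The following is only a minimal sketch under stated assumptions: the quadratic is taken to be $q(x)=\sum_{i<j}a_{ij}x_ix_j$ over $\mathbb{F}_2$, vectors are stored as integer bitmasks, and the names `quad`, `symmetric_matrix`, `gf2_rank` and `u2_fourth_power` are illustrative rather than taken from the conversation. It compares the fourth power of the $U^2$ norm of $(-1)^{q}$ with $2^{-r}$, where $r$ is the rank of the associated symmetric matrix.

```python
from fractions import Fraction
from itertools import product


def quad(coeffs, x):
    """q(x) = sum over pairs (i, j), i < j, in coeffs of x_i * x_j, mod 2."""
    return sum((x >> i) & (x >> j) & 1 for (i, j) in coeffs) % 2


def symmetric_matrix(coeffs, n):
    """Rows (as bitmasks) of the symmetric matrix with a 1 at (i, j) and (j, i)."""
    rows = [0] * n
    for i, j in coeffs:
        rows[i] ^= 1 << j
        rows[j] ^= 1 << i
    return rows


def gf2_rank(rows):
    """Rank over F_2 of a matrix whose rows are integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def u2_fourth_power(coeffs, n):
    """E_{x,a,b} (-1)^(q(x) + q(x+a) + q(x+b) + q(x+a+b)), computed exactly."""
    total = 0
    for x, a, b in product(range(1 << n), repeat=3):
        exponent = (quad(coeffs, x) ^ quad(coeffs, x ^ a)
                    ^ quad(coeffs, x ^ b) ^ quad(coeffs, x ^ a ^ b))
        total += -1 if exponent else 1
    return Fraction(total, 8 ** n)


# Assumed example: q(x) = x_0 x_1 + x_2 x_3 on F_2^4.
example = {(0, 1), (2, 3)}
rank = gf2_rank(symmetric_matrix(example, 4))
print(rank, u2_fourth_power(example, 4), Fraction(1, 2 ** rank))
```

The brute-force average runs over all triples $(x,a,b)$, so it is only practical for very small $n$; it is meant as a sanity check of the identity, not as a way of computing norms.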

π So if you’re talking about quadratics, then the rank is *exactly* what you are interested in, rather than the degree. Sorry, that was slightly silly—if the degree is fixed then obviously you aren’t interested in it.

But what you are saying is still basically right—it’s the rank that tells you all about the norm, and in fact the dual norm as well because for quadratic phase functions it turns out that is *equal* to 1 rather than just *at least* 1.

π So presumably for cubics you do the same thing, except that this time you get a trilinear map. Is that right?

Yes and no. Yes you do get a trilinear map, but now it isn’t obvious what you mean by the word “rank”.

π Why not?

Because there are several natural candidates for a definition, and they do not agree. The result is that there is no consensus about what the “right” definition should be for a multilinear map.

π What are some of these natural candidates?

Well, let be a trilinear form on an -dimensional vector space. One could define its rank to be , where is the dimension of the space of all such that for every Or one could take it to be the smallest such that it is possible to write as Or one could take it to be the smallest such that we can write as a sum where each is a product of a linear function in one of and a bilinear function in the other two. But actually I like a definition that was proposed in a recent and not yet published (or even arXived, it seems) paper of Gowers and Wolf, and is also implicit in work of Green and Tao. It’s to forget about the algebra and simply define the rank of to be (That’s the definition when the vector space is over There is a similar definition for and also in some other contexts.)

π That sounds a little tautological.

I know, but Gowers and Wolf observed that one could prove that this “analytic” rank had a number of useful properties. For example, they showed that if and are -linear, then the rank of is at most times the rank of plus the rank of I can’t quite remember whether that proof worked in low characteristic—I’ll have to check.

π So their result is weaker than what you’d get in the bilinear case from algebraic methods?

Yes, that’s odd. I expect it would be possible to improve the factor to by tweaking their argument a bit. But that’s not my main concern here. I just want to point out that this definition exists and is quite useful. Also, and I think I mentioned this in one of our earlier conversations, Green and Tao showed that a low-rank polynomial phase function could be decomposed into phase functions of lower degree. To give a simple example of this, an arbitrary Boolean function that depends on just variables has at most non-zero Fourier coefficients, so in particular a quadratic phase function that depends on at most variables, and therefore has rank at most can be decomposed into at most linear phase functions.

π Is there an algebraic interpretation of this notion of rank?

Sort of. In the trilinear case, for instance, is if the linear map is not identically zero and is if it *is* identically zero. So you could say that the rank is related to the size of the “kernel” of However, the set of all such that for every is not a linear subspace but rather some bilinear structure. And its density does not have to be a power of 2, so the rank of is not necessarily an integer.
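To see the “kernel” interpretation at work, here is a small brute-force sketch. It rests on assumptions of mine: the trilinear form over $\mathbb{F}_2$ is written $\tau(x,y,z)=\sum t_{ijk}x_iy_jz_k$ and stored as a set of index triples, and the analytic rank is taken to be $-\log_2\mathbb{E}_{x,y,z}(-1)^{\tau(x,y,z)}$. The function names are hypothetical. The code checks that the average equals the density of pairs $(x,y)$ for which $z\mapsto\tau(x,y,z)$ vanishes identically, and the example shows a rank that is not an integer.

```python
from fractions import Fraction
from itertools import product
from math import log2


def tau(tensor, x, y, z):
    """Trilinear form: sum of x_i * y_j * z_k over triples (i, j, k) in tensor, mod 2."""
    return sum(x[i] & y[j] & z[k] for i, j, k in tensor) % 2


def bias(tensor, n):
    """E_{x,y,z} (-1)^tau(x,y,z), computed exactly by enumeration."""
    cube = list(product((0, 1), repeat=n))
    total = sum(-1 if tau(tensor, x, y, z) else 1
                for x in cube for y in cube for z in cube)
    return Fraction(total, 8 ** n)


def kernel_density(tensor, n):
    """Fraction of pairs (x, y) for which z -> tau(x, y, z) is identically zero."""
    cube = list(product((0, 1), repeat=n))
    count = 0
    for x in cube:
        for y in cube:
            coeffs = [0] * n  # coefficient of z_k in the linear map z -> tau(x, y, z)
            for i, j, k in tensor:
                coeffs[k] ^= x[i] & y[j]
            if not any(coeffs):
                count += 1
    return Fraction(count, 4 ** n)


def analytic_rank(tensor, n):
    """Minus log base 2 of the bias (the bias is a positive density)."""
    return -log2(float(bias(tensor, n)))


# Assumed example: tau(x, y, z) = x_0 y_0 z_0 + x_1 y_1 z_1 + x_2 y_2 z_2.
diagonal = {(0, 0, 0), (1, 1, 1), (2, 2, 2)}
print(bias(diagonal, 3), kernel_density(diagonal, 3), analytic_rank(diagonal, 3))
```

For this diagonal example each coordinate contributes a factor of $3/4$ to the average, so the density is $(3/4)^3$, which is not a power of 2.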

8) I’m not sure why you think that *any* notion of rank is going to be of any use in proving lower bound results.

Why not?

8) Because the rank is never going to be more than since if then belongs to the “kernel”. Also, it goes up in a sort of subadditive way. So it seems to me that after a linear number of scrambling operations you’ll have reached something with maximal rank—or at least, it won’t be possible to use just the rank to prove that you haven’t. I think you could perhaps extend what Aaronson and Wigderson said: you have to use more about your polynomials than just their degree and their rank.

That does sound plausible, alas. I wonder if some principle like that follows from their results on algebrization. I feel like asking an expert about this.

8) Meanwhile, perhaps we should return to the question of what basic operations do to quadratic phase functions?

Oh yes, thanks for reminding me. I’m afraid I still don’t have any progress to report, but I have had a little thought about it that I’d like to float past you.

8) OK.

I was a bit worried by your point that it is unlikely that a basic operation applied to a polynomial phase function of high degree will have correlation with another polynomial phase function of the same degree. And I’m also worried by your point about the combination of degree and rank probably not being a refined enough tool for proving complexity lower bounds. But it now occurs to me that it might just conceivably be possible to define a set-valued complexity measure using these sorts of ideas.

8) Ah, that sounds potentially interesting if you really can do it.

Well, the idea I had was this. If one is trying to prove that a function of low complexity is a linear combination, with coefficients that aren’t too big, of functions that are in some sense structured, such as low-degree polynomials and the like, then perhaps one can argue that the set of functions you use for is not much bigger than the union of the set of functions you use for and the set of functions you use for That’s not quite what I mean because it makes no mention of the coefficients, but you get the idea: if you’ve got a straight-line computation of then perhaps the set of functions you need to express all the functions that occur in the computation as efficient linear combinations is not too vast.

8) I sort of get the idea, and it sounds interesting in a way, but I worry that much of its interest may derive from its being rather vague and may vanish as soon as you try to express the idea more precisely. Can you give me a precise statement that says something like, “If there exists a set of functions with such and such a property, then there is a function in NP with a superlinear circuit lower bound”? Of course, in itself that’s not enough because some such statements are useless, such as “If there exists a set of functions that contains all functions of circuit complexity at most and does not contain the clique function, then there is a function in NP with a superlinear circuit lower bound.”

Let me tell you the kind of thing I had in mind. If you don’t mind I’ll call the set of functions instead of Now suppose that we have two Boolean functions and and that we can write them as and with the functions and belonging to

What can we then say about ? I was hoping that we would be able to write as where each was either a or a or one of a small number of “error” functions that came in. And also there would have to be some condition about the sum of the not being too big in terms of the sums of the and

8) I can see a number of problems with this. To start with, if the coefficient sums go up by even a small constant factor each time you do a Boolean operation, then after a linear number of steps they will be exponentially large. But if your set is reasonably balanced, it will be possible to write *every* Boolean function as a linear combination with coefficients adding up to at most So you won’t have distinguished between functions of linear complexity and random functions.

But even if you can solve this problem (which is conceivable—for instance, there might be parameters other than the sum of the coefficients that could give you a measure of the “size” of the linear combination), there is another that seems to me to be more fundamental. Recall from an earlier conversation of ours that Boolean operations felt more like products than sums. Now what happens if you take the product of and ? You get Now there’s good news and bad news here. The good news is that if and are degree- phase functions then so is so we’ve got another linear combination of functions in But the bad news is that we have used all the functions (and we have multiplied together the coefficient sums, but we’re ignoring that aspect now). So it seems that the set of functions we use for is going to be something like the product set of the sets we use for and for Therefore, after linearly many steps of the process it seems that we will end up with exponentially large sets. But *no* function needs more than exponentially many functions from , since the dimension of the space we are talking about is

As ever, the water you pour over my ideas is well and truly cold. But there’s one point you made that I think needs to be thought about slightly harder.

8) As ever, you carry a little immersion heater around with you.

Something like that. What I wanted to say was that at a certain point you made the assumption that the cardinality of the set of functions was equal to the product of the number of and the number of Obviously this is an upper bound, but perhaps there are coincidences. For example, if is the set of all quadratic phase functions, and and , then so if and are the sets of quadratics used by and then the set of quadratics used by is the sumset

Now in general can have size but if the sets and have additive structure (an extreme example would be when they are parallel affine subspaces of the space of quadratics over ) then can be much smaller.
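The contrast between structured and unstructured sets is easy to see in a toy computation. In this sketch (my own illustration, with hypothetical names `span` and `sumset`), elements of a vector space over $\mathbb{F}_2$ are integer bitmasks and addition is XOR. Two parallel affine subspaces have a sumset no bigger than either of them, while two sets with no additive structure in common have a sumset as large as the product of their sizes.

```python
def span(generators):
    """All F_2-linear combinations of the given bitmask vectors."""
    vectors = {0}
    for g in generators:
        vectors |= {v ^ g for v in vectors}
    return vectors


def sumset(first, second):
    """The set of all sums a + b over F_2 (XOR of bitmasks)."""
    return {a ^ b for a in first for b in second}


# Two parallel affine subspaces: translates of the same 3-dimensional subspace.
subspace = span([0b000001, 0b000010, 0b000100])
coset_a = {0b001000 ^ v for v in subspace}
coset_b = {0b010000 ^ v for v in subspace}

# Two unstructured sets: disjoint collections of standard basis vectors.
basis_low = {1 << i for i in range(0, 4)}
basis_high = {1 << i for i in range(4, 8)}

print(len(coset_a), len(coset_b), len(sumset(coset_a, coset_b)))
print(len(basis_low), len(basis_high), len(sumset(basis_low, basis_high)))
```

The first sumset is again a translate of the same subspace, which is why it has only 8 elements rather than 64.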

8) OK, but now you’ve introduced another idea into the picture. Your hypothesis would be not so much that you can write low-complexity functions as a linear combination of *few* structured functions, and more that you can write them as a linear combination of *structured sets* of structured functions. But where is your set-theoretic complexity measure now?

π I’ve thought of a question that might be worth considering. It just feels as though it could clarify the discussion a bit.

OK, what is it?

π Well, it seems to me that, motivated by the remarks of Aaronson and Wigderson, you are trying to distinguish between low-complexity and high-complexity polynomials, even when they have the same degree. This seems worth thinking about even for quadratics: a simple counting argument tells us that most quadratics have superlinear (in fact, almost quadratic) circuit complexity. So what is the difference between a high-complexity quadratic and a low-complexity quadratic? If you force yourself to think about just this question, then maybe you won’t keep coming up with properties that fail for known reasons.

That sounds like an excellent idea, but potentially also a very difficult one, because it might be possible to define “weird” quadratics by devising some clever low-complexity calculation that involves functions that are nothing like quadratics but that magically produce a quadratic at the end. But perhaps one could have a restricted model of computation that allowed more gates but insisted that all functions at every stage were quadratics.

8) That is sounding very like arithmetic complexity to me.

π What is arithmetic complexity?

8) It’s very like circuit complexity, but you replace the Boolean operations $\wedge$ and $\vee$ by arithmetic operations such as $+$ and $\times$. I think a good illustration of the basic idea is the function $x^{2^k}$. This has arithmetic complexity at most $k$ because you can get it in $k$ steps by starting with $x$ and repeatedly squaring. (I’m now talking about arithmetic complexity of polynomials over the reals, by the way.) This shows that the arithmetic complexity of a polynomial can be far smaller than the degree. And in general it seems to be hard to prove lower bounds for arithmetic complexity.

A very important fact in cryptography is that the arithmetic complexity of raising a number to a power modulo a large index is small. If you want to work out mod then you can do the repeated-squaring trick to work out for all and then multiply together the ones you need to make This shows that calculating has arithmetic complexity that is polylogarithmic in Equally important is the fact that it seems to be much harder to calculate factorials: if there were a quick way of calculating , for example, then there would be a quick primality test. (Of course, one could say that there *is* a quick way: use AKS to test whether is prime and then the answer is if it is prime and otherwise. But being able to calculate factorials and similar polynomials quickly would have major consequences besides primality testing.)
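The repeated-squaring trick can be sketched as follows. This is a minimal illustration rather than anything from the conversation: the names `power_mod`, `factorial_mod_naive` and `is_prime_wilson` are hypothetical, and the primality check relies on Wilson's theorem, which says that an integer $n\ge 2$ is prime exactly when $(n-1)!\equiv -1$ mod $n$. The point of the contrast is that the power needs only about $2\log_2 m$ multiplications, whereas the obvious way of computing the factorial needs about $n$ of them.

```python
def power_mod(a, m, n):
    """Return (a**m mod n, number of modular multiplications used)."""
    result = 1 % n
    square = a % n  # holds a^(2^k) mod n at the k-th iteration
    multiplications = 0
    while m:
        if m & 1:
            result = result * square % n
            multiplications += 1
        m >>= 1
        if m:
            square = square * square % n
            multiplications += 1
    return result, multiplications


def factorial_mod_naive(n):
    """(n-1)! mod n, computed the slow way with roughly n multiplications."""
    value = 1 % n
    for k in range(1, n):
        value = value * k % n
    return value


def is_prime_wilson(n):
    """Wilson's theorem: n >= 2 is prime iff (n-1)! is congruent to -1 mod n."""
    return n >= 2 and factorial_mod_naive(n) == n - 1


value, cost = power_mod(7, 10**18, 1_000_003)
print(value == pow(7, 10**18, 1_000_003), cost)
print([n for n in range(2, 30) if is_prime_wilson(n)])
```

The multiplication count is an upper bound of twice the bit length of the exponent, since each bit costs at most one squaring and one multiplication.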

But I don’t see how this applies to quadratics, because they are not closed under taking products.

8) I agree that that’s a problem. But here’s the type of thing one could do. You’ll agree that the map is a high-rank quadratic.

Yes. I think it has rank either or

8) And it’s also computable in linear time, has linear arithmetic complexity, etc.

Yes, but its associated matrix is very sparse, since it has only entries.

π Sorry, but why rather than ?

Because the bilinear form you get is symmetric: corresponding to each term you get a term in the bilinear form.

π Ah, I see.

8) The point I wanted to make is that you can produce less sparse matrices by producing some more interesting ensembles of linear forms than What you do is start with a straight-line computation that allows only addition mod 2. In other words, you write a sequence where each is a sum of two earlier terms in the sequence (either s or s). Then at the end of that process you create a quadratic form where each and each is one of the linear forms from the nice little supply that you have built up.

Ah, I like that. First of all, it really does seem a natural model for the computational complexity of quadratic forms over OK, it doesn’t allow for computations that produce quadratic forms by magic, but it does seem to allow all “natural” ways that one might want to produce them. The second thing I like about it is that it relates *very* closely to the Gaussian-elimination problem. In fact, the only difference is that in that problem when you work out the mod-2 sum of two of your functions, you have to choose one of them and vow never to use it again, whereas here we drop that restriction. It shows that what makes a quadratic computationally simple is the possibility of writing it as a sum of rank-one quadratics (by which I mean functions of the form where both and are linear) that are somehow related to each other in a “simple” way. We see that the rank is far too crude a parameter because it takes no account of this relationship (or lack of it).
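Here is one way the model just described might be written down, as a sketch under my own conventions. Linear forms over $\mathbb{F}_2$ are stored as bitmasks of their coefficients, a straight-line program is a list of index pairs saying which two earlier forms to add, and the quadratic is a sum of products of pairs of forms from the resulting supply. All the names are hypothetical. The last two functions check that the quadratic is $x^{T}Mx$, where $M$ is the sum of the outer products of the chosen pairs of forms.

```python
def run_linear_program(n, steps):
    """Start from the coordinate forms x_0..x_{n-1}; each step (s, t) appends forms[s] + forms[t]."""
    forms = [1 << i for i in range(n)]
    for s, t in steps:
        forms.append(forms[s] ^ forms[t])
    return forms


def apply_form(form, x):
    """Value of a linear form (bitmask of coefficients) at the point x."""
    return bin(form & x).count("1") % 2


def quadratic_value(forms, pairs, x):
    """q(x) = sum over (i, j) in pairs of forms[i](x) * forms[j](x), mod 2."""
    return sum(apply_form(forms[i], x) & apply_form(forms[j], x) for i, j in pairs) % 2


def quadratic_matrix(forms, pairs, n):
    """Rows of M = sum of outer products u^T v over the chosen pairs (u, v), over F_2."""
    rows = [0] * n
    for i, j in pairs:
        u, v = forms[i], forms[j]
        for r in range(n):
            if (u >> r) & 1:
                rows[r] ^= v
    return rows


def matrix_value(rows, x):
    """x^T M x mod 2 for a matrix given by bitmask rows."""
    return sum(apply_form(rows[r], x) for r in range(len(rows)) if (x >> r) & 1) % 2


# Assumed example on F_2^4: three additions, then two products.
forms = run_linear_program(4, [(0, 1), (2, 3), (4, 5)])
pairs = [(4, 5), (6, 0)]  # (x0+x1)(x2+x3) + (x0+x1+x2+x3) x0
matrix = quadratic_matrix(forms, pairs, 4)
print(all(quadratic_value(forms, pairs, x) == matrix_value(matrix, x) for x in range(16)))
```

Nothing in this sketch forbids reusing a form after it has been added to another one, which is the relaxation of the Gaussian-elimination rule mentioned above.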

π Quick question: to what extent is the representation of a quadratic unique? In particular, might it have a “simple” representation *and* a “complex” one? Then in order to prove that a quadratic could not be computed in linear time (in this model) one would have to consider all possible representations rather than just one, which would be annoying.

I think uniqueness of any kind is too much to hope for. For example, if you work mod 3 instead and choose an arbitrary “orthonormal basis” (by which I mean a collection of binary sequences such that mod 3 for every ), then the matrix works out to be the identity (as can be seen if you multiply it by any of the vectors ). The resulting quadratic form is . There are annoying characteristic-2 problems if you want to do something similar over since then but I suspect it would be easy to come up with an example that avoids the diagonal and makes the same general point.

π Sorry, I got lost there—what is the general point?

8) It’s that one can probably find “weird” ensembles of linear forms that give you quadratics that can also be produced in non-weird ways. But now it’s my turn to be lost. What was the point *you* were making?

π I haven’t quite got it clear myself. But I liked this connection between the complexity of quadratic forms and the Gaussian-elimination problem and was trying to see how closely they were related. It seems that what you do here is a slightly generalized Gaussian-elimination procedure to produce two matrices (one with rows and one with rows ) and you then multiply the first by the transpose of the second to get the matrix of the quadratic form. So we are not in fact asking whether the matrix of the quadratic form can be obtained in linear time using Gaussian row operations, or anything like that at all. And one can express the matrix as a product of two other matrices in many different ways, so it looks as though it will be pretty difficult to identify some quasirandomness property that forces all decompositions of the form to be complex in the Gaussian-elimination sense.

I suppose what I’m saying is that even if we could solve the Gaussian-elimination problem, we’d still have a long way to go.

I think that you are right, which is unfortunate.

π Thanks.

I don’t mean your correctness is unfortunate, but just that the correct fact you’ve identified (if it is correct) is a pity. But I still like your proposal that we should look at quadratics, because it still seems to be subject to the natural-proofs barrier, and we won’t be tempted to use crude parameters such as degree and rank because all our polynomials are quadratics and they can quite easily have maximal rank.

8) Yes, but now you haven’t the faintest idea how to prove anything.

That’s better than having lots of faint ideas that are guaranteed not to work. And in any case, now, despite what π has pointed out, I want to think about the Gaussian-elimination problem. I have a simple question: how many Gaussian operations do you need in order to produce a quasirandom matrix?

π What do you mean by that?

I mean a matrix where the number of 1s is approximately and for almost any two rows they agree in approximately half the places. (Actually, the second property implies the first.)

8) That sounds like a very natural property to me.

Thank you.

8) I didn’t mean it as a compliment. It sounds easily computable, and hence very unlikely to be a good indication of complexity. I’m sure it will be possible to produce such matrices in a linear number of steps.

Oh yes. Well, perhaps we should just confirm that.

A random sequence of row operations doesn’t work I think, because each row ends up depending on only very few of the original rows. To put it another way, the matrix you end up with is far too sparse. But one can deal with that by first forming a triangular matrix with 0s on one side of the diagonal and 1s on the other side—it’s easy to see how to do that. Perhaps if one *then* applies a random sequence of Gaussian operations one reaches a quasirandom matrix in linear time.

8) Hmm, I’m not sure. For a start there will be a few rows that you will never touch again.

Yes, but all I need is for *almost* all rows to look random at the end of the process.

8) I think you need to be fussier than that. After all, the Walsh matrix is *very* quasirandom according to your definition, so if you can’t produce extremely good quasirandomness in linear time then the Walsh matrix needs a superlinear number of row operations.

That was indeed an example I had in mind actually.

π What’s the Walsh matrix?

A quick definition is this: If is the matrix with just a 1 inside, then you’ll end up with a orthogonal matrix of s and s. If you change that to a 01-matrix in the obvious way then you get what one might call perfect quasirandomness, apart from the fact that one row is all 1s.

8) Right, how do you get *that* level of quasirandomness with linearly many row operations?

Hmm, we have an inductive construction, so presumably we can translate that into some inductively defined sequence of row operations. How do we produce from the identity matrix? That’s quite easy: add the second row to the first to get , then the first to the second to get

In general, if we’ve got the matrix , we can add the second “block row” to the first and then the first to the second. That takes operations. So if it takes operations to produce then we seem to have a recurrence like That recursion is satisfied by the function If we now set we get a bound of rather than a linear bound.
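The operation count can be checked by actually carrying out the row operations. The sketch below is my reading of the block construction, not a transcription of it: starting from the identity matrix of size $2^k$ over $\mathbb{F}_2$, it builds the smaller matrix on each diagonal block, adds the second block row to the first, and then adds the first block row to the second, counting each single-row addition as one operation. The function name is hypothetical. With this convention the count is $k2^k$, and the matrix produced has a 1 in position $(r,c)$ exactly when the binary expansions of $r$ and $c$ share no 1.

```python
def build_by_row_operations(k):
    """Return (rows as bitmasks, number of row operations) for the 2^k x 2^k construction."""
    size = 1 << k
    rows = [1 << i for i in range(size)]  # identity matrix: row i has a single 1 in column i
    operations = 0

    def build(start, block):
        nonlocal operations
        if block == 1:
            return
        half = block // 2
        build(start, half)         # smaller matrix on the first diagonal block
        build(start + half, half)  # and on the second diagonal block
        for i in range(half):      # add the second block row to the first
            rows[start + i] ^= rows[start + half + i]
            operations += 1
        for i in range(half):      # then add the first block row to the second
            rows[start + half + i] ^= rows[start + i]
            operations += 1

    build(0, size)
    return rows, operations


for k in range(1, 6):
    rows, operations = build_by_row_operations(k)
    print(k, 1 << k, operations)
```

Each level of the recursion costs one operation per row of the current block, and there are $k$ levels, which is where the extra logarithmic factor comes from.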

π It would be very surprising if that was not the quickest way of producing It just seems so obvious and natural. Is it really true that the best known lower bounds for this problem are only linear?

Maybe we should check. There’s obviously lots to think about here, but I’m getting tired. Shall we stop for now?

8) π OK.

November 3, 2009 at 1:35 am

In the notion of algebrization, the function is a low-degree polynomial (of degree ) over a field of size with the property that for every we have , where we identify the of the field with the of the field .

(There are actually going to be different fields for different input lengths, so can be seen as a family of functions parameterized by , and the -th function has domain where the size of the field is polynomial in the dimension .)

Having an algebrizing proof that $NP\not\subseteq P$ means that for every $A$ and every $\tilde A$ that is in the above relation with $A$ we have $NP^{\tilde A}\not\subseteq P^A$.

In a relativizing proof, we would have that for every $A$ it holds that $NP^A\not\subseteq P^A$. This would imply that the proof also algebrizes, because for $\tilde A$ which is an “extension” of $A$ as defined above, we have $NP^A\subseteq NP^{\tilde A}$, simply because the oracle queries to $A$ can be realized as oracle queries to $\tilde A$.

The converse, however, need not be true: it could be (and it is the case for certain complexity classes) that, even though there is an oracle such that , so that no relativizing proof that is possible, for every and every extension we have , thanks to the extra power that we gain by evaluating at non-boolean points.

November 3, 2009 at 1:43 am

Sorry for the mess: in the third paragraph, should be ; in the formula that does not parse a backslash is missing before ‘subseteq’, and later a backslash is missing before ’tilde.’ And ‘algebrization’ is misspelled in the first sentence.

[Thanks, I’ve edited out those typos now.]

November 3, 2009 at 4:57 pm

I still find this concept weirdly difficult to understand. For example, you say that a proof that NP is not a subset of P algebrizes if for every $A$ and every $\tilde A$ that extends $A$ we have $NP^{\tilde A}\not\subseteq P^A$. Later you say that $NP^A\subseteq NP^{\tilde A}$. So surely the proof algebrizes if and only if $NP^A\not\subseteq P^A$. The proof would be that if you want to show that $NP^{\tilde A}\not\subseteq P^A$ for every $\tilde A$ that extends $A$ it is enough to show it for the smallest such $\tilde A$, which is $A$.

I’m sure there’s something completely obviously wrong with what I’m writing, but I just don’t see it at the moment.

November 3, 2009 at 7:27 pm

Good point, evidently I too was confused about the definition. I said that, in order to define an admissible , we choose a sequence of field sizes and then a sequence (indexed by ) of polynomials over which agree with on . Then, as you say, one could take and .

Instead the definition of extension oracle (Def 2.2 in the paper) is as follows. To define an admissible , we pick for every and for every field a polynomial of low degree over which agrees with over . Then the oracle is the entire collection of polynomials.

The definition tries to model the setting in which we have some kind of “interpolation formula” for , and hence we are able to evaluate the formula while doing operations over any field of our choice. No matter what field we choose, the invariant is maintained that over we recover and that the polynomial degree of the function we compute is small. (The actual definition does not require that, for every , there has to be a single algebraic formula from which all the can be computed, and in fact it does not require any relationship between the for different , other than agreeing over . This is just intuition for what an might actually look like.)

November 3, 2009 at 7:50 pm

Ah, I see now. So because you’re forced to choose an extension of $A$ to every field that extends $\mathbb{F}_2$, in a certain sense $\tilde A$ is forced to be large and $NP^{\tilde A}\not\subseteq P^A$ is a correspondingly weaker statement than $NP^A\not\subseteq P^A$, so saying that no proof that P ≠ NP algebrizes is stronger than saying that no proof relativizes.