A conversation about complexity lower bounds, IX

This instalment has a brief discussion of another barrier to proving that P\neNP, known as algebrization. I don’t fully understand it, and therefore neither do my characters. (I’m hoping that maybe someone can help me with this.) But even a fuzzy understanding has some consequences, and the characters are led to formulate a simpler (and almost certainly already considered by the experts) problem that has the merit that when trying to solve it one is not tempted by proof techniques that would run up against the algebrization barrier. However, for all the usual reasons, this “simpler” problem looks very hard as well.


πŸ™‚ I’m afraid I’m not yet ready to tell you what basic 3-bit operations do to quadratic phase functions.

8) In that case, can I instead mention something I read that looks relevant?

πŸ™‚ Sure, go ahead.

8) Well, you may remember my mentioning that Scott Aaronson and Avi Wigderson have a paper in which they introduce another barrier to lower bound proofs, which they call “algebrization”. If a proof can be algebrized, then it can’t prove that P\neNP.

😐 What does “algebrized” mean?

8) Unfortunately, my understanding of it is rather hazy. But it refers to a technique where in order to prove complexity results, you try to approximate low-complexity functions by low-degree polynomials over \mathbb{F}_2. It turns out to be helpful to look at low-degree field extensions of \mathbb{F}_2, so one can talk of results holding in various extensions.

A proof that P\neNP would algebrize if you can show that there is a function in \mathrm{P}^{\tilde{A}} that is not in \mathrm{P}^A. Here, A is any oracle, and \tilde{A} is a low-degree extension of A.

😐 What’s an oracle?

8) I’m a bit hazy on oracles too. Basically, it means that you can take some fairly complicated function and assume that your computer takes only one step to calculate it. (The word “oracle” is used for obvious reasons: it’s as though when the computer wants to know the value of this very complicated function, it just goes and asks the oracle.) One then writes \mathrm{P}^A and \mathrm{NP}^A for the sets of functions you can compute/verify in polynomial time if you have access to the oracle A. It is already known that there are oracles A such that \mathrm{P}^A=\mathrm{NP}^A and other oracles B such that \mathrm{P}^B\ne\mathrm{NP}^B. It follows that any proof that P\neNP must not relativize: that is, must not remain valid even if you throw in an oracle.

😐 Why would a proof remain valid for an arbitrary oracle?

8) The kinds of proofs we’ve been thinking about wouldn’t, but there is a very different class of techniques that would. The observation about oracles shows that you can’t hope to prove that P\neNP by means of some clever counting or diagonalization argument.

😐 What do you mean by “diagonalization argument” in this context?

8) Well, one might try to run through all functions in P and construct a clever function in NP making sure that it disagrees with each function in P in at least one place. That’s the kind of thing you do to prove the insolubility of the halting problem. But for proving that P\neNP it is known not to work.

😐 Why is that called “algebrization”?

8) It isn’t. That’s relativization. A proof that P\neNP must not relativize, meaning that it must not be valid “relative to an arbitrary oracle”. By the way, you might like to check out a post by Terence Tao about the relativization barrier.

😐 I definitely would!

8) Anyhow, algebrization is a further twist, I think, where you have two oracles, and one is allowed to be a low-degree field extension of the other. Roughly, they show that you cannot prove that P\neNP if your proof would also show that \mathrm{P}^{\tilde{A}}\not\subset\mathrm{NP}^A for every oracle A and every low-degree field extension \tilde{A} of A.

😐 But surely that’s already included in the relativization result? Just take A=\tilde{A}.

8) Hmm, that is quite confusing I agree. It may explain why I find this concept quite hard to grasp.

Aha, in the paper, which, incidentally, may be found here (or just Google “algebrization”), they say that they are referring to algebraic oracles, which they define as oracles that can work out not just the value of a function f but also its value in any field extension. Since that doesn’t seem to make much sense for an arbitrary Boolean function, I think it must mean that the oracles are not arbitrary—since then you would surely be right—but only allow you to work out the kinds of functions, such as polynomials, that can be interpreted in a larger field. So their result is stronger in one sense—a proof is ruled out if it yields similar results for a rather small class of oracles—but weaker in another—those “similar results” have to show not just that \mathrm{P}^A\not\subset\mathrm{NP}^A for every algebraic oracle, but the stronger statement that \mathrm{P}^{\tilde{A}}\not\subset\mathrm{NP}^A for every algebraic oracle A and every low-degree field extension \tilde{A} of A.

😐 But surely every Boolean function can be written as a polynomial.

8) Well, yes, but the polynomial will depend on n, which doesn’t really count. Or does it? I’m afraid I’m not sure what the right response is to that question.

😐 What if you have two different polynomials that take equal values over one field but not over an extension?

8) I don’t think that’s a big problem. You’d just give some algebraic formula to the oracle and the oracle would be able to tell you the answer. It wouldn’t matter if some other algebraic formula happened to agree with it.

I’m sorry my understanding of this is less than perfect, but what I wanted to draw your attention to was the following few sentences from the Aaronson-Wigderson paper:

Can we pinpoint what it is about arithmetization that makes it incapable of solving these problems? In our view, arithmetization simply fails to β€œopen the black box wide enough.” In a typical arithmetization proof, one starts with a polynomial-size Boolean formula \phi, and uses \phi to produce a low-degree polynomial p. But having done so, one then treats p as an arbitrary black-box function, subject only to the constraint that \mathrm{deg}(p) is small. Nowhere does one exploit the small size of \phi, except insofar as it lets one evaluate p in the first place. The message of this paper has been that, to make further progress, one will have to probe \phi in some β€œdeeper” way.

πŸ™‚ Ah, I see why you’re saying that. Even if we don’t fully understand the algebrization barrier, we can still be a little sceptical of any proof attempt that makes use of polynomials without also making use of how those polynomials are put together. I think we’ve slightly run up against this thought in some of our earlier discussions actually. It seems to suggest that it is unlikely that we would be able to prove that applying 3-bit operations to arbitrary linear combinations of degree-d polynomial phase functions gave you linear combinations with coefficient sums that were not much bigger. That seems to tie in with my experiences with quadratics.

However, having said that I’d like to mention that the degree is by no means the only interesting parameter one can associate with a polynomial over \mathbb{F}_2.

😐 Why, what else is there?

πŸ™‚ There’s also its rank.

😐 What does the rank of a polynomial mean? I thought ranks were things that linear maps had.

πŸ™‚ Well, let me first say what the rank of a quadratic is. Suppose you have a quadratic polynomial q(x)=\sum_{\{i,j\}\in\mathcal{A}}x_ix_j. We can associate \mathcal{A} with the symmetric matrix A_{ij} that takes the value 1 if \{i,j\}\in\mathcal{A} and 0 otherwise. What’s more, this is quite a natural thing to do, because you get precisely the matrix of the bilinear form \beta(x,y)=q(x+y)-q(x)-q(y), as you will quickly see if you check it. And there is a very close association between quadratic forms and bilinear forms—though not quite as close over \mathbb{F}_2 as it is over fields of higher characteristic.

We can now define the rank of the original quadratic to be the rank of the associated bilinear form (or equivalently of the matrix). The rank has a direct impact on the U^2 norm of the quadratic phase function that you build out of the quadratic. Indeed, if we write q(x) for the quadratic and f(x) for the Boolean function (-1)^{q(x)}, then we get

\displaystyle f(x)f(x+a)f(x+b)f(x+a+b)=(-1)^{q(x)+q(x+a)+q(x+b)+q(x+a+b)}.

Using the fact that addition and subtraction are the same over \mathbb{F}_2, one then finds that this is (-1)^{\beta(a,b)}. Therefore, the fourth power of the U^2 norm of f is \mathbb{E}_{a,b}(-1)^{\beta(a,b)}. Now for any fixed a, the expectation over b is 0 unless the function b\mapsto\beta(a,b) is identically zero, in which case it is 1. In other words, you get 0 unless a belongs to the kernel, so to speak, of \beta. This shows that the U^2 norm is equal to the density of the kernel, which is 2^{-r}, where r is the rank. In other words, the rank of \beta is just -\log_2\|f\|_{U^2}^4.

😐 So if you’re talking about quadratics, then the rank is exactly what you are interested in, rather than the degree. Sorry, that was slightly silly—if the degree is fixed then obviously you aren’t interested in it.

πŸ™‚ But what you are saying is still basically right—it’s the rank that tells you all about the U^2 norm, and in fact the U^2 dual norm as well because for quadratic phase functions it turns out that \|f\|_{U^2}\|f\|_{U^2}^* is equal to 1 rather than just at least 1.

😐 So presumably for cubics you do the same thing, except that this time you get a trilinear map. Is that right?

πŸ™‚ Yes and no. Yes you do get a trilinear map, but now it isn’t obvious what you mean by the word “rank”.

😐 Why not?

πŸ™‚ Because there are several natural candidates for a definition, and they do not agree. The result is that there is no consensus about what the “right” definition should be for a multilinear map.

😐 What are some of these natural candidates?

πŸ™‚ Well, let \tau be a trilinear form on an n-dimensional vector space. One could define its rank to be n-r, where r is the dimension of the space of all a such that \tau(a,b,c)=0 for every (b,c). Or one could take it to be the smallest r such that it is possible to write \tau(a,b,c) as \sum_{i=1}^ru_i(a)v_i(b)w_i(c). Or one could take it to be the smallest r such that we can write \tau as a sum \sum_{i=1}^rf_i, where each f_i is a product of a linear function in one of a,b,c and a bilinear function in the other two. But actually I like a definition that was proposed in a recent and not yet published (or even arXived, it seems) paper of Gowers and Wolf, and is also implicit in work of Green and Tao. It’s to forget about the algebra and simply define the rank of \tau to be -\log_2\sum_{a,b,c}(-1)^{\tau(a,b,c)}. (That’s the definition when the vector space is over \mathbb{F}_2. There is a similar definition for \mathbb{F}_p, and also in some other contexts.)

😐 That sounds a little tautological.

πŸ™‚ I know, but Gowers and Wolf observed that one could prove that this “analytic” rank had a number of useful properties. For example, they showed that if \mu and \nu are d-linear, then the rank of (\mu+\nu) is at most 2^d times the rank of \mu plus the rank of \nu. I can’t quite remember whether that proof worked in low characteristic—I’ll have to check.

😐 So their result is weaker than what you’d get in the bilinear case from algebraic methods?

πŸ™‚ Yes, that’s odd. I expect it would be possible to improve the factor to 2^{d-1} by tweaking their argument a bit. But that’s not my main concern here. I just want to point out that this definition exists and is quite useful. Also, and I think I mentioned this in one of our earlier conversations, Green and Tao showed that a low-rank polynomial phase function could be decomposed into phase functions of lower degree. To give a simple example of this, an arbitrary Boolean function that depends on just k variables has at most 2^k non-zero Fourier coefficients, so in particular a quadratic phase function that depends on at most k variables, and therefore has rank at most k, can be decomposed into at most 2^k linear phase functions.

😐 Is there an algebraic interpretation of this notion of rank?

πŸ™‚ Sort of. In the trilinear case, for instance, \mathbb{E}_c(-1)^{\tau(a,b,c)} is 0 if the linear map c\mapsto\tau(a,b,c) is not identically zero and is 1 if it is identically zero. So you could say that the rank is related to the size of the “kernel” of \tau. However, the set of all (a,b) such that \tau(a,b,c)=0 for every c is not a linear subspace but rather some bilinear structure. And its density does not have to be a power of 2, so the rank of \tau is not necessarily an integer.

8) I’m not sure why you think that any notion of rank is going to be of any use in proving lower bound results.

πŸ™‚ Why not?

8) Because the rank is never going to be more than n, since if a=0 then (a,b) belongs to the “kernel”. Also, it goes up in a sort of subadditive way. So it seems to me that after a linear number of scrambling operations you’ll have reached something with maximal rank—or at least, it won’t be possible to use just the rank to prove that you haven’t. I think you could perhaps extend what Aaronson and Wigderson said: you have to use more about your polynomials than just their degree and their rank.

πŸ™‚ That does sound plausible, alas. I wonder if some principle like that follows from their results on algebrization. I feel like asking an expert about this.

8) Meanwhile, perhaps we should return to the question of what basic operations do to quadratic phase functions?

πŸ™‚ Oh yes, thanks for reminding me. But I’m afraid I still don’t have any progress to report. But I have had a little thought about it that I’d like to float past you.

8) OK.

πŸ™‚ I was a bit worried by your point that it is unlikely that a basic operation applied to a polynomial phase function of high degree will have 1-o(1) correlation with another polynomial phase function of the same degree. And I’m also worried by your point about the combination of degree and rank probably not being a refined enough tool for proving complexity lower bounds. But it now occurs to me that it might just conceivably be possible to define a set-valued complexity measure using these sorts of ideas.

8) Ah, that sounds potentially interesting if you really can do it.

πŸ™‚ Well, the idea I had was this. If one is trying to prove that a function of low complexity is a linear combination, with coefficients that aren’t too big, of functions that are in some sense structured, such as low-degree polynomials and the like, then perhaps one can argue that the set of functions you use for f\vee g is not much bigger than the union of the set of functions you use for f and the set of functions you use for g. That’s not quite what I mean because it makes no mention of the coefficients, but you get the idea: if you’ve got a straight-line computation of f, then perhaps the set of functions you need to express all the functions that occur in the computation as efficient linear combinations is not too vast.

8) I sort of get the idea, and it sounds interesting in a way, but I worry that much of its interest may derive from its being rather vague and may vanish as soon as you try to express the idea more precisely. Can you give me a precise statement that says something like, “If there exists a set of functions \mathcal{F} with such and such a property, then there is a function in NP with a superlinear circuit lower bound”? Of course, in itself that’s not enough because some such statements are useless, such as “If there exists a set of functions \mathcal{F} that contains all functions of circuit complexity at most n\log n and does not contain the clique function, then there is a function in NP with a superlinear circuit lower bound.”

πŸ™‚ Let me tell you the kind of thing I had in mind. If you don’t mind I’ll call the set of functions \mathcal{U} instead of \mathcal{F}. Now suppose that we have two Boolean functions f and g and that we can write them as f=\sum\lambda_iu_i and g=\sum\mu_jv_j with the functions u_i and v_j belonging to \mathcal{U}.

What can we then say about f\vee g? I was hoping that we would be able to write f\vee g as \sum\nu_kw_k, where each w_k was either a u_i or a v_j or one of a small number of “error” functions that came in. And also there would have to be some condition about the sum of the \nu_k not being too big in terms of the sums of the \lambda_i and \mu_j.

8) I can see a number of problems with this. To start with, if the coefficient sums go up by even a small constant factor each time you do a Boolean operation, then after a linear number of steps they will be exponentially large. But if your set \mathcal{U} is reasonably balanced, it will be possible to write every Boolean function as a linear combination with coefficients adding up to at most C^n. So you won’t have distinguished between functions of linear complexity and random functions.

But even if you can solve this problem (which is conceivable—for instance, there might be parameters other than the sum of the coefficients that could give you a measure of the “size” of the linear combination), there is another that seems to me to be more fundamental. Recall from an earlier conversation of ours that Boolean operations felt more like products than sums. Now what happens if you take the product of \sum_i\lambda_iu_i and \sum_j\mu_jv_j? You get \sum_{i,j}\lambda_i\mu_ju_iv_j. Now there’s good news and bad news here. The good news is that if u_i and v_j are degree-d phase functions then so is u_iv_j, so we’ve got another linear combination of functions in \mathcal{U}. But the bad news is that we have used all the functions u_iv_j (and we have multiplied together the coefficient sums, but we’re ignoring that aspect now). So it seems that the set of functions we use for f\vee g is going to be something like the product set of the sets we use for f and for g. Therefore, after linearly many steps of the process it seems that we will end up with exponentially large sets. But no function needs more than exponentially many functions from \mathcal{U}, since the dimension of the space we are talking about is 2^n.

πŸ™‚ As ever, the water you pour over my ideas is well and truly cold. But there’s one point you made that I think needs to be thought about slightly harder.

8) As ever, you carry a little immersion heater around with you.

πŸ™‚ Something like that. What I wanted to say was that at a certain point you made the assumption that the cardinality of the set of functions u_iv_j was equal to the product of the number of u_i and the number of v_j. Obviously this is an upper bound, but perhaps there are coincidences. For example, if \mathcal{U} is the set of all quadratic phase functions, and u_i=(-1)^{q_i} and v_j=(-1)^{r_j}, then u_iv_j=(-1)^{q_ir_j}, so if Q and R are the sets of quadratics used by f and g, then the set of quadratics used by fg is the sumset Q+R.

Now in general Q+R can have size |Q||R|, but if the sets Q and R have additive structure (an extreme example would be when they are parallel affine subspaces of the space of quadratics over \mathbb{F}_2) then |Q+R| can be much smaller.

8) OK, but now you’ve introduced another idea into the picture. Your hypothesis would be not so much that you can write low-complexity functions as a linear combination of few structured functions, and more that you can write them as a linear combination of structured sets of structured functions. But where is your set-theoretic complexity measure now?

😐 I’ve thought of a question that might be worth considering. It just feels as though it could clarify the discussion a bit.

πŸ™‚ OK, what is it?

😐 Well, it seems to me that, motivated by the remarks of Aaronson and Wigderson, you are trying to distinguish between low-complexity and high-complexity polynomials, even when they have the same degree. This seems worth thinking about even for quadratics: a simple counting argument tells us that most quadratics have superlinear (in fact, almost quadratic) circuit complexity. So what is the difference between a high-complexity quadratic and a low-complexity quadratic? If you force yourself to think about just this question, then maybe you won’t keep coming up with properties that fail for known reasons.

πŸ™‚ That sounds like an excellent idea, but potentially also a very difficult one, because it might be possible to define “weird” quadratics by devising some clever low-complexity calculation that involves functions that are nothing like quadratics but that magically produce a quadratic at the end. But perhaps one could have a restricted model of computation that allowed more gates but insisted that all functions at every stage were quadratics.

8) That is sounding very like arithmetic complexity to me.

😐 What is arithmetic complexity?

8) It’s very like circuit complexity, but you replace the Boolean operations \vee and \wedge by arithmetic operations such as + and \times. I think a good illustration of the basic idea is the function x^{2^n}. This has arithmetic complexity n because you can get it in n steps by starting with x and repeatedly squaring. (I’m now talking about arithmetic complexity of polynomials over the reals, by the way.) This shows that the arithmetic complexity of a polynomial can be far smaller than the degree. And in general it seems to be hard to prove lower bounds for arithmetic complexity.

A very important fact in cryptography is that the arithmetic complexity of raising a number to a power modulo a large index n is small. If you want to work out x^m mod n, then you can do the repeated-squaring trick to work out x^{2^k} for all k\leq\log_2m and then multiply together the ones you need to make x^m. This shows that calculating x^m has arithmetic complexity that is polylogarithmic in m. Equally important is the fact that it seems to be much harder to calculate factorials: if there were a quick way of calculating (n-1)!, for example, then there would be a quick primality test. (Of course, one could say that there is a quick way: use AKS to test whether n is prime and then the answer is -1 if it is prime and 0 otherwise. But being able to calculate factorials and similar polynomials quickly would have major consequences besides primality testing.)

πŸ™‚ But I don’t see how this applies to quadratics, because they are not closed under taking products.

8) I agree that that’s a problem. But here’s the type of thing one could do. You’ll agree that the map x\mapsto x_1x_2+\dots+x_{n-1}x_n+x_nx_1 is a high-rank quadratic.

πŸ™‚ Yes. I think it has rank either n or n-1.

8) And it’s also computable in linear time, has linear arithmetic complexity, etc.

πŸ™‚ Yes, but its associated matrix is very sparse, since it has only 2n entries.

😐 Sorry, but why 2n rather than n?

πŸ™‚ Because the bilinear form you get is symmetric: corresponding to each term x_ix_j you get a term x_iy_j+x_jy_i in the bilinear form.

😐 Ah, I see.

8) The point I wanted to make is that you can produce less sparse matrices by producing some more interesting ensembles of linear forms than x_1,\dots,x_n. What you do is start with a straight-line computation that allows only addition mod 2. In other words, you write a sequence x_1,x_2,\dots,x_n,y_1,\dots,y_m, where each y_i is a sum of two earlier terms in the sequence (either xs or ys). Then at the end of that process you create a quadratic form \sum_j u_jv_j, where each u_j and each v_j is one of the linear forms from the nice little supply that you have built up.

πŸ™‚ Ah, I like that. First of all, it really does seem a natural model for the computational complexity of quadratic forms over \mathbb{F}_2. OK, it doesn’t allow for computations that produce quadratic forms by magic, but it does seem to allow all “natural” ways that one might want to produce them. The second thing I like about it is that it relates very closely to the Gaussian-elimination problem. In fact, the only difference is that in that problem when you work out the mod-2 sum of two of your functions, you have to choose one of them and vow never to use it again, whereas here we drop that restriction. It shows that what makes a quadratic computationally simple is the possibility of writing it as a sum of rank-one quadratics (by which I mean functions of the form x\mapsto \lambda(x)\mu(x), where both \lambda and \mu are linear) that are somehow related to each other in a “simple” way. We see that the rank is far too crude a parameter because it takes no account of this relationship (or lack of it).

😐 Quick question: to what extent is the representation of a quadratic unique? In particular, might it have a “simple” representation and a “complex” one? Then in order to prove that a quadratic could not be computed in linear time (in this model) one would have to consider all possible representations rather than just one, which would be annoying.

πŸ™‚ I think uniqueness of any kind is too much to hope for. For example, if you work mod 3 instead and choose an arbitrary “orthonormal basis” (by which I mean a collection of binary sequences v_1,\dots,v_n such that \sum_r v_i(r)v_j(r)=\delta_{ij} mod 3 for every i,j), then the matrix a_{rs}=\sum_iv_i(r)v_i(s) works out to be the identity (as can be seen if you multiply it by any of the vectors v_i). The resulting quadratic form is x_1^2+\dots+x_n^2. There are annoying characteristic-2 problems if you want to do something similar over \mathbb{F}_2, since then x_i^2=x_i, but I suspect it would be easy to come up with an example that avoids the diagonal and makes the same general point.

😐 Sorry, I got lost there—what is the general point?

8) It’s that one can probably find “weird” ensembles of linear forms that give you quadratics that can also be produced in non-weird ways. But now it’s my turn to be lost. What was the point you were making?

😐 I haven’t quite got it clear myself. But I liked this connection between the complexity of quadratic forms and the Gaussian-elimination problem and was trying to see how closely they were related. It seems that what you do here is a slightly generalized Gaussian-elimination procedure to produce two matrices (one with rows u_i and one with rows v_i) and you then multiply the first by the transpose of the second to get the matrix of the quadratic form. So we are not in fact asking whether the matrix of the quadratic form can be obtained in linear time using Gaussian row operations, or anything like that at all. And one can express the matrix as a product of two other matrices in many different ways, so it looks as though it will be pretty difficult to identify some quasirandomness property that forces all decompositions of the form AB^T to be complex in the Gaussian-elimination sense.

I suppose what I’m saying is that even if we could solve the Gaussian-elimination problem, we’d still have a long way to go.

πŸ™‚ I think that you are right, which is unfortunate.

😐 Thanks.

πŸ™‚ I don’t mean your correctness is unfortunate, but just that the correct fact you’ve identified (if it is correct) is a pity. But I still like your proposal that we should look at quadratics, because it still seems to be subject to the natural-proofs barrier, and we won’t be tempted to use crude parameters such as degree and rank because all our polynomials are quadratics and they can quite easily have maximal rank.

8) Yes, but now you haven’t the faintest idea how to prove anything.

πŸ™‚ That’s better than having lots of faint ideas that are guaranteed not to work. And in any case, now, despite what 😐 has pointed out, I want to think about the Gaussian-elimination problem. I have a simple question: how many Gaussian operations do you need in order to produce a quasirandom matrix?

😐 What do you mean by that?

πŸ™‚ I mean a matrix where the number of 1s is approximately n^2/2, and for almost any two rows they agree in approximately half the places. (Actually, the second property implies the first.)

8) That sounds like a very natural property to me.

πŸ™‚ Thank you.

8) I didn’t mean it as a compliment. It sounds easily computable, and hence very unlikely to be a good indication of complexity. I’m sure it will be possible to produce such matrices in a linear number of steps.

πŸ™‚ Oh yes. Well, perhaps we should just confirm that.

A random sequence of row operations doesn’t work I think, because each row ends up depending on only very few of the original rows. To put it another way, the matrix you end up with is far too sparse. But one can deal with that by first forming a triangular matrix with 0s on one side of the diagonal and 1s on the other side—it’s easy to see how to do that. Perhaps if one then applies a random sequence of Gaussian operations one reaches a quasirandom matrix in linear time.

8) Hmm, I’m not sure. For a start there will be a few rows that you will never touch again.

πŸ™‚ Yes, but all I need is for almost all rows to look random at the end of the process.

8) I think you need to be fussier than that. After all, the Paley matrix is very quasirandom according to your definition, so if you can’t produce extremely good quasirandomness in linear time then the Walsh matrix needs a superlinear number of row operations.

πŸ™‚ That was indeed an example I had in mind actually.

😐 What’s the Walsh matrix?

πŸ™‚ A quick definition is this: W_n=\left(\begin{matrix}W_{n-1}&W_{n-1}\\W_{n-1}&-W_{n-1}\end{matrix}\right). If W_0 is the 1\times 1 matrix with just a 1 inside, then you’ll end up with a 2^n\times 2^n orthogonal matrix of 1s and -1s. If you change that to a 01-matrix in the obvious way then you get what one might call perfect quasirandomness, apart from the fact that one row is all 1s.

8) Right, how do you get that level of quasirandomness with linearly many row operations?

πŸ™‚ Hmm, we have an inductive construction, so presumably we can translate that into some inductively defined sequence of row operations. How do we produce W_1 from the 2\times 2 identity matrix? That’s quite easy: add the second row to the first to get \left(\begin{matrix}1&1\\ 0&1\\ \end{matrix}\right), then the first to the second to get W_1.

In general, if we’ve got the matrix \left(\begin{matrix}W_{k-1}&0\\ 0&W_{k-1}\\ \end{matrix}\right), we can add the second “block row” to the first and then the first to the second. That takes 2^k operations. So if it takes f(k) operations to produce W_k, then we seem to have a recurrence like f(k)=2f(k-1)+2^k. That recursion is satisfied by the function f(k)=k2^k. If we now set n=2^k we get a bound of n\log n rather than a linear bound.

😐 It would be very surprising if that was not the quickest way of producing W_k. It just seems so obvious and natural. Is it really true that the best known lower bounds for this problem are only linear?

πŸ™‚ Maybe we should check. There’s obviously lots to think about here, but I’m getting tired. Shall we stop for now?

8) 😐 OK.

5 Responses to “A conversation about complexity lower bounds, IX”

  1. luca Says:

    In the notion of algebrization, the function \tilde A is a low-degree polynomial (of degree n^{O(1)}) over a field F of size n^{O(1)} with the property that for every x \in \{0,1\}^n we have A(x)=\tilde A(x), where we identify the 0,1 of the field F_2 with the 0,1 of the field F.

    (There are actually going to be different fields F for different input lengths, so \tilde A can be seen as a family of functions parameterized by n, and the n-th function has domain F^n_{q(n)} where the size of the field q(n) is polynomial in the dimension n.)

    Having an algebrizing proof that NP \not\subseteq P means that for every A and every \tilde A that is in the above relation with A we have NP^{\tilde A} \not\subseteq P^A.

    In a relativizing proof, we would have that for every A it holds NP^A \not\subseteq  P^A. This would imply that the proof also algebrizes, because for \tilde A which is an “extension” of A as defined above, we have NP^A \subseteq NP^{\tilde A}, simply because the oracle queries to A can be realized as oracle queries to \tilde A.

    The converse, however, need not be true: it could be (and it is the case for certain complexity classes) that, even though there is an oracle A such that C^A = D^A, so that no relativizing proof that D \not\subseteq C is possible, for every A and every extension \tilde A we have D^{\tilde A} \not\subseteq C^A, thanks to the extra power that we gain by evaluating \tilde A at non-boolean points.

    • luca Says:

      Sorry for the mess: in the third paragraph, NP^{\tilde A} \not\subseteq NP^A should be NP^{\tilde A} \not\subseteq P^A; in the formula that does not parse a backslash is missing before ‘subseteq’, and later a backslash is missing before ’tilde.’ And ‘algebrization’ is misspelled in the first sentence.

      [Thanks — I’ve edited out those typos now.]

  2. gowers Says:

    I still find this concept weirdly difficult to understand. For example, you say that a proof that NP is not a subset of P algebrizes if for every A and every \tilde A that extends A we have NP^{\tilde A}\not\subseteq P^A. Later you say that NP^A\subseteq NP^{\tilde A}. So surely the proof algebrizes if and only if NP^A\not\subseteq P^A. The proof would be that if you want to show that NP^{\tilde A}\not\subseteq P^A for every \tilde A that extends A it is enough to show it for the smallest such NP^{\tilde A}, which is NP^A.

    I’m sure there’s something completely obviously wrong with what I’m writing, but I just don’t see it at the moment.

    • luca Says:

      Good point, evidently I too I was confused about the definition. I said that, in order to define an admissible \tilde A, we choose a sequence of field sizes q(n) and then a sequence (indexed by n) of polynomials over F^n_{q(n)} which agree with A on \{0,1\}^n. Then, as you say, one could take q(n)=2 and \tilde A = A.

      Instead the definition of extension oracle (Def 2.2 in the paper) is as follows. To define an admissible \tilde A, we pick for every n and for every field F a polynomial of low degree \tilde A_{n,F} over F^n which agrees with A over \{ 0,1 \}. Then the oracle \tilde A is the entire collection of polynomials.

      The definition tries to model the setting in which we have some kind of “interpolation formula” for A, and hence we are able to evaluate the formula while doing operations over any field of our choice. No matter what field we choose, the invariant is maintained that over \{ 0,1 \}^n we recover A and that the polynomial degree of the function we compute is small. (The actual definition does not require that, for every n, there has to be a single algebraic formula from which all the \tilde A_{n,F} can be computed, and in fact it does not require any relationship between the \tilde A_{n,F} for different F, other than agreeing over \{ 0,1 \}^n. This is just intuition for how an \tilde A might actually look like.)

    • gowers Says:

      Ah, I see now. So because you’re forced to choose an extension of A to every field that extends F_2, in a certain sense \tilde A is forced to be large and NP^{\tilde A}\not\subseteq P^A is a correspondingly weaker statement than NP^A\not\subseteq P^A, so saying that no proof that P\ne NP algebrizes is stronger than saying that no proof relativizes.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: