If you are a mathematical researcher, do you ever stop to ask yourself what the point is of all your research? Do you worry that the world could get along just fine without it?
One person who doesn’t lose any sleep over doubts like this is Emmanuel Candès, who gave the second plenary lecture I went to. He began by talking a little about the motivation for the kinds of problems he was going to discuss, which one could summarize as follows: his research is worthwhile because it helps save the lives of children. More precisely, it used to be the case that if a child had an illness that was sufficiently serious to warrant an MRI scan, then doctors faced the following dilemma. In order for the image to be useful, the child would have to keep completely still for two minutes. The only way to achieve that was to stop the child’s breathing for those two minutes. But depriving a child’s brain (or indeed any brain, I’d imagine) of oxygen for two minutes is not without risk, to put it mildly.
Now, thanks to the famous work of Candès and others on compressed sensing, one can reconstruct the image using many fewer samples, which reduces the time the child must keep still to 15 seconds. Depriving the brain of oxygen for 15 seconds is not risky at all. Candès told us about a specific boy who had something seriously wrong with his liver (I’ve forgotten the details) who benefited from this. If you want a ready answer for when people ask you about the point of doing maths, and if you’re sick of the Hardy-said-number-theory-useless-ha-ha-but-what-about-public-key-cryptography-internet-security-blah-blah example, then I recommend watching at least some of Candès’s lecture, which is available here, and using that instead. Then you’ll really have seized the moral high ground.
Actually, I recommend watching it anyway, because it was a fascinating lecture from start to finish. In that case, you may like to regard this post as something like a film review with spoilers: if you mind spoilers, then you’d better stop reading here.
I have to admit that as I started this post, I realized that there was something fairly crucial that I didn’t understand, that meant I couldn’t give a satisfactory account of what I wanted to describe. I didn’t take many notes during the talk, because I just wanted to sit back and enjoy it, and it felt as though I would remember everything easily, but there was one important mathematical point that I missed. I’ll come back to it in a moment.
Anyhow, the basic mathematical problem that the MRI scan leads to is this. A full scan basically presents you with the Fourier transform of the image you want, so to reconstruct the image you simply invert the Fourier transform. But if you are sampling in only two percent of directions and you take an inverse Fourier transform (it’s easy to make sense of that, but I won’t bother here), then you get a distorted image with all sorts of strange lines all over it — Candès showed us a picture — and it is useless for diagnostic purposes.
So, in a moment that Candès described as one of the luckiest in his life, a radiologist approached him and asked if there was any way of getting the right image from the much smaller set of samples. On the face of it, the answer might seem to be no, since the dimension of the space of possible outputs has become much smaller, so there must be many distinctions between inputs that are not detectable any more. However, in practice the answer is yes, for reasons that I’ll discuss after I’ve mentioned Candès’s second example.
The second example was related to things like the Netflix challenge, which was to find a good way of predicting which films somebody would like, given the preferences of other people and at least some of the preferences of the person in question. If we make the reasonable hypothesis that people’s preferences depend by and large on a fairly small number of variables (describing properties of the people and properties of the films), then we might expect that a matrix where the th entry represents the strength of preference of person for film would have fairly small rank. Or more reasonably, one might expect it to be a small perturbation of a matrix with small rank.
And thus we arrive at the following problem: you are given a few scattered entries of a matrix, and you want to find a low-rank matrix that agrees pretty well with the entries you observe. Also, you want it the low-rank matrix to be unique (up to a small perturbation) since otherwise you can’t use it for prediction.
As Candès pointed out, simple examples show that the uniqueness condition cannot always be obtained. For example, suppose you have 99 people with very similar preferences and one person whose preferences are completely different. Then the underlying matrix that describes their preferences has rank 2 — basically, one row for describing the preferences of the 99 and one for describing the preference of the one eccentric outlier. If all you have is a few entries for the outlier’s preferences, then there is nothing you can do to guess anything else about those preferences.
However, there is a natural assumption you can make, which I’ve now forgotten, that rules out this kind of example, and if a matrix satisfies this assumption then it can be reconstructed exactly.
Writing this, I realize that Candès was actually discussing a slight idealization of the problem I’ve described, in that he didn’t have perturbations. In other words, the problem was to reconstruct a low-rank matrix exactly from a few entries. An obvious necessary condition is that the number of samples should exceed the number of degrees of freedom of the set of low-rank matrices. But there are other conditions such as the one I’ve mentioned, and also things like that every row and every column should have a few samples. But given those conditions (or perhaps the sampling is done at random — I can’t remember) it turns out to be possible to reconstruct the matrix exactly.
The MRI problem boils down to something like this. You have a set of linear equations to solve (because you want to invert a Fourier transform) but the number of unknowns is significantly larger than the number of equations (because you have a sparse set of samples of the Fourier transform you want to invert). This is an impossible problem unless you make some assumption about the solution, and the assumption Candès makes is that it should be a sparse vector, meaning that it has only a few non-zero entries. This reduces the number of degrees of freedom considerably, but the resulting problem is no longer pure linear algebra.
The point that I missed was what sparse vectors have to do with MRI scans, since the image you want to reconstruct doesn’t appear to be a sparse vector. But looking back at the video I see that Candès addressed this point as follows: although the image is not sparse, the gradient of the image is sparse. Roughly speaking, you get quite a lot of patches of fairly constant colour, and if you assume that that is the case, then the number of degrees of freedom in the solution goes right down and you have a chance of reconstructing the image.
Going back to the more general problem, there is another condition that is needed in order to make it soluble, which is that the matrix of equations should not have too many sparse rows, since typically a sparse row acting on a sparse vector will give you zero, which doesn’t help you to work out what the sparse vector was.
I don’t want to say too much more, but there was one point that particularly appealed to me. If you try to solve these problems in the obvious way, then you might try to find algorithms for solving the following problems.
1. Given a system of underdetermined linear equations, find the sparsest solution.
2. Given a set of entries of a matrix, find the lowest rank matrix consistent with those entries.
Unfortunately, no efficient algorithms are known for these problems, and I think in the second case it’s even NP complete. However, what Candès and his collaborators did was consider convex relaxations of these problems.
1. Given a system of underdetermined linear equations, find the solution with smallest norm.
2. Given a set of entries of a matrix, find the matrix with smallest nuclear norm consistent with those entries.
If you don’t know what the nuclear norm is, it’s simple to define. Whereas the rank of a matrix is the smallest number of rank-1 matrices such that is a linear combination of those matrices, the nuclear norm of is the minimum such that you can write with each a rank-1 matrix of norm 1. So it’s more like a quantitative notion of rank.
It’s a standard fact that convex relaxations of problems tend to be much easier than the problems themselves. But usually that comes at a significant cost: the solutions you get out are not solutions of the form you originally wanted, but more like convex combinations of such solutions. (For example, if you relax the graph-colouring problem, you can solve the relaxation but you get something called a fractional colouring of your graph, where the total amount of each colour at two adjacent vertices is at most 1, and that can’t easily be converted into a genuine colouring.)
However, in the cases that Candès was telling us about, it turns out that if you solve the convex relaxations, you get exactly correct solutions to the original problems. So you have the following very nice situation: a problem is NP-complete, but if you nevertheless go ahead and try to solve it using an algorithm that is doomed to fail in general, the algorithm still works in a wide range of interesting cases.
At first this seems miraculous, but Candès spent the rest of the talk explaining to us why it isn’t. It boiled down to a very geometrical picture: you have a convex body and a plane through one of its extreme points, and if the plane is tangent to the body then the algorithm will work. It is this geometrical condition that underlies the necessary conditions I mentioned earlier.
For me this lecture was one of the highlights of the ICM, and I met many other people who greatly enjoyed it too.