As many people have pointed out, to get to a new and better system for dealing with mathematical papers, a positive strategy of actually setting up a new system might work rather better than complaining about the current system. Or rather, since it seems unlikely that one can simply invent ex nihilo a system that’s satisfactory in all respects, one should set up systems (in the plural) and see which ones work and catch on.
I’ve already had a go at suggesting a system, back in this post and this post. Another system that has been advocated, which I also like the sound of, is free-floating “evaluation boards” that offer their stamps of approval to papers that are on the arXiv. (I associate this idea with Andrew Stacey, though I think that in this area there are several good ideas that have been had independently by several people.) But instead of discussing particular systems, which runs the risk that one ends up arguing about incidental details, I want to try to adopt a more “axiomatic” approach, and think about what it is that we want these new systems to do. Once we’re clear on that, we have a more straightforward problem to solve: how do we achieve most efficiently what we want to achieve?
The first “axiom” I suggest is one that pretty well everyone seems to agree about, so I won’t argue for it. It’s that the internet takes care of dissemination very nicely (assuming that we bother to make our papers available). So this is not a function we should worry about.
Slightly more controversially, I would suggest that typesetting and copy-editing, services that journals provide to some extent, can be ignored in this discussion. We’re all capable of producing reasonably nice versions of our papers (or at least almost all of us are) for the arXiv, so we shouldn’t let the additional fine tuning be a major consideration when we think about what we want out of a new system. Basically, if you want your paper to look good, it’s up to you to put in the work to make it look good (though getting feedback from other people may well be helpful — that’s another matter).
Why do we need an official “mark of quality” at all? One might argue that once a paper is written, either it isn’t interesting enough to get people’s attention, or it does get their attention and those people provide feedback (in particular drawing attention to potential mistakes if they exist). Thus, the papers that matter get looked at and understood well before they come out in journals, and the ones that don’t are quietly forgotten.
What about an excellent result produced by a newcomer to a field? Is there not a danger that that result will be unjustly ignored? Well, yes, but I would suggest that if you are such a newcomer, then what will really make a difference to your situation is not the pat on the back you’ll get if you’re lucky by having your excellent result accepted by a good journal. Long before that happens, you should get in touch with experts, go to conferences, and so on. If your result really is an important contribution, people will be very pleased to have their attention drawn to it.
The trouble with the kind of argument I sketched out in the previous paragraph but one is that we often have to judge mathematicians in areas other than our own. For example, we might be on a hiring committee for a job with hundreds of applicants. If so, we need quick, and therefore crude, methods of evaluation. Journals give us one such method: we just skim through the publication list and get a sense of the quality of the journals that a candidate has managed to publish in. You can argue, and I would argue, that this measure is too crude. But if you’ve got to get a list down from 500 to 60, say, then the difference between spending 30 seconds per candidate and a minute per candidate is over three hours of mind-numbingly tedious work. So I’ll adopt as my second “axiom” that we need some kind of “metric”. It’s an unfortunate necessity, to be sure, and one would hope that crude metrics were used only for a preliminary sifting when there is a very long list of candidates, and not, say, to distinguish between two candidates on a shortlist. But it’s still something that some of us sometimes need.
Since I’m trying to discuss the fundamentals here, let me briefly address the question of whether the notion of the “quality” of a piece of mathematics makes sense. We certainly talk as though it makes sense, but is there something objective that underlies the seemingly subjective judgments that we make the whole time?
I think it is well worth thinking quite hard about this question. What makes a piece of mathematics good? I don’t just mean the extreme cases such as a theorem that opens up a new field or solves a major open problem. Those are the easy cases. But if we’ve got one journal that’s “a little bit below Annals” and another that “accepts high-quality papers” but is regarded as a step below the first, can we say what it is that papers in the first journal have got that papers in the second lack?
Obviously we’re never going to come up with precise criteria that would allow us to give a numerical measure of quality to people’s papers. But in a way that is what we’re trying to do. We behave as though such a measure exists, even if it is extremely hard to calculate, and fondly imagine that our journal system dimly reflects the “true quality”.
Let me attempt to describe a few different quality scales, since a single linear scale doesn’t seem right.
1. Solving an open problem.
A result will cause a stir if it solves a long-standing open problem that has been worked on by many highly reputable mathematicians. [This is an inductive definition, since a reputable mathematician is one who has produced excellent papers.] An extreme example is the solution of Fermat’s last theorem by Wiles and Taylor/Wiles. A less extreme (but still pretty unusual) example is the recent solution of the Erdős distance problem by Guth and Katz.
There’s an implicit measure here: how long has the problem been around, how many mathematicians have worked on it, and how good are they? In my area people will know, or at least be pretty sure, that such-and-such a problem has been worked on by, say, Noga Alon or Jean Bourgain, and that will place a very high lower bound on the achievement of someone who manages to solve it.
But there’s another measure that is also important, which is what one might call the size of the potential audience. Is your problem of interest to all mathematicians, or all number theorists, or all analytic number theorists, or all mathematicians working on estimates for exponential sums, or all mathematicians working on refinements to what we know about Waring’s problem, or …?
2. Introducing ideas/definitions/techniques that change the way other mathematicians tackle a range of problems in some field. (Extreme example: the discovery of the Seiberg-Witten equations.)
Again there are two measures one might apply here. How radical and new were the ideas? How many top mathematicians could reasonably be said (i) to be instantly capable of recognising their significance but (ii) to have failed to notice them? That’s a measure of something like the “cleverness” of the ideas. But then there is the breadth again: how big a circle of mathematicians will have their lives changed by these ideas?
3. Making progress on a well-established project.
Quite a lot of good papers neither solve a pre-existing problem nor introduce a technique that changes a field. Rather, they make an incremental contribution to a research programme. I suppose a fairly extreme example of this is some of the work that was done on the classification of finite simple groups: it was hard and needed significant expertise, but, barring unexpected difficulties, it was always going to get done. (Just to be clear, I’m not saying that that’s a good description of every part of the classification.) The measures of quality here could be technical difficulty, expertise needed, even length. And of course all that should be multiplied by something like the size of the circle of mathematicians for whom that particular research programme is a truly central one.
4. Doing something difficult.
Leaving aside the interest of a result altogether, one of the things that makes it good, or at least is positively correlated with quality, is how difficult it is. This is particularly hard to measure, because what we want to measure is not absolute difficulty but difficulty taking into account what people could already do. If you’re not an expert in an area, then a proof may look extraordinarily difficult, because the author (all too often) has not taken the trouble to say what was truly new and what was merely a long but standard argument. Here of course the opinion of a referee can be extremely helpful. It doesn’t happen as often as I’d like, but at least in principle a referee can say, “The proof of Lemma 2.3 looks very long and complicated, but it goes along the lines that I’d expect,” or, “When I read the statement of Lemma 4.1 my first reaction was to think that it couldn’t be true, but then I understood what was going on. This is a truly important new idea.”
5. New ideas.
Now that I’ve used that phrase, let me give it a category to itself. One important measure of a paper is the degree to which that paper is going to help other mathematicians with their own research. It can do this in many ways: providing new techniques, proving statements that can be applied to solve other problems, providing proof templates that can be imitated. One nice thing it can do is enlarge our stock of ideas. It’s hard to say exactly what this means, but we do have a sense that “this paper contains two main ideas” and things like that. Maybe a loose definition would be that an idea is a piece of mathematics that can be condensed into a slogan, such as, “If the characteristic function of a set has no non-trivial large Fourier coefficients, then the set behaves like a random set.”
As ever, some new ideas (such as “try a random example”) are immensely broad, while others are considerably narrower.
No doubt the list I’ve just produced of possible attributes of a paper is incomplete. The point I’m making, however, is that there could be something to be gained from trying to be more precise about what it is we are looking for in a paper. For instance, if we are giving a stamp of quality to a paper whose main merit is that it solves a problem that was in the literature, we might want to be able to give a stamp that everyone in the relevant area understands as, “This paper solves a problem that Noga Alon has undoubtedly thought about. It is of strong interest to specialists in Ramsey theory and some interest to other combinatorialists.”
Could we hope to have a quality stamp as fine-grained as that? Possibly, but if not then there is an alternative, which one might call mini-reviews. Our evaluation procedures, whatever form they took, could result in crude stamps (for the very quick evaluations) coupled with more detailed, but short, justifications for those stamps. What I’ve just written could, for instance, be made slightly more formal as follows. “This paper solves a problem that was posed in 1995 and has attracted the attention of several of the top names in the field. It is of strong interest to specialists in Ramsey theory and some interest to other combinatorialists.” People writing such justifications would be instructed to convey as accurately as they could, in narrative terms, how good a paper was according to at least some of the criteria outlined earlier. (When it comes down to it, I think the main thing I’d want to get a sense of from one of these justifications is the breadth of interest: that is, the size of the circle of mathematicians likely to be interested in the paper. But it’s still interesting to know what kind of mathematical contribution one is dealing with.)
If you could call up, online, a list of people’s papers together with very brief summaries of that kind, then you would have an excellent way of getting a good feel for someone’s achievements. Perhaps it would be a good way of whittling a list of job candidates down from 60 to the half dozen that you really want to look into carefully.
I’m in danger of getting away from the abstractness of the discussion and actually making a concrete suggestion. That is partly because I think the current system fails to provide something that would be extremely useful: roughly speaking, a way of saying how good a paper is that is more precise than one’s rough impression of the quality of the journal it appears in, but less precise than a well-written expert reference. So I have felt the need to explain in a little detail what that missing something is.
After all that, I have arrived at the following general principles that I think should apply to any new system of evaluating papers.
1. It should be independent of whatever mechanisms we use for dissemination and formatting.
2. It should give us a “metric” that we can use to make very quick judgments in situations where that is an unfortunate necessity.
3. It should also provide us with a more precise idea of the kind of contribution that a paper makes — how genuinely difficult, how new, how broadly interesting, etc.
I’ve forgotten one other important thing, so let me state it and then discuss it.
4. It should be at least as good as the current system at identifying papers that are wrong.
One of the things that the current system is supposed to do is give us confidence that all those results in the literature are correct. My own belief is that this is an illusion, and that many published results have not been carefully scrutinized by their referees. I also think that this doesn’t matter too much, because the results that really matter, the ones that contribute to the big story of mathematics, the ones that form the lower floors of some big mathematical edifice, are, by and large, scrutinized carefully. It’s just that the scrutiny is more or less independent of the journal system.
Anyhow, the wording in 4. is chosen carefully: I’d be unhappy with a new system that left us significantly less confident in the correctness of our generally accepted results, but I also think that there is not much danger of that.
Here are a couple more features that I think a new system should have if it is to have any chance of success.
5. It should be able to coexist with the current system.
The point here is that we can’t just jump from one system to another. Rather, a new system has to get started and be seen to work, and then the old system can afford to retire gracefully.
6. It should carry authority.
What I mean here is that I can imagine a group of people setting up a wonderful paper-evaluation system and a large proportion of mathematicians not taking it seriously and not regarding it as “counting”. If that happened, then people would not be ready to rely on the evaluations from that system when building up their CVs.
In practice, the effect of this, at least to start with, could be that the new evaluation systems have to mimic journals more than one might ideally want. For instance, suppose you had an idea for a variable stamp of approval — a paper could be passed at level 1, 2 or 3, say. That probably wouldn’t work to begin with, because it would require people to buy into your system. But if you had three pseudo-journal names and made it known that one was for amazing papers, one for very good ones, and one for good ones, then it would be more like the old system (in the sense that people could make their snap judgments based on some notion of journal quality), and people wouldn’t worry that their CVs were going to be underestimated or misunderstood.
As for more radical departures from the current system, such as websites where people review papers and build up reputation points, they are likely to have to run for a long time before any young mathematician would feel able to submit a paper to one of those sites and nowhere else. But maybe it would be good to get some of these things going so as to start the process of familiarization and, perhaps, acceptance.
One final point I should make before stopping (really it belongs earlier, but I can’t face reorganizing this post) is that the journal system performs rather different functions at different quality levels. At the top, people getting papers into the very best journals may thereby improve their chances of getting jobs at the very best universities. But there are plenty of people whose needs are rather different. I have often had to referee papers that seem completely uninteresting. But these papers will often have a string of references and will state that they are answering questions from those references. In short, there are little communities of mathematicians out there who are carrying out research that has no impact at all on what one might call the big story of mathematics. Nevertheless, it is good that these researchers exist. To draw an analogy with sport, if you want your country to produce grand slam winners at tennis, you want to have a big infrastructure that includes people competing at lower levels, people who play for fun, and so on. Similarly with mathematics, if we want the subject to thrive, we need our own big infrastructure, with excellent teachers (who are likely to be better if they are doing research, whatever that research is like), large numbers of mathematics departments rather than just a few, and so on. This creates a need for some way of distinguishing papers that are bona fide research papers (even if not terribly interesting) from nonsense.
So I’ll add another requirement of a system.
7. Amongst other things it should be able to provide a “minimal stamp of approval”, the meaning of which is something along the lines of “This is a genuine piece of mathematical research and it looks correct.”
At the moment this minimal stamp of approval is obtained if you can get your paper published somewhere (with the possible exception of Chaos, Solitons and Fractals).
What I find encouraging about the current situation is that it does seem to be possible to imagine a continuous (and monotonically improving) path from where we are now to where we might want to be in the not too distant future. We could begin by setting up cheap but relatively conventional alternatives such as open-access electronic journals. But it would then be easy for the editors of an electronic journal to experiment with the refereeing procedure. (For example, one innovation I would like to see is the process of evaluation being separated from the process of carefully reading a paper and making comments about its presentation. The latter could be done non-anonymously and in consultation with the author. Or the author could even commission this work in advance from a colleague and submit the paper with something like a certification of correctness. The person who provided that certification would not be anonymous, so their reputation would be on the line if the paper turned out not to be correct.) And if different electronic journals had different refereeing procedures, the stage would be set for significantly different procedures of a kind that many people, including me, enjoy fantasizing about. (If you do too, then you may enjoy this blog discussion.)