It’s what blogging is all about I suppose, but I have been surprised in several different ways by the comments on my previous post. To begin with, I was so sure of the principle I was advocating that I thought that all I’d have to do was explain it briefly and then anybody who read it would instantly agree with it. That was clearly pretty naive of me, and I certainly didn’t expect that some people would be actively hostile to the idea (though I suspect that their real target was not precisely the same as what I was putting forward). But I was also surprised by the number of interesting further points and qualifications that were made, which I will now try to use to articulate a more nuanced version of the principle.
Amongst these further points were the following. If one is sufficiently used to a particular style of definition then it may well not be necessary to give examples first: for instance, if you know the definition of a field, then you can easily grasp the definition of a ring without having a chat about polynomials or something first. (Of course, if you want to understand the point of defining rings, then such a chat is essential, but it’s not so important to have the chat first.) JSE (who, despite his denials, gave a beautiful demonstration of the principle of examples first in his PCM article) makes the point that some mathematicians find examples confusing unless they already know what they are supposed to be illustrating, and the further point that promising one kind of explanation while one gives another can be very reassuring, whichever way round you do it.
One small point in response to JSE: if you don’t want to confuse the reader/listener when you discuss examples, one approach is not to give away what you are doing. (See your own PCM article for an instance of this.) For example, in the second explanation of fields in the previous post, there is a discussion of number systems. Since it is stating some fairly obvious and familiar facts about those number systems, there can’t really be much reason for confusion. But if one began by saying, “By the way, I’m leading up to a definition of some things called fields here,” then some people might be distracted by wondering what they were supposed to be getting out of the examples.
Another point that comes out of several comments is that a lot depends on the circumstances of a presentation. I think the principle applies most strongly when the presentation is not fully formal — e.g. in an expository article, or a conversation with another mathematician, or a colloquium talk, or in a seminar where you can’t expect too much of your audience. When it comes to a formal lecture course, I think my practice would be to write up fairly traditional notes on the blackboard, but to give a lot of accompanying chat: the preliminary examples would be part of the chat rather than part of the notes. As for textbooks, here there may well be disagreement, but I would argue for something similar to the lecture course approach, except that now the preliminary chat would be written.
On that last point, one person made the interesting comment that they were so used to reading papers and books in a non-linear way that they actually preferred papers and books that did not try to present themselves linearly (which is essentially what one is trying to do with the examples-first approach). My implicit suggestion of clearly distinguishing between the chat and the “real content” could perhaps lead to expositions that gave the best of both worlds.
More generally, one might take the attitude that, since it is an essential mathematical skill to be able to read and digest mathematics that is presented in a very formal way, and since part of that skill is to be able to supply one’s own examples, if you the author provide the examples yourself (whether before or after the generalities) then you are denying the reader the chance to develop that skill. To which I’d say: if you do not provide that chance, there will always be others who are more than happy to do so.
Now let me look at another mathematical concept and consider how it might be explained. This time I want to discuss a theorem rather than a definition, just to emphasize (as I didn’t in the previous post) that the examples-first principle is quite general and doesn’t just refer to places where you first introduce an abstract definition.
The theorem I’ve gone for is the orbit-stabilizer theorem, and I want to discuss how it might be presented to somebody who was already comfortable with the idea of a group action (though it’s quite an interesting question in itself how to explain group actions — in a funny way the examples are all too “obvious” for it to be easy to make clear what the real use of the concept is).
The approach that I’ll label “traditional” for the purposes of discussion is something like this. Let $G$ be a finite group and let $X$ be a set on which $G$ acts. Let $x$ be an element of $X$. Then we define the orbit of $x$ to be the set $\{gx : g \in G\}$ and the stabilizer of $x$ to be the set $\{g \in G : gx = x\}$. The orbit-stabilizer theorem states that $|G| = |\mathrm{Orb}(x)|\,|\mathrm{Stab}(x)|$. (I stress that I am summarizing the approach here rather than giving it in full: if I did it properly I’d state the theorem more formally and distinguish it much more clearly from the surrounding discussion, which itself would be a bit longer.)
To prove the theorem, we define a map from $\mathrm{Orb}(x)$ to the left cosets of $\mathrm{Stab}(x)$ by sending $gx$ to $g\,\mathrm{Stab}(x)$. One must check that this map is well-defined and that it is a bijection, which is an easy exercise. The result then follows from the fact that all the left cosets of $\mathrm{Stab}(x)$ have the same size.
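Since the proof is only summarized above, here is a small computational sketch (mine, not part of the original discussion) that checks the statement and the orbit-to-cosets bijection on a toy example: the symmetry group of a square, built as permutations of its four vertices. The helper names `compose` and `generate` are my own choices.

```python
# Check the orbit-stabilizer theorem on the symmetry group of a square,
# represented as permutations of its vertices 0, 1, 2, 3 (in cyclic order).

def compose(p, q):
    """Composition of permutations: (p . q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)

def generate(gens):
    """Close a set of permutations under composition; since everything
    is finite, this yields the group the generators generate."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, {identity}
    while frontier:
        new = {compose(g, h) for g in frontier for h in gens} - group
        group |= new
        frontier = new
    return group

rotate = (1, 2, 3, 0)     # quarter turn
reflect = (1, 0, 3, 2)    # reflection through an edge-midpoint axis
G = generate([rotate, reflect])     # dihedral group of the square, order 8

x = 0
orbit = {g[x] for g in G}           # the vertices x can be sent to
stab = {g for g in G if g[x] == x}  # the symmetries fixing x

# The left cosets of the stabilizer: there is one per point of the orbit,
# and they all have the same size, which is what the proof turns on.
cosets = {frozenset(compose(g, h) for h in stab) for g in G}
assert len(G) == len(orbit) * len(stab)
assert len(cosets) == len(orbit)
```

Here $|G| = 8$, the orbit of a vertex has size 4, and its stabilizer (the identity and one reflection) has size 2, so the assertions check $8 = 4 \times 2$.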
Once one has given this proof in a lecture course, a typical thing to do is to “test the understanding” of the theorem by means of some exercises, of which quite a common one is to count symmetries of Platonic solids. For instance, to count the number of rotational symmetries of an icosahedron, one lets $G$ be the group of all these symmetries and lets $x$ be a vertex of the icosahedron. Then the orbit of $x$ (under the obvious action) has size 12, since the icosahedron has twelve vertices that all “look the same”, and the stabilizer has size 5 (since neighbouring vertices go to neighbouring vertices and you can’t reflect), so $G$ has size 60.
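Coding up the icosahedron means wiring its twelve vertices together correctly, but the same kind of count can be checked mechanically on a smaller solid. The sketch below (an illustration of mine, not from the discussion above) takes the rotation group of a regular tetrahedron, generated by two 120-degree rotations about vertices, and verifies the count $4 \times 3 = 12$ that the theorem predicts.

```python
# Orbit-stabilizer count for the rotations of a regular tetrahedron,
# acting on its vertices 0, 1, 2, 3.  Two rotations by 120 degrees about
# different vertices generate the whole rotation group.

def compose(p, q):
    """Composition of permutations: (p . q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)

def generate(gens):
    """Close a set of permutations under composition."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, {identity}
    while frontier:
        new = {compose(g, h) for g in frontier for h in gens} - group
        group |= new
        frontier = new
    return group

spin_about_3 = (1, 2, 0, 3)   # cycle vertices 0 -> 1 -> 2, fix vertex 3
spin_about_0 = (0, 2, 3, 1)   # cycle vertices 1 -> 2 -> 3, fix vertex 0
G = generate([spin_about_3, spin_about_0])

x = 0
orbit = {g[x] for g in G}            # all 4 vertices "look the same"
stab = {g for g in G if g[x] == x}   # the 3 rotations fixing vertex 0
assert len(G) == len(orbit) * len(stab) == 12
```

This mirrors the informal count exactly: a vertex has 4 places to go, and 3 rotations fix it, so there are 12 rotations in all.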
Now here’s an alternative approach. You begin by asking how many rotational symmetries an icosahedron has (as part of an informal discussion, say, before you “get down to business”). Most people will come up with the argument for themselves that a single vertex has 12 choices of where to go, and one of its neighbours then has 5, after which the rotation is determined: hence, there are 60 rotations.
At that point, one can say, “Now we are going to prove a theorem that shows that this type of argument works very generally.” And as you go through the proof outlined above, you can say, “Notice that in the example we looked at earlier, the orbit was the set of all vertices and the stabilizer was the set of all rotations that fixed a particular vertex.” Then the student will see that what you really need to know is that the set of transformations that send $x$ to a given point $y$ of its orbit always has the same size (which it does, as it’s a coset of the stabilizer). In fact, one is led to a better proof, I think: the result is true because what it is saying is that we partition $G$ according to what the elements do to a fixed element $x$ of $X$. The cells of this partition all have the same size, since they are cosets of $\mathrm{Stab}(x)$, and obviously the number of them is the size of the orbit of $x$.
One could have given that last argument as the proof of the orbit-stabilizer theorem, but its true simplicity is much more obvious if you’ve already experienced it by counting symmetries.
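The partition version of the argument is easy to watch happening on a very small example. The following sketch (mine, not part of the discussion above) takes $G = S_3$ acting on $\{0,1,2\}$ and groups the elements by where they send a fixed point: the cells are the left cosets of the stabilizer, all of one size, one per point of the orbit.

```python
from itertools import permutations
from collections import defaultdict

# S_3: all permutations of {0, 1, 2}, acting on X = {0, 1, 2}.
G = list(permutations(range(3)))
x = 0

# Partition G according to what each element does to x.
cells = defaultdict(list)
for g in G:
    cells[g[x]].append(g)

# Every cell is a left coset of Stab(x), so all cells have the same size,
# and there is one cell per point of the orbit: |G| = |Orbit| * |Stab|.
cell_sizes = {len(cell) for cell in cells.values()}
stab_size = len([g for g in G if g[x] == x])
assert cell_sizes == {stab_size}
assert len(G) == len(cells) * stab_size
```

Here the six elements of $S_3$ fall into three cells of size two, recovering $6 = 3 \times 2$.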
I’ll probably add to this post in due course — perhaps giving a list of circumstances where it may be better not to put examples first.