Equivalence relations are in a way a fairly simple mathematical concept. After all, it’s not that hard to learn what reflexive, symmetric and transitive mean and to remember that if you’ve got all three properties then you’ve got an equivalence relation. However, equivalence relations do still cause one or two difficulties. One, which I’ll discuss first, is that many people find that the proof that equivalence classes always form a partition is rather complicated and hard to remember. The other is that equivalence relations are closely connected to the notion of quotients, which appear in many places in mathematics, and which many people find quite hard to grasp.
Proving that equivalence classes form a partition.
I’d like to try to demonstrate that this longish argument is one that does not need to be memorized, as long as you are in the habit of applying a couple of standard proof-generating techniques that I have already mentioned in these posts. But first I should say more precisely what it is we are trying to prove.
Proposition. Let be a set and let be an equivalence relation on . Then the equivalence classes of form a partition of .
Finding a proof.
1. I shall assume that you can remember the formal definition of an equivalence relation. If you can’t, then look it up in your notes or on Wikipedia.
2. I’m not going to assume that you know the formal definition of partition. That’s because, if people I’ve taught over the years are anything to go by, there is a good chance that you have a reasonable mental picture of what it is, but are a little less sure of how to turn that picture into a formal definition.
What, then, is a partition of a set ? Informally, I want to say that it is a collection of disjoint subsets of that “cover the whole of “. But how do we write that out properly?
The first thing we should do is give a name to the collection of sets. Since it’s a partition we are defining, the letter p seems appropriate. But also, what we’ve got here is a set of sets (part of the reason the concept is ever so slightly tricky), and it is nice to have a notation that distinguishes it from elements (usually denoted by lower-case letters) and sets (usually denoted by upper case letters). So I’m going to use a curly p, looking like this: .
Note that is a set whose elements are sets. To distinguish between set-of-sets kinds of sets and set-of-elements kinds of sets I myself like to call a collection of sets rather than a set of sets.
How do we say in precise language that the sets in are disjoint? One’s first thought might be to say, “Any two sets in are disjoint.” However, there is a little piece of necessary pedantry that you have to add to that formulation. It should be, “Any two distinct sets in are disjoint.” But surely, you might protest, if one says “Any two sets” then one is talking about two sets, so of course they are distinct. I think the reason we insist so strongly on this point is that if you decide to write the statement out even more formally, it will look something like this:
If we write it like that but omitting the condition that , then we end up with an incorrect definition, because there really is nothing to stop equalling .
How do we say precisely that the sets in “cover” ? Well, that says that every element of belongs to at least one set in . In symbols, it says,
Of course, in combination with the disjointness property, that tells us that every element of belongs to exactly one set in , which in fact gives us a concise definition of what it means for to be a partition.
By this time you might be reminded of the definitions of injection and surjection: we need every element of to be in at most one set in (the disjointness conditions) and at least one set in (the covering condition). Is this just an amusing echo or is there a closer relationship? As always in mathematics, the answer is the latter: one can find a general framework that incorporates both. But I’ll leave that to you to think about if you feel like it.
Something else you might be wondering is why I went to the bother of writing and worrying about collections of sets. Why didn’t I just call the sets in the partition ? The answer is that if I had done that I would have inadvertently been making the assumption that there are only finitely many sets in the partition, and that doesn’t have to be the case. OK, you might counter, what about calling them in the infinite case? Well, not only does that force me to deal with two cases, which is ugly, but it also assumes that there are only countably many sets in the partition (a statement you won’t understand until you have covered countability in Numbers and Sets), which again needn’t be the case. So the curly-P notation, though not the only way of doing things, was chosen for pretty good reasons.
3. Right, we’ve been given a set and an equivalence relation on and we want to prove that the equivalence classes of form a partition of . So we’ve got a bunch of sets, the equivalence classes of , and we must prove that it has the properties required of a partition.
I’m going to do a step that after a small amount of practice one can do without. I shall start the proof as follows.
All I’ve done here is give a name to the collection of sets that we hope to prove is a partition of .
4. Now we can be very mechanical. We must prove that for every there is some such that , and we must prove that if and are distinct sets in , then and are disjoint.
5. Next comes a very important principle: if you’ve defined a set via some property and you have the statement then it is usually much better to rewrite this statement as . That’s not very clearly expressed, but here’s an example. What does it mean to say that ? Well, is the set of all subsets of that are equivalence classes of . So the statment can be rephrased as “ is an equivalence class of “. That’s what I meant earlier by saying that the step of defining wasn’t really necessary: we can get away with simply talking about equivalence classes. I call this the getting-rid-of-sets trick. Here, we apply it to get rid of .
6. What, by the way, is an equivalence class? Well, the equivalence class of an element of is defined to be , which I shall denote by . That is, it is the set of all elements of that are related to .
Using this, let’s go back to the first of the two statements we wanted to prove. It was this.
We can begin by translating this in a way that gets rid of .
Next, we can do a further translation that takes account of what equivalence classes are.
(In words, for every there is a such that belongs to the equivalence class of .)
What could we choose as our ? Here we are in a position a bit like that of a chess player who has only one move that gets out of check. We know absolutely nothing about elements of except that is one of them. So we’d better try itself. Is it true that ? Well, what does that mean?
The getting-rid-of-sets principle applies again. is defined to be . So the statement is equivalent to the statement . Applying that with we find that the statement is equivalent to the statement . And that is true since is reflexive.
I took a long time over explaining that, but that is merely because I’m trying to spell out every minute detail of the thought process. What one would actually write for the entire proof so far is this.
7. We now come to the harder part of the proof, which is to show that distinct sets in , in other words, distinct equivalence classes, are disjoint.
Again, we want to remember that an equivalence class is an equivalence class of something. So we could formulate our task as that of showing the following statement: for every pair of elements of , if and are distinct sets, then .
I’ve noted that this is somewhat similar to proving that a function is an injection, which suggests that it might be nicer to prove the contrapositive. Another sign is that as things stand at the moment we are trying to deduce a “negative” statement from a “negative” statement: that if and are not the same, then they do not have any elements in common. It feels cleaner somehow to try to prove that if and do have an element in common, then they must be the same set.
It’s just possible that you have been accidentally confusing equivalence classes with names of equivalence classes, and therefore thinking that and are distinct equivalence classes if and only if and are distinct elements of . But that is very much not the case, or equivalence relations wouldn’t be very interesting. Let us briefly remind ourselves what it means to say that .
Isn’t that just obvious? Doesn’t it just mean that they are the same set? Well, yes, but and are not defined in the same way, so how are we supposed to prove that they are the same set? The answer is to use the definition of set equality. (This is a very important general principle, and not just one that applies to this particular proof.)
8. Having decided all that, where are we? Our aim is to prove that for every pair of elements , if and have an element in common, then . So let’s start writing a bit more of the proof. First, I’ll set up the assumptions we get to use, using the “let” and “suppose” tricks, and also giving a name to the element that and have in common. Note that all these moves are, or at any rate should be, pure reflexes.
9. Our aim now is to show that and are the same set. So we must show that every element of is an element of and that every element of is an element of . By symmetry it’s enough to do just one direction. (More precisely, once we’ve done the first direction, we can see that just switching round the letters and will do the other direction, so we don’t bother.) Since we’re now trying to prove a statement that begins with the word “every”, we apply the “let” trick and write this.
10. Next, we apply the getting-rid-of-sets trick that I’ve mentioned. We have two statements that say that certain elements belong to certain sets defined by properties. They are
By the definitions of and these statements allow us to write the following addition to our proof.
11. We shouldn’t at this stage forget what our aim is. We’re assuming that and want to prove that . That’s equivalent to proving that . So there it is, we’re allowed to assume that , , and , and we need to prove that . And we’re given that is reflexive, symmetric, and transitive.
12. At this point the proof becomes quite easy. The symmetry tells us that we can always replace with and the transitivity tells us that we’re trying to find a chain from to of “ links”. The only statement we’ve got about is that , so that’s how our chain will have to start. Then we use the fact that , which we flip round so that it says . And finally, we know that . Putting those three facts together and using transitivity, we get that .
13. Strictly speaking I had to use transitivity twice there. For example, from and I can deduce that , and from that and I can deduce that . However, once one gets used to transitivity, one just knows that if is a transitive relation and , then . (As an easy exercise, you might like to give a formal inductive proof of this statement.)
14. In case it’s not clear, the proof is finished. To the write-up I should add the following lines.
Let’s collect together all the lines of proof and see what it looks like. I’ll add a little bit of padding too.
Now let and be elements of and suppose that . Let . Then , , and . It follows by symmetry that and . Hence, by transitivity, . Therefore, . Similarly, if then .
It follows that distinct equivalence classes are disjoint. This completes the proof that the equivalence classes of form a partition of .
There’s one thing I want to make very clear now. It may look as though you have to remember rather a lot of tricks for this proof, and in a sense that’s true, but they are general tricks. They aren’t things you have to remember for this specific proof, but rather they are things like the “let” and “suppose” tricks, the getting-rid-of-sets trick, the give-it-a-name trick, and the “by symmetry” trick. I suppose I could also add the don’t-forget-to-use-the-precise-definition trick (or else an alternative definition, if a suitable one is available). Learning those tricks is a one-off investment that pays dividends throughout mathematics, and not just for specific proofs.
Once you’ve got those general tricks sorted out, and have carefully learnt the definitions of “equivalence relation” and “partition”, just about the only bit of actual mathematics you have to do is seeing how to deduce that from the fact that , , and , using the information that is symmetric and transitive. And that, I hope you’ll agree, is a pretty easy problem.
What is the point of equivalence relations?
Now that that’s sorted out, let me turn to another question: why are equivalence relations important? I’ll give two answers to this, but I’m sure there are many more.
The first answer is that we often find ourselves not interested in certain distinctions. For example, if I am doing geometry, then I might want to talk about an equilateral triangle of side length 1. If someone were to say to me, “There are uncountably many equilateral triangles of side length 1. Which one are you talking about?” I would respond by saying that it doesn’t matter: nothing that I’m about to say depends on which equilateral triangle of side length 1 I’m going to choose.
If one tries to make precise this idea of “not mattering”, then one soon finds oneself talking about the relation of congruence. We often regard two shapes as “essentially the same” if they are congruent to one another. And congruence is an equivalence relation.
Sometimes, we don’t really care about size either — just shape. There’s an equivalence relation for that too, namely similarity. So we might declare that we regard two shapes as “essentially the same” if they are similar to one another (in the geometrical sense that you can get from one to the other by translating, reflecting, rotating and expanding).
In modular arithmetic, we fix some positive integer and decide that we want to regard two integers as “essentially the same” if they differ by a multiple of . Again, an equivalence relation underlies this: that of congruence mod . (Of course, this is a different meaning of the word “congruent”.)
Another situation where we like to regard things as being “essentially the same” even when they are different is abstract algebra. For example, we often like to regard two groups as essentially the same if they are isomorphic. I’d like to say that “is isomorphic to” is an equivalence relation on the set of all groups, but, dammit, I’m not allowed to. Why not? Because there are so many groups that they don’t form a set. (A silly way of seeing this is to observe that if is any set, then I can define a one-element group by taking the set and defining to be the only thing it can be, namely .)
Here’s a good example of the disadvantage of insisting that the official definition of everything has to be in terms of sets. It’s clear that every group is isomorphic to itself, that if is isomorphic to then is isomorphic to , and that if is isomorphic to and is isomorphic to , then is isomorphic to . So “is isomorphic to” ought to be an equivalence relation, and it’s only a combination of an annoying set-theoretic paradox and an insistence that relations are “really” sets that stops it being one. If we defined relations more linguistically as something like “statements with two gaps into which you insert objects of the appropriate type”, then “is isomorphic to” is an equivalence relation.
A second important justification for equivalence relations is that they give us a very useful way of building new mathematical structures out of old ones. The basic strategy is this. First you take some structure . Then you define a carefully chosen equivalence relation on . Finally, you show that you can use the structure on to define a useful structure on the set of all the equivalence classes, which is customarily denoted by and called the quotient set.
What has this use of the word “quotient” got to do with the use of the word when we are talking about division of numbers? Well, let’s suppose we are explaining division to a child. We might ask something like this. “David is sharing out 20 sweets amongst five children. How many sweets do they get each?” Assuming that David doesn’t want a riot on his hands, they will all get the same number, and that number is four. But let’s imagine the situation in more detail. What will David do with those sweets? He will divide them up into five piles, each with four sweets in it. That is, he will partition those sweets! We can even define an equivalence relation, namely, “is destined to be eaten by the same child as”.
I’m not trying to suggest here that the mathematical notion of a quotient is just as easy as the notion of dividing up some sweets into little piles. It isn’t. The reason it isn’t is the thing I talked about just a moment ago: we take quotients not just of sets but of sets with structure, and we like our quotients to inherit some of that structure.
Since this post is already quite long, and since quotients are quite a big topic, I think I’ll stop here for now and leave this short discussion as a cliffhanger.