Equivalence relations

Equivalence relations are in a way a fairly simple mathematical concept. After all, it’s not that hard to learn what reflexive, symmetric and transitive mean and to remember that if you’ve got all three properties then you’ve got an equivalence relation. However, equivalence relations do still cause one or two difficulties. One, which I’ll discuss first, is that many people find that the proof that equivalence classes always form a partition is rather complicated and hard to remember. The other is that equivalence relations are closely connected to the notion of quotients, which appear in many places in mathematics, and which many people find quite hard to grasp.

Proving that equivalence classes form a partition.

I’d like to try to demonstrate that this longish argument is one that does not need to be memorized, as long as you are in the habit of applying a couple of standard proof-generating techniques that I have already mentioned in these posts. But first I should say more precisely what it is we are trying to prove.

Proposition. Let A be a set and let \sim be an equivalence relation on A. Then the equivalence classes of \sim form a partition of A.

Finding a proof.

1. I shall assume that you can remember the formal definition of an equivalence relation. If you can’t, then look it up in your notes or on Wikipedia.

2. I’m not going to assume that you know the formal definition of partition. That’s because, if people I’ve taught over the years are anything to go by, there is a good chance that you have a reasonable mental picture of what it is, but are a little less sure of how to turn that picture into a formal definition.

What, then, is a partition of a set A? Informally, I want to say that it is a collection of disjoint subsets of A that “cover the whole of A“. But how do we write that out properly?

The first thing we should do is give a name to the collection of sets. Since it’s a partition we are defining, the letter p seems appropriate. But also, what we’ve got here is a set of sets (part of the reason the concept is ever so slightly tricky), and it is nice to have a notation that distinguishes it from elements (usually denoted by lower-case letters) and sets (usually denoted by upper case letters). So I’m going to use a curly p, looking like this: \mathcal{P}.

Note that \mathcal{P} is a set whose elements are sets. To distinguish between set-of-sets kinds of sets and set-of-elements kinds of sets I myself like to call \mathcal{P} a collection of sets rather than a set of sets.

How do we say in precise language that the sets in \mathcal{P} are disjoint? One’s first thought might be to say, “Any two sets in \mathcal{P} are disjoint.” However, there is a little piece of necessary pedantry that you have to add to that formulation. It should be, “Any two distinct sets in \mathcal{P} are disjoint.” But surely, you might protest, if one says “Any two sets” then one is talking about two sets, so of course they are distinct. I think the reason we insist so strongly on this point is that if you decide to write the statement out even more formally, it will look something like this:

\forall X,Y\in\mathcal{P}\ \ X\ne Y\implies X\cap Y=\emptyset

If we write it like that but omitting the condition that X\ne Y, then we end up with an incorrect definition, because there really is nothing to stop X equalling Y.

How do we say precisely that the sets in \mathcal{P} “cover” A? Well, that says that every element of A belongs to at least one set in \mathcal{P}. In symbols, it says,

\forall a\in A\ \exists X\in\mathcal{P}\ \ a\in X.

Of course, in combination with the disjointness property, that tells us that every element of A belongs to exactly one set in \mathcal{P}, which in fact gives us a concise definition of what it means for \mathcal{P} to be a partition.

By this time you might be reminded of the definitions of injection and surjection: we need every element of A to be in at most one set in \mathcal{P} (the disjointness conditions) and at least one set in \mathcal{P} (the covering condition). Is this just an amusing echo or is there a closer relationship? As always in mathematics, the answer is the latter: one can find a general framework that incorporates both. But I’ll leave that to you to think about if you feel like it.

Something else you might be wondering is why I went to the bother of writing \mathcal{P} and worrying about collections of sets. Why didn’t I just call the sets in the partition X_1,X_2,\dots,X_n? The answer is that if I had done that I would have inadvertently been making the assumption that there are only finitely many sets in the partition, and that doesn’t have to be the case. OK, you might counter, what about calling them X_1,X_2,X_3,\dots in the infinite case? Well, not only does that force me to deal with two cases, which is ugly, but it also assumes that there are only countably many sets in the partition (a statement you won’t understand until you have covered countability in Numbers and Sets), which again needn’t be the case. So the curly-P notation, though not the only way of doing things, was chosen for pretty good reasons.

3. Right, we’ve been given a set A and an equivalence relation \sim on A and we want to prove that the equivalence classes of \sim form a partition of A. So we’ve got a bunch of sets, the equivalence classes of \sim, and we must prove that it has the properties required of a partition.

I’m going to do a step that after a small amount of practice one can do without. I shall start the proof as follows.

  • Let \mathcal{E} be the collection of equivalence classes of \sim.
  • All I’ve done here is give a name to the collection of sets that we hope to prove is a partition of A.

    4. Now we can be very mechanical. We must prove that for every a\in A there is some E\in\mathcal{E} such that a\in E, and we must prove that if E and F are distinct sets in \mathcal{E}, then E and F are disjoint.

    5. Next comes a very important principle: if you’ve defined a set S via some property P and you have the statement x\in S then it is usually much better to rewrite this statement as P(x). That’s not very clearly expressed, but here’s an example. What does it mean to say that E\in\mathcal{E}? Well, \mathcal{E} is the set of all subsets of A that are equivalence classes of \sim. So the statment E\in\mathcal{E} can be rephrased as “E is an equivalence class of \sim“. That’s what I meant earlier by saying that the step of defining \mathcal{E} wasn’t really necessary: we can get away with simply talking about equivalence classes. I call this the getting-rid-of-sets trick. Here, we apply it to get rid of \mathcal{E}.

    6. What, by the way, is an equivalence class? Well, the equivalence class of an element a of A is defined to be \{b\in A:a\sim b\}, which I shall denote by E(a). That is, it is the set of all elements of A that are related to a.

    Using this, let’s go back to the first of the two statements we wanted to prove. It was this.

  • For every a\in A there is some E\in\mathcal{E} such that a\in E.
  • We can begin by translating this in a way that gets rid of \mathcal{E}.

  • For every a\in A there is some equivalence class E such that a\in E.
  • Next, we can do a further translation that takes account of what equivalence classes are.

  • For every a\in A there is some element b of A such that a\in E(b).
  • (In words, for every a there is a b such that a belongs to the equivalence class of b.)

    What could we choose as our b? Here we are in a position a bit like that of a chess player who has only one move that gets out of check. We know absolutely nothing about elements of A except that a is one of them. So we’d better try a itself. Is it true that a\in E(a)? Well, what does that mean?

    The getting-rid-of-sets principle applies again. E(x) is defined to be \{y\in A:x\sim y\}. So the statement u\in E(x) is equivalent to the statement x\sim u. Applying that with u=x=a we find that the statement a\in E(a) is equivalent to the statement a\sim a. And that is true since \sim is reflexive.

    I took a long time over explaining that, but that is merely because I’m trying to spell out every minute detail of the thought process. What one would actually write for the entire proof so far is this.

  • We must first prove that every element a of A belongs to some equivalence class. But a\in E(a), since \sim is reflexive.
  • 7. We now come to the harder part of the proof, which is to show that distinct sets in \mathcal{E}, in other words, distinct equivalence classes, are disjoint.

    Again, we want to remember that an equivalence class is an equivalence class of something. So we could formulate our task as that of showing the following statement: for every pair of elements a,b of A, if E(a) and E(b) are distinct sets, then E(a)\cap E(b)=\emptyset.

    I’ve noted that this is somewhat similar to proving that a function is an injection, which suggests that it might be nicer to prove the contrapositive. Another sign is that as things stand at the moment we are trying to deduce a “negative” statement from a “negative” statement: that if E(a) and E(b) are not the same, then they do not have any elements in common. It feels cleaner somehow to try to prove that if E(a) and E(b) do have an element in common, then they must be the same set.

    It’s just possible that you have been accidentally confusing equivalence classes with names of equivalence classes, and therefore thinking that E(a) and E(b) are distinct equivalence classes if and only if a and b are distinct elements of A. But that is very much not the case, or equivalence relations wouldn’t be very interesting. Let us briefly remind ourselves what it means to say that E(a)=E(b).

    Isn’t that just obvious? Doesn’t it just mean that they are the same set? Well, yes, but E(a) and E(b) are not defined in the same way, so how are we supposed to prove that they are the same set? The answer is to use the definition of set equality. (This is a very important general principle, and not just one that applies to this particular proof.)

  • Two sets X and Y are equal if every element of X is an element of Y and every element of Y is an element of X.
  • 8. Having decided all that, where are we? Our aim is to prove that for every pair of elements a,b\in A, if E(a) and E(b) have an element in common, then E(a)=E(b). So let’s start writing a bit more of the proof. First, I’ll set up the assumptions we get to use, using the “let” and “suppose” tricks, and also giving a name to the element that E(a) and E(b) have in common. Note that all these moves are, or at any rate should be, pure reflexes.

  • Let a and b be elements of A and suppose that c\in E(a)\cap E(b).
  • 9. Our aim now is to show that E(a) and E(b) are the same set. So we must show that every element of E(a) is an element of E(b) and that every element of E(b) is an element of E(a). By symmetry it’s enough to do just one direction. (More precisely, once we’ve done the first direction, we can see that just switching round the letters a and b will do the other direction, so we don’t bother.) Since we’re now trying to prove a statement that begins with the word “every”, we apply the “let” trick and write this.

  • Let d\in E(a).
  • 10. Next, we apply the getting-rid-of-sets trick that I’ve mentioned. We have two statements that say that certain elements belong to certain sets defined by properties. They are

  • c\in E(a)\cap E(b)
  • and

  • d\in E(a).
  • By the definitions of E(a) and E(b) these statements allow us to write the following addition to our proof.

  • Then a\sim c, b\sim c, and a\sim d.
  • 11. We shouldn’t at this stage forget what our aim is. We’re assuming that d\in E(a) and want to prove that d\in E(b). That’s equivalent to proving that b\sim d. So there it is, we’re allowed to assume that a\sim c, b\sim c, and a\sim d, and we need to prove that b\sim d. And we’re given that \sim is reflexive, symmetric, and transitive.

    12. At this point the proof becomes quite easy. The symmetry tells us that we can always replace x\sim y with y\sim x and the transitivity tells us that we’re trying to find a chain from b to d of “\sim links”. The only statement we’ve got about b is that b\sim c, so that’s how our chain will have to start. Then we use the fact that a\sim c, which we flip round so that it says c\sim a. And finally, we know that a\sim d. Putting those three facts together and using transitivity, we get that b\sim d.

    13. Strictly speaking I had to use transitivity twice there. For example, from b\sim c and c\sim a I can deduce that b\sim a, and from that and a\sim d I can deduce that b\sim d. However, once one gets used to transitivity, one just knows that if R is a transitive relation and x_0Rx_1, x_1Rx_2,\dots,x_{k-1}Rx_k, then x_0Rx_k. (As an easy exercise, you might like to give a formal inductive proof of this statement.)

    14. In case it’s not clear, the proof is finished. To the write-up I should add the following lines.

  • It follows that b\sim d, and hence that d\in E(b). Similarly, if d\in E(b) then d\in E(a).
  • Let’s collect together all the lines of proof and see what it looks like. I’ll add a little bit of padding too.

  • Let \mathcal{E} be the collection of equivalence classes of \sim. We must first prove that every element a of A belongs to some element of \mathcal{E} — that is, to some equivalence class. But a\in E(a), since \sim is reflexive.

    Now let a and b be elements of A and suppose that c\in E(a)\cap E(b). Let d\in E(a). Then a\sim c, b\sim c, and a\sim d. It follows by symmetry that b\sim c, c\sim a and a\sim d. Hence, by transitivity, b\sim d. Therefore, d\in E(b). Similarly, if d\in E(b) then d\in E(a).

    It follows that distinct equivalence classes are disjoint. This completes the proof that the equivalence classes of \sim form a partition of A.

  • There’s one thing I want to make very clear now. It may look as though you have to remember rather a lot of tricks for this proof, and in a sense that’s true, but they are general tricks. They aren’t things you have to remember for this specific proof, but rather they are things like the “let” and “suppose” tricks, the getting-rid-of-sets trick, the give-it-a-name trick, and the “by symmetry” trick. I suppose I could also add the don’t-forget-to-use-the-precise-definition trick (or else an alternative definition, if a suitable one is available). Learning those tricks is a one-off investment that pays dividends throughout mathematics, and not just for specific proofs.

    Once you’ve got those general tricks sorted out, and have carefully learnt the definitions of “equivalence relation” and “partition”, just about the only bit of actual mathematics you have to do is seeing how to deduce that b\sim d from the fact that a\sim c, b\sim c, and a\sim d, using the information that \sim is symmetric and transitive. And that, I hope you’ll agree, is a pretty easy problem.


    What is the point of equivalence relations?

    Now that that’s sorted out, let me turn to another question: why are equivalence relations important? I’ll give two answers to this, but I’m sure there are many more.

    The first answer is that we often find ourselves not interested in certain distinctions. For example, if I am doing geometry, then I might want to talk about an equilateral triangle of side length 1. If someone were to say to me, “There are uncountably many equilateral triangles of side length 1. Which one are you talking about?” I would respond by saying that it doesn’t matter: nothing that I’m about to say depends on which equilateral triangle of side length 1 I’m going to choose.

    If one tries to make precise this idea of “not mattering”, then one soon finds oneself talking about the relation of congruence. We often regard two shapes as “essentially the same” if they are congruent to one another. And congruence is an equivalence relation.

    Sometimes, we don’t really care about size either — just shape. There’s an equivalence relation for that too, namely similarity. So we might declare that we regard two shapes as “essentially the same” if they are similar to one another (in the geometrical sense that you can get from one to the other by translating, reflecting, rotating and expanding).

    In modular arithmetic, we fix some positive integer m and decide that we want to regard two integers as “essentially the same” if they differ by a multiple of m. Again, an equivalence relation underlies this: that of congruence mod m. (Of course, this is a different meaning of the word “congruent”.)

    Another situation where we like to regard things as being “essentially the same” even when they are different is abstract algebra. For example, we often like to regard two groups as essentially the same if they are isomorphic. I’d like to say that “is isomorphic to” is an equivalence relation on the set of all groups, but, dammit, I’m not allowed to. Why not? Because there are so many groups that they don’t form a set. (A silly way of seeing this is to observe that if X is any set, then I can define a one-element group by taking the set \{X\} and defining X\circ X to be the only thing it can be, namely X.)

    Here’s a good example of the disadvantage of insisting that the official definition of everything has to be in terms of sets. It’s clear that every group is isomorphic to itself, that if G is isomorphic to H then H is isomorphic to G, and that if G is isomorphic to H and H is isomorphic to K, then G is isomorphic to K. So “is isomorphic to” ought to be an equivalence relation, and it’s only a combination of an annoying set-theoretic paradox and an insistence that relations are “really” sets that stops it being one. If we defined relations more linguistically as something like “statements with two gaps into which you insert objects of the appropriate type”, then “is isomorphic to” is an equivalence relation.

    Quotients.

    A second important justification for equivalence relations is that they give us a very useful way of building new mathematical structures out of old ones. The basic strategy is this. First you take some structure X. Then you define a carefully chosen equivalence relation \sim on X. Finally, you show that you can use the structure on X to define a useful structure on the set of all the equivalence classes, which is customarily denoted by X/\sim and called the quotient set.

    What has this use of the word “quotient” got to do with the use of the word when we are talking about division of numbers? Well, let’s suppose we are explaining division to a child. We might ask something like this. “David is sharing out 20 sweets amongst five children. How many sweets do they get each?” Assuming that David doesn’t want a riot on his hands, they will all get the same number, and that number is four. But let’s imagine the situation in more detail. What will David do with those sweets? He will divide them up into five piles, each with four sweets in it. That is, he will partition those sweets! We can even define an equivalence relation, namely, “is destined to be eaten by the same child as”.

    I’m not trying to suggest here that the mathematical notion of a quotient is just as easy as the notion of dividing up some sweets into little piles. It isn’t. The reason it isn’t is the thing I talked about just a moment ago: we take quotients not just of sets but of sets with structure, and we like our quotients to inherit some of that structure.

    Since this post is already quite long, and since quotients are quite a big topic, I think I’ll stop here for now and leave this short discussion as a cliffhanger.

    13 Responses to “Equivalence relations”

    1. Carl Weisman Says:

      I suggest “family” for “set of sets” and “members” for its elements.

      That way, you can have phrases like “element of a member” that don’t sound barbarous and are linguistically unambiguous.

      For sets of families? Clan? Tribe?

    2. Sermon « Log24 Says:

      […] Part I: Timothy Gowers on equivalence relations […]

    3. Luqing Ye Says:

      What’s the use of the reflective property ?Some body says that the reflective property of equivalence relation is of no use as it can be implied from the symmetry property and the transition property:If
      a\mathcal{R} b,then from the symmetry property one can verify that b\mathcal{R} a.And from transition property one can further verify that a\mathcal{R} a.\Box.
      This is an interesting argument,but it is wrong.As it is not a must that a and b have a relation.a might be “alone”.BTW,this is not an idea of me ,I see it from http://dvnw.wordpress.com/2011/09/22/equivalence-relations-and-equivalence-classes/

      • gowers Says:

        Another way of thinking about it is that we use the reflexive property to prove that every element belongs to at least one equivalence class, and the symmetric and transitive properties to show that it belongs to at most one equivalence class. So if you drop the reflexive property you get a bunch of disjoint sets, but you can no longer say that they cover A.

    4. Max Says:

      If we defined relations more linguistically as something like “statements with two gaps into which you insert objects of the appropriate type”

      It’s not clear in your text that there’s a good reason not to do this, but (as you know) there is, though not all your readers will understand it. For those who are curious: not all relations correspond to statements with two holes, though the converse is true. There are simply more relations than statements (if the set whose elements you’re relating is large enough, then it’s larger than the set of statements).

      • gowers Says:

        I’m not sure I agree, since we always have the option of doing this. “Let X be a subset of A\times A, and define a relation R on A by saying that xRy if and only if (x,y)\in R.”

        Then the statement with holes is (*,*)\in R.

        In other words, I allow my statements to depend on parameters.

    5. JuanPi Says:

      Thanks for the series of super interesting posts.
      Excuse my “noobness” but I couldn’t understand the explanation on why “is isomorph to” is not an equivalence relation.
      Could you expand this, please? Or maybe give some references to a more detail explanation.

      Thanks again!

      • gowers Says:

        Officially it isn’t an equivalence relation because it isn’t a relation at all! The reason for that is that we define relations as follows: a relation on a set A is a subset of A\times A. You might think you could take A to be the set of all groups. But there’s no such thing as the set of all groups, for roughly the same reason that there’s no such thing as the set of all sets. If you try to allow there to be a set of all groups then you open yourself up to Russell’s paradox.

      • JuanPi Says:

        Hi,
        Ok, I got that part…but if instead I took a countable subset of the set of all groups. Then I could define an equivalence relation using isomorphism right?

      • gowers Says:

        If you took any set of groups, then isomorphism would be an equivalence relation on that set.

    6. A Trip to Mathematics: Part III Relations and Functions | MY DIGITAL NOTEBOOK Says:

      […] Equivalence relations (gowers.wordpress.com) 26.740278 83.888889 Advertisement Eco World Content From Across The Internet. Featured on EcoPressed Did Fracking Help Cause Oklahoma Earthquakes? •••Share•••TweetAddThisLike this:LikeBe the first to like this post. This entry was posted in Math, Study Notes and tagged A Trip To Mathematics, Apple Inc, Equivalence relation, Functions, Ordered pair, Real Analysis, Reflexive relation, relations, Transitive relation by Gaurav Tiwari. Bookmark the permalink. […]

    7. A Trip to Mathematics: Part III Relations and Functions | MY DIGITAL NOTEBOOK Says:

      […] Equivalence relations (gowers.wordpress.com) […]

    8. Equivalence Relations (Properties and Closures) - Direct Knowledge Says:

      […] importance is equivalence relations is discussed […]

    Leave a comment